Building a machine learning model is a rewarding journey, especially when you carry it all the way from raw data to an operational deployment. In this article, we walk through a complete PyTorch-based classification pipeline, from dataset to deployment. PyTorch is a dynamic and flexible deep learning framework that gives you control at every stage of developing a model.
Preparing Your Dataset
Every successful machine learning project begins with a well-prepared dataset. In many projects you start from images organized into per-class folders (which datasets.ImageFolder handles directly); for illustration, we'll use the CIFAR-10 dataset, which consists of 60,000 32x32 color images in 10 classes (50,000 for training, 10,000 for testing) and which torchvision downloads for you. The first step is to load and preprocess your data:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Convert images to tensors in [0, 1], then normalize each channel to [-1, 1].
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Download CIFAR-10 and wrap it in loaders (shuffled for training only).
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=32, shuffle=True)
testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = DataLoader(testset, batch_size=32, shuffle=False)
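Before moving on, it's worth sanity-checking what the loaders produce. A quick look at one batch confirms the shapes and the value range after normalization (a minimal check, not part of the pipeline itself):

images, labels = next(iter(trainloader))
print(images.shape, labels.shape)  # torch.Size([32, 3, 32, 32]) torch.Size([32])
print(images.min().item(), images.max().item())  # roughly -1.0 and 1.0 after normalization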
Creating the Model
A key strength of PyTorch is the flexibility of its model definitions. You typically define a model by subclassing torch.nn.Module:
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # 32x32 input -> 3x3 conv (no padding) -> 30x30 -> 2x2 pool -> 15x15
        self.conv1 = nn.Conv2d(3, 16, 3, 1)
        # 15x15 -> 3x3 conv -> 13x13 -> 2x2 pool -> 6x6
        self.conv2 = nn.Conv2d(16, 32, 3, 1)
        self.fc1 = nn.Linear(32 * 6 * 6, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 32 * 6 * 6)
        x = F.relu(self.fc1(x))
        # Return raw logits; nn.CrossEntropyLoss below applies log-softmax internally.
        return self.fc2(x)
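As a quick check that the flattened size (32 * 6 * 6) matches the architecture, you can push a dummy batch through an untrained instance:

import torch

model = SimpleCNN()
out = model(torch.randn(1, 3, 32, 32))  # one fake 32x32 RGB image
print(out.shape)  # torch.Size([1, 10]) -- one logit per class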
Training the Model
To train your model, choose a loss function and an optimizer, then write a loop that iterates over the data in batches:
import torch
import torch.optim as optim

model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(trainloader):
        optimizer.zero_grad()              # clear gradients from the previous step
        output = model(data)               # forward pass
        loss = criterion(output, target)
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights
        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} '
                  f'[{batch_idx * len(data)}/{len(trainloader.dataset)}] '
                  f'Loss: {loss.item():.6f}')
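If you have a GPU, training will go much faster. A minimal sketch of the device setup (the loop above runs on the CPU as written; with a GPU you would also move each batch, as the comment shows):

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)  # parameters now live on the chosen device
# Inside the training loop you would then add:
#     data, target = data.to(device), target.to(device)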
Evaluating the Model
Next, it’s crucial to evaluate how well your model performs on unseen data. No weights are updated in this phase; the goal is simply to measure how well the trained model generalizes:
def test():
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients needed for evaluation
        for data, target in testloader:
            output = model(data)
            # Sum (rather than average) the per-sample losses so that the
            # division by the dataset size below yields a true mean.
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(testloader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, '
          f'Accuracy: {correct}/{len(testloader.dataset)} '
          f'({100. * correct / len(testloader.dataset):.0f}%)\n')
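Nothing has called these functions yet, so a short driver ties them together; five epochs is an arbitrary choice for illustration:

for epoch in range(1, 6):
    train(epoch)
    test()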
Persisting and Deploying the Model
Finally, persist the trained weights so the model can be reloaded wherever it will be used.
torch.save(model.state_dict(), 'cnn_model.pth')

# To load the model elsewhere:
model = SimpleCNN()
model.load_state_dict(torch.load('cnn_model.pth'))
model.eval()  # switch to inference mode (affects dropout and batch-norm layers)
To deploy this model, you can serve it behind a web framework like Flask or Django, convert it for mobile deployment using PyTorch Mobile, or embed it into larger analytics pipelines via cloud services. By following these steps, you bridge theory and practice, carrying a model from raw data all the way to a working deployment.
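As one concrete example of that last step, here is a minimal sketch of exporting the trained model with TorchScript, the serialized format that PyTorch Mobile and the C++ runtime consume (the example input and file name are arbitrary choices):

# Trace the model with an example input to produce a TorchScript module.
example = torch.randn(1, 3, 32, 32)
scripted = torch.jit.trace(model, example)
scripted.save('cnn_model_scripted.pt')  # loadable via torch.jit.load, without the Python class

Tracing records the operations executed on the example input, so it suits a straight-line model like this one; models with data-dependent control flow would need torch.jit.script instead.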