Creating Your First Dataset with Linear Regression in PyTorch

Creating a dataset and implementing linear regression in PyTorch can seem daunting if you're new to the library or to deep learning. This article walks you through creating your first dataset and fitting a linear regression model to it with PyTorch.

Understanding Linear Regression
Setting Up Your PyTorch Environment
Creating the Dataset
Building the Linear Regression Model
Training the Model
Testing the Model

Understanding Linear Regression

Linear regression is a method for modeling the relationship between a scalar response and one or more explanatory variables (or features). The goal is to find the linear function that best fits the given data points, usually by minimizing the squared error between predictions and observations. With a single feature, the model is a straight line:

y = mx + c

where m is the slope and c is the y-intercept.
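
To make the idea of a best-fitting line concrete, here is a minimal sketch that fits a line to four hand-picked points with NumPy's least-squares polynomial fit. The data values and variable names are assumptions chosen for illustration; they are not the dataset used later in this article.

```python
import numpy as np

# Four illustrative points that lie exactly on y = 2x + 1 (made-up data)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# A degree-1 polyfit returns the least-squares [slope, intercept]
m, c = np.polyfit(x, y, 1)
print(f"slope m = {m:.2f}, intercept c = {c:.2f}")  # slope m = 2.00, intercept c = 1.00
```

Because the points sit exactly on a line, the fit recovers the slope and intercept; with noisy data the result would only approximate them.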

Setting Up Your PyTorch Environment

To get started, make sure you have Python and PyTorch installed. If not, you can follow the official installation guide. The examples in this article also use NumPy to generate data and Matplotlib to plot the result.
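
A quick way to confirm the installation is to import the library and run a tiny tensor operation. This is only a sketch of a smoke test; the variable names are assumptions.

```python
import torch

# Printing the version confirms that the import works
print("PyTorch version:", torch.__version__)

# A tiny tensor operation as a smoke test
t = torch.tensor([1.0, 2.0, 3.0])
total = t.sum().item()
print("Sum of [1, 2, 3]:", total)  # 6.0
```

If the import fails, PyTorch is probably not installed in the Python environment you are running.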

Creating the Dataset

Let's create a synthetic dataset for our linear regression model. We'll use NumPy to generate this dataset.

import numpy as np
import torch

def create_dataset(num_samples=100):
    X = np.linspace(0, 100, num_samples)
    m = 2  # slope
    c = 3  # intercept
    Y = m * X + c + np.random.randn(num_samples) * 10  # Adding noise
    return X, Y

X, Y = create_dataset()

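The function above generates 100 evenly spaced inputs between 0 and 100 and computes Y = 2X + 3 plus Gaussian noise with a standard deviation of 10. Because no random seed is set, the noise differs on every run.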
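
If you later want to iterate over the data in mini-batches, one option is to wrap the arrays in PyTorch's TensorDataset and DataLoader. The sketch below is not part of the original walkthrough; the fixed seed, the batch size of 20, and the variable names are all assumptions.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Same recipe as create_dataset(), with a fixed seed (an assumption)
rng = np.random.default_rng(0)
X = np.linspace(0, 100, 100)
Y = 2 * X + 3 + rng.standard_normal(100) * 10

# Column tensors of shape (100, 1), matching what nn.Linear(1, 1) expects
features = torch.tensor(X, dtype=torch.float32).view(-1, 1)
targets = torch.tensor(Y, dtype=torch.float32).view(-1, 1)

dataset = TensorDataset(features, targets)
loader = DataLoader(dataset, batch_size=20, shuffle=True)  # batch size is arbitrary

print(len(dataset))  # 100
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([20, 1]) torch.Size([20, 1])
```

The rest of this article trains on the full dataset at once, so the DataLoader is optional here.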

Building the Linear Regression Model

Now that we have data, let's build a simple linear regression model. We subclass nn.Module and use nn.Linear, PyTorch's built-in fully connected layer, to define it:

import torch.nn as nn

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)  # 1 input feature and 1 output value

    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel()

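This simple neural network consists of a single layer. nn.Linear(1, 1) computes y = wx + b, where the learnable weight w plays the role of the slope m and the bias b plays the role of the intercept c.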
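
To see that a single nn.Linear layer is a line, the sketch below sets the weight and bias by hand and checks the output. Setting parameters manually is for illustration only (training normally learns them), and the values 2 and 3 are chosen to mirror the dataset's slope and intercept.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)

# Overwrite the randomly initialized parameters (illustration only)
with torch.no_grad():
    layer.weight.fill_(2.0)  # slope
    layer.bias.fill_(3.0)    # intercept

x = torch.tensor([[0.0], [1.0], [10.0]])
with torch.no_grad():
    out = layer(x).squeeze(1).tolist()

print(out)  # [3.0, 5.0, 23.0]
```

Each output is 2 * x + 3, which is exactly the y = mx + c form from the start of the article.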

Training the Model

Let's move on to training the model. We will use a loss function and an optimizer to adjust the model's parameters.

# Convert data to tensors
tensor_x = torch.tensor(X, dtype=torch.float32).view(-1, 1)
tensor_y = torch.tensor(Y, dtype=torch.float32).view(-1, 1)

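# Loss and optimizer
criterion = nn.MSELoss()
# X spans 0 to 100, so lr=0.01 makes plain SGD diverge (the loss becomes inf/nan);
# a smaller step keeps the updates stable on these unscaled inputs.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)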

# Training loop
num_epochs = 1000
for epoch in range(num_epochs):
    model.train()
    # Forward pass
    outputs = model(tensor_x)
    loss = criterion(outputs, tensor_y)
    
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

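In this code, the training loop runs for a fixed number of epochs (1000 here). In each epoch, the model predicts on the full dataset, MSELoss measures the mean squared error between predictions and targets (a common loss for regression tasks), loss.backward() computes the gradients, and optimizer.step() updates the weight and bias to reduce the loss.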
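
Plain SGD is sensitive to the scale of the inputs: with X running from 0 to 100, only a very small learning rate is stable, and the intercept then converges slowly. One common remedy, which the code above does not use, is to standardize the inputs before training. The following is a minimal sketch under that assumption; the seeds, the learning rate of 0.1, the 200 epochs, and the variable names are illustrative choices.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Same recipe as the article's dataset: Y = 2X + 3 plus Gaussian noise
X = np.linspace(0, 100, 100)
Y = 2 * X + 3 + rng.standard_normal(100) * 10
x = torch.tensor(X, dtype=torch.float32).view(-1, 1)
y = torch.tensor(Y, dtype=torch.float32).view(-1, 1)

# Standardize inputs to zero mean and unit standard deviation
x_mean, x_std = x.mean(), x.std()
x_scaled = (x - x_mean) / x_std

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(x_scaled), y)
    loss.backward()
    optimizer.step()

# Convert the learned parameters back to the original units of X
w = model.weight.item() / x_std.item()
b = model.bias.item() - w * x_mean.item()
print(f"slope ~ {w:.2f}, intercept ~ {b:.2f}")
```

The recovered slope should land near 2. The intercept is much noisier, because a noise level of 10 on only 100 points leaves it poorly determined.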

Testing the Model

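After training, it's a good idea to check how well the model has learned. Ideally you would evaluate on held-out data that the model did not see during training. To keep this first example short, the code below predicts on the same inputs used for training, so it shows the quality of the fit rather than how well the model generalizes.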

# Evaluation
model.eval()
with torch.no_grad():
    predicted = model(tensor_x).numpy()  # no_grad() already disables gradient tracking
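
For a check on genuinely unseen data, you could hold back part of the dataset before training. The sketch below is one possible way to do that, not the article's own procedure: the 80/20 split, the seeds, the learning rate, and the R^2 computation are assumptions added for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Same recipe as the article's dataset: Y = 2X + 3 plus Gaussian noise
X = np.linspace(0, 100, 100)
Y = 2 * X + 3 + rng.standard_normal(100) * 10
x = torch.tensor(X, dtype=torch.float32).view(-1, 1)
y = torch.tensor(Y, dtype=torch.float32).view(-1, 1)

# Hypothetical 80/20 random split into training and test indices
perm = torch.randperm(len(x))
train_idx, test_idx = perm[:80], perm[80:]

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # small step for unscaled X

model.train()
for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(x[train_idx]), y[train_idx])
    loss.backward()
    optimizer.step()

# Evaluate only on the 20 points the model never saw
model.eval()
with torch.no_grad():
    test_pred = model(x[test_idx])
    test_mse = criterion(test_pred, y[test_idx]).item()
    ss_res = ((y[test_idx] - test_pred) ** 2).sum()
    ss_tot = ((y[test_idx] - y[test_idx].mean()) ** 2).sum()
    r2 = (1 - ss_res / ss_tot).item()

print(f"test MSE: {test_mse:.2f}, R^2: {r2:.3f}")
```

A test MSE close to the noise variance (about 100 for noise with standard deviation 10) suggests the model has captured the trend rather than memorized the training points.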

You can visualize the result using a plotting library like Matplotlib to see how well the line fits your dataset:

import matplotlib.pyplot as plt

plt.scatter(X, Y, label='Original data')
plt.plot(X, predicted, label='Fitted line', color='red')
plt.xlabel('X (Input Feature)')
plt.ylabel('Y (Target Variable)')
plt.legend()
plt.show()

This plot lets you judge visually how closely the fitted line follows the data. If training converged, the red line should run through the middle of the scatter, with a slope close to the true value of 2. By following the steps in this tutorial, you should now have a working linear regression model in PyTorch.
