Sling Academy

Creating Your First Dataset with Linear Regression in PyTorch

Last updated: December 14, 2024

Creating a dataset and implementing linear regression in PyTorch can seem daunting if you're new to the library or to deep learning concepts. This article walks you through generating a small synthetic dataset and fitting a linear regression model to it with PyTorch.

Understanding Linear Regression

Linear regression models the relationship between a scalar response and one or more explanatory variables (features). The goal is to find the linear function that best fits the data, usually in the least-squares sense: the line that minimizes the sum of squared differences between predicted and observed values. With a single feature, this is the familiar equation of a line:

y = mx + c

where m is the slope and c is the y-intercept.
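For a single feature, the least-squares line has a closed form: m = cov(x, y) / var(x) and c = mean(y) - m * mean(x). The sketch below applies these formulas with NumPy to points generated exactly from y = 2x + 3, so it should recover the slope and intercept. The helper name fit_line is hypothetical and only serves as an illustration.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = m*x + c with one feature (hypothetical helper)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through (x_mean, y_mean)
    c = y_mean - m * x_mean
    return m, c

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 3  # noise-free points on y = 2x + 3
m, c = fit_line(x, y)
print(m, c)  # prints: 2.0 3.0
```

With noisy data the same formulas give the best-fitting line rather than the exact generating parameters, which is what the PyTorch model below approximates iteratively.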

Setting Up Your PyTorch Environment

To get started, make sure you have Python and PyTorch installed; if not, follow the official installation guide on pytorch.org. The examples in this article also use NumPy and Matplotlib.
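Once PyTorch is installed (for example with pip install torch numpy matplotlib, although the official guide gives the exact command for your platform and CUDA version), a quick sanity check confirms that it imports and can run a tensor operation. This is a minimal sketch; whether a GPU is reported depends on your machine.

```python
import torch

# Print the installed version and whether a CUDA-capable GPU is visible
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# A tiny tensor operation to confirm the install works end to end
a = torch.tensor([1.0, 2.0, 3.0])
b = a * 2 + 1
print(b)  # tensor([3., 5., 7.])
```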

Creating the Dataset

Let's create a synthetic dataset for our linear regression model. We'll use NumPy to generate this dataset.

import numpy as np
import torch

def create_dataset(num_samples=100):
    X = np.linspace(0, 100, num_samples)
    m = 2  # slope
    c = 3  # intercept
    Y = m * X + c + np.random.randn(num_samples) * 10  # Adding noise
    return X, Y

X, Y = create_dataset()

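The function above generates 100 evenly spaced values of X between 0 and 100 and computes Y = 2X + 3 plus Gaussian noise with a standard deviation of 10, so the true slope and intercept are known in advance.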
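If you later want mini-batch training, the same arrays can be wrapped in PyTorch's torch.utils.data.TensorDataset and iterated with a DataLoader. The sketch below is an optional extension rather than part of the original walkthrough; the seed, the batch size of 16 and the shuffling are illustrative choices.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative choice)
X = np.linspace(0, 100, 100)
Y = 2 * X + 3 + rng.standard_normal(100) * 10

# Column vectors of shape (N, 1), matching nn.Linear(1, 1)
features = torch.tensor(X, dtype=torch.float32).view(-1, 1)
targets = torch.tensor(Y, dtype=torch.float32).view(-1, 1)

dataset = TensorDataset(features, targets)  # each item is a (feature, target) pair
loader = DataLoader(dataset, batch_size=16, shuffle=True)

print(len(dataset))  # 100 samples
for xb, yb in loader:
    print(xb.shape, yb.shape)  # torch.Size([16, 1]) torch.Size([16, 1])
    break
```

With 100 samples and a batch size of 16, the loader yields six full batches and a final batch of four.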

Building the Linear Regression Model

Now that we have data, let's build a simple linear regression model by subclassing PyTorch's nn.Module:

import torch.nn as nn

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(1, 1)  # 1 input and 1 output
    
    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel()

This model consists of a single nn.Linear layer that computes y = wx + b, where the learnable weight w and bias b play the roles of the slope m and intercept c.
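Under the hood, nn.Linear(1, 1) stores a 1x1 weight matrix and a bias of length 1, and its forward pass computes x @ W.T + b, which for one feature is just y = wx + b. The check below, using made-up inputs, confirms that the layer's output matches that formula.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # weights are randomly initialized; seed for repeatability
layer = nn.Linear(1, 1)

print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 1]) torch.Size([1])

x = torch.tensor([[1.0], [2.0], [3.0]])  # three samples, one feature each
manual = x @ layer.weight.T + layer.bias  # y = w*x + b, computed by hand
print(torch.allclose(layer(x), manual))  # True
```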

Training the Model

Let's move on to training the model. We will use a loss function and an optimizer to adjust the model's parameters.

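# Convert data to tensors of shape (N, 1).
# X spans 0-100, so plain SGD with lr=0.01 diverges on the raw values (the loss
# blows up to inf/nan). Standardizing X to zero mean and unit variance keeps training stable.
X_mean, X_std = X.mean(), X.std()
tensor_x = torch.tensor((X - X_mean) / X_std, dtype=torch.float32).view(-1, 1)
tensor_y = torch.tensor(Y, dtype=torch.float32).view(-1, 1)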

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
num_epochs = 1000
for epoch in range(num_epochs):
    model.train()
    # Forward pass
    outputs = model(tensor_x)
    loss = criterion(outputs, tensor_y)
    
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

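In this loop, each of the 1000 epochs runs a forward pass over the whole dataset and computes the mean squared error with MSELoss, a common loss function for regression tasks. optimizer.zero_grad() then clears the gradients left over from the previous iteration (PyTorch accumulates gradients by default), loss.backward() computes new gradients, and optimizer.step() updates the weight and bias with the SGD optimizer to reduce the loss.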
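For plain SGD without momentum or weight decay, optimizer.step() moves each parameter against its gradient: p = p - lr * p.grad. The sketch below, on a tiny made-up batch, performs that update by hand on one copy of a layer and with torch.optim.SGD on an identical copy, then checks that both end up with the same parameters.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.tensor([[0.0], [0.5], [1.0]])
y = 2 * x + 3  # targets from y = 2x + 3

model_a = nn.Linear(1, 1)
model_b = copy.deepcopy(model_a)  # identical starting parameters
criterion = nn.MSELoss()
lr = 0.1

# Version A: a manual gradient descent step
loss_a = criterion(model_a(x), y)
loss_a.backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad  # p <- p - lr * dLoss/dp

# Version B: the same step via torch.optim.SGD
optimizer = torch.optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
loss_b = criterion(model_b(x), y)
loss_b.backward()
optimizer.step()

same = all(torch.allclose(pa, pb) for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```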

Testing the Model

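After training, you would ideally evaluate the model on unseen test data to assess how well it generalizes. To keep this first example simple, the code below just generates predictions for the training inputs; model.eval() switches the model to evaluation mode and torch.no_grad() disables gradient tracking during inference.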

# Evaluation
model.eval()
with torch.no_grad():
    predicted = model(tensor_x).detach().numpy()
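Besides inspecting a plot, a common numeric summary for regression is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where 1.0 means a perfect fit and 0.0 means no better than always predicting the mean. The helper below is a hypothetical NumPy sketch (scikit-learn's r2_score computes the same quantity); with this tutorial's arrays you could call it as r_squared(Y, predicted.ravel()).

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
print(r_squared(y_true, y_true))                     # perfect fit: 1.0
print(r_squared(y_true, np.full(4, y_true.mean())))  # predicting the mean: 0.0
```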

You can visualize the result using a plotting library like Matplotlib to see how well the line fits your dataset:

import matplotlib.pyplot as plt

plt.scatter(X, Y, label='Original data')
plt.plot(X, predicted, label='Fitted line', color='red')
plt.xlabel('X (Input Feature)')
plt.ylabel('Y (Target Variable)')
plt.legend()
plt.show()

The scatter points show the noisy data and the red line shows the model's predictions, so you can judge at a glance how closely the fit follows the data. By following the steps in this tutorial, you should have a working PyTorch linear regression model whose fitted line follows the underlying linear trend in the data.
