Building an Anomaly Detection Pipeline on Time-Series Data in PyTorch

Anomaly detection is a critical task in many industries, from monitoring network traffic for cybersecurity threats to detecting fraudulent transactions in finance. Time-series data, which consists of data points indexed in time order, is particularly pertinent for anomaly detection tasks because temporal patterns can highlight deviations that indicate unusual and potentially dangerous events. In this article, we will walk through building an anomaly detection pipeline specific to time-series data using PyTorch, a powerful and flexible deep learning framework.

Getting Started with PyTorch
Dataset Preparation
Designing the Model
Training the Model
Detecting Anomalies
Conclusion

Getting Started with PyTorch

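First, make sure PyTorch is installed in your Python environment, along with NumPy and Matplotlib, which the examples below use for data generation and plotting. You can install them via pip:

pip install torch numpy matplotlib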
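To confirm the installation works, you can run a quick smoke test like the one below. It also checks whether a CUDA GPU is available. The rest of this article runs fine on the CPU, so the `device` variable here is only illustrative.

```python
import torch

# Confirm PyTorch imports and report the installed version
print("PyTorch version:", torch.__version__)

# Pick a GPU if one is available; the article's examples also run on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# A tiny tensor operation as a sanity check
x = torch.tensor([1.0, 2.0, 3.0], device=device)
print("Sum:", x.sum().item())  # Sum: 6.0
```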

Dataset Preparation

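Before building our model, we need a time-series dataset. For demonstration purposes, we generate a synthetic series: a noisy sine wave with a few injected spikes that serve as known anomalies. In a production pipeline, you would typically source this data from databases, sensors, or API calls.

import numpy as np
import matplotlib.pyplot as plt

# Generate a noisy sine wave as the "normal" signal
np.random.seed(42)
date_range = np.arange(0, 100, 0.1)  # evenly spaced time steps, 0.1 apart
data = np.sin(date_range) + 0.1 * np.random.normal(size=date_range.size)

# Inject a few large spikes so there is something to detect
anomaly_idx = np.array([150, 420, 780])
data[anomaly_idx] += 3.0

# Plot the series and mark the injected anomalies
plt.figure(figsize=(10, 6))
plt.plot(date_range, data, label='Signal')
plt.scatter(date_range[anomaly_idx], data[anomaly_idx], color='red', label='Injected anomaly')
plt.title('Synthetic Time-Series Data')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()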
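The model built in the next section takes one value at a time. A common alternative for time series is to normalize the series and then feed the autoencoder fixed-length sliding windows, so that each sample carries some local temporal shape. Here is a minimal NumPy sketch of that preprocessing; the helper `make_windows` and the window length of 20 are illustrative assumptions, not part of the pipeline above.

```python
import numpy as np

def make_windows(series, window_size):
    """Return overlapping windows of shape (len(series) - window_size + 1, window_size)."""
    n_windows = len(series) - window_size + 1
    # idx[i] = [i, i+1, ..., i+window_size-1], so row i holds series[i : i + window_size]
    idx = np.arange(window_size)[None, :] + np.arange(n_windows)[:, None]
    return series[idx]

# Example series built the same way as above, without injected anomalies
np.random.seed(42)
t = np.arange(0, 100, 0.1)
series = np.sin(t) + 0.1 * np.random.normal(size=t.size)

# z-score normalization; in practice, compute mean/std on the training split only
mean, std = series.mean(), series.std()
normalized = (series - mean) / std

windows = make_windows(normalized, window_size=20)
print(windows.shape)
```

Each row overlaps the next by all but one value, which gives the model many training samples from a single series.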

Designing the Model

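For detecting anomalies, we will use a simple neural network called an autoencoder. An autoencoder learns to encode its input into a compact representation and decode it back. When it is trained on normal data, it reconstructs normal patterns well, so a high reconstruction error suggests an anomaly. Note that the model below processes one scalar value at a time, which makes its 16-unit "bottleneck" wider than its input. It is best read as a minimal illustration: in practice, feeding the network fixed-length windows of the series, with a bottleneck smaller than the window, lets it learn temporal patterns.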

import torch
import torch.nn as nn

class AnomalyDetector(nn.Module):
    """Fully connected autoencoder that reconstructs one scalar value per sample."""
    def __init__(self):
        super().__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(1, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU())
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 64),
            nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
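As a sketch of a windowed variant, assuming inputs shaped `(batch, window_size)` such as sliding windows over the series, the same encoder/decoder layout can be given a bottleneck smaller than its input, which forces the network to compress. The class name `WindowAutoencoder` and the sizes `window_size=20` and `latent_dim=4` are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

class WindowAutoencoder(nn.Module):
    """Fully connected autoencoder over windows of `window_size` consecutive values."""

    def __init__(self, window_size=20, latent_dim=4):
        super().__init__()
        # Encoder: window_size -> 32 -> 16 -> latent_dim (the bottleneck)
        self.encoder = nn.Sequential(
            nn.Linear(window_size, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, latent_dim),
        )
        # Decoder mirrors the encoder back to window_size outputs
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, window_size),
        )

    def forward(self, x):
        # x: (batch, window_size) -> reconstruction of the same shape
        return self.decoder(self.encoder(x))

model_w = WindowAutoencoder(window_size=20, latent_dim=4)
batch = torch.randn(8, 20)
print(model_w(batch).shape)  # torch.Size([8, 20])
```

Leaving the last encoder layer without an activation keeps the latent code unconstrained; applying ReLU there, as the scalar model does, also works but can zero out latent units.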

Training the Model

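We train the model on data that is entirely or mostly normal, so it learns to reconstruct normal patterns well while reconstructing anomalies poorly. The loop below uses full-batch training with the Adam optimizer and mean squared error (MSE) as the reconstruction loss.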

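def train_model(model, data, num_epochs=100, learning_rate=1e-3):
    model.train()  # put layers in training mode
    criterion = nn.MSELoss()  # reconstruction error
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    # Convert the NumPy series to a float32 tensor of shape (N, 1): one sample per time step
    data = torch.from_numpy(data).float().view(-1, 1)

    for epoch in range(num_epochs):
        optimizer.zero_grad()
        outputs = model(data)            # reconstruct every point in one full batch
        loss = criterion(outputs, data)  # the target is the input itself
        loss.backward()
        optimizer.step()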

        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

    print('Training complete!')

# Instantiate and train the model
model = AnomalyDetector()
train_model(model, data)
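The loop above trains on the whole series as a single batch, which is fine for a small synthetic dataset. For longer series, mini-batches via `torch.utils.data.DataLoader` are the usual approach. The sketch below assumes windowed inputs of shape `(N, window_size)` and a small `nn.Sequential` autoencoder; the function name `train_minibatch` and all sizes are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_minibatch(model, windows, num_epochs=20, batch_size=64, learning_rate=1e-3):
    """Train an autoencoder on a (N, window_size) array; return the mean loss per epoch."""
    loader = DataLoader(TensorDataset(torch.from_numpy(windows).float()),
                        batch_size=batch_size, shuffle=True)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    history = []
    model.train()
    for epoch in range(num_epochs):
        total, count = 0.0, 0
        for (batch,) in loader:
            optimizer.zero_grad()
            loss = criterion(model(batch), batch)  # target is the input itself
            loss.backward()
            optimizer.step()
            total += loss.item() * batch.size(0)   # weight by batch size
            count += batch.size(0)
        history.append(total / count)
    return history

# Hypothetical setup: windows of a noisy sine wave and a small autoencoder
torch.manual_seed(0)
np.random.seed(0)
t = np.arange(0, 100, 0.1)
series = np.sin(t) + 0.1 * np.random.normal(size=t.size)
window_size = 20
windows = np.stack([series[i:i + window_size]
                    for i in range(len(series) - window_size + 1)])

ae = nn.Sequential(
    nn.Linear(window_size, 16), nn.ReLU(), nn.Linear(16, 4),             # encoder
    nn.ReLU(), nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, window_size),  # decoder
)
history = train_minibatch(ae, windows)
print(f"first epoch loss {history[0]:.4f}, last epoch loss {history[-1]:.4f}")
```

Shuffling is acceptable here because each window is treated as an independent sample; the temporal order is preserved inside each window.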

Detecting Anomalies

Once the model is trained, we can detect anomalies. Anomalous points typically have a higher reconstruction error than normal ones, so we compute the error for each point and flag every point whose error exceeds a chosen threshold.
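In symbols, with $x_t$ the value at time step $t$, $\hat{x}_t$ the model's reconstruction of it, and $\tau$ the detection threshold (the `threshold` argument of `detect_anomalies`):

```latex
e_t = \left(x_t - \hat{x}_t\right)^2, \qquad
t \text{ is flagged as anomalous} \iff e_t > \tau
```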

def detect_anomalies(model, data, threshold=0.05):
    model.eval()
    inputs = torch.from_numpy(data).float().view(-1, 1)
    with torch.no_grad():  # no gradients needed for inference
        reconstructed = model(inputs).numpy().flatten()
    # Per-point squared reconstruction error (not a single mean over the whole series)
    errors = (inputs.numpy().flatten() - reconstructed) ** 2
    return errors > threshold  # boolean mask, one entry per time step

anomalies_detected = detect_anomalies(model, data)
print('Anomalies:', np.where(anomalies_detected)[0])
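A fixed value such as `threshold=0.05` is arbitrary. One common heuristic, shown here as a sketch, is to set the threshold at a high percentile of the reconstruction errors observed on normal training data. The helper names and the 99.5th percentile are assumptions, and the error values below are synthetic stand-ins rather than outputs of the model above.

```python
import numpy as np

def percentile_threshold(train_errors, q=99.5):
    """Threshold = q-th percentile of reconstruction errors on normal training data."""
    return np.percentile(train_errors, q)

def flag(errors, threshold):
    """Indices whose reconstruction error exceeds the threshold."""
    return np.where(errors > threshold)[0]

# Hypothetical errors: small noise on normal data, plus three large errors at test time
rng = np.random.default_rng(0)
train_errors = rng.exponential(scale=0.01, size=1000)
test_errors = rng.exponential(scale=0.01, size=1000)
test_errors[[150, 420, 780]] = 1.0

threshold = percentile_threshold(train_errors)
flagged = flag(test_errors, threshold)
print(f"threshold = {threshold:.4f}")
print("flagged:", flagged)
```

By construction, a percentile rule flags roughly (100 - q)% of normal points, so `q` trades false alarms against missed anomalies; a rule based on the mean plus a multiple of the standard deviation of the training errors is a common alternative.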

Conclusion

Building an anomaly detection pipeline with PyTorch involves several key steps: data preparation, model design, training, and flagging anomalies based on reconstruction error. PyTorch's flexibility makes it straightforward to customize the model for different kinds of time-series anomaly detection tasks. As you adapt this pipeline, experiment with different model architectures and hyperparameters, such as layer sizes, the number of epochs, the learning rate, and the detection threshold, to improve detection performance.

Series: Time-Series and Forecasting in PyTorch
