Graph classification is a rapidly evolving area in machine learning, especially with the rise of graph convolutional networks (GCNs). PyTorch Geometric, a library built on PyTorch that specializes in graph neural networks, makes developing graph classification models more accessible and efficient.
Understanding Graph Classification
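Graph classification assigns a single label, drawn from a set of predefined categories, to an entire graph. This differs from node classification, where each node receives its own label. The task appears in several domains, such as molecular property prediction (for example, deciding whether a molecule is toxic), social network classification, and recommendation systems.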
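Concretely, PyTorch Geometric describes a graph's connectivity with an `edge_index` in COO format. This is a 2 × num_edges array whose first row holds source nodes and whose second row holds target nodes, and each undirected edge is listed once in each direction. A graph classification dataset pairs each such graph with one label. The plain-Python sketch below builds that layout from an undirected edge list. The helper `to_coo_edge_index` and the toy triangle graph are illustrative only and are not part of PyG's API.

```python
def to_coo_edge_index(undirected_edges):
    """Return [sources, targets] with every undirected edge in both directions.

    Illustrative helper (not a PyG function) mirroring the COO layout
    that PyTorch Geometric expects for edge_index.
    """
    sources, targets = [], []
    for u, v in undirected_edges:
        sources += [u, v]  # u -> v and v -> u
        targets += [v, u]
    return [sources, targets]

# A toy labeled graph: a triangle on nodes 0, 1, 2 with graph-level label 1
triangle = {"edge_index": to_coo_edge_index([(0, 1), (1, 2), (2, 0)]), "y": 1}
print(triangle["edge_index"])
# [[0, 1, 1, 2, 2, 0], [1, 0, 2, 1, 0, 2]]
```

In PyG, you would wrap the same lists as `torch.tensor(edge_index, dtype=torch.long)` and pass them to `torch_geometric.data.Data` together with the label `y`.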
Setting Up Your Environment
To begin developing with PyTorch Geometric, ensure your environment is set up correctly. We recommend using a virtual environment, such as venv or conda, to manage dependencies:
```bash
# Install PyTorch with CUDA support if you have a compatible GPU
# (cu113 targets CUDA 11.3; use the index URL that matches your CUDA version)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
# Install torch-geometric
pip install torch-geometric
```
Data Preparation
PyTorch Geometric ships loaders for standard graph benchmark datasets, including the TU Dortmund graph classification collection through the TUDataset class. Let's use the IMDB-BINARY dataset in this example:
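```python
from torch_geometric.datasets import TUDataset
import torch_geometric.transforms as T

# Load the IMDB-BINARY dataset (downloaded into the current directory).
# Its graphs have no node features, so T.Constant() gives every node a
# single constant feature (value 1.0) that the GCN layers can propagate.
dataset = TUDataset(root='.', name='IMDB-BINARY', transform=T.Constant())

print(f'Dataset: {dataset}')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of classes: {dataset.num_classes}')
print(f'Number of node features: {dataset.num_node_features}')
```

Each graph in IMDB-BINARY is the ego-network of an actor or actress. Nodes are actors, and an edge connects two actors who appeared in the same movie. The graphs fall into two classes according to the movie genre (Action or Romance) they were derived from. The dataset provides no node features, which is why a constant feature is attached above. One-hot encoded node degrees (`T.OneHotDegree`) are a common and often more informative alternative.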
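Because IMDB-BINARY graphs carry no node features, any features must be derived from the graph structure. A common choice is to one-hot encode each node's degree, which is the idea behind PyG's `T.OneHotDegree(max_degree)` transform. The plain-Python sketch below computes these features from an `edge_index` in COO form. The function name `one_hot_degree_features` is hypothetical, and PyG's own transform operates on tensors rather than lists.

```python
def one_hot_degree_features(edge_index, num_nodes, max_degree):
    """One-hot encode each node's degree (counted over edge_index sources).

    Hypothetical helper illustrating the idea behind T.OneHotDegree.
    Returns a num_nodes x (max_degree + 1) list of 0/1 rows.
    """
    degree = [0] * num_nodes
    for source in edge_index[0]:
        degree[source] += 1
    features = []
    for d in degree:
        if d > max_degree:
            raise ValueError(f"degree {d} exceeds max_degree={max_degree}")
        row = [0] * (max_degree + 1)
        row[d] = 1  # set the position that matches this node's degree
        features.append(row)
    return features

# Path graph 0 - 1 - 2, each undirected edge stored in both directions
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
print(one_hot_degree_features(edge_index, num_nodes=3, max_degree=2))
# [[0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

In practice, you would first find the largest degree in the whole dataset and pass it as `max_degree`. The resulting feature width, `max_degree + 1`, then becomes the model's number of input node features.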
Model Definition
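The model in this section is built from `GCNConv` layers. Each layer implements the GCN propagation rule X' = D̂^(-1/2) Â D̂^(-1/2) X Θ. Here Â = A + I is the adjacency matrix with self-loops added, D̂ is its diagonal degree matrix, and Θ is a learned weight matrix; PyG's layer also adds a bias by default. In words, each node combines the transformed features of itself and its neighbors, and the contribution of node j to node i is weighted by 1/sqrt(deg(i) · deg(j)), where degrees include the self-loop. The dense plain-Python sketch below applies this rule once with a fixed weight matrix and no bias. It is meant for intuition only; PyG computes the same rule with sparse message passing.

```python
import math

def gcn_layer(adjacency, features, weight):
    """One dense GCN step: D^-1/2 (A + I) D^-1/2 X W, without bias.

    adjacency: symmetric n x n 0/1 lists without self-loops.
    features: n x f lists; weight: f x h lists. Illustrative only.
    """
    n = len(adjacency)
    # Add self-loops: A_hat = A + I
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Node degrees in A_hat (row sums)
    deg = [sum(row) for row in a_hat]
    # Transform features first: XW has shape n x h
    xw = [[sum(features[i][k] * weight[k][c] for k in range(len(weight)))
           for c in range(len(weight[0]))] for i in range(n)]
    # Aggregate neighbors with symmetric normalization 1 / sqrt(deg_i * deg_j)
    out = []
    for i in range(n):
        row = [0.0] * len(weight[0])
        for j in range(n):
            if a_hat[i][j]:
                norm = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for c in range(len(row)):
                    row[c] += norm * xw[j][c]
        out.append(row)
    return out

# Two connected nodes with features 1.0 and 3.0 and an identity weight
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
# [[2.0], [2.0]]
```

Even with identical input features on every node (for example, 1.0 everywhere), the normalization makes the output depend on node degree. On a three-node path, the middle node receives a different value from the two end nodes, so featureless graphs still give the model some structural signal to learn from.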
Now, let's define a simple graph neural network using PyTorch Geometric:
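```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv, global_mean_pool

class GCNModel(torch.nn.Module):
    def __init__(self, num_node_features, num_classes, hidden_channels=16):
        super().__init__()
        # Two GCN layers compute node embeddings
        self.conv1 = GCNConv(num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        # A linear layer maps each pooled graph embedding to class scores
        self.lin = Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        # Average node embeddings per graph: [num_nodes, H] -> [num_graphs, H]
        x = global_mean_pool(x, batch)
        # Return raw logits; CrossEntropyLoss applies log-softmax itself
        return self.lin(x)
```

The two GCN layers produce an embedding for every node. Graph classification needs one prediction per graph, so `global_mean_pool` averages the node embeddings of each graph. It uses the `batch` vector, which assigns every node in a mini-batch to its graph. The final linear layer then turns each graph embedding into one score (logit) per class.
Training the Model
Each training epoch iterates over mini-batches of graphs, computes the loss against the graph labels, and updates the weights. The optimizer and loss criterion are created in the final section:
```python
from torch_geometric.loader import DataLoader

def train():
    model.train()
    total_loss = 0.0
    for data in train_loader:  # iterate over mini-batches of graphs
        optimizer.zero_grad()  # reset gradients for every batch
        out = model(data.x, data.edge_index, data.batch)  # [num_graphs, num_classes]
        loss = criterion(out, data.y)
        loss.backward()
        optimizer.step()
        # Weight each batch's mean loss by the number of graphs in it
        total_loss += loss.item() * data.num_graphs
    return total_loss / len(train_loader.dataset)
```

Gradients are reset before every batch so that they do not accumulate across batches. The function returns the average training loss over the whole epoch rather than the loss of the last batch.
Evaluating the Model
To check how well the model generalizes, measure its accuracy on graphs it was not trained on:
```python
@torch.no_grad()  # gradients are not needed for evaluation
def test(loader):
    model.eval()
    correct = 0
    for data in loader:  # iterate over the evaluation batches
        out = model(data.x, data.edge_index, data.batch)
        pred = out.argmax(dim=1)  # predicted class for each graph
        correct += int((pred == data.y).sum())
    return correct / len(loader.dataset)
```

The function iterates over the evaluation batches, compares the predicted and actual graph labels, and returns the fraction of graphs classified correctly.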
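For graph-level prediction, the per-node embeddings must be reduced to one vector per graph. In a PyTorch Geometric mini-batch, a `batch` vector records which graph each node belongs to, and `torch_geometric.nn.global_mean_pool` averages node embeddings per graph. The plain-Python sketch below reproduces that averaging for intuition. The name `mean_pool` is hypothetical, and the sketch assumes graph ids are numbered consecutively from 0, as they are in PyG batches, with every graph having at least one node.

```python
def mean_pool(node_embeddings, batch):
    """Average node embeddings per graph, in the spirit of global_mean_pool.

    node_embeddings: list of equal-length vectors, one per node.
    batch: graph id for each node, numbered consecutively from 0.
    """
    num_graphs = max(batch) + 1
    dim = len(node_embeddings[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for vector, graph in zip(node_embeddings, batch):
        counts[graph] += 1
        for k, value in enumerate(vector):
            sums[graph][k] += value
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]

# Three nodes from graph 0 and two nodes from graph 1
embeddings = [[1.0, 0.0], [3.0, 0.0], [2.0, 3.0], [4.0, 4.0], [6.0, 0.0]]
batch = [0, 0, 0, 1, 1]
print(mean_pool(embeddings, batch))
# [[2.0, 1.0], [5.0, 2.0]]
```

The result has one row per graph rather than per node. A subsequent linear layer therefore yields one score vector per graph, which matches the shape of the graph labels in `data.y`.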
Bringing It All Together
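Finally, hold out part of the dataset for testing, wrap both parts in PyTorch Geometric data loaders, and run the training and evaluation functions:
```python
# Hold out 20% of the shuffled graphs so the test set contains unseen graphs
torch.manual_seed(42)
dataset = dataset.shuffle()
num_train = int(0.8 * len(dataset))
train_dataset = dataset[:num_train]
test_dataset = dataset[num_train:]

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Instantiate the model, optimizer, and loss
model = GCNModel(num_node_features=dataset.num_node_features, num_classes=dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

# Train over multiple epochs, reporting test accuracy after each one
epochs = 50
for epoch in range(epochs):
    train_loss = train()
    test_acc = test(test_loader)
    print(f'Epoch: {epoch+1:03d}, Loss: {train_loss:.4f}, Test Acc: {test_acc:.4f}')
```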
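PyTorch Geometric's `DataLoader` does not pad graphs to a common size. It concatenates the graphs of a mini-batch into one large disconnected graph: node features are stacked, each graph's edge indices are shifted by the number of nodes that precede it, and a `batch` vector records which graph each node came from. The plain-Python sketch below mimics that collation on edge lists. `collate_graphs` is a hypothetical helper, not the library's implementation.

```python
def collate_graphs(graphs):
    """Merge graphs into one disconnected graph, in the style of PyG batching.

    Each graph is a dict with 'num_nodes', 'edge_index' ([sources, targets])
    and a graph label 'y'. Returns the merged edge_index, the batch vector,
    and the list of labels.
    """
    sources, targets, batch, labels = [], [], [], []
    offset = 0
    for graph_id, graph in enumerate(graphs):
        # Shift node ids so they do not collide with earlier graphs
        sources += [s + offset for s in graph["edge_index"][0]]
        targets += [t + offset for t in graph["edge_index"][1]]
        batch += [graph_id] * graph["num_nodes"]
        labels.append(graph["y"])
        offset += graph["num_nodes"]
    return {"edge_index": [sources, targets], "batch": batch, "y": labels}

# A single edge (2 nodes) and a path of 3 nodes
g1 = {"num_nodes": 2, "edge_index": [[0, 1], [1, 0]], "y": 0}
g2 = {"num_nodes": 3, "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]], "y": 1}
merged = collate_graphs([g1, g2])
print(merged["edge_index"])  # [[0, 1, 2, 3, 3, 4], [1, 0, 3, 2, 4, 3]]
print(merged["batch"])       # [0, 0, 1, 1, 1]
```

No edges cross between graphs, so message passing on the merged graph never mixes information from different graphs. The `batch` vector is exactly what the pooling step needs to separate the graphs again.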
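This setup gives you a working baseline for experimenting with hyperparameters, network architectures, and more advanced models for graph-based machine learning with PyTorch Geometric. If you use these experiments to choose hyperparameters or the best epoch, hold out a separate validation set so that the test accuracy remains an unbiased estimate.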