Automatic differentiation is a key feature of modern machine learning frameworks because it makes gradient computation efficient. In deep learning, these gradients drive the optimization algorithms. PyTorch, a popular deep learning framework, provides automatic differentiation through its torch.autograd package. One of the most important functions in this package is torch.autograd.grad(), which computes and returns the gradients of specified output tensors with respect to given inputs.
Understanding Automatic Differentiation
Automatic differentiation (AD) is a method for numerically evaluating the derivative of a function specified by a computer program. Unlike symbolic differentiation, which seeks a closed-form formula for the derivative, or numerical differentiation, which approximates derivatives with finite differences, AD computes derivatives that are exact up to floating-point precision by applying the chain rule to the elementary operations the program executes.
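To make the contrast with numerical differentiation concrete, here is a minimal sketch (the function and step size are illustrative choices, not part of the original text) comparing a central finite-difference approximation with the gradient that autograd returns:
import torch

def f(x):
    return x ** 3 + 2 * x  # analytically, f'(x) = 3x^2 + 2

x = torch.tensor(2.0, requires_grad=True)

# Numerical differentiation: central finite difference, accuracy depends on h
h = 1e-4
numerical = (f(x.detach() + h) - f(x.detach() - h)) / (2 * h)

# Automatic differentiation: exact up to floating-point precision
(automatic,) = torch.autograd.grad(f(x), x)

print(numerical.item(), automatic.item())  # both close to 14.0 at x = 2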
torch.autograd.grad(): The Basics
The torch.autograd.grad() function computes the gradients of output tensors with respect to specified input tensors and returns them directly, instead of accumulating them into .grad attributes the way tensor.backward() does. This makes it particularly useful for customized training schemes in which the default backward propagation does not fit the target optimization problem.
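For instance, the gradients returned by torch.autograd.grad() can be used for a manual parameter update without ever touching .grad. A minimal sketch (the quadratic objective and learning rate are illustrative assumptions):
import torch

w = torch.tensor(5.0, requires_grad=True)
lr = 0.1

for step in range(3):
    loss = (w - 3.0) ** 2                     # simple quadratic objective
    (grad_w,) = torch.autograd.grad(loss, w)  # gradient is returned, not stored in w.grad
    with torch.no_grad():
        w -= lr * grad_w                      # manual gradient-descent update
    print(step, loss.item(), w.item())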
Syntax
torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
Here's a breakdown of the primary parameters used in torch.autograd.grad():
- outputs: The tensors whose gradients will be computed, typically the outputs of the function being differentiated (for example, a loss).
- inputs: The tensors with respect to which the gradients are returned. They must require gradients and be part of the computational graph that produced the outputs.
- grad_outputs: An optional "vector" for the vector-Jacobian product, i.e., pre-computed gradients with respect to each output. It is required when the outputs are not scalars.
- retain_graph: When set to True, the graph used to compute the gradients is kept after the call, allowing further backward passes through it.
- create_graph: If True, the graph of the derivative itself is constructed, enabling higher-order derivatives to be computed.
- allow_unused: If True, inputs that were not used to compute the outputs get None as their gradient; otherwise, such inputs raise an error. (Both grad_outputs and allow_unused are illustrated in the short sketch after this list.)
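The two least self-explanatory parameters are grad_outputs and allow_unused. Here is a minimal sketch (values chosen purely for illustration) in which grad_outputs supplies the vector of a vector-Jacobian product for a non-scalar output, and allow_unused lets an unrelated input pass through with a None gradient:
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
unused = torch.tensor(4.0, requires_grad=True)

y = x ** 2  # non-scalar output, so grad_outputs must be supplied

# Vector-Jacobian product v^T J with v = [1, 1, 1], equivalent to the gradient of y.sum()
v = torch.ones_like(y)
grad_x, grad_unused = torch.autograd.grad(
    outputs=y, inputs=(x, unused), grad_outputs=v, allow_unused=True
)

print(grad_x)       # tensor([2., 4., 6.]), i.e. 2 * x
print(grad_unused)  # None, because y does not depend on unused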
Using torch.autograd.grad()
At the heart of PyTorch’s dynamic computational graph is its ability to backpropagate gradients seamlessly. Here's a basic example of using torch.autograd.grad():
import torch
torch.manual_seed(0)
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
# A simple computation: z = x * y
z = x * y
# Compute the derivatives
grads = torch.autograd.grad(outputs=z, inputs=(x, y))
print(grads)
This code sets up a simple scalar multiplication and computes the gradient of z with respect to each variable. The result is the derivative of z with respect to x and the derivative of z with respect to y, which are simply the values of y and x, respectively.
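The gradients come back as a tuple in the same order as the inputs, so the script should print (tensor(3.), tensor(2.)): dz/dx = y = 3 and dz/dy = x = 2. (The manual_seed call is only there for reproducibility and has no effect here, since nothing in the example is random.)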
Advanced Example: Chain Rule and Higher-order Derivatives
PyTorch’s autograd can also handle more complex computations:
import torch
a = torch.tensor(1.0, requires_grad=True)
t = torch.tensor(2.0, requires_grad=True)
# Function definition
b = a + 2 * t
c = a * t + t ** 2
d = b**2 + 3 * c
grad_a, grad_t = torch.autograd.grad(outputs=d, inputs=(a, t), create_graph=True)
# Computing higher-order derivatives; retain_graph=True keeps the backward
# graph (shared by grad_a and grad_t) alive for the second call below
second_order_grad_a = torch.autograd.grad(grad_a, a, retain_graph=True)[0]
second_order_grad_t = torch.autograd.grad(grad_t, t)[0]
print(second_order_grad_a, second_order_grad_t)
This example shows how to differentiate through several chained tensor operations and how to compute higher-order derivatives by passing create_graph=True to the first call so that the gradients themselves remain differentiable.
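Working the example out by hand confirms what the code computes: d = (a + 2t)^2 + 3(at + t^2), so the first derivatives are ∂d/∂a = 2(a + 2t) + 3t and ∂d/∂t = 4(a + 2t) + 3(a + 2t) = 7(a + 2t). Differentiating once more gives ∂²d/∂a² = 2 and ∂²d/∂t² = 14, which is what the script should print (tensor(2.) and tensor(14.)).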
Key Points
- torch.autograd.grad() is a powerful function designed specifically to compute gradients of tensors.
- Set create_graph=True when you need to perform further operations on the computed gradients, such as taking higher-order gradients.
- Keeping the computational graph alive via retain_graph=True is important for back-to-back gradient calculations on the same graph (see the sketch after this list).
- Debug gradient issues by examining the requires_grad property of your tensors and ensuring they actually participate in the computations leading up to the target result.
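To illustrate the retain_graph point, here is a minimal sketch (the cubic function is an arbitrary illustrative choice): the first call keeps the graph alive so that a second gradient computation over the same graph does not fail.
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First call retains the graph so it can be reused below
(g1,) = torch.autograd.grad(y, x, retain_graph=True)

# Second call walks the same graph; without retain_graph above, PyTorch
# would raise an error about backwarding through the graph a second time
(g2,) = torch.autograd.grad(y, x)

print(g1, g2)  # both tensor(12.), since dy/dx = 3 * x**2 = 12 at x = 2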