Using LEAD() and LAG() functions in MySQL 8: A Practical Guide

Updated: January 27, 2024 By: Guest Contributor Post a comment

Introduction

In the realm of database analysis and operation, MySQL stands out as one of the most widely used relational database management systems. With the release of MySQL 8, various functions were introduced, enhancing the analysis capabilities that SQL developers and data analysts can utilize. Among these functions, LEAD() and LAG() have proven particularly useful when working with data that involves sequencing or comparison across rows. This tutorial is designed to introduce the LEAD() and LAG() functions in MySQL 8 and demonstrate how they can be skillfully applied through multiple hands-on examples.

The Use of LEAD() and LAG() Functions

The LEAD() function allows you to access data from the following row in the same result set, without requiring a self-join. Similarly, the LAG() function does the opposite. It retrieves data from the preceding row. This is incredibly useful when needing to compare current row values with values of subsequent or prior rows.

Basic Usage of LAG()

To begin experimenting with the LAG() function, consider the following simple example. Suppose we have a table named sales_data, containing daily sales figures across various days:

CREATE TABLE sales_data (
  id INT PRIMARY KEY,
  sale_date DATE,
  sales_amount DECIMAL(10,2)
);

INSERT INTO sales_data (id, sale_date, sales_amount)
VALUES
  (1, '2021-01-01', 1500.00),
  (2, '2021-01-02', 1600.50),
  (3, '2021-01-03', 1800.75),
  (4, '2021-01-04', 2000.00);

Now, let’s use LAG() to compare the sales amount of each day with the previous day:

SELECT
  sale_date,
  sales_amount,
  LAG(sales_amount) OVER (ORDER BY sale_date) AS previous_day_sales
FROM
  sales_data;

The above query will fetch the current day’s sales along with the sales amount of the previous day for comparison. The output will look like this:

+------------+--------------+--------------------+
| sale_date  | sales_amount | previous_day_sales |
+------------+--------------+--------------------+
| 2021-01-01 |       1500.00|               NULL |
| 2021-01-02 |       1600.50|             1500.00|
| 2021-01-03 |       1800.75|             1600.50|
| 2021-01-04 |       2000.00|             1800.75|
+------------+--------------+--------------------+

In this result set, you can see how LAG() retrieves the sales amount of the previous row based on the order specified in the OVER() clause.

Basic Usage of LEAD()

Moving on to the LEAD() function, let’s assume that we want to forecast tomorrow’s sales by looking at today’s sales. We will use the same sales_data table:

SELECT
  sale_date,
  sales_amount,
  LEAD(sales_amount) OVER (ORDER BY sale_date) AS next_day_sales
FROM
  sales_data;

Similar to LAG(), the LEAD() function will display the sales amount for the next day. Here’s what the output will resemble:

+------------+--------------+----------------+
| sale_date  | sales_amount | next_day_sales |
+------------+--------------+----------------+
| 2021-01-01 |       1500.00|         1600.50|
| 2021-01-02 |       1600.50|         1800.75|
| 2021-01-03 |       1800.75|         2000.00|
| 2021-01-04 |       2000.00|            NULL|
+------------+--------------+----------------+

Note that the last row displays NULL because there’s no subsequent sale amount available.

Advanced Usage: Calculating Differences

Now that we understand the basics, we can do more complex calculations, such as finding the difference between the current day’s sales amount and the previous day’s. Modifying the above query:

SELECT
  sale_date,
  sales_amount,
  LAG(sales_amount) OVER (ORDER BY sale_date) AS previous_day_sales,
  sales_amount - LAG(sales_amount) OVER (ORDER BY sale_date) AS difference
FROM
  sales_data;

This will give us an additional column showing the difference, which helps in analyzing day-to-day sales variance. The output will have the following form:

+------------+--------------+--------------------+------------+
| sale_date  | sales_amount | previous_day_sales | difference |
+------------+--------------+--------------------+------------+
| 2021-01-01 |       1500.00|               NULL |       NULL |
| 2021-01-02 |       1600.50|             1500.00|       100.50|
| 2021-01-03 |       1800.75|             1600.50|       200.25|
| 2021-01-04 |       2000.00|             1800.75|       199.25|
+------------+--------------+--------------------+------------+

Window Frames

The LEAD() and LAG() functions are even more powerful when used with window frames. You can define a specific range of rows for these functions to operate over, instead of defaulting to the entire result set. This is particularly helpful when the dataset is partitioned into groups.

Consider the example where you have multiple stores and want to compare the sales data for each store:

CREATE TABLE sales_by_store (
  id INT PRIMARY KEY,
  sale_date DATE,
  store_id INT,
  sales_amount DECIMAL(10,2)
);

-- The statement to populate the table is omitted for brevity

SELECT
  store_id,
  sale_date,
  sales_amount,
  LAG(sales_amount, 1) OVER (PARTITION BY store_id ORDER BY sale_date) AS prev_store_sales
FROM
  sales_by_store;

By using PARTITION BY, you can calculate the LAG() for the sales_amount column for each store separately. The output will reflect the previous sales per store, maintaining the separation by store_id.

Practical Application: Ranking and Lead/Lag Comparison

A common use case when dealing with datasets that reflect a certain order to establish rankings or positional indexing. Let’s say we have sales agents and we want to rank them by the number of sales they have made:

CREATE TABLE sales_agents (
  id INT PRIMARY KEY,
  agent_name VARCHAR(100),
  total_sales INT
);

-- The statement to populate the table is omitted for brevity

SELECT
  id,
  agent_name,
  total_sales,
  ROW_NUMBER() OVER (ORDER BY total_sales DESC) AS rnk,
  LAG(agent_name, 1) OVER (ORDER BY total_sales DESC) AS prev_agent,
  LEAD(agent_name, 1) OVER (ORDER BY total_sales DESC) AS next_agent
FROM
  sales_agents;

This will yield a table that not only ranks the agents by their total sales but also lists the respective agent who is ranked immediately before and after each one, proving a competitive analysis environment.

Conclusion

In utilizing the LEAD() and LAG() functions in MySQL 8, you can harness compelling data insights with relative directional analysis across your data rows. Whether it’s financial comparisons, rankings, or predictive assessments, these functions offer a powerful pathway to uncover patterns and trends in your data.