MySQL 8: Get the latest row of each group with GROUP BY (3 approaches)

Updated: January 27, 2024 By: Guest Contributor Post a comment

Introduction

Retrieving the most recent record in each group is a common task when dealing with databases. With the rise of MySQL 8.0, developers now have more robust and efficient techniques at their disposal to accomplish this task. In this tutorial, we will explore multiple methods of using GROUP BY in MySQL 8 to get the latest row of each group.

Understanding the Data Setup

Before diving into queries, let’s set the stage with a sample data table named ‘sales’. Each row contains a sales entry with the following columns: ‘id’, ‘product_id’, ‘sale_date’, and ‘amount’. Our goal is to find the latest ‘sale_date’ for each ‘product_id’.

CREATE TABLE sales (
  id INT AUTO_INCREMENT PRIMARY KEY,
  product_id INT NOT NULL,
  sale_date DATETIME NOT NULL,
  amount DECIMAL(10 , 2 ) NOT NULL
);

Method 1: Using a Subquery

The traditional approach to this problem involves a subquery that identifies the maximum ‘sale_date’ for each ‘product_id’. Then, the outer query retrieves the corresponding row.

SELECT s1.*
FROM sales s1
JOIN (
  SELECT product_id, MAX(sale_date) AS max_date
  FROM sales
  GROUP BY product_id
) s2 ON s1.product_id = s2.product_id AND s1.sale_date = s2.max_date;

Method 2: Using MySQL 8 Window Functions

MySQL 8 introduces window functions that can simplify this task. The following example uses the ROW_NUMBER() function to assign a unique sequential integer to rows within each ‘product_id’ group, ordered by ‘sale_date’ in descending order. Then, it selects the row with the row number of 1, which represents the most recent sale per product.

SELECT id, product_id, sale_date, amount
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sale_date DESC) AS rn
  FROM sales
) ranked_sales
WHERE rn = 1;

Method 3: Using Common Table Expressions and Window Functions

Common Table Expressions (CTEs) offer a cleaner and more readable approach when combined with window functions. The CTE defines a temporary result set from which you can then select the latest rows.

WITH ranked_sales AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sale_date DESC) AS rn
  FROM sales
)
SELECT id, product_id, sale_date, amount
FROM ranked_sales
WHERE rn = 1;

Optimizing Queries for Performance

Performance is crucial when working with large datasets. Indexing ‘product_id’ and ‘sale_date’ can significantly improve the query speed.

CREATE INDEX idx_product_sale_date ON sales(product_id, sale_date);

Conclusion

When working with MySQL, always remember there could be multiple methods to obtain the desired result. Finding the latest sample sales data can be efficiently performed using subqueries, window functions, or CTEs. Knowing when and how to use these features can optimize your database interactions and lead to more maintainable code.

Continue exploring these concepts in your projects, and always consider the dataset size and the execution context while choosing the appropriate method for optimal performance.