Using NTILE() function in MySQL 8: A quick Guide

Updated: January 27, 2024 By: Guest Contributor Post a comment

Introduction

The NTILE() function is a window function introduced in MySQL 8 that allows for dividing an ordered dataset into a specified number of approximately equal ranked groups. This function can be incredibly useful for a wide range of applications in data analysis, reporting, and statistical computations.

In this guide, we will explore the NTILE() function in MySQL 8 through multiple code examples, starting with basic usage and progressing towards more advanced scenarios. The examples will demonstrate how to use this function and interpret the results effectively.

Basic Usage of NTILE()

The NTILE() function’s basic syntax is as follows:

NTILE(number_of_tiles) OVER (ORDER BY column_name)

Let’s start with a simple example. Imagine a table sales with two columns: id and revenue. Here’s how you could divide the data into four equal groups:

SELECT id, revenue, NTILE(4) OVER (ORDER BY revenue DESC) AS quartile
FROM sales;

The above query would rank the sales by revenue in descending order and assign a quartile group ranging from 1 to 4 to each row.

Equal-Size Grouping with NTILE()

Now, let’s make groups of equal size. If you have a dataset with 20 rows and want to divide it into 4 groups this is how you can do it:

SELECT id, revenue, NTILE(4) OVER (ORDER BY revenue DESC) AS group_id
FROM sales;

This will output 20 rows, each assigned to a group from 1 to 4 with an equal or almost equal number of rows.

NTILE() with PARTITION BY

When analyzing a big dataset, you might want to divide it into tiles within subgroups. You can do this with the PARTITION BY clause:

SELECT year, department, revenue, NTILE(4) OVER (PARTITION BY year ORDER BY revenue DESC) AS quartile
FROM sales;

This snippet partitions the data by year, then orders it by revenue within each year, and applies the NTILE function to each partition separately.

Advanced Group Analysis

In a more complex example, let’s say you want to analyze the distribution of sales within each group created via NTILE(). We’ll use aggregate functions in conjunction with NTILE() to achieve this:

SELECT group_id, COUNT(*) AS total_sales, AVG(revenue) AS average_revenue,
MIN(revenue) AS min_revenue, MAX(revenue) AS max_revenue
FROM (
  SELECT revenue, NTILE(4) OVER (ORDER BY revenue DESC) AS group_id
  FROM sales
) AS t
GROUP BY group_id;

Application Example: Calculating Percentiles

If you want to calculate percentiles within your dataset, NTILE can help. For example, to calculate the median value (or the 50th percentile), we would use NTILE() to divide the dataset into two groups:

SELECT NTILE(2) OVER (ORDER BY revenue DESC) AS percentile_50
FROM sales;

Each row would now be tagged as belonging to the first or second percentile group.

Dealing with Ties

If you have ties (equal values), NTILE() will distribute them across the groups as evenly as possible. In case of indivisible ties, the earlier rows will get placed in the lower-numbered groups.

Here’s a small MySQL code example that demonstrates the use of the NTILE() function, particularly focusing on how it deals with ties (equal values). The NTILE() function is used to divide ordered rows into a specified number of approximately equal groups.

-- Create an example table
CREATE TABLE scores (
    id INT AUTO_INCREMENT PRIMARY KEY,
    score INT
);

-- Insert data with ties (equal scores)
INSERT INTO scores (score) VALUES
(10), (20), (20), (30), (30), (30), (40), (40), (50), (50);

-- Query using NTILE() to distribute rows into 4 groups
SELECT 
    id, 
    score,
    NTILE(4) OVER (ORDER BY score) AS quartile
FROM 
    scores;

Explanation

  • Creating a Table: The scores table is created with an id and a score column.
  • Inserting Data: Data inserted into the scores table includes duplicate values (ties), such as multiple entries with scores of 20, 30, 40, and 50.
  • Using NTILE(): The SELECT query uses the NTILE(4) function to divide the rows into 4 groups (quartiles) based on the score column. The ORDER BY clause within OVER is important as NTILE() operates over an ordered set of rows.
  • Dealing with Ties: NTILE() distributes rows with ties across the groups. If there’s an indivisible number of ties (e.g., three rows with the same value when dividing into four groups), the earlier rows among the ties get placed into the lower-numbered groups.

Pagination with NTILE()

Another application of NTILE() is for paginating results. For example, dividing 100 rows into pages of 10:

SELECT id, revenue, NTILE(10) OVER (ORDER BY id) AS page
FROM sales;

This query will assign each row a page number from 1 to 10, effectively creating a paginated view of 10 pages, each with 10 rows.

Nuances of NTILE()

Understanding the nuances of NTILE can help with more accurate groupings. For example, when the dataset size is not perfectly divisible by the number of tiles, NTILE() assigns the extra rows to the initial groups. Thus, the groups in front may have more rows than those at the end.

Below is an example demonstrating the nuances of the NTILE() function in MySQL, particularly focusing on how it behaves when the dataset size is not perfectly divisible by the number of tiles (groups). In such cases, NTILE() distributes the extra rows to the initial groups, resulting in these groups potentially having more rows than those at the end.

The example:

-- Create an example table
CREATE TABLE students (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50),
    score INT
);

-- Insert data
INSERT INTO students (name, score) VALUES
('Alice', 85), ('Bob', 90), ('Charlie', 75),
('David', 70), ('Eve', 95), ('Frank', 80),
('Grace', 65), ('Hannah', 60), ('Ivan', 55);

-- Query using NTILE() to divide students into 4 groups
SELECT 
    id, 
    name, 
    score,
    NTILE(4) OVER (ORDER BY score DESC) AS group_number
FROM 
    students;

Here:

  • Table Creation: The students table is created with id, name, and score.
  • Data Insertion: Nine students are inserted into the table with varying scores.
  • NTILE Query: The SELECT query uses NTILE(4) to divide the students into 4 groups based on their scores in descending order.
  • Group Distribution:
    • Since there are 9 students and 4 groups, the groups cannot have an equal number of students (9 is not divisible by 4).
    • NTILE(4) will distribute the students into 4 groups, where the first few groups will have more students than the last ones.
    • The first group will have 3 students (as it gets the extra student), and the remaining groups will have 2 students each.

Performance Considerations

When dealing with large datasets, keep in mind that NTILE() requires sorting data, which can be heavy on performance. Effective indexing, data partitioning, and server resource management are crucial for maintaining query performance.

Conclusion

We’ve taken a journey from basic to advanced use of the NTILE() function in MySQL 8, gaining insight into how it can serve in diverse scenarios from group analysis to pagination. Its versatility makes it an invaluable tool for both data analysts and developers alike.