PostgreSQL COUNT(*), COUNT(column_name), and COUNT(1): Which is fastest?

Introduction
Understanding the COUNT Function
Performance Analysis: COUNT(*) vs. COUNT(column_name) vs. COUNT(1)
Advanced Techniques: Counting with Joins and Conditions
Conclusion

Introduction

In PostgreSQL, the COUNT function is a fundamental aggregation tool used to count rows in a database table. This tutorial explores the performance differences between the COUNT(*) syntax, COUNT(column_name), and COUNT(1), to help you write more efficient queries.

Understanding the COUNT Function

Before diving into performance comparisons, it’s important to understand what each COUNT variation does:

COUNT(*) counts all rows in a table, including nulls and duplicates.
COUNT(column_name) counts all non-NULL values in a specific column.
COUNT(1) is a constant count, effectively the same as COUNT(*), which counts all rows.

Let’s go through them with examples:

BASIC USAGE OF COUNT(*)

SELECT COUNT(*) FROM users;

This query will return the total number of rows in the ‘users’ table.

BASIC USAGE OF COUNT(column_name)

SELECT COUNT(id) FROM users;

This query will return the number of rows with a non-NULL ‘id’ column in the ‘users’ table.

BASIC USAGE OF COUNT(1)

SELECT COUNT(1) FROM users;

Like COUNT(*), this returns the total number of rows in the ‘users’ table.

Performance Analysis: COUNT(*) vs. COUNT(column_name) vs. COUNT(1)

It is a common misconception that COUNT(1) performs faster than COUNT(*) because people sometimes think that counting a constant value should be less work for the database. However, databases like PostgreSQL are optimized for such operations, and the differences between them are more nuanced.

Performance between COUNT(*) and COUNT(1) is generally the same because PostgreSQL recognizes them as semantically equivalent. The actual deciding factor for performance is the complexity and structure of the underlying table and the presence of indexes.

Here’s an example showing all three usages in a single query:

EXPLAIN ANALYZE SELECT COUNT(*), COUNT(id), COUNT(1) FROM users;

We use the EXPLAIN ANALYZE command in PostgreSQL to measure the query plan and execution time, helping us to compare which count operation is more efficient in a real-world scenario.

Considerations may vary, though. For instance, if you’re counting a specific column that is indexed, like the primary key, PostgreSQL can use that index to count rows, which usually results in a quicker query, like so:

EXPLAIN ANALYZE SELECT COUNT(id) FROM users;

An understanding of indexes is crucial in grasping why this might be faster. When PostgreSQL can use an index scan instead of a sequential scan, it often means that less disk I/O is required, thereby increasing performance. That said, this performance gain is not specific to COUNT(column_name) and may not apply in all scenarios.

Advanced Techniques: Counting with Joins and Conditions

When performing more complex queries, such as those involving joins or where clauses, the choice between COUNT methods again boils down to what best suits your data and indexes. Here’s an example with a join:

SELECT COUNT(*) FROM users JOIN orders ON users.id = orders.user_id WHERE orders.status = 'shipped';

On the other hand, when dealing with conditions on indexed columns, it’s worth experimenting between COUNT(*) and COUNT(column_name). An example using a WHERE clause could look like this:

SELECT COUNT(id) FROM users WHERE last_login < '2021-01-01' AND status = 'active';

In these scenarios, use EXPLAIN ANALYZE to identify if one count method offers a substantial performance benefit over the others.

Conclusion

In conclusion, while there might be minor differences when using COUNT(*) vs. COUNT(column_name) vs. COUNT(1) in theory, PostgreSQL’s query planner often negates these distinctions. The best practice is to choose the COUNT function that semantically fits the intention of your query while considering the utilization of indexes and the specific conditions of your database structure. Regardless of the method, always ensure your database is properly tuned and analyze the query plans for performance optimization in any particular case.

Next Article: PostgreSQL aggregation: SUM, AVG, MIN, and MAX

Previous Article: PostgreSQL: Counting Results Using the COUNT Function

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL