3 Ways to select a single random row in PostgreSQL

Updated: January 4, 2024 By: Guest Contributor Post a comment

Overview

Selecting a random row from a Postgres database can be important for scenarios like sampling data, A/B testing, or just simply when you need to retrieve a random item. This article will explore various ways to accomplish this task each with its own set of advantages and limitations.

Solution 1: ORDER BY RANDOM()

This is the simplest method for selecting a random row. It orders the entire table by the RANDOM() function and then limits the result to a single entry.

  • Step 1: Write the SQL SELECT statement.
  • Step 2: Order the result by the RANDOM() function.
  • Step 3: Limit the result set to one row.

Example:

SELECT * FROM your_table ORDER BY RANDOM() LIMIT 1;

Pros: Very easy to implement. No pre-processing required.

Cons: Performance-intensive on large tables since it requires sorting the entire table.

Solution 2: TBL TABLESAMPLE SYSTEM (1 ROWS)

Utilizes the TABLESAMPLE SYSTEM clause to select a random sample of the table’s rows. Note that this method is less random, as it operates on pages, not individual rows, but it’s much faster for large tables.

  • Step 1: Write the SQL SELECT statement with the TABLESAMPLE SYSTEM clause.
  • Step 2: Specify that you want 1 ROW.

Example:

SELECT * FROM your_table TABLESAMPLE SYSTEM (1 ROWS);

Pros: Significantly faster for large tables.

Cons: Not truly random because rows are sampled from random pages. Some rows may never be selected if they’re on less frequently sampled pages.

Solution 3: Sequential ID Random Selection

This method assumes the presence of a sequential ID or similar column. It randomly generates a number based on the maximum ID value and selects the row with the corresponding ID.

  • Step 1: Retrieve the maximum ID value from the table.
  • Step 2: Generate a random number between 1 and the maximum ID using RANDOM().
  • Step 3: Select the row where the ID matches the random number generated.

Example:

WITH RECURSIVE t AS (SELECT MAX(id) AS max_id FROM your_table)SELECT * FROM your_table WHERE id = (SELECT floor(random() * max_id + 1)::int FROM t);

Pros: More efficient than ORDER BY RANDOM() for large tables with a sequential ID.

Cons: Assumes a sequential ID without gaps, which isn’t always realistic. Potentially problematic with concurrently changing data.

Conclusion

When it comes to selecting a random row in PostgreSQL, various methods are available each suited to different requirements and data setups. ORDER BY RANDOM() is the most straightforward technique but may be impractical for large databases due to its performance implications. TABLESAMPLE can offer better performance at the cost of being less random. For tables with sequential IDs, leveraging the ID column can be a fast and effective middle-ground. Understanding the pros and cons of each method is crucial for picking the right one for your specific use case.