Improving Search Relevance with PostgreSQL's `rank` and `rank_cd` Functions

Last updated: December 20, 2024

When dealing with large datasets, returning the most relevant search results first is crucial. PostgreSQL's full-text search provides two ranking functions for this task: ts_rank and ts_rank_cd (named rank and rank_cd in the older tsearch2 module). Both score how well a document matches a query, so that the most pertinent results can be displayed first.

Understanding the rank and rank_cd Functions

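PostgreSQL also has a window function named rank(), which is unrelated to full-text search but easy to confuse with the ranking functions above. rank() assigns a rank to each row within a partition of a result set. Rows with equal values for the ranking criteria receive the same rank, and the ranks that follow are skipped, leaving gaps (1, 1, 3). Use dense_rank() if you need consecutive ranks without gaps.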

SELECT rank() OVER (
    PARTITION BY department ORDER BY salary DESC
) AS dept_rank,
name, salary, department
FROM employees;

In this example, the employees table is partitioned by the department, and each employee is ranked according to their salary in descending order.
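To make the tie behavior concrete, here is a minimal sketch that compares rank() with dense_rank(). The rows are made-up sample data supplied inline, so the query runs without any existing table.

```sql
-- Hypothetical sample rows, supplied inline so no table is required.
WITH employees(name, department, salary) AS (
    VALUES ('Ana',   'Sales', 5000),
           ('Ben',   'Sales', 5000),
           ('Chloe', 'Sales', 4000)
)
SELECT name, salary,
       rank()       OVER w AS rank_with_gaps,  -- Ana 1, Ben 1, Chloe 3
       dense_rank() OVER w AS rank_no_gaps     -- Ana 1, Ben 1, Chloe 2
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC)
ORDER BY salary DESC, name;
```

Ana and Ben tie on salary, so both get rank 1. rank() then jumps to 3 for Chloe, while dense_rank() continues with 2.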

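The rank_cd function, ts_rank_cd in current PostgreSQL, is a full-text search function rather than a window function. The "cd" stands for cover density, not cumulative distribution (the window function for that is cume_dist()). Like ts_rank, it scores a tsvector against a tsquery, but where ts_rank is based on how often the query's lexemes occur, ts_rank_cd also takes into account how close the matching lexemes are to each other. It needs positional information to do so, so a tsvector whose positions have been removed with strip() scores zero.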
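As a rough illustration of how the frequency-based ts_rank differs from ts_rank_cd (the current name of rank_cd), the sketch below scores two made-up documents against the same query. The document texts and the 'english' configuration are assumptions chosen for the example.

```sql
-- Two hypothetical documents: in the first the query terms are adjacent,
-- in the second they are several words apart.
WITH docs(id, body) AS (
    VALUES (1, 'postgres search ranking explained'),
           (2, 'postgres is a database; many pages later we discuss search')
)
SELECT id,
       ts_rank(to_tsvector('english', body), query)    AS frequency_rank,
       ts_rank_cd(to_tsvector('english', body), query) AS cover_density_rank
FROM docs,
     to_tsquery('english', 'postgres & search') AS query
WHERE to_tsvector('english', body) @@ query
ORDER BY cover_density_rank DESC;
```

Both documents contain each term once, but ts_rank_cd should favor document 1 because its matching terms sit next to each other.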

Use Cases for Search Relevance

Improving search relevance involves multiple factors, including keyword match, recency of the data, and user engagement metrics. Suppose you manage a content-driven website where users search for articles. Ranking the results brings the most relevant articles to the top.

Keyword Relevance Using ts_rank_cd

When a user enters a search query, you usually want to return results ordered by how well they match it. Combine a full-text match with the ts_rank_cd function:

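SELECT ts_rank_cd(tsv, query) AS rank,
   title, content
FROM articles,
   -- plainto_tsquery accepts raw user text; to_tsquery would raise a syntax
   -- error on 'User Input' because the two words have no operator between them.
   plainto_tsquery('User Input') AS query
WHERE tsv @@ query
ORDER BY rank DESC;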

In this example, ts_rank_cd scores the tsvector column tsv of each matching row against the query, and the results are sorted so that the highest-scoring articles come first.
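The queries in this article assume that the articles table already has a tsvector column named tsv. One possible way to set that up is sketched below. The id column, the 'english' configuration, the A/B weights, and the index name are assumptions made for illustration, not part of the original schema.

```sql
-- Hypothetical schema: only title, content and tsv appear in the article's queries.
CREATE TABLE articles (
    id      bigserial PRIMARY KEY,
    title   text NOT NULL,
    content text NOT NULL,
    -- Generated columns require PostgreSQL 12 or later.
    -- Title lexemes get weight A, body lexemes weight B, so title hits score higher.
    tsv     tsvector GENERATED ALWAYS AS (
                setweight(to_tsvector('english', title), 'A') ||
                setweight(to_tsvector('english', content), 'B')
            ) STORED
);

-- A GIN index speeds up the tsv @@ query filter.
CREATE INDEX articles_tsv_idx ON articles USING GIN (tsv);
```

The index helps find matching rows quickly; the ranking function still has to read the tsvector of every matching row, so a selective WHERE clause keeps ranking cheap.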

Balancing Relevance and Recency

A more complex scenario rewards recent articles in addition to keyword matches. This can be achieved by sorting on a formula that combines a relevance score with a recency score.

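SELECT (0.7 * ts_rank_cd(tsv, query) + 0.3 * recency) AS combined_rank,
   title, content
FROM articles,
   plainto_tsquery('User Input') AS query,
   -- Assumes a published_at timestamp column, which the original does not name.
   -- recency is 1 for a brand-new article and falls toward 0 as it ages (in days).
   LATERAL (SELECT 1.0 / (1.0 + greatest(extract(epoch FROM now() - published_at), 0) / 86400.0) AS recency) AS r
WHERE tsv @@ query
ORDER BY combined_rank DESC;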

Here, a simplistic weighting is applied where 70% of the rank comes from content relevance and 30% from the recency of articles.
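A weighted sum like this only behaves as intended when both terms are on comparable scales, and the raw ts_rank_cd score has no fixed upper bound. The ranking functions accept an optional third argument, a normalization bitmask; the value 32 divides the rank by itself plus one, which maps it into the range 0 to 1. The sketch below shows the raw and scaled scores side by side for a made-up document.

```sql
-- Made-up document and query, supplied inline so no table is required.
SELECT ts_rank_cd(doc, query)     AS raw_rank,
       ts_rank_cd(doc, query, 32) AS scaled_rank  -- rank / (rank + 1)
FROM to_tsvector('english', 'postgres search ranking explained') AS doc,
     to_tsquery('english', 'postgres & search') AS query;
```

Passing 32 as the third argument in the combined query would keep the relevance term between 0 and 1 before the 0.7 weight is applied.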

Combining User Data

rank_cd can also be combined with user interaction data, such as how long users spend reading each article. Suppose each article is assigned a score based on these metrics; adding that score to the text rank orders search results in a way that reflects reader behavior, yielding a more personalized experience.

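-- Assumes articles has a numeric engagement_score column maintained elsewhere.
SELECT (ts_rank_cd(tsv, query) + 0.5 * engagement_score) AS personal_rank,
   title, content
FROM articles,
   -- plainto_tsquery accepts raw user text; to_tsquery would reject 'User Input'.
   plainto_tsquery('User Input') AS query
WHERE tsv @@ query
ORDER BY personal_rank DESC;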

Within this query, an engagement_score derived from user actions can be combined to produce a dynamically tailored rank for each result.
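How engagement_score is produced is left open here. As one possible sketch, it could be the average reading time per article, scaled so that the most-read article scores 1. The reading_sessions data and its column names are hypothetical and supplied inline for illustration.

```sql
-- Hypothetical reading-time data: (article_id, seconds_read) per session.
WITH reading_sessions(article_id, seconds_read) AS (
    VALUES (1, 120), (1, 60), (2, 30)
),
per_article AS (
    SELECT article_id, avg(seconds_read) AS avg_seconds
    FROM reading_sessions
    GROUP BY article_id
)
SELECT article_id,
       -- Divide by the largest average to scale into 0..1;
       -- NULLIF guards against division by zero.
       avg_seconds / NULLIF(max(avg_seconds) OVER (), 0) AS engagement_score
FROM per_article
ORDER BY article_id;
```

In a real system the result would be stored on the articles table (or joined in) and refreshed periodically, rather than computed for every search.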

Conclusion

PostgreSQL’s rank and rank_cd functions, available today as ts_rank and ts_rank_cd, are powerful allies in crafting a highly relevant search experience. By applying them judiciously, you can order search results by how well they match the query, balance relevance against recency, and fold in additional signals such as engagement scores to better serve your users.

Series: PostgreSQL Tutorials: From Basic to Advanced
