Improving Search Relevance with PostgreSQL's `rank` and `rank_cd` Functions

When dealing with large datasets, returning the most relevant search results first is crucial. PostgreSQL's full-text search provides two ranking functions for this, ts_rank and ts_rank_cd (referred to in this article as rank and rank_cd). Each scores how well a document matches a query, so results can be ordered with the most pertinent first. They are distinct from the rank() window function, which only numbers rows according to an ordering you supply.

Understanding the rank and rank_cd Functions
Use Cases for Search Relevance
Conclusion

Understanding the `rank` and `rank_cd` Functions

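PostgreSQL's rank() is a window function: it assigns a rank to each row within a partition of a result set, based on the window's ORDER BY. Rows with equal values for the ranking criteria receive the same rank, and the ranks that follow are skipped, so two rows tied at rank 1 are followed by rank 3. Use dense_rank() if you need ranks without gaps. Note that rank() knows nothing about text relevance; it only orders by the values you give it.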

SELECT rank() OVER (
    PARTITION BY department ORDER BY salary DESC
) AS dept_rank,
name, salary, department
FROM employees;

In this example, the employees table is partitioned by the department, and each employee is ranked according to their salary in descending order.
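To see how ties are numbered, the following self-contained sketch runs rank() next to dense_rank() on three made-up rows (the names and salaries are illustrative only and do not come from the employees table above).

```sql
-- Inline sample rows, so no table is required.
SELECT name, salary,
       rank()       OVER (ORDER BY salary DESC) AS rank_with_gaps,
       dense_rank() OVER (ORDER BY salary DESC) AS dense_rank_no_gaps
FROM (VALUES ('Ann', 90000), ('Bob', 90000), ('Cy', 80000)) AS e(name, salary)
ORDER BY salary DESC, name;
-- Ann and Bob tie: rank() gives 1, 1, 3 while dense_rank() gives 1, 1, 2.
```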

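The rank_cd function, available as ts_rank_cd in current PostgreSQL, is a full-text search function rather than a window function. The "cd" stands for cover density, not cumulative distribution: like ts_rank, it scores a tsvector against a tsquery, but it also takes into account how close the matching lexemes are to each other, so it needs positional information in the tsvector. For percentile-style ranking over a result set, use the window functions cume_dist() or percent_rank() instead.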
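A quick way to compare the two full-text ranking functions is to score the same query against two small documents. This is a minimal sketch with made-up text; the exact scores depend on your PostgreSQL version and text search configuration, so none are shown here.

```sql
-- Two illustrative documents: in the first the query terms are adjacent,
-- in the second they are far apart.
SELECT doc,
       ts_rank(to_tsvector('english', doc), q)    AS rank_freq,
       ts_rank_cd(to_tsvector('english', doc), q) AS rank_cover_density
FROM (VALUES
        ('postgres search ranking'),
        ('postgres is a database and much later the text mentions search')
     ) AS d(doc),
     to_tsquery('english', 'postgres & search') AS q;
```

Comparing the two score columns row by row gives a feel for how much weight each function puts on term proximity for your own data.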

Use Cases for Search Relevance

Improving search relevance involves multiple factors, including keyword match, recency of the data, and user engagement metrics. Suppose you manage a content-driven website where users search for articles. Ranking the results brings the most relevant articles to the top.

Keyword Relevance Using `ts_rank_cd`

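When a user enters a search query, you usually want to return results in order of how well they match it. To do that, filter with the text search match operator and sort by the ts_rank_cd score:

-- plainto_tsquery accepts free text; to_tsquery would reject 'User Input'
SELECT ts_rank_cd(tsv, query) AS rank,
   title, content
FROM articles,
   plainto_tsquery('User Input') AS query
WHERE tsv @@ query
ORDER BY rank DESC;

In this example, ts_rank_cd is used with the text search vector tsv to rank documents by their relevance to the query. The query is built with plainto_tsquery because raw user input rarely follows to_tsquery syntax: to_tsquery requires operators such as & or | between terms and raises a syntax error on plain text like 'User Input'.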
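The query above assumes that articles already has a tsvector column named tsv. One way such a column might be maintained, assuming PostgreSQL 12 or later and text columns title and content, is a stored generated column with a GIN index. The index name and the 'english' configuration are choices made for this sketch.

```sql
-- Assumption: articles(title text, content text) exists and has no tsv column yet.
ALTER TABLE articles
    ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''))
    ) STORED;

-- A GIN index lets the @@ match in the WHERE clause avoid a full table scan.
CREATE INDEX articles_tsv_idx ON articles USING GIN (tsv);
```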

Balancing Relevance and Recency

A potentially more complex scenario involves not only keyword matching but also rewarding more recent articles. This can be achieved by adjusting ranks according to a formula that combines relevance and recency metrics.

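SELECT (0.7 * ts_rank_cd(tsv, query)
      + 0.3 / (1 + extract(epoch FROM now() - published_at) / 86400.0)) AS combined_rank,
   title, content
FROM articles,
   plainto_tsquery('User Input') AS query
WHERE tsv @@ query
ORDER BY combined_rank DESC;

Here, a simple weighting is applied: content relevance is weighted 0.7 and recency 0.3. The recency term assumes a published_at timestamp column on articles (a name chosen for illustration). It equals 1 for an article published just now and decays towards 0 as the article's age in days grows. The two terms are not on the same scale, so treat the weights as starting points to tune rather than exact percentages.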
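One way to make the relevance and recency terms more comparable is to bound both to the range 0 to 1. The sketch below uses ts_rank_cd's normalization option 32, which divides the rank by itself plus 1, and an exponential decay on article age. The inline sample rows, the published_at column, and the 30-day decay constant are all assumptions made for illustration.

```sql
-- Inline sample data stands in for the real articles table.
WITH articles(title, content, published_at) AS (
    VALUES
        ('Postgres search tips', 'Ranking search results in Postgres', now() - interval '2 days'),
        ('Old search notes',     'Postgres search ranking basics',     now() - interval '400 days')
)
SELECT title,
       -- relevance, normalized into [0, 1) by option 32
       0.7 * ts_rank_cd(to_tsvector('english', content), query, 32)
       -- recency, 1 for a brand-new article, decaying with a 30-day constant
     + 0.3 * exp(-extract(epoch FROM now() - published_at) / (86400.0 * 30)) AS combined_rank
FROM articles,
     plainto_tsquery('english', 'postgres search') AS query
WHERE to_tsvector('english', content) @@ query
ORDER BY combined_rank DESC;
```

With both terms bounded, the 0.7 and 0.3 weights behave more like the intended proportions, though the right decay constant depends on how quickly your content goes stale.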

Combining User Data

Relevance scores from ts_rank_cd can also be combined with user interaction data, such as how long users spend reading each article. Suppose each article carries a score based on these metrics; adding it to the relevance score orders results by both, so that articles readers actually engage with rise in the list.

SELECT (ts_rank_cd(tsv, query) + 0.5 * engagement_score) AS personal_rank,
   title, content
FROM articles,
   plainto_tsquery('User Input') AS query
WHERE tsv @@ query
ORDER BY personal_rank DESC;

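Within this query, engagement_score is assumed to be a per-article column derived from user actions. It is added to the text relevance score with a weight of 0.5, so its scale matters: if engagement_score is on a much larger scale than the ts_rank_cd values, it will dominate the ordering and keyword relevance will barely affect it.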
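If the engagement metric is a raw number such as total reading minutes, one option is to scale it against the largest value among the matching rows before adding it. This is a minimal sketch using inline sample rows; the scores and the choice of max-scaling are assumptions, not a recommended formula.

```sql
-- Inline sample data; engagement_score values are made up.
WITH articles(title, content, engagement_score) AS (
    VALUES
        ('Postgres search tips', 'Ranking search results in Postgres', 120.0),
        ('Search internals',     'How Postgres search indexes work',   300.0)
)
SELECT title,
       ts_rank_cd(to_tsvector('english', content), query, 32)
       -- scale engagement into [0, 1] relative to the best matching article;
       -- nullif avoids division by zero when every score is 0
     + 0.5 * engagement_score / nullif(max(engagement_score) OVER (), 0) AS personal_rank
FROM articles,
     plainto_tsquery('english', 'postgres search') AS query
WHERE to_tsvector('english', content) @@ query
ORDER BY personal_rank DESC;
```

Because the window function runs after the WHERE clause, the maximum is taken over the matching articles only, so the scaling adapts to each search.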

Conclusion

PostgreSQL's ranking functions, ts_rank and ts_rank_cd, are powerful tools for building a relevant search experience. Used on their own they order results by how well each document matches the query; combined with recency and engagement signals they let you tune the order to what your users actually need.
