PostgreSQL: Full text search with tsvector and tsquery

Updated: January 5, 2024 By: Guest Contributor Post a comment

Introduction

Unleash the power of PostgreSQL’s full-text search capabilities by harnessing tsvector and tsquery, providing advanced, efficient, and versatile search functionality within your database.

What is Full Text Search?

Full text search in PostgreSQL allows you to search through a database by looking for specific words or phrases within text data. It is more sophisticated than simple pattern matching like the LIKE might offer.

Understanding tsvector and tsquery Types

In PostgreSQL, tsvector is a data type that represents a document in the form of lexemes, which are distinct lexical units like words or numbers. On the other hand, tsquery represents a text search query.

Creating a tsvector

SELECT 'A fat cat sat on a mat and ate a fat rat.'::tsvector;

This creates a tsvector of lexemes without any weights or document structure.

Formulating a tsquery

SELECT 'fat & rat'::tsquery;

A tsquery looks for the co-occurrence of both words ‘fat’ and ‘rat’ in the document.

Basic Text Search Operations

Let’s start by creating a basic example.

CREATE TABLE documents (
    id serial PRIMARY KEY,
    content text
);

INSERT INTO documents (content) VALUES
    ('A fat cat sat on a mat and ate a fat rat.'),
    ('No cats or rats were injured.');

SELECT * FROM documents WHERE to_tsvector(content) @@ to_tsquery('fat & rat');

This example demonstrates how to find documents containing both ‘fat’ and ‘rat’.

Indexing for Performance

Full text searches can be boosted with indexes.

CREATE INDEX document_idx ON documents USING gin (to_tsvector('english', content));

This index greatly improves search performance on large datasets.

Weighting and Ranking

We can assign weights to different sections of a document for relevance ranking.

SELECT
    id,
    ts_rank_cd(
        setweight(to_tsvector('english', title), 'A') ||
        setweight(to_tsvector('english', body), 'B'),
        to_tsquery('english', 'PostgreSQL & tsvector')
    ) AS rank
FROM articles
ORDER BY rank DESC;

Here, titles carry more weight than bodies for the search term ‘PostgreSQL & tsvector’.

Phrase Search and Proximity Queries

Phrase search finds exact phrases, and proximity queries are used to find words within a certain distance from each other.

SELECT * FROM documents WHERE to_tsvector('english', content) @@ phraseto_tsquery('english', 'fat rat');

This will fetch documents with the phrase ‘fat rat’.

Advanced Text Search Features

PostgreSQL full text search also supports complex boolean queries and subdocument searches.

SELECT * FROM documents WHERE to_tsvector('english', content) @@ to_tsquery('english', '(fat | feline) & !rat');

This means the content should have either ‘fat’ or ‘feline’ and should not have ‘rat’.

Handling Different Languages

Full text search can handle multiple languages and configurations. To search in Spanish, for instance, you’d initialize the tsvector using the Spanish configuration.

SELECT 'El rápido zorro marrón'::tsvector('spanish');

This vectorizes the text considering the nuances of the Spanish language.

Conclusion

Full text search with tsvector and tsquery offers robust search capabilities in PostgreSQL. Mastering these tools allows for high precision and performance when querying across large text datasets. By implementing proper indexing and understanding the versatile functions available, developers can build complex search functionalities suitable for modern applications.