PostgreSQL Full-Text Search: How to Use `tsvector` and `tsquery`

PostgreSQL offers powerful tools for implementing full-text search, a common requirement for applications that store and retrieve data based on human language. Among these tools are the tsvector and tsquery types, which form the foundation of full-text searching in PostgreSQL.

Understanding tsvector and tsquery
Preparing Your Data with tsvector
Creating a tsquery
Enhancing Your Search: Indexing
Practical Full-Text Search Example
Conclusion

Understanding `tsvector` and `tsquery`

The tsvector type is designed to store pre-processed text data. When you search through text using a tsvector, PostgreSQL has already done the work of breaking the text into vectors comprising lexemes. This preprocessing allows for fast comparison between the target text and the search query.

The tsquery type, on the other hand, represents the search query itself. It consists of a set of lexemes combined with logical operators, which PostgreSQL uses to search for occurrences within a tsvector.

Preparing Your Data with `tsvector`

To start using full-text search, the first step is to convert your text data into a tsvector. Here is a simple example:


CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    tsv_content TSVECTOR
);

UPDATE articles SET tsv_content = to_tsvector('english', content);

In the above code, a table named articles is created and a new column of type tsvector is used to store parsed text data. We update the tsv_content column using the function to_tsvector, which takes a configuration (like 'english') and the target column.

Creating a `tsquery`

Once your text is prepared, you can create search queries using the to_tsquery function. Here's how you might search through the articles table:


SELECT * FROM articles 
WHERE tsv_content @@ to_tsquery('english', 'search & query');

In this example, @@ is the match operator, used to check whether the tsvector matches the tsquery. The result will include any articles whose content match the phrase 'search' and 'query'.

Enhancing Your Search: Indexing

To improve the performance of full-text search, it is a best practice to create an index on the tsvector column. PostgreSQL supports Generalized Inverted Index (GIN), which is optimized for searches over data types like tsvector:


CREATE INDEX tsv_content_idx ON articles USING gin(tsv_content);

This index makes lookups faster and reduces the overall search time in large datasets.

Practical Full-Text Search Example

Let's put everything together in a practical scenario. Imagine you have an application where you update articles frequently. You want these dynamically entered articles to remain searchable:


CREATE OR REPLACE FUNCTION trigger_update_tsv() RETURNS TRIGGER AS $$
BEGIN
  NEW.tsv_content := to_tsvector('english', NEW.content);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON articles 
FOR EACH ROW EXECUTE FUNCTION trigger_update_tsv();

This trigger function trigger_update_tsv ensures that the tsvector representation of an article’s content is automatically updated whenever there's an INSERT or UPDATE operation on the article. This means the full-text search capabilities stay up to date with minimal manual intervention.

Conclusion

PostgreSQL's full-text search features, driven by tsvector and tsquery, provide an efficient and powerful way to perform text searches. With indexing, search queries become highly efficient, even as the searchable content significantly increases. By integrating triggers, you can maintain accuracy and performance dynamically across your database, creating seamless, robust applications that meet the demands of complex searches.

Next Article: Creating Indexes for PostgreSQL Full-Text Search Performance Optimization

Previous Article: Understanding PostgreSQL Full-Text Search Architecture

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL

PostgreSQL

PostgreSQL Full-Text Search: How to Use `tsvector` and `tsquery`

Table of Contents

Understanding tsvector and tsquery

Preparing Your Data with tsvector