Sling Academy
Home/PostgreSQL/PostgreSQL Full-Text Search: How to Use `tsvector` and `tsquery`

PostgreSQL Full-Text Search: How to Use `tsvector` and `tsquery`

Last updated: December 20, 2024

PostgreSQL offers powerful tools for implementing full-text search, a common requirement for applications that store and retrieve data based on human language. Among these tools are the tsvector and tsquery types, which form the foundation of full-text searching in PostgreSQL.

Understanding tsvector and tsquery

The tsvector type is designed to store pre-processed text data. When you search through text using a tsvector, PostgreSQL has already done the work of breaking the text into vectors comprising lexemes. This preprocessing allows for fast comparison between the target text and the search query.

The tsquery type, on the other hand, represents the search query itself. It consists of a set of lexemes combined with logical operators, which PostgreSQL uses to search for occurrences within a tsvector.

Preparing Your Data with tsvector

To start using full-text search, the first step is to convert your text data into a tsvector. Here is a simple example:


CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    tsv_content TSVECTOR
);

UPDATE articles SET tsv_content = to_tsvector('english', content);

In the above code, a table named articles is created and a new column of type tsvector is used to store parsed text data. We update the tsv_content column using the function to_tsvector, which takes a configuration (like 'english') and the target column.

Creating a tsquery

Once your text is prepared, you can create search queries using the to_tsquery function. Here's how you might search through the articles table:


SELECT * FROM articles 
WHERE tsv_content @@ to_tsquery('english', 'search & query');

In this example, @@ is the match operator, used to check whether the tsvector matches the tsquery. The result will include any articles whose content match the phrase 'search' and 'query'.

Enhancing Your Search: Indexing

To improve the performance of full-text search, it is a best practice to create an index on the tsvector column. PostgreSQL supports Generalized Inverted Index (GIN), which is optimized for searches over data types like tsvector:


CREATE INDEX tsv_content_idx ON articles USING gin(tsv_content);

This index makes lookups faster and reduces the overall search time in large datasets.

Practical Full-Text Search Example

Let's put everything together in a practical scenario. Imagine you have an application where you update articles frequently. You want these dynamically entered articles to remain searchable:


CREATE OR REPLACE FUNCTION trigger_update_tsv() RETURNS TRIGGER AS $$
BEGIN
  NEW.tsv_content := to_tsvector('english', NEW.content);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON articles 
FOR EACH ROW EXECUTE FUNCTION trigger_update_tsv();

This trigger function trigger_update_tsv ensures that the tsvector representation of an article’s content is automatically updated whenever there's an INSERT or UPDATE operation on the article. This means the full-text search capabilities stay up to date with minimal manual intervention.

Conclusion

PostgreSQL's full-text search features, driven by tsvector and tsquery, provide an efficient and powerful way to perform text searches. With indexing, search queries become highly efficient, even as the searchable content significantly increases. By integrating triggers, you can maintain accuracy and performance dynamically across your database, creating seamless, robust applications that meet the demands of complex searches.

Next Article: Creating Indexes for PostgreSQL Full-Text Search Performance Optimization

Previous Article: Understanding PostgreSQL Full-Text Search Architecture

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL

You May Also Like

  • PostgreSQL with TimescaleDB: Querying Time-Series Data with SQL
  • PostgreSQL Full-Text Search with Boolean Operators
  • Filtering Stop Words in PostgreSQL Full-Text Search
  • PostgreSQL command-line cheat sheet
  • How to Perform Efficient Rolling Aggregations with TimescaleDB
  • PostgreSQL with TimescaleDB: Migrating from Traditional Relational Models
  • Best Practices for Maintaining PostgreSQL and TimescaleDB Databases
  • PostgreSQL with TimescaleDB: Building a High-Performance Analytics Engine
  • Integrating PostgreSQL and TimescaleDB with Machine Learning Models
  • PostgreSQL with TimescaleDB: Implementing Temporal Data Analysis
  • Combining PostgreSQL, TimescaleDB, and Airflow for Data Workflows
  • PostgreSQL with TimescaleDB: Visualizing Real-Time Data with Superset
  • Using PostgreSQL with TimescaleDB for Energy Consumption Analysis
  • PostgreSQL with TimescaleDB: How to Query Massive Datasets Efficiently
  • Best Practices for Writing Time-Series Queries in PostgreSQL with TimescaleDB
  • PostgreSQL with TimescaleDB: Implementing Batch Data Processing
  • Using PostgreSQL with TimescaleDB for Network Traffic Analysis
  • PostgreSQL with TimescaleDB: Troubleshooting Common Performance Issues
  • Building an IoT Data Pipeline with PostgreSQL and TimescaleDB