Table of Contents
Introduction
Unleash the power of PostgreSQL’s full-text search capabilities by harnessing tsvector and tsquery, providing advanced, efficient, and versatile search functionality within your database.
What is Full Text Search?
Full text search in PostgreSQL allows you to search through a database by looking for specific words or phrases within text data. It is more sophisticated than simple pattern matching like the LIKE
might offer.
Understanding tsvector and tsquery Types
In PostgreSQL, tsvector
is a data type that represents a document in the form of lexemes, which are distinct lexical units like words or numbers. On the other hand, tsquery
represents a text search query.
Creating a tsvector
SELECT 'A fat cat sat on a mat and ate a fat rat.'::tsvector;
This creates a tsvector of lexemes without any weights or document structure.
Formulating a tsquery
SELECT 'fat & rat'::tsquery;
A tsquery looks for the co-occurrence of both words ‘fat’ and ‘rat’ in the document.
Basic Text Search Operations
Let’s start by creating a basic example.
CREATE TABLE documents (
id serial PRIMARY KEY,
content text
);
INSERT INTO documents (content) VALUES
('A fat cat sat on a mat and ate a fat rat.'),
('No cats or rats were injured.');
SELECT * FROM documents WHERE to_tsvector(content) @@ to_tsquery('fat & rat');
This example demonstrates how to find documents containing both ‘fat’ and ‘rat’.
Indexing for Performance
Full text searches can be boosted with indexes.
CREATE INDEX document_idx ON documents USING gin (to_tsvector('english', content));
This index greatly improves search performance on large datasets.
Weighting and Ranking
We can assign weights to different sections of a document for relevance ranking.
SELECT
id,
ts_rank_cd(
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B'),
to_tsquery('english', 'PostgreSQL & tsvector')
) AS rank
FROM articles
ORDER BY rank DESC;
Here, titles carry more weight than bodies for the search term ‘PostgreSQL & tsvector’.
Phrase Search and Proximity Queries
Phrase search finds exact phrases, and proximity queries are used to find words within a certain distance from each other.
SELECT * FROM documents WHERE to_tsvector('english', content) @@ phraseto_tsquery('english', 'fat rat');
This will fetch documents with the phrase ‘fat rat’.
Advanced Text Search Features
PostgreSQL full text search also supports complex boolean queries and subdocument searches.
SELECT * FROM documents WHERE to_tsvector('english', content) @@ to_tsquery('english', '(fat | feline) & !rat');
This means the content should have either ‘fat’ or ‘feline’ and should not have ‘rat’.
Handling Different Languages
Full text search can handle multiple languages and configurations. To search in Spanish, for instance, you’d initialize the tsvector using the Spanish configuration.
SELECT 'El rápido zorro marrón'::tsvector('spanish');
This vectorizes the text considering the nuances of the Spanish language.
Conclusion
Full text search with tsvector and tsquery offers robust search capabilities in PostgreSQL. Mastering these tools allows for high precision and performance when querying across large text datasets. By implementing proper indexing and understanding the versatile functions available, developers can build complex search functionalities suitable for modern applications.