PostgreSQL Full-Text Search: Working with `dictionary` Configurations

PostgreSQL is a powerful, open-source database system known for its advanced features and extensibility. One such feature is its full-text search capabilities, which provide efficient search functionalities for textual data stored in PostgreSQL databases. A crucial component of optimizing full-text search in PostgreSQL is working with dictionary configurations. Dictionaries in PostgreSQL are used to define how words are recognized and processed during search operations. This article will guide you through setting up and working with dictionary configurations in PostgreSQL full-text search.

Understanding Dictionaries in PostgreSQL
Types of Dictionaries
Creating and Using Dictionaries
Testing Full-Text Search with Custom Configurations
Benefits of Custom Dictionary Configurations

Understanding Dictionaries in PostgreSQL

In PostgreSQL, a dictionary is responsible for the linguistic processing of text, such as tokenization and normalization. Dictionaries help to remove noise, handle synonyms, and deal with stemming variations of words. Different types of dictionaries cater to different linguistic requirements, like recognizing stop words or performing stemming.

Types of Dictionaries

Simple: Removes stop words but doesn’t perform stemming.
Snowball: Performs stemming for languages supported by the Snowball project.
Thesaurus: Handles synonyms by mapping a set of terms to a single output.
Unaccent: Handles removal of accents for diacritical-sensitive text matching.

Creating and Using Dictionaries

Let's walk through the process of configuring and using dictionaries in PostgreSQL.

Step 1: Check Existing Dictionaries

Before creating a new dictionary, it's important to review existing ones to avoid duplicates. You can list all available dictionaries by using the following SQL query:


SELECT * FROM pg_catalog.pg_ts_dict;

Step 2: Creating a Custom Dictionary

Suppose you want to create a custom dictionary for the English language that uses Snowball stemming. First, define the dictionary with the SQL command:


CREATE TEXT SEARCH DICTIONARY english_stem (
  TEMPLATE = snowball,
  LANGUAGE = 'english'
);

Replacing English words with their stems can improve search accuracy and relevance by matching various forms of a word.

Step 3: Integrate the Dictionary into a Text Search Configuration

PostgreSQL allows the integration of the dictionary into configurations, which dictate how text transformations should be applied. Here’s an example of creating a custom text search configuration:


CREATE TEXT SEARCH CONFIGURATION my_search_conf (
  COPY = pg_catalog.english
);
ALTER TEXT SEARCH CONFIGURATION my_search_conf
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
  WITH english_stem;

This command integrates your custom dictionary english_stem into the search configuration, essentially applying stemming to appropriate word tokens.

Testing Full-Text Search with Custom Configurations

With the custom configuration set up, search queries can now leverage the enhanced processing power. Let’s see an example of performing a search:


SELECT to_tsvector('my_search_conf', 'running run ran runner') @@ to_tsquery('my_search_conf', 'run');

The above query normalizes various forms of "run" and checks if they are part of the text vector, yielding true if a match is found.

Benefits of Custom Dictionary Configurations

Utilizing dictionary configurations can vastly enhance the performance and relevance of full-text searches in several significant ways:

Performance: Reduces overhead by eliminating noise and focusing on relevant terms.
Relevance: Increases the matched results' relevance by factoring in different word forms and synonyms.
Customization: Allows you to tailor search queries to specific linguistic needs.

Next Article: How to Debug PostgreSQL Full-Text Search Queries

Previous Article: Filtering Stop Words in PostgreSQL Full-Text Search

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL