Using Regular Expressions in PostgreSQL: Basic to Advanced Examples

Updated: January 5, 2024 By: Guest Contributor Post a comment

From basic string searches to complex pattern matching, mastering regular expressions in PostgreSQL can dramatically improve your data querying capabilities. This guide explores how you can leverage regex for efficient database management.

Introduction

Regular expressions, or regex, are a powerful tool for pattern matching and searching within text. In PostgreSQL, regex can be used to perform sophisticated queries and manipulate string data in ways simple SQL cannot.

Basic Regex Usage

Before diving into advanced examples, it’s important to understand the basics. PostgreSQL offers several operators and functions for regex. Let’s start with the ~, ~*, !~, and !~* regex operators:

SELECT 'PostgreSQL' ~ 'SQL';  -- returns true
SELECT 'PostgreSQL' ~* 'sql'; -- returns true (case-insensitive)
SELECT 'PostgreSQL' !~ 'SQL';  -- returns false
SELECT 'PostgreSQL' !~* 'sql'; -- returns false (case-insensitive)

The above examples show how to match strings and the use of case-sensitive and case-insensitive patterns.

Matching Specific Characters

Let’s now look at matching specific characters or sets of characters in a string:

SELECT 'a100' ~ '^[a-zA-Z]+\d+

This demonstrates a pattern that starts with a letter followed by one or more digits.

Using Regex to Filter Queries

Regex is especially useful in WHERE clauses to filter rows based on complex string patterns:

-- Retrieve records where email ends with 'example.com'
SELECT * FROM users WHERE email ~* '\.example\.com

Here we escape the dot character to match a literal period, then use the $ anchor to specify the end of the string.

Advanced Regex Functions

For more advanced patterns, PostgreSQL offers regex functions like REGEXP_REPLACE and REGEXP_MATCHES:

-- Replace all occurrences of 'abc' with 'xyz'
SELECT REGEXP_REPLACE('abc123abc', '(abc)', 'xyz', 'g');

-- Find matches of a pattern within a string
SELECT REGEXP_MATCHES('abc123', '[a-zA-Z]+');

REGEXP_REPLACE allows for replacing matched patterns with a new string, and the ‘g’ flag indicates all occurrences. REGEXP_MATCHES returns the matched substrings as text arrays.

Advanced Pattern Matching

Relying on more complex regex functionalities can solve complicated tasks:

-- Extract all words from the string, returning each word as an array element
SELECT REGEXP_MATCHES('PostgreSQL is awesome! 123', '\\w+', 'g');

The above example uses the word-boundary \w+ pattern to extract whole words, illustrating the power of regex in pattern recognition.

Lookarounds in PostgreSQL Regex

Lookahead and lookbehind assertions allow for matches based on what is or isn’t followed or preceded by a specific pattern:

-- Select strings that have 'pq' followed by 'rs', but 'rs' is not included in the match
SELECT substring('abc pqrstu' from 'pq(?=rs)');

This is an example of a positive lookahead assertion using (?=).

Combining Regex with Other SQL Clauses

One powerful aspect of PostgreSQL’s regex support is the ability to combine it with various SQL clauses and functions:

-- Count the occurrences of the word 'PostgreSQL' across all rows in a text column
SELECT SUM((LENGTH(description) - LENGTH(REPLACE(description, 'PostgreSQL', ''))) / LENGTH('PostgreSQL')) as count
FROM articles;

This combines LENGTH, REPLACE, and arithmetic operations to count the number of times a particular word appears.

Error Handling with Regex in PostgreSQL

When using regex, it’s important to handle errors that can arise from invalid patterns:

SET client_min_messages TO WARNING; -- Avoid stopping the transaction
SELECT 'abc' ~* '[';  -- This would typically raise an error

Handling invalid patterns ensures query execution doesn’t break from avoidable mistakes.

Conclusion

Mastery of regex in PostgreSQL opens up a world of possibilities for data searching and manipulation. Whether you need basic string comparison or complex pattern extraction, regex is an indispensable tool in the hands of a skillful PostgreSQL user. Always test your regex patterns and use the database functions that offer error handling to avoid unexpected results when running queries.