PostgreSQL: REGEXP_MATCHES, REGEXP_REPLACE, REGEXP_SPLIT_TO_ARRAY, REGEXP_SPLIT_TO_TABLE

Updated: January 6, 2024 By: Guest Contributor Post a comment

Introduction

PostgreSQL offers powerful text processing capabilities through its regular expression functions. Among these, REGEXP_MATCHES, REGEXP_REPLACE, REGEXP_SPLIT_TO_ARRAY, and REGEXP_SPLIT_TO_TABLE are particularly useful when dealing with text manipulation and data retrieval. This tutorial will explore these functions in detail, offering code examples and tips to harness their potential effectively.

Using REGEXP_MATCHES

The REGEXP_MATCHES function in PostgreSQL is used to match substrings in a string that fit a specified pattern. This function returns a text array of all the matching substrings.

SELECT REGEXP_MATCHES('The quick brown fox', 'quick|slow');

This query will return {"quick"} as the word “quick” matches the pattern.

Multiple matches:

SELECT REGEXP_MATCHES('Repeat, repeat, repeat...', '\\w+', 'g');

The ‘g’ flag is used for global matches, meaning all occurrences in the text, rather than stopping after the first match. The above query returns an array of words.

Using REGEXP_REPLACE

REGEXP_REPLACE is used to replace substrings matching a specific pattern with another substring.

SELECT REGEXP_REPLACE('One for all, all for one', 'all', 'any');

The substitution will result in ‘One for any, any for one’. To replace all appearances, you use the ‘g’ flag:

SELECT REGEXP_REPLACE('All around the mulberry bush', 'All|all', 'each', 'g');

This will turn all occurrences of ‘All’ or ‘all’ to ‘each’, regardless of their case.

Using REGEXP_SPLIT_TO_ARRAY

REGEXP_SPLIT_TO_ARRAY function divides a string into an array based on a regular expression pattern.

SELECT REGEXP_SPLIT_TO_ARRAY('One,Two,Three,Four', ',');

This results in {"One","Two","Three","Four"}. You can split a sentence into words by splitting on spaces:

SELECT REGEXP_SPLIT_TO_ARRAY('A stitch in time saves nine', '\\s+');

Splits the string by one or more whitespace characters, producing an array of words.

Using REGEXP_SPLIT_TO_TABLE

REGEXP_SPLIT_TO_TABLE is similar to REGEXP_SPLIT_TO_ARRAY but distributes the results across rows rather than as an array.

SELECT REGEXP_SPLIT_TO_TABLE('This is a sentence.', '\\s');

This would result in each word being returned in a separate row instead of appearing in an array.

Splitting a CSV into multiple rows can be performed as follows:

SELECT REGEXP_SPLIT_TO_TABLE('John,Paul,George,Ringo',',');

And it separates each name into different rows.

Combining Functions for Complex Patterns

These functions can be combined to achieve more complex text manipulation. For example, to find and replace a pattern and then split the result into an array, you would use REGEXP_REPLACE nested in REGEXP_SPLIT_TO_ARRAY:

SELECT REGEXP_SPLIT_TO_ARRAY(REGEXP_REPLACE('abc123def456ghi789', '\\d+', '-', 'g'), '-');

This replaces all sequences of digits with hyphens and then splits the string into an array at each hyphen.

Advanced Patterns and Conditions

Using advanced regular expression patterns allows you to deal with more complex string manipulations. Lookaheads, lookbehinds, and capturing groups help you tailor the text processing to your needs:

SELECT REGEXP_REPLACE('Price: $15.99, Tax: $1.44, Total: $17.43', '(\\w+): \$(\\d+.\\d+)', '\\1 is \\2 dollars', 'g');

This sophisticated pattern extracts monetary values and reformats them into a more readable sentence, while retaining the category names.

Performance Considerations

Regular expressions can be computationally expensive. It’s important to consider the performance when running these functions on large datasets or columns with large text strings.

To enhance performance, avoid unnecessary global searches and use simpler patterns where possible. Additionally, utilizing indexing strategies such as expression indexes could help if regular expression queries are frequent on specific columns.

Conclusion

This tutorial covered the basics through to some advanced usage of PostgreSQL’s REGEXP_MATCHES, REGEXP_REPLACE, REGEXP_SPLIT_TO_ARRAY, and REGEXP_SPLIT_TO_TABLE functions. Through careful crafting of regular expressions and understanding the use cases of these functions, you can perform sophisticated and efficient text processing and manipulation within your database.