PostgreSQL SIMILAR TO operator

Updated: January 6, 2024 By: Guest Contributor Post a comment

Introduction

The PostgreSQL SIMILAR TO operator is a powerful tool for pattern matching. It allows you to define a pattern using SQL standard’s regular expressions to match data within your database tables. Whether you’re a seasoned DBA or a novice SQL user, understanding SIMILAR TO can enhance your data querying abilities.

Basic Usage of SIMILAR TO

Firstly, let’s look at the syntax: value SIMILAR TO pattern. This operator returns true if the value matches the pattern.

Here’s the simplest example:

SELECT 'abcdef' SIMILAR TO 'abc%';
-- Returns true

In this case, the % symbol is a wildcard that represents any sequence of characters. Therefore, the pattern ‘abc%’ will match any string that begins with ‘abc’.

Including Specific Characters

You can specify the inclusion of specific characters in your pattern:

SELECT 'abc123' SIMILAR TO 'abc_123';
-- Returns true

The _ in the pattern is a placeholder for a single character. In this case, it matches the number 1.

Using Character Classes

SIMILAR TO also allows for the use of character classes to match one character out of a set of characters:

SELECT 'a2' SIMILAR TO 'a[0-9]';
-- Returns true

The pattern 'a[0-9]' will match any string that starts with ‘a’ followed by any digit from 0 to 9.

Repetition Patterns

If you need to match a specific number of repetitions, you can use:

SELECT 'aaaa' SIMILAR TO 'a{4}';
-- Returns true

The {4} means exactly four repetitions of the character ‘a’. Similarly, you can define a range of repetitions:

SELECT 'aaa' SIMILAR TO 'a{2,4}';
-- Returns true

It matches strings that contain from two up to four ‘a’ characters consecutively.

Complex Patterns

The real power comes when combining these notations to create more complex patterns:

SELECT 'a1b2c3' SIMILAR TO 'a[1-9]b[0-9]c[0-9]';
-- Returns true

This pattern ensures the string starts with ‘a’, followed by a digit 1-9, then ‘b’, any digit, ‘c’, and any digit as well.

Using the OR Operator

You can also use the pipe symbol (|) as an OR operator to match multiple patterns:

SELECT 'axy' SIMILAR TO 'a(x|y)';
-- Returns true

This matches strings that start with ‘a’ and are followed by either ‘x’ or ‘y’.

Escaping Special Characters

To match special characters literally in your patterns, you need to escape them using the backslash (\):

SELECT 'a%b' SIMILAR TO 'a\%b';
-- Returns true

This will match a string that has ‘a’ followed by an actual percent symbol, followed by ‘b’.

Advanced Example: Email Matching

Let’s apply what we’ve learned to a practical example – verifying simple email patterns:

SELECT '[email protected]' SIMILAR TO '%_@__%.__%' AS is_email;
-- Returns true

This SIMILAR TO pattern checks for a common email structure: anything before the ‘@’, followed by anything with at least one character before and after a ‘.’, and ending in anything.

Advanced Repetition and Class Usage

For a stricter email validation:

SELECT '[email protected]' SIMILAR TO '[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}' AS is_email;
-- Returns true

This ensures the local part of the email contains one or more of the allowed characters, and the domain part includes only alphanumeric characters, hyphens or periods, followed by a domain extension that’s between 2 to 4 characters long.

Performance Considerations

While SIMILAR TO is very flexible, it can be less efficient than other predicates like LIKE. Use it wisely, particularly with large datasets, as complex patterns can increase the query execution time.

Conclusion

The SIMILAR TO operator in PostgreSQL is a nuanced feature tailored for pattern matching. It finds its strength in the expressive power of SQL standard’s regular expressions, allowing for precise and flexible data queries. Whether validating emails or filtering data based on complex patterns, SIMILAR TO is an invaluable tool in the PostgreSQL arsenal. Like any feature, it should be used appropriately to balance expressive queries and performance demands.