Using Regular Expressions in MySQL 8: The Complete Guide

Updated: January 31, 2024 By: Guest Contributor Post a comment

Introduction

Regular expressions are a powerful tool for pattern matching and data manipulation. As of MySQL 8, they are fully supported, giving you the capability to perform complex searches and operations directly within your database queries. In this comprehensive guide, we will explore how to leverage regular expressions in MySQL 8.

What can be Done with Regular Expressions?

Before diving into MySQL specifics, it’s important to understand what regular expressions are. Commonly known as regex, regular expressions are sequences of characters that form a search pattern. They can be used to check if a string contains the specified search pattern, replace the found pattern, or split the string. Regular expressions provide a concise and flexible means for “matching” particular characters, words, or patterns of characters.

Regular Expression Syntax in MySQL 8

MySQL uses the extended version of regular expressions that include additional features. Here is a basic guide to the syntax:

  • . – Matches any single character
  • [...] – Matches any single character listed in the brackets
  • [^...] – Matches any single character not listed in the brackets
  • * – Matches zero or more occurrences of the preceding element
  • + – Matches one or more occurrences of the preceding element
  • ? – Matches zero or one occurrence of the preceding element
  • {n} – Matches exactly n occurrences of the preceding element
  • {n,} – Matches at least n occurrences of the preceding element
  • {n,m} – Matches between n and m occurrences of the preceding element, both included
  • ^ – Matches the starting position within the string
  • $ – Matches the ending position of the string or the position just before a string-ending newline

Note: When using regex functions in MySQL, remember to escape special regex characters if they are intended to be used as a literal in your expression.

Regular Expression Functions in MySQL 8

MySQL 8 introduces several functions where regular expressions can be applied which includes REGEXP_LIKE(), REGEXP_INSTR(), REGEXP_SUBSTR(), REGEXP_REPLACE() and others. We will discuss the usage of each function with examples.

REGEXP_LIKE()

The REGEXP_LIKE() function is used to search for a regular expression pattern within a string. If the pattern is found, the function returns 1, otherwise, it returns 0.

-- Example: Find rows with a column that contains the word 'MySQL'

SELECT * FROM your_table WHERE REGEXP_LIKE(your_column, 'MySQL');

REGEXP_INSTR()

This function returns the start position of the first occurrence of the regular expression pattern in the string. If the pattern is not found, it returns 0.

-- Example: Find the start position of 'MySQL' 

SELECT REGEXP_INSTR(your_column, 'MySQL') FROM your_table;

REGEXP_SUBSTR()

This function extracts a substring from a string that matches the regular expression pattern.

-- Example: Extract the first word bounded by non-word characters 

SELECT REGEXP_SUBSTR(your_column, '\y\w+\y') FROM your_table;

REGEXP_REPLACE()

To replace occurrences of a specified pattern in the string with another string, you can use REGEXP_REPLACE().

-- Example: Replace 'MySQL' with 'PostgreSQL' 

SELECT REGEXP_REPLACE(your_column, 'MySQL', 'PostgreSQL') FROM your_table;

Pattern Matching with Sample Data

Let’s walk through detailed examples using a sample dataset. Suppose you have a table ‘users’ with a column ‘bio’ that contains biographical information. You want to perform various searches using regular expressions.

-- Find users who mention 'SQL' or 'data' in their bio 

SELECT * FROM users WHERE REGEXP_LIKE(bio, 'SQL|data');

Here, the vertical bar | signifies a logical OR in the regular expression, looking for ‘SQL’ or ‘data’ in the ‘bio’ column.

-- Extract any email addresses present in the bio 

SELECT REGEXP_SUBSTR(bio, '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') FROM users;

This expression matches the common pattern for email addresses and extracts them from the bio text.

-- Check if the bio begins with 'I am' and is immediately followed by one or more word characters 

SELECT * FROM users WHERE REGEXP_LIKE(bio, '^I am \w+');

This pattern ensures the bio starts with ‘I am’ and includes at least one word character following it.

Advanced Uses of Regular Expressions

There are even more advanced uses of regular expressions in MySQL, including conditional expressions and filtering based on complex patterns. The REGEXP_REPLACE() function allows not just simple replacements, but can also refer to captured groups in a pattern with the \1, \2, …, syntax.

-- Example: Swap two words around 

SELECT REGEXP_REPLACE(bio, '(\w+) (\w+)', '\2 \1') FROM users;

In the replacement pattern, \1 and \2 refer to the content of the first and second capturing groups, respectively.

Performance Considerations

While regular expressions are incredibly powerful, they come with a cost. Regex operations can be resource-intensive, and it is crucial to be mindful of performance, especially with large databases. It’s recommended to use indexed columns for pattern matching when possible, and avoid overly complex patterns that can reduce query performance.

Here’s an example of a bad regular expression that might cause performance issues:

SELECT * FROM your_table WHERE your_column REGEXP '.*MySQL.*';

Explanation:

  1. Overuse of Wildcards: The .* at the beginning and end of the pattern are wildcards that match any character (including none). This means the regex engine will try to match “MySQL” at every possible position in the string, which is highly inefficient.
  2. No Anchors: Without anchors (like ^ for the start of a string or $ for the end), the regex engine scans the entire string, even if “MySQL” is found at the beginning.
  3. Complex Patterns on Large Text Fields: If your_column contains large text fields, using a regex pattern like this can be particularly costly in terms of performance.
  4. Full Table Scan: This query will likely result in a full table scan, as regular expressions generally prevent the use of indexes.

To improve performance:

  • Be as specific as possible with the pattern.
  • Avoid leading wildcards in patterns if you can.
  • Use simpler string functions like LIKE if regex features are not needed.
  • Consider full-text search capabilities for complex text searching needs.

Conclusion

MySQL 8 greatly enhances database operations with its robust regular expression support. Mastering the use of regex in MySQL challenges you to think about data in new ways, from cleaning up strings to performing intricate searches. By employing the functions and patterns covered in this guide, you’ll be well-equipped to harness the power of regular expressions within your MySQL databases.