The modern Python regular expressions cheat sheet

Updated: May 25, 2023 By: Goodman

This page is an example-driven cheat sheet for regular expressions in Python. I use it regularly in my own work, and you can bookmark it to look things up quickly. Note that every snippet assumes the re module has already been imported (import re).

Regular Expression Patterns

Literal Characters

Match literal characters by including them directly in the pattern:

pattern = r"abc"
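
Characters such as ., *, and ( have a special meaning inside a pattern, so text that should be matched literally may need escaping. The following is a minimal sketch using re.escape(); the sample strings and variable names are illustrative assumptions, not part of the original cheat sheet:

```python
import re

price = "3.14"

# Unescaped, "." matches any character, so "3x14" also matches
loose = re.search(r"3.14", "3x14")

# re.escape() backslash-escapes metacharacters so the text is matched literally
strict_pattern = re.escape(price)
strict_miss = re.search(strict_pattern, "3x14")
strict_hit = re.search(strict_pattern, "pi is 3.14")

print(loose is not None)   # True
print(strict_miss)         # None
print(strict_hit.group())  # 3.14
```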

Character Classes

Match any character in a specific set using character classes:

pattern = r"[aeiou]"     # Matches any vowel
pattern = r"[A-Z]"       # Matches any uppercase letter
pattern = r"[0-9]"       # Matches any digit
pattern = r"[^0-9]"      # Matches any character except digits
pattern = r"[A-Za-z]"    # Matches any letter
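
To see these classes in action, here is a small sketch that runs each one through re.findall(). The sample text is made up for illustration:

```python
import re

text = "Room 42B, Floor 7"

vowels = re.findall(r"[aeiou]", text)        # lowercase vowels only
uppercase = re.findall(r"[A-Z]", text)       # uppercase ASCII letters
digits = re.findall(r"[0-9]", text)          # ASCII digits
non_digits = re.findall(r"[^0-9]", "a1b2")   # "^" negates the set

print(vowels)      # ['o', 'o', 'o', 'o']
print(uppercase)   # ['R', 'B', 'F']
print(digits)      # ['4', '2', '7']
print(non_digits)  # ['a', 'b']
```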

Shorthand Character Classes

Use shorthand character classes for common patterns:

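pattern = r"\d"          # Matches any digit, [0-9] in ASCII mode
pattern = r"\D"          # Matches any non-digit character, [^0-9] in ASCII mode
pattern = r"\w"          # Matches any word character (letter, digit, or underscore), [a-zA-Z0-9_] in ASCII mode
pattern = r"\W"          # Matches any non-word character, [^a-zA-Z0-9_] in ASCII mode
pattern = r"\s"          # Matches any whitespace character, [ \t\n\r\f\v] in ASCII mode
pattern = r"\S"          # Matches any non-whitespace character, [^ \t\n\r\f\v] in ASCII mode
# Note: for str patterns these classes are Unicode-aware by default; pass flags=re.ASCII to restrict them to the ASCII sets shown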
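
For str patterns, the shorthand classes are Unicode-aware by default, which can be surprising. The sketch below contrasts the default behavior with re.ASCII. The sample strings are illustrative, and the non-ASCII characters are written as escapes so the snippet survives copy and paste:

```python
import re

# "\u0663\u0664" is "34" written with Arabic-Indic digits
text = "\u0663\u0664 and 34"

unicode_digits = re.findall(r"\d+", text)
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

# "\u00e9" is an "e" with an acute accent
unicode_words = re.findall(r"\w+", "caf\u00e9 au lait")
ascii_words = re.findall(r"\w+", "caf\u00e9 au lait", flags=re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['34']
print(len(unicode_words))   # 3
print(ascii_words)          # ['caf', 'au', 'lait']
```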

Quantifiers

Specify the number of repetitions using quantifiers:

pattern = r"a+"          # Matches one or more occurrences of "a"
pattern = r"a*"          # Matches zero or more occurrences of "a"
pattern = r"a?"          # Matches zero or one occurrence of "a"
pattern = r"a{2,4}"      # Matches 2 to 4 occurrences of "a"
pattern = r"a{2,}"       # Matches 2 or more occurrences of "a"
pattern = r"a{,4}"       # Matches 0 to 4 occurrences of "a"
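
One way to check what a quantifier accepts is to test whole strings with re.fullmatch(). This is only a sketch; the helper name matches_whole is hypothetical:

```python
import re

def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(matches_whole(r"a{2,4}", "aaa"))     # True
print(matches_whole(r"a{2,4}", "a"))       # False (too few)
print(matches_whole(r"a{2,4}", "aaaaa"))   # False (too many)
print(matches_whole(r"a{,4}", ""))         # True (zero occurrences allowed)
print(matches_whole(r"a+", ""))            # False (needs at least one)
print(matches_whole(r"a*", ""))            # True
print(matches_whole(r"colou?r", "color"))  # True ("u" is optional)
```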

Greedy and Non-Greedy Matching

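Quantifiers are greedy by default, meaning they match as much text as possible. Append ? to a quantifier to make it non-greedy, so that it matches as little as possible:

pattern = r"<.*>"        # Greedy: matches from the first "<" to the last ">"
pattern = r"<.*?>"       # Non-greedy: matches from a "<" to the nearest following ">"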
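
The difference is easiest to see on a string with several tags. A short sketch with a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)   # one match spanning the whole string
lazy = re.findall(r"<.*?>", html)    # one match per tag

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```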

Anchors

Use anchors to match patterns at specific positions:

pattern = r"^start"      # Matches "start" at the beginning of a string
pattern = r"end$"        # Matches "end" at the end of a string (or just before a trailing newline)
pattern = r"\bword\b"    # Matches "word" as a whole word
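
A brief sketch of how the anchors behave, including the trailing-newline case for $. The sample strings are illustrative; \Z is the stricter anchor that matches only at the very end of the string:

```python
import re

at_start = re.search(r"^start", "start here")         # match
not_at_start = re.search(r"^start", "a fresh start")  # None
at_end = re.search(r"end$", "the end")                # match
before_newline = re.search(r"end$", "the end\n")      # match: "$" allows a trailing newline
strict_end = re.search(r"end\Z", "the end\n")         # None: "\Z" does not

print(at_start is not None)        # True
print(not_at_start)                # None
print(at_end is not None)          # True
print(before_newline is not None)  # True
print(strict_end)                  # None
```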

Word Boundaries

Use \b to match word boundaries:

pattern = r"\btest\b"    # Matches "test" as a whole word
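
The sketch below shows what \b changes, and why the r prefix matters here: in a normal (non-raw) string literal, "\b" is a backspace character rather than a word boundary. The sample sentence is made up:

```python
import re

text = "test testing contest test."

whole_words = re.findall(r"\btest\b", text)  # only standalone "test"
substrings = re.findall(r"test", text)       # every occurrence, even inside words

# Without the r prefix, "\b" is the backspace character, so nothing matches
no_raw = re.search("\btest\b", "test")

print(whole_words)      # ['test', 'test']
print(len(substrings))  # 4
print(no_raw)           # None
```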

Capturing Groups

Use parentheses to create groups and capture portions of the match:

pattern = r"(ab)+"              # Matches one or more occurrences of "ab"
pattern = r"(a|b)"              # Matches either "a" or "b"
pattern = r"(?P<name>\w+)"      # Matches and captures alphanumeric sequences with the name "name"
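
Captured groups can be read back from the match object by number or by name. A minimal sketch; the group names year and month and the sample text are assumptions made for illustration:

```python
import re

match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2023-05")

print(match.group(0))        # 2023-05 (the whole match)
print(match.group(1))        # 2023 (first group, by number)
print(match.group("month"))  # 05 (by name)
print(match.groups())        # ('2023', '05')
print(match.groupdict())     # {'year': '2023', 'month': '05'}

# A repeated group keeps only its last iteration
repeated = re.search(r"(ab)+", "ababab")
print(repeated.group(0))     # ababab
print(repeated.group(1))     # ab
```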

Non-Capturing Groups

Use (?:…) to create non-capturing groups:

pattern = r"(?:ab)+"            # Matches one or more occurrences of "ab" without capturing
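
The distinction matters most with re.findall(), which returns the group contents instead of the whole match when the pattern has a capturing group. A small sketch with a made-up string:

```python
import re

text = "abab xx ab"

capturing = re.findall(r"(ab)+", text)        # returns the group, not the full match
non_capturing = re.findall(r"(?:ab)+", text)  # returns the full match

print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['abab', 'ab']
```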

Lookahead and Lookbehind Assertions

Use lookahead and lookbehind assertions to match patterns without including them in the result:

pattern = r"foo(?=bar)"         # Matches "foo" only if followed by "bar"
pattern = r"(?<=foo)bar"        # Matches "bar" only if preceded by "foo"
pattern = r"foo(?!bar)"         # Matches "foo" only if not followed by "bar"
pattern = r"(?<!foo)bar"        # Matches "bar" only if not preceded by "foo"
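
Here is a short sketch of all four assertions on made-up strings. Note where the lookahead sits: it has to come after foo to test what follows it. A negative lookahead placed in front, as in (?!bar)foo, only checks that the text at that position does not start with "bar", which is always true when it starts with "foo".

```python
import re

text = "foobar foobaz"

followed = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
not_followed = [m.start() for m in re.finditer(r"foo(?!bar)", text)]
preceded = re.findall(r"(?<=foo)ba.", text)
not_preceded = re.findall(r"(?<!foo)bar", "foobar crowbar")

# The lookahead text is checked but not included in the match
matched_text = re.search(r"foo(?=bar)", text).group()

# A lookahead in front of "foo" does not filter anything
misplaced = re.findall(r"(?!bar)foo", "foobar")

print(followed)      # [0]
print(not_followed)  # [7]
print(preceded)      # ['bar', 'baz']
print(not_preceded)  # ['bar']
print(matched_text)  # foo
print(misplaced)     # ['foo']
```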

Flags

Flags modify the behavior of regex matching. Common flags include re.IGNORECASE for case-insensitive matching and re.MULTILINE, which makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole string:

pattern = r"abc"
result = re.search(pattern, string, flags=re.IGNORECASE)
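
The sketch below shows both flags on a made-up three-line string, and how to combine flags with the | operator:

```python
import re

text = "Error: disk full\nerror: retry\nok"

any_case = re.findall(r"error", text, flags=re.IGNORECASE)
first_word = re.findall(r"^\w+", text)                      # "^" matches only at the string start
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)  # "^" matches at every line start
combined = re.findall(r"^error", text, flags=re.IGNORECASE | re.MULTILINE)

print(any_case)     # ['Error', 'error']
print(first_word)   # ['Error']
print(line_starts)  # ['Error', 'error', 'ok']
print(combined)     # ['Error', 'error']
```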

Matching Patterns

Basic Matching

The match() function attempts to match a pattern at the beginning of a string:

result = re.match(pattern, string)

Searching for Patterns

The search() function searches for a pattern anywhere in the string:

result = re.search(pattern, string)
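
The difference between match() and search() is easy to miss, so here is a quick side-by-side sketch. re.fullmatch() is included for comparison; it requires the pattern to cover the entire string:

```python
import re

text = "abcdef"

match_start = re.match(r"abc", text)   # succeeds: "abc" is at position 0
match_later = re.match(r"def", text)   # None: match() only looks at the start
search_later = re.search(r"def", text) # succeeds: search() scans the whole string
full = re.fullmatch(r"abc", text)      # None: the pattern must cover the entire string

print(match_start.span())   # (0, 3)
print(match_later)          # None
print(search_later.span())  # (3, 6)
print(full)                 # None
```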

Finding all Matches

The findall() function returns all non-overlapping matches of a pattern in a string:

matches = re.findall(pattern, string)
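
The return value of findall() depends on the groups in the pattern: whole matches with no groups, and tuples when there is more than one group. A small sketch with illustrative strings:

```python
import re

text = "a=1, b=22"

no_groups = re.findall(r"\w+=\d+", text)       # list of whole matches
two_groups = re.findall(r"(\w+)=(\d+)", text)  # list of tuples, one item per group

# "Non-overlapping" means each character is used by at most one match
non_overlapping = re.findall(r"aa", "aaaa")

print(no_groups)        # ['a=1', 'b=22']
print(two_groups)       # [('a', '1'), ('b', '22')]
print(non_overlapping)  # ['aa', 'aa']
```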

Iterating Over Matches

The finditer() function returns an iterator yielding match objects for all matches:

matches = re.finditer(pattern, string)
for match in matches:
    print(match.group(), match.span())   # Process each match
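
Unlike findall(), finditer() gives access to positions as well as text. A runnable sketch that collects both; the list name found is just for illustration:

```python
import re

found = []
for m in re.finditer(r"\d+", "A10 B20 C30"):
    # Each m is a match object, so the span is available alongside the text
    found.append((m.group(), m.span()))

print(found)  # [('10', (1, 3)), ('20', (5, 7)), ('30', (9, 11))]
```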

Splitting Strings

The split() function splits a string by the occurrences of a pattern:

result = re.split(pattern, string)
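
A few split() behaviors worth knowing, sketched on made-up strings: adjacent separators produce empty strings unless the pattern absorbs them, maxsplit limits the number of splits, and a capturing group keeps the separators in the result.

```python
import re

collapsed = re.split(r"\s+", "Hello   big World")     # "+" absorbs runs of whitespace
with_empty = re.split(r"\s", "Hello  World")          # two spaces leave an empty string
limited = re.split(r"[,;]\s*", "a, b;c", maxsplit=1)  # split at most once
keep_separators = re.split(r"(,)", "a,b")             # captured separators are kept

print(collapsed)        # ['Hello', 'big', 'World']
print(with_empty)       # ['Hello', '', 'World']
print(limited)          # ['a', 'b;c']
print(keep_separators)  # ['a', ',', 'b']
```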

Replacing Patterns

The sub() function replaces all occurrences of a pattern in a string with a new substring:

result = re.sub(pattern, replacement, string)
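
The replacement can also refer back to groups or be a function, and count limits how many replacements are made. A minimal sketch with illustrative inputs:

```python
import re

# Backreferences: \1 and \2 refer to the captured groups
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "A10 B20")

# count limits the number of replacements
first_only = re.sub(r"\d", "*", "A10 B20", count=1)

# subn() also reports how many replacements were made
masked, n = re.subn(r"\d", "*", "A10")

print(swapped)     # example at user
print(doubled)     # A20 B40
print(first_only)  # A*0 B20
print(masked, n)   # A** 2
```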

Getting Match Information

Match objects provide information about the match:

match = re.search(pattern, string)   # Returns None if there is no match
if match:
    match.group()   # Returns the matched substring
    match.start()   # Returns the start position of the match
    match.end()     # Returns the end position of the match (exclusive)
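
Because search() and match() return None when nothing matches, calling .group() on the result without a check raises AttributeError. One possible way to guard against that is sketched below; the helper name describe is hypothetical:

```python
import re

def describe(pattern, text):
    """Return (substring, start, end) for the first match, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.group(), match.start(), match.end()

print(describe(r"cd", "abcdef"))   # ('cd', 2, 4)
print(describe(r"xyz", "abcdef"))  # None
```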

Example Usage

import re

# Matching patterns
result = re.match(r"abc", "abcdef")       # Matches "abc"
result = re.search(r"def", "abcdef")      # Matches "def"
matches = re.findall(r"\d+", "A10 B20 C30")  # Returns ["10", "20", "30"]
matches = re.finditer(r"\d+", "A10 B20 C30") # Returns an iterator of match objects

# Splitting strings
result = re.split(r"\s", "Hello World")    # Splits the string into ["Hello", "World"]

# Replacing patterns
result = re.sub(r"\d", "*", "A10 B20 C30")  # Replaces digits with "*" -> "A** B** C**"

# Getting match information
match = re.search(r"abc", "abcdef")
match.group()   # Returns the matched substring "abc"
match.start()   # Returns the start position of the match 0
match.end()     # Returns the end position of the match 3

Afterword

This cheat sheet covers the core of regular expressions in Python: pattern syntax (literal characters, character classes, quantifiers, anchors, groups, lookaheads and lookbehinds, and flags) and the re functions for matching, searching, splitting, replacing, and extracting match information. Feel free to use it as a quick reference when working with regular expressions in Python!