This page is an example-style cheat sheet for regular expressions in Python. I use it regularly in my own work, and you can bookmark it for quick lookups. Note that you must import the re module (import re) before working with regular expressions.
Regular Expression Patterns
Literal Characters
Match literal characters by including them directly in the pattern:
pattern = r"abc"
Character Classes
Match any character in a specific set using character classes:
pattern = r"[aeiou]" # Matches any vowel
pattern = r"[A-Z]" # Matches any uppercase letter
pattern = r"[0-9]" # Matches any digit
pattern = r"[^0-9]" # Matches any character except digits
pattern = r"[A-Za-z]" # Matches any letter
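As a quick illustration of the character classes above, here is a minimal sketch using re.findall() and re.sub(). The sample string is made up for the example.

```python
import re

text = "Regex 101: Quick Start"  # hypothetical sample input

vowels = re.findall(r"[aeiou]", text)    # lowercase vowels only
uppercase = re.findall(r"[A-Z]", text)   # uppercase ASCII letters
no_digits = re.sub(r"[0-9]", "", text)   # remove every ASCII digit

print(vowels)     # ['e', 'e', 'u', 'i', 'a']
print(uppercase)  # ['R', 'Q', 'S']
print(no_digits)  # Regex : Quick Start
```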
Shorthand Character Classes
Use shorthand character classes for common patterns. The bracketed equivalents in the comments are exact in ASCII mode (re.ASCII); by default, str patterns in Python 3 also match the corresponding Unicode characters:
pattern = r"\d" # Matches any digit [0-9]
pattern = r"\D" # Matches any non-digit character [^0-9]
pattern = r"\w" # Matches any word character [a-zA-Z0-9_]
pattern = r"\W" # Matches any non-word character [^a-zA-Z0-9_]
pattern = r"\s" # Matches any whitespace character [ \t\n\r\f\v]
pattern = r"\S" # Matches any non-whitespace character [^ \t\n\r\f\v]
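The following sketch shows how the shorthand classes behave on non-ASCII text with and without re.ASCII. The sample string (an accented letter and an Arabic-Indic digit, written as escapes) is an assumption chosen for illustration.

```python
import re

# "caf\u00e9" ends in an accented e; "\u0663" is the Arabic-Indic digit three
text = "caf\u00e9 \u0663 x_1"

unicode_words = re.findall(r"\w+", text)            # Unicode-aware by default
ascii_words = re.findall(r"\w+", text, re.ASCII)    # only [a-zA-Z0-9_]

unicode_digits = re.findall(r"\d", text)            # includes the non-ASCII digit
ascii_digits = re.findall(r"\d", text, re.ASCII)    # only [0-9]

print(ascii_words)   # ['caf', 'x_1']
print(ascii_digits)  # ['1']
```

If you need strictly ASCII matching, passing re.ASCII or spelling out the bracketed class are both reasonable options.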
Quantifiers
Specify the number of repetitions using quantifiers:
pattern = r"a+" # Matches one or more occurrences of "a"
pattern = r"a*" # Matches zero or more occurrences of "a"
pattern = r"a?" # Matches zero or one occurrence of "a"
pattern = r"a{2,4}" # Matches 2 to 4 occurrences of "a"
pattern = r"a{2,}" # Matches 2 or more occurrences of "a"
pattern = r"a{,4}" # Matches up to 4 occurrences of "a"
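To see the bounds of a quantifier in action, this small sketch checks a few made-up strings against a{2,4} with re.fullmatch(), which succeeds only when the entire string fits the pattern.

```python
import re

candidates = ["a", "aa", "aaaa", "aaaaa"]  # hypothetical test strings

# Map each candidate to whether the whole string is 2 to 4 "a" characters
results = {s: bool(re.fullmatch(r"a{2,4}", s)) for s in candidates}
print(results)  # {'a': False, 'aa': True, 'aaaa': True, 'aaaaa': False}

# search() takes as many repetitions as the upper bound allows
longest = re.search(r"a{2,4}", "aaaaa").group()
print(longest)  # aaaa

# An omitted lower bound means zero, so the empty string matches a{,4}
empty_ok = bool(re.fullmatch(r"a{,4}", ""))
print(empty_ok)  # True
```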
Greedy and Non-Greedy Matching
Quantifiers are greedy by default but can be made non-greedy by appending a ? to them:
pattern = r"<.*>" # Greedy: matches as much text as possible
pattern = r"<.*?>" # Non-greedy: matches as little text as possible
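The difference is easiest to see on a string with several candidate end points. A minimal sketch, using a made-up HTML snippet:

```python
import re

html = "<b>bold</b>"  # hypothetical sample input

greedy = re.findall(r"<.*>", html)   # runs to the last ">" in the string
lazy = re.findall(r"<.*?>", html)    # stops at the first ">" each time

print(greedy)  # ['<b>bold</b>']
print(lazy)    # ['<b>', '</b>']
```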
Anchors
Use anchors to match patterns at specific positions:
pattern = r"^start" # Matches "start" at the beginning of a string
pattern = r"end$" # Matches "end" at the end of a string
pattern = r"\bword\b" # Matches "word" as a whole word
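A short sketch of the ^ and $ anchors with re.search(); the sample sentence is invented for the example. One detail that is easy to miss: $ also matches just before a newline at the very end of the string.

```python
import re

line = "start the engine and drive to the end"  # hypothetical sample input

starts_with_start = bool(re.search(r"^start", line))    # "start" is at position 0
starts_with_engine = bool(re.search(r"^engine", line))  # "engine" is not at the start
ends_with_end = bool(re.search(r"end$", line))          # "end" closes the string

# "$" also matches right before a single trailing newline
ends_before_newline = bool(re.search(r"end$", "the end\n"))

print(starts_with_start, starts_with_engine, ends_with_end, ends_before_newline)
# True False True True
```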
Word Boundaries
Use \b to match word boundaries:
pattern = r"\btest\b" # Matches "test" as a whole word
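The sketch below compares a bare pattern with its \b-delimited version on a made-up sentence. It also shows why raw strings matter here: in a normal string literal, "\b" is the backspace character, not a word boundary.

```python
import re

text = "test testing contest test."  # hypothetical sample input

anywhere = re.findall(r"test", text)         # also hits "testing" and "contest"
whole_words = re.findall(r"\btest\b", text)  # only the standalone word

print(len(anywhere))     # 4
print(len(whole_words))  # 2

# Without the r prefix, "\b" is a backspace character, so nothing matches
not_raw = re.findall("\btest\b", text)
print(not_raw)  # []
```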
Capturing Groups
Use parentheses to create groups and capture portions of the match:
pattern = r"(ab)+" # Matches one or more occurrences of "ab"
pattern = r"(a|b)" # Matches either "a" or "b"
pattern = r"(?P<name>\w+)" # Matches and captures alphanumeric sequences with the name "name"
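Captured groups can be read back from the match object by number or by name. A minimal sketch, assuming a made-up "year-month" string and hypothetical group names year and month:

```python
import re

# Group names "year" and "month" are chosen for this example only
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-05")

year = match.group("year")   # access by name
month = match.group(2)       # access by position (groups are numbered from 1)
all_groups = match.groups()  # tuple of all captured groups
by_name = match.groupdict()  # dict of named groups

print(year, month)  # 2024 05
print(all_groups)   # ('2024', '05')
print(by_name)      # {'year': '2024', 'month': '05'}
```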
Non-Capturing Groups
Use (?:…) to create non-capturing groups:
pattern = r"(?:ab)+" # Matches one or more occurrences of "ab" without capturing
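One practical place where the choice matters is re.findall(): when the pattern contains a capturing group, findall() returns the group's content instead of the whole match. A small sketch on a made-up string:

```python
import re

text = "ababab cd"  # hypothetical sample input

# Capturing group: findall() returns what the group captured
# (the last repetition), not the full match
captured = re.findall(r"(ab)+", text)

# Non-capturing group: findall() returns the full match
full = re.findall(r"(?:ab)+", text)

print(captured)  # ['ab']
print(full)      # ['ababab']
```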
Lookahead and Lookbehind Assertions
Use lookahead and lookbehind assertions to match patterns without including them in the result:
pattern = r"foo(?=bar)" # Matches "foo" only if followed by "bar"
pattern = r"(?<=foo)bar" # Matches "bar" only if preceded by "foo"
pattern = r"foo(?!bar)" # Matches "foo" only if not followed by "bar"
pattern = r"(?<!foo)bar" # Matches "bar" only if not preceded by "foo"
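The sketch below records where each kind of assertion matches in a made-up string. Note the placement: a lookahead goes after the text it qualifies, and a lookbehind goes before it. A negative lookahead written in front of foo, as in (?!bar)foo, inspects the text starting at foo itself and therefore excludes nothing.

```python
import re

text = "foobar foobaz barfoo"  # hypothetical sample input


def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


print(starts(r"foo(?=bar)"))   # [0]      "foo" followed by "bar"
print(starts(r"foo(?!bar)"))   # [7, 17]  "foo" not followed by "bar"
print(starts(r"(?<=foo)bar"))  # [3]      "bar" preceded by "foo"
print(starts(r"(?<!foo)bar"))  # [14]     "bar" not preceded by "foo"
print(starts(r"(?!bar)foo"))   # [0, 7, 17]  misplaced lookahead filters nothing
```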
Flags
Flags modify the behavior of regex matching. Common flags include re.IGNORECASE for case-insensitive matching and re.MULTILINE, which makes ^ and $ match at the start and end of each line rather than only of the whole string:
pattern = r"abc"
result = re.search(pattern, string, flags=re.IGNORECASE)
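A minimal sketch of both flags on a made-up three-line string, including how to combine them with the | operator:

```python
import re

text = "first line\nSecond line\nthird LINE"  # hypothetical sample input

# Without MULTILINE, "^" matches only at the start of the whole string
first_only = re.findall(r"^\w+", text)

# With MULTILINE, "^" also matches right after each newline
each_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# IGNORECASE matches "line" regardless of letter case
any_case = re.findall(r"line", text, flags=re.IGNORECASE)

# Flags are combined with the bitwise OR operator
combined = re.findall(r"^[a-z]+", text, flags=re.IGNORECASE | re.MULTILINE)

print(first_only)  # ['first']
print(each_line)   # ['first', 'Second', 'third']
print(any_case)    # ['line', 'line', 'LINE']
print(combined)    # ['first', 'Second', 'third']
```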
Matching Patterns
Basic Matching
The match() function attempts to match a pattern at the beginning of a string:
result = re.match(pattern, string)
Searching for Patterns
The search() function searches for a pattern anywhere in the string:
result = re.search(pattern, string)
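The contrast between match() and search() shows up as soon as the pattern does not sit at the start of the string. Both return None when nothing matches, so it is worth checking the result before using it. A small sketch:

```python
import re

text = "abcdef"  # hypothetical sample input

at_start = re.match(r"def", text)       # "def" is not at position 0
anywhere = re.search(r"def", text)      # found later in the string
whole = re.fullmatch(r"abc", text)      # the pattern must cover the entire string

print(at_start)         # None
print(anywhere.span())  # (3, 6)
print(whole)            # None
```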
Finding all Matches
The findall() function returns all non-overlapping matches of a pattern in a string:
matches = re.findall(pattern, string)
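The shape of the list returned by findall() depends on how many capturing groups the pattern has. A minimal sketch on a made-up string:

```python
import re

text = "A10 B20 C30"  # hypothetical sample input

# No groups: a list of the matched strings
numbers = re.findall(r"\d+", text)

# Two or more groups: a list of tuples, one tuple per match
pairs = re.findall(r"([A-Z])(\d+)", text)

print(numbers)  # ['10', '20', '30']
print(pairs)    # [('A', '10'), ('B', '20'), ('C', '30')]
```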
Iterating Over Matches
The finditer() function returns an iterator yielding match objects for all matches:
matches = re.finditer(pattern, string)
for match in matches:
    print(match.group()) # Process each match
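Compared with findall(), finditer() is the better fit when you need positions as well as the matched text. A short sketch that collects both for a made-up string:

```python
import re

text = "A10 B20 C30"  # hypothetical sample input

# Collect each matched substring together with its (start, end) span
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(found)  # [('10', (1, 3)), ('20', (5, 7)), ('30', (9, 11))]
```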
Splitting Strings
The split() function splits a string by the occurrences of a pattern:
result = re.split(pattern, string)
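A few variations on split() that come up often, sketched on made-up inputs: collapsing runs of whitespace, limiting the number of splits with maxsplit, and keeping the separators by wrapping the pattern in a capturing group.

```python
import re

# "\s+" treats a run of whitespace as a single separator
words = re.split(r"\s+", "Hello   World  again")

# maxsplit limits how many splits are performed
head_tail = re.split(r",\s*", "a, b,c", maxsplit=1)

# A capturing group keeps the separators in the result
with_separators = re.split(r"(,)", "a,b")

print(words)            # ['Hello', 'World', 'again']
print(head_tail)        # ['a', 'b,c']
print(with_separators)  # ['a', ',', 'b']
```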
Replacing Patterns
The sub() function replaces all occurrences of a pattern in a string with a new substring:
result = re.sub(pattern, replacement, string)
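The replacement does not have to be a fixed string. The sketch below, on made-up inputs, shows backreferences to captured groups, a function as the replacement, the count argument, and subn(), which also reports how many replacements were made.

```python
import re

# Backreferences \1 and \2 refer to the captured groups
swapped = re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "Released 2024-05")

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "A10 B20")

# count limits the number of replacements
first_only = re.sub(r"\d", "*", "A10 B20", count=1)

# subn() returns a (new_string, number_of_replacements) tuple
masked, replaced = re.subn(r"\d", "*", "A10 B20")

print(swapped)           # Released 05/2024
print(doubled)           # A20 B40
print(first_only)        # A*0 B20
print(masked, replaced)  # A** B** 4
```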
Getting Match Information
Match objects provide information about the match:
match = re.search(pattern, string)
match.group() # Returns the matched substring
match.start() # Returns the start position of the match
match.end() # Returns the end position of the match
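Because search() returns None when nothing matches, calling group() on the result directly can raise an AttributeError. A minimal sketch of a guarded lookup follows; the helper name describe is hypothetical and not part of the re module.

```python
import re


def describe(pattern, string):
    """Return (text, start, end, span) for the first match, or None.

    Hypothetical helper for illustration only.
    """
    match = re.search(pattern, string)
    if match is None:  # no match: avoid calling methods on None
        return None
    return match.group(), match.start(), match.end(), match.span()


print(describe(r"\d+", "A10 B20"))  # ('10', 1, 3, (1, 3))
print(describe(r"\d+", "abc"))      # None
```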
Example Usage
import re
# Matching patterns
result = re.match(r"abc", "abcdef") # Matches "abc"
result = re.search(r"def", "abcdef") # Matches "def"
matches = re.findall(r"\d+", "A10 B20 C30") # Matches ["10", "20", "30"]
matches = re.finditer(r"\d+", "A10 B20 C30") # Returns an iterator of match objects
# Splitting strings
result = re.split(r"\s", "Hello World") # Splits the string into ["Hello", "World"]
# Replacing patterns
result = re.sub(r"\d", "*", "A10 B20 C30") # Replaces digits with "*" -> "A** B** C**"
# Getting match information
match = re.search(r"abc", "abcdef")
match.group() # Returns the matched substring -> "abc"
match.start() # Returns the start position of the match -> 0
match.end() # Returns the end position of the match -> 3
Afterword
This cheat sheet covers the core of regular expressions in Python: matching, searching, splitting, replacing, and extracting match information, along with advanced features such as lookaheads and lookbehinds. It gives examples for literal characters, character classes, quantifiers, anchors, groups, and flags. Feel free to use it as a quick reference when working with regular expressions in Python!