How to Use Pydantic with Regular Expressions (2 Examples)

Updated: December 1, 2023 By: Khue

This concise, practical article walks you through two ways to use Pydantic with regular expressions to validate data in Python. Before getting started, make sure you have Pydantic 2.x installed (the latest major version at the time of writing).

Using constr type with pattern argument

This approach uses the constr type from Pydantic, which is a constrained string type that can accept a pattern argument to specify the regex pattern to match.

In the coming example, we’ll create a Pydantic model that represents a website, including a name and a URL. We use constr and a regular expression to define a type that matches any URL starting with http:// or https://, as shown below:

MyUrlsType = constr(pattern="^https?://.*$")

Example:

from pydantic import BaseModel, constr

# Define a custom type with constr and a regex pattern
MyUrlsType = constr(pattern="^https?://.*$")


# Define a Pydantic model with a field that uses the custom type
class MyModel(BaseModel):
    url: MyUrlsType
    name: str

Let’s try it with a valid URL:

site1 = MyModel(
    name="Sling Academy",
    url="https://www.slingacademy.com"
)
print(site1)
# Output: url='https://www.slingacademy.com' name='Sling Academy'

Then see how our model deals with an invalid URL:

site2 = MyModel(
    name="Test Website",
    url="abc://example.com"
)
print(site2)

# ValidationError: 1 validation error for MyModel
# url
#   String should match pattern '^https?://.*$' [type=string_pattern_mismatch, input_value='abc://example.com', input_type=str]

This solution is simple and easy to implement. It leverages Pydantic’s built-in validation and parsing features, such as automatic error messages, coercion, and serialization. However, it may not work with some advanced patterns: by default, Pydantic 2.x compiles patterns with the Rust regex engine, which does not support features such as lookaround and backreferences.

Using the field_validator decorator and the re module

This solution uses the field_validator decorator from Pydantic (only available in Pydantic 2.x), which allows you to define custom validation functions for your fields. It also uses the re module from the Python standard library, which provides functions for working with regular expressions.

Let’s say you want to validate a data model that represents a person’s name and age. You want to make sure that the name is not empty and does not contain any numbers or symbols, and that the age is a positive integer between 1 and 120. Here are the steps to achieve the goal:

  1. Import the necessary modules. You will need BaseModel and field_validator from Pydantic, and the re module for regular expression operations.
  2. Define a regex pattern for names that only allows letters and spaces. For example, you can use the following regex: r"^[A-Za-z ]+$".
  3. Define your data model class that inherits from the BaseModel class. Use the str type annotation for your name field and the int type annotation for your age field.
  4. Define a validator function for each field using the @field_validator decorator. Use the re.fullmatch function to check if the name field value matches the name regex pattern. Use a simple if statement to check if the age field value is within the desired range. If the value does not pass the validation, raise a ValueError with a custom message. If it passes, return the value.
  5. Create an instance of your data model class with some data and check if it is valid or not. If the data does not pass the validation, Pydantic will show some useful error messages.

Complete code:

# Import modules
from pydantic import BaseModel, field_validator
import re

# Define a regex pattern for names
NAME_REGEX = r"^[A-Za-z ]+$"


# Define a data model class
class Person(BaseModel):
    name: str
    age: int

    # Define a validator function for the name field
    # (field validators are class methods; @classmethod goes below @field_validator)
    @field_validator("name")
    @classmethod
    def validate_name(cls, value):
        # Use re.fullmatch to check if the value matches the name regex pattern
        if not re.fullmatch(NAME_REGEX, value):
            # Raise a ValueError if the value does not match the pattern
            raise ValueError(f"{value} is not a valid name")
        # Return the value if it matches the pattern
        return value

    # Define a validator function for the age field
    @field_validator("age")
    @classmethod
    def validate_age(cls, value):
        # Check if the value is within the desired range
        if not (1 <= value <= 120):
            # Raise a ValueError if the value is out of range
            raise ValueError(f"{value} is not a valid age")
        # Return the value if it is within range
        return value


# Create an instance of data model class with valid data
try:
    person1 = Person(name="Turtle", age=45)
    print(person1)
except ValueError as e:
    print(e)

# Create an instance of data model class with invalid data
try:
    person2 = Person(name="Puppy", age=999)
    print(person2)
except ValueError as e:
    print(e)
    # 1 validation error for Person
    # age
    #   Value error, 999 is not a valid age [type=value_error, input_value=999, input_type=int]

This approach is more verbose than the first one, as you have to write a decorated validation function for each rule you want to enforce. However, it is flexible and powerful, as you can use any function from the re module and fully customize the validation logic and error messages.
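Some of the repetition can be avoided, because field_validator accepts several field names at once. The following is a minimal sketch under that assumption, with a made-up Contact model and a precompiled pattern (NAME_PATTERN and letters_and_spaces_only are illustrative names, not part of Pydantic):

```python
import re

from pydantic import BaseModel, ValidationError, field_validator

# Compile the pattern once so it can be reused by every validation call
NAME_PATTERN = re.compile(r"^[A-Za-z ]+$")


class Contact(BaseModel):
    first_name: str
    last_name: str

    # One validator registered for both fields
    @field_validator("first_name", "last_name")
    @classmethod
    def letters_and_spaces_only(cls, value: str) -> str:
        if not NAME_PATTERN.fullmatch(value):
            raise ValueError(f"{value!r} is not a valid name")
        return value


try:
    Contact(first_name="Sea", last_name="Turtle99")
except ValidationError as exc:
    failed_fields = [err["loc"][0] for err in exc.errors()]
    print(failed_fields)
    # Output: ['last_name']
```

Sharing one validator keeps the regex logic in a single place, at the cost of a less field-specific error message.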

Conclusion

You’ve learned two methods to use Pydantic in combination with regular expressions. Each of them has its own strengths and weaknesses. If your use case is simple, the first one is good to go with. If your situation requires complex regex patterns or custom validation logic, then the second one might be the better choice.

This tutorial ends here. If you find something outdated or incorrect, please let me know by leaving a comment. Happy coding & have a nice day!