Python: How to Validate Data in Dataclass

Updated: March 1, 2023 By: Frienzied Flame

This concise example-based article shows you how to validate data with Python dataclasses.

Basic Example

Data validation is an essential part of any data processing system. It ensures that the data the system receives is correct and in the expected format. Python’s dataclasses do not validate field values on their own, but they provide a hook, __post_init__(), that makes it easy to run checks during object initialization. Let’s see how it’s done.

In the following example, we define a dataclass named Person with two attributes: name and age. Our goal is to implement validation logic that checks the type of each attribute and ensures that age falls within the range 0 to 150 (inclusive).

Step 1 – Defining Dataclass

Define the class with the @dataclass decorator:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

Step 2 – Validating Data during Initialization

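Field annotations on a dataclass are not enforced at runtime, so a call like Person(123, 'abc') would be accepted as is. To add custom validation logic, define a __post_init__() method in the class. The generated __init__() calls this method right after all fields have been assigned, so you can inspect them and raise an exception if a value has the wrong type or falls outside the expected range: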

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        if not isinstance(self.name, str):
            raise TypeError('Name should be of type str')

        if not isinstance(self.age, int):
            raise TypeError('Age should be of type int')

        if self.age < 0 or self.age > 150:
            raise ValueError('Age must be between 0 and 150')
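To make the call order concrete, here is a minimal sketch that puts a trimmed-down Person next to a hand-written class doing roughly what the generated __init__() does. PersonByHand and error_of are hypothetical names introduced only for this illustration; the real generated code is more involved.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        if self.age < 0 or self.age > 150:
            raise ValueError('Age must be between 0 and 150')


class PersonByHand:
    """Hypothetical hand-written class that mirrors the same call order."""

    def __init__(self, name, age):
        # 1. Assign every field first...
        self.name = name
        self.age = age
        # 2. ...then hand over to the validation hook.
        self.__post_init__()

    def __post_init__(self):
        if self.age < 0 or self.age > 150:
            raise ValueError('Age must be between 0 and 150')


def error_of(cls, *args):
    """Hypothetical helper: return the message raised by cls(*args), or None."""
    try:
        cls(*args)
    except ValueError as exc:
        return str(exc)
    return None


print(error_of(Person, 'John', 160))        # Age must be between 0 and 150
print(error_of(PersonByHand, 'John', 160))  # Age must be between 0 and 150
print(error_of(Person, 'John', 30))         # None
```

Both classes reject the same input because the checks run only after the fields have been set.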

Step 3 – Test It

Try to create a person whose age is 160:

person = Person('John', 160)

And you will get this error:

ValueError: Age must be between 0 and 150

Let’s try to initialize another person object with a non-string name:

person = Person(123, 123)

And you will run into this:

TypeError: Name should be of type str
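If you would rather run both tests in a single script without stopping at the first error, the sketch below wraps the calls in try/except. The try_create helper is a hypothetical name used only here. The last lines also show a limitation worth knowing: __post_init__() runs only when the object is created, so assigning a new value to a field later is not checked.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        if not isinstance(self.name, str):
            raise TypeError('Name should be of type str')
        if not isinstance(self.age, int):
            raise TypeError('Age should be of type int')
        if self.age < 0 or self.age > 150:
            raise ValueError('Age must be between 0 and 150')


def try_create(name, age):
    """Hypothetical helper: return the new Person, or the error as text."""
    try:
        return Person(name, age)
    except (TypeError, ValueError) as exc:
        return f'{type(exc).__name__}: {exc}'


print(try_create('John', 160))  # ValueError: Age must be between 0 and 150
print(try_create(123, 123))     # TypeError: Name should be of type str

person = Person('John', 30)
person.age = 200  # no error: __post_init__ is not re-run on assignment
print(person)     # Person(name='John', age=200)
```

If values must stay valid after creation, one option is to declare the class with @dataclass(frozen=True), which blocks later assignments altogether.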

Advanced Example

This is the kind of validation you might need when building registration or login systems. We will create a dataclass named User with two fields: email and password. Our goal is to make sure that:

  • Email must have the correct format (we will use regular expressions for this)
  • Password must be between 8 and 12 characters in length

The code:

from dataclasses import dataclass
import re

@dataclass
class User:
    email: str
    password: str
    
    def __post_init__(self):
        # Validate email
        if not re.match(r"[^@]+@[^@]+\.[^@]+", self.email):
            raise ValueError("Invalid email address.")
        
        # Validate password length
        if not 8 <= len(self.password) <= 12:
            raise ValueError("Password length should be between 8 and 12 characters.")

Now, let’s create a User object with an invalid email address and see how the validation works:

u = User(email="test@slingacademy", password="password123")

Output:

ValueError: Invalid email address.

What about an invalid password?

u = User(email="test@example.com", password="1234")

You will get a ValueError:

ValueError: Password length should be between 8 and 12 characters.

Finally, let’s pass valid values for both fields:

u = User(email="test@example.com", password="password123")
print(u)

And we pass the validation:

User(email='test@example.com', password='password123')
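One caveat about the email check: re.match() only anchors the pattern at the start of the string, so an input whose beginning looks valid can slip through. The sketch below compares it with re.fullmatch(), which requires the whole string to fit. EMAIL_PATTERN is simply the pattern from the User class stored in a constant, and the sample addresses are made up for illustration.

```python
import re

EMAIL_PATTERN = r"[^@]+@[^@]+\.[^@]+"  # same pattern as in the User class

candidate = "a@b.c@d"  # made-up input with two @ signs

# re.match only anchors at the start, so a valid-looking prefix is enough.
print(bool(re.match(EMAIL_PATTERN, candidate)))      # True
# re.fullmatch requires the entire string to fit the pattern.
print(bool(re.fullmatch(EMAIL_PATTERN, candidate)))  # False

# Both functions agree on the simpler cases.
print(bool(re.fullmatch(EMAIL_PATTERN, "test@slingacademy")))  # False
print(bool(re.fullmatch(EMAIL_PATTERN, "test@example.com")))   # True
```

Even with re.fullmatch(), this pattern is only a rough format check and says nothing about whether the address actually exists.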

That’s it. Happy coding and have a nice day!