Understanding the pandas.DataFrame.iterrows() method (5 examples)

Updated: February 19, 2024 By: Guest Contributor

Introduction

The pandas library in Python is an indispensable tool for data analysis and manipulation, particularly when dealing with tabular data. Among its vast array of functionalities, the DataFrame.iterrows() method provides a flexible way to iterate over DataFrame rows as (index, Series) pairs. This tutorial offers a deep dive into understanding and using the iterrows() method through five illustrative examples, from basic usage to more advanced applications.

Before diving into the examples, it’s important to understand that iterrows() returns each row as a pandas Series whose index is the DataFrame’s column labels. It is rarely the most efficient way to perform row-wise operations, especially on large data: a new Series is built for every row, and because a single Series holds one dtype, values from columns of different types may be upcast (an integer column can come back as float). In exchange, it is simple and flexible, which makes it a reasonable choice when the task at hand does not demand optimized performance.
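As a small illustration of the dtype point, the sketch below uses a made-up two-column DataFrame (the column names qty and price are assumptions for this example only). It compares the dtype of a row produced by iterrows() with the dtype of the original column, and with itertuples(), which returns each row as a namedtuple rather than a Series.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column.
df = pd.DataFrame({"qty": [1, 2], "price": [1.5, 2.5]})

# iterrows() packs each row into a single Series, so the integer is upcast.
_, first_row = next(df.iterrows())
print(df["qty"].dtype)   # dtype of the column in the DataFrame
print(first_row.dtype)   # dtype of the row Series

# itertuples() yields namedtuples, so each value keeps its own type.
first_tuple = next(df.itertuples())
print(type(first_tuple.qty), type(first_tuple.price))
```

If the exact types of the values matter inside the loop, itertuples() may be the better fit; iterrows() remains convenient when you want label-based access such as row["price"].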

Example 1: Basic Usage

The following example demonstrates the basic usage of iterrows() for printing each row in a DataFrame.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
})

for index, row in df.iterrows():
    print(index, row['Name'], row['Age'], row['Occupation'])

This outputs:

0 Alice 25 Engineer
1 Bob 30 Doctor
2 Charlie 35 Artist

Example 2: Extracting Specific Data

In this example, we select only the rows that satisfy a condition on a column, here an Age greater than 30. This approach is helpful when you need to work with a subset of the data.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

for index, row in df.iterrows():
    if row["Age"] > 30:
        print(index, row["Name"], row["Age"])

This outputs:

2 Charlie 35
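For comparison, the same selection can usually be written without a loop. The following is a minimal sketch using boolean indexing on the same sample data; the variable name over_30 is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Build a boolean Series from the condition, then use it to select rows.
over_30 = df[df["Age"] > 30]

print(over_30[["Name", "Age"]])
```

The loop version is still useful when each matching row triggers some side effect, such as logging or calling an external service, rather than producing another DataFrame.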

Example 3: Modifying DataFrame During Iteration

A common but discouraged practice is to modify a DataFrame while iterating over it with iterrows(). Depending on the data types, each row may be a copy rather than a view, so writing to the row may not change the DataFrame at all. A safer approach is to collect the changes in a separate container during the loop and apply them to the DataFrame afterwards, which is what the following example does.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

modifications = []
for index, row in df.iterrows():
    if row["Age"] > 30:
        modifications.append((index, row["Age"] + 5))

for index, new_age in modifications:
    df.at[index, "Age"] = new_age

print(df)

Output:

      Name  Age Occupation
0    Alice   25   Engineer
1      Bob   30     Doctor
2  Charlie   40     Artist

This pattern works, but it still loops over every row in Python and becomes slow on large DataFrames. When the update can be expressed as a column operation, vectorized operations or applied functions are generally preferable.
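To make both points concrete, here is a hedged sketch on the same sample data. The first loop writes to the row object directly; because this DataFrame mixes string and integer columns, each row is a copy and the DataFrame is left unchanged. The second part performs the same update with a single .loc assignment. The variable name ages_after_loop is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Writing to the row inside the loop changes only the copy, not df.
for index, row in df.iterrows():
    if row["Age"] > 30:
        row["Age"] = row["Age"] + 5

ages_after_loop = df["Age"].tolist()
print(ages_after_loop)  # still the original ages

# Vectorized alternative: select the rows with a mask and update in one step.
df.loc[df["Age"] > 30, "Age"] += 5

print(df)
```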

Example 4: Complex Data Processing

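In more complex applications, such as applying a function to each row, iterrows() can be a convenient fit. Here, we compute a new Bonus column from the existing values in each row; the first df.at assignment creates the column, and later assignments fill in the remaining rows.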

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus(row):
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


for index, row in df.iterrows():
    df.at[index, "Bonus"] = compute_bonus(row)

print(df)

Output:

      Name  Age Occupation  Bonus
0    Alice   25   Engineer    2.5
1      Bob   30     Doctor    4.5
2  Charlie   35     Artist    0.0

This demonstrates the utility of iterrows() for complex row-by-row manipulation that might be tedious to implement using vectorized operations.
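For a rule as simple as this one, a vectorized version is still fairly short. The sketch below is one possible formulation, not the only one: it assumes the bonus is always a per-occupation rate multiplied by Age, and the names rates and rate are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Hypothetical lookup table mirroring the branches of compute_bonus.
rates = {"Engineer": 0.1, "Doctor": 0.15}

# Occupations missing from the table map to NaN; treat them as a rate of 0.
rate = df["Occupation"].map(rates).fillna(0)

df["Bonus"] = rate * df["Age"]

print(df)
```

When the per-row logic involves many branches or several columns at once, the explicit loop may remain easier to read and maintain.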

Example 5: Combining with Other Iteration Tools

Finally, iterrows() can be combined with other Python constructs. For instance, using it inside a list comprehension builds all the values for a new column in a single expression.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

def compute_bonus(row):
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


bonuses = [compute_bonus(row) for _, row in df.iterrows()]
df["Bonus"] = bonuses

print(df)

Output:

      Name  Age Occupation  Bonus
0    Alice   25   Engineer    2.5
1      Bob   30     Doctor    4.5
2  Charlie   35     Artist    0.0

This showcases the versatility of iterrows() when combined with other Python idioms. Building the whole column as a list and assigning it once also avoids the repeated df.at writes used in Example 4.
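The same row function can also be passed to DataFrame.apply with axis=1, which calls it once per row and collects the results into a Series. This is a minimal sketch of that alternative; it is still a row-by-row operation, so it should not be expected to match the speed of vectorized column arithmetic.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus(row):
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


# axis=1 passes each row to compute_bonus as a Series.
df["Bonus"] = df.apply(compute_bonus, axis=1)

print(df)
```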

Conclusion

This tutorial has explored the iterrows() method through practical, progressively more involved examples, demonstrating its flexibility in data analysis and manipulation. While it’s important to be aware of its performance characteristics and to prefer vectorized operations where possible, iterrows() remains a useful tool for situations where row-wise iteration is essential.