Sling Academy

Understanding pandas.DataFrame.iterrows() method (5 examples)

Last updated: February 19, 2024

Introduction

The pandas library in Python is an indispensable tool for data analysis and manipulation, particularly when dealing with tabular data. Among its vast array of functionalities, the DataFrame.iterrows() method provides a flexible way to iterate over DataFrame rows as (index, Series) pairs. This tutorial offers a deep dive into understanding and using the iterrows() method through five illustrative examples, from basic usage to more advanced applications.

Before looking at the examples, note that iterrows() yields one (index, Series) pair per row, so each row arrives as a pandas Series object. It is not an efficient way to perform row-wise operations, especially on large data, because a new Series is constructed for every row and the loop itself runs in Python. Because each row is packed into a single Series, dtypes are also not preserved across the row: columns of different numeric types are upcast to a common type. The method is nevertheless simple and flexible, which makes it a reasonable choice when the task at hand does not demand optimized performance.
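
One consequence of receiving each row as a Series is worth seeing concretely: the row has a single dtype, so an integer column can come back as a float. The sketch below uses a small two-column DataFrame whose column names are made up for illustration; it is not one of the five examples.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

# Take only the first (index, Series) pair produced by iterrows().
index, row = next(df.iterrows())

print(type(row).__name__)  # Series
print(df["count"].dtype)   # integer dtype in the DataFrame
print(row.dtype)           # float64: the row was upcast to hold both values
print(row["count"])        # 1.0, no longer an integer
```

In the examples that follow, the rows mix strings and integers, so each row Series has object dtype and the Age values stay integers.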

Example 1: Basic Usage

The following example demonstrates the basic usage of iterrows() for printing each row in a DataFrame.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
})

for index, row in df.iterrows():
    print(index, row['Name'], row['Age'], row['Occupation'])

This outputs:

0 Alice 25 Engineer
1 Bob 30 Doctor
2 Charlie 35 Artist

Example 2: Extracting Specific Data

In this example, we apply a condition to a column while iterating and print only the rows that satisfy it. This approach is helpful when you need to work with a subset of the data.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

for index, row in df.iterrows():
    if row["Age"] > 30:
        print(index, row["Name"], row["Age"])

This outputs:

2 Charlie 35
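
When the goal is only to select rows, the same result can be obtained without a loop. The following is a minimal sketch using boolean indexing, offered as an alternative rather than as part of the original example; the variable name older is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Build a boolean mask on Age and use it to select rows and columns at once.
older = df.loc[df["Age"] > 30, ["Name", "Age"]]

print(older)
```

The mask df["Age"] > 30 is evaluated for the whole column in one step, so no Python-level loop over rows is needed.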

Example 3: Modifying DataFrame During Iteration

One common but not recommended practice is to modify the DataFrame’s data while iterating over it using iterrows(). The row you receive may be a copy rather than a view, so writing to it may have no effect on the DataFrame. A safer approach is to collect the modifications in a separate container during the loop and then apply them to the DataFrame afterwards, which is what the example below does.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

modifications = []
for index, row in df.iterrows():
    if row["Age"] > 30:
        modifications.append((index, row["Age"] + 5))

for index, new_age in modifications:
    df.at[index, "Age"] = new_age

print(df)

Output:

      Name  Age Occupation
0    Alice   25   Engineer
1      Bob   30     Doctor
2  Charlie   40     Artist

This approach works, but it still loops over every row in Python, which is slow on large DataFrames. It is generally discouraged in favor of vectorized operations or applying functions.
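
For comparison, here is a minimal sketch of how the same update could be written as a single vectorized assignment with .loc. It assumes the rule stays as simple as "add 5 to every Age above 30"; the variable name mask is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Select the matching rows with a boolean mask and update them in place.
mask = df["Age"] > 30
df.loc[mask, "Age"] = df.loc[mask, "Age"] + 5

print(df)
```

This produces the same DataFrame as the loop-based version, with Charlie’s age raised to 40.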

Example 4: Complex Data Processing

In more complex applications, such as applying a function to each row, iterrows() can be convenient. Here, we compute a new column from the existing data in each row.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus(row):
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


for index, row in df.iterrows():
    df.at[index, "Bonus"] = compute_bonus(row)

print(df)

Output:

      Name  Age Occupation  Bonus
0    Alice   25   Engineer    2.5
1      Bob   30     Doctor    4.5
2  Charlie   35     Artist    0.0

This demonstrates the utility of iterrows() for row-by-row logic that can be awkward to express with vectorized operations.
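
For this particular rule, a vectorized form is still within reach. The sketch below assumes the bonus rates can be written as a lookup table; the rates dictionary and the rate variable are introduced here for illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Hypothetical lookup table mirroring the branches of compute_bonus().
rates = {"Engineer": 0.1, "Doctor": 0.15}

# Occupations missing from the table map to NaN, so fill those with 0.
rate = df["Occupation"].map(rates).fillna(0.0)
df["Bonus"] = rate * df["Age"]

print(df)
```

Whether this reads better than the loop depends on the rule: a lookup table suits simple per-category rates, while branching logic with many conditions may be clearer as a function.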

Example 5: Combining with Other Iteration Tools

Finally, iterrows() can be combined with other Python tools. For instance, using iterrows() inside a list comprehension builds the values for a new column in a single expression.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

def compute_bonus(row):
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


bonuses = [compute_bonus(row) for _, row in df.iterrows()]
df["Bonus"] = bonuses

print(df)

Output:

      Name  Age Occupation  Bonus
0    Alice   25   Engineer    2.5
1      Bob   30     Doctor    4.5
2  Charlie   35     Artist    0.0

This shows that iterrows() combines well with ordinary Python idioms, which can shorten row-wise code considerably.
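
A closely related option is DataFrame.apply() with axis=1, which also passes each row to a function as a Series. The sketch below reuses compute_bonus() unchanged and is offered as an alternative spelling of the same row-wise computation, not as a faster one.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus(row):
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


# axis=1 calls compute_bonus once per row and collects the results in a Series.
df["Bonus"] = df.apply(compute_bonus, axis=1)

print(df)
```

The result matches the list-comprehension version; the choice between the two is mostly a matter of style.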

Conclusion

This tutorial has explored the iterrows() method through practical, progressively more involved examples, demonstrating its flexibility and usefulness in data analysis and manipulation. While it’s important to be aware of its performance cost and to prefer vectorized operations where possible, iterrows() remains a useful tool for situations where row-wise iteration is essential.
