Pandas: How to iterate over rows in a DataFrame (6 examples)

Introduction
Setting Up Your DataFrame
Example 1: Iterating with iterrows()
Example 2: Using itertuples()
Example 3: Apply Functions
Example 4: Vectorized Operations
Example 5: Using applymap() for Element-wise Operations
Example 6: The transform() Method
Conclusion

Introduction

In data analysis and manipulation with Python, Pandas is one of the most popular libraries thanks to its powerful and flexible data structures. A common task is processing a DataFrame row by row, whether to transform data, analyze it, or report on it. In this tutorial, we’ll explore six ways to work through the rows of a Pandas DataFrame, from explicit loops to faster alternatives that avoid looping altogether.

Setting Up Your DataFrame

Before diving into the examples, let’s set up a simple DataFrame to use throughout this tutorial:

import pandas as pd
data = {
  'Name': ['John', 'Anna', 'Peter', 'Linda'],
  'Age': [28, 34, 29, 32],
  'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
print(df)

This DataFrame contains names, ages, and cities of four individuals:

    Name  Age      City
0   John   28  New York
1   Anna   34     Paris
2  Peter   29    Berlin
3  Linda   32    London

Example 1: Iterating with `iterrows()`

One of the simplest ways to iterate over DataFrame rows is the iterrows() method. For each row, it yields a pair: the row's index label and the row data as a Series.

for index, row in df.iterrows():
    print(index, row["Name"], row["Age"], row["City"])
    print('---') # Add a separator between rows

Output:

0 John 28 New York
---
1 Anna 34 Paris
---
2 Peter 29 Berlin
---
3 Linda 32 London
---

This method is handy for quick inspections and small DataFrames. It is slow on large data because it builds a new Series for every row, and since a Series has a single dtype, values in rows with mixed column types may be upcast.
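
One caveat is easier to see in code: for a DataFrame with mixed column types like this one, each row yielded by iterrows() is a copy, so assigning to it does not change df. The following is a minimal sketch rather than the only approach; the column name Age_next_year is made up for illustration. It collects results in a list and assigns them as a new column after the loop.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 32],
    "City": ["New York", "Paris", "Berlin", "London"],
})

# Assigning to `row` only changes the temporary Series, not df.
for index, row in df.iterrows():
    row["Age"] = row["Age"] + 1

ages_after_loop = df["Age"].tolist()  # still the original ages

# Collect results during the loop, then assign them in one step.
next_year = []
for index, row in df.iterrows():
    next_year.append(row["Age"] + 1)

df["Age_next_year"] = next_year  # hypothetical column name
print(df)
```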

Example 2: Using `itertuples()`

The itertuples() method is a faster alternative to iterrows(). It yields one named tuple per row, with the index available as row.Index and each column as an attribute.

for row in df.itertuples():
    print(row.Index, row.Name, row.Age, row.City)

This approach is usually faster than iterrows(). Keep in mind that named tuples are immutable: you cannot assign to row.Age, so collect any results separately and write them back to the DataFrame after the loop.
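
As a small sketch of a common variation, itertuples() accepts index=False to drop the index and name=None to yield plain tuples, which can then be unpacked directly in a loop or comprehension. The variable name labels is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 32],
    "City": ["New York", "Paris", "Berlin", "London"],
})

# index=False omits the index; name=None yields regular tuples
# in column order: (Name, Age, City).
labels = [
    f"{name} ({city})"
    for name, age, city in df.itertuples(index=False, name=None)
]
print(labels)
```

Plain tuples also sidestep the renaming that named tuples apply to column names that are not valid Python identifiers.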

Example 3: Apply Functions

The apply() method is very powerful for applying a function along an axis of the DataFrame (rows in this case).

df.apply(lambda x: print(x['Name'], x['Age'], x['City']), axis=1)

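This style is more idiomatic Pandas, but with axis=1 the function is still called once per row in Python, so it is not a vectorized operation. Because print() returns None, the call also returns a Series of None values; in practice you would usually return a value from the function instead of printing.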

Output:

John 28 New York
Anna 34 Paris
Peter 29 Berlin
Linda 32 London
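
A more typical use of apply() returns a value for each row instead of printing. The sketch below assumes the same sample data; the helper describe and the Summary column are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 32],
    "City": ["New York", "Paris", "Berlin", "London"],
})

def describe(row):
    # `row` is a Series holding one row; return one value per row.
    return f"{row['Name']} from {row['City']}"

# axis=1 passes each row to the function; the result is a Series
# aligned with the DataFrame's index.
df["Summary"] = df.apply(describe, axis=1)
print(df["Summary"])
```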

Example 4: Vectorized Operations

For purely computational tasks, direct vectorized operations on columns are preferred due to their high efficiency. Here’s an example:

df['Age_plus_one'] = df['Age'] + 1
print(df)

This operation adds 1 to each value in the ‘Age’ column without explicitly iterating over each row.
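
The same idea extends beyond arithmetic. As a brief sketch, comparisons and string concatenation also work on whole columns, and a boolean mask can select rows without a loop. The column names Over_30 and Label are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 32],
    "City": ["New York", "Paris", "Berlin", "London"],
})

# A comparison on a column produces a boolean Series.
df["Over_30"] = df["Age"] > 30

# String columns can be concatenated element by element with +.
df["Label"] = df["Name"] + " (" + df["City"] + ")"

# A boolean mask selects the matching rows.
names_over_30 = df.loc[df["Over_30"], "Name"].tolist()
print(names_over_30)
```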

Example 5: Using `applymap()` for Element-wise Operations

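While not strictly a row operation, applymap() applies a function to every element of a DataFrame. Note that applymap() was deprecated in pandas 2.1 in favor of DataFrame.map(), which does the same thing. If your task requires transforming each element individually, consider this:

upper = df[['Name', 'City']].applymap(str.upper)
print(upper)

This returns a new DataFrame in which every string in the ‘Name’ and ‘City’ columns is uppercase; df itself is unchanged.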
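
Since DataFrame.map() was added in pandas 2.1 as the replacement for applymap(), code that has to run on both older and newer versions can check which method is available. This is a hedged sketch of one way to do that, not an official recipe.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 32],
    "City": ["New York", "Paris", "Berlin", "London"],
})

subset = df[["Name", "City"]]

# Prefer DataFrame.map (pandas 2.1+); fall back to applymap otherwise.
if hasattr(subset, "map"):
    upper = subset.map(str.upper)
else:
    upper = subset.applymap(str.upper)

print(upper)
```

For string columns specifically, the vectorized df['Name'].str.upper() is usually the simpler choice.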

Example 6: The `transform()` Method

Another option is transform(), which applies a function to a Series or DataFrame and returns a result with the same shape and index as its input. That makes the result easy to assign back as a new column.

df['Name_length'] = df['Name'].transform(lambda x: len(x))
print(df)

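This adds a column showing the length of each name. For a simple case like this, df['Name'].str.len() gives the same result; transform() is most useful together with groupby(), where it broadcasts a per-group result back to every row of the group.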
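
To illustrate the group case, the sketch below adds a hypothetical Team column (not part of the original data) and uses groupby().transform() to attach each team's mean age to every row. The column name Team_mean_age is likewise an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 34, 29, 32],
    "City": ["New York", "Paris", "Berlin", "London"],
})

# Hypothetical grouping column, added only for this example.
df["Team"] = ["A", "B", "A", "B"]

# transform("mean") computes one mean per team and repeats it for
# each member, so the result has the same length as df.
df["Team_mean_age"] = df.groupby("Team")["Age"].transform("mean")
print(df[["Name", "Team", "Age", "Team_mean_age"]])
```

Unlike an aggregation, which would return one row per team, the transformed result lines up with the original rows.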

Conclusion

Iterating over rows in a DataFrame is a common task in data analysis with Pandas. The method you choose depends on the specific requirements of your task, such as the need for speed, simplicity, or direct data modification. Understanding these six methods provides a robust toolkit for handling various data iteration and transformation tasks effectively.
