Introduction
The pandas library in Python is an indispensable tool for data analysis and manipulation, particularly for tabular data. Among its many features, the DataFrame.iterrows() method iterates over DataFrame rows as (index, Series) pairs. This tutorial walks through the iterrows() method in five examples, from basic usage to more advanced applications.
Before diving into the examples, note that iterrows() yields each row as a pandas Series. It is not the most efficient way to perform row-wise operations, especially on large DataFrames, because it builds a new Series object for every row and runs the loop in Python rather than in vectorized code. Because each row is packed into a single Series, dtypes are also not preserved across the row. It remains a simple and flexible choice when the task at hand does not demand optimized performance.
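To make the (index, Series) behavior concrete, here is a minimal sketch. The column names count and ratio are hypothetical and chosen only for illustration. It shows that an integer column is upcast to float when a row mixing integers and floats is packed into one Series.

```python
import pandas as pd

# Hypothetical two-column frame: one integer column, one float column.
df = pd.DataFrame({"count": [1, 2], "ratio": [1.5, 2.5]})

# Take the first (index, Series) pair produced by iterrows().
index, row = next(df.iterrows())

print(type(row).__name__)  # Series
print(df["count"].dtype)   # int64 in the DataFrame
print(row.dtype)           # float64: the row Series holds one common dtype
```

If exact dtypes matter in your loop, read the values from the DataFrame columns rather than from the row Series.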
Example 1: Basic Usage
The following example demonstrates the basic usage of iterrows() by printing each row of a DataFrame.
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
})

# iterrows() yields one (index, Series) pair per row
for index, row in df.iterrows():
    print(index, row['Name'], row['Age'], row['Occupation'])
This outputs:
0 Alice 25 Engineer
1 Bob 30 Doctor
2 Charlie 35 Artist
Example 2: Extracting Specific Data
In this example, we select only the rows that satisfy a condition on one column. This approach is helpful when you need to work with a subset of the data.
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Print only the rows where Age is greater than 30
for index, row in df.iterrows():
    if row["Age"] > 30:
        print(index, row["Name"], row["Age"])
This outputs:
2 Charlie 35
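For comparison, the same selection can be written without a loop. The sketch below uses a boolean mask on the Age column; the variable name older is an illustrative choice, not something from the example above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# Boolean mask: True for rows where Age is greater than 30.
older = df[df["Age"] > 30]

# Show the same two fields the loop printed.
print(older[["Name", "Age"]])
```

The mask form returns a new DataFrame, which is convenient when the subset feeds into further processing rather than just being printed.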
Example 3: Modifying DataFrame During Iteration
A common but discouraged practice is to modify a DataFrame while iterating over it with iterrows(). Depending on the data types, the row you receive may be a copy rather than a view, so writing to it may have no effect on the DataFrame. A safer approach is to collect the modifications in a separate container and apply them after the loop, which is what the following example does.
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# First pass: collect (index, new value) pairs without touching df
modifications = []
for index, row in df.iterrows():
    if row["Age"] > 30:
        modifications.append((index, row["Age"] + 5))

# Second pass: apply the collected changes after iteration has finished
for index, new_age in modifications:
    df.at[index, "Age"] = new_age

print(df)
Output:
      Name  Age Occupation
0    Alice   25   Engineer
1      Bob   30     Doctor
2  Charlie   40     Artist
This pattern works, but looping row by row is still slow on large DataFrames, so vectorized operations or applying functions are generally preferred for updates like this one.
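As a rough sketch of the vectorized alternative mentioned above, the same update can be expressed with a boolean mask and .loc. The variable name mask is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)

# True for the rows that should be updated.
mask = df["Age"] > 30

# Add 5 to Age for the selected rows only, in a single operation.
df.loc[mask, "Age"] += 5

print(df)
```

This produces the same ages as the two-pass loop (25, 30, 40) without any explicit iteration.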
Example 4: Complex Data Processing
In more complex applications, such as applying a function to each row, iterrows() can be useful. Here, we compute a new column from existing values in each row.
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus(row):
    # The bonus rate depends on the occupation; other occupations get 0
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


# Write each result into a new "Bonus" column, one cell at a time
for index, row in df.iterrows():
    df.at[index, "Bonus"] = compute_bonus(row)

print(df)
Output:
      Name  Age Occupation  Bonus
0    Alice   25   Engineer    2.5
1      Bob   30     Doctor    4.5
2  Charlie   35     Artist    0.0
This demonstrates the utility of iterrows() for row-by-row logic that might be tedious to express with vectorized operations.
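For readers who want to compare approaches, here is a sketch of two alternatives to the loop: DataFrame.apply with axis=1, which reuses compute_bonus unchanged, and a lookup-based version using Series.map. The rates dictionary and the names bonus_apply and bonus_map are assumptions introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus(row):
    # Same rule as in the example above.
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


# Option 1: apply the row function across rows (axis=1).
bonus_apply = df.apply(compute_bonus, axis=1)

# Option 2: look up a rate per occupation, then multiply column-wise.
# Occupations missing from the (hypothetical) rates dict get a rate of 0.
rates = {"Engineer": 0.1, "Doctor": 0.15}
bonus_map = df["Occupation"].map(rates).fillna(0) * df["Age"]

df["Bonus"] = bonus_map
print(df)
```

The apply version keeps arbitrary per-row logic in one function, while the map version only fits rules that reduce to a lookup plus arithmetic.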
Example 5: Combining with Other Iteration Tools
Last but certainly not least, iterrows() can be combined with other Python tools. For instance, pairing iterrows() with a list comprehension lets you build a whole column in a single expression.
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus(row):
    if row["Occupation"] == "Engineer":
        return 0.1 * row["Age"]
    elif row["Occupation"] == "Doctor":
        return 0.15 * row["Age"]
    else:
        return 0


# Build the full list of bonuses first, then assign the column once
bonuses = [compute_bonus(row) for _, row in df.iterrows()]
df["Bonus"] = bonuses

print(df)
Output:
      Name  Age Occupation  Bonus
0    Alice   25   Engineer    2.5
1      Bob   30     Doctor    4.5
2  Charlie   35     Artist    0.0
This showcases the versatility of iterrows(), which, when combined with other Python idioms, can simplify row-wise data manipulations.
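The same list-comprehension idiom also works with DataFrame.itertuples(), which yields namedtuples instead of Series and is generally faster than iterrows(). The sketch below is one possible adaptation; the helper name compute_bonus_from_tuple is hypothetical, and it uses attribute access because namedtuple fields are read as row.Age rather than row["Age"].

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "Occupation": ["Engineer", "Doctor", "Artist"],
    }
)


def compute_bonus_from_tuple(row):
    # Hypothetical variant of compute_bonus for namedtuple rows.
    if row.Occupation == "Engineer":
        return 0.1 * row.Age
    elif row.Occupation == "Doctor":
        return 0.15 * row.Age
    else:
        return 0


# index=False drops the index field, leaving only the column values.
bonuses = [compute_bonus_from_tuple(row) for row in df.itertuples(index=False)]
df["Bonus"] = bonuses

print(df)
```

Attribute access only works for column names that are valid Python identifiers, so this variant suits simple column names like the ones used here.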
Conclusion
This tutorial has explored the iterrows() method through five progressively more involved examples, demonstrating its flexibility in data analysis and manipulation. While it’s important to be aware of its performance characteristics and to prefer vectorized operations where possible, iterrows() remains a useful tool for situations where row-wise iteration is essential.