
Exploring pandas.DataFrame.itertuples() method (with examples)

Last updated: February 22, 2024

Introduction

The pandas.DataFrame.itertuples() method iterates over DataFrame rows as named tuples. It is generally faster than iterrows() because it does not build a Series object for every row. In this tutorial, we will explore six examples that showcase the range of applications for the itertuples() method, moving from basic to advanced use cases.

What does itertuples() return?

Before diving into the examples, let’s discuss what itertuples() is and how it’s different from other iteration methods. itertuples() returns an iterator yielding a named tuple for each row in the DataFrame. By default the first field of each tuple is Index (the row label), followed by one field per column. Column values are accessible as attributes named after the columns, or by position like any other tuple. This method offers a balance between ease of use and performance, making it suitable for many data processing tasks.
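To make the return type concrete, the short sketch below inspects the first tuple yielded by itertuples(). The DataFrame df_demo, its row labels, and its column names are illustrative assumptions and are not used by the examples that follow.

```python
import pandas as pd

# Illustrative data; labels and column names are assumptions for this sketch
df_demo = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r1', 'r2'])

first = next(df_demo.itertuples())

# The yielded object is a namedtuple, which is a subclass of tuple
is_tuple = isinstance(first, tuple)

# Field names: 'Index' first (the row label), then the columns in order
fields = first._fields

# Values can be read by attribute or by position
by_name = first.A
by_position = first[1]

print(type(first).__name__, fields, first.Index, by_name, by_position)
```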

Basic Usage

Example 1: Iterating through rows

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# index=True is the default, so each tuple starts with the Index field
for row in df.itertuples():
    print(row)

Output:

Pandas(Index=0, A=1, B=4)
Pandas(Index=1, A=2, B=5)
Pandas(Index=2, A=3, B=6)

This example demonstrates the simplest use case of itertuples(), printing each row’s contents as a named tuple.
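The shape of each yielded row can be adjusted with the index and name parameters of itertuples(). The following is a minimal sketch of their effect; df_demo and the class name 'Row' are illustrative choices.

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# index=False drops the Index field from each named tuple
no_index = list(df_demo.itertuples(index=False))

# name=... sets the class name of the named tuple
renamed = list(df_demo.itertuples(name='Row'))

# name=None yields plain tuples, with no attribute access
plain = list(df_demo.itertuples(index=False, name=None))

print(no_index[0])
print(renamed[0])
print(plain[0])
```

Plain tuples are a reasonable choice when the rows are only passed along to another function and attribute access is not needed.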

Accessing Data by Column Name

Example 2: Individual column values

for row in df.itertuples():
    print(f'A: {row.A}, B: {row.B}')

Notably, using the attribute access enabled by named tuples makes the code more readable and maintains a direct mapping to DataFrame columns.
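Attribute access only works directly when a column name is a valid Python identifier. For other names, pandas substitutes positional field names, as the sketch below shows. The column names here are made up for illustration.

```python
import pandas as pd

# Illustrative column names: one contains a space, one is a Python keyword
df_demo = pd.DataFrame({'unit price': [9.5], 'class': ['a'], 'qty': [2]})

row = next(df_demo.itertuples(index=False))

# Invalid names are replaced by positional names (_0, _1, ...)
print(row._fields)

# Valid names still work as attributes; renamed ones need the positional name
qty = row.qty
unit_price = row._0
```

If positional names make the loop hard to read, renaming the columns before iterating is one option.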

Performance Comparison

Example 3: Comparing with iterrows()

import timeit

# Build the DataFrame once in setup so that only the iteration is timed
setup = '''
import pandas as pd
df = pd.DataFrame({'A': range(10_000), 'B': range(10_000)})
'''

code_itertuples = '''
for row in df.itertuples():
    pass
'''

code_iterrows = '''
for _, row in df.iterrows():
    pass
'''

itertuples_time = timeit.timeit(stmt=code_itertuples, setup=setup, number=10)
iterrows_time = timeit.timeit(stmt=code_iterrows, setup=setup, number=10)
print(f'itertuples: {itertuples_time:.4f}s, iterrows: {iterrows_time:.4f}s')

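On typical runs, itertuples() finishes noticeably faster than iterrows(), largely because iterrows() constructs a Series object for every row. Exact timings depend on hardware, pandas version, and DataFrame size, and the gap widens as the number of rows grows.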
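Speed is not the only difference between the two methods. Because iterrows() packs each row into a single Series, values from columns of different numeric types are converted to a common dtype, while itertuples() keeps each value as it is. A minimal sketch, with illustrative column names:

```python
import pandas as pd

df_demo = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

# iterrows() packs each row into a Series, so mixed numeric columns
# are upcast to a common dtype (float64 here)
_, series_row = next(df_demo.iterrows())
iterrows_value = series_row['int_col']

# itertuples() reads each column separately, so the integer stays an integer
tuple_row = next(df_demo.itertuples(index=False))
itertuples_value = tuple_row.int_col

print(type(iterrows_value).__name__, type(itertuples_value).__name__)
```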

Handling Missing Data

Example 4: Handling NaN values

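df = pd.DataFrame({'A': [1, pd.NA, 3], 'B': [4, 5, None]})
for row in df.itertuples():
    # pd.isna() recognizes pd.NA, None, and NaN alike
    A_value = 0 if pd.isna(row.A) else row.A
    # B holds floats because None becomes NaN in a numeric column
    B_value = 0 if pd.isna(row.B) else row.B
    print(f'A: {A_value}, B: {B_value}')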

This example shows how to gracefully handle missing data within the iteration, ensuring data integrity in subsequent processing steps.

Advanced Data Manipulation

Example 5: Aggregating Data

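# Sample data with a repeated key in A, so there is something to aggregate
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': [1, 2, 3]})
totals = {}
for row in df.itertuples():
    # Sum the B values for each distinct value of A
    if row.A not in totals:
        totals[row.A] = row.B
    else:
        totals[row.A] += row.B
print(totals)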

This example illustrates a simple way to aggregate data by a specific column during iteration, showcasing itertuples()’s utility in more complex data manipulation tasks.
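For a plain sum per key, pandas also provides a built-in route through groupby(). The sketch below runs both approaches on the same illustrative data to show that they agree; the loop remains useful when the per-row logic does not map onto a built-in aggregation.

```python
import pandas as pd

# Illustrative data with a repeated key in column A
df_demo = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': [1, 2, 3]})

# Loop-based aggregation with itertuples()
loop_totals = {}
for row in df_demo.itertuples(index=False):
    loop_totals[row.A] = loop_totals.get(row.A, 0) + row.B

# Equivalent aggregation using groupby
groupby_totals = df_demo.groupby('A')['B'].sum().to_dict()

print(loop_totals, groupby_totals)
```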

Integrating with External Systems

Example 6: Database Insertions

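import sqlite3
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
conn = sqlite3.connect('example.db')
cursor = conn.cursor()
# Create the target table first; the INSERT fails if it does not exist
cursor.execute('CREATE TABLE IF NOT EXISTS table_name (A INTEGER, B INTEGER)')
for row in df.itertuples(index=False):
    # Parameterized query: values are bound rather than pasted into the SQL
    cursor.execute('INSERT INTO table_name (A, B) VALUES (?, ?)', (row.A, row.B))
conn.commit()
conn.close()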

This example shows how itertuples() can feed DataFrame rows into an external system such as a database, one row per INSERT statement.
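When many rows are inserted, the per-row loop can be replaced by a single executemany() call. The sketch below is one possible variant, not the only way to do it: it uses an in-memory SQLite database and a hypothetical table named demo_table so that it leaves no file behind.

```python
import sqlite3
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# In-memory database so the sketch leaves no file behind
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo_table (A INTEGER, B INTEGER)')

# name=None yields plain tuples, which executemany() accepts directly
rows = df_demo.itertuples(index=False, name=None)
conn.executemany('INSERT INTO demo_table (A, B) VALUES (?, ?)', rows)
conn.commit()

# Read the rows back to confirm what was stored
stored = conn.execute('SELECT A, B FROM demo_table ORDER BY A').fetchall()
conn.close()
print(stored)
```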

Conclusion

The pandas.DataFrame.itertuples() method offers a fast and readable way to iterate over DataFrame rows. As the examples show, it covers tasks ranging from basic row inspection and missing-value handling to aggregation and database insertion.
