Pandas: Retrieve the first/last N rows of a DataFrame

Updated: February 19, 2024 By: Guest Contributor

Introduction

In data analysis, the first and last rows of a dataset often reveal its structure at a glance: the column names, the kinds of values each column holds, and whether the data appears to be sorted. Pandas, a widely used Python library for data manipulation, makes these rows easy to retrieve from its DataFrame objects. This tutorial walks through several ways to get the first or last N rows of a DataFrame, with examples that progress from basic to advanced.

Creating a Test DataFrame

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns) in Pandas. Before diving into retrieving rows, let’s quickly set up a DataFrame to work with:

import pandas as pd

# Sample dataset
data = {'Name': ['John Doe', 'Jane Doe', 'Mary Jane', 'Peter Parker'],
        'Age': [28, 22, 31, 18],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df)

This creates a DataFrame with names, ages, and cities of four individuals.

Retrieving the First N Rows

To view the first N rows of a DataFrame, Pandas provides the .head() method. By default it returns the first five rows (or every row, if the DataFrame has fewer than five, as our four-row example does), but you can pass any number:

# Default first five rows
df.head()

# First two rows
df.head(2)

The output for the first two rows would be:

       Name  Age         City
0  John Doe   28     New York
1  Jane Doe   22  Los Angeles
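The n argument of .head() is also forgiving at the edges: asking for more rows than exist simply returns the whole DataFrame, and a negative n returns every row except the last |n|. A minimal sketch, rebuilding the same sample data so it runs on its own:

```python
import pandas as pd

# Same four-row sample DataFrame as above
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Mary Jane', 'Peter Parker'],
    'Age': [28, 22, 31, 18],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# n larger than the row count: head() returns every row, no error
all_rows = df.head(10)
print(len(all_rows))  # 4

# Negative n: every row except the last |n| rows
all_but_last = df.head(-1)
print(all_but_last['Name'].tolist())  # ['John Doe', 'Jane Doe', 'Mary Jane']
```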

Retrieving the Last N Rows

Similar to .head(), Pandas offers the .tail() method to access the last N rows of a DataFrame. By default it returns the last five rows, or you can pass the number you want:

# Default last five rows
df.tail()

# Last two rows
df.tail(2)

The output for the last two rows would be:

           Name  Age     City
2     Mary Jane   31  Chicago
3  Peter Parker   18  Houston
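.tail() mirrors .head() for negative values: df.tail(-n) returns every row except the first n. Combining the two with pd.concat is one convenient way to peek at both ends of a DataFrame at once; this is a sketch of a common pattern, not a dedicated Pandas method:

```python
import pandas as pd

# Same four-row sample DataFrame as above
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Mary Jane', 'Peter Parker'],
    'Age': [28, 22, 31, 18],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Negative n: every row except the first |n| rows
all_but_first = df.tail(-1)
print(all_but_first['Name'].tolist())  # ['Jane Doe', 'Mary Jane', 'Peter Parker']

# First and last row stacked together; the original index labels are kept
ends = pd.concat([df.head(1), df.tail(1)])
print(ends)
```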

Advanced Retrieval Methods

Beyond the basic .head() and .tail() methods, there are more advanced techniques for accessing specific portions of your DataFrame. Let’s explore some of these:

Slicing

You can also apply Python’s slicing syntax directly to a DataFrame. An integer slice inside [] selects rows by position:

# Get the first three rows
df[:3]

# Get the last two rows - using negative indexing
df[-2:]
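Because an integer slice inside [] is positional, it keeps working when the index holds labels instead of 0..N-1. The sketch below indexes the sample data by Name (an illustrative choice, not something the examples above require) and checks that slicing agrees with .head() and .tail():

```python
import pandas as pd

# Same sample data, but rows are labeled by Name instead of 0..3
by_name = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Mary Jane', 'Peter Parker'],
    'Age': [28, 22, 31, 18],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
}).set_index('Name')

# Integer slices are still interpreted by position, not by label
first_two = by_name[:2]
last_two = by_name[-2:]
print(first_two.index.tolist())  # ['John Doe', 'Jane Doe']
print(last_two.index.tolist())   # ['Mary Jane', 'Peter Parker']

# Same rows as head(2) and tail(2)
print(first_two.equals(by_name.head(2)))  # True
print(last_two.equals(by_name.tail(2)))   # True
```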

iloc and loc Indexers

For more granular control, use .iloc to select rows (and columns) by integer position, or .loc to select them by label.

# Using iloc to retrieve the first three rows
df.iloc[:3]

# Using loc with the labels of the last two rows (works for any index whose labels are unique)
df.loc[df.index[-2:]]
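One detail matters when using these indexers for "first N rows": .iloc slices exclude the end position, as Python slices do, while .loc slices include the end label. With the default 0..3 index, df.loc[:2] therefore returns three rows, not two. A minimal sketch of the difference:

```python
import pandas as pd

# Same four-row sample DataFrame as above (default RangeIndex 0..3)
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Mary Jane', 'Peter Parker'],
    'Age': [28, 22, 31, 18],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# iloc: positions 0 and 1 (end excluded)
print(len(df.iloc[:2]))  # 2

# loc: labels 0, 1 and 2 (end label included)
print(len(df.loc[:2]))   # 3

# With a label index, loc slices run from one label to another, inclusive
by_name = df.set_index('Name')
print(by_name.loc['John Doe':'Mary Jane'].index.tolist())
# ['John Doe', 'Jane Doe', 'Mary Jane']
```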

Query-based Retrieval

In a large DataFrame, you may care less about rows at a fixed position than about rows that satisfy a condition. Although this goes beyond strictly retrieving the first or last N rows, the .query() method filters a DataFrame with a boolean expression:

# Retrieve rows where Age is greater than 25
df.query('Age > 25')
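Filtering and positional selection combine naturally: chain .head() or .tail() after .query() to get the first or last N matching rows, or after .sort_values() to get the top N rows by some column. A short sketch on the sample data:

```python
import pandas as pd

# Same four-row sample DataFrame as above
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Mary Jane', 'Peter Parker'],
    'Age': [28, 22, 31, 18],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# First row (in original order) among people older than 25
first_match = df.query('Age > 25').head(1)
print(first_match['Name'].tolist())  # ['John Doe']

# The two oldest people: sort by Age descending, then take the first two rows
oldest_two = df.sort_values('Age', ascending=False).head(2)
print(oldest_two['Name'].tolist())  # ['Mary Jane', 'John Doe']
```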

Conclusion

Throughout this tutorial, we’ve explored several ways to retrieve the first or last N rows of a DataFrame in Pandas, starting with the basic .head() and .tail() methods and moving on to slicing, the .iloc and .loc indexers, and condition-based selection with .query(). Knowing which of these fits a given task makes it quicker to inspect a dataset and pull out exactly the rows you need.