Pandas: Select rows between 2 dates in a DataFrame

Updated: February 20, 2024 By: Guest Contributor Post a comment

Introduction to Pandas

Pandas is a powerful Python library for data analysis and manipulation. It provides high-performance, easy-to-use data structures and data analysis tools. One of the key features of Pandas is its DataFrame, which is essentially a multidimensional array with labeled axes (rows and columns).

Dealing with date and time data is a common yet crucial task in data analysis. Whether you are analyzing stock prices, weather data, or sales transactions, the ability to filter records within a specific date range is essential. In this tutorial, we will explore how to select rows between two dates in a Pandas DataFrame. We will start with basic examples and progressively delve into more advanced scenarios. By the end of this guide, you will be adept at manipulating and analyzing time-series data using Pandas.

Prerequisite

Before proceeding, ensure you have the following installed:

  • Python (3.6 or later)
  • Pandas library (can be installed using pip install pandas)

Getting Started with Date Ranges in DataFrames

To select rows between two dates, it is first imperative to have date columns in the correct format. You can convert a column to datetime using the pd.to_datetime() function. Here’s how:

import pandas as pd

df = pd.DataFrame(
    {
        "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"],
        "value": [10, 20, 30, 40],
    }
)

df["date"] = pd.to_datetime(df["date"])

Now that our date column is correctly formatted, we are set to filter rows within a specific date range.

Basic Date Range Selection

To select rows between two dates, you can use Boolean indexing. This method is straightforward and easy to understand:

start_date = '2021-01-02'
end_date = '2021-01-03'

filtered_df = df[(df['date'] >= start_date) & (df['date'] <= end_date)]
print(filtered_df)

Output:

        date  value
1 2021-01-02     20
2 2021-01-03     30

Using loc Method

The loc method offers a more concise way to select rows based on conditions. Here’s how you can apply it to our date range selection:

start_date = '2021-01-02'
end_date = '2021-01-03'

filtered_df = df.loc[(df['date'] >= start_date) & (df['date'] <= end_date)]
print(filtered_df)

Output:

        date  value
1 2021-01-02     20
2 2021-01-03     30

Advanced: Handling Time Zones

When dealing with global data, time zones become an important factor. Pandas allows you to localize and convert time zones. Let’s assume our DataFrame’s dates are in UTC, and we want to select records in Eastern Time (ET).

df['date'] = df['date'].dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
start_date_et = pd.Timestamp('2021-01-02T00:00:00', tz='US/Eastern')
end_date_et = pd.Timestamp('2021-01-03T23:59:59', tz='US/Eastern')

filtered_df = df[(df['date'] >= start_date_et) & (df['date'] <= end_date_et)]
print(filtered_df)

Output:

                       date  value
2 2021-01-02 19:00:00-05:00     30
3 2021-01-03 19:00:00-05:00     40

Intersection of Date Ranges

Another advanced scenario is selecting rows that fall within multiple overlapping date ranges. This requires a bit more logic and manipulation:

additional_dates = pd.DataFrame({
  'start_date': ['2021-01-02', '2021-01-04'],
  'end_date': ['2021-01-03', '2021-01-05']
})

for index, row in additional_dates.iterrows():
    df['temp'] = (df['date'] >= row['start_date']) & (df['date'] <= row['end_date'])
    df = df[df['temp'] == True].drop('temp', axis=1)

print(df)

This allows for flexibility in handling multiple date ranges, adjusting to more complex filtering requirements.

Conclusion

Selecting rows between two dates in a Pandas DataFrame is a valuable skill in data analysis. By mastering basic to advanced techniques, you can efficiently manipulate and analyze time-series data. As shown, Pandas provides flexible methods to accomplish this task, empowering you to handle various scenarios with ease.