Sling Academy
Home/Pandas/Pandas: Select rows between 2 dates in a DataFrame

Pandas: Select rows between 2 dates in a DataFrame

Last updated: February 20, 2024

Introduction to Pandas

Pandas is a powerful Python library for data analysis and manipulation. It provides high-performance, easy-to-use data structures and data analysis tools. One of the key features of Pandas is its DataFrame, which is essentially a multidimensional array with labeled axes (rows and columns).

Dealing with date and time data is a common yet crucial task in data analysis. Whether you are analyzing stock prices, weather data, or sales transactions, the ability to filter records within a specific date range is essential. In this tutorial, we will explore how to select rows between two dates in a Pandas DataFrame. We will start with basic examples and progressively delve into more advanced scenarios. By the end of this guide, you will be adept at manipulating and analyzing time-series data using Pandas.

Prerequisite

Before proceeding, ensure you have the following installed:

  • Python (3.6 or later)
  • Pandas library (can be installed using pip install pandas)

Getting Started with Date Ranges in DataFrames

To select rows between two dates, it is first imperative to have date columns in the correct format. You can convert a column to datetime using the pd.to_datetime() function. Here’s how:

import pandas as pd

df = pd.DataFrame(
    {
        "date": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"],
        "value": [10, 20, 30, 40],
    }
)

df["date"] = pd.to_datetime(df["date"])

Now that our date column is correctly formatted, we are set to filter rows within a specific date range.

Basic Date Range Selection

To select rows between two dates, you can use Boolean indexing. This method is straightforward and easy to understand:

start_date = '2021-01-02'
end_date = '2021-01-03'

filtered_df = df[(df['date'] >= start_date) & (df['date'] <= end_date)]
print(filtered_df)

Output:

        date  value
1 2021-01-02     20
2 2021-01-03     30

Using loc Method

The loc method offers a more concise way to select rows based on conditions. Here’s how you can apply it to our date range selection:

start_date = '2021-01-02'
end_date = '2021-01-03'

filtered_df = df.loc[(df['date'] >= start_date) & (df['date'] <= end_date)]
print(filtered_df)

Output:

        date  value
1 2021-01-02     20
2 2021-01-03     30

Advanced: Handling Time Zones

When dealing with global data, time zones become an important factor. Pandas allows you to localize and convert time zones. Let’s assume our DataFrame’s dates are in UTC, and we want to select records in Eastern Time (ET).

df['date'] = df['date'].dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
start_date_et = pd.Timestamp('2021-01-02T00:00:00', tz='US/Eastern')
end_date_et = pd.Timestamp('2021-01-03T23:59:59', tz='US/Eastern')

filtered_df = df[(df['date'] >= start_date_et) & (df['date'] <= end_date_et)]
print(filtered_df)

Output:

                       date  value
2 2021-01-02 19:00:00-05:00     30
3 2021-01-03 19:00:00-05:00     40

Intersection of Date Ranges

Another advanced scenario is selecting rows that fall within multiple overlapping date ranges. This requires a bit more logic and manipulation:

additional_dates = pd.DataFrame({
  'start_date': ['2021-01-02', '2021-01-04'],
  'end_date': ['2021-01-03', '2021-01-05']
})

for index, row in additional_dates.iterrows():
    df['temp'] = (df['date'] >= row['start_date']) & (df['date'] <= row['end_date'])
    df = df[df['temp'] == True].drop('temp', axis=1)

print(df)

This allows for flexibility in handling multiple date ranges, adjusting to more complex filtering requirements.

Conclusion

Selecting rows between two dates in a Pandas DataFrame is a valuable skill in data analysis. By mastering basic to advanced techniques, you can efficiently manipulate and analyze time-series data. As shown, Pandas provides flexible methods to accomplish this task, empowering you to handle various scenarios with ease.

Next Article: Pandas: Select rows since last day/week/month in a DataFrame

Previous Article: Pandas: How to select N random rows from a DataFrame

Series: DateFrames in Pandas

Pandas

You May Also Like

  • How to Use Pandas Profiling for Data Analysis (4 examples)
  • How to Handle Large Datasets with Pandas and Dask (4 examples)
  • Pandas – Using DataFrame.pivot() method (3 examples)
  • Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)
  • Pandas: Select columns whose names start/end with a specific string (4 examples)
  • 3 ways to turn off future warnings in Pandas
  • How to Integrate Pandas with Apache Spark
  • How to Use Pandas for Web Scraping and Saving Data (2 examples)
  • How to Clean and Preprocess Text Data with Pandas (3 examples)
  • Pandas – Using Series.replace() method (3 examples)
  • Pandas json_normalize() function: Explained with examples
  • Pandas: Reading CSV and Excel files from AWS S3 (4 examples)
  • Using pandas.Series.rank() method (4 examples)
  • Pandas: Dropping columns whose names contain a specific string (4 examples)
  • Pandas: How to print a DataFrame without index (3 ways)
  • Fixing Pandas NameError: name ‘df’ is not defined
  • Pandas – Using DataFrame idxmax() and idxmin() methods (4 examples)
  • Pandas FutureWarning: ‘M’ is deprecated and will be removed in a future version, please use ‘ME’ instead
  • Pandas: Checking equality of 2 DataFrames (element-wise)