Pandas: Split a Time Series by Year, Month, and Day

Updated: February 19, 2024 By: Guest Contributor

Introduction

In data analysis, time-series data is ubiquitous, from stock prices to weather measurements. The Python library Pandas is a powerful tool for handling such data. A frequent requirement is to split a time series by interval, such as year, month, or day. This tutorial shows how to do that with Pandas, with code examples that progress from basic to advanced.

Getting Started

To work with Pandas, you first need to install it. If you haven’t already, you can install Pandas using pip:

pip install pandas

Once installed, the next step is to import Pandas. Its built-in datetime support covers everything used in this tutorial, so the standard datetime module is not required.

import pandas as pd

We’ll also create a sample DataFrame with datetime objects, which will serve as our time-series data for this tutorial.

data = {'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-02-01', '2021-03-01', '2021-03-15', '2021-04-01',
                 '2022-01-01', '2022-02-01', '2022-03-01'],
         'Value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df)

This DataFrame contains dates and corresponding values. The subsequent steps will demonstrate how to split this dataset by year, month, and day.

Splitting by Year

To split the DataFrame by year, extract the year from the ‘Date’ column, group on it, and collect each group into its own DataFrame. Here’s how:

df['Year'] = df['Date'].dt.year
# Build one DataFrame per year, keyed by the year
df_yearly = {year: group.reset_index(drop=True) for year, group in df.groupby('Year')}
for year, group in df_yearly.items():
    print(year)
    print(group)

The output prints one block per year (2021 and 2022). Each entry in df_yearly is a separate DataFrame, with the Year column kept for clarity.
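Grouping by year also makes it easy to pull out a single year or to summarise every year at once. The following is a minimal sketch under stated assumptions: the four-row DataFrame and the names by_year, df_2021, and summary are illustrative and not part of the tutorial's dataset.

```python
import pandas as pd

# Small illustrative dataset (assumed for this sketch).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-03-15", "2022-01-01", "2022-03-01"]),
    "Value": [10, 35, 45, 55],
})

# Group on the year derived from the Date column, without adding a helper column.
by_year = df.groupby(df["Date"].dt.year)

# Pull out the rows for one year as an ordinary DataFrame.
df_2021 = by_year.get_group(2021)

# Summarise each year: number of rows and total of Value.
summary = by_year["Value"].agg(["count", "sum"])

print(df_2021)
print(summary)
```

Grouping on the derived Series directly leaves the original DataFrame unchanged, which can be convenient when the extra Year column is not wanted downstream.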

Splitting by Month

Similarly, to split the data by month, first extract the month from the ‘Date’ column. Grouping on the month alone would merge January 2021 with January 2022, so group on year and month together:

df['Month'] = df['Date'].dt.month
# Build one DataFrame per (year, month) pair
df_monthly = {key: group.reset_index(drop=True) for key, group in df.groupby(['Year', 'Month'])}
for (year, month), group in df_monthly.items():
    print(year, month)
    print(group)

This splits the data by both year and month, making it easy to inspect a specific month and to compare the same month across different years.
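A compact alternative to separate year and month columns is a single monthly period key. The sketch below assumes a small illustrative DataFrame; the names month_key and monthly_frames are hypothetical. It relies on Series.dt.to_period("M"), which maps each timestamp to its calendar month.

```python
import pandas as pd

# Small illustrative dataset (assumed for this sketch).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-02-01", "2022-01-01"]),
    "Value": [10, 15, 25, 45],
})

# to_period("M") turns each timestamp into a monthly period such as 2021-01,
# so the year and the month are carried in one key.
month_key = df["Date"].dt.to_period("M")

# One DataFrame per calendar month, keyed by a string label like "2021-01".
monthly_frames = {str(period): group for period, group in df.groupby(month_key)}

for label, frame in monthly_frames.items():
    print(label, frame["Value"].tolist())
```

Because the period already includes the year, January 2021 and January 2022 end up in different groups without any extra column.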

Splitting by Day

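For a more granular analysis, you may want to split the data by day. As in the previous steps, extract the day from the ‘Date’ column and group on year, month, and day together:

df['Day'] = df['Date'].dt.day
# Build one DataFrame per calendar day
df_daily = {key: group.reset_index(drop=True) for key, group in df.groupby(['Year', 'Month', 'Day'])}
for (year, month, day), group in df_daily.items():
    print(year, month, day)
    print(group)

Now the data is split by year, month, and day, with one entry per calendar day. In this sample every date occurs once, so each group contains a single row.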
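Daily splitting becomes more useful when several readings share a date. The sketch below assumes a hypothetical Timestamp column that includes a time of day; the data and the names day_key and daily_frames are illustrative. It uses Series.dt.normalize() to reset each timestamp to midnight so that readings from the same day share one key.

```python
import pandas as pd

# Illustrative intraday readings (assumed for this sketch).
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 09:00", "2021-01-01 17:30", "2021-01-02 08:15",
    ]),
    "Value": [10, 12, 15],
})

# normalize() sets the time part to 00:00, so rows from the same day match.
day_key = df["Timestamp"].dt.normalize()

# One DataFrame per calendar day, keyed by an ISO date string.
daily_frames = {
    day.date().isoformat(): group for day, group in df.groupby(day_key)
}

for label, frame in daily_frames.items():
    print(label, frame["Value"].tolist())
```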

Advanced Operations

Beyond basic grouping and splitting, Pandas allows for advanced operations to further analyze and manipulate time-series data. Here are two examples:

Resampling Time Series Data

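Resampling changes the frequency of a time series. It requires a datetime index, so the ‘Date’ column is first set as the index. For example, to aggregate the values into monthly averages:

df.set_index('Date', inplace=True)
# 'ME' (month end) requires pandas 2.2 or later; use 'M' on older versions
df_resampled = df['Value'].resample('ME').mean()
print(df_resampled)

This code snippet resamples the data to month-end intervals and computes the mean of ‘Value’ for each month. Selecting ‘Value’ keeps the helper columns Year, Month, and Day out of the average. Months with no observations, such as May through December 2021, appear as NaN.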
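Resampling creates a bin for every interval between the first and last timestamp, including intervals that contain no data. The sketch below illustrates this with a small assumed Series; the names s, monthly_mean, and observed_only are hypothetical. It uses the 'MS' (month start) alias, which labels each bin by the first day of the month.

```python
import pandas as pd

# Illustrative series with a gap in February (assumed for this sketch).
s = pd.Series(
    [10, 20, 30],
    index=pd.to_datetime(["2021-01-01", "2021-01-03", "2021-03-01"]),
    name="Value",
)

# "MS" labels each monthly bin by its first day.
monthly_mean = s.resample("MS").mean()

# February has no observations, so its mean is NaN; dropna() removes such bins.
observed_only = monthly_mean.dropna()

print(monthly_mean)
print(observed_only)
```

Whether to keep the empty bins depends on the analysis: they are useful for spotting gaps, while dropping them gives a table of observed months only.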

Rolling Window Calculations

Another useful technique is performing rolling window calculations, which can be used for smoothing time series data or calculating moving averages:

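# '7D' is a time-based window: each average covers the 7 days ending at that row's date
rolling_avg = df['Value'].rolling('7D').mean()
print(rolling_avg)

This example computes a 7-day moving average of the ‘Value’ column, useful for observing trends over time. An integer window such as window=7 would instead average the last 7 rows regardless of how far apart their dates are, which matters here because the sample dates are unevenly spaced.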
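The difference between a row-count window and a time-based window is easiest to see side by side. The sketch below uses a small assumed Series with a date gap; the names s, rows_3, and days_3 are illustrative. A time-based window needs a sorted datetime index.

```python
import pandas as pd

# Illustrative series: three consecutive days, then a gap (assumed for this sketch).
s = pd.Series(
    [10, 15, 20, 25],
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-02-01"]),
)

# Integer window: average of the last 3 rows, NaN until 3 rows are available.
rows_3 = s.rolling(window=3).mean()

# Offset window: average of all rows within the 3 days ending at each timestamp.
days_3 = s.rolling("3D").mean()

print(pd.DataFrame({"last_3_rows": rows_3, "last_3_days": days_3}))
```

For the final row, the row-count window still averages values from early January, while the time-based window sees only the February reading.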

Conclusion

By learning to split time series data by year, month, and day in Pandas, you can perform a wide range of data analyses tailored to your specific needs. Whether your interest lies in finance, meteorology, or any field involving time series, these techniques provide a solid foundation for your data manipulation tasks. With practice, you’ll be able to apply these methods to increasingly complex datasets, gaining valuable insights into temporal trends and patterns.