Pandas: Split a Time Series by Year, Month, and Day

Last updated: February 19, 2024

Introduction

In the world of data analysis and manipulation, time-series data is ubiquitous, ranging from stock prices to weather forecasting. The Python library Pandas is a powerful tool for handling such data. A frequent requirement while working with time-series data is to split it by time intervals, such as year, month, or day. This tutorial provides a comprehensive guide on how to perform these operations using Pandas, complete with code examples from basic to advanced.

Getting Started

To work with Pandas, you first need to install it. If you haven’t already, you can install Pandas using pip:

pip install pandas

Once installed, the next step is to import Pandas. Its own datetime tools (pd.to_datetime and the .dt accessor) cover everything in this tutorial, so the standard-library datetime module is not needed.

import pandas as pd

We’ll also create a sample DataFrame with datetime objects, which will serve as our time-series data for this tutorial.

# Ten sample observations spanning 2021 and 2022
data = {
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-02-01', '2021-03-01',
             '2021-03-15', '2021-04-01', '2022-01-01', '2022-02-01', '2022-03-01'],
    'Value': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55],
}
df = pd.DataFrame(data)
# Convert the date strings to datetime64 so the .dt accessor is available
df['Date'] = pd.to_datetime(df['Date'])
print(df)

This DataFrame contains dates and corresponding values. The subsequent steps will demonstrate how to split this dataset by year, month, and day.
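Every step that follows relies on the .dt accessor, which only works on datetime64 columns. A minimal sketch of how the conversion could be verified, using a shortened copy of the sample data:

```python
import pandas as pd

# Shortened copy of the tutorial's sample data
df = pd.DataFrame({'Date': ['2021-01-01', '2021-02-01', '2022-01-01'],
                   'Value': [10, 25, 45]})
df['Date'] = pd.to_datetime(df['Date'])

# True when the column holds datetime64 values rather than plain strings
is_datetime = pd.api.types.is_datetime64_any_dtype(df['Date'])
print(df.dtypes)
print(is_datetime)
```

If the check returns False, the column most likely still holds strings and .dt will raise an AttributeError.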

Splitting by Year

To split the DataFrame by year, we can extract the year from the ‘Date’ column and then group the data based on these years. Here’s how:

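# Extract the calendar year from each timestamp
df['Year'] = df['Date'].dt.year
# Group by year, then stitch the groups back together in year order with a clean index.
# Note: pandas 2.2 and later warn that apply() operating on the grouping column is deprecated.
df_yearly = df.groupby('Year').apply(lambda x: x.reset_index(drop=True)).reset_index(drop=True)
print(df_yearly)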

The output is a single DataFrame whose rows are ordered by year, with a fresh index. Each year can then be pulled out with a filter such as df[df['Year'] == 2021].
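If the goal is one separate DataFrame per year, iterating over the GroupBy object is one option. The sketch below is an alternative to the snippet above, not a replacement for it; the name frames_by_year is made up for this illustration, and the data is a shortened copy of the sample.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-03-15', '2022-01-01', '2022-03-01']),
    'Value': [10, 35, 45, 55],
})

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs
frames_by_year = {
    int(year): group.reset_index(drop=True)
    for year, group in df.groupby(df['Date'].dt.year)
}

print(sorted(frames_by_year))
print(frames_by_year[2021])
```

Each value in the dictionary is an independent DataFrame, so frames_by_year[2021] can be analyzed or saved without touching the 2022 rows.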

Splitting by Month

Similarly, to split the data by month, we first need to extract the month from the ‘Date’ column. After doing so, we can then group the data by month:

df['Month'] = df['Date'].dt.month
df_monthly = df.groupby(['Year', 'Month']).apply(lambda x: x.reset_index(drop=True)).reset_index(drop=True)
print(df_monthly)

Grouping on both Year and Month keeps January 2021 separate from January 2022; grouping on Month alone would merge them into one group.
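Another way to key on year and month together is a monthly Period, which stores both in one value. The following is a sketch under that assumption; month_key and frames_by_month are illustrative names, and the data is a shortened copy of the sample.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-02-01', '2022-01-01']),
    'Value': [10, 15, 25, 45],
})

# A monthly Period such as 2021-01 encodes year and month in a single key
month_key = df['Date'].dt.to_period('M')
frames_by_month = {
    str(period): group.reset_index(drop=True)
    for period, group in df.groupby(month_key)
}

print(list(frames_by_month))
print(frames_by_month['2021-01'])
```

This avoids adding separate Year and Month helper columns when they are not needed elsewhere.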

Splitting by Day

For a more granular analysis, you may want to split the data by day. Just like the previous steps, extract the day from the ‘Date’ column:

df['Day'] = df['Date'].dt.day
df_daily = df.groupby(['Year', 'Month', 'Day']).apply(lambda x: x.reset_index(drop=True)).reset_index(drop=True)
print(df_daily)

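Now the data is grouped by year, month, and day. Because every date in the sample is unique, each group holds exactly one row; with several records per day, each group would collect all rows for that date.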
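Splitting by day becomes more interesting when timestamps carry a time of day. The sketch below uses hypothetical intraday readings (not part of the tutorial's sample data) and groups them by calendar day with dt.normalize(); the names readings, day_key, and frames_by_day are illustrative.

```python
import pandas as pd

# Hypothetical intraday readings, not part of the tutorial's sample data
readings = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01 09:00', '2021-01-01 17:30',
                            '2021-01-02 09:00']),
    'Value': [10, 12, 15],
})

# normalize() resets the time to midnight, so rows from the same day share a key
day_key = readings['Date'].dt.normalize()
frames_by_day = {
    day.strftime('%Y-%m-%d'): group.reset_index(drop=True)
    for day, group in readings.groupby(day_key)
}

print(list(frames_by_day))
print(frames_by_day['2021-01-01'])
```

Here both readings from January 1 end up in the same sub-DataFrame, while the original timestamps in the Date column are left unchanged.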

Advanced Operations

Beyond basic grouping and splitting, Pandas allows for advanced operations to further analyze and manipulate time-series data. Here are two examples:

Resampling Time Series Data

Resampling is a powerful technique for time series analysis, particularly useful for changing the frequency of your time series data. For example, converting daily data to monthly averages:

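df.set_index('Date', inplace=True)
# Average only the Value column; the helper Year/Month/Day columns are not meaningful to average.
# 'ME' (month end) replaces the deprecated 'M' alias in pandas 2.2+; use 'M' on older versions.
df_resampled = df[['Value']].resample('ME').mean()
print(df_resampled)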

This snippet resamples the data to month-end intervals and computes the mean for each month. Months with no observations, such as May through December 2021 in the sample data, appear as NaN.
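Resampling does not strictly require moving the dates into the index. As a sketch, the on parameter of resample() can point at the Date column instead, and the 'MS' (month start) alias is used here because it is accepted by both older and newer pandas versions. The data is a shortened copy of the sample, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-02-01', '2021-04-01']),
    'Value': [10, 15, 20, 25, 40],
})

# on='Date' resamples by that column, leaving the DataFrame's index untouched.
# 'MS' labels each bin with the first day of the month.
monthly_mean = df.resample('MS', on='Date')['Value'].mean()

# March has no rows, so its mean is NaN; dropna() removes such empty months
monthly_mean_observed = monthly_mean.dropna()

print(monthly_mean)
print(monthly_mean_observed)
```

Whether to keep or drop the empty months depends on the analysis: keeping them makes gaps in the data visible, while dropping them gives a compact table of observed months only.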

Rolling Window Calculations

Another useful technique is performing rolling window calculations, which can be used for smoothing time series data or calculating moving averages:

rolling_avg = df['Value'].rolling(window=7).mean()
print(rolling_avg)

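This example computes a moving average over 7 consecutive rows of the ‘Value’ column, useful for observing trends over time. It equals a 7-day average only when the data has exactly one row per day, which the sample data does not. The first six results are NaN because the window is not yet full.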
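For irregularly spaced dates, rolling() also accepts a time offset such as '7D', which sizes the window by elapsed time rather than by row count. The sketch below contrasts the two on a small illustrative series; it assumes a sorted DatetimeIndex, which time-based windows require.

```python
import pandas as pd

series = pd.Series(
    [10, 15, 20, 25],
    index=pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03', '2021-02-01']),
    name='Value',
)

# window=2 averages the last 2 rows, however far apart their dates are
by_rows = series.rolling(window=2).mean()

# '7D' averages all rows in the 7 days up to and including each timestamp
by_days = series.rolling('7D').mean()

print(by_rows)
print(by_days)
```

For the February 1 row, the row-based window still averages in the January 3 value, while the 7-day window contains only the February 1 observation itself.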

Conclusion

By learning to split time series data by year, month, and day in Pandas, you can perform a wide range of data analyses tailored to your specific needs. Whether your interest lies in finance, meteorology, or any field involving time series, these techniques provide a solid foundation for your data manipulation tasks. With practice, you’ll be able to apply these methods to increasingly complex datasets, gaining valuable insights into temporal trends and patterns.
