Introduction
Manipulating time series data is a common task in data analysis, enabling insights into trends, patterns, and cycles. In this tutorial, we will specifically explore how to change the frequency of time series data from daily to weekly or monthly using pandas, a powerful Python data manipulation library. This task is particularly useful for making datasets more manageable, summarizing trends over a longer period, or preparing data for predictive modeling.
Getting Started
First, ensure you have pandas installed. If not, you can install it using pip:
pip install pandas
Next, import pandas in your Python script or notebook:
import pandas as pd
Create a simple pandas Series with a daily date index to use as an example:
rng = pd.date_range(start='2023-01-01', end='2023-01-31', freq='D')
data = pd.Series(range(len(rng)), index=rng)
This creates a Series with daily increments from January 1 to January 31, 2023. Each day is assigned a sequential value starting from 0.
Changing Frequency to Weekly
Let’s start by changing the frequency from daily to weekly. There are several ways to do this, but a common one is the resample() method, which converts between frequencies and lets you choose how the values in each period are aggregated.
weekly_data = data.resample('W').mean()
In this example, we resample the data to a weekly frequency and use the mean as the aggregation method, which gives the average of the values in each week. The output will be:
2023-01-01     0.0
2023-01-08     4.0
2023-01-15    11.0
2023-01-22    18.0
2023-01-29    25.0
2023-02-05    29.5
Freq: W-SUN, dtype: float64
The resample('W') call groups the data into weekly bins that end on Sunday, and each bin is labeled with its end date. Because January 1, 2023 is a Sunday, the first bin holds only that day, and the last bin (labeled 2023-02-05) holds only January 30 and 31. The mean of the daily values is calculated for each bin.
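How the days fall into weekly bins can be checked directly. The following is a minimal sketch that rebuilds the example Series, counts the observations per bin, and, purely as an illustration, anchors the weeks on Monday with the 'W-MON' alias. The names weekly_counts and weekly_mon are introduced here and are not part of the original example.

```python
import pandas as pd

# Rebuild the example Series: one value per day in January 2023.
rng = pd.date_range(start='2023-01-01', end='2023-01-31', freq='D')
data = pd.Series(range(len(rng)), index=rng)

# 'W' is shorthand for 'W-SUN': bins end on Sunday and carry that date as label.
weekly_data = data.resample('W').mean()

# Number of daily observations in each weekly bin: 1, 7, 7, 7, 7, 2.
weekly_counts = data.resample('W').count()

# Illustrative alternative: weeks that end on Monday instead of Sunday.
weekly_mon = data.resample('W-MON').mean()

print(weekly_counts)
print(weekly_mon)
```

Counting the observations per bin is a quick way to spot partial weeks at the start or end of a range before comparing their averages with those of full weeks.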
Changing Frequency to Monthly
Similarly, to change the frequency to monthly, we use the same resample() method but adjust the frequency parameter:
monthly_data = data.resample('M').mean()
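This resamples the data to a monthly frequency, again using the mean for aggregation. Note that pandas 2.2 and later deprecate the 'M' alias in favor of 'ME' (month end); on those versions, write resample('ME') and expect Freq: ME in the output: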
2023-01-31 15.0
Freq: M, dtype: float64
Here, we get the average of daily values for the month of January 2023.
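Monthly results can also be labeled with the first day of each month, or produced without relying on a version-specific alias string. The sketch below assumes a hypothetical two-month Series (January and February 2023, which goes beyond the original one-month example) and uses the 'MS' alias and a pd.offsets.MonthEnd() offset object. The names two_months, monthly_start and monthly_end are illustrative.

```python
import pandas as pd

# Hypothetical series covering January and February 2023 (59 daily values).
rng = pd.date_range(start='2023-01-01', end='2023-02-28', freq='D')
two_months = pd.Series(range(len(rng)), index=rng)

# 'MS' (month start) labels each monthly bin with the first day of the month.
monthly_start = two_months.resample('MS').mean()

# Passing an offset object instead of an alias string avoids the
# 'M' versus 'ME' spelling difference between pandas versions.
monthly_end = two_months.resample(pd.offsets.MonthEnd()).mean()

print(monthly_start)
print(monthly_end)
```

Both results hold the same monthly averages; only the labels differ, so the choice mainly depends on how the dates should appear in later joins or plots.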
Advanced Techniques
While aggregation with mean values is common, there are several other methods you can use, including sum, max, min, and custom aggregation functions. For instance, to sum values weekly:
weekly_sum = data.resample('W').sum()
Or, to find the maximum value monthly:
monthly_max = data.resample('M').max()
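The agg() method allows for even more flexibility:
custom_resample = data.resample('W').agg(['mean', 'min', 'max'])
Here, agg() applies several aggregation functions at once and returns a DataFrame with one column per function. Because data is a Series, the functions are passed as a plain list; a dictionary keyed by column name only applies when resampling a DataFrame.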
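A user-defined function can be passed to agg() in the same way. As a minimal sketch, the hypothetical helper value_range below (not part of pandas or of the original example) computes the spread between the largest and smallest value in each weekly bin.

```python
import pandas as pd

# Rebuild the example Series: one value per day in January 2023.
rng = pd.date_range(start='2023-01-01', end='2023-01-31', freq='D')
data = pd.Series(range(len(rng)), index=rng)


def value_range(values):
    """Return the difference between the largest and smallest value in one bin."""
    return values.max() - values.min()


# agg() calls value_range once per weekly bin, passing that bin's values as a Series.
weekly_range = data.resample('W').agg(value_range)

print(weekly_range)
```

A named function is used here rather than a lambda so that the result is easier to read and the logic can be tested on its own.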
Handling Missing Data
When changing frequencies, you might encounter NaN values whenever a period contains no data, because an aggregation such as mean() has nothing to average for that bin. To handle these, pandas offers methods like ffill(), fillna(), and dropna(). For instance, to fill missing values with the last available value (ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions):
weekly_data_filled = weekly_data.ffill()
To drop them altogether:
monthly_data_clean = monthly_data.dropna()
These techniques ensure that the resulting dataset is clean and ready for further analysis or modeling.
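The example Series has no gaps, so neither call changes anything there. To see the effect, the sketch below assumes a hypothetical gap: every observation from January 9 through January 15 is removed before resampling. The names gappy, filled and dropped are illustrative.

```python
import pandas as pd

# Rebuild the example Series: one value per day in January 2023.
rng = pd.date_range(start='2023-01-01', end='2023-01-31', freq='D')
data = pd.Series(range(len(rng)), index=rng)

# Hypothetical gap: remove a full week of observations (January 9 to 15).
gappy = data.drop(data.loc['2023-01-09':'2023-01-15'].index)

# The bin labeled 2023-01-15 is empty, so its mean is NaN.
weekly_mean = gappy.resample('W').mean()

# sum() returns 0 for an empty bin by default; min_count=1 yields NaN instead.
weekly_sum = gappy.resample('W').sum()
weekly_sum_nan = gappy.resample('W').sum(min_count=1)

# Carry the previous week's value forward, or drop the empty week entirely.
filled = weekly_mean.ffill()
dropped = weekly_mean.dropna()

print(weekly_mean)
print(filled)
```

Note that a sum of 0 for an empty week can be mistaken for a real observation, which is why min_count=1 is shown as an option. Whether to fill or drop depends on what the downstream analysis expects.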
Conclusion
Changing the frequency of time series data is a straightforward yet powerful technique for data preparation and analysis. Whether the goal is to simplify visualization, reduce noise, or prepare data for modeling, pandas’ resampling capabilities provide the functionality needed to transform data efficiently. By adjusting the frequency from daily to weekly or monthly, we gain different insights and can tailor our analysis to specific objectives.