Overview
pandas is a Python library for data manipulation and analysis. Its resample() method is central to time series work: it groups data into time bins of a chosen frequency so the data can be summarized or converted to a different frequency. This guide walks through six examples of the Series.resample() method.
Prerequisites
Before diving into the examples, make sure pandas and NumPy are installed in your environment and that you know the basics of handling time series data in pandas. A working knowledge of Python is also assumed.
Example 1: Basic Resampling
import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6)
data = pd.Series(np.random.randn(6), index=dates)
print(data.resample('2D').mean())
This example demonstrates the basic use of resample() to aggregate time series data into larger bins (2 days in this case) and compute the mean for each bin.
Output (values vary because the data is random):
2023-01-01 -0.678215
2023-01-03 -0.241955
2023-01-05 0.169140
Freq: 2D, dtype: float64
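Because the data above is random, the binning itself is hard to check by eye. The following is a minimal sketch of the same call on fixed values (the values 1.0 to 6.0 are an illustrative assumption, not part of the original example), so each bin's mean can be verified by hand:

```python
import pandas as pd

# Fixed values (an illustrative assumption) so the result is reproducible
dates = pd.date_range('2023-01-01', periods=6)
data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Each 2-day bin starts at its label: Jan 1-2, Jan 3-4, Jan 5-6
binned = data.resample('2D').mean()
print(binned)  # means are 1.5, 3.5 and 5.5
```

Each bin is labeled with its first day, which is why the index in the output reads 2023-01-01, 2023-01-03 and 2023-01-05.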
Example 2: Downsampling and Applying Multiple Aggregations
import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=12)
data = pd.Series(np.random.randn(12), index=dates)
result = data.resample('3D').agg(['mean', 'std'])
print(result)
In this example, we downsample daily data into three-day bins and compute two statistics (mean and standard deviation) for each bin in a single call.
Output (random):
                mean       std
2023-01-01  1.155251  1.748583
2023-01-04  0.425340  1.635663
2023-01-07  0.069398  0.495763
2023-01-10 -0.458953  0.628117
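Passing a list to agg() returns a DataFrame with one column per function. As a rough check of that behavior, the sketch below repeats the call on the fixed values 1 to 12 (an illustrative assumption), where the statistics are easy to work out by hand:

```python
import pandas as pd

# Fixed values 1..12 (an illustrative assumption) instead of random data
dates = pd.date_range('2023-01-01', periods=12)
data = pd.Series(range(1, 13), index=dates, dtype='float64')

# One row per 3-day bin, one column per aggregation
stats = data.resample('3D').agg(['mean', 'std'])
print(stats)

# Columns are selected like in any other DataFrame
means = stats['mean']
print(means.tolist())  # [2.0, 5.0, 8.0, 11.0]
```

Every bin holds three consecutive integers, so the sample standard deviation is 1 in each row.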
Example 3: Upsampling and Filling Missing Values
import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6)
data = pd.Series(np.random.randn(6), index=dates)
upsampled = data.resample('D').asfreq()
# fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the direct equivalent
upsampled = upsampled.ffill()
print(upsampled)
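This example shows the upsampling pattern: asfreq() reindexes the series to the target frequency, leaving NaN in any newly created slots, and a forward fill then imputes them. Note that the input here is already daily, so resampling to 'D' creates no new rows and the output below simply matches the input; a finer target frequency, such as hourly, is needed to actually upsample.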
Output (random):
2023-01-01 0.073173
2023-01-02 0.868983
2023-01-03 -1.590373
2023-01-04 -0.752302
2023-01-05 -0.374519
2023-01-06 -0.242952
Freq: D, dtype: float64
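To see new rows actually appear, the target frequency has to be finer than the input. Here is a minimal sketch that upsamples three daily points to a 12-hour frequency; the values and the '12h' rule are illustrative assumptions, and interpolate() is shown only as one alternative to forward filling:

```python
import pandas as pd

# Three daily observations (illustrative values)
dates = pd.date_range('2023-01-01', periods=3)
data = pd.Series([0.0, 2.0, 4.0], index=dates)

# Upsample to 12-hour steps; the new noon slots start out as NaN
upsampled = data.resample('12h').asfreq()
print(upsampled)

# Option 1: carry the last known value forward
filled = upsampled.ffill()
print(filled.tolist())  # [0.0, 0.0, 2.0, 2.0, 4.0]

# Option 2: linear interpolation between neighboring observations
interpolated = upsampled.interpolate()
print(interpolated.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Forward filling suits values that stay constant until the next observation, while interpolation suits quantities that change gradually between observations.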
Example 4: Grouping by a Time Period
import pandas as pd
import numpy as np
dates = pd.date_range(start='2023-01-01', end='2023-01-31')
data = pd.Series(np.random.randn(len(dates)), index=dates)
# 'M' means month end; pandas 2.2+ deprecates it in favor of 'ME'
monthly_data = data.resample('M').sum()
print(monthly_data)
Here, we group time series data by a longer period (a month) and calculate the total for each group, which is useful for monthly summaries or reports.
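The month-end alias 'M' is deprecated in pandas 2.2 and later in favor of 'ME', so code that must run on several pandas versions needs some care. One possible workaround, sketched below, is the month-start alias 'MS', which labels each bin by the first day of the month instead of the last. The three-month range and the constant value 1 are illustrative assumptions chosen so that each total equals the number of days in the month:

```python
import pandas as pd

# One observation of 1 per day across three months (illustrative data)
dates = pd.date_range(start='2023-01-01', end='2023-03-31')
data = pd.Series(1, index=dates)

# 'MS' bins by calendar month and labels each bin with the month's first day
monthly_totals = data.resample('MS').sum()
print(monthly_totals)  # totals are 31, 28 and 31
```

The sums are the same as with month-end labeling; only the index labels differ.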
Example 5: Resampling with Custom Functions
import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=10)
data = pd.Series(np.random.rand(10), index=dates)
# Define a custom function to calculate the range (max - min)
def range_func(array):
    return array.max() - array.min()
custom_resample = data.resample('5D').apply(range_func)
print(custom_resample)
This example explores the application of custom functions (such as a range function) on resampled data, highlighting the method’s flexibility.
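The custom function receives the values of one bin at a time as a Series. The sketch below runs the same kind of range function on fixed values (an illustrative assumption) and compares it with a version built from the built-in max() and min() aggregations, which should give the same result:

```python
import pandas as pd

# Fixed values (an illustrative assumption) so the ranges can be checked by hand
dates = pd.date_range('2023-01-01', periods=10)
data = pd.Series([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], index=dates)

def range_func(values):
    """Return the spread (max - min) of one bin's values."""
    return values.max() - values.min()

resampler = data.resample('5D')

# Custom function applied once per 5-day bin
custom = resampler.apply(range_func)
print(custom.tolist())  # [4, 7]

# Same result from two built-in aggregations
builtin = resampler.max() - resampler.min()
print(builtin.tolist())  # [4, 7]
```

When a calculation can be expressed with built-in aggregations, that form is usually preferable; apply() is the fallback for logic that has no built-in equivalent.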
Example 6: Handling Time Zones
import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6, tz='UTC')
data = pd.Series(np.random.randn(6), index=dates)
localized_data = data.tz_convert('America/New_York')
resampled_data = localized_data.resample('2D').mean()
print(resampled_data)
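Time zone management is crucial in time series analysis. This example shows how to convert a time-zone-aware Series to another time zone with tz_convert() before applying the resample() method, so that the daily bins are aligned to midnight in the converted time zone.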
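Converting the time zone first changes where the bin edges fall. In the sketch below, with fixed values as an illustrative assumption, midnight UTC on January 1 is 19:00 on December 31 in New York, so the first 2-day bin starts on 2022-12-31 local time:

```python
import pandas as pd

# Six daily observations at midnight UTC (illustrative values)
dates = pd.date_range('2023-01-01', periods=6, tz='UTC')
data = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], index=dates)

# In New York, each timestamp falls at 19:00 on the previous calendar day
ny_data = data.tz_convert('America/New_York')

# Bins start at local midnight of the first day present in the index
ny_means = ny_data.resample('2D').mean()
print(ny_means)

first_label = ny_means.index[0]
print(first_label)  # 2022-12-31 00:00:00-05:00
```

The values still pair up as (0, 1), (2, 3) and (4, 5) here, but with data recorded at other times of day the choice of time zone can move observations from one bin to another.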
Conclusion
The resample() method is a versatile tool for time series analysis in pandas. The examples in this guide applied it to basic aggregation, multiple and custom aggregation functions, filling missing values, and time zone handling. Mastering resample() helps analysts extract meaningful insights from time series data efficiently.