Unlocking the power of the pandas.Series.resample() method (6 examples)

Updated: February 18, 2024 By: Guest Contributor

Table Of Contents

1 Overview

1.1 Prerequisites

2 Example 1: Basic Resampling

3 Example 2: Downsampling and Applying Multiple Aggregations

4 Example 3: Upsampling and Filling Missing Values

5 Example 4: Grouping by a Time Period

6 Example 5: Resampling with Custom Functions

7 Example 6: Handling Time Zones

8 Conclusion

Overview

pandas is a powerful Python library for data manipulation and analysis. Among its many features, the resample() method is central to time series work: it groups a datetime-indexed Series into bins of a new frequency, so the data can be aggregated to a coarser frequency (downsampling) or expanded to a finer one (upsampling). This guide walks through six examples that show how Series.resample() is used in practice.

Prerequisites

Before diving into examples, ensure you have pandas installed in your environment and know the basics of handling time series data in pandas. A solid understanding of Python is also required.

Example 1: Basic Resampling

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6)
data = pd.Series(np.random.randn(6), index=dates)
print(data.resample('2D').mean())

This example demonstrates the basic use of resample() to aggregate time series data into larger bins (2 days in this case) and compute the mean for each bin.

Output (values will vary because the data is random):

2023-01-01   -0.678215
2023-01-03   -0.241955
2023-01-05    0.169140
Freq: 2D, dtype: float64
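
Because random data makes the result hard to verify, here is a minimal sketch of the same idea with hand-picked values (the numbers 1 through 6 are made up for illustration). It also shows the label parameter of resample(), which controls whether each bin is named after its left or right edge.

```python
import pandas as pd

# Deterministic values so the result can be checked by hand
dates = pd.date_range("2023-01-01", periods=6, freq="D")
data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# By default each 2-day bin is labeled by its left edge
means = data.resample("2D").mean()
print(means)

# Same bins and same values, but labeled by the right edge
right_labeled = data.resample("2D", label="right").mean()
print(right_labeled)
```

Both results hold the means 1.5, 3.5 and 5.5; only the index labels differ (starting at 2023-01-01 in the first case and 2023-01-03 in the second).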

Example 2: Downsampling and Applying Multiple Aggregations

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=12)
data = pd.Series(np.random.randn(12), index=dates)
result = data.resample('3D').agg(['mean', 'std'])
print(result)

In this example, we downsample daily data into 3-day bins and compute several statistics (mean and standard deviation) in a single call. Passing a list of function names to agg() returns a DataFrame with one column per statistic.

Output (random):

                mean       std
2023-01-01  1.155251  1.748583
2023-01-04  0.425340  1.635663
2023-01-07  0.069398  0.495763
2023-01-10 -0.458953  0.628117
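
Since agg() with a list returns a DataFrame, its columns can be combined afterwards. The following is a small sketch with made-up values (0 through 11) rather than random data, so each number can be checked by hand.

```python
import pandas as pd

# Deterministic daily values 0..11 (made up for illustration)
dates = pd.date_range("2023-01-01", periods=12, freq="D")
data = pd.Series(range(12), index=dates, dtype="float64")

# One column per aggregation, one row per 3-day bin
summary = data.resample("3D").agg(["min", "max", "mean"])
print(summary)

# Columns can be combined afterwards, e.g. the spread within each bin
spread = summary["max"] - summary["min"]
print(spread)
```

Each bin holds three consecutive integers, so the means are 1, 4, 7 and 10, and the spread is 2 in every bin.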

Example 3: Upsampling and Filling Missing Values

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6)
data = pd.Series(np.random.randn(6), index=dates)
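# Upsample from daily to 12-hour bins; asfreq() leaves the new timestamps as NaN
upsampled = data.resample('12h').asfreq()
# Forward-fill each gap with the most recent observed value
upsampled = upsampled.ffill()
print(upsampled)

This example upsamples daily data to a higher frequency (every 12 hours here) and fills the newly created gaps by forward filling, so each new timestamp carries the last observed value. It calls ffill() rather than fillna(method='ffill'), which is deprecated in recent pandas releases.

Output (random):

2023-01-01 00:00:00    0.073173
2023-01-01 12:00:00    0.073173
2023-01-02 00:00:00    0.868983
2023-01-02 12:00:00    0.868983
2023-01-03 00:00:00   -1.590373
2023-01-03 12:00:00   -1.590373
2023-01-04 00:00:00   -0.752302
2023-01-04 12:00:00   -0.752302
2023-01-05 00:00:00   -0.374519
2023-01-05 12:00:00   -0.374519
2023-01-06 00:00:00   -0.242952
Freq: 12h, dtype: float64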
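
Forward filling is only one way to handle the gaps that upsampling creates. As a rough sketch, using three made-up daily values and a 12-hour target frequency (both chosen for illustration), the snippet below compares leaving the gaps as NaN, forward filling them, and interpolating linearly between neighbors.

```python
import pandas as pd

# Three made-up daily observations
dates = pd.date_range("2023-01-01", periods=3, freq="D")
data = pd.Series([10.0, 20.0, 40.0], index=dates)

# Upsample to 12-hour steps; new timestamps have no value yet
gaps = data.resample("12h").asfreq()

# Option 1: repeat the last observed value
filled = data.resample("12h").ffill()

# Option 2: draw a straight line between neighboring observations
interpolated = data.resample("12h").interpolate()

print(pd.DataFrame({"asfreq": gaps, "ffill": filled, "interpolate": interpolated}))
```

Which option is appropriate depends on the data: forward filling suits values that stay constant until the next reading, while interpolation suits quantities that change gradually.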

Example 4: Grouping by a Time Period

import pandas as pd
import numpy as np
dates = pd.date_range(start='2023-01-01', end='2023-01-31')
data = pd.Series(np.random.randn(len(dates)), index=dates)
# 'ME' (month end) is the alias in pandas 2.2+; earlier versions spell it 'M'
monthly_data = data.resample('ME').sum()
print(monthly_data)

Here we group the series by calendar month and sum the values in each group, which is useful for monthly summaries or reports. Because the sample data covers only January 2023, the result contains a single row.
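
To see more than one group, the data has to span several months. The sketch below uses a made-up series of ones covering January through March 2023, so each monthly total equals the number of days in that month. It assumes that labeling each bin by the first day of the month is acceptable, and uses the 'MS' (month start) alias for that.

```python
import pandas as pd

# One observation per day for the first quarter of 2023, each equal to 1
dates = pd.date_range("2023-01-01", "2023-03-31", freq="D")
data = pd.Series(1.0, index=dates)

# 'MS' groups by calendar month and labels each bin with the month's first day
monthly_totals = data.resample("MS").sum()
print(monthly_totals)
```

The totals are 31, 28 and 31, labeled 2023-01-01, 2023-02-01 and 2023-03-01.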

Example 5: Resampling with Custom Functions

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=10)
data = pd.Series(np.random.rand(10), index=dates)
# Define a custom function to calculate the range (max - min)
def range_func(array):
    return array.max() - array.min()
custom_resample = data.resample('5D').apply(range_func)
print(custom_resample)

This example applies a custom function (the range, max minus min) to each 5-day bin, showing that resample() is not limited to built-in aggregations.
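
One way to sanity-check a custom function is to compare it with the same calculation built from standard aggregations. The following sketch does that with ten made-up values; value_range is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Ten made-up daily values, split into two 5-day bins
dates = pd.date_range("2023-01-01", periods=10, freq="D")
data = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0], index=dates)

def value_range(group):
    """Return max - min of the values that fall in one bin."""
    return group.max() - group.min()

# Custom function applied once per bin
custom = data.resample("5D").apply(value_range)

# The same quantity from two built-in aggregations
builtin = data.resample("5D").max() - data.resample("5D").min()

print(custom)
print((custom == builtin).all())
```

The first bin ranges from 1 to 5 and the second from 2 to 9, so both approaches give 4 and 7. When a built-in combination exists, it is usually the faster choice; apply() is most useful for logic that has no built-in equivalent.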

Example 6: Handling Time Zones

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6, tz='UTC')
data = pd.Series(np.random.randn(6), index=dates)
localized_data = data.tz_convert('America/New_York')
resampled_data = localized_data.resample('2D').mean()
print(resampled_data)

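Time zone management is crucial in time series analysis. This example shows how to convert a time-zone-aware Series to another time zone before applying the resample() method. Because the index is converted first, the 2-day bins are aligned to midnight in New York rather than midnight UTC.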
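
The effect of the conversion on bin boundaries is easier to see with counts. In this sketch, 48 made-up hourly observations (each equal to 1) are summed per day, once in UTC and once after converting to America/New_York, so each daily total is simply the number of hours that fall on that local calendar day.

```python
import pandas as pd

# 48 hourly observations starting at midnight UTC; every value is 1
stamps = pd.date_range("2023-01-01", periods=48, freq="h", tz="UTC")
data = pd.Series(1, index=stamps)

# Daily bins aligned to midnight UTC
utc_daily = data.resample("D").sum()

# Daily bins aligned to midnight in New York (UTC-5 in January)
ny_daily = data.tz_convert("America/New_York").resample("D").sum()

print(utc_daily)
print(ny_daily)
```

In UTC the data splits into two full days of 24 hours. In New York time the same observations start at 19:00 on December 31, so they spread over three calendar days with 5, 24 and 19 hours.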

Conclusion

The resample() method is a versatile tool for time series analysis in pandas. The examples in this guide covered basic aggregation, multiple and custom aggregation functions, filling the gaps created by upsampling, and time zone handling. Mastering resample() helps analysts extract meaningful insights from time series data efficiently.
