Unlocking the power of pandas.Series.resample() method (6 examples)

Last updated: February 18, 2024

Overview

pandas is a powerful Python library for data manipulation and analysis. Its resample() method is central to time series work: it groups the observations of a Series into bins of a new frequency so they can be aggregated (downsampling) or spread over a finer grid and filled in (upsampling). The Series must have a datetime-like index, such as a DatetimeIndex. This guide walks through six examples that show what Series.resample() can do.

Prerequisites

Before diving into examples, ensure you have pandas installed in your environment and know the basics of handling time series data in pandas. A solid understanding of Python is also required.

Example 1: Basic Resampling

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6)
data = pd.Series(np.random.randn(6), index=dates)
print(data.resample('2D').mean())

This example demonstrates the basic use of resample() to aggregate time series data into larger bins (2 days in this case) and compute the mean for each bin.

Output (values vary because the data is random):

2023-01-01   -0.678215
2023-01-03   -0.241955
2023-01-05    0.169140
Freq: 2D, dtype: float64
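
Because the values above are random, it can be hard to see which rows land in which bin. The following is a minimal sketch with fixed values (the numbers 1 through 6 are made up for illustration and are not part of the original example) that makes the grouping explicit:

```python
import pandas as pd

# Six daily observations with known values (illustrative data only)
dates = pd.date_range('20230101', periods=6)
data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Each 2-day bin is labeled by its first day and holds two consecutive days
binned_mean = data.resample('2D').mean()
print(binned_mean)
# Jan 1-2 -> 1.5, Jan 3-4 -> 3.5, Jan 5-6 -> 5.5
```

The result index contains 2023-01-01, 2023-01-03, and 2023-01-05, matching the labels in the output above.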

Example 2: Downsampling and Applying Multiple Aggregations

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=12)
data = pd.Series(np.random.randn(12), index=dates)
result = data.resample('3D').agg(['mean', 'std'])
print(result)

In this example, we downsample daily data into 3-day bins and compute two statistics (mean and standard deviation) in a single call. Passing a list of function names to agg() returns a DataFrame with one column per statistic.

Output (random):

                mean       std
2023-01-01  1.155251  1.748583
2023-01-04  0.425340  1.635663
2023-01-07  0.069398  0.495763
2023-01-10 -0.458953  0.628117
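
To check the shape of the result without randomness, here is a small sketch that uses the integers 0 through 11 as data (an assumption made purely so the numbers are easy to verify by hand):

```python
import pandas as pd

# Twelve daily observations: 0.0, 1.0, ..., 11.0 (illustrative data only)
dates = pd.date_range('20230101', periods=12)
data = pd.Series(range(12), index=dates, dtype='float64')

# A list of aggregations yields a DataFrame with one column per statistic
stats = data.resample('3D').agg(['mean', 'std'])
print(stats)

# Each statistic is an ordinary column that can be selected by name
print(stats['mean'].tolist())  # [1.0, 4.0, 7.0, 10.0]
```

Every bin holds three consecutive integers, so the sample standard deviation in the std column is 1.0 for each row.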

Example 3: Upsampling and Filling Missing Values

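import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6)
data = pd.Series(np.random.randn(6), index=dates)
# Upsample from daily to 12-hour bins. The new 12:00 rows have no data,
# so forward-fill them with the last known value.
upsampled = data.resample('12h').ffill()
print(upsampled)

This example upsamples from a daily frequency to a higher one (every 12 hours). Upsampling creates timestamps that have no observations, so the missing values must be imputed. Here they are forward-filled with ffill(); calling asfreq() instead would leave them as NaN.

Output (random; the Freq line reads 12H on pandas versions older than 2.2):

2023-01-01 00:00:00    0.073173
2023-01-01 12:00:00    0.073173
2023-01-02 00:00:00    0.868983
2023-01-02 12:00:00    0.868983
2023-01-03 00:00:00   -1.590373
2023-01-03 12:00:00   -1.590373
2023-01-04 00:00:00   -0.752302
2023-01-04 12:00:00   -0.752302
2023-01-05 00:00:00   -0.374519
2023-01-05 12:00:00   -0.374519
2023-01-06 00:00:00   -0.242952
Freq: 12h, dtype: float64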
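
Forward-filling is only one way to impute the gaps that upsampling creates. The sketch below compares three options on a tiny series with made-up values (0, 2, 4), so the effect of each is easy to see. The 12-hour target frequency is chosen only to keep the output short.

```python
import pandas as pd

# Three daily observations with known values (illustrative data only)
dates = pd.date_range('20230101', periods=3)
data = pd.Series([0.0, 2.0, 4.0], index=dates)

as_is = data.resample('12h').asfreq()        # new rows stay NaN
filled = data.resample('12h').ffill()        # repeat the last known value
interp = data.resample('12h').interpolate()  # linear between neighbours

comparison = pd.DataFrame({'asfreq': as_is, 'ffill': filled, 'interpolate': interp})
print(comparison)
```

With this data, the ffill column is 0, 0, 2, 2, 4 and the interpolate column is 0, 1, 2, 3, 4. Which one is appropriate depends on whether the underlying quantity holds its value between observations or changes gradually.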

Example 4: Grouping by a Time Period

import pandas as pd
import numpy as np
dates = pd.date_range(start='2023-01-01', end='2023-01-31')
data = pd.Series(np.random.randn(len(dates)), index=dates)
# 'ME' means month end; on pandas older than 2.2 use 'M' instead
monthly_data = data.resample('ME').sum()
print(monthly_data)

Here we group the series by calendar month and compute the total for each group, which is useful for monthly summaries or reports. Because the data covers only January 2023, the result has a single row labeled 2023-01-31.
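
Monthly grouping is easier to see with more than one month of data. The following sketch spans January through March 2023 and sets every value to 1.0, so each monthly total equals the number of days in that month. The month_end_alias() helper is hypothetical (not part of pandas); it is included only because the month-end alias is spelled 'ME' from pandas 2.2 onward and 'M' in earlier versions.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

def month_end_alias():
    """Hypothetical helper: return 'ME' if this pandas accepts it, else 'M'."""
    try:
        to_offset('ME')
        return 'ME'
    except ValueError:
        return 'M'

# One observation per day, each equal to 1.0 (illustrative data only)
dates = pd.date_range(start='2023-01-01', end='2023-03-31')
data = pd.Series(1.0, index=dates)

# Each bin is labeled with the last day of its month
monthly_totals = data.resample(month_end_alias()).sum()
print(monthly_totals)
# Totals are 31.0, 28.0 and 31.0 for January, February and March
```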

Example 5: Resampling with Custom Functions

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=10)
data = pd.Series(np.random.rand(10), index=dates)
# Define a custom function to calculate the range (max - min)
def range_func(array):
    return array.max() - array.min()
custom_resample = data.resample('5D').apply(range_func)
print(custom_resample)

This example applies a custom function (the range, max minus min) to each 5-day bin, highlighting the method’s flexibility. apply() calls the function once per bin and passes that bin's values as a Series.
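
As a sanity check, the custom function can be compared against the same calculation done with built-in aggregations. This is a sketch with made-up values chosen so that the two bins have different ranges:

```python
import pandas as pd

def range_func(values):
    # values is the Series of observations that fall in one bin
    return values.max() - values.min()

# Ten daily observations with known values (illustrative data only)
dates = pd.date_range('20230101', periods=10)
data = pd.Series([1.0, 5.0, 3.0, 2.0, 4.0, 10.0, 7.0, 8.0, 2.0, 9.0], index=dates)

# Custom function applied once per 5-day bin
via_apply = data.resample('5D').apply(range_func)

# Same result from built-in aggregations
extremes = data.resample('5D').agg(['max', 'min'])
via_builtin = extremes['max'] - extremes['min']

print(via_apply.tolist())    # [4.0, 8.0]
print(via_builtin.tolist())  # [4.0, 8.0]
```

When a calculation can be expressed with built-in aggregations, that route is usually faster on large data. A custom function is the fallback when no built-in fits.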

Example 6: Handling Time Zones

import pandas as pd
import numpy as np
dates = pd.date_range('20230101', periods=6, tz='UTC')
data = pd.Series(np.random.randn(6), index=dates)
localized_data = data.tz_convert('America/New_York')
resampled_data = localized_data.resample('2D').mean()
print(resampled_data)

Time zone management is crucial in time series analysis. This example shows how to convert time zones in a datetime Series before applying the resample() method.
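
The reason the conversion matters is that resample() draws bin edges at midnight in the index's own time zone. The sketch below uses 48 hourly observations, each equal to 1 (made-up data), so every daily total is simply the number of hours that fell into that local day:

```python
import pandas as pd

# 48 hourly observations covering Jan 1 and Jan 2, 2023 in UTC (illustrative data only)
hours = pd.date_range('2023-01-01', periods=48, freq='h', tz='UTC')
data = pd.Series(1, index=hours)

# Daily bins in UTC: two full days
utc_daily = data.resample('D').sum()

# Daily bins in New York time (UTC-5 in January): the same 48 hours
# now straddle three local calendar days
ny_daily = data.tz_convert('America/New_York').resample('D').sum()

print(utc_daily.tolist())  # [24, 24]
print(ny_daily.tolist())   # [5, 24, 19]
```

The first New York bin is labeled 2022-12-31 because midnight UTC on January 1 is 19:00 on December 31 in New York. Converting before resampling therefore changes which observations are grouped together, not just the labels.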

Conclusion

The resample() method in pandas is a versatile tool for time series analysis. The examples in this guide covered basic aggregation, multiple and custom aggregation functions, filling the missing values created by upsampling, and converting time zones before resampling. Mastering resample() helps analysts summarize time series data at whatever frequency a question calls for.
