Pandas time series: Find the sum/avg/min/max of each day/month/year

Last updated: February 22, 2024


Pandas is a widely used Python library for data analysis, and its time series tooling makes it straightforward to group and aggregate date-indexed data. In this tutorial, we’ll find the sum, average, minimum, and maximum of values for each day, month, and year within a Pandas DataFrame.

Getting Started

Before diving into time series operations, ensure you have Pandas installed in your environment:

pip install pandas

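Pandas installs NumPy and python-dateutil automatically as dependencies, so neither needs a separate install. The plotting examples later in this tutorial also require Matplotlib:

pip install matplotlib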

Let’s begin by creating some sample time series data:

import pandas as pd
import numpy as np

# Create a date range
date_rng = pd.date_range(start='1/1/2022', end='12/31/2022', freq='D')
# Create a sample DataFrame
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.rand(len(date_rng))

Setting the DatetimeIndex

For time-based grouping with resample(), the DataFrame’s index must be a DatetimeIndex:

df.set_index('date', inplace=True)

Now, our DataFrame is ready for time-based grouping operations.
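
In practice, dates often arrive as strings (for example, from a CSV file) rather than from pd.date_range(). The following is a minimal sketch, assuming a hypothetical DataFrame named raw whose 'date' column holds ISO-formatted strings. It converts the column with pd.to_datetime() before setting the index, then confirms the index type:

```python
import pandas as pd

# Hypothetical raw data: dates stored as plain strings, deliberately unsorted
raw = pd.DataFrame({
    'date': ['2022-01-03', '2022-01-01', '2022-01-02'],
    'data': [30.0, 10.0, 20.0],
})

# Convert the strings to datetime64 values, then use them as the index
raw['date'] = pd.to_datetime(raw['date'], format='%Y-%m-%d')
ts = raw.set_index('date').sort_index()

# resample() requires a datetime-like index, so check before going further
print(isinstance(ts.index, pd.DatetimeIndex))
print(ts)
```

Sorting the index is not strictly required for resampling, but it keeps date-based slicing predictable.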

Sum of Values by Time Period

To calculate the sum of values for each day, month, or year, we use the resample() method:

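# Daily sum
daily_sum = df.resample('D').sum()
# Monthly sum
monthly_sum = df.resample('M').sum()
# Yearly sum
yearly_sum = df.resample('Y').sum()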

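The 'D', 'M', and 'Y' characters represent daily, monthly (month-end), and yearly (year-end) frequencies, respectively. Starting with Pandas 2.2, 'M' and 'Y' are deprecated in favor of 'ME' and 'YE' and trigger a FutureWarning, so use the newer aliases on recent versions.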
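
Because the sample DataFrame already holds exactly one row per day, a daily sum simply returns the original values. Resampling is more informative when the data is finer than the target period. Here is a small sketch, assuming hypothetical hourly readings (the names idx and hourly are illustrative and not part of the dataset above):

```python
import pandas as pd

# Hypothetical hourly readings for two full days: values 0, 1, ..., 47
idx = pd.date_range(start='2022-01-01', periods=48, freq='h')
hourly = pd.DataFrame({'data': range(48)}, index=idx)

# Each daily bin collects the 24 hourly rows that fall on that calendar day
daily_sum = hourly.resample('D').sum()
print(daily_sum)
```

The result has one row per calendar day, each holding the total of that day's 24 readings.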

Average Value by Time Period

Finding the average (mean) follows a similar pattern, utilizing the .mean() method after resampling:

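# Daily average
daily_avg = df.resample('D').mean()
# Monthly average
monthly_avg = df.resample('M').mean()
# Yearly average
yearly_avg = df.resample('Y').mean()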
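
An alternative that avoids frequency strings altogether is to group on attributes of the DatetimeIndex, such as .year and .month. This is a different technique from resample(): the result is keyed by plain integers rather than timestamps. A minimal sketch, assuming a hypothetical two-year series named two_years:

```python
import numpy as np
import pandas as pd

# Hypothetical two-year daily series: 1.0 throughout 2021, 3.0 throughout 2022
idx = pd.date_range(start='2021-01-01', end='2022-12-31', freq='D')
two_years = pd.DataFrame({'data': np.where(idx.year == 2021, 1.0, 3.0)}, index=idx)

# Mean per calendar year, keyed by the integer year
by_year = two_years.groupby(two_years.index.year)['data'].mean()

# Mean per (year, month) pair, keyed by a two-level index
by_year_month = two_years.groupby(
    [two_years.index.year, two_years.index.month]
)['data'].mean()

print(by_year)
print(by_year_month.head())
```

Grouping on .month alone would instead pool every January together across years, which is useful for seasonal comparisons but answers a different question.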

Minimum and Maximum Values by Time Period

To discover the minimum and maximum values within each period, use the .min() and .max() methods:

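# Daily minimum
daily_min = df.resample('D').min()
# Monthly minimum
monthly_min = df.resample('M').min()
# Yearly minimum
yearly_min = df.resample('Y').min()

# Daily maximum
daily_max = df.resample('D').max()
# Monthly maximum
monthly_max = df.resample('M').max()
# Yearly maximum
yearly_max = df.resample('Y').max()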
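
When several statistics are needed for the same period, they can be computed in one pass with .agg() instead of separate calls. The sketch below assumes a hypothetical series named series and uses the 'MS' (month start) alias, which labels each monthly bin by its first day:

```python
import pandas as pd

# Hypothetical daily series for January and February 2022: values 1, 2, ..., 59
idx = pd.date_range(start='2022-01-01', end='2022-02-28', freq='D')
series = pd.Series(range(1, 60), index=idx, name='data')

# One row per month, one column per statistic
summary = series.resample('MS').agg(['sum', 'mean', 'min', 'max'])
print(summary)
```

The result is a DataFrame with the columns sum, mean, min, and max, which is convenient for side-by-side comparison or export.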

Visualizing Time Series Data

Visualizing your time series data can reveal patterns that summary statistics alone do not show. Using Pandas’ integration with Matplotlib, plot the monthly averages:

import matplotlib.pyplot as plt
# Plot the mean of each month
df['data'].resample('M').mean().plot()
plt.title('Monthly Average Data')
plt.ylabel('Avg Data')
plt.show()

Advanced Time Series Analysis

Beyond the basics, you might be interested in calculating rolling averages, performing seasonal decompositions, or predictive modeling with time series data. Pandas, in combination with the statsmodels library, can facilitate these more complex tasks:

pip install statsmodels

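A rolling average needs only Pandas itself. Here’s how to calculate a simple 7-day rolling average (the first six rows are NaN because a full seven-row window is not yet available):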

df['7-day rolling avg'] = df['data'].rolling(window=7).mean()

You can then visualize this alongside our initial data:

df[['data', '7-day rolling avg']].plot()
plt.title('7-Day Rolling Average')
plt.show()
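
If the leading NaN values are unwanted, rolling() accepts a min_periods argument that lets the window produce a result from fewer rows. The following sketch, assuming a hypothetical ten-day series named s, compares the strict and the relaxed behavior:

```python
import pandas as pd

# Hypothetical ten-day series with values 1, 2, ..., 10
idx = pd.date_range(start='2022-01-01', periods=10, freq='D')
s = pd.Series(range(1, 11), index=idx, dtype='float64')

# Fixed window of 7 rows: no result until 7 rows are available
strict = s.rolling(window=7).mean()

# min_periods=1 averages whatever rows are available so far
partial = s.rolling(window=7, min_periods=1).mean()

print(strict.isna().sum())   # number of leading rows without a result
print(partial.head(3))
```

Whether partial windows are acceptable depends on the analysis: early values computed from one or two rows are noisier than those from a full window.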

Pandas offers many more methods for manipulating and analyzing datetime data than the ones covered here. This tutorial walked through the basics: building a DatetimeIndex, resampling by day, month, and year to compute sums, averages, minimums, and maximums, plotting the results, and smoothing with a rolling average.
