Sling Academy
Home/Pandas/Pandas: How to create a pivot table with count/sum/average of values

Pandas: How to create a pivot table with count/sum/average of values

Last updated: February 23, 2024

Pandas, a powerful and flexible Python library, is synonymous with data manipulation and analysis. Among its numerous features, the pivot table function stands out for its ability to summarize and analyze large datasets in a simple manner. In this tutorial, we will dive deep into how to create pivot tables in Pandas to count, sum, and average values, complete with step-by-step examples.

Introduction to Pivot Tables in Pandas

Pivot tables are a quintessential tool for data analysis, allowing for the summarization of data. They enable the aggregation of data points based on columns and calculation of statistics such as counts, sums, and averages. Pandas’ pivot table functionality brings this power into the Python environment, enhancing data analysis workflows.

Getting Started

First, ensure you have Pandas installed:

pip install pandas

Then, import Pandas in your Python script:

import pandas as pd

Basic Pivot Table

To create a basic pivot table, we start with a simple DataFrame:

data = {
  'Product': ['Apples', 'Bananas', 'Oranges', 'Apples', 'Oranges'],
  'Sales': [20, 30, 40, 50, 60],
  'Region': ['East', 'West', 'East', 'West', 'East']
}
df = pd.DataFrame(data)

Creating a pivot table to count the number of sales per product:

pd.pivot_table(df, values='Sales', index='Product', aggfunc='count')

Sum and Average

For summarizing sales, you can also calculate the sum and average:

pd.pivot_table(df, values='Sales', index='Product', aggfunc=['sum', 'mean'])

Advanced Pivot Table

Moving towards more complex scenarios, we consider another DataFrame:

data = {
  'Product': ['Apples', 'Bananas', 'Oranges', 'Apples', 'Oranges', 'Bananas'],
  'Sales': [20, 30, 40, 50, 60, 70],
  'Region': ['East', 'West', 'East', 'West', 'East', 'West'],
  'Month': ['January', 'February', 'January', 'February', 'January', 'February']
}
df = pd.DataFrame(data)

We can use the pivot table to analyze sales data across regions and months:

pd.pivot_table(df, values='Sales', index=['Region', 'Month'], columns='Product', aggfunc={'Sales': ['sum', 'mean']})

Advanced Techniques

Further exploring the capabilities of Pandas’ pivot table, we delve into multi-level indexing, conditional aggregations, and incorporating multiple aggregation functions.

pd.pivot_table(df, values='Sales', index=['Region', 'Month'], aggfunc={'Sales': 'sum', 'Product': lambda x: x.mode()[0]})

Handling Missing Data

In cases where your dataset includes missing values, Pandas pivot tables offer parameters like fill_value and dropna to handle them effectively:

pd.pivot_table(df, values='Sales', index=['Region', 'Month'], columns='Product', fill_value=0, dropna=False)

Visualization

Understanding data through visualization is a powerful technique. Pandas pivot tables can be directly used with libraries like Matplotlib and Seaborn, for graphical representation:

import matplotlib.pyplot as plt
import seaborn as sns

# Assuming `pivot` is a previously created pivot table
sns.heatmap(pivot)
plt.show()

Conclusion

In conclusion, the Pandas’ pivot table function is a versatile and powerful tool in the data analyst’s arsenal, enabling effective summarization and analysis of data through counting, summing, and averaging values. Through this tutorial, you should now have a solid understanding of creating basic to advanced pivot tables, expanding your data analysis capabilities.

Next Article: Pandas DataFrame: How to calculate lag/lead of a column

Previous Article: Pandas DataFrame: Adding a percentage column based on other columns

Series: DateFrames in Pandas

Pandas

You May Also Like

  • How to Use Pandas Profiling for Data Analysis (4 examples)
  • How to Handle Large Datasets with Pandas and Dask (4 examples)
  • Pandas – Using DataFrame.pivot() method (3 examples)
  • Pandas: How to ‘FULL JOIN’ 2 DataFrames (3 examples)
  • Pandas: Select columns whose names start/end with a specific string (4 examples)
  • 3 ways to turn off future warnings in Pandas
  • How to Integrate Pandas with Apache Spark
  • How to Use Pandas for Web Scraping and Saving Data (2 examples)
  • How to Clean and Preprocess Text Data with Pandas (3 examples)
  • Pandas – Using Series.replace() method (3 examples)
  • Pandas json_normalize() function: Explained with examples
  • Pandas: Reading CSV and Excel files from AWS S3 (4 examples)
  • Using pandas.Series.rank() method (4 examples)
  • Pandas: Dropping columns whose names contain a specific string (4 examples)
  • Pandas: How to print a DataFrame without index (3 ways)
  • Fixing Pandas NameError: name ‘df’ is not defined
  • Pandas – Using DataFrame idxmax() and idxmin() methods (4 examples)
  • Pandas FutureWarning: ‘M’ is deprecated and will be removed in a future version, please use ‘ME’ instead
  • Pandas: Checking equality of 2 DataFrames (element-wise)