Pandas – Using DataFrame.pivot() method (3 examples)

Last updated: March 01, 2024

Introduction

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language. One of the essential functionalities it provides is the ability to reshape dataframes. The pivot() method, specifically, reshapes a dataframe without aggregating any values. This tutorial will guide you through the use of the DataFrame.pivot() method with three progressive examples, the last of which turns to pivot_table() for cases that need aggregation.

What is the pivot() Method?

The pivot() method in Pandas allows you to reshape your dataframe by reorganizing your data, turning unique values from one column into multiple columns in the output, and relocating corresponding values from other columns into the new structure. It’s particularly useful for transforming data from long to wide format.

Syntax:

DataFrame.pivot(index=None, columns=None, values=None)

Here:

  • index: string or object, optional. Column name to use to make new frame’s index. If None, uses existing index.
  • columns: string or object. Required. Column name to use to make new frame’s columns.
  • values: string, object or a list of the previous, optional. Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
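The effect of leaving out values or index can be seen in a small sketch. The dataframe below, including its Price column, is made up for illustration and is not part of the examples that follow.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the parameters
df = pd.DataFrame({
    'Day': ['Mon', 'Mon', 'Tue', 'Tue'],
    'Fruit': ['Apple', 'Banana', 'Apple', 'Banana'],
    'Quantity': [5, 2, 3, 7],
    'Price': [1.0, 0.5, 1.1, 0.6],
})

# values omitted: all remaining columns (Quantity and Price) are pivoted,
# so the result has two column levels
wide = df.pivot(index='Day', columns='Fruit')
print(wide.columns.nlevels)
print(wide['Quantity'])  # select one block by its top-level name

# index omitted: the existing index (here the default 0..3) labels the rows
by_row = df.pivot(columns='Fruit', values='Quantity')
print(by_row)
```

Selecting a top-level name such as wide['Quantity'] returns an ordinary single-level frame, which is often easier to work with than the full hierarchical result.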

Example 1: Basic Usage of pivot()

Let’s start with a simple example to understand the basics of pivoting.

import pandas as pd

# Create a simple dataframe
data = {'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
        'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'],
        'Quantity': [5, 3, 6, 2, 7]}
df = pd.DataFrame(data)

# Applying the pivot method
df_pivoted = df.pivot(index='Day', columns='Fruit', values='Quantity')

print(df_pivoted)

This will output:

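Fruit  Apple  Banana
Day                 
Fri      7.0     NaN
Mon      5.0     NaN
Thu      NaN     2.0
Tue      NaN     3.0
Wed      6.0     NaN

Here, we pivoted the dataframe so that the days of the week became the index and the fruit types became columns, with quantities as values. NaN marks combinations that are not present in the original data; because NaN is a floating-point value, the quantities are displayed as floats. Note also that pivot() returns the row labels in sorted order, so the days appear alphabetically rather than in their original order.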
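If the gaps or the row order of the pivoted frame do not suit a report, one possible follow-up is to reindex and fill the result. The sketch below assumes that a missing combination can be treated as a quantity of zero, which is a choice about the data rather than something pivot() decides for you.

```python
import pandas as pd

df = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                   'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'],
                   'Quantity': [5, 3, 6, 2, 7]})

pivoted = df.pivot(index='Day', columns='Fruit', values='Quantity')

# Put the rows back in the order in which the days first appear in df
ordered = pivoted.reindex(df['Day'].unique())

# Assumption: an absent combination means zero; then return to integers
filled = ordered.fillna(0).astype(int)
print(filled)
```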

Example 2: Handling Multiple Columns

In our second example, the dataframe has more columns than the pivot needs, and every day appears once for each fruit. Because each Day/Fruit combination occurs exactly once, every cell of the result is filled.

import pandas as pd

# Creating a more complex dataframe
data = {'Day': ['Mon', 'Tue', 'Wed', 'Mon', 'Tue', 'Wed'],
        'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
        'Person': ['Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Alice'],
        'Quantity': [5, 3, 6, 2, 7, 4]}
df = pd.DataFrame(data)

# Apply the pivot method
df_pivoted = df.pivot(index='Day', columns='Fruit', values='Quantity')
print(df_pivoted)

Output:

Fruit  Apple  Banana
Day                 
Mon        5       2
Tue        3       7
Wed        6       4
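The Person column plays no part in the call above. As a sketch of how it could be carried along, a list can be passed to values, which gives one block of columns per value column. Since Quantity and Person hold different types, the resulting blocks may come back with object dtype.

```python
import pandas as pd

data = {'Day': ['Mon', 'Tue', 'Wed', 'Mon', 'Tue', 'Wed'],
        'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
        'Person': ['Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Alice'],
        'Quantity': [5, 3, 6, 2, 7, 4]}
df = pd.DataFrame(data)

# A list of value columns produces hierarchical columns:
# top level = value column, second level = Fruit
wide = df.pivot(index='Day', columns='Fruit', values=['Quantity', 'Person'])

print(wide['Quantity'])  # quantities per day and fruit
print(wide['Person'])    # who is recorded for each day and fruit
```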

Note that if the same index/columns combination occurs more than once in the data, pivot() raises a ValueError, because it has no way to choose between the competing values.
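A minimal sketch of that failure, using made-up data in which the Mon/Apple combination appears twice. Checking with duplicated() before pivoting is one way to spot the problem early.

```python
import pandas as pd

# Hypothetical data: the Mon/Apple pair occurs twice
df = pd.DataFrame({'Day': ['Mon', 'Mon', 'Tue'],
                   'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Quantity': [5, 1, 3]})

# Detect repeated (index, columns) pairs before attempting the pivot
has_dupes = df.duplicated(subset=['Day', 'Fruit']).any()
print(has_dupes)

message = None
try:
    df.pivot(index='Day', columns='Fruit', values='Quantity')
except ValueError as exc:
    # pivot() refuses to reshape when a cell would receive two values
    message = str(exc)
    print(message)
```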

Example 3: Advanced Usage with Aggregation

The pivot() method itself does not support aggregation: if any index/columns combination occurs more than once, it raises an error rather than combining the values. When you need to aggregate data during pivoting, you should use the pivot_table() method, which is designed to handle duplicate entries by applying an aggregation function.

Let’s examine an example demonstrating how to use pivot_table() for pivoting with aggregation. Imagine you have sales data for different products on multiple dates and you want to see the total sales for each product on each date.

import pandas as pd

# Sample data
data = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-01'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 200, 150, 250, 75]
}

df = pd.DataFrame(data)

# Use pivot_table to aggregate data
pivot_df = df.pivot_table(index='Date', columns='Product', values='Sales', aggfunc='sum')

print(pivot_df)

Output:

Product       A    B
Date                
2021-01-01  175  200
2021-01-02  150  250

Explanations:

  • index='Date': This specifies that the dates should be the rows of the resulting pivot table.
  • columns='Product': This specifies that the different products should be the columns of the resulting pivot table.
  • values='Sales': This specifies that the values we’re interested in pivoting and aggregating are the sales figures.
  • aggfunc='sum': This is where the aggregation comes into play. Because there can be multiple sales entries for the same product on the same date, we specify that we want to sum these sales to get total sales per product per date.
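To make the role of aggfunc more concrete, the sketch below rebuilds the same totals with groupby() followed by unstack(), and then swaps in a different aggregation. This is an illustration of the idea, not a description of how pivot_table() is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-01'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 200, 150, 250, 75],
})

via_pivot_table = df.pivot_table(index='Date', columns='Product',
                                 values='Sales', aggfunc='sum')

# Group on both keys, sum, then move Product from the index into the columns
via_groupby = df.groupby(['Date', 'Product'])['Sales'].sum().unstack('Product')

# Compare the two results cell by cell
print(via_groupby.eq(via_pivot_table).all().all())

# A different aggregation: average sale per product per date
mean_sales = df.pivot_table(index='Date', columns='Product',
                            values='Sales', aggfunc='mean')
print(mean_sales)
```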

Conclusion

Through these examples, we’ve seen how to use the pivot() method in Pandas to reshape data from long to wide format, and how to switch to pivot_table() when an index/columns combination holds multiple values that must be aggregated. Mastering pivot() and pivot_table() can significantly increase your data manipulation capabilities within Pandas.
