Introduction
Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language. One of the essential functionalities it provides is the ability to reshape dataframes. The pivot()
method, specifically, is a versatile tool for pivoting without aggregation. This tutorial will guide you through the use of DataFrame.pivot()
method with three progressive examples.
What is the pivot() Method?
The pivot()
method in Pandas allows you to reshape your dataframe by reorganizing your data, turning unique values from one column into multiple columns in the output, and relocating corresponding values from other columns into the new structure. It’s particularly useful for transforming data from long to wide format.
Syntax:
DataFrame.pivot(index=None, columns=None, values=None)
Here:
- index: string or object, optional. Column name to use to make new frame’s index. If
None
, uses existing index. - columns: string or object. Column name to use to make new frame’s columns.
- values: string, object or a list of the previous, optional. Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.
Example 1: Basic Usage of pivot()
Let’s start with a simple example to understand the basics of pivoting.
import pandas as pd
# Create a simple dataframe
data = {'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
'Fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple'],
'Quantity': [5, 3, 6, 2, 7]}
df = pd.DataFrame(data)
# Applying the pivot method
df_pivoted = df.pivot(index='Day', columns='Fruit', values='Quantity')
print(df_pivoted)
This will output:
Fruit Apple Banana
Day
Mon 5 NaN
Tue NaN 3
Wed 6 NaN
Thu NaN 2
Fri 7 NaN
Here, we pivoted the dataframe such that the days of the week became the index, and fruit types became columns, showing quantities as values. NaN
indicates missing values for combinations not present in the original data.
Example 2: Handling Multiple Columns
In our second example, we’ll see how to handle situations where you have multiple values that you want to spread across different columns. This requires a slightly more complex setup.
import pandas as pd
# Creating a more complex dataframe
data = {'Day': ['Mon', 'Tue', 'Wed', 'Mon', 'Tue', 'Wed'],
'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
'Person': ['Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Alice'],
'Quantity': [5, 3, 6, 2, 7, 4]}
df = pd.DataFrame(data)
# Apply the pivot method
df_pivoted = df.pivot(index='Day', columns='Fruit', values='Quantity')
print(df_pivoted)
Output:
Fruit Apple Banana
Day
Mon 5 2
Tue 3 7
Wed 6 4
Note that in cases where multiple entries for a single combination of index/columns exist, an ValueError
will be raised due to multiple values.
Example 3: Advanced Usage with Aggregation
For advanced usage involving aggregation with the pivot()
method in Pandas, you would typically pivot your data first and then apply aggregation functions. However, the pivot()
method itself doesn’t directly support aggregation. When you need to aggregate data during pivoting, you should use the pivot_table()
method, which is designed to handle duplicate entries by applying an aggregation function.
Let’s examine an example demonstrating how to use pivot_table()
for pivoting with aggregation. Imagine you have sales data for different products on multiple dates and you want to see the total sales for each product on each date.
import pandas as pd
# Sample data
data = {
'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-01'],
'Product': ['A', 'B', 'A', 'B', 'A'],
'Sales': [100, 200, 150, 250, 75]
}
df = pd.DataFrame(data)
# Use pivot_table to aggregate data
pivot_df = df.pivot_table(index='Date', columns='Product', values='Sales', aggfunc='sum')
print(pivot_df)
Output:
Product A B
Date
2021-01-01 175 200
2021-01-02 150 250
Explantions:
index='Date'
: This specifies that the dates should be the rows of the resulting pivot table.columns='Product'
: This specifies that the different products should be the columns of the resulting pivot table.values='Sales'
: This specifies that the values we’re interested in pivoting and aggregating are the sales figures.aggfunc='sum'
: This is where the aggregation comes into play. Because there can be multiple sales entries for the same product on the same date, we specify that we want to sum these sales to get total sales per product per date.
Conclusion
Through these examples, we’ve seen how to use the pivot()
method in Pandas to reshape our data, from simple reorganization tasks to handling more complex situations involving multiple values per index/columns combination. Mastering pivot()
and pivot_table()
can significantly increase your data manipulation capabilities within Pandas.