
Pandas: How to Drop MultiIndex in Pivot Table

Last updated: February 21, 2024

Introduction

Working with pivot tables is a common operation in data analysis and manipulation. In pandas, a powerful Python data analysis library, pivot tables often use a MultiIndex for rows, columns, or both, which adds a hierarchical index structure to the data. While this can be incredibly useful for organizing and summarizing data, there are times when you might want to drop the MultiIndex to simplify the data structure. This tutorial will guide you through several methods to achieve this, from basic to advanced techniques.

Understanding MultiIndex in Pivot Tables

Before diving into how to drop a MultiIndex, let’s understand what a MultiIndex is. In pandas, a MultiIndex (also known as a hierarchical index) stores several levels of labels on a single axis, which is particularly useful when working with higher-dimensional data. It comes up often with pivot tables, where data is grouped and summarized across multiple dimensions.

Example setup: Let’s start by creating a sample DataFrame and a pivot table with a MultiIndex.

import pandas as pd

# Sample DataFrame
data = {'Category': ['A', 'A', 'B', 'B'],
        'Subcategory': ['C1', 'C2', 'C1', 'C2'],
        'Values': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Creating pivot table
pivot_df = df.pivot_table(index=['Category', 'Subcategory'], values='Values')
print(pivot_df)

The output will look something like this, showing the MultiIndex for rows. Because pivot_table() aggregates with the mean by default, the values are shown as floats:

                      Values
Category Subcategory
A        C1             10.0
         C2             20.0
B        C1             30.0
         C2             40.0
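If you are not sure whether a pivot table actually carries a MultiIndex, you can inspect the index before deciding how to drop it. The following is a minimal sketch that rebuilds the sample data from this tutorial; the variable names is_multi, n_levels and level_names are only illustrative.

```python
import pandas as pd

# Same sample data as in the example setup
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Subcategory': ['C1', 'C2', 'C1', 'C2'],
    'Values': [10, 20, 30, 40],
})
pivot_df = df.pivot_table(index=['Category', 'Subcategory'], values='Values')

# A hierarchical index is an instance of pd.MultiIndex with more than one level
is_multi = isinstance(pivot_df.index, pd.MultiIndex)
n_levels = pivot_df.index.nlevels
level_names = list(pivot_df.index.names)

print(is_multi)     # True
print(n_levels)     # 2
print(level_names)  # ['Category', 'Subcategory']
```

The same checks work on pivot_df.columns when the hierarchy sits on the column axis.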

Method 1: Resetting the Index

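The simplest way to drop a MultiIndex on rows is by using the reset_index() method. It moves every index level back into an ordinary column and replaces the index with a default integer index. Assigning the result to a new variable, rather than using inplace=True, leaves pivot_df untouched so that it can still be used in the next examples.

flat_df = pivot_df.reset_index()
print(flat_df)

The output will now display without the hierarchical index:

  Category Subcategory  Values
0        A          C1    10.0
1        A          C2    20.0
2        B          C1    30.0
3        B          C2    40.0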
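reset_index() also accepts a level argument, for moving only some levels back into columns, and a drop argument, for discarding the index labels instead of keeping them. Here is a short sketch of both options on the same sample data; the names partial and values_only are just placeholders for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Subcategory': ['C1', 'C2', 'C1', 'C2'],
    'Values': [10, 20, 30, 40],
})
pivot_df = df.pivot_table(index=['Category', 'Subcategory'], values='Values')

# Move only 'Subcategory' into a column; 'Category' stays as the index
partial = pivot_df.reset_index(level='Subcategory')
print(partial)

# Discard both index levels entirely and keep only the data columns
values_only = pivot_df.reset_index(drop=True)
print(values_only)
```

Use drop=True with care: the Category and Subcategory labels are lost, not converted to columns.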

Method 2: Using Droplevel

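If you only want to remove specific levels of the MultiIndex, the droplevel() method can be particularly useful. It needs an index that still has multiple levels, so the example starts from a fresh pivot table. To drop only the ‘Subcategory’ level, you can do the following:

# Start from a pivot table whose rows still carry the MultiIndex
pivot_df = df.pivot_table(index=['Category', 'Subcategory'], values='Values')
pivot_df.index = pivot_df.index.droplevel('Subcategory')
print(pivot_df)

This will result in the following output:

          Values
Category
A           10.0
A           20.0
B           30.0
B           40.0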
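One thing to keep in mind is that dropping a level does not aggregate anything, so the remaining labels can repeat, as A and B do in the output above. The sketch below uses DataFrame.droplevel(), which returns a new DataFrame, and then checks the index for duplicates; by_category is a name chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Subcategory': ['C1', 'C2', 'C1', 'C2'],
    'Values': [10, 20, 30, 40],
})
pivot_df = df.pivot_table(index=['Category', 'Subcategory'], values='Values')

# DataFrame.droplevel returns a copy with the requested index level removed
by_category = pivot_df.droplevel('Subcategory')

# The remaining 'Category' labels repeat, so the index is no longer unique
print(by_category.index.is_unique)  # False

# Selecting a repeated label returns every matching row
print(by_category.loc['A'])
```

If you need one row per category, aggregate first (for example by pivoting on Category alone) rather than dropping the level afterwards.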

Method 3: Flatten Columns after Performing Multiple Aggregations

When you perform multiple aggregations on your pivot table, pandas automatically creates a MultiIndex for columns. Simplifying this setup requires a different strategy. One approach is to flatten the hierarchy by joining the levels of the MultiIndex columns into a single level.

Here is how you can achieve it:

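# Pass the function names as strings; recent pandas versions warn about the builtins sum and max
pivot_df = df.pivot_table(index='Category', columns='Subcategory', values='Values', aggfunc=['sum', 'max'])

# Flatten columns: each label is a tuple such as ('sum', 'C1')
pivot_df.columns = [' '.join(col).strip() for col in pivot_df.columns]
print(pivot_df)

This turns the MultiIndex columns into a single level of plain string labels: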

          sum C1  sum C2  max C1  max C2
Category                                
A            10      20      10      20
B            30      40      30      40
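The join approach assumes every level label is a string. As a slightly more defensive variant, you could convert the columns to tuples with MultiIndex.to_flat_index() and cast each part to str before joining. This is only a sketch; the underscore separator and the name wide are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Subcategory': ['C1', 'C2', 'C1', 'C2'],
    'Values': [10, 20, 30, 40],
})
wide = df.pivot_table(index='Category', columns='Subcategory',
                      values='Values', aggfunc=['sum', 'max'])

# to_flat_index() turns the MultiIndex into a flat Index of tuples
flat_tuples = wide.columns.to_flat_index()

# Cast each part to str so numeric or other non-string labels do not break the join
wide.columns = ['_'.join(map(str, col)) for col in flat_tuples]

print(list(wide.columns))  # ['sum_C1', 'sum_C2', 'max_C1', 'max_C2']
```

An underscore keeps the new names free of spaces, which makes the columns easier to reference later.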

Advanced Techniques

While the methods described above cover most scenarios you’ll encounter when needing to drop the MultiIndex, pandas also offers more advanced techniques for handling complex data structures. These include methods like melt() and advanced indexing options, which can provide more flexibility in restructuring your pivot tables.
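As one illustration of the melt() route, here is a small sketch that takes a wide pivot table and reshapes it into a flat, long-format DataFrame. It reuses the sample data from this tutorial; the names wide and long_df are only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Subcategory': ['C1', 'C2', 'C1', 'C2'],
    'Values': [10, 20, 30, 40],
})

# Wide pivot table: one row per Category, one column per Subcategory
wide = df.pivot_table(index='Category', columns='Subcategory',
                      values='Values', aggfunc='sum')

# Bring Category back as a column, then unpivot the Subcategory columns into rows
long_df = wide.reset_index().melt(
    id_vars='Category',
    var_name='Subcategory',
    value_name='Values',
)
print(long_df)
```

The result has one row per Category and Subcategory pair and a plain integer index, with no hierarchy on either axis.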

Conclusion

Dropping a MultiIndex in a pivot table can significantly simplify data analysis and manipulation efforts. Whether you need a flat DataFrame for visual representation, or you’re preparing your data for machine learning algorithms, understanding how to effectively manage MultiIndexes is a vital skill. The methods outlined in this tutorial provide a strong foundation for anyone looking to enhance their pandas expertise.
