
Pandas: Converting a nested dictionary to a multi-index DataFrame

Last updated: February 21, 2024

Introduction

Pandas is a staple library for data manipulation and analysis in Python. A common task is converting nested dictionaries into a multi-index DataFrame, which lays hierarchical data out in a tabular form that is easier to analyze and visualize. This article walks through that conversion with incremental code examples.

Nested dictionaries are a convenient way to store hierarchical data. Converting this structure into a multi-index DataFrame can improve readability and make data manipulation easier. Let’s start with the basics and progressively tackle more complex scenarios.

Basic Conversion

Consider the simplest form of a nested dictionary:

{
    'Fruit': {
        'Apple': 10,
        'Banana': 20
    },
    'Vegetable': {
        'Carrot': 15,
        'Potato': 25
    }
}

To convert this into a DataFrame:

import pandas as pd

data = {
    'Fruit': {
        'Apple': 10,
        'Banana': 20
    },
    'Vegetable': {
        'Carrot': 15,
        'Potato': 25
    }
}

df = pd.DataFrame(data)

print(df)

Output:

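        Fruit  Vegetable
Apple    10.0        NaN
Banana   20.0        NaN
Carrot    NaN       15.0
Potato    NaN       25.0

Here, the outer keys become column labels and the inner keys become row labels. NaN marks every row/column combination that has no entry in the dictionary (for example, there is no 'Apple' under 'Vegetable'), and because NaN is a float, the integer values are displayed as floats. Note that this result still has an ordinary single-level index.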
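If a row-wise MultiIndex is the goal even for two levels, one option is to flatten the dictionary into tuple keys before handing it to Pandas. The following is a minimal sketch; the level names `Category` and `Item` are illustrative choices, not something the data itself defines.

```python
import pandas as pd

data = {
    'Fruit': {'Apple': 10, 'Banana': 20},
    'Vegetable': {'Carrot': 15, 'Potato': 25},
}

# Map each (outer key, inner key) pair to its value.
flat = {
    (category, item): quantity
    for category, items in data.items()
    for item, quantity in items.items()
}

# Pandas builds a MultiIndex from tuple keys.
series = pd.Series(flat)
series.index.names = ['Category', 'Item']  # illustrative names
print(series)
```

Because no combination is left empty here, the values keep their integer dtype instead of being upcast to float to hold NaN.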

Going Deeper: Multi-Level Index

For a nested dictionary with more depth, the DataFrame constructor alone is not enough, because it expands only the two outermost levels of keys. Consider the following:

{
    'Food': {
        'Fruit': {
            'Apple': 10,
            'Banana': 20
        },
        'Vegetable': {
            'Carrot': 15,
            'Potato': 25
        }
    },
    'Beverage': {
        'Cold': {
            'Soda': 5,
            'Water': 2
        },
        'Hot': {
            'Coffee': 3,
            'Tea': 4
        }
    }
}

Converting this structure:

import pandas as pd

data = {
    'Food': {
        'Fruit': {
            'Apple': 10,
            'Banana': 20
        },
        'Vegetable': {
            'Carrot': 15,
            'Potato': 25
        }
    },
    'Beverage': {
        'Cold': {
            'Soda': 5,
            'Water': 2
        },
        'Hot': {
            'Coffee': 3,
            'Tea': 4
        }
    }
}

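# pd.DataFrame(data) would leave the innermost dictionaries as cell values,
# so flatten every path into a (category, type, item) tuple first.
flat = {
    (category, kind, item): value
    for category, kinds in data.items()
    for kind, items in kinds.items()
    for item, value in items.items()
}

# Tuple keys become the levels of a MultiIndex.
# The level names and the 'Value' column name are illustrative.
df = pd.Series(flat).rename_axis(['Category', 'Type', 'Item']).to_frame('Value')
print(df)

This produces a DataFrame with a three-level row MultiIndex (Category, Type, Item) and a single Value column. If a hierarchy in the columns is wanted as well, a level can be moved there with df.unstack('Category').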
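When the nesting depth is not fixed in advance, a small recursive generator can turn each leaf into a key-path tuple, which Pandas then reads as a MultiIndex. The sketch below is one way to do that; `flatten` is a hypothetical helper written for this example (not a Pandas function), the level names are illustrative, and the code assumes every leaf sits at the same depth.

```python
import pandas as pd


def flatten(nested, prefix=()):
    """Yield (key_path_tuple, value) pairs for every leaf of a nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


data = {
    'Food': {
        'Fruit': {'Apple': 10, 'Banana': 20},
        'Vegetable': {'Carrot': 15, 'Potato': 25},
    },
    'Beverage': {
        'Cold': {'Soda': 5, 'Water': 2},
        'Hot': {'Coffee': 3, 'Tea': 4},
    },
}

series = pd.Series(dict(flatten(data)))
series.index.names = ['Category', 'Type', 'Item']  # illustrative names

# Sorting the index keeps label-based selection efficient.
df = series.to_frame('Value').sort_index()

# Select everything under one top-level key.
print(df.loc['Food'])
```

Selecting with `df.loc['Food']` drops the matched level, so the result is indexed by the remaining `Type` and `Item` levels.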

Advanced Manipulations

For more complex structures or further customization of the DataFrame, Pandas provides a variety of methods. Let’s say we want to add another level of index based on year:

import pandas as pd

data = {
    2020: {
        'Food': {
            'Fruit': {
                'Apple': 10,
                'Banana': 20
            },
            'Vegetable': {
                'Carrot': 15,
                'Potato': 25
            }
        },
        'Beverage': {
            'Cold': {
                'Soda': 5,
                'Water': 2
            },
            'Hot': {
                'Coffee': 3,
                'Tea': 4
            }
        }
    },
    2021: {
        'Food': {
            'Fruit': {
                'Apple': 12,
                'Banana': 22
            },
            'Vegetable': {
                'Carrot': 18,
                'Potato': 30
            }
        },
        'Beverage': {
            'Cold': {
                'Soda': 6,
                'Water': 3
            },
            'Hot': {
                'Coffee': 4,
                'Tea': 5
            }
        }
    }
}

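# Convert to a multi-index DataFrame: flatten each year's dictionary into
# (category, type, item) tuples, then let pd.concat add the year as the
# outermost index level.
multi_df = pd.concat({
    year: pd.Series({
        (category, kind, item): value
        for category, kinds in categories.items()
        for kind, items in kinds.items()
        for item, value in items.items()
    })
    for year, categories in data.items()
}).to_frame('Value')
multi_df.index.names = ['Year', 'Category', 'Type', 'Item']
print(multi_df)

This yields one row per (Year, Category, Type, Item) combination, so the index mirrors the nesting of the original dictionary and any level can be used directly for selection or aggregation.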
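Once the year is an index level, it can be used for selection and reshaping. The sketch below works on a small, already flattened subset of the data above; the level names, the `Value` column, and the derived `Change` column are illustrative assumptions rather than part of the original dataset.

```python
import pandas as pd

# A small subset of the data, already flattened to tuple keys.
flat = {
    (2020, 'Food', 'Fruit', 'Apple'): 10,
    (2020, 'Food', 'Fruit', 'Banana'): 20,
    (2021, 'Food', 'Fruit', 'Apple'): 12,
    (2021, 'Food', 'Fruit', 'Banana'): 22,
}

frame = pd.Series(flat).to_frame('Value')
frame.index.names = ['Year', 'Category', 'Type', 'Item']

# Cross-section: all rows for a single year (the Year level is dropped).
only_2021 = frame.xs(2021, level='Year')

# Pivot the Year level into columns to compare years side by side.
by_year = frame['Value'].unstack('Year')
by_year['Change'] = by_year[2021] - by_year[2020]
print(by_year)
```

Using `xs` for a single level and `unstack` for a side-by-side view keeps the remaining levels intact, which tends to be simpler than rebuilding the frame for each question.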

Conclusion

Converting nested dictionaries to multi-index DataFrames puts hierarchical data into a format that is easier to manipulate, analyze, and visualize. Once each level of nesting is mapped to an index level, the selection, reshaping, and aggregation tools in Pandas apply to it directly.
