Pandas DataFrame: Adding a percentage column based on other columns

Updated: February 21, 2024 By: Guest Contributor Post a comment

Introduction

Performing data analysis often requires the computation and addition of new columns to the existing DataFrame, especially when dealing with percentages, which provide valuable insights into the relative sizes of parts to the whole. Pandas, a powerful data manipulation library in Python, simplifies these tasks significantly. In this tutorial, we will learn various methods to add a percentage column to a DataFrame, based on the values of other columns, using Pandas.

Prerequisites

Before proceeding, ensure you have Pandas installed. You can install Pandas using pip:

pip install pandas

Also, ensure you have basic familiarity with Python and Pandas. Let’s start with the simplest case and gradually move to more complex examples.

Basic Percentage of a Single Column

First, we create a DataFrame to work with:

import pandas as pd  

df = pd.DataFrame({'A': [1, 2, 3, 4]})

To add a percentage column that represents each value’s percentage of the sum of column ‘A’, we use:

df['Perc_A'] = (df['A'] / df['A'].sum()) * 100  
print(df)

Output:

   A  Perc_A
0  1    10.0
1  2    20.0
2  3    30.0
3  4    40.0

Percentage Change Between Rows

Next, we calculate the percentage change between rows in a column:

df['A_Percent_Change'] = df['A'].pct_change() * 100  
print(df)

Output:

   A  A_Percent_Change
0  1               NaN
1  2        100.000000
2  3         50.000000
3  4         33.333333

Note: The first row in ‘A_Percent_Change’ is NaN since there’s no previous row to compare.

Weighted Percentage of Multiple Columns

If your DataFrame has multiple columns and you want to get a weighted percentage of those columns into a new column, here’s how you could do it:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4]})

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]})
total = df["A"].sum() + df["B"].sum()
df["Weighted_Perc"] = ((df["A"] + df["B"]) / total) * 100
print(df)

Output:

   A  B  Weighted_Perc
0  1  2      12.500000
1  2  3      20.833333
2  3  4      29.166667
3  4  5      37.500000

Conditional Percentages

For more complex scenarios, where the percentage column is conditional upon the values of other columns, the np.where function from NumPy can be very useful. For example, if you want to create a percentage column where only values above a certain threshold in column ‘A’ are considered:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [1, 2, 3, 4]})

df["Perc_Above_Threshold"] = np.where(df["A"] > 2, (df["A"] / df["A"].sum()) * 100, 0)
print(df)

Output:

   A  Perc_Above_Threshold
0  1                   0.0
1  2                   0.0
2  3                  30.0
3  4                  40.0

5. Using Apply Functions to Compute Custom Percentages

When you have complex requirements for computing percentages that cannot be easily expressed with arithmetic operations or conditions, using the apply function can be very powerful. Let’s create a column that applies a custom function to compute the percentage:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]})


def custom_percentage(row):
    return (row["A"] / (row["A"] + row["B"])) * 100


df["Custom_Perc"] = df.apply(custom_percentage, axis=1)
print(df)

This function calculates the percentage of ‘A’ in the total of ‘A’ and ‘B’ for each row and applies it across the DataFrame.

Output:

   A  B  Custom_Perc
0  1  2    33.333333
1  2  3    40.000000
2  3  4    42.857143
3  4  5    44.444444

Conclusion

Adding percentage columns to a DataFrame is a common task in data analysis. As demonstrated, Pandas provides multiple ways to approach this task, from simple percentage calculations to applying custom functions for more complex requirements. Choosing the right approach depends on the specific needs of your data analysis project. Armed with these techniques, you’re now well-equipped to add insightful percentage-based metrics to your datasets.