Pandas DataFrame.median() method (5 examples)

Updated: February 22, 2024 By: Guest Contributor

Introduction

Pandas is a cornerstone of the Python data science ecosystem, known for its powerful data manipulation and analysis capabilities. One of the essential statistical methods it provides is the .median() method of DataFrame objects. This method computes the median along either axis of a DataFrame, giving insight into the central tendency of your data. In this guide, we’ll explore the .median() method through five practical examples, ranging from basic to advanced usage.

What is the Median?

Before diving into the examples, let’s clarify what the median is. The median is the value that separates the higher half from the lower half of a data sample, probability distribution, or a population. In a sorted list of numbers, it is the middle value when the total number of values is odd, and the average of the two middle values if the total number is even.
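As a quick check of this definition, the short sketch below computes the median of an odd-length and an even-length set of values. The variable names are illustrative only, and the values are deliberately unsorted to show that pandas handles the ordering itself.

```python
import pandas as pd

# Odd number of values: the median is the middle value after sorting.
odd_values = pd.Series([7, 1, 5])        # sorted: 1, 5, 7
odd_median = odd_values.median()

# Even number of values: the median is the mean of the two middle values.
even_values = pd.Series([7, 1, 5, 3])    # sorted: 1, 3, 5, 7
even_median = even_values.median()

print(odd_median)   # 5.0
print(even_median)  # 4.0
```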

Basic Example

Let’s start with a straightforward example. Here, we’re using a small DataFrame to calculate the median of each column.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 3, 4, 5, 6],
    'C': [3, 4, 5, 6, 7]
})

print(df.median())

Output:

A    3.0
B    4.0
C    5.0
dtype: float64

This output shows the median value for each column in the DataFrame. It’s a basic but fundamental operation to understand how the method works.

Ignoring NaN Values

It’s common to encounter missing values in real-world data. The .median() method automatically ignores these NaN values.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3, 4, 5],
    'B': [2, 3, np.nan, 5, 6],
    'C': [np.nan, 4, 5, 6, 7]
})

print(df.median())

Output:

A    3.5
B    4.0
C    5.5
dtype: float64

This example illustrates how .median() seamlessly handles missing data, returning the median of the non-missing values.
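To make the default behavior concrete, the sketch below compares the built-in result for one column with a manual version that drops the missing values first. It reuses two of the columns from the example above; the names default_median and manual_median are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, np.nan, 3, 4, 5],
    'B': [2, 3, np.nan, 5, 6],
})

# Default behavior: NaN entries are skipped, column by column.
default_median = df['A'].median()

# Equivalent manual step: drop the missing values, then take the median.
manual_median = df['A'].dropna().median()

print(default_median, manual_median)  # 3.5 3.5
```

Note that each column is handled independently: the NaN in column B does not remove the corresponding row from column A.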

Computing Median Along Different Axes

The .median() method can calculate the median not just for each column but for each row as well. This is useful when you need a summary along the other dimension of your dataset.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 3, 4, 5, 6],
    'C': [3, 4, 5, 6, 7]
})

print(df.median(axis=1))

Output:

0    2.0
1    3.0
2    4.0
3    5.0
4    6.0
dtype: float64

In this example, setting the axis parameter to 1 computes the median across the columns of each row, returning one median value per row instead of one per column.
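One way to keep the two directions straight is to look at the index of the result. The sketch below, using the same DataFrame, also uses the string alias axis='columns', which pandas accepts as an equivalent of axis=1. The names col_medians and row_medians are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 3, 4, 5, 6],
    'C': [3, 4, 5, 6, 7]
})

col_medians = df.median(axis=0)          # default: one value per column
row_medians = df.median(axis='columns')  # same as axis=1: one value per row

# The result is indexed by whatever dimension remains after the reduction.
print(col_medians.index.tolist())  # ['A', 'B', 'C']
print(row_medians.index.tolist())  # [0, 1, 2, 3, 4]
```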

Using the ‘skipna’ Parameter

While the default behavior is to omit NaN values when calculating the median, this might not always be the desired action. Setting skipna=False tells pandas not to skip NaN values, so the median of any column that contains a NaN is itself NaN.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, 3, 4, 5],
    'B': [2, 3, np.nan, 5, 6],
    'C': [np.nan, 4, 5, 6, 7]
})

print(df.median(skipna=False))

Output:

A    NaN
B    NaN
C    NaN
dtype: float64

This outcome demonstrates the effect of including NaN values in the calculation, which, in this case, results in NaN for the median values. It’s important to be aware of this option, especially when dealing with datasets where NaN signifies more than just a missing value.
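Only columns that actually contain a NaN are affected. The sketch below uses two hypothetical columns, complete and has_gap, to show that a fully populated column keeps its ordinary median under skipna=False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'complete': [1, 2, 3, 4, 5],
    'has_gap': [1, np.nan, 3, 4, 5],
})

# With skipna=False, a single NaN makes that column's median NaN.
strict = df.median(skipna=False)

print(strict['complete'])          # 3.0
print(pd.isna(strict['has_gap']))  # True
```

This makes skipna=False a simple way to flag columns with missing data while still getting medians for the complete ones.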

Advanced Usage: Grouped Median

Lastly, let’s examine a more nuanced application of the .median() method by computing the median within grouped data. This advanced example showcases the combination of .groupby() and .median() to extract more specific statistical insights from your dataset.

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Values': [1, 2, 3, 4, 5, 6]
})

df_grouped = df.groupby('Category').median()

print(df_grouped)

Output:

          Values
Category        
A            1.5
B            3.5
C            5.5

In this example, we used .groupby() to segment the data by category before applying .median() to each segment. It illustrates how to derive statistical measures within subgroups of data, adding a layer of depth to the analysis.
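As a small variation on the grouped example, the sketch below selects the Values column before taking the median, which returns a Series rather than a DataFrame, and cross-checks one group by filtering manually. The names grouped_series and manual_b are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Values': [1, 2, 3, 4, 5, 6]
})

# Selecting the column first yields a Series indexed by Category.
grouped_series = df.groupby('Category')['Values'].median()

# Manual cross-check for one group: filter the rows, then take the median.
manual_b = df.loc[df['Category'] == 'B', 'Values'].median()

print(grouped_series['B'], manual_b)  # 3.5 3.5
```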

Conclusion

Throughout this tutorial, we’ve explored the .median() method in Pandas DataFrames through a variety of examples. This method is invaluable for summarizing central tendencies, especially in large datasets where a direct inspection is impractical. From basic to advanced usages, understanding how to apply the .median() method effectively can provide powerful insights into your data’s distribution and dynamics.