# Utilizing DataFrame.var() method in Pandas (5 examples)

## Introduction

In data analysis and data science, Pandas is a cornerstone Python library that offers versatile data structures and operations for manipulating numerical data and time series. The `var()` method computes the variance of a DataFrame's numerical columns, a fundamental statistical operation. This article walks through the usage of `var()` in Pandas with five progressive examples.

## Syntax & Parameters of `var()`

Before diving into the examples, ensure you have Pandas installed and imported in your Python environment:

```python
import pandas as pd
```

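The `var()` method calculates the variance of the values in a DataFrame or a Series. Variance measures how widely values spread around their mean: it is the average of the squared deviations from the mean. By default, `var()` skips NaN values and returns the sample variance. The signature in pandas 2.x is:

```python
DataFrame.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)
```

Older 1.x releases also accepted a `level` argument and used `numeric_only=None` as the default; both changed in pandas 2.0.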

Key parameters include:

• axis: Whether to calculate variance down each column (0 or 'index', the default) or across each row (1 or 'columns').
• skipna: Whether to exclude NaN values from the calculation (default True).
• ddof: Delta Degrees of Freedom. The divisor used in the calculation is N - ddof, where N is the number of non-missing elements. The default of 1 gives the sample variance; 0 gives the population variance.
• numeric_only: Whether to include only numeric (float, int, boolean) columns in the calculation.
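
To make the role of `ddof` concrete, here is a small sketch that compares the default sample variance with the population variance and checks both against a manual calculation. The values and variable names are illustrative only.

```python
import pandas as pd

# Illustrative values, not tied to any particular dataset
ages = pd.Series([23, 34, 29, 24])

n = len(ages)
# Sum of squared deviations from the mean
squared_dev = ((ages - ages.mean()) ** 2).sum()

sample_var = ages.var()            # default ddof=1, divisor is N - 1
population_var = ages.var(ddof=0)  # divisor is N

print(sample_var, squared_dev / (n - 1))
print(population_var, squared_dev / n)
```

Each printed pair should match, which shows that `ddof` only changes the divisor.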

## Example 1: Basic Variance Calculation

Create a simple DataFrame and compute variance for its numerical columns:

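```python
data = {'Name': ['Anna', 'Bob', 'Charlie', 'Diana'],
        'Age': [23, 34, 29, 24],
        'Height': [165, 175, 170, 169]}
df = pd.DataFrame(data)
# numeric_only=True leaves out the non-numeric 'Name' column (needed in pandas 2.x)
print(df.var(numeric_only=True))
```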

Output:

```
Age       25.666667
Height    16.916667
dtype: float64
```

This output presents the variance of the 'Age' and 'Height' columns, illustrating basic usage.
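
Since `var()` is also available on a Series, the same calculation can be done for a single column or an explicit subset of columns. The following is a minimal sketch that reuses the example data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Bob', 'Charlie', 'Diana'],
    'Age': [23, 34, 29, 24],
    'Height': [165, 175, 170, 169],
})

# One column: Series.var() returns a single float
age_var = df['Age'].var()

# An explicit subset of columns: DataFrame.var() returns a Series
subset_var = df[['Age', 'Height']].var()

print(age_var)
print(subset_var)
```

Selecting the columns up front also sidesteps any question of how non-numeric columns are treated.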

## Example 2: Skipping NaN Values

Consider a DataFrame with missing values. Here's how `var()` handles them when `skipna` is True (the default setting):

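```python
import numpy as np

# np.nan and None mark missing values (Python has no `null` literal)
data = {'Name': ['Anna', 'Bob', 'Charlie', None],
        'Age': [23, np.nan, 29, 24],
        'Height': [165, 175, np.nan, 169]}
df = pd.DataFrame(data)
# numeric_only=True leaves out the non-numeric 'Name' column
print(df.var(numeric_only=True))
```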

Output:

```
Age       10.333333
Height    25.333333
dtype: float64
```

Despite the missing values, `var()` still computes the variance: with `skipna=True`, NaN entries are dropped and each column's variance is computed from its remaining values.
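
For contrast, here is a minimal sketch of what happens with `skipna=False`. The data mirrors the numeric columns of the example, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [23, np.nan, 29, 24],
    'Height': [165, 175, np.nan, 169],
})

skipped = df.var()                 # skipna=True: NaN entries are ignored
propagated = df.var(skipna=False)  # a NaN in a column makes its variance NaN

print(skipped)
print(propagated)
```

Passing `skipna=False` can be handy as a sanity check when missing values are not expected in the data.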

## Example 3: Variance by Rows

To calculate variance across rows, set the `axis` parameter to 1. This could be useful in analyzing variance across observations for each individual in the dataset:

```python
data = {'Test_1': [75, 88, 92], 'Test_2': [88, 92, 75], 'Final': [82, 93, 88]}
df = pd.DataFrame(data)
print(df.var(axis=1))
```

Output:

```
0    42.333333
1     7.000000
2    79.000000
dtype: float64
```
``````

This output reflects variance of scores within the individual rows, providing insight into the consistency of test scores for each person.
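
One possible way to keep the row-wise result next to the scores is to attach it as a new column. In this sketch the column name `Score_var` is an illustrative choice, and `axis='columns'` is used as the string alias for `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({'Test_1': [75, 88, 92],
                   'Test_2': [88, 92, 75],
                   'Final': [82, 93, 88]})

# axis='columns' is equivalent to axis=1: one variance per row
row_var = df.var(axis='columns')

# assign() returns a new DataFrame and leaves df unchanged
result = df.assign(Score_var=row_var)
print(result)
```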

## Example 4: Handling Non-Numeric Columns

Variance is only defined for numeric data. In pandas 2.x, calling `var()` on a DataFrame that contains string columns raises an error unless you pass `numeric_only=True` or select the numeric columns first; older 1.x releases dropped such columns automatically. Here is how `var()` operates on a mixed-type DataFrame:

```python
data = {'Name': ['Anna', 'Bob', 'Charlie'], 'Age': [29, 34, 24], 'Score': [82.5, 88.9, 92.1]}
df = pd.DataFrame(data)
# numeric_only=True restricts the calculation to numeric columns
print(df.var(numeric_only=True))
```

Output:

```
Age      25.000000
Score    23.893333
dtype: float64
```

The result covers only the numeric columns; the 'Name' column is left out.
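
If you prefer to be explicit about which columns take part, one option is to select the numeric columns yourself before calling `var()`. The sketch below assumes the same illustrative data; `select_dtypes(include='number')` and `numeric_only=True` should give the same result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Bob', 'Charlie'],
    'Age': [29, 34, 24],
    'Score': [82.5, 88.9, 92.1],
})

# Keep only numeric columns, then compute the variance
numeric_var = df.select_dtypes(include='number').var()

# Same calculation using the numeric_only flag
flag_var = df.var(numeric_only=True)

print(numeric_var)
print(flag_var)
```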

## Example 5: Advanced Variance Computation

For more sophisticated analysis, combine the `var()` method with other Pandas functions or apply it on grouped data. Here, we demonstrate its use with grouped data:

```python
data = {'Group': ['A', 'A', 'B', 'B'], 'Score': [82, 88, 75, 92]}
df = pd.DataFrame(data)
grouped = df.groupby('Group')
print(grouped.var())
```

Output:

```
       Score
Group
A       18.0
B      144.5
```

Variance is calculated within each group, showing how scores vary within groups 'A' and 'B'. Here the scores in group 'B' are far more spread out than those in group 'A'.
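
Grouped variance is often easier to read alongside other statistics. As a sketch, `agg()` can compute several of them per group in one call; the choice of `mean`, `var`, and `count` here is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'], 'Score': [82, 88, 75, 92]})

# One row per group, one column per statistic
summary = df.groupby('Group')['Score'].agg(['mean', 'var', 'count'])
print(summary)
```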

## Conclusion

Through these examples, we've explored what Pandas' `var()` method offers, from basic variance calculations to handling missing values, mixed-type DataFrames, and grouped subsets. Adding `var()` to your data science toolkit gives you a quick way to measure the variability of your datasets.
