Pandas: How to get unique values in a Series

Updated: February 17, 2024 By: Guest Contributor

Introduction

Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. One of its primary objects is the Series, a one-dimensional labeled array capable of holding any data type. Identifying the unique values in a Series is a common step in data analysis, data cleaning, and feature engineering. This tutorial covers three ways to extract unique values from a Series, with examples that increase in complexity.

Understanding Pandas Series

Before we explore methods to get unique values, it’s vital to understand what a Pandas Series is. A Series is a one-dimensional array-like object containing a sequence of values (similar to a list in Python) and an associated array of data labels, called its index. You can think of a Series as a column in a table.

Example:

import pandas as pd

# Create a series
s = pd.Series([1, 2, 3, 4, 5, 2, 3])
print(s)

Output:

0    1
1    2
2    3
3    4
4    5
5    2
6    3
dtype: int64
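The index does not have to be the default 0, 1, 2, and so on. As a minimal sketch, the labels "a", "b", and "c" below are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration
labeled = pd.Series([10, 20, 10], index=["a", "b", "c"])

# Values can be looked up by label, not only by position
print(labeled["b"])          # 20
print(list(labeled.index))   # ['a', 'b', 'c']
```

The index matters later on: some methods for finding unique values discard it, while others keep it.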

Basic Method: Using the unique() Method

The simplest way to get the unique values of a Series is the unique() method. It returns the values in order of first appearance, without sorting them. The result is a NumPy array (or a pandas extension array for extension dtypes), not a Series, so the original index is not preserved.

Example:

import pandas as pd

# Create a series
s = pd.Series([1, 2, 3, 4, 5, 2, 3])

# Get unique values
unique_values = s.unique()
print(unique_values)

Output:

[1 2 3 4 5]
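Two details of unique() are easy to miss: the return type, and how missing values are treated. The sketch below uses a small made-up Series containing NaN. It also shows nunique(), a related Series method not covered above, which returns only the number of distinct values:

```python
import numpy as np
import pandas as pd

# Made-up data with repeated values and missing entries
s = pd.Series([3, 1, 3, np.nan, 1, np.nan])

unique_values = s.unique()

# For a numeric Series like this one, the result is a NumPy array
print(type(unique_values).__name__)   # ndarray

# Order of first appearance is kept; NaN is reported once
print(unique_values.tolist())         # [3.0, 1.0, nan]

# nunique() counts distinct values and skips NaN by default
print(s.nunique())                    # 2
print(s.nunique(dropna=False))        # 3
```

If you need sorted output, sort the result explicitly, since unique() itself never reorders values.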

Advanced Method: Using the drop_duplicates() Method

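While unique() is handy, the drop_duplicates() method provides more flexibility. It returns a new Series with duplicate values removed and keeps the original index labels of the values that remain. The keep parameter controls which occurrence survives: 'first' (the default) keeps the first occurrence, 'last' keeps the last, and False drops every value that appears more than once.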

Example:

import pandas as pd

# Create a series
s = pd.Series([1, 2, 3, 4, 5, 2, 3, 6, 7, 8, 2])

# Remove duplicates, keep first
unique_series = s.drop_duplicates(keep='first')
print(unique_series)

Output:

0    1
1    2
2    3
3    4
4    5
7    6
8    7
9    8
dtype: int64
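The other keep options follow the same pattern. Here is a minimal sketch on a shorter, made-up Series, also showing reset_index(drop=True) as one way to renumber the result when the gaps in the index are not wanted:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 2, 1, 4])

# keep='last' retains the final occurrence of each value
last_kept = s.drop_duplicates(keep="last")
print(last_kept.tolist())          # [3, 2, 1, 4]
print(last_kept.index.tolist())    # [2, 3, 4, 5]

# keep=False drops every value that occurs more than once
never_repeated = s.drop_duplicates(keep=False)
print(never_repeated.tolist())     # [3, 4]

# reset_index(drop=True) renumbers the result from 0
renumbered = s.drop_duplicates().reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

Keeping the original index is useful when you need to trace each unique value back to the row it came from.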

Advanced Example: Using value_counts() for Unique Values and Counts

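To count how often each unique value occurs, use the value_counts() method. It returns a Series whose index holds the unique values and whose values are the counts, sorted by count in descending order. Missing values are excluded by default. The output shown here matches pandas versions before 2.0; from 2.0 onward the result is named count, so the last line reads "Name: count, dtype: int64".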

Example:

import pandas as pd

# Create a series
s = pd.Series([1, 2, 3, 4, 5, 2, 3, 6, 7, 8, 2])

# Get unique values and their counts
value_counts = s.value_counts()
print(value_counts)

Output:

2    3
3    2
1    1
4    1
5    1
6    1
7    1
8    1
dtype: int64
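Because the unique values live in the index of the result, they can be read back from there. The sketch below reuses the same Series and also shows normalize=True, which returns relative frequencies instead of raw counts:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 6, 7, 8, 2])

counts = s.value_counts()

# The index of the result holds the unique values
print(sorted(counts.index.tolist()))     # [1, 2, 3, 4, 5, 6, 7, 8]

# The most frequent value comes first
print(counts.index[0], counts.iloc[0])   # 2 3

# normalize=True returns relative frequencies instead of counts
shares = s.value_counts(normalize=True)
print(round(shares.loc[2], 3))           # 0.273
```

The value 2 appears 3 times out of 11 entries, which gives the share of roughly 0.273.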

Conclusion

In this tutorial, we explored three ways to get unique values from a Pandas Series. We started with the straightforward unique() method, moved on to drop_duplicates() for more control over which occurrences are kept, and finished with value_counts() to obtain unique values along with their counts. These techniques are useful in everyday data analysis and preprocessing tasks.