What is pandas.Series.between() used for? (with examples)

Updated: February 18, 2024 By: Guest Contributor

Introduction

The Python pandas library offers an extensive range of functions and methods for working with structured datasets. Among them, the Series.between() method checks whether each value in a Series falls within a given range, which makes it a convenient tool for filtering. This tutorial walks through how it works and where it applies, using a series of progressively complex examples.

Syntax

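Series.between(left, right, inclusive='both') returns a boolean Series that marks which values fall within a specified interval. The arguments left and right define the interval boundaries, while the inclusive parameter dictates which boundaries are included: 'both' (the default), 'neither', 'left', or 'right'. Boolean values for inclusive (True/False) were deprecated in pandas 1.3 and are no longer accepted as of pandas 2.0.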

Let’s start by importing pandas and creating a simple Series:

import pandas as pd
s = pd.Series([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])

Basic Usage

First, let’s see how to use between() in its simplest form:

filtered = s.between(5, 15)
print(filtered)

This will output a boolean Series indicating which values fall within the range [5, 15]:

0    False
1    False
2     True
3     True
4     True
5     True
6    False
7    False
8    False
9    False
dtype: bool

Next, to get the actual values instead of boolean flags, we apply the filter to the Series:

print(s[filtered])

The output will show the values that fall within the specified range:

2     5
3     7
4    11
5    13
dtype: int64
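To make the behavior concrete, the mask produced by between() can be compared against the same condition written with comparison operators. This is a minimal sketch of that equivalence for the default inclusive bounds; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])

# between() with its default inclusive bounds
mask_between = s.between(5, 15)

# The same condition spelled out with comparison operators
mask_manual = (s >= 5) & (s <= 15)

print(mask_between.equals(mask_manual))  # True
print(s[mask_between].tolist())          # [5, 7, 11, 13]
```

The between() form is shorter and avoids the parentheses that the & operator requires around each comparison.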

Inclusive Parameter

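Now, let’s explore the inclusive argument by setting it to 'neither', which excludes both boundaries (older pandas versions accepted inclusive=False for this, but pandas 2.0 and later do not):

filtered_exclusive = s.between(5, 15, inclusive='neither')
print(s[filtered_exclusive])

The output will no longer include the boundary value 5:

3     7
4    11
5    13
dtype: int64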
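The remaining options, 'left' and 'right', include only one of the two boundaries. The sketch below runs all four options side by side, assuming pandas 1.3 or later (where the string values are accepted). The bounds 5 and 13 are chosen here because both appear in the Series, so the effect on each boundary is visible.

```python
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])

# Collect the selected values for every supported inclusive option
results = {
    option: s[s.between(5, 13, inclusive=option)].tolist()
    for option in ("both", "neither", "left", "right")
}

for option, values in results.items():
    print(f"{option:>8}: {values}")
#     both: [5, 7, 11, 13]
#  neither: [7, 11]
#     left: [5, 7, 11]
#    right: [7, 11, 13]
```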

Working with Dates

The between() method is extremely versatile and not limited to numeric data. It’s equally useful for working with dates. Let’s create a Series of dates:

dates = pd.Series(pd.date_range('2023-01-01', periods=10, freq='D'))
filtered_dates = dates.between('2023-01-03', '2023-01-08')
print(dates[filtered_dates])

This code snippet will output the dates that fall within the specified range:

2   2023-01-03
3   2023-01-04
4   2023-01-05
5   2023-01-06
6   2023-01-07
7   2023-01-08
dtype: datetime64[ns]
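Date ranges are often treated as half-open, meaning the start is included and the end is excluded. As a sketch of how that could look with between(), the example below passes explicit pd.Timestamp bounds and inclusive='left'; it assumes pandas 1.3 or later, and the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2023-01-01', periods=10, freq='D'))

start = pd.Timestamp('2023-01-03')
end = pd.Timestamp('2023-01-08')

# Half-open interval: start <= date < end
half_open = dates[dates.between(start, end, inclusive='left')]

print(half_open.dt.strftime('%Y-%m-%d').tolist())
# ['2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07']
```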

String Data

Similarly, between() can be applied to Series containing string data. Strings are compared lexicographically and case-sensitively, which makes this useful for filtering alphabetical ranges. Here’s an example:

names = pd.Series(['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'])
filtered_names = names.between('B', 'E')
print(names[filtered_names])

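The resulting output:

1        Bob
2    Charlie
3      David
dtype: object

Note that 'Eve' is excluded: it sorts after the upper bound 'E', because a longer string that begins with the bound compares as greater.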
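If the goal is to keep every name starting with B through E, one option is a half-open range that uses 'F' as an excluded upper bound. This is a minimal sketch of that approach, assuming pandas 1.3 or later for inclusive='left'.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'])

# Plain Python string comparison shows why an upper bound of 'E' drops 'Eve'
print('Eve' > 'E')  # True

# 'B' <= name < 'F' keeps every name whose first letter is B, C, D, or E
b_to_e = names[names.between('B', 'F', inclusive='left')]
print(b_to_e.tolist())  # ['Bob', 'Charlie', 'David', 'Eve']
```

Because the comparison is case-sensitive, lowercase names would sort after all uppercase ones; normalizing case first (for example with names.str.title()) may be needed for mixed-case data.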

Advanced Use Cases

For more advanced analysis, you can combine between() with other pandas capabilities. For instance, when working with a DataFrame, you may want to filter rows based on the value range of a specific column:

df = pd.DataFrame({
  'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
  'Age': [25, 30, 35, 40, 45, 50],
  'Income': [50000, 60000, 70000, 80000, 90000, 100000]
})
age_filtered = df['Age'].between(30, 45)
print(df[age_filtered])

This prints the rows for Bob, Charlie, David, and Eve, whose ages fall within [30, 45]. Because between() returns a boolean mask, it can be combined with other conditions using the &, |, and ~ operators.
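As an illustration of combining conditions, the sketch below joins two between() masks with & and inverts one with ~. The income bounds (65000 and 95000) are arbitrary values picked for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Age': [25, 30, 35, 40, 45, 50],
    'Income': [50000, 60000, 70000, 80000, 90000, 100000],
})

# Rows that satisfy both range conditions
in_both = df[df['Age'].between(30, 45) & df['Income'].between(65000, 95000)]
print(in_both['Name'].tolist())  # ['Charlie', 'David', 'Eve']

# Rows whose age falls outside the range, using ~ to invert the mask
outside_age = df[~df['Age'].between(30, 45)]
print(outside_age['Name'].tolist())  # ['Alice', 'Frank']
```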

Conclusion

The Series.between() method in pandas is a concise tool for range-based filtering. Through a mixture of basic and advanced examples, this tutorial has shown how it works across different data types, including integers, dates, and strings, and how it applies to filtering DataFrame rows. Knowing how the inclusive parameter and the comparison rules for each data type behave will help you apply it correctly in your own data manipulation tasks.