Understanding pandas.DataFrame.loc[] through 6 examples

Last updated: February 24, 2024

Introduction

The pandas library is a widely used Python tool for data manipulation and analysis. Its DataFrame.loc[] indexer selects and assigns data by label: row labels, column labels, or a boolean mask aligned to the index. This tutorial walks through loc[] with six examples, from single-row lookups to modifying values.

Preparation

Ensure you have pandas installed and imported in your Python environment:

import pandas as pd

Creating a Sample DataFrame

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

Example 1: Basic Selection

Starting with the basics, you can select a single row:

print(df.loc[0])

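The output is a Series holding the values of the row labeled 0, with the column names (Name, Age, City) as its index. With the default integer index, the label 0 also happens to be the first row.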
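Because loc[] looks up labels rather than positions, the difference becomes visible once the index is no longer the default 0, 1, 2, ... The following is a minimal sketch, assuming a hypothetical variant of the sample data that uses the Name column as its index (the names by_name, first_row, and anna are illustrative and not part of the original tutorial):

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

# With the default index, label 0 is also the first row.
first_row = df.loc[0]               # a Series indexed by column name
print(type(first_row).__name__)     # Series

# Hypothetical variant: index the same data by name instead of by number.
by_name = df.set_index('Name')

# loc[] now expects a name; the integer 0 is no longer a valid row label.
anna = by_name.loc['Anna']
print(anna['Age'], anna['City'])
```

In the re-indexed frame, by_name.loc[0] would raise a KeyError, since no row carries the label 0.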

Example 2: Select Multiple Rows

To select several rows, pass a list of index labels:

print(df.loc[[0, 2]])

This returns a DataFrame containing the rows labeled 0 and 2.
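Two details of list-based selection are worth seeing in code: the rows come back in the order the labels are listed, and a label that does not exist raises an error instead of being skipped. A short sketch using the same sample data (the variable names reordered and missing_raises are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# The result follows the order of the list, not the order of the index.
reordered = df.loc[[2, 0]]
print(reordered['Name'].tolist())

# Label 10 is not in the index, so the lookup fails with KeyError.
try:
    df.loc[[0, 10]]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```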

Example 3: Slicing Rows

You can slice rows using a colon:

print(df.loc[1:3])

Unlike regular Python slicing, a loc[] slice includes both endpoints, so this returns the rows labeled 1, 2, and 3.
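The endpoint behavior is easiest to see next to position-based slicing. The sketch below compares loc[] with iloc[] on the sample data; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Label-based slice: the stop label 3 is included.
label_slice = df.loc[1:3]
print(label_slice.index.tolist())

# Position-based slice: the stop position 3 is excluded, as in plain Python.
position_slice = df.iloc[1:3]
print(position_slice.index.tolist())
```

The two results differ by one row here only because the labels and positions coincide; with a non-default index the slices could select entirely different rows.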

Example 4: Selecting Rows and Columns

To narrow the selection further, pass row labels and column labels separated by a comma:

print(df.loc[0, 'Name'])
print(df.loc[[1, 3], ['Name', 'City']])

The first statement prints the single value 'John'. The second prints a DataFrame with the Name and City columns for the rows labeled 1 and 3 (Anna and Linda).
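The shape of the result depends on what is passed for each axis: a scalar label collapses that axis, while a list or slice keeps it. A brief sketch of the combinations on the sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

name = df.loc[0, 'Name']                    # scalar label on both axes: one value
subset = df.loc[[1, 3], ['Name', 'City']]   # lists on both axes: a DataFrame
ages = df.loc[:, 'Age']                     # all rows, one column: a Series
block = df.loc[0:1, 'Name':'Age']           # column slices are inclusive as well

print(name)
print(subset.shape)
print(ages.tolist())
print(block.columns.tolist())
```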

Example 5: Conditional Selection

A boolean condition can be passed to loc[] to filter rows:

print(df.loc[df['Age'] > 30])

This returns every row where Age is greater than 30, which in the sample data means Anna (34) and Linda (32).
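Conditions can be combined with & (and) and | (or), provided each comparison is wrapped in parentheses, and a column label can be added after the mask. A minimal sketch on the sample data, where the particular thresholds are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Single condition: a boolean Series aligned to the index acts as a row mask.
over_30 = df.loc[df['Age'] > 30]
print(over_30['Name'].tolist())

# Two conditions joined with &; the parentheses are required because
# & binds more tightly than the comparison operators.
mask = (df['Age'] > 28) & (df['City'] != 'Paris')
names = df.loc[mask, 'Name']
print(names.tolist())
```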

Example 6: Setting Values

loc[] can also be used to modify data:

df.loc[0, 'Age'] = 29
print(df)

The age of the first person (John) has been updated from 28 to 29.
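Assignment accepts the same selectors as lookup, so a boolean mask can update several rows at once, and assigning to a label that does not exist yet appends a row. The sketch below illustrates both; the new row ('Maria', 41, 'Madrid') is made-up data used only for demonstration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 34, 29, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Single cell, as in the example above.
df.loc[0, 'Age'] = 29

# Conditional update: add one year to everyone living in Paris.
df.loc[df['City'] == 'Paris', 'Age'] += 1

# Assigning to a new label (4) appends a row; values follow the column order.
df.loc[4] = ['Maria', 41, 'Madrid']

print(df)
```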

Advanced Use: Combining with Other Methods

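Combining loc[] with other pandas methods extends what it can select. A GroupBy object has no loc[] indexer of its own, so the group-level values are computed first and then passed to loc[] on the original DataFrame as a boolean mask:

# Assuming 'df' is a more complex DataFrame with multiple entries per city
# Broadcast each city's mean age back onto the rows belonging to that city
city_mean_age = df.groupby('City')['Age'].transform('mean')
# Select the names of people who live in a city whose mean age exceeds 30
print(df.loc[city_mean_age > 30, 'Name'])

Note: In the four-row sample above, every city appears only once, so each city's mean age equals that person's age and the result matches a plain Age > 30 filter. The pattern becomes useful when several rows share a city.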
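One way to make this concrete is to aggregate first and then use loc[] in two places: once on the aggregated result and once on the original data. The sketch below assumes a hypothetical DataFrame named people with two entries per city; the extra names, ages, and city assignments are invented for illustration and do not come from the sample above:

```python
import pandas as pd

# Hypothetical data with several people per city.
people = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Omar', 'Mei'],
    'Age': [28, 34, 29, 32, 41, 25],
    'City': ['Paris', 'Paris', 'Berlin', 'London', 'London', 'Berlin'],
})

# Step 1: aggregate per group; the result is a Series indexed by city.
mean_age = people.groupby('City')['Age'].mean()

# Step 2: loc[] on the aggregated Series keeps cities whose mean age exceeds 30.
older_cities = mean_age.loc[mean_age > 30]
print(older_cities.index.tolist())

# Step 3: loc[] on the original frame selects the names of people in those cities.
names = people.loc[people['City'].isin(older_cities.index), 'Name']
print(names.tolist())
```

Keeping the aggregation and the selection as separate steps makes each intermediate result easy to inspect.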

Conclusion

The pandas.DataFrame.loc[] indexer is essential for precise, label-based data selection and assignment. These examples covered single rows, lists of labels, inclusive slices, row and column combinations, boolean filters, and in-place updates. Try the same techniques on your own data sets to see how they behave with non-default indexes.
