How to Use Pandas Profiling for Data Analysis (4 examples)

Last updated: March 02, 2024

Pandas Profiling generates a report of descriptive statistics for a pandas DataFrame from a single call, which makes it a fast way to understand the basic structure of a dataset. This tutorial covers how to use it in four scenarios, ranging from basic to advanced. By the end, you’ll have a solid foundation for using the library to speed up your data analysis tasks.

Preparation

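Before diving into the examples, make sure Pandas Profiling is installed. The project has since been renamed ydata-profiling, but the examples in this tutorial use the original pandas-profiling package. You can install it with pip: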

pip install pandas-profiling
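To confirm the installation, a small check along these lines can help. This is a minimal sketch that relies only on the standard library; the helper name `installed_version` is made up for illustration and is not part of Pandas Profiling.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Check the original distribution name and the name the project now uses.
for name in ("pandas-profiling", "ydata-profiling"):
    print(name, "->", installed_version(name) or "not installed")
```

If the first line reports "not installed", the `import pandas_profiling` statements in the examples will fail.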

You can practice with your own CSV data.

Now, let’s go through four examples showing different ways you can leverage Pandas Profiling.

Example 1: Basic Overview

First, we’ll perform a basic analysis of a dataset. For simplicity, we’ll use the Iris dataset, which is widely used for demonstrations and ships with scikit-learn.

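import pandas as pd
from sklearn.datasets import load_iris
import pandas_profiling  # importing this registers DataFrame.profile_report()

# Load the Iris measurements into a DataFrame with named feature columns
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Build the report and save it as a standalone HTML file
profile = df.profile_report(title='Iris Dataset Analysis')
profile.to_file("iris_analysis.html")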

This generates an HTML report titled ‘Iris Dataset Analysis’ and saves it as iris_analysis.html. The report provides an overview of the dataset, including the distribution of each feature, missing values, and correlations between features.
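To get a feel for the kind of numbers the report collects, it can help to compute a few of them by hand with plain pandas. The snippet below is only a rough sketch: the toy DataFrame is made up for illustration, and the real report covers far more than these three summaries.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset (not from the article).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, None, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, None],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

summary = df.describe()    # count, mean, std, min, quartiles, max (numeric columns only)
missing = df.isna().sum()  # number of missing cells per column
n_distinct = df.nunique()  # number of distinct non-null values per column

print(summary)
print(missing)
print(n_distinct)
```

Comparing a couple of these values against the generated HTML is a quick sanity check that the report was built from the data you expected.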

Example 2: Handling Large Datasets

With large datasets, generating a profile report can be time-consuming. You can use the minimal mode for a faster but less detailed overview.

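import pandas as pd
import pandas_profiling  # registers DataFrame.profile_report()
df_large = pd.read_csv('your_large_dataset.csv')
# minimal=True switches off the most expensive computations
profile = df_large.profile_report(minimal=True)
profile.to_file("large_dataset_analysis.html")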

This approach reduces the generation time significantly by skipping correlations and other computationally intensive sections.
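Another common way to cut generation time, which can be combined with minimal mode, is to profile a random sample instead of the full table. The sketch below is an assumption-laden illustration rather than part of the library: `sample_for_profiling` is a hypothetical helper, and the synthetic DataFrame stands in for `your_large_dataset.csv`.

```python
import numpy as np
import pandas as pd


def sample_for_profiling(df, max_rows=10_000, seed=0):
    """Return df unchanged if it is small enough, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


# Synthetic stand-in for a large dataset (hypothetical columns "a" and "b").
rng = np.random.default_rng(0)
df_large = pd.DataFrame({
    "a": rng.normal(size=50_000),
    "b": rng.integers(0, 5, size=50_000),
})

df_sample = sample_for_profiling(df_large, max_rows=5_000)
print(len(df_large), len(df_sample))
# The sample can then be profiled as above, e.g. df_sample.profile_report(minimal=True)
```

The trade-off is that statistics in the report then describe the sample, so rare values and extreme outliers may be missed.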

Example 3: Advanced Data Correlation

In this example, we explore advanced correlations and missing values analysis to identify patterns and relationships in our data.

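import pandas as pd
import pandas_profiling  # registers DataFrame.profile_report()

df_complex = pd.read_csv('your_complex_dataset.csv')
profile = df_complex.profile_report(
    # Compute all three correlation matrices
    correlations={
        "pearson": {"calculate": True},
        "spearman": {"calculate": True},
        "kendall": {"calculate": True},
    },
    # Add the missing-value heatmap and dendrogram
    missing_diagrams={
        "heatmap": True,
        "dendrogram": True,
    },
)
profile.to_file("complex_dataset_analysis.html")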

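This detailed report helps in identifying both linear relationships (Pearson) and monotonic but not necessarily linear relationships (the rank-based Spearman and Kendall coefficients) between variables, while the missing-value diagrams show which columns tend to be missing together.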
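The difference between the coefficients is easiest to see on a small example. The following sketch uses plain pandas rather than the profiling report, with made-up data in which y grows as the cube of x: the relationship is perfectly monotonic but not linear.

```python
import pandas as pd

# Hypothetical data: y = x**3, strictly increasing but clearly nonlinear.
x = list(range(1, 11))
df = pd.DataFrame({"x": x, "y": [v ** 3 for v in x]})

pearson = df.corr(method="pearson").loc["x", "y"]    # measures linear association
spearman = df.corr(method="spearman").loc["x", "y"]  # measures agreement of ranks

print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
```

Because the ranks of x and y agree exactly, the Spearman coefficient is 1, while the Pearson coefficient falls below 1. A gap like this between the two matrices in the report is a hint that a relationship is monotonic but curved.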

Example 4: Interactive Dashboard

Finally, we’ll use Pandas Profiling to create an interactive dashboard. This requires Jupyter Notebook or JupyterLab, because the report is rendered as notebook widgets instead of being written to a file.

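import pandas as pd
import pandas_profiling  # registers DataFrame.profile_report()
df = pd.read_csv('your_dataset.csv')
profile = df.profile_report(explorative=True, html={'style': {'full_width': True}})
# Render the report as notebook widgets (run this in a Jupyter cell)
profile.to_widgets()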

This example showcases the interactive nature of Pandas Profiling in a Jupyter environment, making it a dynamic tool for exploratory data analysis.
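If the same script may run both inside and outside a notebook, one option is to fall back to an HTML file when no Jupyter kernel is detected. This is a hedged sketch: `running_in_jupyter` and `render_report` are hypothetical helpers, and the kernel check relies on a widely used IPython idiom, not on anything Pandas Profiling provides.

```python
def running_in_jupyter() -> bool:
    """Best-effort check for a Jupyter kernel (Notebook or JupyterLab)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    # Jupyter kernels use ZMQInteractiveShell; terminal IPython uses another class.
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


def render_report(profile, fallback_path="report.html"):
    """Show widgets in Jupyter, otherwise write the report to an HTML file."""
    if running_in_jupyter():
        profile.to_widgets()
        return "widgets"
    profile.to_file(fallback_path)
    return "file"


print("Jupyter kernel detected:", running_in_jupyter())
```

With a report built as in the example above, `render_report(profile, "dataset_analysis.html")` would display widgets in a notebook and save the HTML file everywhere else.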

Conclusion

Through these examples, we’ve seen how Pandas Profiling supports several stages of data analysis: a one-call overview of a dataset, a faster minimal report for large datasets, configurable correlation and missing-value analysis, and an interactive widget view in Jupyter. Because each of these takes only a few lines of code, it is a practical first step in exploratory data analysis.
