
Pandas data types cheat sheet

Last updated: February 19, 2024

Introduction

Pandas, the popular Python library for data analysis, offers a range of data types for handling data efficiently. Understanding these types is crucial for data manipulation and analysis. This cheat sheet attempts to provide a comprehensive guide to Pandas data types, from basic to advanced, with ample code examples.

Pandas Data Types – Cheat Sheet

Pandas is built on top of NumPy, so it inherits NumPy's data types. It also adds types of its own for data that NumPy does not model well, such as categorical values and columns of mixed types.

Basic Data Types

  • object: For text/string data and for columns that hold mixed Python objects.
  • int64: For integer numbers.
  • float64: For floating-point numbers.
  • bool: For True/False values.
  • datetime64[ns]: For date and time values.
  • timedelta64[ns]: For differences between two datetimes.
  • category: For categorical values.

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4],
                   'C': [2.5, 3.5, 4.5, 5.5],
                   'D': [True, False, True, False],
                   'E': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-03-01', '2023-04-01']),
                   'F': pd.to_timedelta(['1 days', '2 days', '3 days', '4 days']),
                   'G': pd.Categorical(['test', 'train', 'test', 'train'])})

print(df.dtypes)

The output showcases different data types in a Pandas DataFrame:

A             object
B              int64
C            float64
D               bool
E     datetime64[ns]
F    timedelta64[ns]
G           category
dtype: object
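Once the dtypes are known, they can be used to pick out columns. The following is a minimal sketch using select_dtypes(); the frame named sample and its contents are made up for illustration and only mirror some of the columns above.

```python
import pandas as pd

# Hypothetical sample frame mirroring a few of the columns above
sample = pd.DataFrame({
    'B': [1, 2, 3, 4],
    'C': [2.5, 3.5, 4.5, 5.5],
    'D': [True, False, True, False],
    'G': pd.Categorical(['test', 'train', 'test', 'train']),
})

# Keep only the numeric columns (int64 and float64 here; bool is not included)
numeric_cols = sample.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# Keep only the categorical columns
category_cols = sample.select_dtypes(include='category').columns.tolist()
print(category_cols)
```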

Handling Custom Data Types

Beyond the basic types, Pandas lets you convert columns from one type to another and use extension types that plain NumPy does not provide.

Converting Data Types

Converting data types in Pandas is straightforward with the astype() method. A narrower or more specific type can reduce memory usage, and the correct type ensures that operations on the column behave as expected.

# Cast the integer column to floating point
df['B'] = df['B'].astype('float64')
print(df['B'].dtype)

The code above converts column ‘B’ from int64 to float64.
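To illustrate the memory point, here is a small sketch that compares memory usage before and after narrowing two columns. The frame, its column names, and the choice of int8 are assumptions made for this example; int8 is only safe when every value fits in the range -128 to 127. The last lines show pd.to_numeric() as one way to convert text that may contain bad values.

```python
import pandas as pd

# Hypothetical data: 1,000 rows of small integers and repeated labels
sample = pd.DataFrame({
    'count': [1, 2, 3, 4] * 250,
    'split': ['test', 'train', 'test', 'train'] * 250,
})

# Total bytes used, including the string contents
before = sample.memory_usage(deep=True).sum()

# Narrow the integers and store the repeated labels as categories
small = sample.astype({'count': 'int8', 'split': 'category'})
after = small.memory_usage(deep=True).sum()

print(before, after)

# Text that is not a valid number becomes NaN instead of raising an error
raw = pd.Series(['1', '2', 'oops'])
parsed = pd.to_numeric(raw, errors='coerce')
print(parsed)
```

The exact byte counts depend on the Pandas version and platform, so none are shown here; the narrowed frame should simply come out smaller.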

Using Extension Types

Pandas supports extension types such as Int64 (capital ‘I’), a nullable integer type that can hold missing values without converting the column to float.

import pandas as pd

df['B'] = pd.array([1, None, 3, 4], dtype=pd.Int64Dtype())
print(df['B'])

The output shows the missing value as <NA>, while the column keeps the Int64 dtype:

0       1
1    <NA>
2       3
3       4
Name: B, dtype: Int64
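For contrast, the sketch below shows what happens to the same values without the nullable type, and how convert_dtypes() can move an existing column to an extension type. The variable names are made up for illustration.

```python
import pandas as pd

# With the default NumPy-backed type, a missing value forces the column to float
plain = pd.Series([1, None, 3, 4])
print(plain.dtype)

# With the nullable type, the values stay integers and the gap is <NA>
nullable = pd.Series([1, None, 3, 4], dtype='Int64')
print(nullable.dtype)

# Reductions skip the missing value by default
total = nullable.sum()
print(total)

# convert_dtypes() picks a suitable extension type for an existing column
converted = plain.convert_dtypes()
print(converted.dtype)
```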

Advanced Data Types Handling

For more complex use cases, understanding and leveraging Pandas’ advanced data types is essential.

Working with Time Series Data

Time series data is common in finance, weather forecasting, and many other fields. Pandas handles it with the datetime64 and timedelta64 types. The example below builds a daily date range and uses it as the index of a Series:

# ISO dates avoid any day/month ambiguity: January 1 through January 10, 2023
rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')
tseries = pd.Series(range(len(rng)), index=rng)
print(tseries)
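A datetime index enables a few operations that plain integer indexes do not. The sketch below rebuilds the same ten-day series so that it runs on its own, then shows timestamp subtraction, date-based slicing, and resampling. The 5-day bin size is an arbitrary choice for illustration.

```python
import pandas as pd

# Ten daily timestamps with the values 0..9
rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')
tseries = pd.Series(range(len(rng)), index=rng)

# Subtracting two timestamps gives a Timedelta
span = rng[-1] - rng[0]
print(span)

# Slicing by date strings includes both endpoints
first_three = tseries.loc['2023-01-01':'2023-01-03']
print(first_three)

# Group the daily values into 5-day bins and sum each bin
binned = tseries.resample('5D').sum()
print(binned)
```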

Dealing with Text Data

Text data is commonly stored with the object data type and can be manipulated with the string methods that Pandas exposes through the .str accessor.

df['A'] = df['A'].str.upper()
print(df['A'])

This code sample demonstrates converting text data in column ‘A’ to uppercase.
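Besides object, Pandas also offers a dedicated string extension type. The following sketch, with made-up values, shows a few .str methods on such a column, including how a missing value passes through them.

```python
import pandas as pd

# A text column with one missing value, stored with the dedicated string type
text = pd.Series(['foo', 'bar', None, 'baz'], dtype='string')
print(text.dtype)

# String methods skip the missing value and leave it as <NA>
upper = text.str.upper()
print(upper)

# Length of each string; the missing entry stays missing
lengths = text.str.len()
print(lengths)

# Test each value for a substring
has_ba = text.str.contains('ba')
print(has_ba)
```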

Conclusion

Understanding Pandas data types is foundational for proficient data analysis and manipulation. This cheat sheet, with its step-by-step guide from basic to advanced data types, aims to enhance your Pandas skills. By mastering these data types and their conversions, you’re well on your way to handling various data processing tasks more efficiently.
