Introduction
The DataFrame.truncate() method in Pandas is a handy function for slicing portions of DataFrames or Series between specified dates or between particular row/column labels. It is particularly useful in time series analysis or when working with large datasets where you need to focus on specific intervals. This tutorial walks you through basic and more advanced use cases of the truncate() method, complete with examples to help you incorporate this function into your data manipulation toolkit.
Syntax & Parameters
The truncate() method truncates a Series or DataFrame before and after some index values. It is mainly used for slicing time series data, but it also works with any other sorted index. Its syntax is straightforward:
pandas.DataFrame.truncate(before=None, after=None, axis=None, copy=True)
Where:
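before and after denote the truncation limits (index labels or dates). Both limits are inclusive, and either one may be omitted.
axis specifies the truncation direction (0 for rows and 1 for columns); rows are the default.
copy indicates whether to return a copy of the truncated data. truncate() always returns a new object and never modifies the original in place.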
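As a minimal sketch of how these parameters behave, the snippet below uses a small Series with an integer index. The Series and the variable names are made up for illustration; only truncate() itself comes from Pandas.

```python
import pandas as pd

# Illustrative data: five values on a sorted integer index.
s = pd.Series(["a", "b", "c", "d", "e"], index=[1, 2, 3, 4, 5])

# Both limits are inclusive: labels 2, 3 and 4 are kept.
both = s.truncate(before=2, after=4)

# Either limit may be omitted to truncate on one side only.
tail = s.truncate(before=4)   # labels 4 and 5
head = s.truncate(after=2)    # labels 1 and 2

print(list(both.index))  # [2, 3, 4]
print(list(tail.index))  # [4, 5]
print(list(head.index))  # [1, 2]

# The original Series is left untouched.
print(len(s))  # 5
```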
Basic Example
Let’s start with a basic example, where we have a DataFrame representing daily sales of a store:
import pandas as pd
from datetime import datetime

# Ten days of daily sales, starting on 2023-01-01
df = pd.DataFrame({
    'date': pd.date_range(start='2023-01-01', periods=10),
    'sales': [234, 456, 324, 456, 678, 234, 590, 789, 456, 123]
})
# Use the dates as the index so truncate() can slice by date
df.set_index('date', inplace=True)
print(df)
The output will be:
sales
2023-01-01 234
2023-01-02 456
2023-01-03 324
... ...
2023-01-09 456
2023-01-10 123
To truncate this DataFrame to only include sales from January 3 to January 8 (both dates included), pass the two dates as before and after:
df_truncated = df.truncate(before='2023-01-03', after='2023-01-08')
print(df_truncated)
The truncated DataFrame:
sales
2023-01-03 324
2023-01-04 456
... ...
2023-01-08 789
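For a sorted daily index like this one, the same rows can also be selected with label slicing through .loc. The sketch below rebuilds the sales data under a different variable name (sales, chosen here for illustration) and compares the two approaches. It also shows that a limit does not have to be an existing label.

```python
import pandas as pd

# Same daily sales data as above, rebuilt so the snippet is self-contained.
sales = pd.DataFrame(
    {"sales": [234, 456, 324, 456, 678, 234, 590, 789, 456, 123]},
    index=pd.date_range(start="2023-01-01", periods=10, name="date"),
)

via_truncate = sales.truncate(before="2023-01-03", after="2023-01-08")
via_loc = sales.loc["2023-01-03":"2023-01-08"]

# Both selections contain the six rows from January 3 to January 8.
same = via_truncate.equals(via_loc)
print(same)  # True

# A limit that falls outside the index is fine on a sorted index:
# this keeps January 1 and January 2 only.
first_two = sales.truncate(before="2022-12-25", after="2023-01-02")
print(first_two["sales"].tolist())  # [234, 456]
```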
Truncating Columns
Truncation is not limited to rows. You can also truncate columns by specifying the axis=1 argument. Consider a DataFrame with multiple columns:
df = pd.DataFrame({
'A': range(1, 11),
'B': range(11, 21),
'C': range(21, 31),
'D': range(31, 41)
})
df_truncated = df.truncate(before='B', after='C', axis=1)
print(df_truncated)
The resulting DataFrame will show only columns B and C:
B C
0 11 21
1 12 22
... ...
8 19 29
9 20 30
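Column truncation can also be one-sided, and the axis can be given by name instead of by number. The following sketch uses a smaller frame (named wide here purely for illustration) to show both variations.

```python
import pandas as pd

# Illustrative frame with four alphabetically sorted column labels.
wide = pd.DataFrame({
    "A": range(1, 4),
    "B": range(11, 14),
    "C": range(21, 24),
    "D": range(31, 34),
})

# Keep column C and everything after it; axis="columns" is the same as axis=1.
from_c = wide.truncate(before="C", axis="columns")

# Keep everything up to and including column B.
up_to_b = wide.truncate(after="B", axis=1)

print(list(from_c.columns))   # ['C', 'D']
print(list(up_to_b.columns))  # ['A', 'B']
```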
Working with Time Series Data
When dealing with time series data, the truncate() method is especially convenient. For datasets with a DatetimeIndex, you can cut the data down to exactly the time window you are interested in. Let’s work on a more detailed dataset, a time series of hourly temperature readings:
temperature_df = pd.DataFrame({
'datetime': pd.date_range(start='2023-01-01', periods=24, freq='H'),
'temperature': [22, 21, 23, 24, 22, 20, 19, 18, 17, 22, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 28, 27, 26, 25]
})
temperature_df.set_index('datetime', inplace=True)
df_truncated = temperature_df.truncate(before='2023-01-01 09:00:00', after='2023-01-01 17:00:00')
print(df_truncated)
This results in a DataFrame that contains temperature readings from 9 AM to 5 PM:
temperature
2023-01-01 09:00:00 22
2023-01-01 10:00:00 20
... ...
2023-01-01 17:00:00 27
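One detail worth knowing with sub-daily data: when a limit is given as a date without a time, truncate() treats it as midnight of that day, whereas partial string slicing with .loc matches the whole day. The sketch below illustrates the difference on two days of hourly values. The data and variable names are invented for the example, and the frequency is passed as a Timedelta only to avoid depending on a particular frequency alias.

```python
import pandas as pd

# Two days of hourly values: 0..23 on January 1, 24..47 on January 2.
hourly = pd.Series(
    range(48),
    index=pd.date_range(start="2023-01-01", periods=48, freq=pd.Timedelta(hours=1)),
)

# truncate() reads '2023-01-02' as 2023-01-02 00:00:00 for both limits,
# so only the midnight reading survives.
truncated = hourly.truncate(before="2023-01-02", after="2023-01-02")

# Partial string slicing with .loc matches every timestamp on that day.
sliced = hourly.loc["2023-01-02":"2023-01-02"]

print(len(truncated))  # 1
print(len(sliced))     # 24
```

To keep a full day with truncate(), spell out the time components, for example after='2023-01-02 23:00:00'.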
Advanced Uses: Truncating Based on Custom Indices
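In addition to regular numeric and date indices, truncate() can be applied to DataFrames with other index types, as long as the index is sorted: the method checks that the index is monotonic and compares before and after using the labels' natural ordering. Suppose you have a DataFrame indexed by business stages in a pipeline. The raw stage names (Lead, Opportunity, Negotiation, Closure) are not in alphabetical order, so truncate() would reject them with "ValueError: truncate requires a sorted index". One way around this is to prefix each label with its position so that the sort order matches the pipeline order. You can then truncate the DataFrame to focus on a particular stage range:
df = pd.DataFrame({
    'stage': ['1-Lead', '2-Opportunity', '3-Negotiation', '4-Closure'],
    'value': [345, 810, 675, 935]
}).set_index('stage')
df_truncated = df.truncate(before='2-Opportunity', after='3-Negotiation')
print(df_truncated)
This will output:
value
2-Opportunity 810
3-Negotiation 675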
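Because truncate() requires a sorted index, labels whose intended order differs from their sort order cannot be passed to it directly. The sketch below, which uses the plain stage names as an assumed example, shows the resulting error and one possible alternative: label slicing with .loc, which works on a unique index regardless of sort order.

```python
import pandas as pd

# Plain stage names in pipeline order; alphabetically they are unsorted.
stages = pd.DataFrame(
    {"value": [345, 810, 675, 935]},
    index=pd.Index(["Lead", "Opportunity", "Negotiation", "Closure"], name="stage"),
)

# truncate() refuses an index that is neither increasing nor decreasing.
try:
    stages.truncate(before="Opportunity", after="Negotiation")
    truncate_error = None
except ValueError as exc:
    truncate_error = str(exc)

print(truncate_error)

# .loc slices by position of the two labels, so the row order is what counts.
by_label = stages.loc["Opportunity":"Negotiation"]
print(by_label["value"].tolist())  # [810, 675]
```

Which approach fits better depends on the data: prefixed labels keep truncate() usable, while .loc avoids renaming the index.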
Conclusion
Through this guide, we have seen how the truncate() method in Pandas can be a powerful tool for data slicing, especially when working with time series data. The examples provided here span from the basic to more advanced applications, showcasing its flexibility and utility across different types of data. Armed with truncate(), you’re now better equipped to handle data slicing tasks with precision.