Fixing Pandas Error – OutOfBoundsDatetime: Out of bounds nanosecond timestamp

Updated: February 21, 2024 By: Guest Contributor Post a comment

Understanding the Error

Pandas is a powerful and widely used Python library for data manipulation and analysis. One common error encountered when working with time series data in pandas is the OutOfBoundsDatetime error. This error occurs when a datetime object or string that is being converted to a pandas datetime type is out of the permissible range of pandas Timestamp, which is limited to roughly between the years 1677 and 2262.

This tutorial will discuss the reasons behind the OutOfBoundsDatetime error and provide practical solutions to address it, ensuring your data analysis workflow is smooth and error-free.

Possible Causes

The primary reason for an OutOfBoundsDatetime error is trying to create a pandas Timestamp object that falls outside the valid DateTime range that pandas supports. The root cause could be an incorrectly formatted datetime string, a conversion from a different datetime format that does not fit within the pandas range, or simply having data that spans beyond the supported years.

Solutions to Fix the ‘OutOfBoundsDatetime’ Error

Solution 1: Use a Different Datetime Format

Convert the datetime object or string to a format that remains within the pandas supported range, such as using a ‘%Y’ format for years only.

Steps:

  1. Identify the problematic datetime values causing the error.
  2. Convert these values into a format that is within the supported range. This could involve truncating parts of the datetime that are not needed for your analysis.
  3. Use the pandas to_datetime function to convert the adjusted datetime strings into pandas datetime objects.

Code Example:

# Assuming 'df' is your DataFrame and 'date_column' is the problematic column
# Convert year-month-day to year only
df['date_column'] = pd.to_datetime(df['date_column'].dt.year, format='%Y')

Notes: This approach is a quick fix but limits the granularity of your datetime data. It is suitable for analyses where only the year or another higher-level datetime component is relevant.

Solution 2: Handle Out-of-Range Dates as NaT (Not a Time)

Assign the pd.NaT value to dates that fall outside the pandas Timestamp range instead of trying to fit them into a valid range.

Steps:

  1. Define a function to check the datetime value’s validity against the pandas Timestamp range.
  2. If the datetime value is out of range, return pd.NaT.
  3. Else, return the original datetime value.
  4. Apply this function across the datetime column in your DataFrame.

Code Example:

import pandas as pd

def replace_out_of_bounds_dates(date):
    try:
        return pd.Timestamp(date)
    except ValueError:
        return pd.NaT

df['date_column'] = df['date_column'].apply(replace_out_of_bounds_dates)

Notes: This method preserves the integrity of the dataset by acknowledging missing or out-of-range data instead of forcing a fit. The key drawback is that it may result in a significant amount of missing data, depending on your dataset.

Solution 3: Use Custom Epochs

One can bypass the Timestamp limitation by using custom epochs. An epoch is a reference point from which time is measured.

Steps:

  1. Choose a custom epoch that is close to your dataset’s date range but still within the pandas Timestamp limits.
  2. Convert your datetime data to the total number of time units (e.g., days, seconds) since your chosen epoch.
  3. Operate on this numeric representation of time in your analyses.

Code Example:

# Example custom epoch: January 1, 2000
import numpy as np

def datetime_to_epoch(date, epoch=pd.Timestamp('2000-01-01')):
    return (pd.to_datetime(date) - epoch) / np.timedelta64(1, 'D')

df['date_column_epoch'] = df['date_column'].apply(datetime_to_epoch)

Notes: This approach is flexible and allows for analysis across a broad range of dates. However, it requires converting back to standard datetime formats for human-readable outputs, adding complexity to data handling.