How to Work with Time Series Data in NumPy

Updated: January 23, 2024 By: Guest Contributor

Introduction

Time series data is a sequence of data points collected or recorded over time, typically at equally spaced intervals. This type of data is common in finance, economics, environmental science, and more. Handling time series data effectively can provide valuable insights when analyzing trends, forecasting, and making data-driven decisions. In this tutorial, we will explore how to work with time series data in NumPy, one of the fundamental packages for scientific computing with Python.

Getting Started

Before diving into time series data, make sure that NumPy is installed in your Python environment. You can install it by running pip install numpy, and then import it:

import numpy as np

With NumPy installed and imported, let’s start by creating some basic time series data.

Creating a Time Series

# Generate a simple time series using numpy array
import numpy as np

# Let's create a time series that represents the daily temperature
# For simplicity, we start with an array of zeros
num_days = 10
temperatures = np.zeros(num_days)
print(temperatures)

Output:

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Each element in the temperatures array can represent the temperature recorded on a particular day.

Indexing Time Series Data

Time series data is often accessed by its index, which typically corresponds to the time component. In our daily temperature example, if we want to access the temperature on the third day, we would do the following:

# Access the temperature on the third day
third_day_temp = temperatures[2] # Remember, indexing starts at 0
print(third_day_temp)

Output:

0.0
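Beyond single elements, slices select whole ranges of days. The sketch below is only an illustration: it uses sample readings instead of the zero-filled array so that the results are easier to read, and the names first_week, last_three and every_other are made up for this example.

```python
import numpy as np

# Illustrative readings for ten consecutive days (day 1 is index 0)
temperatures = np.array([23, 25, 20, 22, 18, 17, 21, 20, 23, 22])

first_week = temperatures[:7]     # days 1 through 7
last_three = temperatures[-3:]    # the three most recent days
every_other = temperatures[::2]   # days 1, 3, 5, 7 and 9

print("First week:", first_week)
print("Last three days:", last_three)
print("Every other day:", every_other)
```

Slices like these return views of the original array, so modifying first_week in place would also change temperatures.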

Performing Operations on Time Series Data

Now, let’s assign some values to our time series data and perform simple calculations.

# Assigning values to the temperature array
# Let's assume these temperatures are for ten consecutive days
temperatures = np.array([23,25,20,22,18,17,21,20,23,22])

# Calculate the mean temperature
mean_temperature = np.mean(temperatures)
print("Mean temperature:", mean_temperature)

Output:

Mean temperature: 21.1

We can also find the maximum and minimum temperatures, calculate differences, and more.
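As a small sketch of those other summaries, the snippet below finds the extremes and the days on which they occurred. It repeats the ten readings so it can run on its own; hottest_day and coldest_day are names introduced here, and days are counted from 1.

```python
import numpy as np

temperatures = np.array([23, 25, 20, 22, 18, 17, 21, 20, 23, 22])

max_temp = temperatures.max()
min_temp = temperatures.min()

# argmax/argmin return zero-based positions, so add 1 to get the day number
hottest_day = int(np.argmax(temperatures)) + 1
coldest_day = int(np.argmin(temperatures)) + 1

print("Max:", max_temp, "on day", hottest_day)
print("Min:", min_temp, "on day", coldest_day)
print("Range:", max_temp - min_temp)
```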

Analyzing Changes Over Time

To analyze changes in our time series data, such as temperature differences from one day to the next, we can utilize NumPy’s array operations. Let’s calculate the day-to-day differences in temperatures.

# Calculate day-to-day differences
temperature_diff = np.diff(temperatures)
print("Temperature differences:", temperature_diff)

Output:

Temperature differences: [ 2 -5  2 -4 -1  4 -1  3 -1]
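Note that np.diff returns one element fewer than its input, because each value describes the change between two neighbouring days. A minimal sketch of how to read the result, assuming the same ten readings (the variable names are illustrative):

```python
import numpy as np

temperatures = np.array([23, 25, 20, 22, 18, 17, 21, 20, 23, 22])
temperature_diff = np.diff(temperatures)

# Position i of the diff array is the change from day i+1 to day i+2
i = int(np.argmin(temperature_diff))
largest_drop = temperature_diff[i]
print(f"Largest drop: {largest_drop} degrees, from day {i + 1} to day {i + 2}")

# The original series can be rebuilt from its first value and the differences
rebuilt = np.concatenate(([temperatures[0]], temperatures[0] + np.cumsum(temperature_diff)))
print("Rebuilt series matches:", np.array_equal(rebuilt, temperatures))
```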

Working with Dates and Times

NumPy doesn’t have a dedicated time series container with a date index. However, it does provide the numpy.datetime64 and numpy.timedelta64 data types, which let us work with dates and durations in NumPy arrays.

# Create a range of dates
start_date = np.datetime64('2023-01-01')
end_date = np.datetime64('2023-01-10')

# Generate an array of daily dates
# np.arange excludes end_date, so this yields 9 dates (Jan 1 to Jan 9)
dates = np.arange(start_date, end_date)
print(dates)

Output:

['2023-01-01' '2023-01-02' '2023-01-03' '2023-01-04' '2023-01-05' '2023-01-06' '2023-01-07' '2023-01-08' '2023-01-09']
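The example above only uses datetime64. As a brief sketch of timedelta64, the snippet below does some date arithmetic and shows one way to make the date range include the end date; the names one_week, elapsed and dates_inclusive are introduced here for illustration.

```python
import numpy as np

start_date = np.datetime64('2023-01-01')
end_date = np.datetime64('2023-01-10')

# A duration of seven days
one_week = np.timedelta64(7, 'D')
week_later = start_date + one_week

# Subtracting two dates gives a timedelta64
elapsed = end_date - start_date

# Adding one day to the stop value makes the range include end_date
dates_inclusive = np.arange(start_date, end_date + np.timedelta64(1, 'D'))

print("One week later:", week_later)
print("Elapsed:", elapsed)
print("Number of dates:", dates_inclusive.size)
```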

Combining Dates with Time Series Data

We can combine our array of dates with the temperature array to create a structured array that represents our time series data with dates. Let’s see how it’s done.

# Combine dates with the temperature data
combined_data = np.zeros(dates.size, dtype={'names':('date', 'temperature'), 'formats':('datetime64[D]', 'f8')})
combined_data['date'] = dates
combined_data['temperature'] = temperatures[:dates.size] # Assuming we have matching temperature data
print(combined_data)

Output:

[('2023-01-01', 23.) ('2023-01-02', 25.) ('2023-01-03', 20.) ('2023-01-04', 22.)
 ('2023-01-05', 18.) ('2023-01-06', 17.) ('2023-01-07', 21.) ('2023-01-08', 20.)
 ('2023-01-09', 23.)]
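Once dates and values live in one structured array, each field can be read by name and whole records can be reordered together. The following is a self-contained sketch under the same assumptions as above (nine dates, nine readings); records, warmest and by_temperature are illustrative names.

```python
import numpy as np

dates = np.arange(np.datetime64('2023-01-01'), np.datetime64('2023-01-10'))
temps = np.array([23, 25, 20, 22, 18, 17, 21, 20, 23], dtype='f8')

records = np.zeros(dates.size, dtype=[('date', 'datetime64[D]'), ('temperature', 'f8')])
records['date'] = dates
records['temperature'] = temps

# Look up the record with the highest temperature
warmest = records[np.argmax(records['temperature'])]
print("Warmest day:", warmest['date'], warmest['temperature'])

# Reorder all records from coldest to warmest; a stable sort keeps
# equal temperatures in chronological order
order = np.argsort(records['temperature'], kind='stable')
by_temperature = records[order]
print("Coldest day:", by_temperature['date'][0])
```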

Advanced Time Series with NumPy

For more advanced time series analysis, we might need to perform operations like filtering, aggregation, and custom transformations. NumPy provides us with the flexibility to perform these with ease.

Filtering Based on Date

Sometimes, we might want to analyze data for a specific time frame. We can apply filters using Boolean indexing based on the date.

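# Filter data for dates after '2023-01-05'
# Comparing against an explicit datetime64 value avoids relying on implicit string conversion
filter_after_specific_date = combined_data['date'] > np.datetime64('2023-01-05')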
print(combined_data[filter_after_specific_date])

Output:

[('2023-01-06', 17.) ('2023-01-07', 21.) ('2023-01-08', 20.) ('2023-01-09', 23.)]
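Boolean masks can also be combined to select a window between two dates. The sketch below uses plain arrays to stay short; the same mask could be built from combined_data['date']. The boundary dates and the name in_range are chosen only for illustration.

```python
import numpy as np

dates = np.arange(np.datetime64('2023-01-01'), np.datetime64('2023-01-10'))
temperatures = np.array([23, 25, 20, 22, 18, 17, 21, 20, 23], dtype='f8')

start = np.datetime64('2023-01-03')
end = np.datetime64('2023-01-06')

# Use & (not the keyword "and") to combine element-wise conditions,
# and wrap each condition in parentheses
in_range = (dates >= start) & (dates <= end)

print(dates[in_range])
print("Mean in range:", temperatures[in_range].mean())
```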

Applying Aggregations Over Time Windows

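Often, we might be interested in understanding trends across various time windows. NumPy doesn’t provide ready-made rolling aggregation functions, but sliding_window_view from numpy.lib.stride_tricks (available since NumPy 1.20) builds the rolling windows for us, and we can then aggregate over them.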

Example:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Generate a sample time series data
# For example, daily temperature readings over 30 days
time_series = np.random.randint(20, 40, size=30)

# Define the window size for aggregation
window_size = 7  # A week

# Using stride tricks to create rolling windows
rolling_windows = sliding_window_view(time_series, window_shape=window_size)

# Applying aggregations over each rolling window
# Calculate the mean temperature of each 7-day window
weekly_averages = np.mean(rolling_windows, axis=1)

# Print the results
print("Original Time Series Data:", time_series)
print("7-Day Rolling Average Temperatures:", weekly_averages)

# The output gives us a 7-day moving average of temperatures.
# The input is random, so the exact values differ on every run.
# Note: The length of `weekly_averages` will be `len(time_series) - window_size + 1`

In this code snippet:

  • We first generate a sample time series dataset representing daily temperature readings.
  • We define a window size (e.g., 7 days) for our rolling aggregation.
  • We use numpy.lib.stride_tricks.sliding_window_view to create rolling windows from the time series data. This function allows us to efficiently create a view of the data in rolling windows without copying it. The view is read-only by default.
  • We then apply an aggregation function (mean) to each rolling window to compute the 7-day rolling average temperatures.
  • Finally, we print the original time series data and the computed weekly averages.
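The rolling windows above overlap, so each day contributes to up to seven averages. If one value per calendar week is wanted instead, a reshape can be used. The sketch below shows that alternative and also cross-checks the rolling mean against np.convolve. The fixed seed and the names rolling_conv, rolling_view and weekly_means are assumptions made for this example only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)   # arbitrary fixed seed so runs are repeatable
time_series = rng.integers(20, 40, size=30)
window_size = 7

# Rolling mean via sliding windows, as in the example above
rolling_view = sliding_window_view(time_series, window_size).mean(axis=1)

# The same rolling mean via convolution with a uniform kernel
kernel = np.ones(window_size) / window_size
rolling_conv = np.convolve(time_series, kernel, mode='valid')
print("Methods agree:", np.allclose(rolling_view, rolling_conv))

# Non-overlapping weekly means: drop the incomplete final week, then reshape
full_weeks = len(time_series) // window_size
weekly_blocks = time_series[:full_weeks * window_size].reshape(full_weeks, window_size)
weekly_means = weekly_blocks.mean(axis=1)
print("Weekly means:", weekly_means)
```

With 30 days and a 7-day window, the rolling version produces 24 values, while the reshape version produces 4 values and ignores the last 2 days.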

Conclusion

In this tutorial, we have gone over various methods to work with time series data in NumPy. We discussed generating and manipulating numerical data in the context of time series and reviewed handling dates and times. By now, you should be comfortable with the basics of time series analysis in NumPy and ready to apply these techniques in your data science projects. As you further your understanding, always explore NumPy’s documentation for the latest features and functions.