Integrating yfinance or pandas-datareader with Zipline

In algorithmic trading, backtesting is a crucial step: it evaluates the viability of a trading strategy by running it against historical data. Zipline is an open-source backtesting library for writing and testing trading algorithms. One challenge is sourcing the historical data needed for testing. Python libraries such as yfinance and pandas-datareader provide convenient access to financial data, and their output can be loaded into Zipline through a custom data bundle.

Getting Started

Before diving into integrating these libraries, ensure that you've installed them in your Python environment. You can use pip to do this:

pip install yfinance pandas-datareader

Zipline is installed the same way:

pip install zipline-reloaded # If using the community fork of Zipline

Using yfinance

The yfinance library allows you to easily download historical market data from Yahoo Finance. Here’s a simple way to fetch stock data:


import yfinance as yf

# Fetch historical data for Apple Inc.
apple_data = yf.download('AAPL', start='2022-01-01', end='2023-01-01')

print(apple_data.head())

This snippet fetches Apple’s daily stock data from January 1, 2022, through the end of 2022 (yfinance treats the end date as exclusive) and prints the first few rows. To use this data in Zipline, you load it through a custom data bundle, which is Zipline's mechanism for ingesting data from external sources.
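Before the data can go into a bundle, it usually needs some light clean-up, because Zipline's bar writers work with lowercase open, high, low, close, and volume columns. The following is a minimal sketch of that step, not a definitive implementation. The helper name to_ohlcv is hypothetical, the sample frame is synthetic rather than real market data, and the handling of two-level columns assumes the layout that some yfinance releases return for a single ticker.

```python
import pandas as pd

OHLCV = ["open", "high", "low", "close", "volume"]


def to_ohlcv(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lowercase OHLCV columns and a sorted, tz-naive daily index.

    Hypothetical helper: the name and behavior are illustrative assumptions.
    """
    out = frame.copy()
    # Assumption: two-level columns are (field, ticker) for a single ticker,
    # so only the field level is kept.
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = out.columns.get_level_values(0)
    out.columns = [str(c).lower() for c in out.columns]

    missing = [c for c in OHLCV if c not in out.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = out[OHLCV]  # drops extras such as 'adj close'

    # Use plain midnight timestamps without a timezone as the index.
    out.index = pd.DatetimeIndex(out.index)
    if out.index.tz is not None:
        out.index = out.index.tz_localize(None)
    out.index = out.index.normalize()
    return out.sort_index()


# Synthetic stand-in for a downloaded frame (deliberately out of order).
raw = pd.DataFrame(
    {
        "Open": [10.0, 11.0],
        "High": [12.0, 13.0],
        "Low": [9.0, 10.0],
        "Close": [11.0, 12.0],
        "Adj Close": [10.5, 11.5],
        "Volume": [100, 200],
    },
    index=pd.to_datetime(["2022-01-04", "2022-01-03"]),
)
clean = to_ohlcv(raw)
print(clean)
```

In practice you would pass the DataFrame returned by yf.download (or by pandas-datareader) instead of the synthetic frame.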

Using pandas-datareader

pandas-datareader is another tool for accessing stock data. It supports multiple sources, including Yahoo Finance, and returns its results as pandas DataFrames, which makes it a popular choice among analysts and developers.


import pandas_datareader.data as web
from datetime import datetime

# Define date range
start = datetime(2022, 1, 1)
end = datetime(2023, 1, 1)

# Fetch stock data
apple_data = web.DataReader('AAPL', 'yahoo', start, end)

print(apple_data.head())

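This example fetches the same data as above using pandas-datareader. Be aware that the 'yahoo' source has stopped working in some pandas-datareader releases after changes on Yahoo's side; if the call fails, another supported source such as 'stooq', or the yfinance snippet above, can be used instead. Integrating the result with Zipline follows the same process as for yfinance.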

Integrating With Zipline

To integrate the data fetched from these APIs, you must write it into a Zipline data bundle. The steps are:

1. Create a DataFrame with OHLCV (Open, High, Low, Close, Volume) columns, indexed by date.
2. Write a custom ingestion function that writes this data through the writers Zipline passes to it.
3. Register the ingestion function with Zipline, typically in ~/.zipline/extension.py.
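The first step can be sketched with plain pandas. This is a rough illustration under stated assumptions, not Zipline's own code: the helper names align_to_sessions and build_metadata are hypothetical, pd.bdate_range stands in for the sessions that a trading calendar would supply, the sample prices are synthetic, and the gap-filling policy (carry the last close forward, zero volume) is one possible choice among several.

```python
import pandas as pd


def align_to_sessions(bars: pd.DataFrame, sessions: pd.DatetimeIndex) -> pd.DataFrame:
    """Reindex daily bars to the given sessions and fill gaps conservatively.

    Hypothetical helper. Sessions before the first available bar stay NaN.
    """
    aligned = bars.reindex(sessions)
    # Carry the last known close forward, then reuse it for the other prices.
    aligned["close"] = aligned["close"].ffill()
    for col in ("open", "high", "low"):
        aligned[col] = aligned[col].fillna(aligned["close"])
    # No trading is recorded for a filled session.
    aligned["volume"] = aligned["volume"].fillna(0)
    return aligned


def build_metadata(bars_by_symbol: dict, exchange: str = "NYSE") -> pd.DataFrame:
    """Build one metadata row per symbol; the row index is meant to act as the sid."""
    rows = []
    for symbol, bars in bars_by_symbol.items():
        rows.append(
            {
                "symbol": symbol,
                "start_date": bars.index[0],
                "end_date": bars.index[-1],
                # Assumption: positions are closed out one day after the last bar.
                "auto_close_date": bars.index[-1] + pd.Timedelta(days=1),
                "exchange": exchange,
            }
        )
    return pd.DataFrame(rows)


# Synthetic bars with a missing session on 2022-01-04.
bars = pd.DataFrame(
    {
        "open": [10.0, 12.0],
        "high": [11.0, 13.0],
        "low": [9.0, 11.0],
        "close": [10.5, 12.5],
        "volume": [100, 300],
    },
    index=pd.to_datetime(["2022-01-03", "2022-01-05"]),
)
sessions = pd.bdate_range("2022-01-03", "2022-01-05")  # stand-in for calendar sessions

aligned = align_to_sessions(bars, sessions)
metadata = build_metadata({"AAPL": aligned})
print(aligned)
print(metadata)
```

Inside an ingestion function, the sessions would come from the calendar argument rather than from pd.bdate_range, and the resulting frames would be handed to the asset and bar writers.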

A bundle can store minute bars, daily bars, or both: the ingestion function receives a minute_bar_writer and a daily_bar_writer for this purpose. For smaller datasets or testing purposes, daily data is often preferable:


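import pandas as pd
from zipline.data.bundles import register

# Session bounds must be trading days on the bundle's calendar. Recent
# zipline-reloaded releases expect tz-naive timestamps; older Zipline
# versions expect tz='utc'.
start_session = pd.Timestamp('2022-01-03')
end_session = pd.Timestamp('2022-12-30')

def yahoo_bundle(environ, asset_db_writer, minute_bar_writer, daily_bar_writer,
                 adjustment_writer, calendar, start_session, end_session, cache,
                 show_progress, output_dir):
    # apple_data is assumed to be the DataFrame fetched earlier with
    # yfinance/pandas-datareader, pre-processed to lowercase
    # open/high/low/close/volume columns and indexed by session date.
    sessions = calendar.sessions_in_range(start_session, end_session)
    bars = apple_data.reindex(sessions)

    # Describe the asset; the row index (0) serves as its sid.
    metadata = pd.DataFrame({
        'symbol': ['AAPL'],
        'start_date': [start_session],
        'end_date': [end_session],
        'auto_close_date': [end_session + pd.Timedelta(days=1)],
        'exchange': ['NYSE'],
    })
    asset_db_writer.write(equities=metadata)

    # Write daily bars as (sid, DataFrame) pairs, then empty adjustment tables.
    daily_bar_writer.write([(0, bars)], show_progress=show_progress)
    adjustment_writer.write()

register('yahoo_data_bundle', yahoo_bundle, calendar_name='NYSE',
         start_session=start_session, end_session=end_session)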

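After registering the custom data bundle, ingest it so that Zipline writes the data to disk, then run a simulation against it (my_algo.py is a placeholder for your own algorithm file):

zipline ingest -b yahoo_data_bundle
zipline run -f my_algo.py -b yahoo_data_bundle -s 2022-01-03 -e 2022-12-30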

Integrating yfinance or pandas-datareader with Zipline gives your backtests convenient access to historical data through a custom bundle. Once the ingestion function is in place, the same pattern extends to more symbols and to other data sources.
