ETL (Extract, Transform, Load) processes are critical for migrating data from one datastore to another in many applications. SQLite, a lightweight database manager, can be a powerful tool for planning and prototyping your ETL workflows due to its simplicity and zero-configuration setup.
Setting Up SQLite
To begin, you will need to set up SQLite in your environment. SQLite is serverless; hence, installation is typically straightforward. You can download the SQLite tools from the official website, which includes the command-line shell for accessing databases and the SQLite library itself.
Creating Your Initial Database
Starting with the creation of your initial database, which simulates the source from where data will be extracted, can help visualize the workflow. Assume you have a CSV of sales data which you can import into an SQLite database.
CREATE TABLE sales (
sale_id INTEGER PRIMARY KEY,
item_name TEXT,
item_price REAL,
sale_date TEXT
);
To load data from a CSV, use the following SQLite shell command:
sqlite3 sales.db ".mode csv" ".import sales_data.csv sales"
Extracting Data
The first step in our ETL process involves extracting relevant data from the source. With SQLite, you can write SQL queries to filter and fetch only necessary records:
SELECT * FROM sales WHERE item_price > 100;
This query extracts records of sales with an item price greater than 100, preparing it for the transformation phase.
Transforming Data
Once extracted, data often needs transforming: restructuring, cleansing, or combining. Simple transformations can be achieved using SQL. Consider converting sale_date strings into a more database-friendly format using SQLite functions:
UPDATE sales SET sale_date = date(sale_date);
More complex transformations may require external scripting languages like Python to process data. Here is an example using Python to normalize item prices:
import sqlite3
conn = sqlite3.connect('sales.db')
cur = conn.cursor()
cur.execute('SELECT sale_id, item_price FROM sales')
sales = cur.fetchall()
for sale in sales:
normalized_price = sale[1] * 1.1 # applying a simple normalization
cur.execute('UPDATE sales SET item_price = ? WHERE sale_id = ?', (normalized_price, sale[0]))
conn.commit()
conn.close()
Loading Transformed Data
The final step of the ETL process is loading the transformed data into a destination, often another database or a reporting tool. For demonstration, let's re-export our SQLite data into a JSON format:
import json
import sqlite3
conn = sqlite3.connect('sales.db')
cur = conn.cursor()
cur.execute('SELECT * FROM sales')
sales = cur.fetchall()
columns = [column[0] for column in cur.description]
sales_data = []
for row in sales:
sales_data.append(dict(zip(columns, row)))
with open('sales.json', 'w') as f:
json.dump(sales_data, f, indent=2)
conn.close()
Use this final JSON for further analytics or reporting, concluding your ETL workflow.
Conclusion
SQLite offers a simple yet capable environment for carrying out ETL processes effectively. By honing SQL queries and using extensions like Python, you can efficiently extract, transform, and load data with minimal setup.