PostgreSQL is renowned for its robust relational database capabilities. When paired with extensions like TimescaleDB, it becomes a powerhouse for time-series data processing. In this article, we will explore how pg_prometheus, a PostgreSQL extension from the team behind TimescaleDB, can be used for efficient metrics storage.
Understanding TimescaleDB
TimescaleDB is an open-source time-series database that runs as a PostgreSQL extension. It is optimized for time-series workloads such as high-rate data ingest and complex queries, while retaining access to all standard PostgreSQL functionality. To get started, install PostgreSQL and TimescaleDB on your server:
sudo apt install postgresql postgresql-contrib
sudo apt install timescaledb-postgresql-12
After installation, add timescaledb to shared_preload_libraries in postgresql.conf and restart PostgreSQL so the extension can be loaded.
Introducing pg_prometheus
pg_prometheus is a PostgreSQL extension that provides a storage layer and data model for Prometheus metrics, designed for high-fidelity time-series data. By using pg_prometheus, you can offload long-term metric storage from Prometheus to PostgreSQL, gaining both storage efficiency and query flexibility.
Install the extension with:
CREATE EXTENSION pg_prometheus;
pg_prometheus provides tables such as prom_data for storing Prometheus metrics effectively. Notably, the combination of TimescaleDB and pg_prometheus is optimized for high-volume inserts, making the two an ideal pairing for metrics workloads.
Setting Up pg_prometheus
Installing TimescaleDB with PostgreSQL is just the first step. The following steps get pg_prometheus up and running:
Step 1: Create a Database
CREATE DATABASE my_metrics;
Step 2: Enable Extensions
\c my_metrics
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE EXTENSION IF NOT EXISTS pg_prometheus;
Step 3: Create Prometheus Data Table
Create a table for the data provided by Prometheus, then convert it into a hypertable (create_hypertable requires the table to exist first):
CREATE TABLE prom_data (
    time        TIMESTAMPTZ NOT NULL,
    metric_name TEXT NOT NULL,
    value       DOUBLE PRECISION
);
SELECT create_hypertable('prom_data', 'time');
Hypertables in TimescaleDB are similar to traditional tables, but they partition data automatically based on time intervals, improving performance in querying and management.
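To build intuition for how a hypertable routes rows, here is a minimal pure-Python sketch of time-interval partitioning. The 7-day interval matches TimescaleDB's default chunk_time_interval; everything else is illustrative, not TimescaleDB's actual implementation:

```python
from datetime import datetime, timedelta

CHUNK_INTERVAL = timedelta(days=7)  # TimescaleDB's default chunk_time_interval
EPOCH = datetime(2000, 1, 1)        # arbitrary fixed origin for this sketch

def chunk_for(ts: datetime) -> int:
    """Return the index of the time chunk a row would land in."""
    return (ts - EPOCH) // CHUNK_INTERVAL

# Rows a day apart land in the same chunk; rows months apart do not,
# so a time-bounded query can skip irrelevant chunks entirely.
print(chunk_for(datetime(2024, 1, 1)) == chunk_for(datetime(2024, 1, 2)))  # → True
print(chunk_for(datetime(2024, 1, 1)) == chunk_for(datetime(2024, 3, 1)))  # → False
```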
Ingesting Metrics
Now that your environment is ready, you will want to populate the prom_data hypertable. A simple Python script can ingest data from Prometheus into TimescaleDB:
import requests
import psycopg2

PROMETHEUS_ENDPOINT = "http://localhost:9090/api/v1/query?query=up"
POSTGRES_CONN = "dbname='my_metrics' user='username' host='localhost' password='password'"

# Fetch data from Prometheus
response = requests.get(PROMETHEUS_ENDPOINT)
if response.status_code == 200:
    data = response.json()
    connection = psycopg2.connect(POSTGRES_CONN)
    cursor = connection.cursor()
    # Iterate over Prometheus results and insert into PostgreSQL
    for result in data['data']['result']:
        labels = result['metric']
        timestamp, value = result['value']
        query = """
            INSERT INTO prom_data (time, metric_name, value)
            VALUES (to_timestamp(%s), %s, %s);
        """
        # Prometheus returns a Unix timestamp; to_timestamp() converts it to TIMESTAMPTZ
        cursor.execute(query, (timestamp, labels['__name__'], float(value)))
    connection.commit()
    cursor.close()
    connection.close()
Modify the connection string and endpoint to match your setup. The script sends an HTTP request to the Prometheus API and stores the resulting samples in the PostgreSQL database.
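The script above relies on the shape of Prometheus's instant-query response (status, data.result, and per-result metric/value fields). A small helper that flattens a response into insertable rows, shown here against an inline sample payload rather than a live endpoint:

```python
def rows_from_prometheus(payload: dict) -> list:
    """Flatten a Prometheus instant-query response into (timestamp, name, value) rows."""
    rows = []
    for result in payload["data"]["result"]:
        labels = result["metric"]    # label set, including the __name__ label
        ts, value = result["value"]  # [unix_timestamp, "value-as-string"]
        rows.append((ts, labels["__name__"], float(value)))
    return rows

# Sample payload mirroring the documented instant-query response format
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus"},
             "value": [1700000000.0, "1"]},
        ],
    },
}
print(rows_from_prometheus(sample))
# → [(1700000000.0, 'up', 1.0)]
```

Keeping the parsing in a separate function like this also makes the ingest logic easy to unit-test without a running Prometheus server.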
Querying and Managing Data
One advantage of pairing TimescaleDB with PostgreSQL and pg_prometheus is the ability to handle queries with standard SQL.
For example, to fetch averaged metrics over the last 24 hours:
SELECT time_bucket('1 hour', time) AS hour,
       avg(value) AS avg_value
FROM prom_data
WHERE time > now() - interval '24 hours'
GROUP BY hour
ORDER BY hour;
This query groups the data into hourly buckets and averages each bucket, an efficient way to surface trends in your metrics over any time window.
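The bucketing logic of that SQL can be mirrored in plain Python to check your expectations against it; a minimal sketch with made-up samples:

```python
from collections import defaultdict
from datetime import datetime

def hourly_averages(samples):
    """Group (timestamp, value) samples into hour buckets and average each,
    mirroring time_bucket('1 hour', time) + avg(value)."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

samples = [
    (datetime(2024, 1, 1, 10, 5), 1.0),
    (datetime(2024, 1, 1, 10, 45), 3.0),
    (datetime(2024, 1, 1, 11, 10), 5.0),
]
print(hourly_averages(samples))  # hour 10 averages to 2.0; hour 11 to 5.0
```

In production you would let the database do this work, since the hypertable can aggregate chunks without shipping raw rows to the client.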
Conclusion
Combining PostgreSQL with TimescaleDB and pg_prometheus can be a powerful setup for metrics storage and analysis, scaling efficiently even in high-load scenarios. This integration enables seamless metrics collection and querying, equipping teams with the tools necessary for detailed and real-time insights into their data environments.