Sling Academy
Home/PostgreSQL/PostgreSQL: Using json_agg to aggregate data

PostgreSQL: Using json_agg to aggregate data

Last updated: January 06, 2024

Introduction

JSON aggregation in PostgreSQL is a powerful tool to structure query results in JSON format directly, often useful for API responses. With json_agg, we can summarize data as a JSON array of records.

Getting Started with json_agg

The json_agg function in PostgreSQL takes a record or a value as its argument and aggregates the results into a JSON array. Here’s a simple example:

SELECT json_agg(name) FROM customers;

Assuming customers is a table with a column name, this query would return all customer names as a JSON array.

Aggregating Multiple Columns

Not limited to a single column, json_agg can encapsulate multiple columns:

SELECT json_agg(row(name, email)) AS customer_info FROM customers;

With this, each customer’s name and email are aggregated into a JSON array of records.

Nested JSON Structures

You’re not confined to basic arrays. Nested JSON objects can be achieved like this:

SELECT json_agg(
    JSON_BUILD_OBJECT(
        'name', name,
        'email', email,
        'orders', (SELECT json_agg(product_name) FROM orders WHERE orders.customer_id = customers.id)
    )
) AS customer_data
FROM customers;

This query gathers a JSON array of customers, where each customer object includes an array of ordered products.

Joining Tables

json_agg becomes dramatically more powerful when you start joining tables:

SELECT categories.name, json_agg(products.*)
FROM categories
JOIN products ON products.category_id = categories.id
GROUP BY categories.name;

Every category will have an associated JSON array of products.

Filtering Results

You can also filter which records to include in your JSON array:

SELECT json_agg(
    JSON_BUILD_OBJECT(
        'name', name,
        'price', price
    ))
FROM products
WHERE price > 50.00;

All products costing more than $50 are included in the aggregated JSON.

Adding Order to Aggregated Data

To order the data within the json_agg, use a subquery:

SELECT json_agg(product ORDER BY price DESC) FROM (
    SELECT JSON_BUILD_OBJECT('name', name, 'price', price) AS product FROM products
) sub;

The resulting JSON array is ordered by price in descending order.

Handling NULL values

What if you have NULL values in your dataset? You can choose how to represent these using coalesce:

SELECT json_agg(
    COALESCE(name, 'Unknown name')
) FROM customers;

null names are now represented as ‘Unknown name’ in the JSON array.

Conclusion

The use of json_agg to produce structured JSON directly from SQL queries opens up great flexibility for delivering data. It is indispensable for modern, data-driven applications where PostgreSQL can hand off well-formed JSON to backend services.

Next Article: PostgreSQL: Update and Delete JSON Data

Previous Article: How to Drop an Index in PostgreSQL

Series: PostgreSQL Tutorials: From Basic to Advanced

PostgreSQL

You May Also Like

  • PostgreSQL with TimescaleDB: Querying Time-Series Data with SQL
  • PostgreSQL Full-Text Search with Boolean Operators
  • Filtering Stop Words in PostgreSQL Full-Text Search
  • PostgreSQL command-line cheat sheet
  • How to Perform Efficient Rolling Aggregations with TimescaleDB
  • PostgreSQL with TimescaleDB: Migrating from Traditional Relational Models
  • Best Practices for Maintaining PostgreSQL and TimescaleDB Databases
  • PostgreSQL with TimescaleDB: Building a High-Performance Analytics Engine
  • Integrating PostgreSQL and TimescaleDB with Machine Learning Models
  • PostgreSQL with TimescaleDB: Implementing Temporal Data Analysis
  • Combining PostgreSQL, TimescaleDB, and Airflow for Data Workflows
  • PostgreSQL with TimescaleDB: Visualizing Real-Time Data with Superset
  • Using PostgreSQL with TimescaleDB for Energy Consumption Analysis
  • PostgreSQL with TimescaleDB: How to Query Massive Datasets Efficiently
  • Best Practices for Writing Time-Series Queries in PostgreSQL with TimescaleDB
  • PostgreSQL with TimescaleDB: Implementing Batch Data Processing
  • Using PostgreSQL with TimescaleDB for Network Traffic Analysis
  • PostgreSQL with TimescaleDB: Troubleshooting Common Performance Issues
  • Building an IoT Data Pipeline with PostgreSQL and TimescaleDB