Working with Binary data in MongoDB (with examples)

Updated: February 2, 2024 By: Guest Contributor

Introduction

Binary data covers content such as images, files, and other blobs that JSON cannot represent natively, because JSON has no type for raw bytes. MongoDB addresses this by storing documents as Binary JSON (BSON), a binary serialization of JSON-like documents that adds types missing from JSON, including one for binary data.

In this article, we will explore how to work with binary data in MongoDB by discussing relevant BSON data types, showcasing how to insert and retrieve binary data with actual code examples, and offering insights into best practices for binary data management within MongoDB databases.

Understanding BSON

BSON extends the JSON model with additional data types, such as the ‘binData’ type, which is specifically designed to hold binary data. Inside a BSON document, a binary value is stored as raw bytes preceded by a length and a subtype byte that describes what the bytes represent. Base64 only appears in textual representations, such as MongoDB Extended JSON or the shell's BinData helper. Subtypes can signal generic binary data, UUIDs, MD5 hashes, and user-defined binary types.
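As an illustration of how BSON carries binary data, here is a minimal Node.js sketch of the value layout described in the BSON specification: a 32-bit little-endian length, one subtype byte, then the raw bytes. The helper name encodeBinaryValue is hypothetical and exists only for this example; in practice the driver performs this encoding for you.

```javascript
// Hypothetical helper (not a driver API): encodes the value part of a BSON
// binary element as int32 length (little-endian) + subtype byte + raw bytes.
function encodeBinaryValue(bytes, subtype = 0x00) {
    const payload = Buffer.from(bytes);
    const out = Buffer.alloc(4 + 1 + payload.length);
    out.writeInt32LE(payload.length, 0); // number of payload bytes
    out.writeUInt8(subtype, 4);          // subtype (0x00 = generic binary)
    payload.copy(out, 5);                // the bytes themselves, not base64
    return out;
}

const encoded = encodeBinaryValue([0xde, 0xad, 0xbe, 0xef]);
console.log(encoded.toString('hex')); // prints 0400000000deadbeef
```

The four payload bytes appear unchanged at the end of the output, which is why storing binary data in BSON does not carry the size overhead of base64 text.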

Prerequisites

Before proceeding with the examples, please ensure you have:

  • An installation of MongoDB.
  • Familiarity with basic MongoDB operations.
  • A programming environment set up for MongoDB, with your preferred driver and tools such as Compass or the MongoDB Shell (mongosh).

Basic CRUD Operations with Binary Data

CRUD operations with binary data are similar to working with any other BSON data type within MongoDB. The following are basic code examples highlighting operations you can perform with binary data:

Inserting Binary Data

In the MongoDB Shell, the BinData helper creates a binary value from a subtype and a base64-encoded string:

// MongoDB Shell (mongosh)
const fs = require('fs');

// Read binary data from a file
const binaryData = fs.readFileSync('/path/to/your/file');

// Save binary data to MongoDB (BinData takes a subtype and a base64 string)
db.getCollection('binaries').insertOne({
    file_data: BinData(0, binaryData.toString('base64'))
});

When you insert binary data, it is stored as a binData object in the database. The zero in the BinData constructor represents the binary data subtype; in this case, ‘generic binary’.

Querying Binary Data

To retrieve binary data, you can query the collection like any other:

// MongoDB Shell (mongosh)
const binaryDoc = db.getCollection('binaries').findOne();
const fileData = binaryDoc.file_data; // a Binary (BinData) object, not a string

// Copy the raw bytes of the Binary value into a buffer
const buffer = Buffer.from(fileData.buffer);

In this example, the query returns the field as a Binary object, and we copy its bytes into a buffer to work with the data in its original form. This buffer can now be saved to a file, processed, or manipulated as needed.
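Base64 is only the text form used when handing bytes to BinData; the bytes themselves are unchanged by the trip. The following Node.js sketch shows that round trip without a database, using four sample bytes chosen for illustration:

```javascript
// Raw bytes as they might come from a file (sample values for illustration).
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47]);

// Text form accepted by the shell's BinData(subtype, base64String) helper.
const asBase64 = original.toString('base64');

// Decoding the base64 text yields the same bytes again.
const decoded = Buffer.from(asBase64, 'base64');

console.log(asBase64);                 // prints iVBORw==
console.log(decoded.equals(original)); // prints true
```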

Working with Specific Binary Subtypes

BSON defines multiple binary subtypes to categorize binary data. Each subtype tells readers of the data how the bytes are meant to be interpreted. A common one is subtype 4 (0x04), which marks a UUID.
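For reference, the subtype byte can be mapped to a readable name with a small lookup. This is a sketch covering a subset of the values listed in the BSON specification; the describeSubtype helper is hypothetical and not part of any driver.

```javascript
// Binary subtypes as listed in the BSON specification (subset).
const BINARY_SUBTYPES = {
    0x00: 'generic binary',
    0x01: 'function',
    0x02: 'binary (old, deprecated)',
    0x03: 'UUID (old, deprecated)',
    0x04: 'UUID',
    0x05: 'MD5',
    0x06: 'encrypted BSON value',
};

// Hypothetical helper: returns a readable name for a subtype byte.
function describeSubtype(subtype) {
    // 0x80 to 0xFF is the user-defined range.
    if (subtype >= 0x80 && subtype <= 0xff) return 'user defined';
    return BINARY_SUBTYPES[subtype] ?? 'unknown or reserved';
}

console.log(describeSubtype(0x04)); // prints UUID
console.log(describeSubtype(0x80)); // prints user defined
```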

Inserting and Retrieving UUIDs

Storing UUIDs with the UUID subtype helps applications interpret the binary data as intended:

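// Node.js with the official MongoDB driver
const { MongoClient, UUID } = require('mongodb');

const client = new MongoClient('connection_string');

async function run() {
    const db = client.db('database_name');

    // Create a random UUID instance
    const uuid = new UUID();

    // Save the UUID to MongoDB; the driver stores it as Binary subtype 4
    await db.collection('uuids').insertOne({ uniqueId: uuid });

    // Retrieve the document and rebuild a UUID from the stored bytes
    const document = await db.collection('uuids').findOne({ uniqueId: uuid });
    const myUuid = new UUID(document.uniqueId.buffer);
    console.log(myUuid.toString());
}

run()
    .catch(console.error)
    .finally(() => client.close());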

In these examples, we manipulate the binary data as a UUID object. When retrieving the data, we ensure that the application correctly interprets it as a UUID, not just a set of bytes.
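What the UUID subtype holds is simply the 16 raw bytes of the identifier. The sketch below shows the conversion between the canonical text form and those bytes in plain Node.js; uuidToBytes and bytesToUuid are hypothetical helpers written for this example, not driver functions.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: canonical UUID text -> 16 raw bytes.
function uuidToBytes(uuidString) {
    return Buffer.from(uuidString.replace(/-/g, ''), 'hex');
}

// Hypothetical helper: 16 raw bytes -> canonical 8-4-4-4-12 UUID text.
function bytesToUuid(bytes) {
    const hex = Buffer.from(bytes).toString('hex');
    return [
        hex.slice(0, 8),
        hex.slice(8, 12),
        hex.slice(12, 16),
        hex.slice(16, 20),
        hex.slice(20),
    ].join('-');
}

const id = randomUUID();
const bytes = uuidToBytes(id);
console.log(bytes.length);              // prints 16
console.log(bytesToUuid(bytes) === id); // prints true
```

Stored this way, a UUID takes 16 bytes of payload instead of the 36 characters of its string form.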

Handling Larger Files

For larger binary files (such as images or video clips), it may be more efficient to use MongoDB’s GridFS specification, which avoids issues with the size limits of BSON documents by storing files in chunks.
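To get a feel for the chunking, the sketch below estimates how many chunk documents a file needs, assuming the GridFS default chunk size of 255 KiB. The chunkLayout helper is hypothetical and only does the arithmetic; it does not talk to a database.

```javascript
// Default GridFS chunk size: 255 KiB.
const DEFAULT_CHUNK_SIZE_BYTES = 255 * 1024;

// Hypothetical helper: number of chunks and size of the final chunk
// for a file of the given size.
function chunkLayout(fileSizeBytes, chunkSizeBytes = DEFAULT_CHUNK_SIZE_BYTES) {
    const chunkCount = Math.ceil(fileSizeBytes / chunkSizeBytes);
    const lastChunkBytes = chunkCount === 0
        ? 0
        : fileSizeBytes - (chunkCount - 1) * chunkSizeBytes;
    return { chunkCount, lastChunkBytes };
}

console.log(chunkLayout(16 * 1024 * 1024));
// prints { chunkCount: 65, lastChunkBytes: 65536 }
```

Each chunk is an ordinary document well below the BSON size limit, so the file as a whole can be far larger than any single document.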

Using GridFS to Store Large Files

The GridFSBucket class in the MongoDB Node.js driver allows for the streamlined storage and retrieval of large files in MongoDB:

// Node.js using the MongoDB driver's GridFSBucket
const { MongoClient, GridFSBucket } = require('mongodb');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream/promises');

// Create the MongoDB client
const client = new MongoClient('connection_string');

async function uploadFile(filePath) {
    const db = client.db('database_name');
    const bucket = new GridFSBucket(db);

    // Stream the file into GridFS; pipeline rejects if either stream fails
    await pipeline(
        fs.createReadStream(filePath),
        bucket.openUploadStream(path.basename(filePath))
    );
    console.log('File uploaded successfully.');
}

uploadFile('/path/to/large/file')
    .catch((error) => console.log('Error:', error))
    .finally(() => client.close());

Likewise, downloading a file is simply a matter of streaming the file from the database back to the local file system or to another destination.
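A download might look like the following minimal sketch. It assumes that bucket is a GridFSBucket from the Node.js driver and uses its openDownloadStreamByName method; the downloadFile function and the paths in the usage comment are placeholders introduced for this example.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Streams a stored file to disk. `bucket` is expected to be a GridFSBucket;
// `filename` is the name the file was uploaded under.
async function downloadFile(bucket, filename, destinationPath) {
    // pipeline rejects if either the download or the write stream fails
    await pipeline(
        bucket.openDownloadStreamByName(filename),
        fs.createWriteStream(destinationPath)
    );
}

// Usage (assumes a connected MongoClient named `client`):
//   const bucket = new GridFSBucket(client.db('database_name'));
//   await downloadFile(bucket, 'file', '/path/to/destination');
```

Passing the bucket in as a parameter keeps the function independent of how the connection is created, which also makes it easy to exercise with a stand-in object.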

Best Practices for Managing Binary Data

Here are some best practices to follow when working with binary data in MongoDB:

  • Use the appropriate binary subtype to aid in data interpretation.
  • Leverage GridFS for storing files larger than the BSON document size limit, which is currently 16 MB (16 mebibytes).
  • Be cognizant of memory usage when working with large binary data in your application.
  • When possible, save references to binary data if the data is reused across documents, rather than duplicating the data itself.
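The size guideline above can be expressed as a small check. This is a sketch only: chooseStorage is a hypothetical helper, and the headroomBytes allowance for the document's other fields is an assumed value, not a MongoDB setting.

```javascript
// BSON document size limit: 16 MiB.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: suggests where to store a payload of the given size.
// `headroomBytes` is an assumed allowance for the document's other fields.
function chooseStorage(payloadBytes, headroomBytes = 1024) {
    return payloadBytes + headroomBytes <= MAX_BSON_DOCUMENT_BYTES
        ? 'inline-binary'
        : 'gridfs';
}

console.log(chooseStorage(5 * 1024));         // prints inline-binary
console.log(chooseStorage(20 * 1024 * 1024)); // prints gridfs
```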

Conclusion

By following this tutorial, you should be more comfortable storing, retrieving, and managing binary data in MongoDB. Leveraging BSON’s capabilities and MongoDB’s GridFS, managing binary data alongside traditional JSON documents is efficient and practical for a wide variety of application needs.