PyMongo: How to Define and Use Custom Types

Updated: February 12, 2024 By: Guest Contributor

Overview

PyMongo, the official MongoDB driver for Python, makes working with MongoDB from Python straightforward. As an application grows, however, you may need to store and query values of custom Python types that BSON, MongoDB’s document format, does not support natively. This tutorial shows how to define such a type, store it with PyMongo, and read it back.

Getting Started

First, make sure PyMongo is installed in your Python environment. You can install it with pip:

pip install pymongo

With PyMongo installed, the next step is to understand how MongoDB can hold custom types. MongoDB stores documents in BSON, which supports types such as integers, strings, and dates but has no notion of arbitrary Python classes. In this tutorial, a custom object is therefore serialized to bytes, stored as BSON binary data, and deserialized back into a Python object when it is read.

Defining Custom Types

Let’s define a simple custom type. Imagine a Python application that manages books and records each book’s dimensions in a dedicated class, which has no BSON equivalent.

class BookDimension:
    def __init__(self, width, height, depth):
        self.width = width
        self.height = height
        self.depth = depth

    def __repr__(self):
        return f'BookDimension(width={self.width}, height={self.height}, depth={self.depth})'

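This custom type must be serialized before it can be stored in MongoDB. PyMongo supports registering custom encoders and decoders for this purpose (`TypeCodec` and `TypeRegistry` in `bson.codec_options`); the sections below take the simpler route of converting the object by hand.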

Serializing Custom Types

To serialize the BookDimension object into a BSON-friendly format, you can use the `bson.Binary` class along with Python’s `pickle` module for the serialization process.

import pickle
from bson.binary import Binary

def serialize_book_dimension(book_dimension):
    """Pickle a BookDimension and wrap the bytes as BSON binary data."""
    return Binary(pickle.dumps(book_dimension))

Deserializing Custom Types

Equally important is the ability to recover the original Python object from the stored Binary data. This process is known as deserialization.

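def deserialize_book_dimension(book_dimension_binary):
    """Rebuild a BookDimension from the stored binary field."""
    # Binary is a bytes subclass, so pickle.loads accepts it directly.
    return pickle.loads(book_dimension_binary)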
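Python’s documentation warns that unpickling untrusted data can execute arbitrary code, so it may be worth limiting what the deserializer is willing to load. The sketch below follows the "restricting globals" pattern from the `pickle` documentation; `BookDimensionUnpickler` and `safe_deserialize_book_dimension` are hypothetical names, and the allow-list of a single class is an assumption made for this example.

```python
import io
import pickle


class BookDimension:
    def __init__(self, width, height, depth):
        self.width = width
        self.height = height
        self.depth = depth


class BookDimensionUnpickler(pickle.Unpickler):
    """Unpickler that refuses to load anything except BookDimension."""

    def find_class(self, module, name):
        allowed = (BookDimension.__module__, BookDimension.__qualname__)
        if (module, name) == allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def safe_deserialize_book_dimension(data):
    # Accepts plain bytes or a bson Binary value (a bytes subclass).
    return BookDimensionUnpickler(io.BytesIO(data)).load()


payload = pickle.dumps(BookDimension(7, 10, 1.5))
restored = safe_deserialize_book_dimension(payload)
print(restored.width, restored.height, restored.depth)  # prints: 7 10 1.5
```

A payload that references any other class or function, for example `pickle.dumps(len)`, raises `pickle.UnpicklingError` instead of being loaded.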

With these functions in hand, you can now insert documents containing custom types into your MongoDB database.

Inserting Documents with Custom Types

To demonstrate, let’s consider a MongoDB collection named ‘books’ and insert a document including our custom type.

from pymongo import MongoClient

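# Replace the placeholder with your own connection string,
# for example 'mongodb://localhost:27017' for a local server.
client = MongoClient('mongodb_connection_string')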
books_collection = client.mydatabase.books

document = {
    'title': 'Python Programming',
    'dimensions': serialize_book_dimension(BookDimension(7, 10, 1.5))
}

books_collection.insert_one(document)

The stored document now holds the ‘dimensions’ field as binary data, which can be deserialized back into a BookDimension object on retrieval.

Retrieving and Using Custom Types

To retrieve documents containing custom types, you’ll essentially reverse the serialization process.

document = books_collection.find_one({'title': 'Python Programming'})
deserialized_dimensions = deserialize_book_dimension(document['dimensions'])

print(deserialized_dimensions)

This will output something similar to:

BookDimension(width=7, height=10, depth=1.5)
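Note that `find_one` returns None when nothing matches, in which case indexing the result raises a TypeError. A small wrapper can guard against that; the sketch below is one possible shape, where `load_dimensions` is a hypothetical helper and `collection` is assumed to be a PyMongo collection such as `books_collection`.

```python
import pickle


class BookDimension:
    def __init__(self, width, height, depth):
        self.width = width
        self.height = height
        self.depth = depth


def load_dimensions(collection, title):
    """Return the BookDimension stored for `title`, or None if absent."""
    # The projection asks the server for the 'dimensions' field only.
    document = collection.find_one({'title': title}, {'dimensions': 1})
    if document is None or 'dimensions' not in document:
        return None
    # The field comes back as bytes (or a bytes subclass), which
    # pickle.loads accepts directly.
    return pickle.loads(document['dimensions'])
```

Calling `load_dimensions(books_collection, 'Python Programming')` would then yield either a BookDimension or None, which the caller can check before using the value.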

Advanced Use Case: Custom Type with Query Support

Serialized data is opaque to MongoDB, so a query cannot filter on attributes inside the stored object. For more advanced scenarios, such as querying on attributes of a custom type, you need an additional strategy: store queryable fields alongside the serialized data, or use MongoDB’s aggregation framework over those fields for more complex queries.

For instance, storing dimensions separately for querying could look like this:

document = {
    'title': 'Python Programming',
    'dimensions': serialize_book_dimension(BookDimension(7, 10, 1.5)),
    'width': 7,
    'height': 10,
    'depth': 1.5
}

books_collection.insert_one(document)

With this structure, it’s possible to query the collection based on height, width, or depth directly.
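A query over those top-level fields could be sketched as follows. The helper names `dimension_range_filter` and `find_titles` are hypothetical; `$gte` and `$lte` are standard MongoDB comparison operators, and `collection` is assumed to be a PyMongo collection such as `books_collection`.

```python
def dimension_range_filter(min_height=None, max_width=None):
    """Build a MongoDB filter on the top-level 'height' and 'width' fields."""
    query = {}
    if min_height is not None:
        query['height'] = {'$gte': min_height}
    if max_width is not None:
        query['width'] = {'$lte': max_width}
    return query


def find_titles(collection, query):
    """Return the titles of all documents matching `query`."""
    # Project only the title; _id is excluded explicitly.
    cursor = collection.find(query, {'title': 1, '_id': 0})
    return [doc['title'] for doc in cursor]
```

For example, `find_titles(books_collection, dimension_range_filter(min_height=9))` would return the titles of books at least 9 units tall, without touching the serialized ‘dimensions’ field at all.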

Conclusion

Storing custom types in a MongoDB-powered application with PyMongo comes down to serialization and deserialization: convert the object to binary data before writing, and rebuild it after reading. Keeping queryable copies of the attributes you filter on alongside the serialized value lets you search by those attributes as well. Together, these techniques let you persist complex Python objects in your MongoDB collections.