PyMongo bulk insert: How to insert a list of documents/dicts

Updated: February 6, 2024 By: Guest Contributor

Introduction

In the world of MongoDB, a popular document-oriented NoSQL database, efficiently inserting multiple documents into a collection is a common operation. This becomes even more crucial when dealing with large datasets or in scenarios requiring quick data insertion. Python, with its PyMongo library, provides robust methods for interacting with MongoDB, including the ability to perform bulk inserts. In this article, we will explore how to effectively use PyMongo for bulk inserting a list of documents or dictionaries into MongoDB.

Getting Started with PyMongo

Before diving into the bulk insert operation, it’s essential to have MongoDB installed and running, and the PyMongo library installed in your Python environment. You can install PyMongo using pip:

pip install pymongo

Connecting to MongoDB

To perform any operations with PyMongo, you first need to establish a connection with your MongoDB database:

from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client['mydatabase']
collection = db['mycollection']

Basic Bulk Insert

The most straightforward way to insert multiple documents is by using the insert_many() method. Here’s a simple example that inserts a list of dictionaries into the collection:

documents = [
  {'name': 'John Doe', 'age': 30, 'city': 'New York'},
  {'name': 'Jane Doe', 'age': 25, 'city': 'Los Angeles'}
]

result = collection.insert_many(documents)
print(f'Inserted {result.inserted_ids}')
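One practical detail when the list is built dynamically: to the best of my knowledge, insert_many() rejects an empty list (PyMongo raises a TypeError), so it can help to guard the call. The following is a minimal sketch; insert_documents is a hypothetical helper name, and the function assumes only that the object passed in exposes insert_many() the way a PyMongo collection does.

```python
def insert_documents(collection, documents):
    """Insert documents with insert_many(), skipping the call when there is nothing to insert.

    `collection` is assumed to behave like a PyMongo collection.
    Returns the list of inserted _id values (empty if nothing was inserted).
    """
    documents = list(documents)  # also accepts generators and other iterables
    if not documents:
        # Nothing to insert, so avoid calling insert_many() with an empty list
        return []
    result = collection.insert_many(documents)
    return result.inserted_ids
```

With the collection object from the connection step, insert_documents(collection, documents) returns the same IDs that result.inserted_ids holds in the example above, and insert_documents(collection, []) simply returns an empty list.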

Handling Duplicates

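If your collection has a unique index on one of the fields (the _id field always has one) and you try to insert documents with duplicate values in that field, insert_many() raises a BulkWriteError whose details report a duplicate key error (code 11000). By default, inserts are ordered, so PyMongo stops at the first failure and the remaining documents are not inserted. Passing ordered=False tells the server to attempt every document and report all failures at the end, so you still need to catch the exception: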

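from pymongo.errors import BulkWriteError

documents = [
  {'name': 'John Doe', 'age': 30, 'city': 'New York', '_id': 1},
  {'name': 'Jane Doe', 'age': 25, 'city': 'Los Angeles', '_id': 1}  # Duplicate _id
]

try:
  # ordered=False keeps inserting the remaining documents after a failure
  collection.insert_many(documents, ordered=False)
except BulkWriteError as e:
  # e.details lists each failed write under 'writeErrors'
  print(e.details)
  # Handle the error as needed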
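To act on the error rather than just print it, you can inspect the details dictionary. The sketch below separates duplicate-key failures (error code 11000) from any other write errors. summarize_bulk_error is a hypothetical helper name, and sample_details is a hand-written stand-in for e.details, trimmed to the keys the helper reads, so that the snippet runs without a database.

```python
DUPLICATE_KEY_CODE = 11000  # MongoDB's error code for a unique-index violation

def summarize_bulk_error(details):
    """Split the write errors in a BulkWriteError.details dict into duplicates and other failures."""
    write_errors = details.get("writeErrors", [])
    duplicate_indexes = [
        err["index"] for err in write_errors if err.get("code") == DUPLICATE_KEY_CODE
    ]
    other_errors = [
        err for err in write_errors if err.get("code") != DUPLICATE_KEY_CODE
    ]
    return {
        "inserted": details.get("nInserted", 0),  # documents that were written
        "duplicate_indexes": duplicate_indexes,   # positions in the input list
        "other_errors": other_errors,             # anything that is not a duplicate
    }

# Hand-written sample (an assumption for illustration, not real server output)
sample_details = {
    "writeErrors": [
        {"index": 1, "code": 11000, "errmsg": "E11000 duplicate key error (sample message)"},
    ],
    "writeConcernErrors": [],
    "nInserted": 1,
}

summary = summarize_bulk_error(sample_details)
print(summary)
```

In the except block above, you would call summarize_bulk_error(e.details) and, for instance, ignore the duplicates while re-raising if other_errors is not empty.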

Advanced Bulk Operations

For more complex bulk operations, you can pass the bulk_write() method a list of InsertOne, UpdateOne, UpdateMany, ReplaceOne, DeleteOne, or DeleteMany operations. This lets you mix different kinds of writes in a single call, which PyMongo sends to the server in batches instead of one request per operation. Here’s an example of inserting and updating documents in a single operation:

from pymongo import InsertOne, UpdateOne
bulk_operations = [
  InsertOne({'name': 'Alice', 'age': 29}),
  UpdateOne({'name': 'John Doe'}, {'$set': {'age': 31}})
]

result = collection.bulk_write(bulk_operations)
print(f'Operations completed. Inserted: {result.inserted_count}, Updated: {result.modified_count}')

Performance Considerations

When performing bulk operations, especially with large datasets, there are a few considerations to keep in mind:

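  • Batch size: PyMongo automatically splits large bulk operations into batches that fit the server’s limits. Since MongoDB 3.6, a single batch can hold at most 100,000 write operations.
  • Network latency: If you’re inserting a very large amount of data, consider building and sending it in smaller batches yourself. Each request then stays small, a timeout or failure affects only one batch, and the client does not have to hold the whole dataset in memory.
  • Write concern: Lowering the write concern (for example, w=1 instead of w='majority', or w=0 for unacknowledged writes) can improve performance at the cost of durability guarantees. Be cautious when using a lower write concern; with w=0, write errors are not reported back to the client.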
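The batching advice above can be sketched as a small helper that feeds insert_many() one chunk at a time. The names chunked and insert_in_batches and the default batch size of 1000 are assumptions chosen for illustration, not PyMongo features; the helper assumes only that collection behaves like a PyMongo collection.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def insert_in_batches(collection, documents, batch_size=1000):
    """Insert `documents` batch by batch and return the number of documents inserted.

    `documents` may be a generator, so the full dataset never has to sit in memory.
    """
    total = 0
    for batch in chunked(documents, batch_size):
        # ordered=False lets the server continue past individual failures;
        # a BulkWriteError can still be raised and should be handled by the caller
        result = collection.insert_many(batch, ordered=False)
        total += len(result.inserted_ids)
    return total
```

A suitable batch size depends on document size and network conditions, so treat 1000 as a starting point to measure against rather than a recommendation.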

Conclusion

Using PyMongo’s insert_many() or bulk_write() methods, Python developers can efficiently insert or perform multiple operations on a collection in MongoDB. These methods are powerful tools for dealing with large datasets or complex data manipulation needs. By understanding their nuances and how to manage potential pitfalls, such as duplicates or performance issues, developers can harness the full potential of MongoDB through PyMongo.