PyMongo bulk insert: How to insert a list of documents/dicts

Last updated: February 06, 2024

Introduction

Inserting many documents into a MongoDB collection at once is a common operation, and it matters most when you are loading large datasets or need fast ingestion. PyMongo, the official Python driver for MongoDB, supports bulk inserts directly. In this article, we will explore how to use PyMongo to bulk insert a list of documents (Python dictionaries) into a collection.

Getting Started with PyMongo

Before diving into the bulk insert operation, it’s essential to have MongoDB installed and running, and the PyMongo library installed in your Python environment. You can install PyMongo using pip:

pip install pymongo

Connecting to MongoDB

To perform any operations with PyMongo, you first need to establish a connection with your MongoDB database:

from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client['mydatabase']
collection = db['mycollection']

Basic Bulk Insert

The most straightforward way to insert multiple documents is by using the insert_many() method. Here’s a simple example that inserts a list of dictionaries into the collection:

documents = [
  {'name': 'John Doe', 'age': 30, 'city': 'New York'},
  {'name': 'Jane Doe', 'age': 25, 'city': 'Los Angeles'}
]

result = collection.insert_many(documents)
print(f'Inserted {result.inserted_ids}')
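
One detail worth knowing: insert_many() adds an _id key in place to every dictionary that lacks one, so passing the same list a second time fails with duplicate key errors. The sketch below shows one possible way around this, assuming you want to reuse the original dictionaries. The helper name fresh_copies is hypothetical and not part of PyMongo, and the insert call itself is left as a comment so the snippet runs without a server.

```python
import copy


def fresh_copies(documents):
    """Return independent deep copies of the given documents.

    fresh_copies is a hypothetical helper for illustration, not a PyMongo API.
    """
    return [copy.deepcopy(doc) for doc in documents]


templates = [
    {'name': 'John Doe', 'age': 30, 'city': 'New York'},
    {'name': 'Jane Doe', 'age': 25, 'city': 'Los Angeles'},
]

batch = fresh_copies(templates)
# With a live collection you would now call:
#   collection.insert_many(batch)
# PyMongo would then add '_id' to the dicts in `batch`, not to `templates`.
```

Copying costs extra memory, so for very large lists it may be preferable to build the dictionaries fresh for each insert instead.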

Handling Duplicates

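If your collection has a unique index on one of the fields (the _id field always has one) and you try to insert documents with duplicate values in that field, insert_many() raises a BulkWriteError whose details describe each duplicate key error. By default (ordered=True), the operation stops at the first failure and the remaining documents are not inserted. Passing ordered=False does not suppress the error, but it makes MongoDB attempt every document and report all failures at the end, so the non-duplicate documents are still inserted: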

documents = [
  {'name': 'John Doe', 'age': 30, 'city': 'New York', '_id': 1},
  {'name': 'Jane Doe', 'age': 25, 'city': 'Los Angeles', '_id': 1}  # Duplicate _id
]

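from pymongo.errors import BulkWriteError

try:
  collection.insert_many(documents, ordered=False)
except BulkWriteError as e:
  # e.details is a dict; its 'writeErrors' list has one entry per failed document
  print(e.details)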
  # Handle the error as needed
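
As one possible way to "handle the error", the sketch below separates duplicate key failures from other write errors. It relies on the details dictionary carrying an nInserted count and a writeErrors list whose entries include index and code, with 11000 being MongoDB's duplicate key error code. The function name summarize_bulk_errors and the sample dictionary are illustrative assumptions, not PyMongo APIs or real server output.

```python
DUPLICATE_KEY_CODE = 11000  # MongoDB's error code for a duplicate key


def summarize_bulk_errors(details):
    """Split the write errors found in a BulkWriteError.details dict.

    Returns (inserted_count, duplicate_indexes, other_errors).
    summarize_bulk_errors is a hypothetical helper, not part of PyMongo.
    """
    duplicates = []
    others = []
    for err in details.get('writeErrors', []):
        if err.get('code') == DUPLICATE_KEY_CODE:
            # 'index' is the position of the failed document in the input list
            duplicates.append(err['index'])
        else:
            others.append(err)
    return details.get('nInserted', 0), duplicates, others


# A hand-written dict shaped like e.details for the example above
# (illustrative values, not captured from a real server).
sample_details = {
    'nInserted': 1,
    'writeErrors': [
        {'index': 1, 'code': 11000, 'errmsg': 'duplicate key (illustrative)'},
    ],
}

inserted, duplicate_indexes, other_errors = summarize_bulk_errors(sample_details)
print(inserted, duplicate_indexes, other_errors)
```

In an except block you would pass e.details instead of sample_details, then decide whether duplicates can be ignored and whether any remaining errors should be re-raised.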

Advanced Bulk Operations

For more complex bulk operations, you can pass a list of InsertOne, UpdateOne, UpdateMany, ReplaceOne, DeleteOne, or DeleteMany requests to the bulk_write() method. This lets you mix different kinds of writes in a single call instead of sending each one separately. Here’s an example of inserting and updating documents in a single operation:

from pymongo import InsertOne, UpdateOne
bulk_operations = [
  InsertOne({'name': 'Alice', 'age': 29}),
  UpdateOne({'name': 'John Doe'}, {'$set': {'age': 31}})
]

result = collection.bulk_write(bulk_operations)
print(f'Operations completed. Inserted: {result.inserted_count}, Updated: {result.modified_count}')

Performance Considerations

When performing bulk operations, especially with large datasets, there are a few considerations to keep in mind:

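  • Batch size: You do not have to split a large list yourself to satisfy server limits. PyMongo automatically divides a bulk operation into batches that fit what the server accepts, which is up to 100,000 write operations per batch on MongoDB 3.6 and later.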
  • Network latency: If you’re inserting a very large amount of data, consider sending it in smaller batches of your own, so that the full dataset never has to be held in memory at once and a network failure affects only the batch in flight.
  • Write concern: Adjusting the write concern can improve performance but at the potential cost of durability. Be cautious when using a lower write concern.
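
The batching advice above can be sketched as follows. This is a minimal illustration under stated assumptions: chunked and insert_in_batches are hypothetical helper names, the default batch size of 1000 is arbitrary, and the collection argument is any object with an insert_many() method, such as a PyMongo collection.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError('size must be at least 1')
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(collection, documents, batch_size=1000):
    """Insert documents batch by batch and return the total number inserted.

    `collection` is any object with an insert_many(docs, ordered=...) method
    whose result exposes inserted_ids, such as a PyMongo collection.
    """
    total = 0
    for batch in chunked(documents, batch_size):
        # ordered=False lets the rest of a batch proceed if one document fails
        result = collection.insert_many(batch, ordered=False)
        total += len(result.inserted_ids)
    return total
```

Because chunked() accepts any iterable, the documents can come from a generator, for example one that reads rows from a file, so the whole dataset never needs to be in memory. A call might look like insert_in_batches(collection, documents, batch_size=5000). Note that a BulkWriteError raised by one batch stops the loop, so wrap the call in a try block if later batches should still run.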

Conclusion

Using PyMongo’s insert_many() or bulk_write() methods, Python developers can efficiently insert or perform multiple operations on a collection in MongoDB. These methods are powerful tools for dealing with large datasets or complex data manipulation needs. By understanding their nuances and how to manage potential pitfalls, such as duplicates or performance issues, developers can harness the full potential of MongoDB through PyMongo.
