
SQLAlchemy: How to Bulk Insert Data into a Table

Last updated: January 04, 2024

Introduction

Bulk inserting data efficiently can be crucial to the performance of your application. SQLAlchemy provides several mechanisms for batch operations, which minimize per-row overhead and reduce the time spent in database transactions. In this guide, we’ll explore how to perform bulk inserts using SQLAlchemy, ranging from the basics to more advanced techniques.

Setting Up the Environment

Before you start, ensure that you have Python and SQLAlchemy installed. Install SQLAlchemy using pip install SQLAlchemy if you haven’t done so already. You will also need a database to interact with; for this guide, we’ll use SQLite as it requires no additional setup. We’ll define a User model mapped to a users table; both the ORM class and its underlying Core Table object are used in the examples below.

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker, declarative_base

# Create an engine that stores data in the local directory's
# sqlalchemy_example.db file.
engine = create_engine('sqlite:///sqlalchemy_example.db')

Base = declarative_base()

# Define the ORM model (and its underlying table)
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)

# The Core Table object for the same table, used for Core-level inserts later
users_table = User.__table__

# Create the table
Base.metadata.create_all(engine)

# Create a Session class
Session = sessionmaker(bind=engine)

Basic Bulk Insert

To perform a basic bulk insert, you can use the Session.bulk_insert_mappings() method. It takes the mapped class (our User model) and a list of plain dictionaries, so many rows can be inserted at once without instantiating full ORM objects.

session = Session()

users = [
    {'name': 'John Doe', 'age': 28},
    {'name': 'Jane Doe', 'age': 25},
    # ... more user dicts
]

# Bulk insert mappings
session.bulk_insert_mappings(User, users)
session.commit()
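
If you are on SQLAlchemy 2.0, the same bulk insert can also be expressed with the unified insert() construct executed through the session; a minimal sketch, assuming the User model and Session factory from the setup above:

from sqlalchemy import insert

# Passing a list of dictionaries alongside an insert() statement performs an
# ORM-enabled bulk insert in SQLAlchemy 2.0.
session = Session()
session.execute(insert(User), [
    {'name': 'John Doe', 'age': 28},
    {'name': 'Jane Doe', 'age': 25},
])
session.commit()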

Using Insert Constructs for Large Batches

For even larger datasets, the SQL insert construct from SQLAlchemy Core can be more efficient, since it bypasses the ORM entirely. Here’s how to use it:

insert_stmt = users_table.insert().values(
    [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 22}]  # list of dicts
)

# engine.begin() opens a connection and commits the transaction on success
with engine.begin() as conn:
    conn.execute(insert_stmt)
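
An alternative that most database drivers handle efficiently through their executemany() path is to pass the list of parameter dictionaries directly to execute() instead of embedding them in the statement with values(); a sketch under the same setup:

# The statement carries no values; the rows are supplied as parameters,
# letting the driver batch them via executemany().
stmt = users_table.insert()

with engine.begin() as conn:
    conn.execute(stmt, [
        {'name': 'Alice', 'age': 30},
        {'name': 'Bob', 'age': 22},
        # ... more rows
    ])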

Dealing with Duplicates During Bulk Insert

Handle duplicates during bulk insert operations by using database upsert mechanisms such as PostgreSQL’s ON CONFLICT. These are exposed through the dialect-specific insert construct and require a unique constraint or index on the conflict column (here, name). Here’s an example using SQLAlchemy Core:

from sqlalchemy.dialects.postgresql import insert as pg_insert

insert_stmt = pg_insert(users_table).values(
    [{'name': 'Charlie', 'age': 35}, {'name': 'Dana', 'age': 19}]
).on_conflict_do_update(
    index_elements=['name'],  # requires a unique constraint/index on "name"
    set_=dict(age=users_table.c.age + 1)
)

# Note: this statement only compiles against a PostgreSQL connection,
# not the SQLite engine created in the setup section.
with engine.begin() as conn:
    conn.execute(insert_stmt)
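
Because the engine in this guide points at SQLite, the PostgreSQL statement above will not run as-is. SQLite exposes the same upsert feature through its own dialect module (SQLAlchemy 1.4+); the unique index on name below is an assumption added here so the conflict target exists:

from sqlalchemy import Index
from sqlalchemy.dialects.sqlite import insert as sqlite_insert

# Assumption: the setup did not declare "name" as unique, so create a unique
# index here to serve as the ON CONFLICT target.
Index('ix_users_name', users_table.c.name, unique=True).create(engine, checkfirst=True)

upsert_stmt = sqlite_insert(users_table).values(
    [{'name': 'Charlie', 'age': 35}, {'name': 'Dana', 'age': 19}]
).on_conflict_do_update(
    index_elements=['name'],
    set_={'age': users_table.c.age + 1}
)

with engine.begin() as conn:
    conn.execute(upsert_stmt)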

Performance Considerations

When performing bulk operations, there are some best practices to follow for better performance:

  • Disable the ORM’s autoflush during bulk operations to prevent unnecessary flushes.
  • Use a single transaction and commit only after all the data has been inserted (both points are illustrated in the sketch after this list).
  • Understand your database’s bulk insert capabilities and limitations (for example, maximum statement size or parameter limits).
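
As a rough illustration of the first two points, here is a minimal sketch assuming the User model and Session factory from the setup section; the chunk size of 1,000 is an arbitrary choice you should tune for your workload:

from itertools import islice

def chunked(iterable, size):
    # Yield successive lists of at most `size` items
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

rows = [{'name': f'user_{i}', 'age': 20 + (i % 50)} for i in range(10_000)]

# autoflush=False avoids intermediate flushes; commit once at the very end.
with Session(autoflush=False) as session:
    for chunk in chunked(rows, 1_000):
        session.bulk_insert_mappings(User, chunk)
    session.commit()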

Advanced Bulk Operations

SQLAlchemy’s Session also provides the bulk_save_objects() method (available since version 1.0) for bulk operations that work with actual ORM objects, which is useful when primary key or foreign key handling comes into play. Here’s a quick example:

session = Session()

users = [
    {'name': 'Eva', 'age': 28},
    # ... more user dicts
]

# Build ORM objects from the dictionaries and save them in bulk
session.bulk_save_objects([
    User(**data) for data in users
])
session.commit()
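
If you need the generated primary keys back on the objects, bulk_save_objects() accepts a return_defaults=True flag, at the cost of some of the bulk speed-up; a small sketch (the rows here are made up):

# return_defaults=True fetches server-generated values such as the primary key
# back onto each object, which may force row-by-row INSERT statements.
new_users = [User(name='Helen', age=41), User(name='Ivan', age=33)]  # hypothetical rows
session.bulk_save_objects(new_users, return_defaults=True)
session.commit()

print([u.id for u in new_users])  # the ids are now populated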

Using the SQLAlchemy ORM for Bulk Inserts

If you require the features of the ORM for your bulk inserts, for instance to trigger events or handle relationships, you can use the session.add_all() method with instances of the User model defined earlier:

session = Session()

# Define some user instances (the User model was declared in the setup section)
users = [User(name='Frank', age=45), User(name='Grace', age=30)]

# Use add_all for bulk insertion; these objects go through the full unit of work
session.add_all(users)
session.commit()
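
One thing add_all() gives you that the bulk_* methods skip is the ORM event machinery; as a quick illustration (the listener and the extra row are hypothetical), a before_insert listener fires for every object flushed this way:

from sqlalchemy import event

@event.listens_for(User, 'before_insert')
def log_new_user(mapper, connection, target):
    # Runs once per User flushed through the regular unit of work (e.g. add_all)
    print(f'Inserting user: {target.name}')

session.add_all([User(name='Henry', age=52)])  # hypothetical row
session.commit()  # the listener prints during the flush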

Conclusion

Using SQLAlchemy for bulk inserts can improve the performance of your application by reducing transaction times and resource usage. Whether you’re dealing with basic bulk inserts or managing more complex scenarios with duplicate values or cascade operations, SQLAlchemy offers the tools you’ll need. Remember to benchmark and profile your bulk operations to ensure they are optimized for your specific case.

