4 ways to bulk insert data in Doctrine & Symfony

Updated: January 13, 2024 By: Guest Contributor

Introduction

Working with databases efficiently is crucial for web applications. One common task is inserting a large amount of data into a database. This can be resource-intensive, especially when records are added one by one. In a Symfony application using Doctrine ORM, handling bulk inserts efficiently is key to maintaining performance. This tutorial provides an overview of multiple methods to perform bulk data insertions in Doctrine when building applications with the Symfony framework.

Understanding Doctrine’s UnitOfWork

Before diving into bulk insert operations, it is important to understand Doctrine’s UnitOfWork pattern. The UnitOfWork tracks changes to entities during an application’s lifecycle and executes the corresponding database operations when flush() is called. Every managed entity stays in memory until it is flushed and cleared, so managing too many entities at once can lead to memory issues.
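
To make the pattern concrete, here is a minimal sketch (assuming $em is an EntityManagerInterface and YourEntity has a name property; both are placeholders):

$entity = new YourEntity();
$entity->setName('example');

$em->persist($entity); // the UnitOfWork now tracks this entity
$em->flush();          // the INSERT runs here, not at persist()

$entity->setName('renamed');
$em->flush();          // the UnitOfWork detects the change and issues an UPDATE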

Method 1: Using EntityManager’s flush()

Starting with the most basic way to perform bulk insert, we will use the Doctrine ORM’s EntityManager:

// src/Service/DataImporter.php
namespace App\Service;

use Doctrine\ORM\EntityManagerInterface;
use App\Entity\YourEntity;

class DataImporter {
    private $em;

    public function __construct(EntityManagerInterface $entityManager) {
        $this->em = $entityManager;
    }
    public function bulkInsert(array $data) {
        foreach ($data as $item) {
            $entity = new YourEntity();
            // ... set properties of the entity

            // Queue the entity for insertion; nothing hits the database yet
            $this->em->persist($entity);
        }

        // One flush() executes all queued INSERTs at once
        $this->em->flush();
    }
}

The persist() method informs Doctrine that the entity should be saved, and flush() executes the insert queries for all persisted entities at once. Calling flush() outside the loop is crucial for bulk inserts: flushing on every iteration means one database round trip per record and degrades performance. Note that all persisted entities stay in memory until flush() runs, which is why this approach does not scale to very large datasets. Also avoid calling detach() on an entity before flushing, as that cancels its scheduled insertion.

Method 2: Batch Processing

For better memory management and performance, Doctrine recommends batch processing:

public function bulkInsert(array $data) {
    $batchSize = 20;

    foreach ($data as $index => $item) {
        $entity = new YourEntity();
        // ... set properties of the entity
        $this->em->persist($entity);

        // $index is zero-based (assumes a sequentially indexed array), so
        // offset by one to flush only after each full batch
        if ((($index + 1) % $batchSize) === 0) {
            $this->em->flush();
            $this->em->clear(); // Detaches all objects from Doctrine
        }
    }

    $this->em->flush(); // Persist objects that did not meet the condition
    $this->em->clear();
}

This approach is more efficient: flushing in batches keeps the UnitOfWork small, which reduces memory usage and speeds up the insert. Keep in mind that clear() detaches every managed entity, including ones unrelated to the import, so re-fetch any entities you still need afterwards.

Method 3: Using Native SQL

If maximum performance is required, you can resort to native SQL for bulk inserts:

// src/Service/DataImporter.php

// ... class boilerplate

public function bulkInsert(array $data) {
    $connection = $this->em->getConnection();
    $sql = 'INSERT INTO your_table (column1, column2) VALUES (?, ?)';
    $stmt = $connection->prepare($sql);

    // Wrapping the loop in one transaction avoids a commit per row
    $connection->beginTransaction();
    try {
        foreach ($data as $item) {
            $stmt->bindValue(1, $item['column1']);
            $stmt->bindValue(2, $item['column2']);
            $stmt->executeStatement(); // DBAL 3+; on DBAL 2 this was $stmt->execute()
        }
        $connection->commit();
    } catch (\Throwable $e) {
        $connection->rollBack();
        throw $e;
    }
}

Native SQL circumvents Doctrine’s UnitOfWork entirely and interacts with the database connection directly, which makes it the fastest option. The trade-off is that lifecycle callbacks and entity listeners do not fire, and you must keep the SQL in sync with your schema yourself.
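
To reduce round trips even further, you can combine many rows into a single INSERT statement. The following is a sketch, not part of the original example: it chunks the data (very large chunks can exceed database placeholder limits) and binds all values positionally via DBAL’s executeStatement():

public function bulkInsertMultiRow(array $data, int $chunkSize = 500) {
    $connection = $this->em->getConnection();

    foreach (array_chunk($data, $chunkSize) as $chunk) {
        // One "(?, ?)" placeholder group per row in this chunk
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
        $sql = "INSERT INTO your_table (column1, column2) VALUES $placeholders";

        $params = [];
        foreach ($chunk as $item) {
            $params[] = $item['column1'];
            $params[] = $item['column2'];
        }

        // Executes one INSERT for the whole chunk (DBAL 2.11+/3.x)
        $connection->executeStatement($sql, $params);
    }
}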

Method 4: Bulk Inserts with Third-party Bundles

Symfony has a vibrant ecosystem of third-party bundles and libraries. For bulk inserts, you may consider DoctrineBatchUtils (the ocramius/doctrine-batch-utils package):
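
As a minimal sketch, assuming the package is installed (composer require ocramius/doctrine-batch-utils), its SimpleBatchIteratorAggregate wraps the whole iteration in a transaction and calls flush() and clear() for you at each batch boundary. Double-check the library’s README, as the exact API may differ between versions:

use DoctrineBatchUtils\BatchProcessing\SimpleBatchIteratorAggregate;

// $data is an array of plain rows; YourEntity is the placeholder entity from above
$iterator = SimpleBatchIteratorAggregate::fromArrayResult(
    $data,
    $this->em,
    100 // flush() and clear() every 100 iterations
);

foreach ($iterator as $item) {
    $entity = new YourEntity();
    // ... set properties from $item
    $this->em->persist($entity);
    // no manual flush()/clear() here: the aggregate handles both per batch
}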

This library provides convenient iterators for batch operations that can significantly enhance performance for bulk inserts. Check the library’s documentation for the full API.

Considerations When Bulk Inserting

When dealing with bulk insert operations in Doctrine, it’s crucial to optimize your approach for efficiency and performance. Here are some key considerations:

Disabling SQL Logging

Doctrine’s SQL logger retains every executed query in memory. This is invaluable for debugging (Symfony’s dev environment enables it for the profiler), but during a large batch insert the accumulated log alone can consume significant memory.

To mitigate this, disable SQL logging for the duration of the batch by fetching the connection’s configuration and unsetting the logger:

$entityManager->getConnection()->getConfiguration()->setSQLLogger(null);

This step can dramatically reduce memory usage during bulk inserts. Note that setSQLLogger() is deprecated in DBAL 3 and removed in DBAL 4, where logging is configured through middleware instead; in a Symfony application you can simply run the import in an environment where Doctrine logging is disabled (such as prod).
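
For standalone DBAL 3.x usage, a rough sketch of the middleware-based equivalent is to build the connection with an empty middleware list, so no logging middleware is registered ($params stands for your usual connection parameters; this is an illustration, not Symfony-specific wiring):

use Doctrine\DBAL\Configuration;
use Doctrine\DBAL\DriverManager;

$config = new Configuration();
$config->setMiddlewares([]); // no logging middleware registered

$connection = DriverManager::getConnection($params, $config);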

Tuning the Batch Size for Your Environment

The batch size, or the number of records processed in a single operation, can significantly impact performance. A smaller batch size might not fully utilize the database’s capabilities, while a very large batch size might strain the database or consume too much memory.

To find the optimal batch size, you should experiment with different sizes and monitor the memory usage and execution time. The optimal batch size can vary based on the complexity of your data and the database’s specifications.

Here’s an example of batch processing in Doctrine:

$batchSize = 20;
for ($i = 1; $i <= 1000; $i++) {
    $entity = new MyEntity();
    // ... set entity data
    $entityManager->persist($entity);

    if (($i % $batchSize) === 0) {
        $entityManager->flush();
        $entityManager->clear(); // Detaches all objects from Doctrine
    }
}
$entityManager->flush(); // Persist objects that did not make up an entire batch
$entityManager->clear();
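
To compare batch sizes empirically, you can wrap the loop above in rough timing and memory measurements. This is only a sketch; MyEntity and the fixed row count are placeholders, and memory_reset_peak_usage() requires PHP 8.2+:

foreach ([20, 100, 500] as $batchSize) {
    memory_reset_peak_usage(); // PHP 8.2+; otherwise compare memory_get_usage() deltas
    $start = microtime(true);

    for ($i = 1; $i <= 1000; $i++) {
        $entity = new MyEntity();
        // ... set entity data
        $entityManager->persist($entity);

        if (($i % $batchSize) === 0) {
            $entityManager->flush();
            $entityManager->clear();
        }
    }
    $entityManager->flush();
    $entityManager->clear();

    printf(
        "batch size %d: %.2f s, peak memory %.1f MB\n",
        $batchSize,
        microtime(true) - $start,
        memory_get_peak_usage(true) / 1048576
    );
}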

Maintaining Database Indexes

Database indexes are crucial for performance, especially for read operations. However, during bulk insert operations, these indexes can become a bottleneck as the database needs to update the indexes after each insert.

One strategy is to temporarily disable indexes during the bulk insert and rebuild them once the operation is complete. This approach, however, depends on the specific database you’re using and its capabilities.
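
As an illustration only (the exact statements are database-specific and come with caveats: MySQL’s DISABLE KEYS only affects non-unique indexes on MyISAM tables, and on PostgreSQL you would drop and recreate the index instead), a MySQL/InnoDB import might relax integrity checks for the duration of the operation:

$connection = $entityManager->getConnection();

// Only do this when you are certain the incoming data is consistent!
$connection->executeStatement('SET unique_checks = 0');
$connection->executeStatement('SET foreign_key_checks = 0');

// ... perform the bulk insert here ...

$connection->executeStatement('SET unique_checks = 1');
$connection->executeStatement('SET foreign_key_checks = 1');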

Using Asynchronous Processing

For extremely large datasets, consider using asynchronous processing. This approach involves queueing the data and using a background process to perform the inserts. Libraries like Symfony Messenger (or Laravel’s queue component in other stacks) can manage asynchronous jobs in PHP; a Messenger sketch follows below.

Asynchronous processing reduces the load on your main application and provides a more responsive user experience. However, it adds complexity to your system and requires a reliable way to process the queue.
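
A minimal sketch with Symfony Messenger might queue chunks of rows as messages and insert them in a handler. ImportRowsMessage and the reuse of the DataImporter service from Method 2 are assumptions for illustration, and the #[AsMessageHandler] attribute requires Symfony 6.1+:

// src/Message/ImportRowsMessage.php
namespace App\Message;

class ImportRowsMessage {
    public function __construct(public readonly array $rows) {}
}

// src/MessageHandler/ImportRowsHandler.php
namespace App\MessageHandler;

use App\Message\ImportRowsMessage;
use App\Service\DataImporter;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
class ImportRowsHandler {
    public function __construct(private DataImporter $importer) {}

    public function __invoke(ImportRowsMessage $message): void {
        // Reuse the batch insert from Method 2 for each queued chunk
        $this->importer->bulkInsert($message->rows);
    }
}

Dispatching then becomes a matter of chunking the input and handing each chunk to the message bus:

// e.g. in a controller or console command, with MessageBusInterface $bus injected
foreach (array_chunk($allRows, 1000) as $chunk) {
    $bus->dispatch(new ImportRowsMessage($chunk));
}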

Conclusion

In this tutorial, we explored various methods to perform bulk data insertions in Doctrine and Symfony, from standard flush operations to native SQL and third-party libraries. By leveraging the right technique, you can maintain a well-performing Symfony application even when handling large volumes of data.