How to mass insert data into a table in MySQL 8

Updated: January 25, 2024 By: Guest Contributor

Introduction

When dealing with database operations, especially in systems that require high throughput, the efficiency of data insertion becomes crucial. In MySQL 8, several techniques can optimize the mass insertion of data, reducing I/O overhead and, consequently, total execution time. This tutorial will guide you through various methods to achieve mass data insertion into a MySQL table.

Preparation

Before we jump into mass insertion techniques, ensure that you have the following:

  • MySQL 8 installed and accessible on your system.
  • Proper understanding of SQL syntax.
  • A database and table where you wish to insert the data.

For simple demonstration purposes, let’s create a table named users that we will work with:

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(50) NOT NULL,
  email VARCHAR(100) NOT NULL,
  registered_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Basic Single Insert Statement

We start with the most straightforward approach, using a basic INSERT statement:

INSERT INTO users (username, email) VALUES ('johndoe', '[email protected]');

This approach is simple but not efficient for mass insertion, since it requires a separate round trip to the server for each record.

Inserting Multiple Rows with a Single Statement

A better way is to insert multiple records with a single statement:

INSERT INTO users (username, email)
VALUES 
  ('janelowe', '[email protected]'),
  ('alexsmith', '[email protected]'),
  ('lisagray', '[email protected]');

This method is more efficient as it reduces the number of server round trips. However, the length of an SQL statement is limited by the max_allowed_packet configuration parameter, so for very large data sets, this approach may require splitting the data into smaller chunks.
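
The current limit can be checked, and raised if necessary, before you decide on a chunk size. Here is a quick sketch, assuming you have the privileges to read and set global server variables; the 64 MB value is only an illustrative choice:

SHOW VARIABLES LIKE 'max_allowed_packet';
-- Raise the limit (in bytes) if needed; requires SUPER or SYSTEM_VARIABLES_ADMIN,
-- and the session value is read-only, so reconnect to pick up the new setting
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;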

Using LOAD DATA INFILE

For the most efficient mass insertion, especially with very large data sets, the LOAD DATA INFILE command is the optimal choice:

LOAD DATA INFILE '/path/to/your/data.csv'
INTO TABLE users
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

This bulk load operation is considerably faster than running many separate INSERT statements: the server reads the file directly, avoiding per-row statement parsing and network round trips, and can apply index updates more efficiently.
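
Keep in mind that MySQL 8 typically restricts where the server may read files from via the secure_file_priv variable, and the executing account needs the FILE privilege. The sketch below, which assumes local_infile is enabled on both client and server, shows how to check the allowed directory and how to read the file from the client machine instead:

SHOW VARIABLES LIKE 'secure_file_priv';
-- If the CSV lives on the client machine rather than the server, add the LOCAL keyword
LOAD DATA LOCAL INFILE '/path/to/your/data.csv' INTO TABLE users
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
IGNORE 1 LINES;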

Optimizations and Best Practices

  • Disabling Autocommit: For large transactions, disabling autocommit can greatly enhance performance. Wrap your insertions within a transaction, as shown in the combined sketch after this list.
  • Using Extended Inserts: Where possible, use the multi-row syntax as shown in the second method to reduce round trips.
  • Adjusting Bulk Insert Buffer Size: For bulk operations, adjust the bulk_insert_buffer_size system variable to allocate more memory for bulk insert operations.
  • Indexing Strategies: Avoid updating secondary indexes with each insert operation. If possible, create indexes after the bulk data load.
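
Below is a minimal sketch that combines these tips for a large load into the users table; the 64 MB buffer value and the idx_users_email index name are illustrative assumptions rather than recommendations:

-- Enlarge the bulk insert buffer for this session
-- (this variable mainly benefits MyISAM tables)
SET SESSION bulk_insert_buffer_size = 64 * 1024 * 1024;
-- Disable autocommit so all rows are committed together
SET autocommit = 0;
INSERT INTO users (username, email)
VALUES
  ('user1', '[email protected]'),
  ('user2', '[email protected]');
-- ... further multi-row INSERTs, chunked to stay under max_allowed_packet ...
COMMIT;
SET autocommit = 1;
-- Add secondary indexes after the data is in place rather than before
ALTER TABLE users ADD INDEX idx_users_email (email);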

Advanced Inserts Using Stored Procedures

For complex insertion logic or when you need to process data before inserting, a stored procedure might be your answer:

DELIMITER $$
CREATE PROCEDURE MassInsert()
BEGIN
  DECLARE i INT DEFAULT 1;
  -- Generate and insert 1,000 users, one row per iteration
  WHILE i <= 1000 DO
    INSERT INTO users (username, email)
    VALUES (CONCAT('user', i), CONCAT('user', i, '@example.com'));
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
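
Once the procedure exists, a single call generates the rows. A small sketch, assuming the default autocommit behavior, wraps the call in a transaction so the 1,000 inserts are committed together rather than one at a time:

-- Commit all rows generated by the procedure at once
START TRANSACTION;
CALL MassInsert();
COMMIT;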

Remember that for massive amounts of data, a stored procedure might not be the most efficient approach due to the row-by-row processing.

Monitoring Performance

To ensure that your mass insert operations perform as expected, monitor them using MySQL’s performance schema or the SHOW PROCESSLIST command.
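
For example, SHOW PROCESSLIST reveals long-running statements, while the Performance Schema (enabled by default in MySQL 8) can summarize how much time your inserts are costing. The digest query below is a sketch; adjust the LIKE pattern to your own table:

SHOW PROCESSLIST;
-- Aggregated timing for INSERT statements against the users table
-- (SUM_TIMER_WAIT is reported in picoseconds)
SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT/1e12 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE 'INSERT INTO `users`%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5;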

Conclusion

In summary, mass data insertion in MySQL 8 can be greatly optimized by using multi-row INSERT syntax, LOAD DATA INFILE for large data sets, and careful adjustment of system variables. The method you choose will largely depend on the size of your data and the specific requirements of your system. Always test different approaches to identify the most efficient one for your scenario.