MySQL 8: Can a table store 100 million rows?

Updated: January 25, 2024 By: Guest Contributor

Introduction

As businesses and applications continue to grow and generate massive amounts of data, database administrators and developers increasingly encounter the challenge of scaling databases to handle large datasets. A common question is whether MySQL, one of the most popular open-source relational database management systems, can efficiently store and handle tables with 100 million rows or more.

In this tutorial, we will explore the feasibility of storing 100 million rows in a MySQL 8 table and discuss the various aspects that need to be considered, including hardware requirements, table design, indexing, query optimization, and partitioning. Buckle up as we dive deep into the realm of large datasets!

Understanding MySQL Limits

Before we start pushing boundaries, we must understand the theoretical and practical limits of MySQL. The server itself does not impose a hard limit on the number of rows a table can contain; the practical limits come from available disk space and the maximum tablespace size. InnoDB, the default storage engine in MySQL 8, can theoretically handle a table of up to 64TB with the default 16KB page size.
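A quick back-of-envelope calculation shows how far 100 million rows sit below that ceiling. The 100-byte average row size used here is an illustrative assumption; your real rows may be larger or smaller:

```python
# Back-of-envelope check: how much space would 100 million rows need?
ROWS = 100_000_000
AVG_ROW_BYTES = 100              # assumed average row size (data + per-row overhead)
INNODB_MAX_BYTES = 64 * 1024**4  # InnoDB's theoretical 64 TB tablespace limit

table_bytes = ROWS * AVG_ROW_BYTES
print(f"Estimated table size: {table_bytes / 1024**3:.1f} GiB")
print(f"Fraction of InnoDB limit: {table_bytes / INNODB_MAX_BYTES:.6%}")
```

Even with a generous row-size estimate, 100 million rows occupy on the order of tens of gigabytes, a tiny fraction of what InnoDB can address. The real constraints are performance-related, not capacity-related.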

Hardware Considerations

Managing such a large dataset requires powerful hardware. MySQL performance often depends on disk I/O, memory, and processing power. Ensure you have:

  • SSDs for faster data access and transfer rates.
  • Adequate memory (RAM) so the InnoDB buffer pool can hold as much of your working set as possible.
  • Multiple cores on your server to handle concurrent processes efficiently.

Example Configuration

[mysqld]
# Cache data and indexes in memory; commonly sized at 50-75% of RAM
# on a dedicated database server (64G assumes a large machine).
innodb_buffer_pool_size = 64G
# Larger redo logs smooth out write-heavy workloads.
innodb_log_file_size = 1G
# Flush the redo log to disk once per second instead of at every commit;
# faster writes at the cost of losing up to ~1s of transactions on a crash.
innodb_flush_log_at_trx_commit = 2
# Store each table in its own .ibd file (the default in MySQL 8).
innodb_file_per_table = ON

Database and Table Design

Effective design and normalization are crucial. Avoid unnecessary columns and choose the smallest data type that fits the data. For example, use INT rather than BIGINT for integer keys unless you genuinely need the larger range:

CREATE TABLE users (
    user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    join_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id)
) ENGINE=InnoDB;
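To see why data-type choices matter at this scale, here is a quick sketch (pure arithmetic, no MySQL required) of what BIGINT costs over INT across 100 million rows. Keep in mind that InnoDB secondary indexes also store a copy of the primary key, so a wider key is paid for in every index as well:

```python
# Cost of BIGINT vs INT for a column stored in 100 million rows.
ROWS = 100_000_000
INT_BYTES, BIGINT_BYTES = 4, 8  # MySQL storage sizes for INT and BIGINT

saved_bytes = ROWS * (BIGINT_BYTES - INT_BYTES)
print(f"Choosing INT over BIGINT saves {saved_bytes / 1024**2:.0f} MiB per column")
```

Roughly 381 MiB per column may sound modest, but multiplied across several columns and every secondary index, it translates into real buffer-pool pressure and I/O.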

Indexing Strategies

Indexes are essential for read performance. However, each additional index slows down writes, because every index must be maintained on INSERT, UPDATE, and DELETE operations. Choose indexes wisely:

CREATE INDEX idx_username ON users (username);

This command creates a secondary index so that lookups by username no longer require a full table scan.
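The effect is easy to demonstrate. The sketch below uses SQLite, which ships with Python, as a convenient stand-in: its EXPLAIN QUERY PLAN plays the same role as MySQL's EXPLAIN, and the table mirrors the users example above. The principle carries over directly to MySQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT)")

query = "SELECT * FROM users WHERE username = 'johndoe'"

# Without an index on username, the planner falls back to a full scan.
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(before)   # e.g. "SCAN users"

con.execute("CREATE INDEX idx_username ON users (username)")

# With the index in place, the same query becomes an index search.
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(after)    # e.g. "SEARCH users USING INDEX idx_username (username=?)"
```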

Query Optimization

Avoid queries that scan the entire table. Use EXPLAIN to understand how MySQL executes a query:

EXPLAIN SELECT * FROM users WHERE username = 'johndoe';

This reveals which indexes are used and whether any full table scans are involved. Adjust your queries and indexes based on this feedback.

Partitioning

Partitioning divides a table into parts, making it easier to manage and query large data volumes. You can partition by range, list, hash, or key:

CREATE TABLE users (
    ...
) ENGINE=InnoDB
PARTITION BY RANGE (user_id) (
    PARTITION p0 VALUES LESS THAN (1000000),
    PARTITION p1 VALUES LESS THAN (2000000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

However, partitioning is not always the right choice. Consider it if you frequently run queries that read or write a huge amount of data with distinguishable ranges or characteristics.
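A 100-million-row table partitioned in 1M-row steps would need dozens of clauses, at which point it is easier to generate the DDL than to type it. A small sketch (the step size and partition count are illustrative; the output matches the RANGE example above):

```python
# Generate RANGE partition clauses like those in the DDL above:
# one partition per `step` user_id values, plus a MAXVALUE catch-all.
def range_partitions(step, count):
    parts = [f"PARTITION p{i} VALUES LESS THAN ({(i + 1) * step})"
             for i in range(count)]
    parts.append(f"PARTITION p{count} VALUES LESS THAN MAXVALUE")
    return ",\n".join(parts)

ddl = "PARTITION BY RANGE (user_id) (\n" + range_partitions(1_000_000, 2) + "\n);"
print(ddl)
```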

Load Testing and Monitoring

Use load testing tools like mysqlslap, Sysbench, or others to simulate workloads on your database. Monitoring with tools like Innotop, MySQL Workbench, Prometheus, and Grafana can help you observe real workloads and spot bottlenecks to optimize further.

Data Insertion Example

When initially populating your table with large amounts of data, bulk-load it rather than inserting rows one at a time:

LOAD DATA INFILE '/path/to/large_dataset.csv' INTO TABLE users
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';

This example uses the LOAD DATA INFILE statement, which is much faster than inserting rows individually. Note that the server's secure_file_priv setting restricts which directories the server is allowed to read files from, so check it before loading.
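If you need to produce such a file yourself, Python's csv module can write data in exactly the shape the statement above expects. The file name and the three sample rows here are illustrative:

```python
import csv

# Three sample rows matching the users table: user_id, username, email.
rows = [(i, f"user{i}", f"user{i}@example.com") for i in range(1, 4)]

with open("large_dataset.csv", "w", newline="") as f:
    # QUOTE_ALL matches ENCLOSED BY '"'; "\n" matches LINES TERMINATED BY '\n'.
    writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)

print(open("large_dataset.csv").read())
```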

Conclusion

MySQL 8 is more than capable of storing and handling tables with 100 million rows or more, given the necessary hardware resources and appropriate database design and optimization. Effective schema design, indexing, and partitioning will empower your database to handle large data volumes confidently.