MySQL 8: How to change charset of a whole database

Updated: January 25, 2024 By: Guest Contributor Post a comment

Introduction

Understanding how to manipulate and manage character sets in databases is crucial for proper data storage and retrieval, especially when working with internationalization and localization. With the recent updates in MySQL 8.0, we have a more robust support for character sets which enhances performance, security, and reliability of data storage. In this guide, we will dive into the step-by-step process to change the charset of a whole MySQL 8 database, ensuring that your database characters are consistent and accurately represented.

Understanding Character Sets and Collations

In MySQL, a character set is a set of symbols and encodings. A database’s character set determines the possible characters that can be stored in the text-based data types like CHAR, VARCHAR, and TEXT. Collations determine how sorting and comparison operations are performed on string data. MySQL 8.0 supports multiple character sets and collations, which you can configure according to the nature and requirements of your information.

It is important to know the current character set and collation of the database before making any changes. With the following SQL query, you can fetch the character set and collation for each schema in your MySQL server:

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA;

This query will output the default character set and collation for each database on your MySQL server.

Choosing the Right Character Set and Collation

For most applications, ‘utf8mb4’ is recommended because it supports every character in the Unicode standard, including emojis. ‘utf8mb4’ is the default character set in MySQL 8.0. Here is an example of setting this character set along with its collation:

SET NAMES 'utf8mb4' COLLATE 'utf8mb4_0900_ai_ci';

Be sure to choose the collation that best suits your needs in terms of sensitivity to case, accent, and language.

Changing the Default Charset and Collation for New Tables

If you wish to set a default charset and collation for future tables created within a database, you can use the following command:

ALTER DATABASE database_name
CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

Replace ‘database_name’ with the name of your database. This command does not convert existing tables and data, it sets the default for future tables.

Converting Existing Tables

To change the character set and collation for existing tables, you’ll need to go through each table separately. This is crucial because, otherwise, there can be data loss if the conversion cannot be performed without issues. The below SQL code changes the character set and collation for a single table:

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Make sure to take a backup of your database before doing this because the conversion may modify the data.

Automating the Charset Change for All Tables

If you need to convert all the tables in a database to a new charset and collation, a script like the following could be used:

SELECT CONCAT('ALTER TABLE `', table_name, '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;') 
FROM information_schema.TABLES 
WHERE table_schema = "database_name";

Executing this query will list all the ‘ALTER TABLE’ commands necessary to convert your tables. You can copy the output and execute it to change the charset for every table within the database.

Handling Errors and Potential Issues

During charset conversion, you may encounter errors or issues, such as:

  • Data truncation: If the new charset cannot store some of the characters, those might get lost.
  • Index length: With ‘utf8mb4’, the maximum length of index fields is effectively less. Adjustment of indexes may be needed.

Always test changes on a non-production database and ensure all issues are resolved before applying changes to the live system.

Final Steps and Considerations

Once the conversion has been completed, it’s critical that all applications using the database are aware of the charset change. This means updating connections and ensuring that all new data is stored using the correct character set and collation. In MySQL 8, these settings can be applied on a per-client basis as such:

SET collation_connection = 'utf8mb4_unicode_ci';

Lastly, keep in mind to configure backups and other maintenance tools to use the new character set and collation.

Conclusion

In summary, changing the character set and collation of a MySQL 8 database requires a structured approach involving changing the defaults for the database, converting each table, and ensuring that existing applications continue to function correctly. With careful planning and execution, you can upgrade your database’s text handling capabilities to be fully internationalized, optimized, and future-proof.