How to correctly save Emoji in MySQL 8

Updated: January 26, 2024 By: Guest Contributor

Introduction

With the rise of social media and international communication, emojis have become a ubiquitous part of our digital conversations. MySQL 8.0 can store them reliably, as long as the database, tables, columns, and client connections all use the utf8mb4 character set, which is the server default in MySQL 8.0. This tutorial shows how to correctly save emoji in MySQL 8 using utf8mb4 and an appropriate collation. We'll also cover best practices and walk through several code examples to help you get started.

Understanding UTF8MB4

Before we dive into the implementation, it is essential to understand that MySQL's utf8 character set (an alias for utf8mb3) stores at most 3 bytes per character. It therefore covers only the Basic Multilingual Plane (BMP, U+0000 to U+FFFF). Most emojis are supplementary characters (U+10000 and above), and UTF-8 encodes these as 4-byte sequences. To accommodate them, MySQL provides the utf8mb4 character set. It is a superset of utf8mb3 that can store 4-byte sequences and therefore every Unicode character.
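A quick way to see the difference is to count UTF-8 bytes per code point. The Python sketch below is illustrative only and does not talk to MySQL. `utf8_byte_len` and `fits_utf8mb3` are hypothetical helpers. The second one approximates whether a string could be stored in a utf8mb3 column by checking that every code point lies in the BMP. It also shows why a few emoji, such as the red heart, work even without utf8mb4.

```python
def utf8_byte_len(ch: str) -> int:
    """Number of bytes one code point occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def fits_utf8mb3(text: str) -> bool:
    """True if every code point is in the BMP (<= U+FFFF), i.e. needs at most 3 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)

# Latin letter, accented letter, BMP heart symbol, supplementary emoji.
for ch in ["A", "\u00e9", "\u2764", "\U0001F602"]:
    print(f"U+{ord(ch):04X}: {utf8_byte_len(ch)} byte(s)")
# U+0041: 1 byte(s)
# U+00E9: 2 byte(s)
# U+2764: 3 byte(s)
# U+1F602: 4 byte(s)

print(fits_utf8mb3("I \u2764\ufe0f MySQL 8"))    # True: heart and VS16 are both in the BMP
print(fits_utf8mb3("Database fun! \U0001F602"))  # False: U+1F602 needs 4 bytes
```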

Setting up the Database

First, let’s ensure that our database can store emojis by creating a database with the appropriate settings:

CREATE DATABASE emoji_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Alternatively, if you need to update an existing database:

ALTER DATABASE existing_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Adjusting Table and Column Settings

Next, we make sure the tables and columns are also set to use utf8mb4.

CREATE TABLE emoji_table (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  content TEXT NOT NULL
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

If you have an existing table:

ALTER TABLE existing_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
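Converting an existing table can change more than the character set. The MySQL manual notes that CONVERT TO CHARACTER SET may promote VARCHAR or TEXT columns to a larger type so the column can still hold as many characters as before. For example, a latin1 TEXT column becomes MEDIUMTEXT when converted to utf8mb4. The Python sketch below approximates that worst-case arithmetic, assuming 4 bytes per utf8mb4 character. `smallest_text_type` is a hypothetical helper, not MySQL's actual implementation.

```python
# Maximum length in bytes of each type in MySQL's TEXT family.
TEXT_TYPES = [
    ("TINYTEXT", 255),
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]

def smallest_text_type(max_chars: int, bytes_per_char: int) -> str:
    """Smallest TEXT type whose byte limit covers max_chars worst-case characters."""
    needed = max_chars * bytes_per_char
    for name, limit in TEXT_TYPES:
        if needed <= limit:
            return name
    raise ValueError("value too large for any TEXT type")

# A latin1 TEXT column can hold up to 65,535 one-byte characters.
print(smallest_text_type(65_535, 4))       # MEDIUMTEXT
# A utf8mb3 TEXT column holds at most 65,535 // 3 = 21,845 characters.
print(smallest_text_type(65_535 // 3, 4))  # MEDIUMTEXT
```

After converting, it is worth running SHOW CREATE TABLE to see which column types actually changed.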

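To change a specific column, restate its full definition (including NOT NULL and any DEFAULT). CHANGE replaces the existing definition rather than merging with it: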

ALTER TABLE existing_table CHANGE column_name column_name TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Inserting and Selecting Emoji Data

Now that our database and tables are ready, let’s insert some emoji data into our table.

INSERT INTO emoji_table (content)
VALUES ('I ❤️ MySQL 8');
INSERT INTO emoji_table (content)
VALUES ('Database fun! 😂👍🏻');

Next, let’s verify that our emoji data has been saved correctly.

SELECT content FROM emoji_table WHERE id = 1;

You should see your emoji in the results:

I ❤️ MySQL 8

And for the second record:

Database fun! 😂👍🏻

Emojis in SQL Query Statements

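Filtering on emojis in SQL queries can be tricky because what looks like a single emoji may be a sequence of several code points. Examples include a base emoji plus a skin-tone modifier, a base character plus a variation selector, or a zero-width-joiner sequence for gendered and family emoji. Here's a simple filter for one specific emoji: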

SELECT * FROM emoji_table WHERE content LIKE '%😂%';

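This returns every row whose content contains the 'face with tears of joy' emoji. Be aware that utf8mb4_unicode_ci, the collation used in this tutorial, treats all supplementary characters as equal. The pattern can therefore also match rows that contain other emoji, such as 😃. For an exact match, compare with a binary collation, for example WHERE content LIKE '%😂%' COLLATE utf8mb4_bin.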
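To see why one visible emoji can behave like several, it helps to list the code points behind it. The Python sketch below is illustrative only. `code_points` is a hypothetical helper, and the substring checks assume plain code-point matching, roughly what LIKE '%...%' does under a binary collation such as utf8mb4_bin. The output shows that ❤️ and 👍🏻 are each two code points. It also shows that a search for the bare 👍 matches the skin-toned variant, while a different skin tone does not.

```python
from typing import List

def code_points(text: str) -> List[str]:
    """Return the code points of a string as 'U+XXXX' labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

heart = "\u2764\ufe0f"                  # red heart + VARIATION SELECTOR-16
thumbs_light = "\U0001F44D\U0001F3FB"  # thumbs up + light skin tone modifier
print(code_points(heart))         # ['U+2764', 'U+FE0F']
print(code_points(thumbs_light))  # ['U+1F44D', 'U+1F3FB']

# A row like the second one inserted above, searched with plain code-point matching.
row = "Database fun! \U0001F602" + thumbs_light
print("\U0001F44D" in row)            # True: bare thumbs up is a prefix of the sequence
print("\U0001F44D\U0001F3FF" in row)  # False: a different skin tone does not match
```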

Dealing with Collations

Collations in MySQL control how strings are compared and sorted. utf8mb4_unicode_ci is a common choice for multilingual data. It compares strings case-insensitively using weights from the Unicode Collation Algorithm (UCA) version 4.0.0, not raw code point values. Its drawback for emoji is that UCA 4.0.0 collations give every supplementary character the same weight, so most emoji compare as equal to each other.

For emoji-heavy data, consider utf8mb4_0900_ai_ci instead. It is the default collation for utf8mb4 as of MySQL 8.0, is based on UCA 9.0.0, and assigns distinct emoji distinct weights, so they compare and sort individually. If you need exact, code-point-level comparison, use utf8mb4_bin.
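According to the MySQL manual, UCA 4.0.0 collations such as utf8mb4_unicode_ci give every supplementary character the weight of U+FFFD, while utf8mb4_bin compares code points. The Python sketch below is a deliberately simplified toy model of that single rule. It ignores case folding and real multi-level UCA weights, and the function names are hypothetical. It shows why two different emoji can compare as equal under one collation and differ under another.

```python
# Toy model, not MySQL's real algorithm: it keeps only the supplementary-character rule.
def toy_unicode_ci_weight(ch: str) -> int:
    """utf8mb4_unicode_ci-like: every supplementary character weighs as U+FFFD."""
    cp = ord(ch)
    return 0xFFFD if cp > 0xFFFF else cp

def bin_weight(ch: str) -> int:
    """utf8mb4_bin-like: compare by code point."""
    return ord(ch)

def equal_under(weight, a: str, b: str) -> bool:
    """Compare two strings character by character using the given weight function."""
    return len(a) == len(b) and all(weight(x) == weight(y) for x, y in zip(a, b))

joy, smiley = "\U0001F602", "\U0001F603"  # face with tears of joy, smiling face with open mouth
print(equal_under(toy_unicode_ci_weight, joy, smiley))  # True: both collapse to U+FFFD
print(equal_under(bin_weight, joy, smiley))             # False: different code points
```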

Back-Up and Restore

When backing up your databases, make sure your backup procedures and tools use utf8mb4. With mysqldump, for example, you can pass --default-character-set=utf8mb4 explicitly. Similarly, before restoring, verify that the target server's default character set and collation settings are appropriate to prevent data loss or corruption.

Troubleshooting Common Issues

Unexpected String Lengths

Emoji take up to 4 bytes per code point and often span several code points, so string-length functions can return results you did not expect. CHAR_LENGTH() counts characters (code points), while LENGTH() counts bytes. Use LENGTH() when you need the storage size in bytes.

SELECT LENGTH('😂'), CHAR_LENGTH('😂');

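With a utf8mb4 connection, the first function returns 4 (the number of bytes in the emoji's UTF-8 encoding) and the second returns 1 (one character). Keep in mind that CHAR_LENGTH() counts code points, not visible symbols. An emoji with a skin-tone modifier or a variation selector, such as 👍🏻 or ❤️, has a CHAR_LENGTH() of 2.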
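The same counts can be reproduced outside the database. In the Python sketch below, `mysql_length` and `mysql_char_length` are hypothetical helpers meant to mirror LENGTH() and CHAR_LENGTH() on utf8mb4 strings. They assume that utf8mb4 bytes are plain UTF-8 and that Python's `len()` on a `str` counts code points.

```python
def mysql_length(text: str) -> int:
    """Byte count of the UTF-8 (utf8mb4) encoding, analogous to LENGTH()."""
    return len(text.encode("utf-8"))

def mysql_char_length(text: str) -> int:
    """Code point count, analogous to CHAR_LENGTH() on a utf8mb4 string."""
    return len(text)

joy = "\U0001F602"                     # face with tears of joy
heart = "\u2764\ufe0f"                 # heart + VARIATION SELECTOR-16
thumbs_light = "\U0001F44D\U0001F3FB"  # thumbs up + skin tone modifier

print(mysql_length(joy), mysql_char_length(joy))                    # 4 1
print(mysql_length(heart), mysql_char_length(heart))                # 6 2
print(mysql_length(thumbs_light), mysql_char_length(thumbs_light))  # 8 2
```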

Connection and Client Settings

Finally, complete your emoji support by configuring the client and connection character sets:

SET NAMES 'utf8mb4';

Run this statement at the start of each MySQL session, before any other statements, so that the session uses the correct character set. You should also configure your MySQL client applications and drivers to use utf8mb4. Otherwise, emoji data can be garbled when your application sends it to or retrieves it from the database.
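To illustrate what goes wrong when the connection character set is misconfigured, the Python sketch below decodes utf8mb4 bytes as Latin-1. It is a simplified model. Python's "latin-1" codec is ISO-8859-1, whereas MySQL's latin1 is closer to cp1252; we use ISO-8859-1 here because it maps all 256 byte values. In practice the damage is often worse: when a character cannot be represented in the connection character set, it is typically replaced with '?', and that loss cannot be undone.

```python
original = "Database fun! \U0001F602"
utf8_bytes = original.encode("utf-8")  # the bytes a utf8mb4 column holds

# A client that wrongly decodes those bytes as Latin-1 turns the single
# 4-byte emoji into four unrelated characters (mojibake).
garbled = utf8_bytes.decode("latin-1")
print(ascii(garbled))               # 'Database fun! \xf0\x9f\x98\x82'
print(len(original), len(garbled))  # 15 18

# Latin-1 maps every byte to exactly one character, so in this narrow case
# the damage can still be undone by re-encoding.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```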

Notes

Full-Text Searches with Emojis

MySQL's built-in full-text parser does not tokenize emojis as separate words. Some applications, however, need emojis to be treated as separate words during searches. To achieve this, you may need to pre-process the text in your application or write a custom full-text parser plugin.
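One application-side option is to store a pre-processed copy of the text, with spaces around each emoji, in a column covered by the FULLTEXT index. The Python sketch below is a hypothetical helper built on a rough regular expression. Its code point ranges are an assumption that covers many common emoji but not flags, keycaps, or zero-width-joiner sequences. Punctuation also stays attached to words. Single-character tokens may fall below MySQL's minimum token size (see innodb_ft_min_token_size), so test the result on your own server.

```python
import re

# Heuristic: a pictographic base character, optionally followed by
# VARIATION SELECTOR-16 or a skin-tone modifier (U+1F3FB to U+1F3FF).
EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF][\uFE0F\U0001F3FB-\U0001F3FF]*")

def tokenize_with_emoji(text: str) -> list:
    """Split on whitespace, emitting each emoji (with its modifiers) as its own token."""
    spaced = EMOJI.sub(lambda m: f" {m.group(0)} ", text)
    return spaced.split()

tokens = tokenize_with_emoji("Database fun! \U0001F602\U0001F44D\U0001F3FB")
print(len(tokens))  # 4: 'Database', 'fun!', the joy emoji, and thumbs up with its modifier
print(" ".join(tokens) == "Database fun! \U0001F602 \U0001F44D\U0001F3FB")  # True
```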

Storing Variations of Emojis

Variation selectors follow a base character and request a particular style. U+FE0E requests text presentation and U+FE0F requests emoji presentation, as in ❤︎ versus ❤️. Because they are ordinary code points, utf8mb4 stores them like any other character. Preserve them rather than stripping them, since they carry the user's intent and affect how the character is displayed.
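The Python sketch below shows that the two presentations are distinct strings. It also shows that Unicode normalization leaves variation selectors untouched, so removing them is always an explicit choice. `strip_variation_selectors` is a hypothetical helper, included only to show what that choice discards.

```python
import unicodedata

heart_text = "\u2764\ufe0e"   # heart + U+FE0E (text presentation)
heart_emoji = "\u2764\ufe0f"  # heart + U+FE0F (emoji presentation)

print(heart_text == heart_emoji)   # False: they are distinct strings
print(unicodedata.name("\ufe0f"))  # VARIATION SELECTOR-16

# NFC normalization does not remove or merge variation selectors.
print(unicodedata.normalize("NFC", heart_emoji) == heart_emoji)  # True

def strip_variation_selectors(text: str) -> str:
    """Drop VS15/VS16, losing the presentation hint the user chose."""
    return text.replace("\ufe0e", "").replace("\ufe0f", "")

# After stripping, the two styles become indistinguishable.
print(strip_variation_selectors(heart_text) == strip_variation_selectors(heart_emoji))  # True
```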

Conclusion

Supporting emoji in MySQL 8 involves careful planning, from setting up your database with the correct character set and collation to properly managing client connections. By following the steps outlined in this guide, you’ll ensure that your application can handle emojis seamlessly, allowing users to emote to their heart’s content.