This concise article shows you how to convert a character to a Unicode code point, and a Unicode code point back to a character, in Python.
What are Unicode code points?
A Unicode code point is a numerical value that represents a specific character in the Unicode standard. It is a unique identifier assigned to each character and can range from 0 to 0x10FFFF.
The Unicode code point allows for the universal representation and encoding of characters from various writing systems and languages. It provides a standardized way to represent and process text in different scripts and symbols.
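To make the range above more concrete, here is a minimal sketch that checks the upper bound through sys.maxunicode and prints code points in the conventional U+XXXX notation. The helper name to_u_notation is made up for this illustration and is not part of Python.

```python
import sys

# On modern Python 3, sys.maxunicode is the largest valid code point.
print(sys.maxunicode == 0x10FFFF)  # True


def to_u_notation(code_point: int) -> str:
    """Format a code point as U+XXXX (hypothetical helper for this example)."""
    # At least four uppercase hex digits, as code points are usually written.
    return f"U+{code_point:04X}"


print(to_u_notation(65))      # U+0041
print(to_u_notation(128516))  # U+1F604
```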
Convert a character to a Unicode code point in Python
Python provides a function named ord(), which takes a character as input and returns its corresponding Unicode code point. This is the tool we’ll use to get the job done.
Example:
character = 'A'
code_point = ord(character)
print(f"The code point of '{character}' is {code_point}")
character = '😄'
code_point = ord(character)
print(f"The code point of '{character}' is {code_point}")
Output:
The code point of 'A' is 65
The code point of '😄' is 128516
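Since ord() works on one character at a time, a whole string can be handled by looping over it. The snippet below is a small sketch of that idea; the variable names text and code_points are chosen only for this example.

```python
text = "Hi😄"

# Call ord() on each character to get a list of code points.
code_points = [ord(ch) for ch in text]

print(code_points)  # [72, 105, 128516]
```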
If you inadvertently (or intentionally) pass an invalid input to the ord() function, such as a string with more than one character, it will raise a TypeError, as shown below:
character = 'ABC_DEF'
code_point = ord(character)
# TypeError: ord() expected a character, but string of length 7 found
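If the input comes from somewhere you don't control, one option is to catch the exception. The following is a sketch under the assumption that returning None is acceptable for bad input; safe_ord is a hypothetical helper name, not a built-in.

```python
def safe_ord(value):
    """Return the code point of a one-character string, or None for invalid input.

    Hypothetical helper for illustration only.
    """
    try:
        return ord(value)
    except TypeError:
        # Raised for strings whose length is not 1 and for unsupported types.
        return None


print(safe_ord('A'))        # 65
print(safe_ord('ABC_DEF'))  # None
print(safe_ord(''))         # None
```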
Convert a Unicode code point to a character in Python
To convert a Unicode code point to its corresponding character, you can use Python's built-in chr() function.
Example:
code_point = 102
character = chr(code_point)
print(f"The character corresponding to code point {code_point} is '{character}'")
Output:
The character corresponding to code point 102 is 'f'
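Because chr() and ord() are inverses of each other, a round trip returns the value you started with. The short sketch below also shows that chr() accepts a hexadecimal literal and that several code points can be joined into a string; the variable names are illustrative only.

```python
# A code point written as a hexadecimal literal.
smiley = chr(0x1F604)
print(smiley)  # 😄

# Round trips: chr() undoes ord(), and ord() undoes chr().
print(chr(ord('A')) == 'A')    # True
print(ord(chr(102)) == 102)    # True

# Build a string from a list of code points.
message = "".join(chr(cp) for cp in [72, 105, 33])
print(message)  # Hi!
```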
What if you feed the chr() function an invalid, out-of-range input (the valid range is from 0 to 0x10FFFF)? Well, it will raise a ValueError, like this:
code_point = 10222111
character = chr(code_point)
# ValueError: chr() arg not in range(0x110000)
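To avoid the exception altogether, you can validate the number before calling chr(). This is a minimal sketch that assumes returning None is an acceptable fallback; safe_chr and MAX_CODE_POINT are names invented for this example.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the valid range


def safe_chr(code_point):
    """Return the character for a valid code point, or None otherwise.

    Hypothetical helper for illustration only.
    """
    # Only integers inside the valid range are passed on to chr().
    if isinstance(code_point, int) and 0 <= code_point <= MAX_CODE_POINT:
        return chr(code_point)
    return None


print(safe_chr(102))       # f
print(safe_chr(10222111))  # None
print(safe_chr(-1))        # None
```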