Sling Academy

JavaScript: Convert a string to Unicode code points (2 ways)

Last updated: August 05, 2023

Unicode code points are the numerical values that the Unicode standard assigns to characters. The code space runs from 0 to 0x10FFFF (1,114,112 possible values) and covers letters from many languages and scripts, symbols, and emojis. Converting a string to Unicode code points can be useful for encoding, decoding, escaping, or analyzing text data.
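Code points are not always the same thing as the "characters" JavaScript counts. JavaScript strings are sequences of UTF-16 code units, so a character above U+FFFF, such as the grinning-face emoji U+1F600, occupies two units. The minimal sketch below illustrates the difference. The sample string is made up for illustration.

```javascript
// "Hi" followed by the grinning-face emoji U+1F600, written with an ES6 escape
const sample = 'Hi\u{1F600}';

// .length counts UTF-16 code units, so the emoji counts twice
console.log(sample.length); // 4

// Spreading a string iterates by code point, so the emoji stays whole
const chars = [...sample];
console.log(chars.length); // 3

// One number per code point: 72 ('H'), 105 ('i'), 128512 (0x1F600)
const points = chars.map((c) => c.codePointAt(0));
console.log(points); // [ 72, 105, 128512 ]
```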

This concise, example-based article walks you through two ways to turn a given string into an array of Unicode code points. One is for modern JavaScript (ES6 and beyond), and the other is for classic JavaScript that can run on old browsers such as IE 10. Without any further ado, let’s get started.

Using String.prototype.codePointAt()

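This approach uses the built-in codePointAt() method of String.prototype, which returns the Unicode code point of the character at a given index in the string. Unlike the older charCodeAt(), it understands surrogate pairs. That means it handles any valid Unicode character, including emojis and other characters outside the Basic Multilingual Plane (BMP).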
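One detail is easy to miss: the index passed to codePointAt() counts UTF-16 code units, not characters. The short sketch below uses a made-up sample string to show what happens around an emoji. It also explains why the steps that follow call codePointAt(0) on each character yielded by for-of instead of looping over numeric indexes.

```javascript
// 'a', then U+1F600 (two code units), then 'b'
const text = 'a\u{1F600}b';

console.log(text.length); // 4

// Index 1 is the high surrogate: codePointAt() returns the whole code point
console.log(text.codePointAt(1)); // 128512

// Index 2 is the low surrogate: only that half comes back (0xDE00)
console.log(text.codePointAt(2)); // 56832

// 'b' sits at index 3, not 2
console.log(text.codePointAt(3)); // 98
```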

The steps to get the job done are:

  1. Declare an empty array to store the output code points.
  2. Use a for-of loop to iterate over each character in the string.
  3. Call codePointAt(0) on the current character to get its code point value. Each character yielded by for-of is a string of its own, so its code point is at index 0.
  4. Push the code point value to the output array using the Array.prototype.push() method.
  5. Return or log the output array.

Words might be confusing. Here’s an example:

// Input string
const str = 'Welcome to Sling Academy!';

// Output array
const codePoints = [];

// Loop over each character in the string
for (const char of str) {
  // Get the code point value of the character
  const codePoint = char.codePointAt(0);
  // Push the code point value to the output array
  codePoints.push(codePoint);
}

// Log the output array
console.log(codePoints);

Output:

[87, 101, 108, 99, 111, 109, 101, 32, 116, 111, 32, 83, 108, 105, 110, 103, 32, 65, 99, 97, 100, 101, 109, 121, 33]
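The same idea also fits in one line. Array.from() accepts a string, which it iterates by code point just like for-of, plus an optional mapping function. This is a sketch of a more compact equivalent rather than a different technique, and toCodePoints is a made-up name.

```javascript
// Hypothetical helper: Array.from() iterates by code point and maps each one
const toCodePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));

console.log(toCodePoints('Welcome to Sling Academy!').length); // 25

// Works for supplementary characters too: 'Go ', U+1F600, '!'
console.log(toCodePoints('Go \u{1F600}!')); // [ 71, 111, 32, 128512, 33 ]
```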

This approach relies on ES6 features (the for-of loop and codePointAt()), so it will not work in older browsers such as Internet Explorer. If that is a problem for you, the next section of this article is the way to go.

Using String.prototype.charCodeAt() and bitwise operations

This approach is compatible with older browsers and environments that do not support ES6 features. It can also handle any valid Unicode character in the string. The trade-off is that it is more complex and verbose than the preceding technique.

The core idea here is to use the built-in charCodeAt() method of String.prototype. It returns the UTF-16 code unit (a number from 0 to 0xFFFF) at a given index in the string. For BMP characters, which take a single 2-byte code unit, that value is already the code point. Supplementary characters (U+10000 to U+10FFFF) take 4 bytes, stored as two code units: a high surrogate followed by a low surrogate. For these, charCodeAt() returns each half separately. To get the full code point, the two surrogates must be combined with a little arithmetic, where multiplying by 0x400 is the same as shifting left by 10 bits.
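As a worked example, take the emoji U+1F600, whose surrogate pair is 0xD83D, 0xDE00. The high surrogate gives (0xD83D - 0xD800) * 0x400 = 0x3D * 0x400 = 0xF400. The low surrogate gives 0xDE00 - 0xDC00 = 0x200. Adding them gives 0xF600, and adding 0x10000 gives 0x1F600. The sketch below checks this and shows that the bitwise form gives the same number. It uses only ES5 syntax (var, \uXXXX escapes), in keeping with this section.

```javascript
// The emoji U+1F600 written as its surrogate pair (an ES5-compatible escape)
var emoji = '\uD83D\uDE00';
var high = emoji.charCodeAt(0); // 0xD83D
var low = emoji.charCodeAt(1); // 0xDE00

// Arithmetic form, as used in this section
var viaArithmetic = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;

// Bitwise form: << 10 multiplies by 0x400, and | acts as + here because
// (low - 0xDC00) is below 0x400, so the two parts share no bits
var viaBitwise = (((high - 0xd800) << 10) | (low - 0xdc00)) + 0x10000;

console.log(viaArithmetic.toString(16)); // 1f600
console.log(viaArithmetic === viaBitwise); // true
```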

The steps are as follows:

  1. Declare an empty array to store the output code points.
  2. Use a classic for loop with an index i to walk over each UTF-16 code unit in the string. (for-of is an ES6 feature, and it would also hide the surrogate pairs this approach needs to see.)
  3. Call charCodeAt(i) to get the UTF-16 code unit value at index i.
  4. Check if the code unit value is between 0xD800 and 0xDBFF, which means it is the high surrogate of a supplementary character.
  5. If yes, call charCodeAt(i + 1) to get the low surrogate, a value between 0xDC00 and 0xDFFF. Combine the two into a full code point value with the formula (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000. Then increment i once more so the low surrogate is not processed again.
  6. If no, use the code unit value as the code point value directly.
  7. Push the code point value to the output array.
  8. Return or log the output array.

Code example:

// Input string
var str = 'Welcome to Sling Academy!';

// Output array
var codePoints = [];

// Loop over each UTF-16 code unit in the string
for (var i = 0; i < str.length; i++) {
  // Get the UTF-16 code unit value at index i
  var codeUnit = str.charCodeAt(i);
  // Peek at the next code unit (NaN when i is the last index)
  var nextUnit = str.charCodeAt(i + 1);
  // A high surrogate followed by a low surrogate encodes one supplementary character
  if (
    codeUnit >= 0xd800 && codeUnit <= 0xdbff &&
    nextUnit >= 0xdc00 && nextUnit <= 0xdfff
  ) {
    // Combine them into a full code point value
    var codePoint =
      (codeUnit - 0xd800) * 0x400 + (nextUnit - 0xdc00) + 0x10000;
    // Push the code point value to the output array
    codePoints.push(codePoint);
    // Skip the next code unit as it is already processed
    i++;
  } else {
    // BMP character (or unpaired surrogate): the code unit is the code point
    codePoints.push(codeUnit);
  }
}

// Log the output array
console.log(codePoints);

Output:

[87, 101, 108, 99, 111, 109, 101, 32, 116, 111, 32, 83, 108, 105, 110, 103, 32, 65, 99, 97, 100, 101, 109, 121, 33]

The result is the same as the first approach. However, the code is far longer.
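Because 'Welcome to Sling Academy!' is pure ASCII, it never reaches the surrogate branch. One way to exercise that branch is to wrap the loop in a function and feed it a string containing a supplementary character. This sketch also checks that the following unit really is a low surrogate, so an unpaired high surrogate is passed through unchanged. That matches what codePointAt() returns for one. The function name toCodePointsES5 and the test string are made up for illustration.

```javascript
// Hypothetical ES5-compatible helper wrapping the charCodeAt() loop
function toCodePointsES5(input) {
  var result = [];
  for (var i = 0; i < input.length; i++) {
    var unit = input.charCodeAt(i);
    var next = input.charCodeAt(i + 1); // NaN past the end
    if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      // Valid surrogate pair: combine into one code point
      result.push((unit - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000);
      i++; // skip the low surrogate
    } else {
      // BMP character or unpaired surrogate
      result.push(unit);
    }
  }
  return result;
}

// 'A', then U+1F600, then an unpaired high surrogate (ES5 escapes)
var tricky = 'A\uD83D\uDE00\uD83D';
console.log(toCodePointsES5(tricky)); // [ 65, 128512, 55357 ]
```

In an ES6 environment, Array.from(tricky, function (c) { return c.codePointAt(0); }) produces the same array, which makes it a handy cross-check.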
