JavaScript codePointAt(index) function


The codePointAt(index) method in JavaScript returns the Unicode code point (a non-negative integer) of the character that starts at the specified position (index) in a string. Unlike charCodeAt(), which returns a single UTF-16 code unit and therefore only half of any character outside the Basic Multilingual Plane, codePointAt() decodes characters that are stored as surrogate pairs, such as emojis and other supplementary symbols.

Syntax:

string.codePointAt(index)
  • index: The position (index) of the character you want to get the Unicode code point for. This is zero-based, meaning the first character is at index 0.

Return Value:

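  • It returns a number that represents the full Unicode code point of the character starting at the given index. If the index points at the second half (low surrogate) of a surrogate pair, only that code unit's value is returned.
  • If the index is out of range (negative, or greater than or equal to the string's length), it returns undefined.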
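As a minimal sketch of these return values (the sample string is an illustrative assumption), the following probes each index of a short string that contains one surrogate pair:

```javascript
// "a" occupies index 0; the laptop emoji occupies indexes 1 and 2.
const sample = "a💻";

const probes = [
  sample.codePointAt(0),  // BMP character
  sample.codePointAt(1),  // high surrogate: the whole pair is decoded
  sample.codePointAt(2),  // low surrogate: only this code unit is returned
  sample.codePointAt(3),  // index equals the string length
  sample.codePointAt(-1), // negative index
];

console.log(probes); // [97, 128187, 56507, undefined, undefined]
```

Note that index 2 does not return undefined: it is a valid position, but it lands in the middle of a character.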

Why Use codePointAt()?

JavaScript strings use UTF-16 encoding, meaning that characters in the Basic Multilingual Plane (BMP), such as common letters, symbols, and numbers, are represented by a single 16-bit unit. However, characters outside the BMP, such as emojis or rare symbols, are represented by surrogate pairs (two 16-bit units). charCodeAt() retrieves only the single 16-bit unit at the given index, but codePointAt() combines both units and returns the full code point, making it more accurate for those cases.
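To make the relationship between the two 16-bit units and the full code point concrete, here is a small sketch of the standard UTF-16 decoding arithmetic. The helper name combineSurrogates is hypothetical and exists only for illustration; in practice codePointAt() does this for you.

```javascript
// Hypothetical helper: combine a high and a low surrogate into one code point.
function combineSurrogates(high, low) {
  // Each surrogate carries 10 bits; the result is offset by 0x10000.
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const laptop = "💻";
const high = laptop.charCodeAt(0); // 0xD83D (55357)
const low = laptop.charCodeAt(1);  // 0xDCBB (56507)

console.log(combineSurrogates(high, low));                            // 128187
console.log(combineSurrogates(high, low) === laptop.codePointAt(0));  // true
```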

Example 1: Basic Usage

let text = "A"; // Regular character
console.log(text.codePointAt(0)); // 65 (Unicode for 'A')

In this case, the character 'A' has a Unicode code point of 65.

Example 2: Handling Surrogate Pairs

For characters outside the BMP, such as emojis, codePointAt() can return the full code point, unlike charCodeAt().

let emoji = "💻"; // Laptop emoji
console.log(emoji.codePointAt(0)); // 128187

  • The emoji "💻" (laptop) has a Unicode code point of 128187 (U+1F4BB), but it's stored internally as a surrogate pair in UTF-16.
  • If you were to use charCodeAt(), each call would return only one half of the surrogate pair:

console.log(emoji.charCodeAt(0)); // 55357 (high surrogate)
console.log(emoji.charCodeAt(1)); // 56507 (low surrogate)

Example 3: Working with Characters in a String

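You can loop through a string and read the code point at each index. Note that a plain index loop visits every UTF-16 code unit, so for a surrogate pair it reports the full code point at the high surrogate and then the lone low surrogate at the next index:

let text = "ABC💻";
for (let i = 0; i < text.length; i++) {
  console.log(text.codePointAt(i)); // 65, 66, 67, 128187, 56507
}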

Handling Surrogate Pairs

Because surrogate pairs take up two positions in the string, you should skip the second half of the surrogate pair when iterating over strings that may contain such characters:

let text = "A💻B";
for (let i = 0; i < text.length; i++) {
  let codePoint = text.codePointAt(i);
  console.log(`Character: ${String.fromCodePoint(codePoint)}, Code point: ${codePoint}`);
  if (codePoint > 0xFFFF) i++; // Skip the next index if it's a surrogate pair
}
// Output:
// Character: A, Code point: 65
// Character: 💻, Code point: 128187
// Character: B, Code point: 66
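An alternative to skipping indexes by hand is to rely on the string iterator, which for...of, the spread operator, and Array.from all use, and which yields whole code points rather than code units. The sketch below assumes a hypothetical helper name, codePointsOf, chosen only for this example:

```javascript
// Hypothetical helper: list the code points of a string, one per character.
function codePointsOf(str) {
  // Array.from walks the string iterator, so a surrogate pair arrives as one item.
  return Array.from(str, (ch) => ch.codePointAt(0));
}

console.log(codePointsOf("A💻B")); // [65, 128187, 66]

// The same iteration written as a loop:
for (const ch of "A💻B") {
  console.log(ch, ch.codePointAt(0));
}
```

This avoids the manual index bookkeeping, at the cost of not having the code-unit index available inside the loop.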

Example 4: Out-of-Range Index

If you specify an index that is out of range (greater than or equal to the string’s length), codePointAt() returns undefined:

let text = "ABC";
console.log(text.codePointAt(10)); // undefined
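Because the result may be undefined, it is worth checking it before passing it on, for example to String.fromCodePoint(), which throws a RangeError for a non-integer argument. A minimal sketch, assuming a hypothetical helper named characterAt that falls back to an empty string:

```javascript
// Hypothetical helper: return the character starting at `index`,
// or an empty string when the index is out of range.
function characterAt(str, index) {
  const codePoint = str.codePointAt(index);
  // String.fromCodePoint(undefined) would throw, so check first.
  return codePoint === undefined ? "" : String.fromCodePoint(codePoint);
}

console.log(characterAt("ABC", 1));  // B
console.log(characterAt("ABC", 10)); // (empty string)
```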

Difference Between codePointAt() and charCodeAt()

  • charCodeAt(index): Returns the UTF-16 code unit at the specified index. This may return only part of a character for surrogate pairs.
  • codePointAt(index): Returns the full Unicode code point at the specified index, making it accurate for handling characters represented by surrogate pairs (e.g., emojis, rare symbols).

Example: Difference in Handling Emojis

let emoji = "💻";

// Using charCodeAt()
console.log(emoji.charCodeAt(0)); // 55357 (high surrogate part)
console.log(emoji.charCodeAt(1)); // 56507 (low surrogate part)

// Using codePointAt()
console.log(emoji.codePointAt(0)); // 128187 (full code point for "💻")
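Code points are usually written in hexadecimal "U+" notation rather than decimal, which makes the values above easier to look up in Unicode charts. The following sketch converts the result of codePointAt() to that notation; the helper name toUnicodeNotation is an assumption made for this example:

```javascript
// Hypothetical helper: format the code point at `index` as "U+XXXX".
function toUnicodeNotation(str, index = 0) {
  const codePoint = str.codePointAt(index);
  if (codePoint === undefined) return undefined; // out-of-range index
  // Hexadecimal, upper case, padded to at least four digits.
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

console.log(toUnicodeNotation("A"));  // U+0041
console.log(toUnicodeNotation("💻")); // U+1F4BB
```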

Summary:

  • codePointAt() returns the full Unicode code point of the character that starts at the given index, including characters represented by surrogate pairs (e.g., emojis).
  • It is more reliable than charCodeAt() for characters outside the BMP.
  • If the index is out of bounds, it returns undefined.