JavaScript codePointAt(index) function
The codePointAt(index) method in JavaScript returns the Unicode code point (a non-negative integer) of the character at the specified position (index) in a string. Unlike charCodeAt(), which returns a single UTF-16 code unit (only part of a character for characters outside the Basic Multilingual Plane), codePointAt() can handle characters that are represented by surrogate pairs, such as emojis or special symbols.
Syntax:
string.codePointAt(index)
index: The position (index) of the character you want to get the Unicode code point for. This is zero-based, meaning the first character is at index 0.
Return Value:
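- It returns a number that represents the full Unicode code point of the character that starts at the given index.
- If the index points at the second half of a surrogate pair, it returns only that trailing surrogate's code unit.
- If the index is out of range (negative, or greater than or equal to the string's length), it returns undefined.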
Why Use codePointAt()?
JavaScript strings use UTF-16 encoding, meaning that characters in the Basic Multilingual Plane (BMP), such as common letters, symbols, and numbers, are represented by a single 16-bit unit. However, characters outside the BMP, such as emojis or rare symbols, are represented by surrogate pairs (two 16-bit units). charCodeAt() retrieves only one 16-bit unit at a time, but codePointAt() returns the full code point when the index is at the start of a pair, making it more accurate for those cases.
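To make the relationship between the two 16-bit units and the full code point concrete, the sketch below combines a surrogate pair by hand and compares the result with codePointAt(). The helper name combineSurrogates is hypothetical and introduced only for illustration; it is not a built-in.

```javascript
// Hypothetical helper (not a built-in): combines a UTF-16 surrogate pair
// into the code point it encodes.
function combineSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const laptop = "💻";
const high = laptop.charCodeAt(0); // first 16-bit unit (high surrogate)
const low = laptop.charCodeAt(1);  // second 16-bit unit (low surrogate)

console.log(high, low);                    // 55357 56507
console.log(combineSurrogates(high, low)); // 128187
console.log(laptop.codePointAt(0));        // 128187
```

The manual arithmetic and codePointAt(0) agree, which is the work the method saves you from doing yourself.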
Example 1: Basic Usage
For the string "A", codePointAt(0) returns 65, the Unicode code point of the character 'A'.
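A minimal sketch of the basic call, assuming a one-character sample string (the variable name letter is illustrative):

```javascript
const letter = "A";

// Code point of the character at index 0, as a decimal number.
console.log(letter.codePointAt(0)); // 65

// The same value in hexadecimal, matching the usual U+0041 notation.
console.log(letter.codePointAt(0).toString(16)); // 41
```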
Example 2: Handling Surrogate Pairs
For characters outside the BMP, such as emojis, codePointAt() can return the full code point, unlike charCodeAt().
- The emoji "💻" (laptop) has a Unicode code point of 128187, but it's stored internally as a surrogate pair in UTF-16.
- If you were to use charCodeAt(0), it would return only the first half of the surrogate pair, 55357 (0xD83D).
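The two points above can be checked with a short sketch (the variable name emoji is illustrative):

```javascript
const emoji = "💻";

console.log(emoji.length);         // 2 (two UTF-16 code units)
console.log(emoji.codePointAt(0)); // 128187 (full code point, U+1F4BB)
console.log(emoji.charCodeAt(0));  // 55357 (high surrogate only, 0xD83D)
console.log(emoji.charCodeAt(1));  // 56507 (low surrogate only, 0xDCBB)
```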
Example 3: Working with Characters in a String
You can loop through a string and get the code point of each character, handling both regular characters and surrogate pairs.
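One way to do this, assuming a for...of loop is acceptable, is sketched below. The helper name codePointsOf and the sample string are hypothetical; the original article's own loop may have looked different.

```javascript
// Hypothetical helper: collects the code point of every character in a string.
function codePointsOf(text) {
  const result = [];
  // for...of iterates a string by code point, so a surrogate pair
  // arrives as one two-unit string rather than as two separate halves.
  for (const ch of text) {
    result.push(ch.codePointAt(0));
  }
  return result;
}

console.log(codePointsOf("A💻B")); // [ 65, 128187, 66 ]
```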
Handling Surrogate Pairs
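Because surrogate pairs take up two positions in the string, you should skip the second half of the surrogate pair when iterating by index over strings that may contain such characters. Otherwise, codePointAt() at that position returns the lone trailing surrogate as if it were a separate character.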
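A minimal sketch of an index-based loop that skips the trailing surrogate. The helper name codePointsByIndex is hypothetical, and the check cp > 0xFFFF is one common way to detect that a code point occupied two code units.

```javascript
// Hypothetical helper: walks the string by index and skips low surrogates.
function codePointsByIndex(text) {
  const result = [];
  for (let i = 0; i < text.length; i++) {
    const cp = text.codePointAt(i);
    result.push(cp);
    // Code points above 0xFFFF occupy two code units,
    // so advance one extra position to skip the second half.
    if (cp > 0xFFFF) {
      i++;
    }
  }
  return result;
}

console.log(codePointsByIndex("A💻B")); // [ 65, 128187, 66 ]
```

Without the extra increment, the result would also contain 56507, the trailing surrogate of the emoji.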
Example 4: Out-of-Range Index
If you specify an index that is out of range (negative, or greater than or equal to the string's length), codePointAt() returns undefined.
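A short sketch of the boundary behavior, using an illustrative sample string:

```javascript
const word = "Hello";

console.log(word.codePointAt(4));  // 111 ('o', the last valid index)
console.log(word.codePointAt(5));  // undefined (index equals the length)
console.log(word.codePointAt(-1)); // undefined (negative index)
```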
Difference Between codePointAt() and charCodeAt()
- charCodeAt(index): Returns the UTF-16 code unit at the specified index. This may return only part of a character for surrogate pairs.
- codePointAt(index): Returns the full Unicode code point at the specified index, making it accurate for handling characters represented by surrogate pairs (e.g., emojis, rare symbols).
Example: Difference in Handling Emojis
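The sketch below compares the two methods side by side. The emoji "😀" (U+1F600) is chosen here only as a sample, and the round trip through String.fromCodePoint() and String.fromCharCode() is added to show what each return value is enough to rebuild.

```javascript
const smiley = "😀"; // U+1F600, sample value chosen for this sketch

// At index 0, codePointAt() sees the whole pair; charCodeAt() sees half.
console.log(smiley.codePointAt(0)); // 128512
console.log(smiley.charCodeAt(0));  // 55357

// At index 1, both methods return the trailing surrogate.
console.log(smiley.codePointAt(1)); // 56832
console.log(smiley.charCodeAt(1));  // 56832

// Only the full code point is enough to rebuild the original character.
console.log(String.fromCodePoint(smiley.codePointAt(0)) === smiley); // true
console.log(String.fromCharCode(smiley.charCodeAt(0)) === smiley);   // false
```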
Summary:
- codePointAt() returns the full Unicode code point for any character, including those represented by surrogate pairs (e.g., emojis).
- It is more reliable than charCodeAt() for characters outside the BMP.
- If the index is out of bounds, it returns undefined.