JavaScript charCodeAt(index) method


The charCodeAt(index) method in JavaScript returns the UTF-16 code unit of the character at a specified position (index) in a string, as an integer between 0 and 65535. The method gives you the numeric representation of the character, which is useful when you need to work with character codes, for example, when processing or comparing characters based on their Unicode values.

Syntax:

string.charCodeAt(index)
  • index: The position of the character you want to retrieve the Unicode value for. This is a zero-based index, meaning that the first character is at index 0, the second at 1, and so on.

Return Value:

  • It returns an integer representing the Unicode code unit of the character at the given index.
  • If the index is out of range (e.g., negative or greater than or equal to the string's length), it returns NaN.

Example 1: Basic Usage

let text = "JavaScript";
let codeAt0 = text.charCodeAt(0); // 74 (Unicode for 'J')
let codeAt4 = text.charCodeAt(4); // 83 (Unicode for 'S')

In this example:

  • 'J' has the Unicode value 74.
  • 'S' has the Unicode value 83.

Example 2: Handling Out-of-Range Index

If the provided index is outside the bounds of the string, charCodeAt() returns NaN.

let text = "JavaScript";
let codeOutOfBounds = text.charCodeAt(20); // NaN

Unicode Explanation:

  • Unicode is a global character encoding standard that assigns a unique number (code point) to every character, symbol, and punctuation mark across all languages. JavaScript represents these using UTF-16 encoding.
  • For example, the character 'A' has a Unicode code point of 65, and 'a' has a code point of 97.
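The code-point values mentioned above can be verified directly with charCodeAt(); for BMP characters like these, the UTF-16 code unit is the same as the Unicode code point:

```javascript
// For characters in the Basic Multilingual Plane, the UTF-16
// code unit returned by charCodeAt() equals the code point.
console.log("A".charCodeAt(0)); // 65
console.log("a".charCodeAt(0)); // 97
console.log("0".charCodeAt(0)); // 48 (digits have code points too)
```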

Example 3: Using charCodeAt() to Get Character Codes

let text = "Hello!";
for (let i = 0; i < text.length; i++) {
  console.log(text.charAt(i) + ": " + text.charCodeAt(i));
}
// Output:
// H: 72
// e: 101
// l: 108
// l: 108
// o: 111
// !: 33

In this example, charCodeAt() is used in a loop to print each character along with its Unicode code value.

Example 4: Comparing Characters Using charCodeAt()

You can use charCodeAt() to compare the Unicode values of characters:

let a = 'A';
let b = 'a';
if (a.charCodeAt(0) < b.charCodeAt(0)) {
  console.log("'A' comes before 'a'");
} else {
  console.log("'a' comes before 'A'");
}
// Output: 'A' comes before 'a' (since 'A' has a code point of 65 and 'a' has 97)

Supplement: UTF-16 Surrogate Pairs

JavaScript strings are stored as sequences of 16-bit units (UTF-16). Characters outside the Basic Multilingual Plane (BMP) (i.e., code points above 0xFFFF, like emojis or rare symbols) are encoded as a surrogate pair of two 16-bit code units. charCodeAt() returns only the individual code unit at the given index, never the full code point, so for such characters it yields one half of the pair.

For example:

let emoji = "💻"; // U+1F4BB
console.log(emoji.charCodeAt(0)); // 55357 (high surrogate, 0xD83D)
console.log(emoji.charCodeAt(1)); // 56507 (low surrogate, 0xDCBB)

If you need the full Unicode code point for characters that use surrogate pairs, you should use the codePointAt() method introduced in ES6, which can handle such characters correctly.
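A quick sketch of the difference, using the same emoji as above: codePointAt() reads the whole surrogate pair, while charCodeAt() stops at a single 16-bit unit.

```javascript
let emoji = "💻"; // U+1F4BB, outside the BMP

console.log(emoji.codePointAt(0)); // 128187 (0x1F4BB, the full code point)
console.log(emoji.charCodeAt(0));  // 55357 (only the high surrogate)

// for...of also iterates by full code points, not by code units:
for (const ch of emoji) {
  console.log(ch.codePointAt(0)); // 128187
}
```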

Summary:

  • charCodeAt() returns the UTF-16 code unit of the character at the specified index.
  • It returns NaN for out-of-range indices.
  • For BMP characters, the code unit equals the Unicode code point; for characters outside the BMP, it returns only one half of a surrogate pair.
  • Use codePointAt() for handling characters outside the BMP (e.g., emojis).