JavaScript normalize() method


The normalize() method in JavaScript returns the Unicode Normalization Form of a string. The same visible text can be encoded as different sequences of code points, so normalizing strings to one form before comparing, searching, or storing them ensures that equivalent text is represented consistently.

Syntax:

string.normalize(form)
  • form: A string specifying the Unicode normalization form to use. This parameter is optional and defaults to "NFC"; any value other than the four listed below throws a RangeError. It can take one of the following values:
    • "NFC" (Normalization Form C): Canonical decomposition followed by canonical composition. Sequences are combined into a single precomposed character where one exists. For example, an 'e' followed by a combining acute accent (U+0301) becomes the single character 'é' (U+00E9).
    • "NFD" (Normalization Form D): Canonical decomposition. Precomposed characters are split into their constituent parts. For example, 'é' (U+00E9) becomes an 'e' followed by a combining acute accent (U+0301).
    • "NFKC" (Normalization Form KC): Compatibility decomposition followed by canonical composition. Like NFC, but compatibility characters are also replaced with their plain equivalents (e.g., the ligature 'ﬁ' (U+FB01) becomes the two letters "fi").
    • "NFKD" (Normalization Form KD): Compatibility decomposition. Like NFD, but compatibility characters are also replaced with their plain equivalents.
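The differences between the four forms are easier to see when the code points are printed. The following is a minimal sketch; the codePoints helper and the sample string are assumptions made for illustration and are not part of the normalize() API.

```javascript
// Hypothetical helper (for illustration only): list a string's code points
// as U+XXXX labels so that invisible differences become visible.
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

const sample = "\uFB01\u00E9"; // "ﬁ" ligature followed by precomposed "é"

console.log(codePoints(sample.normalize("NFC")));  // U+FB01 U+00E9
console.log(codePoints(sample.normalize("NFD")));  // U+FB01 U+0065 U+0301
console.log(codePoints(sample.normalize("NFKC"))); // U+0066 U+0069 U+00E9
console.log(codePoints(sample.normalize("NFKD"))); // U+0066 U+0069 U+0065 U+0301

// Omitting the argument is the same as passing "NFC".
console.log(sample.normalize() === sample.normalize("NFC")); // true

// Form names are case-sensitive; an unrecognized value throws a RangeError.
let caught = null;
try {
  sample.normalize("nfc");
} catch (err) {
  caught = err;
}
console.log(caught instanceof RangeError); // true
```

Only the compatibility forms (NFKC and NFKD) split the ligature; the canonical forms (NFC and NFD) leave it untouched and only change how the accented letter is encoded.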

Return Value:

  • Returns a new string containing the text of the original string in the requested normalization form. The original string is not modified.
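As a small illustration of the return value, the sketch below checks that the original string keeps its length while the returned string differs. The variable names are chosen only for this example.

```javascript
const original = "e\u0301"; // "e" + combining acute accent: two UTF-16 code units
const composed = original.normalize("NFC");

console.log(original.length); // 2 (the original string is unchanged)
console.log(composed.length); // 1 (a new string holding U+00E9)
console.log(composed === "\u00E9"); // true

// Text that is already in the requested form comes back equal to the input.
console.log("abc".normalize("NFC") === "abc"); // true
```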

Example 1: Normalizing with NFC

let str1 = "\u00E9"; // Composed character (é as a single code point)
let str2 = "e\u0301"; // Decomposed (e + combining acute accent)
console.log(str1 === str2); // false (different code point sequences)
let normalizedStr1 = str1.normalize("NFC");
let normalizedStr2 = str2.normalize("NFC");
console.log(normalizedStr1 === normalizedStr2); // true (both are now the single code point U+00E9)

Example 2: Normalizing with NFD

let str1 = "\u00E9"; // Composed character (é as a single code point)
let str2 = "e\u0301"; // Decomposed (e + combining acute accent)
let normalizedStr1 = str1.normalize("NFD");
let normalizedStr2 = str2.normalize("NFD");
console.log(normalizedStr1 === normalizedStr2); // true (both are now e + U+0301)

Example 3: Compatibility Normalization

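The compatibility forms (NFKC and NFKD) also replace compatibility characters, such as ligatures, circled digits, and vulgar fractions, with plainer equivalents. This is useful for searching and matching, but it discards formatting distinctions, so the original text cannot always be recovered from the result.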

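let str = "⅓"; // Vulgar fraction one third (U+2153)
let normalizedStrNFKC = str.normalize("NFKC"); // Converts it to "1⁄3" (digit 1, fraction slash U+2044, digit 3)
console.log(normalizedStrNFKC); // Outputs "1⁄3"
console.log(normalizedStrNFKC === "1/3"); // false (U+2044 is not the ASCII slash "/")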
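One common use of compatibility normalization is building a search key, so that text containing ligatures or other compatibility characters still matches a plain query. The following is only a sketch: toSearchKey is a hypothetical helper name, and lowercasing is an extra assumption that is not part of normalization itself.

```javascript
// Hypothetical helper: compatibility-normalize text, then lowercase it,
// before using it for matching.
function toSearchKey(text) {
  return text.normalize("NFKC").toLowerCase();
}

const stored = "\uFB01le \u2460"; // "ﬁle ①" (ligature fi + circled digit one)

console.log(stored.includes("file"));              // false
console.log(toSearchKey(stored));                  // "file 1"
console.log(toSearchKey(stored).includes("file")); // true
```

The key is meant for matching only; the stored text should be kept as-is if its original appearance matters.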

Example 4: String Comparison

Normalization can help ensure consistent results when comparing strings.

let str1 = "caf\u00E9"; // "café" with a precomposed é
let str2 = "cafe\u0301"; // 'e' + combining acute accent
console.log(str1 === str2); // false (different representations)
let normalizedStr1 = str1.normalize("NFC");
let normalizedStr2 = str2.normalize("NFC");
console.log(normalizedStr1 === normalizedStr2); // true (both now use the same code points)
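The comparison above can be wrapped in a small helper so that both operands are always normalized to the same form. This is a minimal sketch; equalsNormalized is a hypothetical name, and defaulting to "NFC" is an assumption that mirrors the default of normalize().

```javascript
// Hypothetical helper: compare two strings after normalizing both
// to the same form ("NFC" unless another form is given).
function equalsNormalized(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

const composedCafe = "caf\u00E9";    // precomposed é
const decomposedCafe = "cafe\u0301"; // e + combining acute accent

console.log(composedCafe === decomposedCafe);                       // false
console.log(equalsNormalized(composedCafe, decomposedCafe));        // true
console.log(equalsNormalized(composedCafe, decomposedCafe, "NFD")); // true

// Normalizing values before adding them to a Set avoids duplicate entries
// for text that looks identical.
const unique = new Set(
  [composedCafe, decomposedCafe].map((s) => s.normalize("NFC"))
);
console.log(unique.size); // 1
```

Either canonical form works for equality checks, as long as both strings are normalized to the same one.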

Example 5: Normalizing a String with Multiple Forms

You can also use different normalization forms depending on your use case. Here’s how to apply each form to the same string:

let str = "caf\u00E9"; // "café" with a precomposed é
console.log(str.normalize("NFC")); // "café" (composed: "caf" + U+00E9, 4 code points)
console.log(str.normalize("NFD")); // "café" (decomposed: "cafe" + U+0301, 5 code points)
console.log(str.normalize("NFKC")); // "café" (same as NFC here, since the string has no compatibility characters)
console.log(str.normalize("NFKD")); // "café" (same as NFD here, for the same reason)

Summary:

  • The normalize() method returns a string in a standard Unicode normalization form, which makes string operations on Unicode text consistent.
  • It supports four normalization forms: NFC (the default), NFD, NFKC, and NFKD. The canonical forms (NFC, NFD) only change how characters are composed, while the compatibility forms (NFKC, NFKD) also replace compatibility characters with plainer equivalents.
  • This method is particularly useful when comparing strings: once both strings are normalized to the same form, equivalent text compares as equal regardless of how it was originally composed or decomposed.