Programming

Simplify Strings for Comparison by Removing Special Characters and Diacritic Marks

I was working on a search system that needed to simplify the strings for comparison. It needed to compare the text regardless of special characters (diacritic marks) or casing. The following function breaks the special characters into their component parts, before removing the “special” parts and lower-casing the whole thing.

function normalise(term) {
    // Simplifies diacritic characters, accents, and casing
    term.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
 }

You can see the impact using this sample from a couple of languages:

const strings = ['example', 'façade', 'résumé', 'černá', 'piñata'];

for (let s of strings) {
    const simple = s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
    console.log(simple);
}

The output is:

  1. example
  2. facade
  3. resume
  4. cerna
  5. pinata