Simplify strings for comparison by removing special characters and diacritic marks
This is the JavaScript edition, but I also have a C# method to remove special characters and diacritic marks. I was working on a search system that needed to simplify strings for comparison. It needed to compare the text regardless of diacritic marks or casing. The following function uses Unicode NFD normalization to break each accented character into its base letter and separate combining marks, then removes the marks and lower-cases the whole thing.
function normalise(term) {
    // Simplifies diacritic characters, accents, and casing
    return term.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}
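To see what the decomposition step does by itself, here is a minimal sketch that prints the code points before and after NFD normalization. The `codePoints` helper is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: lists the code points in a string as four-digit hex values.
function codePoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0).toString(16).padStart(4, '0'));
}

const composed = '\u00e9'; // "é" stored as a single precomposed character
const decomposed = composed.normalize('NFD');

console.log(codePoints(composed));   // one entry: 00e9
console.log(codePoints(decomposed)); // two entries: 0065 (e) and 0301 (combining acute accent)

// The regular expression matches only the combining mark, so the base letter survives.
const stripped = decomposed.replace(/[\u0300-\u036f]/g, '');
console.log(stripped); // e
```

The range `\u0300-\u036f` is the Unicode Combining Diacritical Marks block, which is why the replace step only has something to remove once the string has been decomposed.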
You can see the impact using this sample from a couple of languages:
const strings = ['example', 'façade', 'résumé', 'černá', 'piñata'];

for (const s of strings) {
    const simple = s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
    console.log(simple);
}
The output is:
- example
- facade
- resume
- cerna
- pinata
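For the search use case, the comparison itself might look something like the sketch below. The `sameTerm` helper is a hypothetical name, and the simplification steps are repeated inside the block so it runs by itself.

```javascript
// The same steps described in this post: decompose, strip combining marks, lower-case.
function normalise(term) {
  return term.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// Hypothetical helper: true when two strings match after simplification.
function sameTerm(a, b) {
  return normalise(a) === normalise(b);
}

console.log(sameTerm('Résumé', 'resume')); // true
console.log(sameTerm('PIÑATA', 'piñata')); // true
console.log(sameTerm('façade', 'arcade')); // false
```

Letters that have no decomposed form, such as "ø", pass through unchanged, so this approach only covers characters built from a base letter plus a combining mark.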
Written by Steve Fenton on