Removing Special Characters and Diacritic Marks in C#

I used this trick in JavaScript a while back to remove diacritic marks, and this week the need for a similar transformation came up in C#.

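The following method simplifies strings such as “façade” into plain strings such as “facade”. It decomposes each accented character into a base letter plus combining marks (Unicode normalization form D), drops the non-spacing marks, and recomposes what is left.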

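using System.Globalization; // CharUnicodeInfo, UnicodeCategory
using System.Text;          // StringBuilder, NormalizationForm

private static string Simplify(string input)
{
    // Decompose each accented character into its base letter followed by combining marks.
    string normalizedString = input.Normalize(NormalizationForm.FormD);

    StringBuilder stringBuilder = new StringBuilder();

    foreach (char c in normalizedString)
    {
        UnicodeCategory unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);

        // Keep everything except the combining (non-spacing) marks.
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    // Recompose whatever remains into its canonical composed form.
    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}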
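
To see the method in action, here is a minimal, self-contained sketch. The class name DiacriticsDemo and the sample strings are my own assumptions for illustration, and the method is repeated (as a public member) only so the snippet compiles on its own. Non-ASCII characters are written as \u escapes so the source file's encoding cannot change the result.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical demo class; the name is illustrative only.
public static class DiacriticsDemo
{
    // Same approach as the method above, repeated so this sketch stands alone.
    public static string Simplify(string input)
    {
        var builder = new StringBuilder();

        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // FormD splits U+00E9 (e with acute) into "e" plus U+0301 (combining acute accent).
        Console.WriteLine("\u00E9".Normalize(NormalizationForm.FormD).Length); // 2

        Console.WriteLine(Simplify("fa\u00E7ade"));                // facade
        Console.WriteLine(Simplify("Cr\u00E8me br\u00FBl\u00E9e")); // Creme brulee

        // U+00D8 (O with stroke) has no canonical decomposition, so it is left as-is.
        Console.WriteLine(Simplify("\u00D8resund") == "\u00D8resund"); // True
    }
}
```

The last line is worth keeping in mind: this technique only strips marks that Unicode models as separate combining characters. Letters such as “Ø” or “ß” are not decomposed by form D, so they pass through unchanged and would need an explicit replacement table if they must be reduced to ASCII.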