
C# - Normalize Whitespace Inside String

posted Aug 14, 2015, 11:42 AM by Victor Zakharov   [ updated Aug 14, 2015, 11:42 AM ]
The task is simple: replace every run of consecutive whitespace (tabs, spaces, newlines) with a character of your choice, usually a space.
Fact: StackOverflow tops the Google results when searching for "C# normalize whitespace".
Why this article? The highest voted answers are not the best performing, and some answers are simply wrong.
I propose a solution based on this StackOverflow answer (http://stackoverflow.com/a/25023688/897326). The referenced answer has a problem: it fails when the input is " " (a single space).
I wasn't sure this was the only unhandled corner case, so I changed the method to use StringBuilder, which simplifies the string manipulation. Performance is probably about the same; the code is just easier to read.
The StringBuilder version below (NormalizeWhiteSpace) should be much faster than the Regex approach (the highest voted answer) and slightly faster than NormalizeWithSplitAndJoin by @JonSkeet.
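For reference, the two alternatives mentioned above might look roughly like the following. This is only a sketch of the general techniques, not the exact code from the StackOverflow answers; the class name WhitespaceAlternatives and the NormalizeWithRegex name are made up for illustration. Note that the Split/Join variant also trims leading and trailing whitespace, so its output is not always identical to the other approaches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class, used only for this illustration.
static class WhitespaceAlternatives
{
    // Regex approach: collapse every run of whitespace into a single space.
    public static string NormalizeWithRegex(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }

    // Split/Join approach: split on whitespace (a null separator means
    // "any whitespace"), drop the empty pieces, then re-join with a space.
    // Side effect: leading and trailing whitespace disappears entirely.
    public static string NormalizeWithSplitAndJoin(string input)
    {
        string[] parts = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }
}
```

The post's own StringBuilder-based method follows.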
/// <summary>
/// Each run of consecutive whitespace (including tabs and newlines) is replaced with a single normalizeTo character.
/// </summary>
/// <param name="input">Input string.</param>
/// <param name="normalizeTo">Character that replaces each whitespace run.</param>
/// <remarks>Based on http://stackoverflow.com/a/25023688/897326 </remarks>
private static string NormalizeWhiteSpace(string input, char normalizeTo = ' ')
{
    if (string.IsNullOrEmpty(input))
    {
        return string.Empty;
    }

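    // StringBuilder lives in System.Text. The output is never longer than the input, so pre-size the buffer.
    StringBuilder output = new StringBuilder(input.Length);
    // True while we are inside a whitespace run that has already been replaced.
    bool skipped = false;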

    foreach (char c in input)
    {
        if (char.IsWhiteSpace(c))
        {
            if (!skipped)
            {
                output.Append(normalizeTo);
                skipped = true;
            }
        }
        else
        {
            skipped = false;
            output.Append(c);
        }
    }

    return output.ToString();
}
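A quick way to check the behaviour, including the single-space corner case, is a small console sketch like the one below. The NormalizeDemo class and its Main method are hypothetical; the method body is a compact copy of the one above, included only so that the example compiles on its own.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original post.
static class NormalizeDemo
{
    // Compact copy of NormalizeWhiteSpace from above (same logic).
    internal static string NormalizeWhiteSpace(string input, char normalizeTo = ' ')
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;
        var output = new StringBuilder(input.Length);
        bool skipped = false;
        foreach (char c in input)
        {
            if (!char.IsWhiteSpace(c)) { skipped = false; output.Append(c); }
            else if (!skipped) { output.Append(normalizeTo); skipped = true; }
        }
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine("[" + NormalizeWhiteSpace("a \t\r\n b") + "]");   // [a b]
        Console.WriteLine("[" + NormalizeWhiteSpace(" ") + "]");            // [ ]    single-space corner case
        Console.WriteLine("[" + NormalizeWhiteSpace("  a  ") + "]");        // [ a ]  edges are collapsed, not trimmed
        Console.WriteLine("[" + NormalizeWhiteSpace("a  b\tc", '_') + "]"); // [a_b_c]
        Console.WriteLine("[" + NormalizeWhiteSpace(null) + "]");           // []
    }
}
```

Leading and trailing whitespace is collapsed to a single normalizeTo character rather than removed, so call Trim() on the result if you need that.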