FixWordAutoFormat (C#)

Text pasted from Word into an ASP.NET Web Form can cause problems when it is posted to a database: unless the page content-type is windows-1252, Word's auto-formatted characters are replaced with '?'. This function fixes the issue by replacing those characters with plain ASCII alternatives, which are safe in both ISO-8859-1 and UTF-8.

/// <summary>
/// Fixes text auto formatted by Word (em/en dashes, smart quotes, bullet, ellipses)
/// </summary>
/// <param name="input">String containing auto formatted text</param>
/// <returns>String without auto formatting</returns>
public static string FixWordAutoFormat(string input)
{
    // nothing to fix (also avoids a NullReferenceException on null input)
    if (string.IsNullOrEmpty(input))
    {
        return input;
    }

    // replace en-dash
    input = input.Replace("&#8211;", "-");
    // replace em-dash
    input = input.Replace("&#8212;", "-");
    // replace open single quote
    input = input.Replace("&#8216;", "'");
    // replace close single quote
    input = input.Replace("&#8217;", "'");
    // replace open double quote
    input = input.Replace("&#8220;", "\"");
    // replace close double quote
    input = input.Replace("&#8221;", "\"");
    // replace bullet
    input = input.Replace("&#8226;", "*");
    // replace ellipsis
    input = input.Replace("&#8230;", "...");
    return input;
}
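
Depending on how the form is submitted, the same characters might reach the server either as literal Unicode characters (for example U+2013) or as decimal numeric character references (for example "&#8211;"). The following is a minimal sketch, not part of the original function, that covers both forms from a single lookup table. The class name WordTextCleaner and the method name FixWordAutoFormatAnyForm are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class WordTextCleaner
{
    // Hypothetical lookup table: each Word auto-format character
    // mapped to the same ASCII stand-in used by FixWordAutoFormat.
    private static readonly Dictionary<char, string> Replacements =
        new Dictionary<char, string>
        {
            { '\u2013', "-" },    // en dash
            { '\u2014', "-" },    // em dash
            { '\u2018', "'" },    // open single quote
            { '\u2019', "'" },    // close single quote
            { '\u201C', "\"" },   // open double quote
            { '\u201D', "\"" },   // close double quote
            { '\u2022', "*" },    // bullet
            { '\u2026', "..." }   // ellipsis
        };

    public static string FixWordAutoFormatAnyForm(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        foreach (KeyValuePair<char, string> pair in Replacements)
        {
            // Literal character, e.g. U+2013.
            input = input.Replace(pair.Key.ToString(), pair.Value);
            // Decimal numeric character reference, e.g. "&#8211;".
            input = input.Replace("&#" + (int)pair.Key + ";", pair.Value);
        }

        return input;
    }
}
```

For example, WordTextCleaner.FixWordAutoFormatAnyForm("\u201CHi\u201D &#8211; ok\u2026") returns "Hi" - ok... with straight double quotes. Keeping the mapping in one table means a new character only needs one extra entry rather than two Replace calls.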
