Parse CSV Files in C# and ASP.NET

Parsing CSV files is a common task for data-import programs. I’ve used Sébastien Lorion’s CSV reader several times, as it’s lightweight, fast, and gives you a number of options for delimiters and for handling parsing errors. I’ve also had great success using SqlBulkCopy to quickly load large amounts of data into SQL Server.

Recently, we noticed that a certain CSV file was failing with some strange parsing errors in one of our projects. It turned out that there was a double quote in the middle of a quote-delimited field, so a few rows looked like this:

134,Some Data,"field has, commas",some data,"field has "quotes" in it",1234

Notice how the fifth field has quotes in the middle of the field value, even though the double quote is also the field-quoting character (which is what allows values to contain commas, as seen in the third field).

[As an aside, CSV field values are allowed to contain double quotes as long as they’re doubled, so "blah"blah" is invalid, but "blah""blah" is fine.]
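
To make the doubling rule concrete, here is a minimal sketch of how a writer could quote a value so that a reader accepts it. The class and method names (CsvEscaping, QuoteField) are hypothetical and are not part of any CSV library mentioned here.

```csharp
using System;

public static class CsvEscaping
{
    // Hypothetical helper: wraps a value in quotes and doubles
    // every quote character that appears inside it.
    public static string QuoteField(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

For example, passing the value blah"blah to QuoteField returns "blah""blah", which is the valid form from the aside above.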

Anyhow, that intra-field quote character was throwing off the CSV reader, so we decided to use a little RegEx to get rid of those quotes:

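using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using LumenWorks.Framework.IO.Csv;

// GetFileEncoding: helper (not shown) that returns the file's encoding
Encoding fileEncoding = GetFileEncoding(csvFile);
// get rid of all double quotes except those used as field delimiters,
// i.e. those that sit next to a comma or a line break
string fileContents = File.ReadAllText(csvFile, fileEncoding);
string fixedContents = Regex.Replace(fileContents, @"([^^,\r\n])""([^$,\r\n])", @"$1$2");
using (CsvReader csv =
       new CsvReader(new StringReader(fixedContents), true))
{
       // ... parse the CSV
}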

That resolved the problem and allowed the CSV file to go in seamlessly. We could also have replaced the mid-field quote characters with a single quote (instead of just removing them) by using something like @"$1'$2" as the third argument to the Regex.Replace call.
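
To see what the two replacement strings do, here is a small sketch that applies the pattern from the fix above (with its \r and \n escapes written out) to a single row. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteFixDemo
{
    // A quote counts as "mid-field" when the character on each side
    // is something other than a comma or a line break.
    private const string Pattern = @"([^^,\r\n])""([^$,\r\n])";

    // Drops mid-field quotes entirely.
    public static string RemoveInnerQuotes(string csv)
    {
        return Regex.Replace(csv, Pattern, "$1$2");
    }

    // Replaces mid-field quotes with a single quote instead.
    public static string SwapInnerQuotes(string csv)
    {
        return Regex.Replace(csv, Pattern, "$1'$2");
    }
}
```

Given the sample row above, RemoveInnerQuotes turns the fifth field into "field has quotes in it", and SwapInnerQuotes turns it into "field has 'quotes' in it"; the other fields are left alone. One caveat worth knowing: the pattern also matches correctly doubled quotes, so a valid "blah""blah" would come out as "blah"blah", which suggests the fix is best reserved for files known to contain unescaped quotes.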

Note that the above fix pulls all of the CSV data into a string and does a global replace, so for extremely large files you could improve performance by doing the substitution a different way (perhaps going line by line, or modifying Sébastien’s code).
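
As a rough sketch of the line-by-line idea, the following reads one line at a time and writes the cleaned line straight back out, so the whole file never has to sit in memory. The names (LineByLineQuoteFixer, Fix, FixFile) are hypothetical, and it assumes quoted fields never contain embedded line breaks, which the original pattern effectively assumes as well.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class LineByLineQuoteFixer
{
    // Same pattern as the whole-file fix, compiled once and reused per line.
    private static readonly Regex InnerQuote =
        new Regex(@"([^^,\r\n])""([^$,\r\n])", RegexOptions.Compiled);

    // Copies input to output one line at a time, removing mid-field quotes.
    // Line endings are rewritten using the writer's newline setting.
    public static void Fix(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            output.WriteLine(InnerQuote.Replace(line, "$1$2"));
        }
    }

    // Convenience wrapper: cleans sourcePath into targetPath.
    public static void FixFile(string sourcePath, string targetPath, Encoding encoding)
    {
        using (var reader = new StreamReader(sourcePath, encoding))
        using (var writer = new StreamWriter(targetPath, false, encoding))
        {
            Fix(reader, writer);
        }
    }
}
```

The cleaned file could then be opened with a StreamReader and handed to the CsvReader in place of the StringReader used earlier. Whether this is actually faster than the global replace would need measuring; the main gain is the lower memory footprint.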
