One of my current projects involves creating a data warehouse from multiple datafeeds. For cost reasons we’re going with MySQL Community Edition, but we’re sticking with ASP.NET, C#, and Windows on the code end.
I’ll admit that Windows, C#, and MySQL are an unusual combination, but I love C#, and MySQL recently came out with version 5.0, which features stored procedures, transactions, triggers, foreign keys — lots of good stuff. How could I resist? If you’ve never installed MySQL before, don’t worry…the installer is pretty easy to follow, and the install guide helps you through. Just make sure you also download the tools and the ODBC connector driver.
Since I’m going with an ELT approach (bulk load into an unconstrained temporary table, perform cleanup & transformations, then import into the “real” tables), I needed a way to review the import process and check the errors. A lot of the examples on the web have you doing everything from the command line, calling the MySQL command line tools from PHP, piping stuff out to text files, etc. I wanted to instead wrap the import process into a nice little C# app. So let’s jump into the code!
First we need to connect to the database. I’m using the ODBC driver instead of the new ADO.NET provider, because the ODBC overhead isnt’ an issue with so few queries, and the ADO.NET provider is so new that I don’t quite trust it yet. 🙂
string conString = “DRIVER={MySQL ODBC 3.51 Driver}; SERVER=localhost; DATABASE=test; UID=theuser; PASSWORD=thepassword; OPTION=3”;
OdbcConnection mysqlConnection = new OdbcConnection(conString);
mysqlConnection.Open();
Next we’re going to bump the max error count to something big. The default of 64 just isn’t enough when you’re importing a lot of data. You want to see all the problems encountered during the import.
OdbcCommand updateErrorCount = new OdbcCommand(“set max_error_count = 10000;”, mysqlConnection);
updateErrorCount.ExecuteNonQuery();
Now let’s import data from a flat file into a staging table using the MySQL LOAD DATA bulk import command. This staging table should have a schema that matches the flat file, but shouldn’t have any constraints other than a primary key. If there are any constraints (foreign key, etc), and even one record violates them, then the whole import will roll back. And if you have 1 bad row out of 100,000, you would probably prefer to get the 99,999 good records instead of zero. If you designate a primary key, then you can write a LOAD DATA statement that will only import records whose primary key isn’t already in the table. This allows you to import multiple files (or even the same file) repeatedly without worrying about duplicate keys.
string importData =
“LOAD DATA LOCAL INFILE ‘F:/imports/mls_import_file.txt’ “+
“IGNORE “+
“INTO TABLE mls_staging “+
“FIELDS TERMINATED BY ‘|’ “+
“LINES STARTING BY ” “+
“TERMINATED BY ‘rn’ “+
“IGNORE 1 LINES; “;
Let me quickly go over the different lines:
- LOAD DATA LOCAL INFILE ‘F:/imports/mls_import_file.txt’ causes a bulk load from the F:/imports/mls_import_file.txt file. The LOCAL keyword has the client program read the file and send it to the server. If you omit the LOCAL keyword, then MySQL will try to read the file directly, which is faster, but may have permissions issues.
- IGNORE tells MySQL to not import any records whose primary key already exists in the destination table
- INTO TABLE mls_staging is the destination table. It needs to have the same columns as the flat file, otherwise you need to specify which columns to import.
- FIELDS TERMINATED BY ‘|’ says that the fields are pipe-delimited
- LINES STARTING BY ” means the records don’t start with any special character. Those are two single-quotes, btw.
- TERMINATED BY ‘rn’ means each record is terminated by a CRLF
- IGNORE 1 LINES; tells MySQL to skip the first line (e.g. if it contains column header information). And the semicolon indicates the end of a statement, just like in C#.
Now we’re actually execute the import statement:
OdbcCommand importCommand = new OdbcCommand(importData, mysqlConnection);
int importRecords = importCommand.ExecuteNonQuery();
Debug.WriteLine(importRecords + ” records imported.”);
When the LOAD DATA command runs, it’ll pull all those rows and load as many into the table as possible, skipping any duplicates. Since there are no foreign key constraints, the main errors we’ll see are if the file can’t be read or the server is having problems. However, we might see a number of warnings, e.g. missing columns, too-large field values that get truncated, etc. We want to know about all that stuff, so we’re first going to get the number of warnings and errors.
OdbcCommand warningCommand = new OdbcCommand(“select @@warning_count;”, mysqlConnection);
int warningCount = Convert.ToInt32(warningCommand.ExecuteScalar());
OdbcCommand errorCommand = new OdbcCommand(“select @@error_count;”, mysqlConnection);
int errorCount = Convert.ToInt32(errorCommand.ExecuteScalar());
Debug.WriteLine(String.Format(“{0} warnings, {1} errors”, warningCount, errorCount));
Now we can inspect the number of warnings and errors, and if there are any, we can log them, email someone, etc. The below code just loops through the warnings and errors and displays them in the output window.
if (warningCount > 0)
{
// show any warnings
OdbcCommand warningListCommand = new OdbcCommand(“show warnings;”, mysqlConnection);
OdbcDataReader warningList = warningListCommand.ExecuteReader();
while (warningList.Read())
{
Debug.WriteLine(warningList.GetString(0) + ” “ + warningList.GetString(1) + ” “ + warningList.GetString(2));
}
warningList.Close();
}
if (errorCount > 0)
{
// show any errors
OdbcCommand errorListCommand = new OdbcCommand(“show errors;”, mysqlConnection);
OdbcDataReader errorList = errorListCommand.ExecuteReader();
while (errorList.Read())
{
Debug.WriteLine(errorList.GetString(0) + ” “ + errorList.GetString(1) + ” “ + errorList.GetString(2));
}
errorList.Close();
}
And lastly we close up shop by calling
mysqlConnection.Close();
Hopefully the above will help you out when importing data in the unholy(?) alliance of MySQL and C#. MySQL’s new version bring along a ton of goodies, so don’t be surprised if you see MySQL databases popping up more frequently.