My god how I hate Excel. Often it’s because people use it for some purpose it’s not really suitable for, where a relational database (even Access!) would be much better because (they think) they know how to use Excel and god forbid anyone take the time to learn to do something properly. Sometimes however, Excel is just plain fucking stupid. This is one of those times.
Recently I was trying to save an Excel spreadsheet as a CSV, to import into SQL Server (importing direct from Excel without converting to CSV gives other problems, like the Excel driver ‘helping’ you by detecting the wrong column types).
It was during this apparently innocuous task that I discovered a massive fail in Excel, which resulted in the import failing because some columns were missing. SQL Server’s helpful error message read:
Data conversion failed when converting column x to column y. The conversion returned status value 2 and status text “The value could not be converted because of a potential loss of data.”
Trying the same thing using BULK INSERT gave the error:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 17
followed by the same message for a number of other rows.
If that sounds familiar to you, you can blame your agonising headache on the way Microsoft Excel exports files to CSV. You see, rather than working out how many columns to have in each record by doing something sensible (like taking the maximum number of columns used in any record, or counting column headings) Microsoft decided that Excel should instead read rows in blocks of 16 and count the number of columns in each of these blocks. That means if you go 16 rows with no data in your final column(s), Excel will assume those columns don’t exist (even where there’s a column heading!) and instead of putting empty fields (i.e. ‘,,,’) Excel will simply pretend the fields don’t exist. Or in Microsoft’s words:
Excel saves text files in 16-row blocks. Therefore, if all the cells in the last column in a 16-row block are empty, Excel saves that area as if the column does not exist. No tabs or commas are saved for that 16-row block.
To make sure that Excel saves tab or comma delimiters for all empty columns, verify that the last column in the file contains some data in at least every 16 rows throughout the file. If the blocks of rows do not contain data, add spaces or other characters in every 16 rows to the cells in the last column, or reorder the columns in the worksheet so that the last column on the worksheet always contains information.
Make sure that the last column in every row has data in it (or at least one row every 16 records). Not great. In practical terms this means adding a ‘dummy’ column to the end of your records, putting in some standard value (e.g. -1), running your import, and then deleting this column again afterwards.
Fun, eh? According to the Microsoft Knowledgebase article, this should only affect versions up to Excel 2002, however I encountered it on Excel 2003 (SP3), so I’ve no idea whether it’s fixed in Excel 2007 or 2010.