Truncation errors trying to import from Excel - sql-server

I'm trying to import the NDC database that you can download here: http://www.fda.gov/drugs/informationondrugs/ucm142438.htm
When I initially tried to import the excel in the zip file it complained about the format, so I started with a blank excel, and imported it into excel from the txt file.
I've created a table to import the data into and set all the columns to nvarchar(MAX). The column it complains about is the SUBSTANCENAME column. I checked, and the longest value in that column is about 2700 characters.
My understanding is that the nvarchar(MAX) should easily hold that much. I'm not sure what to do about this other than changing that column to a text field. Should that fit into that column how it is?
I've tried setting it to ignore errors, but as far as I can tell that does nothing, at least it never seems to ignore them when I try.

How are you importing the data into the SQL Server table? If I remember correctly, SSIS uses the first 5 or 10 rows of the Excel file to determine the datatype and length. I remember I had to make a change to the registry in order to get a larger sample size
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
The TypeGuessRows entry can be modified to get a larger sample size.
That is assuming you are using SSIS - but if you are using SQL Server Import then it would be doing the same thing as well.

Related

Importing Excel Data Seems to Randomly Give Null Values

Using SSIS for Visual Studio 2017 for some excel file imports.
I've created a package with several loop containers that call to specific packages to handle some files. I have an issue with one particular package being executed in that it seemingly randomly decides the data for columns is NULL per excel file. I was/am under the impression that this is part of the registry setting for TypeGuessRows (changed initially to 0 then to 1000 as a test) located at
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel
The reason I think this is because the various files being brought in generally have the same data, but it seems that if the first few rows of columns in the source data contains only numbers, that the data with mixed values will not be brought in correctly. All other columns aside from this seems fine.
Looking at the source files, all have the same datatype.
I've tried changing the registry TypeGuessRows value and ensured that the output column property was string-based instead of numerical.
The connection string has IMEX=1
So I fixed it. Or at least found a sufficient workaround that should help anyone in my situation. I think it has to do with the cache of SSIS.
I ended up putting a sort function on the problem column so the records getting read as NULL for having a random data type are read first, and not being considered random. I will say, I tried this initially and it didn't work.
Through a little experiment of making a new data flow in the same package I discovered that this solution actually does work, hence me thinking the cache was the issue.
If anyone has any further questions on this, let me know.
This issue is related to the OLEDB provider used to read excel files: Since excel is not a database where each column has a specific data type, OLEDB provider tries to identify the dominant data types found in each column and replace all other data types that cannot be parsed with NULLs.
There are many articles found online discussing this issue and giving several workarounds (links listed below).
But after using SSIS for years, i can say that best practice is to convert excel files to csv files and read them using Flat File components.
Or, if you don't have the choice to convert excel to flat files then you can force excel connection manager to ignore headers from the first row bu adding HDR=NO to the connection string and adding IMEX=1 to tell the OLEDB provider to specify data types from the first row (which is the header - all string most of the time), in this case all columns are imported as string and no values are replaced with NULLs but you will lose the headers and a additional row (header row is imported).
If you cannot ignore the header row, just add a dummy row that contains dummy string values (example: aaa) after the header row and add IMEX=1 to the connection string.
Helpful links
SSIS Excel Data Import - Mixed data type in Rows
Mixed data types in Excel column
Importing data from Excel having Mixed Data Types in a column (SSIS)
Why SSIS always gets Excel data types wrong, and how to fix it!
EXCEL IN SSIS: FIXING THE WRONG DATA TYPES
IMEX= 1 extended properties in ssis

Work around to 255 column limit in SQL import export wizard for excel

I am trying to import an excel sheet to SQL database. This sheet has 700+ columns, but i understand there is a limit of 255 columns. Is there a work around to include all the columns while uploading to database. I selected Excel 2007 while selecting excel version.
Sadly, of the resources I saw online, there wasn’t an easy way to do this. I tried the above solution of choosing “Excel 2007” but that didn’t work for me. One would expect Excel and SQL Server to have tighter integration.
However, by converting to a .txt file and then working through some of the truncation errors, I managed to load my data set. Below is some detail of my use case and process.
I had a fairly small dataset row-wise (~200 rows) but with a large set of columns (+500 columns). I first converted the Excel file to a text, 'tab-delimited' file (i.e. *.txt). In loading through the SQL Server Import and Export Wizard, I faced two truncation errors: one, a few of the column names were greater than 128 characters and two, the length of values in some of the rows where the default datatype was DT_STR was greater than 50 (the default output column width). For the first error, I just renamed the column(s) to something shorter. For the second error, I manually ran a count of lengths and found the maximum length of values for each column, which allowed me to isolate which columns would throw up an error. I followed the steps below:
1) In the SQL Server Import & Export Wizard 'General' Tab, I selected 'Flat File source' and accepted all the normal defaults.
2) In the second tab ('Columns'), ensure column delimiter is selected as Tab. I preferred this over a CSV format since there was some heavy text with commas in my dataset.
3) If the mapping works out, in the third tab ('Advanced'), you should get a laundry list of all your columns with their natural defaults. As detailed above, I isolated which columns had values that exceeded the default of DT_STR (50) and changed that to DT_TEXT.
4) The remaining steps just specify the destination, and whether you want to save the SSIS steps.
A more simple, straightforward solution: on excel, sort your rows so the one with the longest value is moved to the top. It looks like the import wizard only looks into the first 8 rows of your data file to determine the width of each column (weird!).
source: How does one change the default varchar 255 of a column when importing data from Excel to Sql Server using Import Export Wizard?

Import Flat File into SQL Server Repeated Headers Break Integrity Constraints

I'm having several issues with importing a flat file into MS SQL Server using the SQL Server import / export wizard. I'd like to know how to effectively load the file into a SQL Server table.
File Conditions:
The flat file is fairly large (800MB, and serveral million rows)
It's poorly formatted
The first column is empty
The header is a 3 row set: top blank, middle has field names, bottom blank
This 3 row header is repeated approximately every 60,000 rows
Some values are nulls
It's tab delimited
First, I tried to load it in as Flat File, but SQL server failed to recognize the tab delimiters. Excel opens it correctly (although partially), but SQL Server sticks it all in 1 column.
Second, I tried opening and saving it as an excel file and loading it as an excel file into the SQL Server import wizard (which I'm not sure if it resaves all the data anyway). Now SQL Server parses the columns correctly, but it says integrity constrints are broken when it hits the repeated headers (every numeric type field has a string header every 60000 rows).
If anyone can tell me how to get around this that would be great. I'd ideally like to upload it without the integrity constraints and remove the extra headers with a DELETE WHERE header or blank clause. Not the only solution I'll take, but an idea.
Also, this is my first stackoverflow post, so patience is appreciated.
Thanks,
Since I don't have a formal answer yet, I'll post what I ended up doing.
Essentially, I just made everything a varchar so it would just load into a table. Then I wrote several queries to clean up the garbage in it. Later I made new typed fields and filled them with an insert and cast from the varchar typed fields.
I don't know that this will ever help someone, but at least there's an answer here.

Error converting data types when importing from Excel to SQL Server 2008

Every time that I try to import an Excel file into SQL Server I'm getting a particular error. When I try to edit the mappings the default value for all numerical fields is float. None of the fields in my table have decimals in them and they aren't a money data type. They're only 8 digit numbers. However, since I don't want my primary key stored as a float when it's an int, how can I fix this? It gives me a truncation error of some sort, I'll post a screen cap if needed. Is this a common problem?
It should be noted that I cannot import Excel 2007 files (I think I've found the remedy to this), but even when I try to import .xls files every value that contains numerals is automatically imported as a float and when I try to change it I get an error.
http://imgur.com/4204g
SSIS doesn't implicitly convert data types, so you need to do it explicitly. The Excel connection manager can only handle a few data types and it tries to make a best guess based on the first few rows of the file. This is fully documented in the SSIS documentation.
You have several options:
Change your destination data type to float
Load to a 'staging' table with data type float using the Import Wizard and then INSERT into the real destination table using CAST or CONVERT to convert the data
Create an SSIS package and use the Data Conversion transformation to convert the data
You might also want to note the comments in the Import Wizard documentation about data type mappings.
Going off of what Derloopkat said, which still can fail on conversion (no offense Derloopkat) because Excel is terrible at this:
Paste from excel into Notepad and save as normal (.txt file).
From within excel, open said .txt file.
Select next as it is obviously tab delimited.
Select "none" for text qualifier, then next again.
Select the first row, hold shift, select the last row, and select the text radial button. Click Finish
It will open, check it to make sure it's accurate and then save as an excel file.
There is a workaround.
Import excel sheet with numbers as float (default).
After importing, Goto Table-Design
Change DataType of the column from Float to Int or Bigint
Save Changes
Change DataType of the column from Bigint to any Text Type (Varchar, nvarchar, text, ntext etc)
Save Changes.
That's it.
When Excel finds mixed data types in same column it guesses what is the right format for the column (the majority of the values determines the type of the column) and dismisses all other values by inserting NULLs. But Excel does it far badly (e.g. if a column is considered text and Excel finds a number then decides that the number is a mistake and insert a NULL instead, or if some cells containing numbers are "text" formatted, one may get NULL values into an integer column of the database).
Solution:
Create a new excel sheet with the name of the columns in the first row
Format the columns as text
Paste the rows without format (use CVS format or copy/paste in Notepad to get only text)
Note that formatting the columns on an existing Excel sheet is not enough.
There seems to be a really easy solution when dealing with data type issues.
Basically, at the end of Excel connection string, add ;IMEX=1;"
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\\YOURSERVER\shared\Client Projects\FOLDER\Data\FILE.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
This will resolve data type issues such as columns where values are mixed with text and numbers.
To get to connection property, right click on Excel connection manager below control flow and hit properties. It'll be to the right under solution explorer. Hope that helps.
To avoid float type field in a simple way:
Open your excel sheet..
Insert blank row after header row and type (any text) in all cells.
Mouse Right-Click on the head of the columns that cause a float issue and select (Format Cells), then choose the category (Text) and press OK.
And then export the excel sheet to your SQL server.
This simple way worked with me.
A workaround to consider in a pinch:
save a copy of the excel file, modify the column to format type 'text'
copy the column values and paste to a text editor, save the file (call it tmp.txt).
modify the data in the text file to start and end with a character so that the SQL Server import mechanism will recognize as text. If you have a fancy editor, use included tools. I use awk in cygwin on my windows laptop. For example, I start end end the column value with a single quote, like "$ awk '{print "\x27"$1"\x27"}' ./tmp.txt > ./tmp2.txt"
copy and paste the data from tmp2.txt over top of the necessary column in the excel file, and save the excel file
run the sql server import for your modified excel file... be sure to double check the data type chosen by the importer is not numeric... if it is, repeat the above steps with a different set of characters
The data in the database will have the quotes once the import is done... you can update the data later on to remove the quotes, or use the "replace" function in your read query, such as "replace([dbo].[MyTable].[MyColumn], '''', '')"

SQL import wizard drops leading zero

I've read all the posts about prefixing the numbers with ', and setting IMEX=1 in the connection string; nothing seems to do the trick for me.
Here's the setup: Excel column with mixed data - 99% numbers (some start with 0) 1% text.
PROGRAMATICALLY mporting into SQL Server 2005 table / column type - varchar(255).
Import works fine locally, but once i move the code to production (GoDaddy), it drops the leading 0's in the column.
Any ideas?
p.s. I knew about the registry change solution, matter of fact - the value was set to 0 on my dev box, but the answer made me realize that the value wasn't set on the PRODUCTION SERVER :)
The ISAM driver only samples the first 8 rows, but you can change that behaviour through a registry change:
http://sqlserversd.wordpress.com/2008/09/14/ssis-excel-values-import-as-nulls/
But yes, using Excel for machine-to-machine data transfer is a nightmare...Is there no other way you can be sent the data?
Yes. The Excel driver only sample the 1st 8 rows to determine the data type.
This means that it assumes the column is numeric is "bob" does not appear in rows 1 to 8.
The target table column datatype is irrelevant.
Ths issue has been there for a long time, I saw it in 2003 the first time.
BOL notes on excel import
We usually save the file as a .csv file or .txt file and then the issue doesn't occur.
There is a quick and tricky way for this. following these steps BUT first copy all the data / columns and rows from the actual excel sheet into another excel sheet just to be of the safe side so that you have the actual data to compare with.
Steps:
Copy all the values in the column and paste them into a notepad.
Now change the column type to text in the Excel sheet (it will trim the preceding / trailing Zeros), don't worry about that.
Go to Notepad and copy all the values that you have pasted just now.
Go to your excel sheet and paste the values in same column.
If you have more than one column with 0 values then just repeat the same.
Now your excel document is ready to be imported with Zero Values :).
Happy Days.

Resources