SSIS - Excel data shows as scientific notations and Null Values - sql-server

I have some excel data which contains scientific numbers like 5e+00.
When the see the value in excel by clicking edit button I can see the full value. But when I import the data into table I am getting the data loaded as Null. I need to import the data without doing any changes in excel. Please suggest how to do it in SSIS.
I tried imported by changing the format in excel side. I want it to be done in ssis level without doing any changes in excel
Data in my Column as
Amounts
15880
5e+19
57892
I expect the output should be like as follows
1588007
500000000019
57892
But I am getting Null value for second item
Please suggest.

In the question above, there are 2 problems:
Numbers are shown in scientific format
Data is replaced by Null values while importing
Scientific Format issue
You mentioned that:
I tried imported by changing the format in excel side. I want it to be done in SSIS level without doing any changes in excel
Unfortunately, this cannot be done without changing the Excel file, since the only way to solve this issue is to change the Number Format property of the cells. You can automate this step by adding a Script Task that uses Microsoft.Office.Interop.Excel.dll assembly to automate this process instead of doing it manually from Excel.
You can refer to the following post as an example:
Format excel destination column in ssis script task
But make sure to use:
m_XlWrkSheet.Columns(1).NumberFormat = "0"
To force a Numeric format.
Null Values issue
This issue is caused by the OLE DB provider used to read from Excel files, This error occurs when the Excel column contains mixed data types, the OLE DB provider read the values with dominant data types and replace all other values with Nulls.
You can refer to the following links for more information/workarounds:
Importing Excel Data Seems to Randomly Give Null Values
SQL JOIN on varchar with special characters and leading zeros
Dynamically Creating Excel table through SSIS

Related

Importing Excel Data Seems to Randomly Give Null Values

Using SSIS for Visual Studio 2017 for some excel file imports.
I've created a package with several loop containers that call to specific packages to handle some files. I have an issue with one particular package being executed in that it seemingly randomly decides the data for columns is NULL per excel file. I was/am under the impression that this is part of the registry setting for TypeGuessRows (changed initially to 0 then to 1000 as a test) located at
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel
The reason I think this is because the various files being brought in generally have the same data, but it seems that if the first few rows of columns in the source data contains only numbers, that the data with mixed values will not be brought in correctly. All other columns aside from this seems fine.
Looking at the source files, all have the same datatype.
I've tried changing the registry TypeGuessRows value and ensured that the output column property was string-based instead of numerical.
The connection string has IMEX=1
So I fixed it. Or at least found a sufficient workaround that should help anyone in my situation. I think it has to do with the cache of SSIS.
I ended up putting a sort function on the problem column so the records getting read as NULL for having a random data type are read first, and not being considered random. I will say, I tried this initially and it didn't work.
Through a little experiment of making a new data flow in the same package I discovered that this solution actually does work, hence me thinking the cache was the issue.
If anyone has any further questions on this, let me know.
This issue is related to the OLEDB provider used to read excel files: Since excel is not a database where each column has a specific data type, OLEDB provider tries to identify the dominant data types found in each column and replace all other data types that cannot be parsed with NULLs.
There are many articles found online discussing this issue and giving several workarounds (links listed below).
But after using SSIS for years, i can say that best practice is to convert excel files to csv files and read them using Flat File components.
Or, if you don't have the choice to convert excel to flat files then you can force excel connection manager to ignore headers from the first row bu adding HDR=NO to the connection string and adding IMEX=1 to tell the OLEDB provider to specify data types from the first row (which is the header - all string most of the time), in this case all columns are imported as string and no values are replaced with NULLs but you will lose the headers and a additional row (header row is imported).
If you cannot ignore the header row, just add a dummy row that contains dummy string values (example: aaa) after the header row and add IMEX=1 to the connection string.
Helpful links
SSIS Excel Data Import - Mixed data type in Rows
Mixed data types in Excel column
Importing data from Excel having Mixed Data Types in a column (SSIS)
Why SSIS always gets Excel data types wrong, and how to fix it!
EXCEL IN SSIS: FIXING THE WRONG DATA TYPES
IMEX= 1 extended properties in ssis

SSIS receive Excel column as DT_IMAGE

Good day to you, Experts.
I'm stuck on a problem I'm having with an Excel 97-02 .xls file.
When adding it as a source in SSIS, I'm getting an External Columns Datatype of DT_IMAGE .
The column represents an ID and is numeric only. I can't extract and work with the data because of the DT_IMAGE datatype.
Setting IMEX=1 didn't help.
Thank you in advance.
Reading Excel files in SSIS is done using OLEDB provider which may not detect the appropriate Excel column type.
There are many other questions mentioning similar issues such as:
SSIS Excel Import Forcing Incorrect Column Type
SSIS Excel Data Source - Is it possible to override column data types?
SSIS keeps force changing excel source string to float
As you mentioned in the question, if you added ;Extended Properties="IMEX=1" to the connectionstring with no luck then i think there is 4 things you can try:
Sorting column data inside Excel
Change the entire column formatting manually
Go to the advanced editor on the Excel source >> into the output column list and set the type for each of the columns.
Adding IMEX=1; MAXROWSTOSCAN=0 to the connectionstring
If nothing of the above steps worked then you should save the Excel sheet as a text file and then you use Flat File Connection manager

SSIS importing extra decimal values in to destination sql table from excel

I am trying to import "Financial data" from Excel files in to sql table. Problem I am facing is that My ssis package is incrementing decimal values. e.g -(175.20) from Excel is being loaded as "-175.20000000000005" in SQL.
I am using nVArChar (20) in destination SQL table. Images attached. What's the best data type in destination table. I have done a lot of reading and people seem to suggest decimal data but Package throws error for Decimal data type.Need help please.
Ended up changing the Data type to "Currency" in my SQL destination. Then added a data conversion task to change "DT_R8" data type from excel source to "currency[DT_CY]. This resolved the issue. could have used decimal or Numeric (16,2)data type in my destination as well but then i just went ahead with currency and it worked.
You could use a Derived Column Transformation in your Data Flow Task, with an expression like ROUND([GM],2) (you might need to replace GM with whatever your actual column name is).
You can then go to the Advanced Editor of the Derived Column Transformation and set the data type to decimal with a Scale of 2 on the 'Input and Output Properties' tab (look under 'Derived Column Output').
You'll then be able to use a decimal data type in your SQL Server table.

Error converting data types when importing from Excel to SQL Server 2008

Every time that I try to import an Excel file into SQL Server I'm getting a particular error. When I try to edit the mappings the default value for all numerical fields is float. None of the fields in my table have decimals in them and they aren't a money data type. They're only 8 digit numbers. However, since I don't want my primary key stored as a float when it's an int, how can I fix this? It gives me a truncation error of some sort, I'll post a screen cap if needed. Is this a common problem?
It should be noted that I cannot import Excel 2007 files (I think I've found the remedy to this), but even when I try to import .xls files every value that contains numerals is automatically imported as a float and when I try to change it I get an error.
http://imgur.com/4204g
SSIS doesn't implicitly convert data types, so you need to do it explicitly. The Excel connection manager can only handle a few data types and it tries to make a best guess based on the first few rows of the file. This is fully documented in the SSIS documentation.
You have several options:
Change your destination data type to float
Load to a 'staging' table with data type float using the Import Wizard and then INSERT into the real destination table using CAST or CONVERT to convert the data
Create an SSIS package and use the Data Conversion transformation to convert the data
You might also want to note the comments in the Import Wizard documentation about data type mappings.
Going off of what Derloopkat said, which still can fail on conversion (no offense Derloopkat) because Excel is terrible at this:
Paste from excel into Notepad and save as normal (.txt file).
From within excel, open said .txt file.
Select next as it is obviously tab delimited.
Select "none" for text qualifier, then next again.
Select the first row, hold shift, select the last row, and select the text radial button. Click Finish
It will open, check it to make sure it's accurate and then save as an excel file.
There is a workaround.
Import excel sheet with numbers as float (default).
After importing, Goto Table-Design
Change DataType of the column from Float to Int or Bigint
Save Changes
Change DataType of the column from Bigint to any Text Type (Varchar, nvarchar, text, ntext etc)
Save Changes.
That's it.
When Excel finds mixed data types in same column it guesses what is the right format for the column (the majority of the values determines the type of the column) and dismisses all other values by inserting NULLs. But Excel does it far badly (e.g. if a column is considered text and Excel finds a number then decides that the number is a mistake and insert a NULL instead, or if some cells containing numbers are "text" formatted, one may get NULL values into an integer column of the database).
Solution:
Create a new excel sheet with the name of the columns in the first row
Format the columns as text
Paste the rows without format (use CVS format or copy/paste in Notepad to get only text)
Note that formatting the columns on an existing Excel sheet is not enough.
There seems to be a really easy solution when dealing with data type issues.
Basically, at the end of Excel connection string, add ;IMEX=1;"
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\\YOURSERVER\shared\Client Projects\FOLDER\Data\FILE.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
This will resolve data type issues such as columns where values are mixed with text and numbers.
To get to connection property, right click on Excel connection manager below control flow and hit properties. It'll be to the right under solution explorer. Hope that helps.
To avoid float type field in a simple way:
Open your excel sheet..
Insert blank row after header row and type (any text) in all cells.
Mouse Right-Click on the head of the columns that cause a float issue and select (Format Cells), then choose the category (Text) and press OK.
And then export the excel sheet to your SQL server.
This simple way worked with me.
A workaround to consider in a pinch:
save a copy of the excel file, modify the column to format type 'text'
copy the column values and paste to a text editor, save the file (call it tmp.txt).
modify the data in the text file to start and end with a character so that the SQL Server import mechanism will recognize as text. If you have a fancy editor, use included tools. I use awk in cygwin on my windows laptop. For example, I start end end the column value with a single quote, like "$ awk '{print "\x27"$1"\x27"}' ./tmp.txt > ./tmp2.txt"
copy and paste the data from tmp2.txt over top of the necessary column in the excel file, and save the excel file
run the sql server import for your modified excel file... be sure to double check the data type chosen by the importer is not numeric... if it is, repeat the above steps with a different set of characters
The data in the database will have the quotes once the import is done... you can update the data later on to remove the quotes, or use the "replace" function in your read query, such as "replace([dbo].[MyTable].[MyColumn], '''', '')"

How do I format dd-mmm-yy values in flat file to smalldatetime during data import?

I have a flat file which is imported into SQL Server via an existing SSIS package. I need to make a change to the package to accommodate a new field in the flat file. The new field is a date field which is in the format dd-mmm-yy (e.g. 25-AUG-11). The date field in the flat file will either be empty (e.g. a space/whitespace) or populated with a date. I don’t have any control over the date format in the flat file.
I need to import the date field in the flat file into an existing SQL Server table and the target field data type is smalldatetime.
I was proposing to import the date as a string into a load table and then convert to smalldatetime when taking the data from the load table. But is there another possible way to parse the date format dd-mmm-yy to load this straight into a smalldatetime field without having to use convert to smalldatetime from the load table. I can’t quite think how to parse the date format, particularly the month. Any suggestions welcome.
Here is an example that might give you an idea of what you can do. Ideally, in an SSIS package or in any ETL job, you should take into account that data may not be exactly what you would like it to be. You need to take appropriate steps to handle the incorrect or invalid data that might pop up now and then. That's why SSIS comes up with lots of Transformation tasks within Data Flow Task which you can make use of to clean up the data.
In your case, you can make use of Derived Column transformation or Data conversion transformation to achieve your requirements.
The example was created in SSIS 2008 R2. It shows how to read a flat file containing the dates and load into an SQL table.
I created a simple SQL table to import the flat file data.
On the SSIS package, I have a connection manager to SQL and one for Flat file. Flat file connection is configured as shown below.
On the SSIS package, I placed a Data Flow Task on the Control Flow tab. Inside, the Data Flow task, I have a Flat File Source, Derived Column transformation and an OLE DB Destination. Since the Flat file source and OLE DB destination are straightforward, I will leave those out here. The Derived transformation creates a new column with the expression (DT_DBDATE)SmallDate. Note that you can also use Data Conversion transformation to do the same. This new column SmallDateTimeValue should be mapped to the database column in OLE DB Destination.
If you execute this package, it will fail because not all the values in the file are valid.
The reason why it fails in your case is because the invalid data is directly inserted into the table. In your case, the table will throw an exception making the package to fail. In this example, the package fails because the default setting on the Derived column transformation is to fail the component if there is any error. So, let's place a dummy transformation to redirect the error rows. We will Multicast transformation for this purpose. It won't really do anything. Ideally, you should redirect the error rows to another table using OLE DB Destination or other Destination component of your choice so you can analyze the data that causes the errors.
Drag the red arrow from Derived transformation and connect it to the Multicast transformation. This will popup the Configure Error Output dialog. Change the values under the column Error and Truncation from Fail component to Redirect row. This will redirect any error rows to the Multicast transformation and will not get into the tables.
Now, if we execute the package, it will run successfully. Note the number of rows displayed in each direction.
Here is the data that got into the table. Only 2 rows were valid. You can look at the first screenshot that showed the data in the file and you can see only 2 rows were valid.
Hope that gives you an idea to implement your requirement in the SSIS package.
It should load straight into a SMALLDATETIME field as it is. Remember, dates are just numbers in SQL Server, which are presented to the user in the desired date/time format. The SSIS package should read 25-AUG-2011 just fine as a date data type, and insert it into a SMALLDATETIME field without issues.
Was the package throwing an error or something?

Resources