Bad Data in Excel source does not generate error in SSIS - sql-server

I have a quick question regarding SSIS. I am developing a package that performs a Data Flow task from an Excel Source into an OLE DB Destination. The columns in the database should allow NULLs. However, when I enter bad data into the numeric columns in the Excel spreadsheet, it does not cause the Data Flow task to fail as I would like it to. I tried to remedy this by explicitly converting any numeric columns in the Derived Column step, but the same thing occurs: if I enter abc into a numeric column in Excel, it just turns up as NULL in the database after the package runs. I do want to allow NULLs, but I'd like the package to fail if the data is corrupt.
Any advice would be appreciated :)

I've just tried this, and the Ignore/Redirect/Fail setting doesn't appear to have any effect; NULLs get inserted into the database regardless.
If you didn't want NULLs, I would suggest amending the definition of your destination table to put a NOT NULL constraint on the columns you want to be numeric. That way the database insert, and therefore the package, would fail.
But since you want nullable columns, the only thing I can suggest is writing a script task or script component to read and validate the data before accepting it.
Alternatively, read the Excel file into a staging area where all the columns are VARCHAR and then validate it via SQL.
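A minimal sketch of the staging-and-validate idea, written in Python purely for illustration (it is not SSIS code, and the names BadDataError, validate_numeric and validate_rows are hypothetical). It assumes every value arrives as a string, as it would from an all-VARCHAR staging table: blanks become NULL, while anything non-blank that does not parse as a number aborts the load instead of silently turning into NULL.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional


class BadDataError(ValueError):
    """Hypothetical error type: a staged value is non-blank but not a valid number."""


def validate_numeric(raw: Optional[str], column: str, row: int) -> Optional[Decimal]:
    # Blank cells are allowed and become NULL (None).
    if raw is None or raw.strip() == "":
        return None
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        # Non-blank and non-numeric: the "corrupt" case that should stop the load.
        raise BadDataError(f"row {row}, column {column!r}: {raw!r} is not numeric") from None
    if not value.is_finite():
        # Decimal accepts "NaN" and "Infinity"; treat those as bad data too.
        raise BadDataError(f"row {row}, column {column!r}: {raw!r} is not a finite number")
    return value


def validate_rows(rows: List[Dict[str, Optional[str]]], numeric_columns: List[str]) -> List[dict]:
    """Validate every staged row; raise on the first bad value instead of loading NULL."""
    cleaned = []
    for row_number, row in enumerate(rows, start=1):
        out = dict(row)
        for column in numeric_columns:
            out[column] = validate_numeric(row.get(column), column, row_number)
        cleaned.append(out)
    return cleaned
```

Raising on the first bad value is the point of the design: whatever hosts this check (a script component, or an equivalent SQL validation step) fails loudly rather than loading a NULL that is indistinguishable from a genuinely empty cell.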

If you edit the SSIS component where you define the import, you can choose the error handling for each column. There you can set it to fail the component, to ignore the failure and go on, or to redirect the row.
These links should help you handle it according to your needs:
http://sqlblog.com/blogs/rushabh_mehta/archive/2008/04/24/gracefully-handing-task-error-in-ssis-package.aspx
and
http://sqlserver360.blogspot.de/2011/03/error-handling-in-ssis.html
and
http://msdn.microsoft.com/en-us/library/ms141679.aspx
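To make the three error-handling choices concrete, here is a conceptual model in Python of what each one does to a row whose value fails conversion. It is a sketch, not SSIS code; Disposition and run_conversion are hypothetical names. Note that "Ignore failure" is itself a way for a bad value to end up as NULL. Also, if the Excel provider has already replaced a value with NULL before the component sees it, no conversion error occurs at all, which may explain why the setting appeared to have no effect in the answer above.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Disposition(Enum):
    FAIL = "Fail component"      # stop processing on the first bad value
    IGNORE = "Ignore failure"    # keep the row, but the bad value becomes NULL (None)
    REDIRECT = "Redirect row"    # send the whole row to the error output


def run_conversion(
    rows: List[Dict[str, str]],
    column: str,
    convert: Callable[[str], object],
    disposition: Disposition,
) -> Tuple[List[dict], List[dict]]:
    """Model of one column's error handling: returns (normal output, error output)."""
    output, error_output = [], []
    for row in rows:
        try:
            converted = convert(row[column])
        except ValueError:
            if disposition is Disposition.FAIL:
                raise
            if disposition is Disposition.IGNORE:
                output.append({**row, column: None})
            else:
                error_output.append(row)
            continue
        output.append({**row, column: converted})
    return output, error_output
```

For example, run_conversion([{"Qty": "5"}, {"Qty": "abc"}], "Qty", int, Disposition.REDIRECT) keeps the first row in the normal output and sends the second to the error output.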

Related

Count how many fields were cleansed, and which ones, in SSIS

I'm doing an exercise in which I have to clean data from a Flat File Source and write it to my database. I have already managed to clean all of the fields by using some data quality rules for each field, and I also generate error codes, which I write to a different table when a rule is broken.
My problem is that for the final step of the exercise I have to generate some Power BI graphics showing how many fields from the source were fixed and which fields were cleansed. The only options I have thought of are comparing the DB table to the flat file source or doing something with script components, but I don't think those are really good solutions.
Has anybody encountered this problem? If somebody could point me to information on something like this, it would be great. Thanks!
If I were facing a similar issue, I would do this in three steps:
Importing data without any transformation to a staging table
Cleaning data and loading it into the destination table
Comparing staging and destination table to get how many values were fixed.
From a design standpoint, establishing a key before you start cleaning is essential.
You could use an SSIS Derived Column transformation to create a business key: a concatenation of the available fields that forms a unique key, built with the FINDSTRING function and other string functions.
Similarly, add a column to your staging table, or use a derived column (depending on whether you clean up with SQL or with SSIS tasks), to indicate whether each row was cleaned.
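The comparison step can be sketched in Python as follows. This is an illustrative sketch, not an SSIS component: business_key and count_cleansed are hypothetical names, and it assumes the key fields are never modified by the cleansing rules (otherwise staging and destination rows would not match) and that both tables hold comparable values, for example both as text.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]


def business_key(row: Row, key_fields: List[str]) -> str:
    # Concatenate the key fields with a separator, as a Derived Column expression might.
    return "|".join(str(row.get(field, "")).strip() for field in key_fields)


def count_cleansed(
    staging: List[Row],
    destination: List[Row],
    key_fields: List[str],
    compare_fields: List[str],
) -> Tuple[int, Counter]:
    """Return (rows with at least one fixed value, number of fixes per field)."""
    destination_by_key = {business_key(row, key_fields): row for row in destination}
    fixes_per_field: Counter = Counter()
    rows_changed = 0
    for raw in staging:
        cleaned = destination_by_key.get(business_key(raw, key_fields))
        if cleaned is None:
            continue  # rejected or unloaded row; count these separately if needed
        changed = [f for f in compare_fields if raw.get(f) != cleaned.get(f)]
        fixes_per_field.update(changed)
        rows_changed += bool(changed)
    return rows_changed, fixes_per_field
```

The per-field counter maps directly onto a simple bar chart of "fixes per field", and the row count gives the "how many were fixed" figure.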

SSIS - Excel data shows as scientific notation and NULL values

I have some Excel data that contains numbers in scientific notation, like 5e+00.
When I view the value in Excel by clicking the edit button, I can see the full value. But when I import the data into a table, it is loaded as NULL. I need to import the data without making any changes in Excel. Please suggest how to do this in SSIS.
I tried importing after changing the format on the Excel side, but I want this done at the SSIS level without making any changes in Excel.
The data in my column is:
Amounts
15880
5e+19
57892
I expect the output to be as follows:
1588007
500000000019
57892
But I am getting a NULL value for the second item.
Please suggest.
In the question above, there are 2 problems:
Numbers are shown in scientific format
Data is replaced by Null values while importing
Scientific Format issue
You mentioned that:
I tried imported by changing the format in excel side. I want it to be done in SSIS level without doing any changes in excel
Unfortunately, this cannot be done without changing the Excel file, since the only way to solve this issue is to change the Number Format property of the cells. You can, however, automate this step with a Script Task that uses the Microsoft.Office.Interop.Excel.dll assembly instead of doing it manually in Excel.
You can refer to the following post as an example:
Format excel destination column in ssis script task
But make sure to use:
m_XlWrkSheet.Columns(1).NumberFormat = "0"
to force a numeric format.
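This does not replace fixing the cell format, but if a value does reach you as text (for example a string such as 5e+19), it can be expanded to plain digits. The sketch below uses Python's decimal module for illustration; expand_scientific is a hypothetical helper. It can only recover the digits present in the text, so "5e+19" becomes a 5 followed by 19 zeros, never any digits Excel was hiding behind the display format.

```python
from decimal import Decimal, InvalidOperation


def expand_scientific(text: str) -> str:
    """Return the plain-digit form of a numeric string such as '5e+19'."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite number: {text!r}")
    # The 'f' presentation type writes the value without an exponent.
    return format(value, "f")
```

Decimal is used instead of float so that large values are expanded exactly rather than through binary floating point.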
Null Values issue
This issue is caused by the OLE DB provider used to read Excel files. It occurs when an Excel column contains mixed data types: the OLE DB provider reads the values of the dominant data type and replaces all other values with NULLs.
You can refer to the following links for more information/workarounds:
Importing Excel Data Seems to Randomly Give Null Values
SQL JOIN on varchar with special characters and leading zeros
Dynamically Creating Excel table through SSIS
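The mechanism can be illustrated with a deliberately simplified Python model. It is not the provider's actual algorithm: classify and read_column are hypothetical names, type detection is reduced to "numeric or text", and IMEX is not modelled. The model guesses a column's type from the first few rows (the TypeGuessRows registry value, 8 by default) and returns NULL for every value that does not match the guess.

```python
from collections import Counter
from typing import List, Optional, Tuple


def classify(value: str) -> str:
    """Crude type test: anything float() accepts counts as numeric."""
    try:
        float(value)
        return "numeric"
    except ValueError:
        return "text"


def read_column(
    values: List[Optional[str]], type_guess_rows: int = 8
) -> Tuple[str, List[Optional[str]]]:
    """Simplified model: guess the type from the first rows, then NULL out mismatches."""
    sample = [v for v in values[:type_guess_rows] if v not in (None, "")]
    counts = Counter(classify(v) for v in sample)
    # Majority type in the sample wins; ties go to text (an arbitrary choice here).
    guessed = "numeric" if counts["numeric"] > counts["text"] else "text"
    # Every value whose type differs from the guess comes back as None (NULL).
    result = [
        v if v not in (None, "") and classify(v) == guessed else None
        for v in values
    ]
    return guessed, result
```

In this model, read_column(["1", "2", "3", "AB12"], 3) guesses numeric and returns None for "AB12", while the same values with "AB12" and another text value in the first rows would flip the guess to text.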

Importing Excel Data Seems to Randomly Give Null Values

I'm using SSIS for Visual Studio 2017 for some Excel file imports.
I've created a package with several loop containers that call specific packages to handle some files. I have an issue with one particular package: for some Excel files it seemingly at random decides that a column's data is NULL. I was (and still am) under the impression that this is governed by the TypeGuessRows registry setting (changed initially to 0, then to 1000 as a test), located at
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel
I think this because the various files being brought in generally have the same data, but it seems that if the first few rows of a column in the source data contain only numbers, the mixed values in that column are not brought in correctly. All other columns seem fine.
Looking at the source files, they all have the same data types.
I've tried changing the TypeGuessRows registry value and made sure the output column type was string-based rather than numeric.
The connection string has IMEX=1.
So I fixed it, or at least found a workaround that should help anyone in my situation. I think it has to do with SSIS caching.
I ended up sorting on the problem column so that the records being read as NULL (because their data type differs from the rest) are read first and are no longer treated as the odd ones out. I will say, I tried this initially and it didn't work.
Through a little experiment, making a new data flow in the same package, I discovered that this solution actually does work, hence my thinking that the cache was the issue.
If anyone has any further questions on this, let me know.
This issue is related to the OLE DB provider used to read Excel files: since Excel is not a database where each column has a specific data type, the OLE DB provider tries to identify the dominant data type in each column and replaces all values of other data types that cannot be parsed with NULLs.
There are many articles found online discussing this issue and giving several workarounds (links listed below).
But after using SSIS for years, I can say that the best practice is to convert Excel files to CSV files and read them using Flat File components.
Or, if you don't have the choice of converting Excel files to flat files, you can force the Excel connection manager to stop treating the first row as headers by adding HDR=NO to the connection string, and add IMEX=1 so that the OLE DB provider reads columns with mixed data types as text. Because the header row (usually all strings) is now part of the data used to guess the types, all columns are imported as strings and no values are replaced with NULLs, but you lose the column names and get an additional row (the header row is imported as data).
If you cannot ignore the header row, just add a dummy row that contains dummy string values (example: aaa) after the header row and add IMEX=1 to the connection string.
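For reference, here is a small Python helper that assembles such a connection string. It assumes the ACE OLE DB 12.0 provider and an .xlsx workbook (hence "Excel 12.0 Xml"); an older .xls file read through a different provider would need a different string. The function name and the example path are hypothetical.

```python
def excel_connection_string(path: str, header_row: bool = False, imex: int = 1) -> str:
    """Build an ACE OLE DB connection string for an .xlsx workbook (assumed provider)."""
    hdr = "YES" if header_row else "NO"
    # Extended Properties contains semicolons, so its value must be wrapped in quotes.
    return (
        "Provider=Microsoft.ACE.OLEDB.12.0;"
        f"Data Source={path};"
        f'Extended Properties="Excel 12.0 Xml;HDR={hdr};IMEX={imex}"'
    )
```

For example, excel_connection_string(r"C:\data\input.xlsx") yields a string ending in Extended Properties="Excel 12.0 Xml;HDR=NO;IMEX=1", which can be pasted into the Excel connection manager's ConnectionString property.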
Helpful links
SSIS Excel Data Import - Mixed data type in Rows
Mixed data types in Excel column
Importing data from Excel having Mixed Data Types in a column (SSIS)
Why SSIS always gets Excel data types wrong, and how to fix it!
EXCEL IN SSIS: FIXING THE WRONG DATA TYPES
IMEX= 1 extended properties in ssis

SSIS getting wrong column type with OLEDB connector

Halfway through an SSIS project, certain table fields changed from char(30) to nvarchar(30).
However, when I run the SSIS packages, an error appears stating that it cannot convert from Unicode to non-Unicode.
I am trying to transfer data directly from a database source to its destination.
Both connections use the same database schema, so there should be no conversion.
When I check the external column data type, it shows DT_STR, which is no longer the case.
I tried deleting both the source and the destination in the hope that it would clear any cached data, but it did not work.
Any ideas?
Sounds to me like the metadata in your data flow task is cached and needs to be refreshed to reflect the new type.
Open the source, go to Columns, uncheck the column, then check it again. Click OK. The metadata should now refresh.
nvarchar and nchar are Unicode. Conversely, varchar and char are non-Unicode.
http://msdn.microsoft.com/en-us/library/ms187752.aspx
As a result, if you are moving data from one of these data types to the other, you will have to perform an additional conversion (CAST or CONVERT). The other option is to check your adapters so that char uses the SSIS data type DT_STR and nvarchar uses DT_WSTR.
http://msdn.microsoft.com/en-us/library/ms141036.aspx
Without knowing how your packages work I cannot be much more specific but hopefully this will get you going.
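The stale-metadata symptom can be expressed as a simple check. The Python sketch below is illustrative only: SSIS_STRING_TYPES and unicode_mismatches are hypothetical names, and the mapping covers just the four string types discussed here (char and varchar to DT_STR, nchar and nvarchar to DT_WSTR). It compares the types a component has cached against the current table definition and reports the columns where a Unicode/non-Unicode mismatch would trigger the error.

```python
from typing import Dict, List, Tuple

# SQL Server string types and the SSIS data types they map to.
SSIS_STRING_TYPES = {
    "char": "DT_STR",
    "varchar": "DT_STR",
    "nchar": "DT_WSTR",
    "nvarchar": "DT_WSTR",
}


def unicode_mismatches(
    cached_cols: Dict[str, str], current_cols: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (column, cached SSIS type, current SSIS type) for string-type mismatches."""
    problems = []
    for name, cached_type in cached_cols.items():
        current_type = current_cols.get(name)
        if current_type is None:
            continue  # column no longer exists; a different problem
        cached = SSIS_STRING_TYPES.get(cached_type.lower())
        current = SSIS_STRING_TYPES.get(current_type.lower())
        if cached and current and cached != current:
            problems.append((name, cached, current))
    return problems
```

With the scenario from the question, a column cached as char but now defined as nvarchar is reported as ("Name", "DT_STR", "DT_WSTR"), which is exactly the metadata that needs refreshing.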

How do I format dd-mmm-yy values in flat file to smalldatetime during data import?

I have a flat file which is imported into SQL Server via an existing SSIS package. I need to make a change to the package to accommodate a new field in the flat file. The new field is a date field which is in the format dd-mmm-yy (e.g. 25-AUG-11). The date field in the flat file will either be empty (e.g. a space/whitespace) or populated with a date. I don’t have any control over the date format in the flat file.
I need to import the date field in the flat file into an existing SQL Server table and the target field data type is smalldatetime.
I was proposing to import the date as a string into a load table and then convert it to smalldatetime when taking the data from the load table. But is there another way to parse the dd-mmm-yy format and load it straight into a smalldatetime field, without having to convert to smalldatetime from the load table? I can’t quite think how to parse the date format, particularly the month. Any suggestions welcome.
Here is an example that might give you an idea of what you can do. Ideally, in an SSIS package or in any ETL job, you should take into account that data may not be exactly what you would like it to be. You need to take appropriate steps to handle the incorrect or invalid data that might pop up now and then. That's why SSIS provides many transformations within the Data Flow Task that you can use to clean up the data.
In your case, you can make use of Derived Column transformation or Data conversion transformation to achieve your requirements.
The example was created in SSIS 2008 R2. It shows how to read a flat file containing the dates and load into an SQL table.
I created a simple SQL table to import the flat file data.
On the SSIS package, I have a connection manager to SQL and one for Flat file. Flat file connection is configured as shown below.
On the SSIS package, I placed a Data Flow Task on the Control Flow tab. Inside the Data Flow Task, I have a Flat File Source, a Derived Column transformation and an OLE DB Destination. Since the Flat File Source and OLE DB Destination are straightforward, I will leave those out here. The Derived Column transformation creates a new column with the expression (DT_DBDATE)SmallDate. Note that you can also use a Data Conversion transformation to do the same. This new column, SmallDateTimeValue, should be mapped to the database column in the OLE DB Destination.
If you execute this package, it will fail because not all the values in the file are valid.
In your case, the failure comes from the table itself: the invalid data is inserted directly into the table, which throws an exception and makes the package fail. In this example, the package fails because the default setting on the Derived Column transformation is to fail the component if there is any error. So, let's add a dummy transformation to receive the error rows. We will use a Multicast transformation for this purpose; it won't really do anything. Ideally, you should redirect the error rows to another table using an OLE DB Destination or another destination component of your choice, so you can analyze the data that causes the errors.
Drag the red arrow from the Derived Column transformation and connect it to the Multicast transformation. This opens the Configure Error Output dialog. Change the values under the Error and Truncation columns from Fail component to Redirect row. This redirects any error rows to the Multicast transformation, so they do not get into the table.
Now, if we execute the package, it will run successfully. Note the number of rows displayed in each direction.
Here is the data that got into the table. Only 2 rows were valid. You can look at the first screenshot that showed the data in the file and you can see only 2 rows were valid.
Hope that gives you an idea to implement your requirement in the SSIS package.
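The same convert-or-redirect logic can be sketched outside SSIS. The Python below is a conceptual equivalent of the Derived Column conversion with error rows redirected, not SSIS code; parse_dd_mmm_yy and split_rows are hypothetical names. It assumes English month abbreviations (the default C locale) and that blank or whitespace-only fields should become NULL. Be aware that Python's %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999, whereas SQL Server's two-digit year cutoff (2049 by default) maps some of those years differently.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def parse_dd_mmm_yy(raw: str) -> Optional[datetime]:
    """Parse values like '25-AUG-11'; blank -> None, invalid -> ValueError."""
    text = raw.strip()
    if not text:
        return None  # empty/whitespace field becomes NULL
    # %b matches month abbreviations case-insensitively; %y yields years 1969-2068,
    # all inside the smalldatetime range (1900-01-01 to 2079-06-06).
    return datetime.strptime(text, "%d-%b-%y")


def split_rows(values: List[str]) -> Tuple[List[Optional[datetime]], List[str]]:
    """Return (converted values, redirected error values), like an error output."""
    converted, errors = [], []
    for raw in values:
        try:
            converted.append(parse_dd_mmm_yy(raw))
        except ValueError:
            errors.append(raw)
    return converted, errors
```

Invalid values such as 31-FEB-11 end up in the error list instead of stopping the run, mirroring the Redirect row setting.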
It should load straight into a SMALLDATETIME field as it is. Remember, dates are just numbers in SQL Server, which are presented to the user in the desired date/time format. The SSIS package should read 25-AUG-2011 just fine as a date data type, and insert it into a SMALLDATETIME field without issues.
Was the package throwing an error or something?
