How to Identify Problematic Data in SSIS CSV to SQL

I have a Data Flow step in an SSIS package that simply reads data from a CSV with 180 or so columns and inserts it into an MS SQL Server table.
It works.
However, there's a CSV file with 110,000+ rows and it fails. In the Output window in Visual Studio there is a message that says:
The data conversion for column "Address_L2" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
In the Flat File Connection Manager Editor, the data type for the column is string [DT_STR] 50. TextQualified is True.
The SQL column with the same name is a varchar(100).
Anyhow, in the Flat File Source Editor I set all Truncation errors to be ignored, so I don't think this has to do with truncation.
My problem is identifying the "offending" data.
In the same Output window it says:
... SSIS.Pipeline: "Staging Table" wrote 93217 rows.
I looked at row 93218 and a few before and after (Notepad++, Excel, SQL) and nothing caught my attention.
So I went ahead and removed rows from the CSV file up to what I thought was the offending row. When I tried the process again I got the same error, but the last entry that actually made it into the SQL table doesn't match the last, or nearly the last, rows in the CSV file.
Is it because it doesn't necessarily insert them in the same order?
In any case, how do I know what the actual issue is, especially with a file this size, which you can't go through manually?

You can simply change the length of the column in the flat file connection manager to meet the destination table specifications. Just open the flat file connection manager, go to the Advanced tab and change the column length.
Note that you can select multiple columns and change the data type and length at once.
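If you do widen the column (say to DT_STR 100 to match the varchar(100) destination), you can then hunt down the values that would not have fit in the original 50 characters once the load succeeds. A hedged T-SQL sketch, assuming the staging table is dbo.StagingTable (an illustrative name):

    -- Find rows whose Address_L2 value exceeds the original
    -- DT_STR 50 limit from the Flat File Connection Manager.
    SELECT Address_L2,
           LEN(Address_L2) AS ValueLength
    FROM   dbo.StagingTable
    WHERE  LEN(Address_L2) > 50
    ORDER BY LEN(Address_L2) DESC;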

You could add an Error output to the SSIS component which is causing the error (not sure from your question whether it's the flat file source or the Staging Table destination).
Hook up the Error output to "nowhere" (I use the Konesans Trash Destination), activate a data viewer on it, and select just the problem column (along with anything which helps you identify the row) into the data viewer. Run in Visual Studio, and you'll see which rows are failing.
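If you would rather keep the failing rows than just watch them in a data viewer, you can point the error output at a small error table instead of a trash destination. SSIS error outputs carry two extra columns, ErrorCode and ErrorColumn, alongside the row's data; a minimal sketch of such a table (the table name and the extra audit column are illustrative):

    -- Destination table for the redirected error rows.
    CREATE TABLE dbo.LoadErrors
    (
        ErrorCode   int          NULL,  -- SSIS error code for the failing row
        ErrorColumn int          NULL,  -- lineage ID of the failing column
        Address_L2  varchar(500) NULL,  -- wide enough to hold the raw value
        LoadedAt    datetime     NOT NULL DEFAULT (GETDATE())
    );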

Related

Error while loading pipe delimited data from flat file into a SQL Server table

I am looking at an existing SSIS package.
I downloaded the package from source control and changed the connection managers to point to my local dev environment.
The task is to read from a flat file and load into a SQL Server database.
The file is a | delimited flat file. In the connection manager, the header row delimiter is set to {LF}.
While executing the task, I ran into several errors, and one of them is:
curr user file [2]] Error: Data conversion failed. The data conversion for column "customfield_storeid" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
I looked into the connection manager; the column is set to string [DT_STR] with OutputColumnWidth of 50.
In my database table, that column is of type varchar(50). So it seems there is nothing wrong.
If I changed the error output to ignore truncation errors, it would not show the error, but I do not want to change what was previously set for the package.
How do I resolve it?

Piping Data from CSV File to OLEDB Destination in SSIS

I have an SSIS package in which I use a ForEach Container to loop through a folder destination and pull a single .csv file.
The Container takes the file it finds and uses the file name for the ConnectionString of a Flat File Connection Manager.
Within the Container, I have a Data Flow Task to move row data from the .csv file (using the Flat File Connection Manager) into an OLEDB destination (this has another OLEDB Connection Manager it uses).
When I try to execute this container, it can grab the file name, load it into the Flat File Connection Manager, and begin to transfer row data; however, it continually errors out before moving any data - namely over two issues:
Error: 0xC02020A1 at Move Settlement File Data Into Temp Table, SettlementData_YYYYMM [1143]: Data conversion failed. The data conversion for column ""MONTHS_REMAIN"" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
Error: 0xC02020A1 at Move Settlement File Data Into Temp Table, Flat File Source [665]: Data conversion failed. The data conversion for column ""CUST_NAME"" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
In my research so far, I know that you can set which conditions force an error-out failure and choose to ignore truncation failures in the Connection Manager; however, because the Flat File Connection Manager's ConnectionString is re-made each time the Container executes, it does not seem to hold on to those option settings. Also, in my experience, the Connection Manager should pick the largest value from the dataset when it chooses the OutputColumnWidth for each column, so I don't quite understand how names are being truncated there (the DB column is set up as VARCHAR(255), so there's plenty of room).
As for the failed data conversions, I also do not understand how that can happen when the column referenced holds simple int values, and both the Connection Manager and the receiving DB use floats, which should encompass the int data (am I unaware of some reason you cannot convert an int into a float?).
It's been my experience that some .csv files don't play well in SSIS when going directly into a DB destination. Would it be better to transform the .csv into a .xlsx file, which plays much nicer going into a DB, or is there something else I am missing to easily move massive amounts of data from a .csv file into a DB? Or am I just turning a trivial matter into something bigger than it is?
Note: The reason I am dynamically setting the file in the Flat File Connection Manager is that the .csv file will have a set name appended with the month/year it was produced as part of a repeating process, and so I use the constant part of the name to grab it regardless of the date info.
EDIT:
Here is a screen cap of my Flat File Connection Manager previewing some of the data that it will try to pipe through. I noticed some of these rows have quotes around them, and wanted to make sure that wouldn't affect anything adversely - the column having issues is the MONTHS_REMAIN one
Is it possible that one of the csv files in the suite you are processing is malformed? For instance, if one of the files had an extra column/comma, that could force a varchar column into an integer column, producing errors similar to the ones you have described. Have you tried using error row redirection to confirm that all of your csv files are formed correctly?
To use error row redirection, update your Flat File Source and adjust the Error Output settings to redirect rows. Your Flat File Source component will now have an extra red arrow which you can connect to a destination. Drag the red arrow from your source component to a new Conditional Split. Next, right-click the red line and add a data viewer. Now, when error rows are processed, they will flow over the red line into the data viewer so you can examine them. Last, execute the package and wait for the data viewer to capture the errant rows for examination.
Do the data values captured by the data viewer look correct? Good luck!
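If you also stage MONTHS_REMAIN as text first, you can confirm which values refuse to convert with TRY_CAST (available from SQL Server 2012). A hedged sketch, assuming a staging table dbo.SettlementStaging with the column loaded as varchar (illustrative names):

    -- List the values that cannot be converted to float, which is what
    -- status value 2 ("potential loss of data") points at.
    SELECT MONTHS_REMAIN, COUNT(*) AS Occurrences
    FROM   dbo.SettlementStaging
    WHERE  TRY_CAST(MONTHS_REMAIN AS float) IS NULL
           AND NULLIF(LTRIM(RTRIM(MONTHS_REMAIN)), '') IS NOT NULL
    GROUP BY MONTHS_REMAIN;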

SSIS Flat File Import to SQL Server - All Files Flowing through Error Path

I am importing a .csv file to SQL via SSIS. All is going OK, although the file that we are trying to import is not what I would call "clean". I can't do anything about that as it is from another company. I have therefore created an error table in SQL as well as the table for the import.
Basically all of the rows (over 14,000) flow into the error table. I have opened the .csv file in Excel and although some of the rows have blank spaces instead of NULLs, there are some that I thought would have gone through.
Looking at the error codes, I get the following:
In my data flow there is a Flat File Source with its error output going to an OLE DB destination.
The data flow from the Flat File Source goes to a Derived Column transformation, to add a column with the System Start Date, and then on to an OLE DB destination.
Nothing too fancy, I thought.
ErrorCode - -1071607675
ErrorColumn - 211
All columns in the SQL table where the rows are supposed to be imported are varchar(1500) NULL.
Looking at the import file itself in SSIS, the DataType for the columns is string [DT_STR] with an output column width of 150, which I thought would be OK.
The plan is to import the majority of the data and then have a data steward look at the errors, to import manually or something similar, from the SQL 2012 three-column error output (which I must say is pretty poor really, although that is a different matter).
I can't figure out why not even one of these rows will import into the correct SQL table.
That ErrorCode equates to error 0xC0209085, which is "The data was truncated." Unless you're handling truncation, this is a fatal message. Either ignore truncation or do a debug run with a data viewer to identify the truncating entries.
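You can check the decimal-to-hex mapping yourself; the signed 32-bit ErrorCode -1071607675 is 0xC0209085:

    -- Interpret the signed ErrorCode as its 32-bit hex value.
    SELECT CONVERT(varbinary(4), -1071607675) AS ErrorCodeHex;  -- 0xC0209085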
This was a datetime issue. Manipulating the .csv file before import resolved it: the date format was changed from YYYYMMDD to YYYY-MM-DD.
Many thanks.

How do I format dd-mmm-yy values in flat file to smalldatetime during data import?

I have a flat file which is imported into SQL Server via an existing SSIS package. I need to make a change to the package to accommodate a new field in the flat file. The new field is a date field which is in the format dd-mmm-yy (e.g. 25-AUG-11). The date field in the flat file will either be empty (e.g. a space/whitespace) or populated with a date. I don’t have any control over the date format in the flat file.
I need to import the date field in the flat file into an existing SQL Server table and the target field data type is smalldatetime.
I was proposing to import the date as a string into a load table and then convert to smalldatetime when taking the data from the load table. But is there another possible way to parse the date format dd-mmm-yy and load this straight into a smalldatetime field, without having to convert to smalldatetime from the load table? I can't quite think how to parse the date format, particularly the month. Any suggestions welcome.
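For the load-table route you describe, the conversion itself is short in T-SQL: style 6 of CONVERT is dd mon yy, so replacing the dashes with spaces makes '25-AUG-11' match it. A hedged sketch, assuming the load table is dbo.LoadTable and the string column is RawDate (illustrative names):

    -- Convert 'dd-MMM-yy' strings (e.g. '25-AUG-11') to smalldatetime,
    -- treating blank/whitespace values as NULL. Month abbreviations are
    -- resolved using the session language, so this assumes us_english.
    SELECT CASE
               WHEN NULLIF(LTRIM(RTRIM(RawDate)), '') IS NULL THEN NULL
               ELSE CONVERT(smalldatetime, REPLACE(RawDate, '-', ' '), 6)
           END AS SmallDateValue
    FROM dbo.LoadTable;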
Here is an example that might give you an idea of what you can do. Ideally, in an SSIS package or in any ETL job, you should take into account that data may not be exactly what you would like it to be. You need to take appropriate steps to handle the incorrect or invalid data that might pop up now and then. That's why SSIS provides lots of transformations within the Data Flow Task which you can make use of to clean up the data.
In your case, you can make use of Derived Column transformation or Data conversion transformation to achieve your requirements.
The example was created in SSIS 2008 R2. It shows how to read a flat file containing the dates and load into an SQL table.
I created a simple SQL table to import the flat file data.
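The screenshots from the original answer aren't reproduced here; a minimal sketch of the kind of table it describes (illustrative names) would be:

    -- Simple destination table for the converted dates.
    CREATE TABLE dbo.DateImport
    (
        Id             int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        SmallDateValue smalldatetime     NULL
    );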
On the SSIS package, I have a connection manager to SQL and one for Flat file. Flat file connection is configured as shown below.
On the SSIS package, I placed a Data Flow Task on the Control Flow tab. Inside the Data Flow Task, I have a Flat File Source, a Derived Column transformation, and an OLE DB Destination. Since the Flat File Source and OLE DB Destination are straightforward, I will leave those out here. The Derived Column transformation creates a new column with the expression (DT_DBDATE)SmallDate. Note that you can also use a Data Conversion transformation to do the same. This new column, SmallDateTimeValue, should be mapped to the database column in the OLE DB Destination.
If you execute this package, it will fail because not all the values in the file are valid.
The reason why it fails in your case is that the invalid data is inserted directly into the table, and the table will throw an exception, causing the package to fail. In this example, the package fails because the default setting on the Derived Column transformation is to fail the component if there is any error. So, let's place a dummy transformation to redirect the error rows; we will use a Multicast transformation for this purpose. It won't really do anything. Ideally, you should redirect the error rows to another table, using an OLE DB Destination or another destination component of your choice, so you can analyze the data that causes the errors.
Drag the red arrow from the Derived Column transformation and connect it to the Multicast transformation. This will pop up the Configure Error Output dialog. Change the values under the Error and Truncation columns from Fail component to Redirect row. This will redirect any error rows to the Multicast transformation so they will not get into the table.
Now, if we execute the package, it will run successfully. Note the number of rows displayed in each direction.
Here is the data that got into the table: only 2 rows were valid, which matches the file data shown in the first screenshot.
Hope that gives you an idea to implement your requirement in the SSIS package.
It should load straight into a SMALLDATETIME field as it is. Remember, dates are just numbers in SQL Server, which are presented to the user in the desired date/time format. The SSIS package should read 25-AUG-2011 just fine as a date data type, and insert it into a SMALLDATETIME field without issues.
Was the package throwing an error or something?
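Whether '25-AUG-11' parses directly does depend on the session language, which is worth testing before relying on a straight load. A quick check, assuming a us_english default:

    -- Three-letter English month abbreviations parse under us_english;
    -- under some other languages (e.g. 'DEC' vs. German 'DEZ') they may not.
    SET LANGUAGE us_english;
    SELECT CAST('25-AUG-11' AS smalldatetime) AS Parsed;  -- 2011-08-25 00:00:00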

SSIS: Capture Truncation Warning from Flat File Source with "Ignore Failure" Enabled

I have a 2005 SQL Server Integration Services (SSIS) package that is loading delimited flat files into some tables. A very small percentage of records have a text field that is longer than the file format specification says it can be. Rather than try to play an ongoing game of "guess the real maximum length", the customer has requested I just truncate anything over the size in the spec.
I have set the Truncation event to "Ignore Failure" in the Flat File Source Editor, and that takes care of my extra data. However, it seems to be a completely silent truncation (it does not write any warning to the log). I am concerned that if there is ever a question about what data has been truncated, I have no way to identify it.
What is a simple way to log the fact that the truncation happened?
It would be enough to identify that the file had a truncated row in it, but if I could also identify the actual row, that would be great. Whether it is captured as part of the built-in package logging or I have to make a special call makes no difference to me.
Before you do the actual insert, have a Conditional Split task that takes the records longer than the actual field length and puts them into a logging table. Then you can truncate the data and rejoin them to the original path using a Merge or Merge Join transformation.
You can do the truncation yourself as part of the data flow. Set the flat file column width to a value that is very big (larger than any expected values). You can use a conditional split to identify rows that violate the length.
In the data flow path for invalid rows, you can record the information to your log. Then, you can convert the values to the valid length and merge them back with the valid rows. And, finally add the rows to the destination.
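In the data flow, the split condition would be an SSIS expression like LEN(BigTextCol) > 50 and the truncation a derived column like SUBSTRING(BigTextCol, 1, 50). For comparison, if you stage the wide values first, the same log-then-truncate idea can be done in T-SQL; a hedged sketch with illustrative names (dbo.Staging, dbo.TruncationLog, a 50-character spec limit):

    -- Log every value that exceeds the spec limit, then truncate in place.
    INSERT INTO dbo.TruncationLog (SourceColumn, OriginalValue, LoggedAt)
    SELECT 'BigTextCol', BigTextCol, GETDATE()
    FROM   dbo.Staging
    WHERE  LEN(BigTextCol) > 50;

    UPDATE dbo.Staging
    SET    BigTextCol = LEFT(BigTextCol, 50)
    WHERE  LEN(BigTextCol) > 50;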
