Using SQL Server 2008 R2, I've created an SSIS package that rips through a flat file and imports its records into a SQL table.
If any record in the data file does not contain all the required fields, that record should be skipped in the import process. All skipped records should be emailed to me when the package completes.
Here's the data file structure:
123|ABC|Y|Y
784
456|DEF|Y|Y
789|GHI|Y|N
812||Y|N
...
So, in this scenario, I would want the 1st, 3rd, and 4th record to be imported, and the 2nd and 5th record to be skipped and emailed.
I tried testing this out as is, and since it looks for a pipe delimiter, it reads the second line together with the third as:
784456|DEF|Y|Y
I'm about 3 days old working with SSIS, so if someone can assist me in accomplishing this task, I'd be grateful.
How big are the files? One way is to use a staging table (NOT a temporary table). The staging table is a physical table that retains its existence in the database. You dump all records there, insert the good data into the production/main table, then export the bad rows into a file which you can attach to the Send Mail task.
(then you can truncate the staging table for the next interval/run/loop/file)
Another way would be to use a Conditional Split, then assign each bad row to a variable, apply a format to it (appending a delimiter other than a pipe), and write it to the export file.
Since it's merging the second line with the third, it sounds like either the row delimiter is incorrect on that line or it's not set correctly in the connection manager. I'd take a look at the file in Notepad++ (or any text editor that will expose hidden characters like CR and LF) and verify that the row delimiter is consistent for each row and that it matches what's been set in the connection manager.
Once the row delimiter issue is straightened out, you can separate the erroneous records with a Conditional Split. Under Condition, type [YourColumnName] == "" and under Output Name, type "Error". Name the default output "Correct". Now map the "Correct" output to your table and the "Error" output to a flat file, script component, table, or whatever destination you want the errors to go to.
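For reference, the skip-and-collect logic described above can be sketched outside SSIS as well. This is a hedged illustration only (the Conditional Split expression is the SSIS-native way); the function name and the four-field requirement are taken from the sample data in the question.

```python
# Sketch: separate complete pipe-delimited records from incomplete ones,
# assuming each valid record has exactly 4 non-empty fields.

def split_records(lines, required_fields=4):
    """Return (good, bad): good rows as field lists, bad rows as raw text."""
    good, bad = [], []
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if len(fields) == required_fields and all(f.strip() for f in fields):
            good.append(fields)
        else:
            bad.append(line.rstrip("\r\n"))
    return good, bad

lines = ["123|ABC|Y|Y", "784", "456|DEF|Y|Y", "789|GHI|Y|N", "812||Y|N"]
good, bad = split_records(lines)
# good holds rows 1, 3, 4; bad holds rows 2 and 5 (row 5 has an empty field)
```

The `bad` list is what you would hand to the Send Mail task, while `good` maps to the table insert.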
Related
We receive a flat file that is delimited from our third-party client.
Row Delimiter = LF;
Column Delimiter = Tab
The file has 8 columns.
The delimited formatting in the file is correct for the most part except for three records where the 6th column splits and the record continues into the second row. There are two tab column delimiters after the column breaks into the second row.
We use SSIS to insert the records from the file into our DB and the ETL breaks because of this inconsistent formatting.
We had to manually tweak the column so that the job runs successfully.
Is there a way to correct the formatting issue in SSIS? I need help with writing a parser to correct these abnormal records in the file before inserting them.
Normal Row:
Problematic rows:
To fix the file structure, you should read each row as one large column of type DT_STR(4000). Then use two Script Components: the first to fix the erroneous rows, and the second to split each row into separate columns before inserting the data into the destination database.
You can check my answer on the following question for a step-by-step guide: SSIS reading LF as terminator when its set as CRLF
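The row-repair step in the first Script Component can be sketched roughly as below. This is an assumption-laden illustration, not the exact implementation: it assumes 8 tab-delimited columns per logical record (as stated in the question) and that a short row should simply be re-joined with the line that follows it.

```python
# Sketch: re-join rows where a column spilled onto the next line.
# A complete row has 8 tab-delimited columns (7 tab characters).

EXPECTED_COLS = 8

def repair_rows(lines):
    fixed, buffer = [], ""
    for line in lines:
        candidate = buffer + line if buffer else line
        if candidate.count("\t") >= EXPECTED_COLS - 1:
            fixed.append(candidate)   # row is complete
            buffer = ""
        else:
            buffer = candidate        # row is short: keep accumulating
    if buffer:
        fixed.append(buffer)          # trailing partial row, if any
    return fixed

lines = [
    "a\tb\tc\td\te\tf\tg\th",        # normal row
    "a\tb\tc\td\te\tpart",           # 6th column breaks here...
    "ial\tg\th",                     # ...and continues on the next line
]
fixed = repair_rows(lines)
```

The second Script Component would then split each repaired row on the tab character into the 8 output columns.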
Currently I receive a daily file of around 750k rows and each row has a 3 character identifier at the start.
For each identifier, the number of columns can change but are specific to the identifier (e.g. SRH will always have 6 columns, AAA will always have 10 and so on).
I would like to be able to automate this file into an SQL table through SSIS.
This solution is currently built in MSACCESS using VBA just looping through recordsets using a CASE statement, it then writes a record to the relevant table.
I have been reading up on BULK INSERT, BCP (with a format file), and Conditional Split in SSIS; however, I always seem to get stuck at the first hurdle of even loading the file in, as SSIS errors out due to the variable column layouts.
The data file is pipe delimited and looks similar to the below.
AAA|20180910|POOL|OPER|X|C
SRH|TRANS|TAB|BARKING|FORM|C|1.026
BHP|1
*BPI|10|16|18|Z
BHP|2
*BPI|18|21|24|A
(* I have added the * to show that these are child records of the parent record, in this case BHP can have multiple BPI records underneath it)
I would like to be able to load the TXT file into a staging table, and then I can write the TSQL to loop through the records and parse them to their relevant tables (AAA - tblAAA, SRH - tblSRH...)
I think you should read each row as one column of type DT_WSTR with length 4000, then implement the same logic you wrote in VBA within a Script Component (VB.NET / C#). There are similar posts that can give you some insight:
SSIS ragged file not recognized CRLF
SSIS reading LF as terminator when its set as CRLF
How to load mixed record type fixed width file? And also file contain two header
SSIS Flat File - CSV formatting not working for multi-line fields
how to skip a bad row in ssis flat file source
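The CASE-statement routing from the VBA solution can be sketched like this. Note the column counts here are inferred from the sample rows above (not the prose, which cites different counts), and the `tblAAA`-style table names are the ones the poster mentions; treat all of it as a placeholder for the real Script Component logic.

```python
# Sketch: route pipe-delimited rows to per-identifier buckets based on
# the 3-character identifier in the first field. Expected column counts
# (including the identifier) are inferred from the sample data.

from collections import defaultdict

EXPECTED = {"AAA": 6, "SRH": 7, "BHP": 2, "BPI": 5}

def route(lines):
    tables = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        ident = fields[0]
        if EXPECTED.get(ident) == len(fields):
            tables["tbl" + ident].append(fields)   # e.g. AAA -> tblAAA
    return tables

lines = [
    "AAA|20180910|POOL|OPER|X|C",
    "SRH|TRANS|TAB|BARKING|FORM|C|1.026",
    "BHP|1",
    "BPI|10|16|18|Z",
    "BHP|2",
    "BPI|18|21|24|A",
]
tables = route(lines)
```

In SSIS the same dispatch would live in a Script Component (reading the single DT_WSTR column), or you could land everything in the staging table and do the routing in T-SQL, as the poster suggests.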
I wrote an SSIS package which imports data from a fixed record length flat file into a SQL table. Within a single file, the record length is constant, but different files may have different record lengths. Each record ends with a CR/LF. How can I make it detect where the end of the record is, and use that length when importing it?
You can use a script task. Pass a ReadWriteVariable into the script task. Let's call the ReadWriteVariable intLineLength. In the script task code, detect the location of the CR/LF and write it to intLineLength. Use the intLineLength ReadWriteVariable in following package steps to import the data.
Here is an article with some good examples: script-task-to-dynamically-build-package-variables
Hope this helps.
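The detection step inside that Script Task amounts to finding the first CR/LF, roughly as sketched below. This is an illustration only; it assumes an ASCII-compatible encoding, and the variable name mirrors the intLineLength ReadWriteVariable from the answer.

```python
# Sketch: derive the fixed record length from the position of the
# first CR/LF in the file, as the Script Task would.

def detect_record_length(data: bytes) -> int:
    """Return the length of one record, excluding the CR/LF."""
    pos = data.find(b"\r\n")
    if pos < 0:
        raise ValueError("no CR/LF found in file")
    return pos

sample = b"ABCDE12345\r\nFGHIJ67890\r\n"
int_line_length = detect_record_length(sample)
```

The resulting length would then be written to the ReadWriteVariable and used by the downstream import steps.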
This may not work for everyone, but what I ended up doing was simply setting it up as a delimited flat file, with CR/LF as the row delimiter and the column delimiter and text qualifier left blank. This won't work if you actually need the import to split out columns, but I was already using a Derived Column task to do the actual column splitting (because my column positions are variable), so it worked fine.
I have a CSV file where there is a header row and data rows in the same file.
I want to get information from both rows during the same load.
What is the easiest way to do this?
i.e File Example - Import.CSV
2,11-Jul-2011
Mr,Bob,Smith,1-Jan-1984
Ms,Jane,Doe,23-Apr-1981
In the first row, there is a count of the number of data rows and the date of transmission.
In the second and subsequent rows is the actual data: in this case Title, FirstName, LastName, Birthdate.
SQL Server Integration Services Conditional Split Transformation should do it.
I wonder what you would do with that info in the pipeline. In any case, here is one way to read it in a single pass (take a look at the notes/limitations at the end):
Create a data flow
Put File source component and set it the way You want
Add a Script Component that counts the rows by incrementing a mycounter variable
Put a Conditional Split transformation where the condition is mycounter == 0
One path from the Conditional Split will be the first row of the file (mycounter == 0) and the other path will be the rest of the rows (2 in your example).
Note #1: the file source can set only one metadata type for each column. This means that if the first column of your data rows is a string (Mr, Ms, ...) then you have to set it as a string data type in the source. Otherwise, if you set it as an integer (DT_Ix), it will fail as soon as it encounters a row with string data (Mr, Ms, ...) in the first column of the file. This applies to all columns, not just the first one.
Note #2: SSIS will see only the number of columns you told it to expect. This means that you have to have the same number of columns in EACH row. Otherwise, you have a ragged CSV file and you need to take another approach - search the Internet. But those solutions also require a different CSV layout.
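The one-pass read described in the steps above can be sketched as follows. This is a hedged illustration using the Import.CSV sample from the question; in SSIS the header/data routing is the Conditional Split's job.

```python
# Sketch: read the header row (count, transmission date) and the data
# rows from the same file in a single pass, mirroring the mycounter == 0
# split. Field positions follow the Import.CSV sample above.

def load(lines):
    header = lines[0].split(",")
    expected_count, sent_date = int(header[0]), header[1]
    rows = [line.split(",") for line in lines[1:]]
    if len(rows) != expected_count:
        raise ValueError("row count does not match header")
    return sent_date, rows

lines = ["2,11-Jul-2011", "Mr,Bob,Smith,1-Jan-1984", "Ms,Jane,Doe,23-Apr-1981"]
sent_date, rows = load(lines)
```

Note that this sidesteps Note #1's limitation because each row is parsed after the split, rather than forcing one metadata type on a mixed column.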
Answers in the following links explain how to load parent-child data from a flat file into an SQL Server database when both parent and child rows exist in the same file next to each other.
How do I split flat file data and load into parent-child tables in database?
How to load a flat file with header and detail data into a database using SSIS package?
SSIS does 2 things in relation to handling flat files which are particularly frustrating, and it seems there should be a way around them, but I can't figure it out. If you define a flat file with 10 columns, tab delimited with CRLF as the end of row marker this will work perfectly for files where there are exactly 10 columns in every row. The 2 painful scenarios are these:
If someone supplies a file with an 11th column anywhere, it would be nice if SSIS simply ignored it, since you haven't defined it. It should just read the 10 columns you have defined and then skip to the end-of-row marker, but what it does instead is concatenate any additional data with the data in the 10th column and bung all of that into the 10th column. Kind of useless, really. I realise this happens because the delimiter for the 10th column is not a tab like all the others, but CRLF, so it just grabs everything up to the CRLF, replacing extra tabs with nothing as it does so. This is not smart, in my opinion.
If someone supplies a file with only 9 columns something even worse happens. It will temporarily disregard the CRLF it has unexpectedly found and pad any missing columns with columns from the start of the next row! Not smart is an understatement here. Who would EVER want that to happen? The remainder of the file is garbage at that point.
It doesn't seem unreasonable to have variations in file width for whatever reason (of course, only variations at the end of a row can reasonably be handled - x fewer or extra columns), but it looks like this is simply not handled well, unless I'm missing something.
So far our only solution to this is to load a row as one giant column (column0) and then use a script task to dynamically split it using however many delimiters it finds. This works well, except that it limits row widths to 4000 chars (the max width of one unicode column). If you need to import a wider row (say with multiple 4000 wide columns for text import) then you need to define multiple columns as above, but you are then stuck with requiring a strict number of columns per row.
Is there any way around these limitations?
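The "one giant column, then split" workaround the question describes can be sketched like this. It is an illustration under the question's assumptions (10 tab-delimited columns): extra trailing columns are dropped and missing trailing columns are padded, which is exactly the tolerant behavior SSIS's flat file source lacks.

```python
# Sketch: read each row whole, split on tab, then pad or truncate to
# the 10 defined columns so wide and narrow rows are both tolerated.

NUM_COLS = 10

def normalize(line, num_cols=NUM_COLS):
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > num_cols:
        fields = fields[:num_cols]                  # ignore the extra 11th column
    else:
        fields += [""] * (num_cols - len(fields))   # pad short rows with blanks
    return fields
```

In the package this logic lives in the Script Component that splits the single wide column; the 4000-character limit the question mentions applies to that input column, not to this splitting step.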
Glenn, I feel your pain :)
SSIS cannot make the columns dynamic, as it needs to store metadata for each column as it comes through, and since we're working with flat files which can contain any kind of data, it can't assume that a CRLF in a column that is not the last column is indeed the end of the data line it's supposed to read.
Unlike DTS in SQL2000, you can't change the properties of a SSIS package at runtime.
What you could do is create a parent package, that reads the flat file (script task), and only reads the first line of the flat file to get the number of columns, and the column names. This info can be stored in a variable.
Then, the parent package loads the child package (script task again) programmatically, and updates the metadata of the Source Connection of the child package. This is where you would either
1. Add / remove columns to match the flat file.
2. Set the column delimiter for the columns, the last column has to be the CRLF - matching the ROW delimiter
3. Reinitialise the metadata (ComponentMetadata.ReinitializeMetadata()) of the Source Component in the Data Flow task (to recognize the recent changes in the Source Connection).
4. Save the child ssis package.
Details on programmatically modifying a package are readily available online.
Then, your parent package just executes the Child package (Execute Package Task), and it'll execute with your new mappings.