XML Sources Randomly Fail on Execution in SSIS - sql-server

In SSIS, I have a script task that pulls data and creates a bunch of xml documents. I have those documents read in as xml source tasks, and they all go to an OLE DB destination. Each time I run the entire package, one or two of the xml source tasks will fail. The failing xml source task appears to be random at first glance. Without changing anything at all, if I run the package again, a different xml source task will fail, or it may run all of them successfully. Running it a third time produces new failures or sometimes success, and so on. It seems like regenerating the XSDs for the xml source that fails temporarily fixes that task, but it always fails again after a number of runs. I usually get the same error on a given xml source task, which looks like this:
http://i.imgur.com/0GhKLBB.png
I have no idea what is causing this as I am new to SSIS, so any help is very much appreciated. Thanks.

Would it be possible in the script task to output the xml document as data type DT_UI4 or DT_STR (rather than the xml type)? Also check what the data type of WSOperationMetrics is in the XSD, and try to map its output data type accordingly.
Have a look at this thread.

Related

Does adding simple script tasks to SSIS packages drastically reduce performance?

I am creating an SSIS package to import CSV file data into a SQL Server table.
Some of the rows in the CSV files will have missing values.
For example, if a row has the format: value1,value2,value3 and value2 is missing,
then it will render as: value1,,value3 in the csv file.
When the above happens (value2 is missing) in my SSIS package, I want NULL to go into the receiving SQL Server column that would hold value2.
I understand that I can add a "Script" task to my SSIS package to apply this rule. However, I'm concerned that this will drastically reduce the performance of my SSIS package. I'm not an expert on the inner workings of SSIS/SQL Server, but I'm concerned that this script will cause my package to lose "BULK INSERT" capabilities (and other efficiencies), since the script will have to inspect every row and apply the changes as needed.
Can anyone confirm if adding such a script will cause major performance impacts? Or does the SSIS/SQL-Server engine run the script on every row and then bulk-insert? Is there another way I can apply this rule without taking a performance hit?
Firstly, you can use a Script Task when required. A Script Task is executed only once per execution of the whole package, not for every row. For row-by-row processing there is another component, the Script Component. When the regular SSIS components are not enough to achieve what you want, you can certainly use a Script Component. I don't believe it is a performance killer unless you implement it badly.
Secondly, for this particular requirement you can simply use a Flat File Source to import your csv file. It will put NULL into the column when there is no value, provided the "Retain null values from the source as null values in the data flow" option is checked on the source. I'm assuming this is a valid csv file and that each row has the correct number of commas for its fields (total fields - 1, actually) even if the value is empty or null for some fields.
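If the empty fields still arrive as empty strings rather than NULLs (for instance, if that option is left unchecked), one fallback is to normalize them after the load in SQL. A minimal sketch, assuming a hypothetical staging table dbo.ImportedCsv that holds the value2 column from the example:

-- Hypothetical post-load cleanup: turn empty strings into NULLs.
UPDATE dbo.ImportedCsv
SET value2 = NULLIF(value2, '')
WHERE value2 = '';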

SSIS redirect empty rows as flat file source read errors

I'm struggling to find a built-in way to redirect empty rows as flat file source read errors in SSIS (without resorting to a custom script task).
As an example, you could have a source file with an empty row in the middle of it:
DATE,CURRENCY_NAME
2017-13-04,"US Dollar"
2017-11-04,"Pound Sterling"

2017-11-04,"Aus Dollar"
and your column types defined as:
DATE: database time [DT_DBTIME]
CURRENCY_NAME: string [DT_STR]
With all that, the package still runs and takes the empty row all the way to the destination, where it naturally fails. I want to be able to catch it early and identify it as a source read failure. Is it possible without a script task? A simple derived column perhaps, but I would prefer if this could be configured at the Connection Manager / Flat File Source level.
The only way to not rely on a script task is to define your source flat file with only one varchar(max) column, choose a delimiter that never occurs within the data, and write all the content into a SQL Server staging table. You can then clean out those empty lines and parse the rest into a relational output using SQL.
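A minimal sketch of that cleanup query, assuming the file was bulk-loaded into a hypothetical one-column staging table dbo.RawCurrencyFile(RawLine):

-- Drop empty lines and the header row, then split the remaining lines into columns.
SELECT
    TRY_CONVERT(DATE, LEFT(RawLine, CHARINDEX(',', RawLine) - 1)) AS [DATE],
    REPLACE(SUBSTRING(RawLine, CHARINDEX(',', RawLine) + 1, 4000), '"', '') AS CURRENCY_NAME
FROM dbo.RawCurrencyFile
WHERE LTRIM(RTRIM(RawLine)) <> ''
  AND RawLine <> 'DATE,CURRENCY_NAME';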
This approach is not very clean and takes a lot more effort than using a script task to dump empty lines or ones not matching a pattern. It isn't that hard to create a transformation with the Script Component.
That being said, my advice is to document a clear interface description and distribute it to all clients using your interface. Handle all files that throw an error while reading the flat file, and send a mail with the file to the responsible client explaining that it doesn't follow the interface rules and needs to be fixed.
Just imagine the flat file being generated manually, or even worse with something like Excel: you will struggle with wrong file encodings, missing columns, non-ASCII characters, wrong date formats, etc.
You will end up spending your time handling exceptions caused by data quality issues.
Just add a Conditional Split component, and use the following expression to split rows
[DATE] == ""
And connect the default output connector to the destination
References
Conditional Split Transformation

SSIS Insert Row and Get Resulting ID Using OLE DB Command

Having seen other questions with answers that don't totally address what I am after, I am wondering how in SSIS to use an OLE DB Command transformation to do an Insert and immediately get the resulting primary key for each row inserted as a new column, all within the same Data Flow Task. That sounds like it should be a common, built-in, fairly simple thing to ask for in SSIS, right?
So the obvious first choice for me would be to use an OLE DB Command where I do an INSERT and include an OUTPUT clause in my command:
INSERT INTO dbo.MyReleaseTable(releaseDate)
OUTPUT ?=Inserted.id
VALUES (?)
Only I can't figure out how to do this in an OLE DB Command (with an output) and it not complain. I've read about using stored procedures to do this, so am I required to use a stored procedure if I want to do this?
Let's say this won't work. I could use a Script Transformation and execute direct SQL in that, right? Well if that's what I must do, then the line between using custom code and SSIS block-components gets blurred and I am tempted to throw SSIS away and just do the whole ETL in code.
Then I hear talk about using an Execute SQL task. So now I can't even do 1 data flow within 1 data flow task? Am I getting that right? I'd like to keep 1 single data flow contained within 1 data flow task and not have to break my 1 flow out between separate tasks.
If it turns out that this seemingly simple data flow objective is not built into SSIS then I will consider dumping SSIS altogether. Talend has a free ETL offering, don't they?
Well, this can be done within the SSIS Data Flow, but with some tricks. You need to create a stored procedure with input and output parameters and use it in the Data Flow, as described here, fetching the result value.
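A minimal sketch of such a procedure, reusing the table and column from the question (the procedure and parameter names are assumptions); in the OLE DB Command you would then call it with something like EXEC dbo.usp_InsertRelease ?, ? OUTPUT and map the second parameter to a new column in the flow:

CREATE PROCEDURE dbo.usp_InsertRelease
    @releaseDate DATETIME,
    @newId INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.MyReleaseTable (releaseDate)
    VALUES (@releaseDate);
    -- Hand the generated primary key back to the data flow.
    SET @newId = SCOPE_IDENTITY();
END;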
Drawbacks of this approach:
You need to create a Stored Procedure
Each row is processed with the SP, which causes per-row transactions instead of batch processing. This can slow down your package.
A solution without the performance penalty: do it in two Data Flows. The first inserts the values into some temp table; the second uses a SQL MERGE command at the OLE DB Source and handles the OUTPUT data as you wish. All of this runs inside a transaction, handled either by MSDTC or by your own logic.
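A rough sketch of what the second Data Flow's OLE DB Source query could look like (the staging table name is an assumption; the ON 1 = 0 condition never matches, so every staged row goes down the insert branch and OUTPUT returns the new ids):

SET NOCOUNT ON;
MERGE dbo.MyReleaseTable AS tgt
USING dbo.ReleaseStaging AS src
    ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
    INSERT (releaseDate) VALUES (src.releaseDate)
OUTPUT inserted.id, src.releaseDate;  -- new keys plus any source columns you still need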

Use API to read SQL Server SSIS Package and determine Data Flow Task execution sequence

In an earlier post, I figured out how to use the SQL Server API to programmatically determine the order in which SSIS tasks on the "Control Flow" tab are executed by reading the Precedence Constraints collection either via the API or reading the raw XML in the dtsx file. Now, I similarly want to programmatically figure out the order in which objects in a "Data Flow Task" run. At this point, I'd be happy just to understand how the XML represents this info.
Unlike the earlier post, in which I could see in the XML how the precedence information was stored (in the Precedence Constraint collection) but was just having trouble determining how to use the API to get at the info, for Data Flow task "streams" or "series of connected arrows" I am even confused about how this info is represented when I look at the raw DTSX file in Notepad or IE.
I suspect that the info about the order in which the objects within a Data Flow Task execute is either stored in the PATHS collection. Here's a sample:
<paths>
  <path id="128" name="OLE DB Source Output" description="" startId="42" endId="117" />
  <path id="129" name="Data Conversion Output" description="" startId="118" endId="48" />
</paths>
...or somehow disguised in the Lineage property, but I can't seem to relate the IDs specified in either one in a meaningful way to the objects.
Why don't you try it by following this example and see if you can figure out how this info is represented in the XML?
Create a simple package with one Data Flow Task:
Double-click on it and then add the following two sample tasks that execute in sequence:
Then open up the DTSX file in Notepad and determine how this sequencing/ordering info is stored in a meaningful way. Serious bonus points if you can tell me how to use the SQL Server API to get at this info.
Side bar on how to view a dtsx file as xml using IE:
Note that IE makes a good XML viewer. All I had to do was make a copy of my simple HelloWorld dtsx file, remove the optional xml version node at the top and wrap the remaining xml in a dummy tag, save the file with an xml extension and drag it to an open IE window...
startId maps to the output element with the corresponding id
endId maps to the input element with the corresponding id
I just traversed through the xml and found that the starting and ending component names are present there. So just parsing the xml and retrieving those values gives a clear idea of the path. I don't see any ids though.
Starting and end component names

Dynamic Columns in Flat File Destination

I am working on a generic SSIS package that receives a flat file, adds new columns to it, and generates a new flat file.
The problem I have is that the number of new columns varies based on a stored procedure XML parameter. I tried to use the "Execute Process Task" to call BCP, but the XML parameter is too long for the command line.
I searched the web and found that you cannot dynamically change an SSIS package during runtime and that I would have to use a script task to generate the output. I started going down that path and found that you still have to let the script component know how many columns it will be receiving, and that is exactly what I do not know at design time.
I found a third party SSIS extension from CozyRoc, but I want to do it without any extensions.
Has anyone done something like this?
Thanks!
If the number of columns is unknown at run time then you will have to do something dynamically, and that means using a script task and/or a script component.
The workflow could be:
Parse the XML to get the number of columns
Save the number of columns in a package variable
Add columns to the flat file based on the variable
This is all possible using script tasks, although if there is no data flow involved, it might be easier to do the whole thing in an external Perl script or C# program and just call that from your package.
