Dynamic Columns in Flat File Destination - sql-server

I am working on a generic SSIS package that receives a flat file, adds new columns to it, and generates a new flat file.
The problem I have is that the number of new columns varies based on a stored procedure XML parameter. I tried to use the "Execute Process Task" to call BCP, but the XML parameter is too long for the command line.
I searched the web and found that you cannot dynamically change an SSIS package at runtime, and that I would have to use a script task to generate the output. I started going down that path and found that you still have to let the script component know how many columns it will be receiving, and that is exactly what I do not know at design time.
I found a third party SSIS extension from CozyRoc, but I want to do it without any extensions.
Has anyone done something like this?
Thanks!

If the number of columns is not known until run time, then you will have to do something dynamically, and that means using a Script Task and/or a Script Component.
The workflow could be:
Parse the XML to get the number of columns
Save the number of columns in a package variable
Add columns to the flat file based on that variable
This is all possible using script tasks, although if there is no data flow involved, it might be easier to do the whole thing in an external Perl script or C# program and just call that from your package.
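If the XML parameter lists the columns to add, a Script Task can count them and expose the count to the rest of the package. A minimal sketch, assuming an XML shape of <Columns><Column .../></Columns> and two package variables, User::ColumnsXml and User::ColumnCount (these names are illustrative, not from the original question):

// Body of the ScriptMain class generated by the SSIS Script Task designer.
using System.Xml;
using Microsoft.SqlServer.Dts.Runtime;

public void Main()
{
    // Hypothetical shape: <Columns><Column name="..."/>...</Columns>
    string xml = Dts.Variables["User::ColumnsXml"].Value.ToString();

    var doc = new XmlDocument();
    doc.LoadXml(xml);

    // Count the column definitions and hand the number to the data flow.
    Dts.Variables["User::ColumnCount"].Value = doc.SelectNodes("/Columns/Column").Count;

    Dts.TaskResult = (int)ScriptResults.Success;
}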

Related

Does adding simple script tasks to SSIS packages drastically reduce performance?

I am creating an SSIS package to import CSV file data into a SQL Server table.
Some of the rows in the CSV files will have missing values.
For example, if a row has the format: value1,value2,value3 and value2 is missing,
then it will render as: value1,,value3 in the csv file.
When the above happens (value2 is missing) in my SSIS package, I want NULL to go into the receiving SQL Server column that would hold value2.
I understand that I can add a "Script" task to my SSIS package to apply this rule. However, I'm concerned that this will drastically reduce the performance of my SSIS package. I'm not an expert on the inner workings of SSIS/SQL Server, but I'm concerned that the script will cause my package to lose "BULK INSERT" capabilities (and other efficiencies), since it will have to inspect every row and apply the changes as needed.
Can anyone confirm if adding such a script will cause major performance impacts? Or does the SSIS/SQL-Server engine run the script on every row and then bulk-insert? Is there another way I can apply this rule without taking a performance hit?
Firstly, you can use a Script Task when required. A Script Task executes only once per run of the whole package, not once per row. For per-row work there is a different component, the Script Component, which runs inside the data flow. When the regular SSIS tasks are not enough to achieve what you want, you can certainly use a Script Component, and it is not a performance killer unless you implement it badly.
Secondly, for this particular requirement you can simply use the Flat File Source to import your CSV file. With the "Retain null values from the source as null values in the data flow" option checked, it will put NULL into the column when the value is missing. This assumes the CSV is well formed, i.e. every row has the correct number of commas (total fields - 1) even when some values are empty or null.
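If you do later need per-row logic anyway, a synchronous Script Component is enough and still streams through the buffer. A minimal sketch, assuming a string input column named Value2 (the column name is illustrative):

// Inside the ScriptMain class generated by a synchronous Script Component.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Treat an empty string from the flat file as a database NULL.
    if (!Row.Value2_IsNull && Row.Value2.Trim().Length == 0)
        Row.Value2_IsNull = true;
}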

SSIS redirect empty rows as flat file source read errors

I'm struggling to find a built-in way to redirect empty rows as flat file source read errors in SSIS (without resorting to a custom script task).
As an example, you could have a source file with an empty row in the middle of it:
DATE,CURRENCY_NAME
2017-13-04,"US Dollar"

2017-11-04,"Pound Sterling"
2017-11-04,"Aus Dollar"
and your column types defined as:
DATE: database time [DT_DBTIME]
CURRENCY_NAME: string [DT_STR]
With all that, the package still runs and takes the empty row all the way to the destination, where it naturally fails. I want to be able to catch it early and identify it as a source read failure. Is it possible without a script task? A simple derived column perhaps, but I would prefer if this could be configured at the Connection Manager / Flat File Source level.
The only way to avoid a script task is to define your source flat file with a single varchar(max) column, choose a delimiter that never appears in the data, and write all the content into a SQL Server staging table. You can then clean out the empty lines and parse the rest into relational output using SQL.
That approach is not very clean and takes a lot more effort than using a script component to drop empty lines, or lines that don't match a pattern. It isn't that hard to create such a transformation with the Script Component (see the sketch below).
That being said, my advice is to document a clear interface description and distribute it to all clients using your interface. Handle every file that throws an error while reading the flat file, and send a mail with the file to the responsible client, with the information that it doesn't follow the interface rules and needs to be fixed.
Just imagine the flat file being generated manually, or even worse with something like Excel: you will struggle with wrong file encodings, missing columns, non-ASCII characters, wrong date formats, and so on.
Otherwise you will spend your time handling every exception caused by such quality issues.
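For reference, the script-component version mentioned above is short. A sketch, assuming a synchronous Script Component with two outputs named ValidRows and EmptyRows placed in the same ExclusionGroup, and a string input column named DATE (all names are illustrative):

// Inside the ScriptMain class generated by a synchronous Script Component.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Send rows with an empty DATE column down the error path;
    // everything else continues to the normal path.
    if (Row.DATE_IsNull || Row.DATE.Trim().Length == 0)
        Row.DirectRowToEmptyRows();
    else
        Row.DirectRowToValidRows();
}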
Just add a Conditional Split component, and use the following expression to split out the empty rows:
[DATE] == ""
(If the empty line arrives as NULL rather than an empty string, use ISNULL([DATE]) || [DATE] == "" instead.) Then connect the default output to the destination.
References
Conditional Split Transformation

SQL Server SSIS Create A Dataset From An Array?

I have a text file that holds a list of filenames held on a Windows share.
My SSIS package runs a single .BAT file from an SSIS ‘Execute Process Task’ that will update all or some of the files in the share. I want to be able to identify whether any of the files have not been updated by the BAT file.
To do that I need to compare the modified times of the files from before the BAT file was executed with the modified times from after it has run. I therefore need a list/array/dataset in my SSIS package of the files and their modified times before the BAT file is executed. The list can then be used in a For Each loop to check that each modified time has changed after the BAT file has run.
The problem is how to represent this list in my SSIS package.
If I do it as a ‘Script Task’ generated array I will have to use two arrays.
One array would be the filename and one would be the modified time.
I think I can reference them both in a loop but it feels a bit poor to do it that way.
What I really need is the list to be a dataset with two columns or an array with two columns.
Is there a way of doing this or am I missing something silly? I know an ‘Execute SQL Task’ can create a dataset but how can I create a dataset in an ‘Execute SQL Task’ from an array?
You can store a dataset in an Object variable in the package, and it will persist throughout the run of the package.
You can fill the Object variable in a Script Task the same way you would create a DataTable in any .NET code, as sketched below.
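A minimal sketch of that Script Task, assuming the list of filenames lives in a text file whose path is held in User::FileListPath and the result goes into an Object variable named User::FileSnapshot (both variable names are illustrative):

// In the ScriptMain file generated by the SSIS Script Task designer.
using System;
using System.Data;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;

public void Main()
{
    // One DataTable with two columns replaces the two parallel arrays.
    var snapshot = new DataTable("FileSnapshot");
    snapshot.Columns.Add("FileName", typeof(string));
    snapshot.Columns.Add("ModifiedTime", typeof(DateTime));

    string listPath = Dts.Variables["User::FileListPath"].Value.ToString();
    foreach (string fileName in File.ReadAllLines(listPath))
    {
        if (fileName.Trim().Length == 0) continue;  // skip blank lines
        snapshot.Rows.Add(fileName, new FileInfo(fileName).LastWriteTime);  // pre-BAT timestamp
    }

    // Stash the table in the Object variable for the comparison loop later on.
    Dts.Variables["User::FileSnapshot"].Value = snapshot;

    Dts.TaskResult = (int)ScriptResults.Success;
}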

SQL Server 2008 - Save each "while loop" result to a different file

I'm using a double while loop to get a lot of results from several different tables. I get everything I need (500+ subjects, each with 1000+ rows), but each result comes back in a different grid. I would like to save each "while" result to a different .csv file. Is there any way to do this?
It might be possible to do this using SQLCMD or BCP, but it would be quite cumbersome to code, needing quite a few variables and dynamic SQL.
If faced with this scenario, I would personally go with an SSIS package:
Use package variables to generate the destination filenames dynamically
Use a Foreach Loop container instead of the while loop
Put a Data Flow task inside the container, with the SELECT statement as the source and the file from step one as the destination
It is pretty easy to do.
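For completeness, the non-SSIS route is also only a few lines of .NET: SqlDataReader exposes each grid in turn through NextResult(). A rough sketch; the connection string and the dbo.MyLoopProc procedure are placeholders for your own query, and values are written without CSV escaping for brevity:

using System.Data.SqlClient;
using System.IO;

class ExportGrids
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true;"))
        using (var cmd = new SqlCommand("EXEC dbo.MyLoopProc;", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                int fileIndex = 0;
                do
                {
                    // One .csv file per result set (grid).
                    using (var writer = new StreamWriter(string.Format("result_{0}.csv", fileIndex++)))
                    {
                        // Header row from the column names.
                        var header = new string[reader.FieldCount];
                        for (int i = 0; i < reader.FieldCount; i++)
                            header[i] = reader.GetName(i);
                        writer.WriteLine(string.Join(",", header));

                        // Data rows (values are not CSV-escaped here).
                        var values = new object[reader.FieldCount];
                        while (reader.Read())
                        {
                            reader.GetValues(values);
                            writer.WriteLine(string.Join(",", values));
                        }
                    }
                } while (reader.NextResult());  // advance to the next grid
            }
        }
    }
}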

XML Sources Randomly Fail on Execution in SSIS

In SSIS, I have a script task that pulls data and creates a bunch of XML documents. I have those documents read in as XML Source tasks, and they all go to an OLE DB destination. Each time I run the entire package, one or two of the XML Source tasks will fail. The failing XML Source task appears to be random at first glance. Without changing anything at all, if I run the package again, some different XML Source task will fail, or it may run all of them successfully. Running it a third time produces new failures or sometimes success, and so on. It seems like regenerating the XSDs for the XML Source that fails temporarily fixes that task, but it always fails again after a number of runs. I usually get the same error on a given XML Source task, which looks like this:
http://i.imgur.com/0GhKLBB.png
I have no idea what is causing this as I am new to SSIS, so any help is very much appreciated. Thanks.
Would it be possible in the script task to output the XML document as data type DT_UI4 or DT_STR, rather than as an XML type? Also check what the data type of WSOperationMetrics is in the XSD, and try to map its output data type accordingly.
Have a look at this thread.
