How to add one more path in SSIS data flow - sql-server

Hi,
I am trying to use the same data twice in the SSIS data flow panel, but it only allows me to build one path. Is there any way I can build another path from it, or duplicate the data I want to use?
Thanks,

You're looking for the Multicast transformation.
Connect the output of 'CONVERT DATA TYPE2' to a 'MULTICAST TRANSFORMATION'.
From the Multicast you can take any number of output flows.

There are two ways to add a path, depending on your requirements:
Multicast transformation
The Multicast transformation distributes its input to one or more outputs. This transformation is similar to the Conditional Split transformation. Both transformations direct an input to multiple outputs. The difference between the two is that the Multicast transformation directs every row to every output, and the Conditional Split directs a row to a single output.
Script Component multiple outputs
If you are looking to create many distinct paths based on the Script Component code, then the Script Component allows creating many outputs. (Check the link above for more details.)

Option 1
The best and most SSIS-native way of doing this is to use the Multicast component. Connect it to the output path of your Script Transformation "Convert data type 2", and from there you can connect it to both "Sort 1" and "Sort 3".
Option 2
If your Script Transformation is asynchronous (one row in to many rows out, many rows in to one out, etc.), then you could add a second output and send the data along that as well. That option is only provided for completeness. Doing this would cause the amount of data required for a row in your pipeline to double (the Multicast component does some pointer-reference voodoo so it does not physically duplicate the data).
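For completeness, here is a minimal sketch of what Option 2 could look like inside an asynchronous Script Component. The output names and the OrderId/Amount columns are assumptions for illustration; the component would be configured with two asynchronous outputs carrying the same columns as the input:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Copy the row to the first asynchronous output...
    Output0Buffer.AddRow();
    Output0Buffer.OrderId = Row.OrderId;
    Output0Buffer.Amount = Row.Amount;

    // ...and send a second copy down the other output path
    Output1Buffer.AddRow();
    Output1Buffer.OrderId = Row.OrderId;
    Output1Buffer.Amount = Row.Amount;
}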
Finally, I'm not sure what business problem you're solving, but if performance is an issue it will be the package design and not SSIS itself. Without knowing more (which is really a different question), it's hard to say.

Related

How to modify the projection of a dataset in a ADF Dataflow

I want to optimize my dataflow by reading just the data I really need.
I created a dataset that maps a view on my database. This dataset is used by different dataflows, so I need a generic projection.
Now I am creating a new dataflow and I want to read just a subset of the dataset.
Here is how I created the dataset:
And this is the generic projection:
Here is how I created the data flow. These are the source settings:
But now I want just a subset of my dataset:
It works, but I think I am doing it wrong:
I want to read data from my dataset (as you can see from the Source settings tab), but when I modify the projection I read from the underlying table (as you can see from the Source options). It seems inconsistent. What is the correct way to manage this kind of customization?
Thank you
EDIT
The solution proposed does not solve my problem. If I go into the monitor and analyze the executions, this is what I see...
Before applying the proposed solution, using the approach I wrote above, I got this:
As you can see, I read just 8 columns from the database.
With the solution proposed, I get this:
And just then:
Just to be clear, the purpose of my question is:
How can I read only the data I really need, instead of reading all the data and filtering it afterwards?
I found a way (explained in my question), but there is an inconsistency in the configuration of the dataflow (I set a dataset as the input, but in the options I write a query that reads from the database).
First, import the data as a Source.
You can use the Select transformation in the Data Flow activity to select CustomerID from the imported dataset.
Here you can remove unwanted columns.
Refer to https://learn.microsoft.com/en-us/azure/data-factory/data-flow-select

How to read and change series numbers in columns SSIS?

I'm trying to manipulate a column in SSIS which looks like below, after I removed unwanted rows with a Derived Column and Conditional Split in my data flow task. The source for this is a flat file.
XXX008001161022061116030S1TVCO3057
XXX008002161022061146015S1PUAG1523
XXX009001161022063116030S1DVLD3002
XXX009002161022063146030S1TVCO3057
XXX009003161022063216015S1PUAG1523
XXX010001161022065059030S1MVMA3020
XXX010002161022065129030S1TVCO3057
XXX01000316102206515901551PPE01504
The first three numbers from the left ("008" in the first row) represent a series, and the next three ("001") represent a sequence number within the series. What I need is to renumber the series values so that they start from "001" and continue to the end.
The desired result would thus look like:
XXX001001161022061116030S1TVCO3057
XXX001002161022061146015S1PUAG1523
XXX002001161022063116030S1DVLD3002
XXX002002161022063146030S1TVCO3057
XXX002003161022063216015S1PUAG1523
XXX003001161022065059030S1MVMA3020
XXX003002161022065129030S1TVCO3057
XXX00300316102206515901551PPE01504
...
My potential solution would be to load the file into a temporary database table and query it with SQL from there, but I am trying to avoid this.
The final destination is a flat file.
Does anybody have any ideas on how to pull this off in SSIS? Other solutions are appreciated as well.
Thanks in advance
I would definitely use the staging-table approach and use window functions to accomplish this. I could see a use case for the SSIS approach if SSIS were on a different machine than the database engine and there was a need to offload the processing to the SSIS box.
In that case I would create a Script transformation. You can process each row and make the necessary changes before passing the row to the output. You can use C# or VB.
There are many examples out there. Here is an MSDN article - https://msdn.microsoft.com/en-us/library/ms136114.aspx
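For example, here is a minimal sketch of such a Script transformation (synchronous, C#). It assumes the whole record sits in a single ReadWrite string column, called Line here for illustration, and that rows arrive in their original file order:

// Assumes one ReadWrite string column named "Line" holding the full record,
// and that rows arrive in file order.
private string lastSeries = null;   // series value seen on the previous row
private int newSeries = 0;          // renumbered series counter

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string line = Row.Line;
    string series = line.Substring(3, 3);    // original series, e.g. "008"

    if (series != lastSeries)                 // a new series starts here
    {
        newSeries++;
        lastSeries = series;
    }

    // Swap in the new zero-padded series value, e.g. "001"
    Row.Line = line.Substring(0, 3) + newSeries.ToString("D3") + line.Substring(6);
}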

SSIS common destination, multiple file inputs, different structures

I'm sure this question is a common one, but I'm having a real challenging time coming up with a modular design for what should not be an impossible task. I have a situation where I have common destination tables, about five or six of them, but multiple input files which need to be massaged into a certain format for insertion. I've been tasked with making the design modular so as we work with new data providers with different formats, the pieces of the package that handle the insertion don't change, nor the error reporting, etc., just the input side. I've suggested using a common file format which would mean taking the source files and then transforming them and running the rest of the common import process on them. It was suggested that I consider using tables for this process on the input side.
I guess what strikes me about this process is the fact that the package can be saved as a template and I can use the common pieces over and over and set up new connections as we work with other data providers. Outside of that, I could see resorting to custom code in a script task to ensure a common format to be inserted into common input tables, but that's as far as I've gotten.
If anyone has ever dealt with a situation as such, I would appreciate design suggestions to accommodate functionality for now and in the future.
Update: I think the layered architectural design being emphasized in this particular instance would be as follows (which is why I find it confusing):
There would be six layers. They are as follows:
A. File acquisition
B. File Preparation
C. Data Translation to common file format (in XML)
D. Transformation of data to destination format (XML - preparation for insertion into database)
E. Insert into database
F. Post-processing (reporting and output of errored-out records)
Since we will be dealing with several different data providers, the steps for processing the data would be the same, but the individual steps themselves may differ between providers, if that makes sense. Example: we may get data from provider A, but we would receive files from them as zipped CSV files. Provider B's would be in XML, uncompressed. One provider might send files to us, while for another we may have to go and pick files up (this would take place in the file acquisition step above).
So my questions are:
A. Is it more important to follow the architectural pattern here, or to combine things where possible? One suggestion was to combine all the connection items in a single package as the top layer, so a single package would handle things like making a service call, SFTP, FTP, and anything else that was needed. I'm not sure quite how one would do multiple connections for different providers when a schedule is needed. It just seems to complicate things... I'm thinking a connection layer, but one specific to the provider, not a be-all-end-all.
B. Same question for the file preparation layer.
I think a modular design is key, but stacking things into control tasks seems to make things more complicated in design than they should be. Any feedback or suggestions would be helpful in this area.
I would do what was suggested in another comment and import each file into the appropriate temp table first and then union them all later. This will give you more modularity and make adding or removing input files easier and make debugging easier because you can easily see which section failed. Here is an outline of what I would do:
Step 1 SQL Task:
Create the temp table (repeat as needed for each task with a unique table):
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tbl_xxxxxx_temp]')
AND type in (N'U'))
DROP TABLE [dbo].[tbl_xxxxxx_temp]
GO
CREATE TABLE [dbo].[tbl_xxxxxx_temp](
(columns go here)
) ON [PRIMARY]
GO
Step 2: Data Flow Task
Create a Data Flow Task and import each file into their unique temp table which you created above.
Step 3: Data Flow Task
Create a second DFT and connect each temp table to a Union All Data Flow Transformation (convert or derive columns as needed) and then connect the output to your static database table.
Step 4: SQL Task: Drop temp tables
DROP TABLE tbl_xxxxxx_temp
Please note it is necessary to set "DelayValidation" to True in each Data Flow Task in order for this to work.

SSIS: Adding multiple Derived Columns without using the gui?

I have about 500 fixed-width columns in a flat file that I want to apply the same logic to, replacing an empty value with NULL before it goes into the database.
I know the command to replace the empty string with NULL, but I really don't want to have to use the GUI to enter that command for every column.
So is there a tool out there that can do this all on the back end?
You could look at something like the EzAPI to create your data flow. In this answer, I have an example of how one creates an EzDerivedColumn and sets the formula within it.
Automatically mapping columns with EZApi with OLEDBSource
If you can install third-party components, I've seen a number of implementations of Trim-To-Null functionality on codeplex.com.
BIML might be an option to generate your package as well. I'd need to play with that to figure out the syntax, though.
My googlefu worked a little better after lunch.
I was able to modify the code from about the 5th comment down on http://social.msdn.microsoft.com/Forums/sqlserver/en-US/222e70f5-0a21-4bb8-a3fc-3f365d9c701f/ssis-custom-component-derivedcolumn-programmatically-problems?forum=sqlintegrationservices to work for my needs.
My C# code will now loop through all the input columns from a "Flat File Source" object and add a derived column for each.
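Another route, if you would rather not generate the package at all, is a single synchronous Script Component that handles every string column via reflection over the generated row buffer. This is only a sketch, not the code referenced above: it assumes all the flat-file columns are selected as ReadWrite input columns on the component and that System.Reflection is imported at the top of the script:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    foreach (PropertyInfo column in Row.GetType().GetProperties())
    {
        // Only consider writable string columns
        if (column.PropertyType != typeof(string) || !column.CanWrite)
            continue;

        // Every selected column gets a companion <Name>_IsNull property
        PropertyInfo isNull = Row.GetType().GetProperty(column.Name + "_IsNull");
        if (isNull == null || (bool)isNull.GetValue(Row, null))
            continue;                                 // not present or already NULL

        string value = (string)column.GetValue(Row, null);
        if (value != null && value.Trim().Length == 0)
            isNull.SetValue(Row, true, null);         // empty string -> NULL
    }
}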

Temporary storage for cleaned data in Integration Services

I have an Excel file that I need to process three times in Integration Services, once for projects, once for persons and once for time-tracking data.
At each step I have the Excel source and I need to do some data cleanup and type conversions (the same in all three steps).
Is there an easy way of creating a step that does all this and that allows me to use the output as input to the other "real" steps?
I am starting to think about importing it into SQL server in a temp table, which is by all means ok, but it would be nice if I could skip that step.
This can actually be achieved using a single data flow.
You can read the Excel data source once and then use Multicast Transformation to create copies of the data set in memory. You can then process each of your three data flow branches accordingly and can also make use of parallel processing!
See the following reference for details:
http://msdn.microsoft.com/en-us/library/ms137701(SQL.90).aspx
I hope what I have detailed is clear and understandable but please feel free to contact me directly if you require further guidance.
Cheers, John
[Added in response to comments]
With regard to your further question, you can specify the precedence/flow control of your package using more than one flow. For example, you could use the Multicast transformation to create three data flows, and then define your precedence/flow control so that all transformation tasks in flow one must be completed before the transformations in flow two can begin.
You could use three separate data flow tasks with a file operation task first. The File Operation would be to copy the original Excel file to a temporary area. Each of the three Data Flow tasks would start with the temp file and write to the temp file (I think they may need to write to a copy).
An issue with this is that this makes the data flows operate sequentially. This might not be an issue for your Excel file, but would be an issue for processing larger numbers of rows. In such a case, it would be better to process the three "steps" in parallel, and join the results at the final stage.
