I'm trying to create an SSIS package that takes data from an XML data source and for each row inserts another row with some preset values. Any ideas? I'm thinking I could use a DataReader source to generate the preset values by doing the following:
SELECT 'foo' AS attribute1, 'bar' AS attribute2
The question is, how would I insert one row of this type for every row in the XML data source?
I'm not sure if I understand the question... My assumption is that you have n number of records coming into SSIS from your data source, and you want your output to have n * 2 records.
In order to do this, you can do the following:
multicast to create multiple copies of your input data
derived column transforms to set the "preset" values on the copies
sort each copy (Merge requires sorted inputs)
merge the copies back into a single path
Am I on the right track w/ what you're trying to accomplish?
I've never tried it, but it looks like you might be able to use a Derived Column transformation to do it: set the expression for attribute1 to "foo" and the expression for attribute2 to "bar".
You'd then transform the original data source, then only use the derived columns in your destination. If you still need the original source, you can Multicast it to create a duplicate.
At least I think this will work, based on the documentation. YMMV.
I would probably switch to using a Script Task and place your logic in there. You may still be able to leverage the file-reading and other objects in SSIS to save some code.
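If you go that route, here is a minimal sketch of the idea as an asynchronous Script Component (output's SynchronousInputID set to None so each input row can emit two output rows; the column names Attribute1 and Attribute2 are placeholders for whatever your XML source actually provides):

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Pass the original row through to the output.
    Output0Buffer.AddRow();
    Output0Buffer.Attribute1 = Row.Attribute1;
    Output0Buffer.Attribute2 = Row.Attribute2;

    // Emit a second row carrying the preset values.
    Output0Buffer.AddRow();
    Output0Buffer.Attribute1 = "foo";
    Output0Buffer.Attribute2 = "bar";
}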
In my SSIS Data Flow I create a derived column (DT_WSTR) based on a concatenation of two other columns. I want to save the maximum length of the values in this column in a variable (in SQL it would be MAX(LEN(COLUMN))). How is this done?
Add another Derived Column after your existing Derived Column that calculates the length of the computed column. Let's call it ColumnLength:
LEN(COLUMN)
Now add a Multicast transformation. One path from here will go on to the "rest" of your data flow. A new path will lead to an Aggregate transformation. There, specify you want the maximum.
Now - what do you want to do with that information?
Write to a table -> OLE DB Destination
Report to the log -> Script Task that fires an information event
Use elsewhere in the package -> Recordset destination, and then use a Foreach loop to pop the one row, one value out of it
Something else?
Sample data flow assuming you chose option 3 - recordset destination
I need to create two variables in my SSIS package: objRecordset of type Object and MaxColumnLength of type Int32.
When the data flow finishes, all my data will have arrived in my table (represented by the Script Component), and my aggregated maximum length will flow into the Recordset destination, which uses my variable objRecordset.
To get the value out of the ADO recordset and into our single variable, we need to "shred the recordset". Google that term and you'll find many, many examples.
My control flow will look something like this
The Foreach Loop Container (using the ADO enumerator) consumes every row in our recordset, and we specify that our variable MaxColumnLength maps to the 0th column of the table.
Finally, I put a Sequence Container in there so I can get a breakpoint to fire. We see that the value of my max-length variable is 15, which matches my source query
SELECT 'a' As [COLUMN]
UNION ALL SELECT 'ZZZZZZZZZZZZZZZ'
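For completeness, the shredding can also be done with a single Script Task instead of the Foreach loop. A rough sketch, assuming objRecordset is wired in as a ReadOnly variable and MaxColumnLength as ReadWrite:

using System;
using System.Data;
using System.Data.OleDb;

public void Main()
{
    // Fill a DataTable from the ADO recordset held in objRecordset.
    var table = new DataTable();
    var adapter = new OleDbDataAdapter();
    adapter.Fill(table, Dts.Variables["User::objRecordset"].Value);

    // The Aggregate produced exactly one row with one value.
    Dts.Variables["User::MaxColumnLength"].Value = Convert.ToInt32(table.Rows[0][0]);

    Dts.TaskResult = (int)ScriptResults.Success;
}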
I believe this addresses the problem you have asked.
As a data warehouse practitioner, I would encourage you to rethink your approach to the lookups. Yes, the 400-character column is going to wreak havoc on your memory, so "math it out": use the cryptographic functions available to you to compute a fixed-width, unique key for that column, and then work only with that key.
SELECT
CONVERT(binary(20), HASHBYTES('SHA1', MyBusinessKeys)) AS BusHashKey
FROM
dbo.MyDimension;
Now you always have 20 bytes, and SHA1 is unlikely to generate duplicate values.
I have a source with 100+ columns.
I need to pass these through a Script Component transformation, which alters only a handful of the columns.
Is there a simple way to allow any columns I don't modify to simply pass through the transformation?
Currently, I have to pass them all in and assign them to the output with no changes.
This is a lot of work for 100 columns, and I'd rather not have to do it if possible!
FYI:
There is no unique key, so I cannot split out the records using a Multicast and merge them back after the Script Component.
You actually have to choose which columns you want included in your Script Component, as either ReadOnly or ReadWrite.
Anything you do not select as read/write simply passes through.
There are other things you can do with a Script Component, like adding an output column to your current data flow or even creating a separate data flow output.
In your case, you will want to select the handful of columns that you want to alter as ReadWrite, then modify those columns in the script; the rest will just pass through.
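As a rough illustration (the column names are invented), if you mark only FirstName and LastName as ReadWrite, the generated row-processing method only ever touches those two, and the other 100+ columns flow through unchanged:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Only the ReadWrite columns need to appear here; every other
    // column passes through the synchronous output untouched.
    if (!Row.FirstName_IsNull)
        Row.FirstName = Row.FirstName.Trim().ToUpper();
    if (!Row.LastName_IsNull)
        Row.LastName = Row.LastName.Trim().ToUpper();
}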
I'm trying to manipulate a column in SSIS which looks like the sample below after I removed unwanted rows with a Derived Column and a Conditional Split in my data flow task. The source for this is a flat file.
XXX008001161022061116030S1TVCO3057
XXX008002161022061146015S1PUAG1523
XXX009001161022063116030S1DVLD3002
XXX009002161022063146030S1TVCO3057
XXX009003161022063216015S1PUAG1523
XXX010001161022065059030S1MVMA3020
XXX010002161022065129030S1TVCO3057
XXX01000316102206515901551PPE01504
The first three digits after "XXX" ("008" in the first row) represent a series, and the next three ("001") represent a number within the series. What I need is to renumber the series consecutively, starting from "001", all the way to the end of the file.
The desired result would thus look like:
XXX001001161022061116030S1TVCO3057
XXX001002161022061146015S1PUAG1523
XXX002001161022063116030S1DVLD3002
XXX002002161022063146030S1TVCO3057
XXX002003161022063216015S1PUAG1523
XXX003001161022065059030S1MVMA3020
XXX003002161022065129030S1TVCO3057
XXX00300316102206515901551PPE01504
...
My potential solution would be to load the file into a temporary database table and query it with SQL from there, but I am trying to avoid this.
The final destination is a flat file.
Does anybody have any ideas on how to pull this off in SSIS? Other solutions are appreciated also.
Thanks in advance
I would definitely use the staging table approach and window functions to accomplish this. That said, I could see a use case for doing it in SSIS if SSIS were on a different machine from the database engine and there were a need to offload the processing to the SSIS box.
In that case I would create a script transformation. You can process each row and make the necessary changes before passing the row to the output. You can use C# or VB.
There are many examples out there; here is an MSDN article: https://msdn.microsoft.com/en-us/library/ms136114.aspx
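A rough sketch of such a Script Component (synchronous, with the whole record in a single ReadWrite string column I'll call Line, and assuming rows arrive in file order):

private string _lastSeries;  // series value seen on the previous row
private int _newSeries = 0;  // consecutive replacement number

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Characters 3-5 (zero-based) hold the three-digit series number.
    string series = Row.Line.Substring(3, 3);

    // Start a new consecutive number whenever the series changes.
    if (series != _lastSeries)
    {
        _newSeries++;
        _lastSeries = series;
    }

    // Splice the renumbered series back into the record.
    Row.Line = Row.Line.Substring(0, 3) + _newSeries.ToString("D3") + Row.Line.Substring(6);
}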
I am new to SSIS.
I have a number of MS Access tables to transform and load into SQL Server. Some of these tables have datetime fields that need to go through some rules before landing in their respective SQL tables. I want to use a Script Component that deals with these kinds of fields, converting them to the desired values.
Since all of these fields need the same modification rules, I want to apply the same code base to all of them, thus avoiding code duplication. What would be the best option for this scenario?
I know I can't use the same Script Component and direct all of those datasets' outputs to it, because unfortunately it doesn't support multiple inputs. So the question is: is it possible to apply a set of generic data manipulation rules to fields from a group of different datasets without repeating the rules? I could use a Script Component for each OLE DB source and apply the same rule in each, but that would not be an efficient way of doing it.
Any help would be highly appreciated.
SQL Server Integration Services has a specific component to suit this need, called the Data Conversion transformation. The conversion can be done either in the data source query or via this transformation.
You can also use the Derived Column transformation to convert data. This transformation is also simple: select an input column and then choose whether to replace this column or create a new output column. Then you apply an expression for the output column.
So why use one over the other?
The Data Conversion transformation will take an input column, convert the type, and provide a new output column. If you use the Derived Column transformation, you get to apply an expression to the data, which allows you to do more complex manipulations.
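For the datetime rules in the question, a Derived Column expression along these lines would substitute a default for missing dates and cast the rest (the column name SourceDate is illustrative):
ISNULL(SourceDate) ? (DT_DBTIMESTAMP)"1900-01-01" : (DT_DBTIMESTAMP)SourceDate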
I have a csv file containing the following columns:
Email | Date | Location
I want to pretty much throw this straight into a database table. The snag is that the Location values in the file are strings - eg: "Boston". The table I want to insert into has an integer property LocationId.
So, midway through my data flow, I need to do a database query to get the LocationId corresponding to the Location. eg:
SELECT Id as LocationId FROM Locations WHERE Name = { location string from current csv row }
and add this to my current column set as a new value "LocationId".
I don't know how to do this - I tried a lookup, but this meant I had to put the lookup in a separate data flow - the columns from my csv file don't seem to be available.
I want to use caching, as the same Locations are repeated a lot and I don't want to run selects for each row when I don't need to.
In summary:
How can I, part-way through the data flow, stick in a lookup transformation (from a different source, sql), and merge the output with the csv-derived columns?
Is lookup the wrong transformation to use?
The Lookup transformation will work for you: it has caching, you can persist all input columns to the output, and you can add columns from the query you use in the Lookup transformation.
You can also use a Merge Join here; in some cases it is a better solution, but it brings additional overhead because it requires sorted inputs.
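For your case (table and column names taken from your question), set the Lookup to full cache mode with a query like
SELECT Id AS LocationId, Name FROM dbo.Locations
then, on the Columns tab, map your csv Location column to Name and check LocationId as the column to add. With full cache the Locations table is read once up front, so repeated locations never trigger per-row queries.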
Check this:
Right-click the Lookup transformation -> Show Advanced Editor -> Input and Output Properties.
Here you can add a new column or change the data type of existing columns.
Open the Flat File Connection Manager and go to the Advanced tab.
Click "New" to add the new column and modify its properties.
Now go back to the Flat File Destination, right-click > Mappings > map the lookup column to the new one.