In my SSIS Data Flow I create a derived column (DT_WSTR) based on a concatenation of two other columns. I want to save the max length in this column in a variable (with SQL it would be a MAX(LEN(COLUMN))). How is this done?
Add another Derived Column transformation after your existing one that calculates the length of the computed column. Let's call it ColumnLength:
LEN(COLUMN)
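If it helps, here is a minimal sketch of the two expressions end to end, assuming the source columns are named Col1 and Col2 (hypothetical names) and are already DT_WSTR. The length calculation needs its own transformation because an expression cannot reference a column added in the same Derived Column component:
COLUMN (first Derived Column): Col1 + Col2
ColumnLength (second Derived Column): LEN(COLUMN)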
Now add a Multicast transformation. One path from here will go on to the "rest" of your data flow. A new path will lead to an Aggregate transformation. There, specify you want the maximum.
Now - what do you want to do with that information?
Write to a table -> OLE DB Destination
Report to the log -> Script Task that fires an information event
Use elsewhere in the package -> Recordset destination, and then use a foreach loop to pop the one row, one value out of it
Something else?
Sample data flow assuming you chose option 3 - recordset destination
I need to create two variables in my SSIS package: objRecordset of type Object and MaxColumnLength of type Int32.
When the data flow finishes, all my data will have arrived in my table (represented here by the Script Component), and my aggregated maximum length will flow into the Recordset Destination, which uses my variable objRecordset.
To get the value out of the ADO.NET recordset and into our single variable, we need to "shred the recordset." Google that term; you'll find many, many examples.
My control flow will look something like this
The Foreach (ADO.NET) Enumerator Loop Container consumes every row in our recordset, and we specify that our variable MaxColumnLength maps to index 0 (the first column) of each row.
Finally, I put a Sequence Container in there so I can get a breakpoint to fire. We see the value of my max-length variable is 15, which matches my source query:
SELECT 'a' As [COLUMN]
UNION ALL SELECT 'ZZZZZZZZZZZZZZZ'
I believe this addresses the problem you asked about.
As a data warehouse practitioner, I would encourage you to rethink your approach to the lookups. Yes, a 400-character column is going to wreak havoc on your memory, so "math it out": use the cryptographic functions available to you to compute a fixed-width, unique key for that column, and then work only with that key.
SELECT
CONVERT(binary(20), HASHBYTES('SHA1', MyBusinessKeys)) AS BusHashKey
FROM
dbo.MyDimension;
Now you always have 20 bytes, and SHA1 is unlikely to generate duplicate values.
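If it helps, a sketch of how both sides of a lookup might compute the same hash, assuming a hypothetical staging table dbo.MyStaging carrying the same business-key column:
-- Both sides reduce the wide business key to a fixed 20-byte hash,
-- so the join compares 20 bytes instead of 400 characters.
SELECT s.*
FROM dbo.MyStaging AS s
INNER JOIN dbo.MyDimension AS d
    ON CONVERT(binary(20), HASHBYTES('SHA1', s.MyBusinessKeys))
     = CONVERT(binary(20), HASHBYTES('SHA1', d.MyBusinessKeys));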
I have a copy activity pipeline that simply copies cells A6:C9 into a SQL table.
However, now I need to grab the value in cell A3 (a date) and write it as a new column in the SQL table.
How can I achieve that?
Do I need to use another copy activity, or can it be done in a single one?
UPDATE: I used the range A3:A3 in the Lookup activity.
You may fetch the desired value with a Lookup activity, and then add it to the Copy Data activity as an additional column. Please see the example below.
Start by creating a range parameter on your Excel dataset, so you may provide it dynamically in the pipeline:
Make sure the Range parameter is added in the Connection tab as well.
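For instance, the dynamic content for the Range field in the Connection tab can simply reference the dataset parameter (assuming you named it Range):
@dataset().Range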
Next, create a pipeline with a Lookup activity followed by a Copy Data activity. The Lookup activity should provide the range for the cell you want to capture (in my case, B1).
Finally, in the Copy Data activity, insert an additional column and give it the Lookup activity's output.
The expression I used is @activity('Lookup1').output.firstRow.Prop_0. This will add a column called Date with the value from B1 to every row.
I have a csv file containing the following columns:
Email | Date | Location
I want to pretty much throw this straight into a database table. The snag is that the Location values in the file are strings, e.g. "Boston". The table I want to insert into has an integer property LocationId.
So, midway through my data flow, I need to do a database query to get the LocationId corresponding to the Location, e.g.:
SELECT Id as LocationId FROM Locations WHERE Name = { location string from current csv row }
and add this to my current column set as a new column, LocationId.
I don't know how to do this - I tried a lookup, but this meant I had to put the lookup in a separate data flow - the columns from my csv file don't seem to be available.
I want to use caching, as the same Locations are repeated a lot and I don't want to run selects for each row when I don't need to.
In summary:
How can I, part-way through the data flow, stick in a lookup transformation (from a different source, sql), and merge the output with the csv-derived columns?
Is lookup the wrong transformation to use?
The Lookup transformation will work for you: it has caching, it can pass all input columns through to the output, and it can add columns from the query you use in the transformation.
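For example, the query behind the Lookup transformation can be as simple as this sketch (table and column names taken from the question): match the incoming Location column to Name, tick LocationId as a lookup output column, and with the default full cache mode the whole table is read once up front instead of once per row.
SELECT Id AS LocationId, Name
FROM dbo.Locations;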
You can also use a Merge Join here; in some cases it is the better solution, but it brings additional overhead because it requires sorting of its inputs.
Check this.
Right-click on the Lookup transformation -> go to Show Advanced Editor -> go to the Input and Output Properties tab.
Here you can add a new column or change the data type of existing columns.
Open the flat file connection manager, go to the Advanced tab.
Click "New" to add the new column and modify the properties.
Now go back to the Flat File Destination output, right click > Mappings > map the lookup column to the new one.
I have an Excel file with 2 sheets. What I need to do:
1. Read Id from the 1st sheet (just one row, one column)
2. Check whether there is a record in the db with such an id (figure 1)
3. If the previous condition is met, read data from the 2nd sheet (many rows)
4. Pass the data from #3 and the id from #1 to a stored procedure and execute it.
This figure reads the 2nd sheet with user data and passes it to the stored procedure.
But I don't understand how to combine it into one scheme to make items 1-4 work.
So ... if I understand your question correctly ... what you want to do is store that Excel Id value in a variable and then add it to your data flow in part 2 as a Derived Column.
So your package should look like this:
EXECUTE SQL TASK
ConnectionType EXCEL, Connection is your Excel connection, Result Set is "Single row", SQL Statement is "select top 1 * from [SheetNameWithDataSourceIDGoesHere$]". Result Set maps result name "0" to a Variable Name (let's call it User::DataSourceID).
DATA FLOW TASK
SOURCE: Excel sheet
DERIVED COLUMN: Call it DataSourceID, value is User::DataSourceID.
LOOKUP: Against wherever you're looking up data sources. No Match Output heads on to your Data Conversion and stored procedure.
If you want to get fancy, you could include a second Execute SQL Task after the first one and use an expression for the SQLStatementSource to run against your database table containing the data source IDs.
Something like:
"select count(1) as DataSourceIDExists from DATASOURCELOOKUPTABLE where DataSourceID = " + (DT_WSTR,50)#User::DataSourceID
And then map that to a "Single row" ResultSet, with result name "0" mapped to User::DataSourceIDExists. Then after that task, use a precedence constraint with an expression of
@[User::DataSourceIDExists] == 0
To determine if you even go down to the Data Flow Task to load Excel data. Since it sounds like you're not a full-on SSIS expert just yet, this might serve as an excellent learning opportunity.
I've got a data flow task that takes a pair of tables, mashes the relevant data together, and comes out with some results to be put into an indexed table. The indexed table already has data that I'm not getting rid of and for simplicity's sake should retain their existing keys. So, I need to generate a key that starts from the highest Primary Key value already in the column.
I have found a blog post that works when starting from any known value, but this data flow will eventually be used on different databases, so that value won't be constant. It will always be the max of the column, though, but I can't find a way to grab that value using the script component suggested there.
This type of thing is notoriously difficult to do in SSIS which is why I try to avoid it. You need to:
...brace yourself...
- create a variable in your SSIS package to hold the start value
- create a SQL Task with a Parameter mapped to that variable with a direction of Output and a query something like "SET ? = (SELECT MAX(IDValue) FROM Table)" - the question mark is the placeholder for the parameter which maps to the variable (a refinement of this query follows after the list)
- work the variable into your data flow - probably with a derived column transformation
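One small refinement worth considering for that SQL Task query: guard against an empty table so the variable receives 0 instead of NULL (IDValue and Table are the placeholders from above):
SET ? = (SELECT ISNULL(MAX(IDValue), 0) FROM Table);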
I hope this helps...
I'm trying to create an SSIS package that takes data from an XML data source and for each row inserts another row with some preset values. Any ideas? I'm thinking I could use a DataReader source to generate the preset values by doing the following:
SELECT 'foo' AS attribute1, 'bar' AS attribute2
The question is, how would I insert one row of this type for every row in the XML data source?
I'm not sure if I understand the question... My assumption is that you have n number of records coming into SSIS from your data source, and you want your output to have n * 2 records.
In order to do this, you can do the following:
multicast to create multiple copies of your input data
derived column transforms to set the "preset" values on the copies
sort
merge
Am I on the right track w/ what you're trying to accomplish?
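For what it's worth, here is a set-based sketch of that doubled output, using a hypothetical table SourceRows to stand in for the XML source:
-- Each source row appears once as-is and once with the preset values.
SELECT attribute1, attribute2 FROM SourceRows
UNION ALL
SELECT 'foo' AS attribute1, 'bar' AS attribute2 FROM SourceRows;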
I've never tried it, but it looks like you might be able to use a Derived Column transformation to do it: set the expression for attribute1 to "foo" and the expression for attribute2 to "bar".
You'd then run the original data source through the transformation and use only the derived columns in your destination. If you still need the original source, you can Multicast it to create a duplicate.
At least I think this will work, based on the documentation. YMMV.
I would probably switch to using a Script Task and place your logic in there. You may still be able to leverage the file-reading and other objects in SSIS to save some code.