Pentaho rows to variables - loops

I am trying to create a transformation which takes values from a table input (let's say 10 rows) and in turn creates variables from the values in those rows. For each row in the original set, I then need to run a new transformation using the variables.
How can I loop through a bunch of rows, one by one, reading each one into variables that are used later in a transformation of its own?

Looping in PDI is a bit complicated.
You can use the following procedure:
Create a job in PDI. First execute a transformation which reads or generates the rows you need and ends with a "Copy rows to result" step.
After that, add an Execute Job step to the first job. Check the "Execute for every input row" option in the "Advanced" settings of that job entry. In this sub-job, create and execute the final transformation, which transforms your data.
In this transformation you have to use the "Get rows from result" step. Here you can finally read the values you defined before.
Hope I could help you.
Best regards.

Related

SSIS Insert Row and Get Resulting ID Using OLE DB Command

Having seen other questions with answers that don't totally address what I am after, I am wondering how in SSIS to use an OLE DB Command transformation to do an Insert and immediately get the resulting primary key for each row inserted as a new column, all within the same Data Flow Task. That sounds like it should be a common, built-in, fairly simple thing to ask for in SSIS, right?
So the obvious first choice for me would be to use an OLE DB Command where I do an INSERT and include an OUTPUT clause in my command:
INSERT INTO dbo.MyReleaseTable(releaseDate)
OUTPUT ?=Inserted.id
VALUES (?)
Only I can't figure out how to do this in an OLE DB Command (with an OUTPUT clause) without it complaining. I've read about using stored procedures to do this, so am I required to use a stored procedure if I want to do this?
Let's say this won't work. I could use a Script Transformation and execute direct SQL in that, right? Well if that's what I must do, then the line between using custom code and SSIS block-components gets blurred and I am tempted to throw SSIS away and just do the whole ETL in code.
Then I hear talk about using an Execute SQL task. So now I can't even do 1 data flow within 1 data flow task? Am I getting that right? I'd like to keep 1 single data flow contained within 1 data flow task and not have to break my 1 flow out between separate tasks.
If it turns out that this seemingly simple data flow objective is not built into SSIS then I will consider dumping SSIS altogether. Talend has a free ETL offering, don't they?
Well, this can be done within an SSIS Data Flow, but with some tricks. You need to create a stored procedure with input and output parameters and call it from an OLE DB Command in the Data Flow, fetching the resulting value from the output parameter.
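A minimal sketch of such a procedure, assuming the dbo.MyReleaseTable from the question (the procedure name and parameter names are hypothetical):

CREATE PROCEDURE dbo.InsertRelease
    @releaseDate DATETIME,
    @newId       INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    -- Insert the row, then hand the generated identity back to the caller
    INSERT INTO dbo.MyReleaseTable (releaseDate)
    VALUES (@releaseDate);
    SET @newId = SCOPE_IDENTITY();
END

The OLE DB Command would then run EXEC dbo.InsertRelease ?, ? OUTPUT and map the second parameter to the pipeline column that should receive the new ID.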
Drawbacks of this approach:
You need to create a Stored Procedure
Each row is processed with a separate stored procedure call, which causes per-row implicit transactions instead of batch processing. This can slow down your package.
A solution without the performance penalty is to do it in two Data Flows: the first inserts the values into some temp table, and the second runs a SQL MERGE command at an OLE DB Source and handles the output data as you wish. All of this runs inside a transaction, handled either by MSDTC or by yourself.
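A minimal sketch of the second Data Flow's source query, assuming the first Data Flow staged the values into dbo.StagedReleases (a hypothetical name; a #temp table also works if the connection is retained). The ON 1 = 0 predicate never matches, so every staged row is inserted and its generated ID is returned to the pipeline:

MERGE INTO dbo.MyReleaseTable AS tgt
USING dbo.StagedReleases AS src
    ON 1 = 0                              -- never matches: every staged row is inserted
WHEN NOT MATCHED BY TARGET THEN
    INSERT (releaseDate) VALUES (src.releaseDate)
OUTPUT inserted.id, src.releaseDate;      -- these rows feed the OLE DB Source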

How to read and change series numbers in columns in SSIS?

I'm trying to manipulate a column in SSIS which looks like below after I removed unwanted rows with a Derived Column and a Conditional Split in my data flow task. The source for this is a flat file.
XXX008001161022061116030S1TVCO3057
XXX008002161022061146015S1PUAG1523
XXX009001161022063116030S1DVLD3002
XXX009002161022063146030S1TVCO3057
XXX009003161022063216015S1PUAG1523
XXX010001161022065059030S1MVMA3020
XXX010002161022065129030S1TVCO3057
XXX01000316102206515901551PPE01504
The first three numbers from the left (starting with "008" in the first row) represent a series, and the next three ("001") represent a sequence number within the series. What I need is to renumber the series values so they start from "001" and continue to the end.
The desired result would thus look like:
XXX001001161022061116030S1TVCO3057
XXX001002161022061146015S1PUAG1523
XXX002001161022063116030S1DVLD3002
XXX002002161022063146030S1TVCO3057
XXX002003161022063216015S1PUAG1523
XXX003001161022065059030S1MVMA3020
XXX003002161022065129030S1TVCO3057
XXX00300316102206515901551PPE01504
...
My potential solution would be to load the file into a temporary database table and query it with SQL from there, but I am trying to avoid this.
The final destination is a flat file.
Does anybody have any ideas how to pull this off in SSIS? Other solutions are appreciated also.
Thanks in advance
I would definitely use the staging table approach and use window functions to accomplish this. I could only see a use case for avoiding it if SSIS were on a different machine than the database engine and there were a need to offload the processing to the SSIS box.
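A minimal sketch of the window-function query, assuming the lines were staged into a hypothetical table dbo.Staging with a single line column. DENSE_RANK numbers the distinct series values (characters 4-6) from 1 upward, and STUFF writes the zero-padded result back into the line:

SELECT STUFF(line, 4, 3,
           RIGHT('000' + CAST(DENSE_RANK() OVER (ORDER BY SUBSTRING(line, 4, 3)) AS VARCHAR(3)), 3)
       ) AS line
FROM dbo.Staging;

The fixed-width series values ("008", "009", "010") sort correctly as strings, so no numeric conversion is needed.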
In that offloading case, I would create a Script Transformation. You can process each row and make the necessary changes before passing it to the output. You can use C# or VB.
There are many examples out there. Here is an MSDN article - https://msdn.microsoft.com/en-us/library/ms136114.aspx

SSIS trying to only load file if all rows are good

I am trying to use an SSIS package to insert data from a file into a table, but only if all the data in the file is good. I have read around and realise that I can split my good data and bad data with a Conditional Split.
However, I cannot come up with a way to not write the good data if there are any bad rows.
I can solve my problem using a staging table. I just thought I would ask if I am missing a more elegant way to do this within the SSIS package, rather than loading and then transforming with T-SQL.
Thanks
The SSIS way allows wrapping actions in a transaction. Given your task, you need to count the bad rows in the data flow, and if there is at least one bad row, do nothing, i.e. roll back.
Below is how I would do it in pure SSIS. Create a Sequence Container and specify TransactionOption=Required on it, then move your Data Flow into the container. Add a Row Count transformation to your bad-rows path and store its result in some variable. After the Data Flow inside the container, create a precedence constraint with an expression checking whether the bad_rowcount variable > 0, and follow it with a little Script Task that raises an error to roll back the transaction.
Pure SSIS - yes! Simpler than using a staging table - not sure.

SSIS, splitting a single row into multiple rows

My problem is as follows. I have a CSV file (~100k rows) containing history information with the column format of:
ID1,History1,ID2,History2...ID110,History110
Each row may have anywhere between 0 and 110 history entries. Each separate entry requires a stored procedure to be called.
If there were a small number of possible entries per row, I imagine the way to do this would be to transform the data using a script, and send it to a unique path. Creating 110 paths would probably work, but isn't very elegant (and quite time consuming).
What would the best way to approach this be?
Just load the data (raw CSV unchanged, one row per file line) into a staging table. Then call a stored procedure that uses a string splitter to break up and loop over the staging table rows, calling your other procedure for each history entry.
see: Arrays and Lists in SQL Server 2005 and Beyond
also see this previous answer: SQL comma delimted column => to rows then sum totals?
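A minimal sketch of that splitter loop, assuming a hypothetical staging table dbo.HistoryStaging(rawLine) and a hypothetical per-entry procedure dbo.ProcessHistoryEntry(@id, @history):

-- Walk every staged CSV line and call the per-entry procedure
-- for each ID/history pair. All object names here are hypothetical.
DECLARE @line    VARCHAR(MAX),
        @id      VARCHAR(100),
        @history VARCHAR(100),
        @pos     INT;

DECLARE line_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT rawLine FROM dbo.HistoryStaging;
OPEN line_cur;
FETCH NEXT FROM line_cur INTO @line;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @line = @line + ',';                 -- sentinel comma simplifies the loop
    WHILE CHARINDEX(',', @line) > 0
    BEGIN
        -- take the ID up to the next comma
        SET @pos  = CHARINDEX(',', @line);
        SET @id   = LEFT(@line, @pos - 1);
        SET @line = SUBSTRING(@line, @pos + 1, LEN(@line));
        -- take the matching history value
        SET @pos = CHARINDEX(',', @line);
        IF @pos = 0 BREAK;                   -- malformed line: ID without history
        SET @history = LEFT(@line, @pos - 1);
        SET @line    = SUBSTRING(@line, @pos + 1, LEN(@line));
        IF @id <> ''
            EXEC dbo.ProcessHistoryEntry @id, @history;
    END;
    FETCH NEXT FROM line_cur INTO @line;
END;
CLOSE line_cur;
DEALLOCATE line_cur;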
If you want to solve this in SSIS without the staging tables, you could create a destination Script Component. You could use a switch statement or a hashtable to look up the right sproc to execute for each data row.
It is unclear whether this is a better solution than the staging table approach above, but it is an alternative.
I know you already accepted an answer, but couldn't you use an Unpivot task to achieve what you wanted to do here?

How to insert a row into a dataset using SSIS?

I'm trying to create an SSIS package that takes data from an XML data source and for each row inserts another row with some preset values. Any ideas? I'm thinking I could use a DataReader source to generate the preset values by doing the following:
SELECT 'foo' as 'attribute1', 'bar' as 'attribute2'
The question is, how would I insert one row of this type for every row in the XML data source?
I'm not sure if I understand the question... My assumption is that you have n number of records coming into SSIS from your data source, and you want your output to have n * 2 records.
In order to do this, you can do the following:
Multicast to create multiple copies of your input data
Derived Column transforms to set the "preset" values on the copies
Sort
Merge
Am I on the right track w/ what you're trying to accomplish?
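For reference, the relational equivalent of that pipeline is a UNION ALL: one unchanged copy of each input row plus one preset row per input row, giving n * 2 records. A hypothetical sketch, assuming the XML source were staged into a table dbo.XmlStaging:

SELECT attribute1, attribute2
FROM dbo.XmlStaging
UNION ALL
SELECT 'foo' AS attribute1, 'bar' AS attribute2   -- the preset values, once per source row
FROM dbo.XmlStaging;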
I've never tried it, but it looks like you might be able to use a Derived Column transformation to do it: set the expression for attribute1 to "foo" and the expression for attribute2 to "bar".
You'd then transform the original data source, then only use the derived columns in your destination. If you still need the original source, you can Multicast it to create a duplicate.
At least I think this will work, based on the documentation. YMMV.
I would probably switch to using a Script Task and place your logic in there. You may still be able to leverage the file-reading and other objects in SSIS to save some code.
