Pentaho data integration loop count variable - loops

I want a simple loop that counts its own iterations, like this Java snippet:
int count = 0;
for (int i = 0; i < 3; i++) {
    count = count + 1;
}
System.out.println(count);
I am doing this in Pentaho Data Integration. I have one job containing three transformations. The first transformation sets the number of loops (3 in the example above). The second transformation has "Execute every input row" ticked so that it loops, and it updates a variable using JavaScript with the getVariable() and setVariable() functions. The last transformation just reads the variable and writes it to the log to show the count.
The problem is that every execution of transformation 2 reads the variable as 0, so the result ends up as 1 when I expect 3.
I have added the project files here: file

We'll need more details to help you. Do you have a simple sample of what you are trying to accomplish?
You can pass variables to a transformation from the job, so I don't think you need the getVariable() and setVariable() methods. You can just use the configuration properties of the transformation to execute:
I prefer parameters (next tab) to arguments/variables, but that's just my preference.

The problem is that, in t2 transformation, you are getting the variable and setting a new value for the same variable at the same time, which does not work in the same transformation. When you close the Set variable step you get this warning:
To avoid it, you need two variables: one that you set before executing the loop, and another that you set either each time the loop executes or after the loop, with the last value.
I have modified your job to make it work. In the t1 transformation, I added a new field (rownum_seq), created with the Add sequence step, to know how much to add to the variable cnt in each execution of the loop. I could have used your id field, but in case you don't have a similar field in your real-world job, that's the step you need to achieve something similar. I have also renamed the variables to make it clearer what I'm doing: in t1 I set the value of the variable var_cnt_before.
In the t2 transformation, I read var_cnt_before and set var_cnt_after to the sum var_cnt_before + rownum_seq. This means the value of var_cnt_after changes each time t2 is executed.
In t3 transformation, I read var_cnt_after, which has the value of the last execution of t2.
You could also calculate var_cnt_after in t1 and leave it untouched in t2, using the Group by step to get the maximum value of rownum_seq, so that you don't need to modify the variable each time t2 runs. Which approach fits depends on what you need: if you need the intermediate value during the loop, set it in t2; if you only need the final value, calculate it in t1.
This is the link to the modified job and transformations.
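The difference between the original job and the modified one can be simulated outside PDI. The sketch below is plain Python, not PDI code: the `variables` dict is a hypothetical stand-in for the job's variables, and the first function simply assumes what the question reports, namely that every run of t2 reads the value set before the loop.

```python
def run_loop_same_variable(initial, rows):
    """Model of the original job: t2 reads and writes the same variable."""
    variables = {"cnt": initial}
    seen_by_t2 = variables["cnt"]  # assumption: each run of t2 sees the pre-loop value
    for _ in rows:
        variables["cnt"] = seen_by_t2 + 1
    return variables["cnt"]


def run_loop_two_variables(initial, rows):
    """Model of the fix: read var_cnt_before, write var_cnt_after."""
    variables = {"var_cnt_before": initial, "var_cnt_after": initial}
    # enumerate(..., start=1) plays the role of the Add sequence step (1, 2, 3, ...)
    for rownum_seq, _ in enumerate(rows, start=1):
        variables["var_cnt_after"] = variables["var_cnt_before"] + rownum_seq
    return variables["var_cnt_after"]


rows = ["a", "b", "c"]
print(run_loop_same_variable(0, rows))  # 1
print(run_loop_two_variables(0, rows))  # 3
```

Because var_cnt_before never changes during the loop, each run of t2 computes its value independently, and the last run leaves the total in var_cnt_after.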

Related

How do I configure a foreach loop container in SSIS to take defined start and end dates and run for each date in between?

I'd like to define start_date and end_date parameters in my SSIS package, and have a foreach container that runs once for each date between the two (inclusive). On each iteration it should execute a SQL query that takes the current date value (i.e. starting at start_date) as a parameter.
I'm quite new to SSIS programming and I cannot find information on how to do this.
You can simply add a for loop container and use these variables as mentioned in the image below:
Where @[User::Loop], @[User::MinDate], @[User::MaxDate] are of type System.DateTime
image reference
How do I loop through date values stored as numbers within For Loop container?
Passing parameters to Execute SQL Task
You can refer to the following posts for more details:
Passing Variables to and from an SSIS task
How to pass variable as a parameter in Execute SQL Task SSIS?
A For Loop would be the better option here. Assuming the start and end dates are supplied as parameters to the package, as your question indicates, be aware that parameters cannot be updated in an SSIS package, but variables can. This, along with an example of the process outlined in your question, is detailed below.
Create an SSIS variable of type DateTime. Because parameters cannot be updated, this variable will hold the initial value of the start date parameter.
Next, add a For Loop container to the Control Flow. In the screenshot below, the variable @[User::vStartDate] is set to the value of the package parameter @[$Package::pStartDate] in the InitExpression of the For Loop. The loop keeps iterating while the start date variable is less than or equal to the end date parameter, as specified in the EvalExpression field.
After the Execute SQL Task (or whatever executes the SQL query), add a Script Task. It increments the start date variable, so make sure it is the last task in the loop. The example C# script below reads the SSIS start date variable into a C# variable, increments it by one day, and writes the value back to the SSIS variable. Make sure to add the SSIS start date variable to the ReadWriteVariables field of the Script Task. The code goes in the Main method of the script. Although the Script Task only increments the date and updates the variable, having it in place makes the package easier to maintain if more logic is needed later, as C# provides much more functionality.
Script Task:
public void Main()
{
    // get the value for the current iteration of the loop
    DateTime currentIterationValue = Convert.ToDateTime(Dts.Variables["User::vStartDate"].Value);
    // increment by one day
    currentIterationValue = currentIterationValue.AddDays(1);
    // write the new value back to the SSIS variable
    Dts.Variables["User::vStartDate"].Value = currentIterationValue;
    Dts.TaskResult = (int)ScriptResults.Success;
}
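The control flow of that For Loop can be sketched in plain Python to check the inclusive boundaries. This is only an illustration of the loop logic, not SSIS code; the function name and the `visited` list (which stands in for the Execute SQL Task) are assumptions made for the example.

```python
from datetime import date, timedelta


def dates_inclusive(start_date, end_date):
    """Visit every date from start_date to end_date, both included."""
    current = start_date              # InitExpression: variable = start parameter
    visited = []
    while current <= end_date:        # EvalExpression: variable <= end parameter
        visited.append(current)       # stands in for the Execute SQL Task
        current += timedelta(days=1)  # Script Task: AddDays(1), last step in the loop
    return visited


for day in dates_inclusive(date(2024, 1, 30), date(2024, 2, 1)):
    print(day.isoformat())
```

If the end date is earlier than the start date, the condition fails on the first check and the body never runs.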
I used an Execute SQL Task to store the dates as a result set in a user-defined variable. Then, in the Foreach Loop container, I used the Foreach ADO Enumerator on that variable, which holds the set of dates. Using the variable mappings of the Foreach Loop container, you can map the start_date and end_date columns of the result set to other variables.
For example:
I have a SELECT statement that returns 2 rows with the columns start_date and end_date. The result is stored as a result set in a variable called "main_dates". The Foreach ADO Enumerator enumerates over "main_dates" (the loop body runs once for each row in main_dates). Then, in the Variable Mappings section, you can create 2 new variables called u_start_date and u_end_date and map columns 0 and 1 to them.
Inside the Foreach Loop, whenever you execute a stored procedure, you can pass the u_start_date and u_end_date variables as parameters.
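As a rough analogy in Python (not SSIS code), the result set is a list of rows and the variable mapping is an index lookup on each row. The `days_between` function is a hypothetical stand-in for the stored procedure, and the dates are made-up sample values.

```python
from datetime import date


def for_each_row(main_dates, stored_procedure):
    """Run stored_procedure once per row, mapping columns 0 and 1 to variables."""
    results = []
    for row in main_dates:            # Foreach ADO Enumerator over the result set
        u_start_date = row[0]         # variable mapping: column index 0
        u_end_date = row[1]           # variable mapping: column index 1
        results.append(stored_procedure(u_start_date, u_end_date))
    return results


def days_between(start, end):
    """Hypothetical stand-in for the stored procedure called inside the loop."""
    return (end - start).days


main_dates = [
    (date(2024, 1, 1), date(2024, 1, 31)),
    (date(2024, 2, 1), date(2024, 2, 29)),
]
print(for_each_row(main_dates, days_between))  # [30, 28]
```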

Automation Anywhere Excel Loop

Does anyone know of an easy way to start an Excel loop in Automation Anywhere at a row other than 1 (or row 2, by using the "Contains Header" option)? I would like to start a loop at row 5, but nothing I have tried so far works.
Thanks in advance.
You can use the Go to Cell command under the Excel commands, as explained below.
If you want to start from the 5th row and column 'A', then before starting the Excel loop:
1. Open the spreadsheet.
2. Select the Go to Cell command under the Excel commands.
3. Select the Specific Cell radio button.
4. Enter 'A5' as the Specific Cell value (which denotes the 5th row and column 'A').
Reference:
How to write data to excel file in a loop starting from a specific cell
Assign a loop count to a variable and if loop count is less than your desired row, continue the loop.
You can do this very easily using the "Continue" command. When looping over the sheet with "Each row in an Excel dataset", use the system variable "Counter". Since you want to start from row 5 and have set "Contains Header" to yes, you need to skip the first 3 data rows. The condition "if counter < 4 then continue" skips them.
See below image for reference
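The arithmetic behind "counter < 4" is easy to get wrong by one, so here is a small Python sketch of it. It is not Automation Anywhere code; it assumes the counter starts at 1 on the first data row (sheet row 2, because the header row is excluded), and the row labels are made up.

```python
def rows_from(dataset_rows, first_counter):
    """Process only the rows whose 1-based loop counter is >= first_counter."""
    processed = []
    for counter, row in enumerate(dataset_rows, start=1):  # assumed: counter starts at 1
        if counter < first_counter:
            continue                  # the "Continue" command: skip to the next row
        processed.append(row)
    return processed


sheet = ["header", "row2", "row3", "row4", "row5", "row6"]
data_rows = sheet[1:]                 # "Contains Header" drops sheet row 1
print(rows_from(data_rows, 4))        # ['row5', 'row6']
```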
There are a couple of ways to go about it:
♦ Define a counter so that the loop only does its work from row 5 onward.
♦ Use a specific cell to begin your task, then use a "Times" loop to iterate over the rest of the rows.
♦ Run a smaller task that copies all cells from row 5 onward to a new sheet, then use the generic Excel loop on that sheet.
[This sheet can also be used to record something like "Success" and a timestamp when the loop finishes a row, so you know it is completed.]
Look at this. I can tell you about G1ANT, as I have worked with it. Just change the "from" value and your Excel loop will start from that row.
addon msoffice version 4.101.0.0
addon core version 4.101.0.0
addon language version 4.103.0.0
excel.open ♥environment⟦USERPROFILE⟧\Desktop\tlt.xlsx inbackground true
for ♥n from 5 to 100 step 1
excel.getrow ♥n result ♥rowInput
dialog ♥rowInput
end
You can use the Excel command.
Excel Command
1. Go to Cell.
2. Select your Excel session.
3. Select Specific Cell.
If you use the Excel command inside a Loop command, you can use the system variable "$Counter$"
and set the Excel command's Specific Cell to A$Counter$, B$Counter$, etc.

Auto-generating destinations of split files in SSIS

I am working on my first SSIS package. I have a view with data that looks something like:
Loc Data
1 asd
1 qwe
2 zxc
3 jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a conditional split to flat files, but that would require a destination for each location. I have a lot of locations, and they will all be handled the same way apart from being split into different files.
Is there a built-in way to do this without creating a bunch of destination components? Or can I at least use a Script Component to do it?
You should be able to set an expression using a variable. Define your path up to the directory and then set the variable equal to that column.
You'll need an Execute SQL task to return a Single Row result set, and loop that in a container for every row in your original result set.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
So when your package runs the expression will look like:
"C:\\Documents\\MyPath\\location" + @[User::LocationColumn] + ".txt"
It should end up feeding your directory with files according to location.
Set the User::LocationColumn equal to the Location Column in your result set. Write your result set to group by Location, so all your records write to a single file per Location.
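The effect this answer aims for, one file per location with the file name built from the Loc value, can be sketched in Python using the sample rows from the question. This is not SSIS code: the function name is hypothetical, the Loc<n>.txt naming follows the question, and a temporary directory stands in for the real output path.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def split_rows_by_loc(rows, out_dir):
    """Write each row's data to <out_dir>/Loc<loc>.txt, one file per Loc value."""
    grouped = defaultdict(list)
    for loc, data in rows:            # group rows so each file is written once
        grouped[loc].append(data)
    written = {}
    for loc, values in grouped.items():
        path = Path(out_dir) / f"Loc{loc}.txt"   # path built from the Loc value
        path.write_text("\n".join(values) + "\n", encoding="utf-8")
        written[loc] = path
    return written


rows = [(1, "asd"), (1, "qwe"), (2, "zxc"), (3, "jkl")]
with tempfile.TemporaryDirectory() as out_dir:
    for loc, path in split_rows_by_loc(rows, out_dir).items():
        print(path.name, path.read_text(encoding="utf-8").split())
```

The key point carried over from the answer is that the destination is computed from the data, so no per-location destination has to be defined by hand.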
I spent some time trying to complete this task using the method @Phoenix suggested, but stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it into multiple SELECT statements, one for each location plus an extra one to grab the distinct locations, but I thought the SSIS implementation in the video was much cleaner than the alternative.
Change the connection manager's connection string so that it uses a variable, and change that variable for each file.
As the variable changes, the destination file changes as well.
The connection string is:
"C:\\Documents\\ABC\\Files\\" + @[User::data] + ".txt"
vote this if it helps you

jmeter: how to use while loop inside a loop

My JMeter plan is:
-- Loop controller (set to 3)
-- while controller(set to "${Dataitems}" != "<EOF>")
-- using CSV for reading the data items
-- HTTP requests
The issue is that after the While loop completes, the thread stops. I know this is controlled by the CSV config setting "Stop thread on EOF": setting it to true makes the thread stop. But my requirement is that, after iterating through my CSV, control passes back to the outer loop, which runs the number of times I specify (3 in my case), so that all requests run for all the data items in the CSV three times.
To achieve this I tried setting:
"Stop thread on EOF" = false, but then the loop ran an infinite number of times
"Stop thread on EOF" = ${Var1}, where Var1 was declared in the test plan, but the loop still ran an infinite number of times
Can someone provide insight on how to hand control from the While loop back to the outer loop?
Your help is much appreciated.
As you know, While controller will keep running until the condition fails.
Unfortunately, in your case you need the setting below so that the CSV file values are read again on each outer iteration:
'Recycle on EOF?' to TRUE
So the While controller will never exit, as there is no condition you can use there to fulfil your requirement.
The only way I can think to make your requirement work is - to use 2 Loop Controllers.
(Outer) Loop Controller (set to 3 as you had said)
(Inner) Loop Controller (set to the number of rows in the CSV)
I get the output like this. My CSV file has 3 rows with the values A,B,C. They are called 3 times by the outer loop controller.
If you do not want to hard-code the number in the Loop Controller, use a variable: find the number of rows using BeanShell and set the variable to that value.
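The two-controller layout can be illustrated with a short Python sketch. It is not JMeter or BeanShell code: the function name is hypothetical, and the CSV is passed in as a string so that the row count used for the inner loop is computed rather than hard-coded.

```python
import csv
import io


def run_plan(csv_text, outer_loops):
    """Replay every CSV row once per outer iteration and return the request order."""
    rows = [record[0] for record in csv.reader(io.StringIO(csv_text)) if record]
    inner_loops = len(rows)           # the row count the inner Loop Controller is set to
    requests = []
    for _ in range(outer_loops):      # (Outer) Loop Controller
        for i in range(inner_loops):  # (Inner) Loop Controller
            requests.append(rows[i])  # HTTP request fed with the next CSV value
    return requests


print(run_plan("A\nB\nC\n", 3))
# ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
```

Because the inner loop ends after exactly one pass over the file, no EOF condition is needed to return control to the outer loop.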

SSIS: execute first task if condition met else skip to next

I am getting to know SSIS, I apologize if the question is too simple.
I got a set of tasks inside a foreach-loop-container.
The first task needs only to get executed on condition that a certain user variable is not null or empty.
Otherwise, the flow should skip the first task and continue to the second one.
How would I go about realizing this (in detail) ?
Issue 1: There are two ways to interpret your logic: "...a certain user variable is not null or empty":
1. The (Variable is Not Null) OR the (Variable is Empty).
2. The (Variable is Not Null) OR the (Variable is Not Empty).
It's all about the object(s?) of the word "not". The differences are subtle but will impact when the first task in the Foreach loop executes. For demonstration purposes, I am assuming you intend #1.
Issue 2: The first task can no longer be first. In order to accomplish what you desire using SSIS inside the BIDS environment, you need to place another task ahead of the task formerly known as "the first task". This is so you can set a Precedence Constraint on the former first task from the new first task.
It is possible to accomplish what you desire by designing your SSIS dynamically from managed code, but I don't think this issue warrants the overhead associated with that design choice.
I like using an empty Sequence Container as an "Anchor" task - a task that exists solely to serve as the starting endpoint of a Precedence Constraint. I heavily document them as such. I don't want anyone deleting the "unnecessary empty container" and roaming the halls for days shaking their heads and repeating "Andy, Andy, Andy..." but I digress.
In the example below, I have two precedence constraints leaving the empty Sequence Container. One goes to the task that may be skipped and the other to the task following the task that can sometimes be skipped. A third precedence constraint is required between the task that can sometimes be skipped and the task following. It is important to note this third precedence constraint must be edited and the Multiple Constraints option set to OR. This allows the task following to execute when either of the mutually exclusive previous paths are taken. By default, this is set to AND and will require both paths to execute. By definition, that will not - cannot - happen with mutually exclusive paths.
I test the value of an SSIS String variable named @MyVar to see if it's Null or Empty. I used the Expression Only Evaluation Option for the constraints leaving the empty Sequence Container. The expressions vary but establish the mutual exclusivity of the two paths. My Foreach Loop Container looks like this:
I hope this helps.
:{>
The best option may be to use the Disable property in expressions and give it an expression that matches your condition. Just search for how to use the Disable property.
How about a simple solution instead of some of the more complex ones already given? For the task you want to conditionally skip, add an expression to the Disable property. Any expression that produces a true or false result will work, so for the example in the question you could use:
ISNULL(@[User::MY_VAR]) || @[User::MY_VAR] == ""
The only downside is that it may not be as visible as some of the other solutions, but it is far easier to implement.
I would create a For Loop Container around the task that needs the condition, with the following expressions (@i is the loop counter, @Foo is the user variable you want to test):
InitExpression: @i=0
EvalExpression: @i<1 && !ISNULL(@Foo) && @Foo!=""
AssignExpression: @i=@i+1
There is no need to create a "script".
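The three expressions amount to a loop that runs its body at most once, and only when the variable has a value. A minimal Python sketch of that logic follows; it is not SSIS code, `None` stands in for an SSIS NULL, and the function name is hypothetical.

```python
def run_task_conditionally(foo, task):
    """Run task at most once, and only if foo is neither None nor empty."""
    executions = 0
    i = 0                                           # InitExpression: i = 0
    while i < 1 and foo is not None and foo != "":  # EvalExpression
        task()
        executions += 1
        i = i + 1                                   # AssignExpression: i = i + 1
    return executions


print(run_task_conditionally("abc", lambda: None))  # 1
print(run_task_conditionally("", lambda: None))     # 0
print(run_task_conditionally(None, lambda: None))   # 0
```

The counter exists only to stop the loop after the first pass; the actual condition is the test on the variable.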
I think the best (and simplest) approach is to add a blank Script Task inside your loop container before your "first task", drag the green arrow from it to your "first task" (which will now be the second), and use the precedence constraint to do the check.
To do that, double-click the arrow, select "Expression" under "Evaluation operation", and write your expression. After you click OK the arrow turns blue, indicating that it isn't a simple precedence constraint: it has an expression assigned to it.
Hopefully I didn't misunderstand the question but a possible solution can be as written below.
I created a sample Foreach loop. The loop itself uses an item enumerator that enumerates the numbers 1, 2, 3. The current value is stored in a variable called LoopVariable.
There is another variable named FirstShouldRun, a Boolean that indicates whether the first task in the Foreach loop should run. I set this variable's EvaluateAsExpression property to true, and its expression is (@[User::LoopVariable] % 2) == 0. This demonstrates the first task running on every second iteration.
The two tasks do nothing but display a MessageBox showing that the task has started.
I started the package, and on the first and third iterations the first task did not start. On the second iteration the MessageBox (showing "First started") appeared.
After that you should set FirstShouldRun variable as you like.
As I mentioned in my first comment to the OP, this solution is based on the idea of Amos Wood written in another answer.
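The behaviour of that sample package can be traced with a few lines of Python. This is a sketch, not SSIS code: the list `log` replaces the MessageBox calls, and the text "Second started" is an assumed message for the second task.

```python
def run_foreach(items):
    """Trace which tasks start on each iteration of the sample Foreach loop."""
    log = []
    for loop_variable in items:                    # item enumerator: 1, 2, 3
        first_should_run = (loop_variable % 2) == 0  # the variable's expression
        if first_should_run:
            log.append("First started")
        log.append("Second started")               # assumed text for the second task
    return log


print(run_foreach([1, 2, 3]))
# ['Second started', 'First started', 'Second started', 'Second started']
```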
That's a bit tricky.
You have to create a Script Task and check if your variable is not null in there.
So first you have the script task in which you will have the following code in your Main() function:
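public void Main()
{
    // A non-null variable reports Failure on purpose: the "Failure" constraint
    // leads to the conditional task. Only null is checked here, not an empty string.
    if (Dts.Variables["User::yourVariable"].Value != null)
    {
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
    else
    {
        Dts.TaskResult = (int)ScriptResults.Success;
    }
}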
Then you create two connections from your script task, one to the task that needs to be executed when your variable is not null, and one to the next task (or to another script, if you need to check again, if the variable is not null).
Then you right-click on the (green) arrow of your first connection and select "Failure". Right-click the connection to the next task / script and set it to "Completion".
It should then look something like this:
That's it.

Resources