I am getting to know SSIS, so I apologize if the question is too simple.
I have a set of tasks inside a Foreach Loop Container.
The first task should be executed only if a certain user variable is not null or empty.
Otherwise, the flow should skip the first task and continue to the second one.
How would I go about implementing this (in detail)?
Issue 1: There are two ways to interpret your logic, "...a certain user variable is not null or empty":
1. The (Variable is Not Null) OR the (Variable is Empty).
2. The (Variable is Not Null) OR the (Variable is Not Empty).
It's all about the object(s?) of the word "not". The differences are subtle but will impact when the first task in the Foreach loop executes. For demonstration purposes, I am assuming you intend #1.
Issue 2: The first task can no longer be first. In order to accomplish what you desire using SSIS inside the BIDS environment, you need to place another task ahead of the task formerly known as "the first task". This is so you can set a Precedence Constraint on the former first task from the new first task.
It is possible to accomplish what you desire by designing your SSIS dynamically from managed code, but I don't think this issue warrants the overhead associated with that design choice.
I like using an empty Sequence Container as an "Anchor" task - a task that exists solely to serve as the starting endpoint of a Precedence Constraint. I heavily document them as such. I don't want anyone deleting the "unnecessary empty container" and roaming the halls for days shaking their heads and repeating "Andy, Andy, Andy..." but I digress.
In the example below, I have two precedence constraints leaving the empty Sequence Container. One goes to the task that may be skipped and the other to the task following the task that can sometimes be skipped. A third precedence constraint is required between the task that can sometimes be skipped and the task following. It is important to note that this third precedence constraint must be edited and the Multiple Constraints option set to OR. This allows the task following to execute when either of the mutually exclusive previous paths is taken. By default, this option is set to AND, which requires both paths to execute. By definition, that will not - cannot - happen with mutually exclusive paths.
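As a rough illustration of why the OR setting matters, here is a minimal Python sketch that models the three precedence constraints as booleans. It is not SSIS code: the function, the task names and the `run_skippable_task` flag (standing in for whatever expression you put on the constraints) are hypothetical.

```python
def run_iteration(run_skippable_task, multiple_constraints="OR"):
    """Return the tasks that would execute in one loop iteration.

    Hypothetical model: task names and this function are illustrative only.
    """
    executed = ["Anchor"]  # the empty Sequence Container always runs

    # Two mutually exclusive expression constraints leave the anchor.
    anchor_to_skippable = run_skippable_task
    anchor_to_following = not run_skippable_task

    if anchor_to_skippable:
        executed.append("SkippableTask")

    # Constraints entering the following task: one from the anchor and
    # one from the skippable task (satisfied only if that task ran).
    incoming = [anchor_to_following, "SkippableTask" in executed]
    combine = any if multiple_constraints == "OR" else all
    if combine(incoming):
        executed.append("FollowingTask")
    return executed


for mode in ("OR", "AND"):
    for condition in (True, False):
        print(mode, condition, run_iteration(condition, mode))
```

With OR, the following task runs on both paths. With AND, it never runs, because the two incoming constraints can never both be satisfied.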
I test the value of an SSIS String variable named @MyVar to see if it's Null or Empty. I used the Expression Only Evaluation Option for the constraints leaving the empty Sequence Container. The expressions vary but establish the mutual exclusivity of the expression. My Foreach Loop Container looks like this:
I hope this helps.
:{>
The best option may be to put an expression on the task's Disable property and write the expression to match your condition. Just search for how to use the Disable property with expressions.
How about a simple solution instead of some of the more complex ones that have already been given? For the task you want to conditionally skip, add an expression to the Disable property. Any expression that produces a true or false result will work, so for the question example you could use:
ISNULL(@[User::MY_VAR]) || @[User::MY_VAR]==""
The only downside is that it may not be as visible as some of the other solutions, but it is far easier to implement.
I would create a For Loop Container around the task that needs the condition, with the following expressions (@i is the loop counter, @Foo is the user variable that you want to test):
InitExpression: @i=0
EvalExpression: @i<1 && !ISNULL(@Foo) && @Foo!=""
AssignExpression: @i=@i+1
There is no need to create a script.
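To see why this For Loop Container behaves like an "if", here is a small Python sketch of the same three expressions. It is only a model of the control flow under the assumption that the wrapped task does not change the tested variable; the function name is made up.

```python
def for_loop_as_if(foo):
    """Count how many times the wrapped task would run for a given value."""
    runs = 0
    i = 0  # InitExpression: @i = 0
    # EvalExpression: @i < 1 && !ISNULL(@Foo) && @Foo != ""
    while i < 1 and foo is not None and foo != "":
        runs += 1  # the wrapped task executes here
        i = i + 1  # AssignExpression: @i = @i + 1
    return runs


print(for_loop_as_if("some value"))  # body runs once
print(for_loop_as_if(""))            # body is skipped
print(for_loop_as_if(None))          # body is skipped
```

The counter guarantees at most one pass, and the extra terms in the evaluation expression decide whether that single pass happens at all.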
I think the best (and simpler) approach is to add a blank script task inside your loop container before your "first task", drag the green arrow from it to your "first task" (which obviously will become the second) and use the precedence constraint to do the check.
To do that, double-click the arrow, select "Expression" under "Evaluation operation" and write your expression. After you hit OK, the arrow turns blue, indicating that it isn't a simple precedence constraint: it has an expression assigned to it.
Hopefully I didn't misunderstand the question but a possible solution can be as written below.
I created a sample ForEach loop. The loop itself is an item enumerator. It enumerates the numbers 1, 2, 3. The actual value is stored in a variable called LoopVariable.
There is another variable named FirstShouldRun, a Boolean variable indicating whether the first task in the foreach loop should run. I set this variable's EvaluateAsExpression property to true, and its expression is (@[User::LoopVariable] % 2) == 0. With this I want to demonstrate the first task being started on every second iteration.
The two tasks do nothing much but display a MessageBox showing the task has been started.
I started the package, and on the first and third iterations the first task did not start. On the second iteration the MessageBox (showing "First started") appeared.
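The behaviour described above follows directly from the expression. A quick Python check of the same arithmetic for the enumerated values 1, 2, 3 (the function name is only illustrative):

```python
def first_should_run(loop_variable):
    """Same test as the SSIS expression (@[User::LoopVariable] % 2) == 0."""
    return (loop_variable % 2) == 0


results = [(value, first_should_run(value)) for value in (1, 2, 3)]
print(results)
```

Only the value 2 yields True, which matches the first task running on the second iteration only.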
After that you should set FirstShouldRun variable as you like.
As I mentioned in my first comment to the OP, this solution is based on the idea of Amos Wood written in another answer.
That's a bit tricky.
You have to create a Script Task and check if your variable is not null in there.
So first you have the script task in which you will have the following code in your Main() function:
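public void Main()
{
    // The task result is used here only as a routing flag for the two connections.
    if (Dts.Variables["User::yourVariable"].Value != null)
    {
        // Variable is set: report Failure so the "Failure" connection is followed.
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
    else
    {
        // Variable is null: report Success, so only the "Completion" connection is followed.
        Dts.TaskResult = (int)ScriptResults.Success;
    }
}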
Then you create two connections from your script task, one to the task that needs to be executed when your variable is not null, and one to the next task (or to another script, if you need to check again, if the variable is not null).
Then you right-click on the (green) arrow of your first connection and select "Failure". Right-click the connection to the next task / script and set it to "Completion".
It should then look something like this:
That's it.
I want a simple loop that counts the number of iterations, like the following in Java:
int count = 0;
for (int i = 0; i < 3; i++) {
    count = count + 1;
}
System.out.println(count); // prints 3
I am doing this in Pentaho Data Integration. I have one job containing three transformations: the first transformation sets the number of loops (3 in the example above); the second has "Execute every input row" enabled for looping and sets a variable inside the transformation using JavaScript with the getVariable() and setVariable() functions; the last transformation just gets the variable and writes to the log to show the count.
The problem is that every loop in transformation 2 gets the variable as 0, so the result ends up as 1, whereas I expect 3.
added the project files here: file
We'll need more details to help you, don't you have a simple sample of what you are trying to accomplish?
You can pass variables to a transformation from the job, so I don't think you'll need the getVariable() and setVariable() methods, you can just use the configuration properties of the transformation to execute:
I prefer parameters (next tab) to arguments/variables, but that's just my preference.
The problem is that, in t2 transformation, you are getting the variable and setting a new value for the same variable at the same time, which does not work in the same transformation. When you close the Set variable step you get this warning:
To avoid it you need to use two variables, one you set before executing the loop, and another set each time you execute the loop or after executing the loop with the last value.
I have modified your job to make it work, in t1 transformation, I have added a new field (rownum_seq) created with the Add sequence step, to know how much to add to variable cnt in each execution of the loop. I could have used your id field, but in case you don't have a similar field in your real world job, that's the step you need to achieve something similar. I have modified the variable name to make more clear what I'm doing, in t1 I set the value of variable var_cnt_before.
In t2 transformation, I read var_cnt_before, and set the value of var_cnt_after as the sum of var_cnt_before + rownum_seq, this means I'm changing the value of var_cnt_after each time t2 is executed.
In t3 transformation, I read var_cnt_after, which has the value of the last execution of t2.
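The difference between the original job and the modified one can be sketched in plain Python. This is only a model of the behaviour described above, not Pentaho code: both function names are hypothetical, and the loop stands in for the repeated executions of t2.

```python
def single_variable_counter(loops):
    """Original job: each execution of t2 reads the same initial value (0)."""
    cnt = 0       # value set before the loop starts
    result = cnt
    for _ in range(loops):
        seen = cnt         # what getVariable() returns in every execution
        result = seen + 1  # what setVariable() writes
    return result


def two_variable_counter(loops):
    """Modified job: read var_cnt_before, write var_cnt_after."""
    var_cnt_before = 0  # set once in t1
    var_cnt_after = var_cnt_before
    for rownum_seq in range(1, loops + 1):  # Add sequence: 1, 2, 3, ...
        var_cnt_after = var_cnt_before + rownum_seq  # done in each t2 run
    return var_cnt_after  # read in t3


print(single_variable_counter(3))
print(two_variable_counter(3))
```

The first function reproduces the result of 1 from the question; the second ends at 3 because the value added grows with rownum_seq while var_cnt_before stays fixed.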
Alternatively, you could calculate var_cnt_after once in t1 and leave it untouched in t2, using the Group by step to get the maximum value of rownum_seq. Which approach fits depends on your needs: update the variable in t2 if each execution has to see the running value, or calculate it in t1 if you only need the final value.
This is the link to the modified job and transformations.
Maybe this question seems very basic and elementary but I need some clarification.
In SSIS the following Constraints are available to control component execution flow:
Logical AND. All constraints must evaluate to true.
Logical OR. At least one constraint must evaluate to true.
I have a very simple package I am using to test the conditions described.
A Process Task - calls a batch file that returns a string value and exit code.
The Process Task receives the value from the batch file as a StandardOutputVariable and assigns it to my user defined variable User::BatchExecutionCode
A Script Task - for validation, displays the acquired value from my user variable so I can visually see and affirm that the expected value is getting passed
An Execute SQL Task - That simply does a SELECT GETDATE().
I have setup a Logical AND condition between the Script Task and Process Task that mandates:
Constraint - Task execution must "Complete"
Expression - @BatchExecutionCode == "0"
When I execute, both tasks prior to the final "Execute SQL" task complete and I get a visual message box showing the value of my variable as "0" but the execution just stops afterward and never executes the last task after evaluation.
What is the problem? According to the stated conditions for execution, the conditions have been met. So exactly how are the precedence constraints being evaluated?
EDIT: For clarification, in the screenshot the value of @BatchExecutionCode has been passed to the User::Result variable via the Script Task. That's why the expression says @Result == "0". Either way, the result is still the same.
The only plausible reason is that @Result doesn't equal "0". You can confirm that with a post-execute breakpoint on your Script Task.
You can use the following article to confirm what's really in this variable.
http://www.jasonstrate.com/2011/01/31-days-of-ssis-using-breakpoints-231/
We wrote a function get_timestamp() defined as
CREATE OR REPLACE FUNCTION get_timestamp()
RETURNS integer AS
$$
SELECT (FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 10) - 13885344000)::int;
$$
LANGUAGE SQL;
This was used on INSERT and UPDATE to enter or edit a value in a created and modified field in the database record. However, we found when adding or updating records consecutively it was returning the same value.
On inspecting the function in pgAdmin III we noted that on running the SQL to build the function the key word IMMUTABLE had been injected after the LANGUAGE SQL statement. The documentation states that the default is VOLATILE (If none of these appear, VOLATILE is the default assumption) so I am not sure why IMMUTABLE was injected, however, changing this to STABLE has solved the issue of repeated values.
NOTE: As stated in the accepted answer, IMMUTABLE is never added to a function by pgAdmin or Postgres and must have been added during development.
I am guessing that this function was being evaluated and the result cached for optimization, as it was marked IMMUTABLE, indicating to the Postgres engine that the return value should not change given the same (empty) parameter list. However, when not used within a trigger but directly in the INSERT statement, the function would return a distinct value FIVE times before returning the same value from then on. Is this due to some optimisation algorithm that says something like "If an IMMUTABLE function is used more than 5 times in a session, cache the result for future calls"?
Any clarification on how these keywords should be used in Postgres functions would be appreciated. Is STABLE the correct option for us given that we use this function in triggers, or is there something more to consider, for example the docs say:
(It is inappropriate for AFTER triggers that wish to query rows
modified by the current command.)
But I am not altogether clear on why.
The key word IMMUTABLE is never added automatically by pgAdmin or Postgres. Whoever created or replaced the function did that.
The correct volatility for the given function is VOLATILE (also the default), not STABLE - or it wouldn't make sense to use clock_timestamp() which is VOLATILE in contrast to now() or CURRENT_TIMESTAMP which are STABLE: those return the same timestamp within the same transaction. The manual:
clock_timestamp() returns the actual current time, and therefore its
value changes even within a single SQL command.
The manual warns that function volatility STABLE ...
is inappropriate for AFTER triggers that wish to query rows modified
by the current command.
... because repeated evaluation of the trigger function can return different results for the same row. So, not STABLE.
You ask:
Do you have an idea as to why the function returned correctly five
times before sticking on the fifth value when set as IMMUTABLE?
The Postgres Wiki:
With 9.2, the planner will use specific plans regarding to the
parameters sent (the query will be planned at execution), except if
the query is executed several times and the planner decides that the
generic plan is not too much more expensive than the specific plans.
Bold emphasis mine. Doesn't seem to make sense for an IMMUTABLE function without input parameters. But the false label is overridden by the VOLATILE function in the body (voids function inlining): a different query plan can still make sense.
Related:
PostgreSQL Stored Procedure Performance
Aside
trunc() is slightly faster than floor() and does the same here, since positive numbers are guaranteed:
SELECT (trunc(EXTRACT(EPOCH FROM clock_timestamp()) * 10) - 13885344000)::int
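To make the arithmetic in get_timestamp() concrete, here is a small Python sketch of the same calculation. It is an illustration, not a replacement for the SQL: the Python function takes the epoch seconds as an argument instead of reading the clock, and the sample value is made up.

```python
import math
from datetime import datetime, timezone

OFFSET_TENTHS = 13885344000  # constant subtracted in get_timestamp()


def get_timestamp(epoch_seconds, rounding=math.floor):
    """Tenths of a second elapsed since the offset, as an integer."""
    return int(rounding(epoch_seconds * 10)) - OFFSET_TENTHS


# The offset is expressed in tenths of a second since the Unix epoch.
origin = datetime.fromtimestamp(OFFSET_TENTHS / 10, tz=timezone.utc)
print(origin.isoformat())

sample = 1388534401.25  # hypothetical reading, 1.25 s after the origin
print(get_timestamp(sample, math.floor), get_timestamp(sample, math.trunc))

# floor and trunc only disagree for negative inputs.
print(math.floor(-1.5), math.trunc(-1.5))
```

The offset corresponds to 2014-01-01 00:00:00 UTC, so the function counts tenths of a second since then. For positive inputs floor and trunc give the same result, which is why swapping one for the other does not change the returned value.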
I have a simple String variable with the following value: "C:\Test.txt".
Now I would like to edit the variable to point to a different file.
I cannot find a way to do that. I can change the Name, Data Type, but not the value itself!
Do I need to delete the variable and create the new one?
Update: The problem was caused by "ReadOnly" property set to "True". For typical scenarios, see the accepted answer below.
As @Yuck and @devarc have noted, there are two different and distinct values a Variable holds. The Design-time value is the value you assign when the variable is first created. In your case, the variable holds C:\Test.txt as the design-time value. Every time you open the package, it would show C:\Test.txt until you change it in the
To make the value of a variable change while the package is running, your options are either to set the value or calculate it. Here I have created a package-level variable CurrentFile with the value of C:\Test.txt
One thing that often trips people up is that they have correctly changed the run-time value but when they run it in BIDS, they see the "old" value. The value displayed in the Variables window does not change during package execution.
During package execution, my Variables window still shows the design-time value (C:\Test.txt) but the true value is reflected in the Locals window (C:\Test2.txt)
Setting a value
The value of most anything in SSIS can be established at run-time through a set of verbose command-line options or through configuration sources. The biggest difference in my mind is that with this approach the value will always be the value for the entire lifetime of package execution. Sequential or parallel invocations of a package can change that value, but for that execution the value would remain constant (barring an explicit modification of the value).
/SET
Command-line execution (dtexec.exe), right clicking on a package and running from the filesystem (dtexecUI.exe) or creating a SQL Agent job step of SQL Server Integration Services all allow for providing a run-time value through the SET command. Using the above variable, the following command would set the run-time value to C:\Test2.txt
dtexec /file C:\Generated.dtsx /set \Package.Variables[User::CurrentFile].Properties[Value];"C:\Test2.txt"
Configuration
SSIS offers an option to create configuration sources to provide run-time values to packages. The article I linked to above does a much better job describing the pros and cons of the configuration options than I will do here. I will say that I typically use both - my SET command configures a connection manager which is then used by the package to find the "full" set of package configurations.
Calculating a value
There are a variety of tasks in SSIS that can change the value of a variable as well as the use of Expressions to change a value. I see these as things that operate on value whilst the package is in flight.
Tasks
A Script Task is one of the most commonly used mechanisms for those starting out but I find other tools in the SSIS toolkit usually better suited for changing variable values.
Foreach Loop Container and Execute SQL Task are two of the other big Tasks you should look at for assignment of a variable value.
Expressions
Expressions are the most glorious candy in the SSIS toolbox. Most every "thing" in SSIS exposes properties for configuration. That's helpful, but assigning an expression to build those properties is outstanding.
For example, imagine 3 variables RootFolder, FileName and ComputedCurrentFile with values of C:\, File2.txt and empty string. On the Properties window for ComputedCurrentFile we'd change the value for EvaluateAsExpression from False to True and then use an expression like @[User::RootFolder] + "\\" + @[User::FileName]. That simply concatenates the values of the first two variables. This can be helpful if the file name for processing is standard but the source folder changes often. Or if we're talking about output, it's common to use expressions to build an output file name using the date, and possibly time, of when the package is running.
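For readers who want to check what such an expression produces, here is the same concatenation mirrored in Python, followed by a date-stamped output name. This is a sketch, not SSIS expression syntax: `dated_file_name`, the "Output" prefix and the sample date are hypothetical.

```python
from datetime import date

root_folder = "C:\\"     # RootFolder: C:\
file_name = "File2.txt"  # FileName

# Mirrors @[User::RootFolder] + "\\" + @[User::FileName]
computed_current_file = root_folder + "\\" + file_name
print(computed_current_file)


def dated_file_name(prefix, when):
    """Build an output file name that embeds the run date as YYYYMMDD."""
    return f"{prefix}_{when:%Y%m%d}.csv"


print(dated_file_name("Output", date(2010, 7, 15)))
```

With the values given, the result is C:\\File2.txt: RootFolder already ends in a backslash, so the literal separator in the expression doubles it. If that matters in your environment, drop the trailing backslash from RootFolder or the separator from the expression.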
Finally, there is nothing that prevents a mixing and matching of these approaches. I typically use a configuration to point a file enumerator at the correct starting folder and then use calculated values to identify the current file for processing.
If you want to change it in designer just right click on free space and --> Variables.
But if you want to change it at runtime, I suggest you:
create script task
choose language
add your variable to ReadWriteVariables.
Edit script.
For example, in VB:
Dts.Variables("myVariable").Value = "C:\Test2.txt"
Dts.TaskResult = ScriptResults.Success
Found an easy way to handle this. Remove the variable from the expression, which makes the Value box editable. Once the value is edited, add the variable back to the expression and it should pick up the updated value. Hope this helps.
I faced the same issue: once the variable was declared and defined in the SSIS Variables window (e.g. var1 = text1.csv), I could not update its value (e.g. to var1 = text2.csv) by clicking on the Value field.
I applied the fix below:
I noticed that var1 was being evaluated as an expression built with the Expression Builder, so I used the Expression Builder window to update the value (e.g. var1 = text2.csv). Once you are done in the Expression Builder, you can see that text2.csv is mapped to var1.
I'm trying to create an SSIS package to import some dataset files, however given that I seem to be hitting a brick
wall every time I achieve a small part of the task I need to take a step back and perform a sanity check on what I'm
trying to achieve, and if you good people can advise whether SSIS is the way to go about this then I would
appreciate it.
These are my questions from this morning :-
debugging SSIS packages - debug.writeline
Changing an SSIS dts variables
What I'm trying to do is have a For..Each container enumerate over the files in a share on the SQL Server. For each
file it finds a script task runs to check various attributes of the filename, such as looking for a three letter
code, a date in CCYYMM, the name of the data contained therein, and optionally some comments. For example:-
ABC_201007_SalesData_[optional comment goes here].csv
I'm looking to parse the name using a regular expression and put the values of 'ABC', '201007', and
'SalesData' in variables.
I then want to move the file to an error folder if it doesn't meet certain criteria :-
Three character code
Six character date
Dataset name (e.g. SalesData, in this example)
CSV extension
I then want to lookup the Character code, the date (or part thereof), and the Dataset name against a lookup table
to mark off a 'checklist' of received files from each client.
Then, based on the entry in the checklist, I want to kick off another SSIS package.
So, for example I may have a table called 'Checklist' with these columns :-
Client code   Dataset     SSIS_Package
ABC           SalesData   NorthSalesData.dtsx
DEF           SalesData   SouthSalesData.dtsx
If anyone has a better way of achieving this I am interested in hearing about it.
Thanks in advance
That's an interesting scenario, and should be relatively easy to handle.
First, your choice of the Foreach Loop is a good one. You'll be using the Foreach File Enumerator. You can restrict the files you iterate over to be just CSVs so that you don't have to "filter" for those later.
The Foreach File Enumerator puts the filename (full path or just file name) into a variable - let's call that "FileName". There are (at least) two ways you can parse that - expressions or a Script Task. It depends which one you're more comfortable with. Either way, you'll need to create three variables to hold the "parts" of the filename - I'll call them "FileCode", "FileDate", and "FileDataset".
To do this with expressions, you need to set the EvaluateAsExpression property on FileCode, FileDate, and FileDataset to true. Then in the expressions, you need to use FINDSTRING and SUBSTRING to carve up FileName as you see fit. Expressions don't have Regex capability.
To do this in a Script Task, pass the FileName variable in as a ReadOnly variable, and the other three as ReadWrite. You can use the Regex capabilities of .Net, or just manually use IndexOf and Substring to get what you need.
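The regular-expression route can be prototyped outside SSIS before you port it to the Script Task. The Python sketch below shows one possible pattern for the naming scheme in the question. The pattern, the function name, and the assumptions that the code is three letters and the dataset name is alphanumeric are mine; adjust them to your real rules.

```python
import re

# code (3 letters) _ date (6 digits) _ dataset [_ optional comment] .csv
FILENAME_PATTERN = re.compile(
    r"^(?P<code>[A-Za-z]{3})_(?P<date>\d{6})_(?P<dataset>[A-Za-z0-9]+)"
    r"(?:_(?P<comment>.*))?\.csv$",
    re.IGNORECASE,
)


def parse_file_name(file_name):
    """Return (code, date, dataset), or None if the name is malformed."""
    match = FILENAME_PATTERN.match(file_name)
    if match is None:
        return None  # candidate for the error folder
    return match.group("code"), match.group("date"), match.group("dataset")


print(parse_file_name("ABC_201007_SalesData_[optional comment goes here].csv"))
print(parse_file_name("ABC_201007_SalesData.csv"))
print(parse_file_name("AB_201007_SalesData.csv"))
```

A None result tells you the file should go to the error folder; otherwise the three values map onto FileCode, FileDate and FileDataset. The same pattern should carry over to .NET's Regex class with little change, since named groups of the form (?<name>...) are supported there.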
Unfortunately, you have just missed the SQLLunch livemeeting on the ForEach loop: http://www.bidn.com/blogs/BradSchacht/ssis/812/sql-lunch-tomorrow
They are recording the session, however.