Capture aggregate value from data flow task into a variable - sql-server

I have an OLEDB (SQL) data flow source (A) that pulls a result set from a stored procedure and throws the results into an OLEDB (Oracle) data flow destination (B).
Is there a way to capture an aggregate value from the dataset into a variable, all within the data flow task? Specifically, I'd want to capture the MAX(<DateValue>) from the entire dataset.
Otherwise, I'd have to pull the same data a second time in a different data flow task, whether I point it at A or at its new location, B.
EDIT: I already know how to do this in the Control Flow from an Execute SQL task. I'm asking because I'm curious to know if I can get this done in the Data Flow task since I'm already collecting the data there. Is there a way to grab an aggregate value in the Data Flow?

One way of doing it would be to add a Multicast transform between the source and destination that also feeds into a Script Component.
Whilst an Aggregate transform would also work, this method avoids adding a blocking transform.
Configure the Script Component as a destination, give it read/write access to the variable (add it to ReadWriteVariables), and then edit the script to be something like:
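// Instance-level field: largest createdate seen so far (null until the first non-NULL row)
DateTime? maxDate = null;
public override void PostExecute()
{
    base.PostExecute();
    // A Script Component can only write its ReadWriteVariables in PostExecute,
    // so the value is copied out once all rows have been processed.
    if (maxDate.HasValue)
    {
        this.Variables.MaxDate = maxDate.Value;
    }
    // Debugging aid only: remove before deploying
    System.Windows.Forms.MessageBox.Show(this.Variables.MaxDate.ToString());
}
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Skip NULL dates; otherwise keep the larger of the running max and this row's value.
    // Comparing against a null maxDate yields false, so the first non-NULL row seeds the max.
    if (!Row.createdate_IsNull)
    {
        maxDate = Row.createdate < maxDate ? maxDate : Row.createdate;
    }
}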
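To make the running-max logic concrete outside SSIS, here is a minimal Python sketch, assuming rows arrive as dictionaries with a `createdate` key (a hypothetical stand-in for the data flow buffer). The generator plays the role of the Multicast: every row is passed on unchanged while the tracker updates its maximum, and the value is only read after the stream is exhausted, mirroring how the script waits for PostExecute.

```python
from datetime import date
from typing import Iterable, Iterator, Optional


class MaxTracker:
    """Stand-in for the Script Component branch: keeps a running maximum."""

    def __init__(self) -> None:
        # Like the nullable DateTime? field: None until the first non-NULL date
        self.max_date: Optional[date] = None

    def observe(self, created: Optional[date]) -> None:
        # Equivalent of Input0_ProcessInputRow: skip NULLs, keep the larger value
        if created is not None and (self.max_date is None or created > self.max_date):
            self.max_date = created


def multicast(rows: Iterable[dict], tracker: MaxTracker) -> Iterator[dict]:
    """Send each row to the tracker, then pass it on unchanged to the destination."""
    for row in rows:
        tracker.observe(row.get("createdate"))
        yield row


# Hypothetical sample rows
rows = [
    {"id": 1, "createdate": date(2016, 3, 1)},
    {"id": 2, "createdate": None},
    {"id": 3, "createdate": date(2016, 5, 9)},
]
tracker = MaxTracker()
loaded = list(multicast(rows, tracker))  # the "destination" receives every row
max_date = tracker.max_date              # read only after all rows have flowed (PostExecute)
print(max_date)
```

Because the tracker never holds rows back, nothing downstream waits on it, which is the same reason the Multicast plus Script Component route avoids a blocking transform.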

Keep your current Data Flow Task (DFT) in the Control Flow as it is (same source-to-destination mapping).
In the Control Flow, add an Execute SQL Task that runs the same source query with your desired MAX() function applied to it.
For example:
-- Suppose this is your source query.
SELECT ColumnA,
ColumnB,
ColumnC,
DateValue
FROM SourceA
-- The query that calculates MAX() would then be:
SELECT MAX(DateValue)
FROM SourceA
Put the second query in the Execute SQL Task.
In the package, add a variable with package-level scope. Make it of type DateTime, since MAX(DateValue) returns a date rather than an integer (e.g., name = dtMax).
In the Execute SQL Task, note the following:
a. General tab
ResultSet = Single row
SQLStatement = SELECT MAX(DateValue) FROM SourceA
b. Result Set tab
Click Add
Result Name = 0
Variable Name = your variable (e.g., User::dtMax)
From that point on, the required value is available in the variable.
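For a quick way to see what the Single row / Result Name 0 settings do, here is a Python sketch that uses the standard-library `sqlite3` module in place of SQL Server. The `SourceA` table, its sample rows, and storing dates as ISO-8601 text are assumptions for illustration; `fetchone()` corresponds to the single-row result set and index `0` to Result Name 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical SourceA with dates stored as ISO-8601 text (sorts correctly as text)
conn.execute("CREATE TABLE SourceA (ColumnA TEXT, DateValue TEXT)")
conn.executemany(
    "INSERT INTO SourceA VALUES (?, ?)",
    [("a", "2016-01-15"), ("b", "2016-07-02"), ("c", "2016-03-30")],
)

# ResultSet = Single row: the aggregate query returns exactly one row ...
row = conn.execute("SELECT MAX(DateValue) FROM SourceA").fetchone()
# ... and Result Name = 0 maps its first column into the variable
max_date = row[0]
print(max_date)
```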

Related

How does the question mark function in the SQL query?

I'm attempting to edit an ETL package (SSIS) that queries a SQL table and outputs CSV files for every StationID, and I'm having trouble understanding how the question mark is being used in the query definition below. I understand ? is used as a parameter, but I don't understand how it's used in the date function below:
SELECT TimeSeriesIdentifier, StationID, ParameterID FROM dbo.EtlView WHERE
LastModified > DATEADD(hour, ?*-1, GETDATE())
AND StationID LIKE
CASE WHEN ? = 0 THEN
StationID
ELSE
?
END
The parameterization available in SSIS depends on the connection manager used.
OLE DB and ODBC connection managers use ? as the parameter placeholder, whereas ADO.NET uses a named parameter, @myVariable.
OLE DB starts counting at 0, whereas ODBC uses 1-based counting. Both, however, are ordinal systems: even though the two ? in your CASE expression are meant to carry the same value, each ? is a separate parameter. You'll therefore have to list that SSIS variable twice in the parameter mapping dialog, i.e. (variable, parameter name) => @HoursBack, 0; @MyVar, 1; and @MyVar, 2.
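Python's `sqlite3` module also uses positional `?` placeholders, so it can illustrate why a value that appears twice in the query has to be supplied twice. This is a minimal sketch, not SSIS: the `EtlView` table and its `AgeHours` column are hypothetical (the integer age replaces the `DATEADD`/`GETDATE()` test), and the station filter uses `=` rather than `LIKE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for dbo.EtlView
conn.execute("CREATE TABLE EtlView (StationID INTEGER, AgeHours INTEGER)")
conn.executemany("INSERT INTO EtlView VALUES (?, ?)", [(1, 2), (2, 5), (3, 30)])

SQL = (
    "SELECT StationID FROM EtlView "
    "WHERE AgeHours < ? "                               # position 0: hours back
    "AND StationID = CASE WHEN ? = 0 THEN StationID "   # position 1: station filter
    "ELSE ? END "                                       # position 2: station filter again
    "ORDER BY StationID"
)


def fetch(hours_back, station):
    # Placeholders are matched by position only, so the station value goes in twice
    return [row[0] for row in conn.execute(SQL, (hours_back, station, station))]


print(fetch(24, 0))  # 0 means "all stations" changed within 24 hours
print(fetch(24, 2))  # only station 2
```

Supplying only two values for the three placeholders makes `sqlite3` raise `ProgrammingError`, the same kind of mismatch you get in SSIS when a placeholder has no mapping.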
A "dumb trick" I would employ if I had to deal with repeated ordinal based parameters or if I was troubleshooting packages is to make the supplied query use local variables in the query itself.
DECLARE
    @HoursBack int = ?
,   @MyVariable int = ?;
SELECT
    TimeSeriesIdentifier
,   StationID
,   ParameterID
FROM
    dbo.EtlView
WHERE
    LastModified > DATEADD(HOUR, @HoursBack * -1, GETDATE())
    AND StationID LIKE
        CASE
            WHEN @MyVariable = 0 THEN StationID
            ELSE @MyVariable
        END;
Now I only have to map the SSIS variable @MyVar once into my query, because ordinary T-SQL variables take over from there. The other benefit is that I can copy and paste the query into a query tool and replace the ?s with actual values to inspect the results directly from the source. This can be helpful when the strong typing in SSIS prevents you from getting the results into a data viewer.
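The same trick can be sketched with `sqlite3`, again assuming a hypothetical `EtlView` table with an integer `AgeHours` column. SQLite has no `DECLARE`, so this version swaps in a one-row common table expression (`params`) to play the role of the local variables: each `?` is bound exactly once and then referenced by name as often as needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for dbo.EtlView (AgeHours replaces the DATEADD test)
conn.execute("CREATE TABLE EtlView (StationID INTEGER, AgeHours INTEGER)")
conn.executemany("INSERT INTO EtlView VALUES (?, ?)", [(1, 2), (2, 5), (3, 30)])

# The one-row CTE acts like DECLARE: each ? is bound once, then used by name.
SQL = """
WITH params(HoursBack, MyVariable) AS (SELECT ?, ?)
SELECT v.StationID
FROM EtlView AS v, params AS p
WHERE v.AgeHours < p.HoursBack
  AND v.StationID = CASE WHEN p.MyVariable = 0 THEN v.StationID ELSE p.MyVariable END
ORDER BY v.StationID
"""


def fetch(hours_back, station):
    # Two placeholders, so exactly two values, each mapped once
    return [row[0] for row in conn.execute(SQL, (hours_back, station))]


print(fetch(24, 0))
print(fetch(24, 2))
```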
SSIS is building a parameterized query.
You can get more information about this here (MySQL-specific):
What is the question mark's significance in MySQL at "WHERE column = ?"?
Or you can get a more generally-applicable response here: What does a question mark represent in SQL queries?
At a very "nuts and bolts" level, those are parameters being passed into the SQL statement by the package. With the Execute SQL task open, click on the tab that says Parameter Mapping. There will be a list of variables that are being sent into the query, and they are consumed in the order that they're listed.
Here's a logger for an archiving package I'm working on:
The query on the General tab just writes those five values to a table:
INSERT INTO dbo.ArchiveRowCounts (
TableName,
ServerName,
ReportYear,
BaseTblCnt,
ArchiveTblCnt)
VALUES (?,?,?,?,?);

SSIS: enrich query and table with input file as base

I need to extract data from a DB2 database to SQL Server. My query has to be built from an Excel file with 176 records: for each record I need to run a query and put the results in SQL Server.
For example:
I have an Excel file with a Number, a From date, a To date, and a Country.
The query should use this information from each record:
SELECT *
FROM dbo.Test
WHERE Number = excel.Number1 AND Date BETWEEN excel.fromDate1 AND excel.toDate1 AND Country = excel.country1
And then another query with
SELECT *
FROM dbo.Test
WHERE Number = excel.Number2 AND Date BETWEEN excel.fromDate2 AND excel.toDate2 AND Country = excel.country2
Etc...
How should I do something like this in SSIS?
If needed I can put the DB2 and Excel data in MS SQL
You can proceed with the following approach:
Extract the data rows from Excel and put them into an SSIS Object variable
Use a Foreach loop to get each row from the Object variable, parsing it into separate variables
Inject the variable values into the SQL SELECT command with Expressions
Run a Data Flow Task based on that SQL command, transform the data and put it into the target
Overall, your task seems feasible, but it requires some knowledge of parsing an Object variable in a Foreach loop and writing variable Expressions.
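Here is a minimal Python sketch of those four steps, with in-memory `sqlite3` databases standing in for DB2 and SQL Server and a list of dictionaries standing in for the Excel rows. All table names, column names, and sample values are hypothetical. Note one substitution: instead of building the SQL text with Expressions (step 3), the sketch binds the loop values to `?` parameters, which also avoids quoting problems with dates and strings.

```python
import sqlite3

# Step 1 stand-in: rows read from Excel (in SSIS they would sit in an Object variable)
excel_rows = [
    {"Number": 1, "FromDate": "2016-01-01", "ToDate": "2016-01-31", "Country": "NL"},
    {"Number": 2, "FromDate": "2016-02-01", "ToDate": "2016-02-29", "Country": "BE"},
]

source = sqlite3.connect(":memory:")  # stands in for the DB2 source
source.execute("CREATE TABLE Test (Number INTEGER, Date TEXT, Country TEXT, Amount REAL)")
source.executemany("INSERT INTO Test VALUES (?, ?, ?, ?)", [
    (1, "2016-01-10", "NL", 10.0),
    (1, "2016-03-10", "NL", 99.0),  # outside record 1's date range
    (2, "2016-02-15", "BE", 20.0),
    (2, "2016-02-15", "NL", 30.0),  # wrong country for record 2
])

target = sqlite3.connect(":memory:")  # stands in for SQL Server
target.execute("CREATE TABLE TestCopy (Number INTEGER, Date TEXT, Country TEXT, Amount REAL)")

# One parameterized statement instead of 176 hand-written queries
QUERY = "SELECT * FROM Test WHERE Number = ? AND Date BETWEEN ? AND ? AND Country = ?"

for record in excel_rows:  # step 2: Foreach loop, one iteration per Excel row
    # Step 3: the loop variables feed the query's parameters
    params = (record["Number"], record["FromDate"], record["ToDate"], record["Country"])
    rows = source.execute(QUERY, params).fetchall()
    # Step 4: the data flow writes the matching rows to the target
    target.executemany("INSERT INTO TestCopy VALUES (?, ?, ?, ?)", rows)

copied = target.execute("SELECT COUNT(*) FROM TestCopy").fetchone()[0]
print(copied)
```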

Can I set variables in an SSIS for loop based on a query?

I have a SQL query that's being executed in SSIS to load data into a CSV file that looks something like this:
SELECT *
FROM SomeTable
WHERE SomeDate BETWEEN '1-Jan-2016' AND '31-Dec-2016'
AND Param1 = 2 AND Param2 = 2
When this was written in QlikView, I used parameters like so:
SELECT *
FROM SomeTable
WHERE SomeDate BETWEEN '1-Jan-2016' AND '31-Dec-2016'
AND Param1 = $(Param1) AND Param2 = $(Param2)
Now that I'm migrating the entire task to SSIS, I'm figuring out how to get it such that Param1 and Param2 would be dynamically assigned. For example, in QlikView, I created a table that was populated by another query:
SELECT Param1, Param2
FROM ThisTable
WHERE SomeID = 1
Something like that. The selection of Param1 and Param2 from that query gets me the necessary values for $(Param1) and $(Param2) in my QlikView code.
I'm right now trying to convert my QlikView code into an SSIS package instead since SSIS is a dedicated ETL tool whereas QlikView isn't. Is what I'm doing possible? And if so, how would I go about doing it?
My idea was to wrap it all in a for loop container and have it stop after it grabs the last Param1 and Param2 from this query:
SELECT Param1, Param2
FROM ThisTable
WHERE SomeID = 1
Basically, I'm trying to avoid having to write my first select statement a thousand times over.
Thank you.
If what I'm saying doesn't make sense, please let me know so I can elaborate a bit more.
I suspect you're using an Execute SQL Task; if so, you can simply map parameters in that task.
What you'll have to do first is create an Execute SQL Task that executes this query:
SELECT Param1, Param2
FROM ThisTable
WHERE SomeID = 1;
I've mocked up the SQLStatement, but everything else should look like this (don't forget to set ResultSet to Full result set):
Then put the result set into an Object variable (just make sure Result Name is set to 0):
Now, to run your original query for each set of values gathered above, we can use a Foreach loop and iterate over our dataset. Inside this Foreach loop we'll put a Data Flow Task that uses an OLE DB source and a Flat File destination to read the data and write it to CSV files. (In a real project I'd advise using ODBC instead of OLE DB; it's faster.)
Loop properties:
Assign variables in foreach loop:
Now, in the Data Flow Task, create your data source, add the query and parameterize it like this:
In the end it should look something like this (what's highlighted in red are the inner components of the Data Flow Task):
Of course you'll have to add some logging or some other components, but this is basic and will get you moving.
You may also want to look into the SSIS tools "For Loop Container" and "Foreach Loop Container".
You fill a variable of type Object with a list of values; in my case I used a query in the Execute SQL Task [Lookup missing Orders]. The [Foreach Order Loop] container then goes through each entry in the variable and executes the [Load missing Orders] Data Flow Task:

How to control SSIS package flow based on record count returned by a query?

I'm trying to first check if there are any new records to process before I execute my package. I have a bit field called "processed" in a SQL Server 2008 R2 table that has a value of 1 if processed and 0 if not.
I want to query it thus:
select count(processed) from dbo.AR_Sale where processed = 0
If the result is 0 I want to send an e-mail saying the records are not there. If greater than zero, I want to proceed with package execution. I am new to SSIS and can't seem to figure out what tool to use for this.
My package has a data flow item with an OLE DB connection inside it to the database. The connection uses a query to return the records. Unfortunately, the query completes successfully (as it should) even if there are no records to process. Here is the query:
Select * from dbo.AR_Sale where processed = 0
I copy these records to a data warehouse and then run another query to update the source table by changing the processed field from 0 to 1.
Any help would be greatly appreciated.
One option would be to use precedence constraints in conjunction with an Execute SQL Task. Here is an example of how to achieve this in SSIS 2008 R2.
I created a simple table based on the information provided in the question.
Create table script:
CREATE TABLE dbo.AR_Sale(
Id int NOT NULL IDENTITY PRIMARY KEY,
Item varchar(30) NOT NULL,
Price numeric(10, 2) NOT NULL,
Processed bit NOT NULL
)
GO
Then I populated the new table with some sample data. You can see that one of the rows has the Processed flag set to zero.
Populate table script:
INSERT INTO dbo.AR_Sale (Item, Price, Processed) VALUES
('Item 1', 23.84, 1),
('Item 2', 72.19, 0),
('Item 3', 45.73, 1);
On the SSIS package, create the following two variables.
Processed of data type Int32
SQLFetchCount of data type String with value set to SELECT COUNT(Id) ProcessedCount FROM dbo.AR_Sale WHERE Processed = 0
In the SSIS project, create an OLE DB data source that points to the database of your choice. Add the data source to the package's connection managers. In this example, the data source is named Practice.
On the package's Control Flow tab, drag and drop Execute SQL Task from the toolbox.
Configure the General page of the Execute SQL Task as shown below:
Give a proper Name, say Check pre-execution
Change ResultSet to Single row because the query returns a scalar value
Set the Connection to the OLE DB datasource, in this example Practice
Set the SQLSourceType to Variable because we will use the query stored in the variable
Set the SourceVariable to User::SQLFetchCount
Click Result Set page on the left section
Configure the Result Set page of the Execute SQL Task as shown below:
Click Add button to add a new variable which will store the count value returned by the query
Change the Result Name to 0 to indicate the first column value returned by query
Set the Variable Name to User::Processed
Click OK
On the package's Control Flow tab, drag and drop Send Mail Task and Data Flow Task from the toolbox. The Control Flow tab should look something like this:
Right-click the green arrow that joins the Execute SQL Task and the Send Mail Task, and click Edit... (the green arrow is called a precedence constraint).
On the Precedence Constraint Editor, perform the following steps:
Set Evaluation operation to Expression
Set the Expression to @[User::Processed] == 0. This means: take this path only when the variable Processed is zero.
Click OK
Right-click on the green arrow that joins the Execute SQL task and Data Flow Task. Click Edit... On the Precedence Constraint Editor, perform the following steps:
Set Evaluation operation to Expression
Set the Expression to @[User::Processed] != 0. This means: take this path only when the variable Processed is not zero.
Click OK
Control flow tab would look like this. You can configure the Send Mail Task to send email and the Data Flow Task to update the data according to your requirements.
When I execute the package with the data from the populate table script, the package executes the Data Flow Task because there is one row that has not been processed.
When I execute the package after setting Processed flag to 1 on all the rows in the table using the script UPDATE dbo.AR_Sale SET Processed = 1, the package will execute the Send Mail Task.
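The decision logic of this package can be summarized in a short Python sketch, with `sqlite3` standing in for SQL Server and a function standing in for the package. The returned task labels are hypothetical; the Data Flow Task is reduced to reading the unprocessed rows and flagging them, as described in the question.

```python
import sqlite3


def run_package(conn):
    """Return (task that would run, number of unprocessed rows)."""
    # Execute SQL Task: ResultSet = Single row, Result Name 0 -> User::Processed
    processed = conn.execute(
        "SELECT COUNT(Id) AS ProcessedCount FROM AR_Sale WHERE Processed = 0"
    ).fetchone()[0]

    # Precedence constraint expression @[User::Processed] == 0
    if processed == 0:
        return ("Send Mail Task", processed)

    # Precedence constraint expression @[User::Processed] != 0: Data Flow Task stand-in.
    # Read the unprocessed rows (they would be copied to the warehouse here) ...
    rows = conn.execute("SELECT Id, Item, Price FROM AR_Sale WHERE Processed = 0").fetchall()
    # ... then flag them, as the question's follow-up update does.
    conn.execute("UPDATE AR_Sale SET Processed = 1 WHERE Processed = 0")
    return ("Data Flow Task", len(rows))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AR_Sale (Id INTEGER PRIMARY KEY, Item TEXT NOT NULL, "
    "Price NUMERIC NOT NULL, Processed INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO AR_Sale (Item, Price, Processed) VALUES (?, ?, ?)",
    [("Item 1", 23.84, 1), ("Item 2", 72.19, 0), ("Item 3", 45.73, 1)],
)

first = run_package(conn)   # one unprocessed row: data flow path
second = run_package(conn)  # everything processed now: mail path
print(first, second)
```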
Your SSIS design should be
Src:
Select count(processed) Cnt from dbo.AR_Sale where processed = 0
Conditional Split stage [under data flow transformations]:
output1: Order 1, Name - EmailCnt, Condition - Cnt == 0
output2: Order 2, Name - ProcessRows, Condition - Cnt > 0
Output Links:
EmailCnt link: Send email
ProcessRows link: DataFlowTask
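To make the routing rule explicit: a Conditional Split sends each row to the first output, in Order, whose condition is true, and rows that match no condition go to the default output. A minimal Python sketch of that rule, using the two outputs above (the function and the row format are illustrative, not an SSIS API):

```python
def conditional_split(rows, outputs, default="Conditional Split Default Output"):
    """Route each row to the first output (in Order) whose condition is true."""
    routed = {name: [] for name, _ in outputs}
    routed[default] = []
    for row in rows:
        for name, condition in outputs:
            if condition(row):
                routed[name].append(row)
                break  # first match wins; later outputs are not checked
        else:
            routed[default].append(row)  # no condition matched
    return routed


outputs = [
    ("EmailCnt", lambda r: r["Cnt"] == 0),    # Order 1: Cnt == 0
    ("ProcessRows", lambda r: r["Cnt"] > 0),  # Order 2: Cnt > 0
]

result = conditional_split([{"Cnt": 0}, {"Cnt": 3}], outputs)
print(result)
```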

How to automate the execution of a stored procedure with an SSIS package?

I have a stored procedure that gets executed through SSIS using an Execute SQL Task.
The task has the following:
USE [OPPY_DWUSD]
GO
DECLARE @return_value int
EXEC @return_value = [dbo].[generate_merge_scdbk]
    @Schema = N'dim',
    @Dimension = N'VARIETY',
    @ETLSchema = N'stg',
    @ETLTable = N'vw_VARIETY',
    @Execute = 1
SELECT 'Return Value' = @return_value
GO
Right now I have about 20 Execute SQL Tasks with the same code but different values.
Is there a cleaner way to pull this off?
Here is one way of doing this. The example uses SSIS 2008 R2 with SQL Server 2012 backend.
Create a table to store your parameter values. Let's say the table name is dbo.SProcValues. Based on your stored procedure definition, the table schema would look like this.
CREATE TABLE dbo.SProcValues(
Id int IDENTITY(1,1) NOT NULL,
SProcName nvarchar(40) NOT NULL,
SchemaName nvarchar(20) NOT NULL,
Dimension nvarchar(40) NOT NULL,
ETLSchema nvarchar(20) NOT NULL,
ETLTable nvarchar(40) NOT NULL,
IsExecute bit NOT NULL
)
GO
Let's insert some sample data using the following script.
INSERT INTO dbo.SProcValues
(SProcName, SchemaName, Dimension, ETLSchema, ETLTable, IsExecute) VALUES
('dbo.sp_generate_merge', 'dim1', 'dimension1', 'stg1', 'table1', 1),
('dbo.sp_generate_merge_scdbk', 'dim2', 'dimension2', 'stg2', 'table2', 1),
('dbo.sp_generate_merge_scdbk', 'dim3', 'dimension3', 'stg3', 'table3', 0),
('dbo.sp_generate_merge', 'dim4', 'dimension4', 'stg4', 'table4', 0);
GO
Assuming you already have the data source and connection manager set up in the SSIS package, create the following variables. Variable SProcValues will hold the parameter set that we stored in the table above. Variable SQLInnerQuery will hold the query used later by the inner Execute SQL Task. The other variables correspond to the columns in the table, so that as we loop through each row we can hold its values in variables.
Paste the following query in the value of the variable SQLGetParameters
SELECT SProcName, SchemaName, Dimension, ETLSchema, ETLTable, IsExecute FROM dbo.SProcValues
Select the variable SQLInnerQuery and press F4 to view the properties. Set the property EvaluateAsExpression to True and then click the Ellipsis button against the Expression property.
We need to set an expression that will evaluate to the EXEC stored procedure statement that can later be supplied to the inner Execute SQL Task. Set the following expression.
"EXEC " + @[User::SProcName] + " @Schema = ?, @Dimension = ?, @ETLSchema = ?, @ETLTable = ?, @IsExecute = ?"
If you click the Evaluate Expression button in the editor, you can see what the expression evaluates to. You will also notice that there is no stored procedure name in the screenshot below; that is because the package variable SProcName does not have a value yet. At run time, SProcName will be assigned the value from the table and the expression will resolve automatically.
On the SSIS package, drag and drop an Execute SQL Task. This task will run the query stored in SQLGetParameters to fetch the list of parameter values stored in the table dbo.SProcValues. Configure the General page of the Execute SQL Task as shown below. The example uses an OLE DB connection, and the connection manager/data source is named Practice.
Configure the Result Set page of Execute SQL Task to store the result set from the query to an object variable.
Now that the first Execute SQL Task is configured to get the list of parameter values that should be passed to the stored procedure, you need to loop through the records.
Drag and drop a Foreach Loop container. Connect the Execute SQL Task's precedence constraint to the Foreach Loop container. Configure the Collection page of the Foreach Loop container as shown below. We are looping through the result set using the ADO enumerator.
Configure the Variable Mappings page of the Foreach Loop container as shown below. As we loop through each row, we store the column values in the respective variables so we can pass them to the next Execute SQL Task, which runs the stored procedure.
Drag and drop an Execute SQL Task inside the Foreach Loop container so that this task is executed each time we loop through a row in the result set. Configure the Execute SQL Task as shown below.
NOTE
You might want to configure the ResultSet property on this second Execute SQL Task according to your requirements. If you choose a ResultSet other than None, you need to configure an appropriate variable to receive the result. I left it as None for this example.
Configure the values to be passed as parameters to the stored procedure.
Finally, the control flow would look something like this.
When the package runs, the loop executes the stored procedure once for each record returned by the SELECT query mentioned above, provided that all the stored procedures named in the table exist in the database. I had created the stored procedures dbo.sp_generate_merge_scdbk and dbo.sp_generate_merge with the same parameter definitions, which is why the package executed successfully.
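The control flow above boils down to: read the parameter rows, build one `EXEC` statement per row from the expression, and bind that row's values to the `?` placeholders. A minimal Python sketch of that loop, with `sqlite3` standing in for SQL Server and the first two sample rows from the insert script; since SQLite has no stored procedures, it only builds the statement/parameter pairs that the inner Execute SQL Task would send rather than executing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SProcValues (Id INTEGER PRIMARY KEY, SProcName TEXT NOT NULL, "
    "SchemaName TEXT NOT NULL, Dimension TEXT NOT NULL, ETLSchema TEXT NOT NULL, "
    "ETLTable TEXT NOT NULL, IsExecute INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO SProcValues (SProcName, SchemaName, Dimension, ETLSchema, ETLTable, IsExecute) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("dbo.sp_generate_merge", "dim1", "dimension1", "stg1", "table1", 1),
        ("dbo.sp_generate_merge_scdbk", "dim2", "dimension2", "stg2", "table2", 1),
    ],
)

SQL_GET_PARAMETERS = (
    "SELECT SProcName, SchemaName, Dimension, ETLSchema, ETLTable, IsExecute "
    "FROM SProcValues ORDER BY Id"
)


def build_calls(conn):
    """Return (statement, parameters) pairs, one per row in SProcValues."""
    calls = []
    # Outer Execute SQL Task (full result set) + Foreach ADO enumerator
    for sproc, schema, dimension, etl_schema, etl_table, is_execute in conn.execute(SQL_GET_PARAMETERS):
        # SQLInnerQuery expression: only the procedure name is concatenated in
        statement = (
            "EXEC " + sproc
            + " @Schema = ?, @Dimension = ?, @ETLSchema = ?, @ETLTable = ?, @IsExecute = ?"
        )
        # Parameter Mapping: ordinals 0..4 take the current row's values
        calls.append((statement, (schema, dimension, etl_schema, etl_table, is_execute)))
    return calls


for statement, params in build_calls(conn):
    print(statement, params)
```

Keeping the values as bound parameters, and concatenating only the procedure name, means the expression never has to quote or escape the data itself.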
You have the right concept, just need to use some concepts like variables, a foreach loop and parameters on the Execute SQL Task.
Control Flow
Your Control Flow would look something like this
Variables
I have 6 variables defined in SSIS
Dimension | String | VARIETY
ETLSchema | String | stg
ETLTable | String | vw_VARIETY
Execute | Int32 | 1
RecordSet | Object | System.Object
Schema | String | dim
The first Execute SQL Task will be a query or something enumerable like it. Currently I have a hard-coded query that produces the values supplied in your question. Your solution could just be a chain of SELECTs UNIONed together. The goal of this step is to populate the RecordSet variable.
My Execute SQL Task returns a full result set
and I push that into my object thusly
ForEach Loop Container (ADO Recordset)
The ForEach Loop Container is going to consume that enumerable thing we established beforehand. It will go through each row and we will pop the values out of the object and assign them into local variables.
Change the Enumerator to Foreach ADO Enumerator. Select the object we populated with results, User::RecordSet, and then use the enumeration mode Rows in the first table.
In the Variable Mappings tab, we identify the ordinal position of each value (column 0 maps to variable X). The only trick here is to make sure your SSIS variable data types match the data types in the result set from your source query. Note that it's a zero-based ordinal system.
At this point, if you click run you see it enumerate through all the rows you have sent into the RecordSet variable. I find it helpful to run it at this point to make sure I have all of my data types aligned.
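The zero-based mapping and the type check can be sketched in Python as follows. This is an illustration under assumptions, not SSIS itself: `sqlite3` evaluates a hard-coded `UNION ALL` query standing in for the first Execute SQL Task, the second row (`COLOR`/`vw_COLOR`) is invented sample data, and the `TypeError` mimics the run-time error SSIS raises when a column's type does not match the variable it is mapped to.

```python
import sqlite3

# Stand-in for the first Execute SQL Task: hard-coded SELECTs glued with UNION ALL.
# Column order: Dimension, ETLSchema, ETLTable, Execute, Schema
RECORDSET_SQL = """
SELECT 'VARIETY', 'stg', 'vw_VARIETY', 1, 'dim'
UNION ALL
SELECT 'COLOR', 'stg', 'vw_COLOR', 0, 'dim'
"""

# Variable Mappings tab: (zero-based index, SSIS variable, expected Python type)
VARIABLE_MAPPINGS = [
    (0, "User::Dimension", str),
    (1, "User::ETLSchema", str),
    (2, "User::ETLTable", str),
    (3, "User::Execute", int),
    (4, "User::Schema", str),
]


def map_row(row):
    """Copy one result-set row into 'variables' by ordinal, checking types."""
    variables = {}
    for index, name, expected in VARIABLE_MAPPINGS:
        value = row[index]
        if not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        variables[name] = value
    return variables


conn = sqlite3.connect(":memory:")
# Foreach ADO enumerator, "Rows in the first table": one iteration per row
iterations = [map_row(row) for row in conn.execute(RECORDSET_SQL)]
for variables in iterations:
    print(variables)
```

Passing a row whose fourth value is the string `"1"` instead of the integer `1` makes `map_row` raise, which is the kind of mismatch worth catching in a first test run.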
Inner Execute SQL Task
I have taken your query and replaced the hard-coded values with placeholders. An OLE DB connection uses ?, while an ADO.NET connection uses a named @varname.
In the Parameter Mapping tab, simply map those local variables to the place holders.
Now you have a nice template for running the same proc with varying values.
