I'm fairly new to SSIS and don't know all its features or which tasks I can use to do what I want. Many Google and stackoverflow.com searches have helped me get to know variables and parameters, how to set them, etc.
BLUF (Bottom Line Up Front)
I have a view with data which I want to export to a file through a job that runs the package.
The data will be filtered by its LastUpdatedDate field, which has a datatype of DateTimeOffset(7). The package should allow a user to run it with a specified date or use a value from another table (SSISJobRun).
Structure
/*Employee Table*/
Id int
Name varchar(255)
LastUpdatedDate datetimeoffset(7)
/*SSISJobRun Table*/
Id int
JobTypeId int
RunDate datetimeoffset(7)
What I have Done
Currently, I'm using SSDT for VS 2015 to create my SSIS packages and deploy them to my SSIS Catalog.
I have a project with 1 package. The package contains:
a Data Flow Task named EmployeeExport; this task contains an OLE DB Source and a Flat File Destination
a package-level parameter named Filename_Export (this is so that the file path can be changed when it's run by a user; the parameter has a default value configured within the Job that runs it daily)
All this runs perfectly fine.
Problem
I have also set another package-level parameter named LastUpdatedDate. The intent is to have whoever or whatever runs the package define a date. However, if the date is null (if I decide to use a string) or if the date is the default value 1899-12-30 00:00:00 (if I decide to use a date), I want to determine what date to use.
Specifically, if there is no real date supplied by the user, then I want the date to be the latest RunDate. For that case I use the following code:
SELECT TOP 1 RunDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY RunDate DESC
I've tried many different ways. It works when I supply a date, but I couldn't get it to work when the date was blank (using a string) or the default value (using a date).
Here are a few sources I've been looking through to figure out my issue:
How to pass SSIS variables in ODBC SQLCommand expression?
How do I pass system variable value to the SQL statement in Execute SQL task?
http://mindmajix.com/ssis/how-to-execute-stored-procedure-in-ssis-execute-sql-task
.. and many more.
One last note: this date will be used to run two tasks, so if there is a way to keep it global that would be great.
Lastly, I need the package to insert a row into the SSISJobRun table specifying when the task was run.
Thank you.
Use an Execute SQL Task, paste
SELECT TOP 1 RunDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY RunDate DESC
into the SQLStatement and set ResultSet to Single row. On the Result Set page, choose the variable you created and change the Result Name to 0.
The variable keeps that value for the rest of the execution, unless the same task runs a second time (for example inside a Foreach or For loop) or something else in the package assigns to the variable.
If you need to check, right-click the Execute SQL Task, choose Edit Breakpoints and break on the post-execute event, then run the package. Open a Watch window from the Debug menu and drag the variable into it; you should see the value.
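The fallback the question asks about (use the supplied date unless it is blank or the 1899-12-30 default, otherwise use the latest RunDate) can be sketched outside SSIS to make the rule explicit. This is only an illustration in Python, not SSIS code; the function name resolve_filter_date and the sample values are made up for the example. In the package itself the same rule would live in an expression or in the SQL.

```python
from datetime import datetime, timezone

# Default the question reports for an unset DateTime parameter.
SSIS_DEFAULT_DATE = datetime(1899, 12, 30)


def resolve_filter_date(param_value, latest_run_date):
    """Return the user-supplied date, or latest_run_date when none was given.

    param_value may be None, a string (possibly blank) or a datetime.
    """
    if param_value is None:
        return latest_run_date
    if isinstance(param_value, str):
        text = param_value.strip()
        if not text:
            return latest_run_date
        return datetime.fromisoformat(text)
    # A datetime equal to the default counts as "not supplied".
    if param_value.replace(tzinfo=None) == SSIS_DEFAULT_DATE:
        return latest_run_date
    return param_value


# Hypothetical value standing in for the result of the Execute SQL Task.
latest = datetime(2016, 5, 1, 8, 30, tzinfo=timezone.utc)

print(resolve_filter_date("", latest))
print(resolve_filter_date(datetime(1899, 12, 30), latest))
print(resolve_filter_date("2016-04-01 00:00:00", latest))
```

The first two calls fall back to the latest run date; the third keeps the date that was passed in.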
Related
Since I have a different set of tables for every SSIS load, I want to implement a smart routine that launches only the packages whose files are present.
I have a task that uploads file listing from a folder to a database table:
[dbo].[FileList]
Product.csv
Sales_2018.csv
Customer.csv
Delivery.csv
If my SSIS project, besides the Product, Sales, Customer and Delivery packages, also has Shipping, Returns and others, is it possible to disable those automatically based on a FileList match, so that only the Product, Sales, Customer and Delivery packages run?
Or should it be approached in a different way?
Thank you!
I've done this in the past using this simple control flow.
Note the 2nd for each only loops through one time. It is just a check to see if the one file exists without a script task.
A few more notes:
1. Store execute sql results into Object Variable
2. Outer foreach is on ADO Object (variable from step 1)
2a. Map the current iteration of the object to local variables
3. Inner foreach is on file based on local variable from step 2
4. Package expression is based on local variable from step 2
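The matching itself is simple. Here is a rough sketch in Python of the check the loops perform, not of the SSIS control flow. It assumes (my assumption, not something stated in the question) that a package is named after the part of the file name before the first underscore, so Sales_2018.csv belongs to the Sales package.

```python
from pathlib import PurePath


def packages_to_run(file_list, packages):
    """Return the packages that have a matching file in file_list.

    Assumes the package name is the file name up to the first underscore,
    ignoring the extension and letter case.
    """
    present = {PurePath(name).stem.split("_")[0].lower() for name in file_list}
    return [package for package in packages if package.lower() in present]


# File names from the question; the package list is taken from its text.
file_list = ["Product.csv", "Sales_2018.csv", "Customer.csv", "Delivery.csv"]
packages = ["Product", "Sales", "Customer", "Delivery", "Shipping", "Returns"]

selected = packages_to_run(file_list, packages)
print(selected)
```

Shipping and Returns drop out because no file maps to them.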
I've read through many related topics here, but can't seem to find a solution. Here's my scenario:
I have multiple identical customer databases
I use ETL to fill special tables within these databases in order to use as a source for PowerBI reports
Instead of copying (and thus maintaining) the ETLs for each customer, I want to pass the DB connection to the Jobs/Transformations dynamically
My plan is to create a text file of DB connections for each Customer:
cust1, HOST_NAME, DATABASE_NAME, USER_NAME, PASSWORD
cust2, HOST_NAME, DATABASE_NAME, USER_NAME, PASSWORD
and so on...
The Host will stay the same always.
The jobs will be started monthly using Pentaho kitchen on a Linux box.
So when I run a job for a specific customer, I want to tell the job to use the DB connection for that specific customer (e.g. cust2) from the connection file.
Any help is much appreciated.
Cheers & Thanks,
Heiko
Use parameters!
When you define a connection, you see a small S sign in a blue diamond on the right of the Database Name input box. It means that, instead of spelling the name of the database, you can put in a parameter.
The first time you do it, it's a bit challenging. So follow the procedure step by step, even if you are tempted to go straight to launch a ./kitchen.sh that reads a file containing a row per customer.
1. Parametrize your transformation.
Right-click anywhere, select Properties then Parameters, and fill in the table:
Row1 : Col1 (Parameter) = HOST_NAME, Col2 (Default value) = the host name for Cust1
Row2 : Col1 = DATABASE_NAME, Col2 = the database name for Cust1
Row3 : Col1 = PORT_NUMBER, Col2 = the port number for Cust1
Row4 : Col1 = USER_NAME, Col2 = the user name for Cust1
Row5 : Col1 = PASSWORD, Col2 = the password for Cust1
Then go to the Database connection definition (on the left panel, View tab) and in the Settings panel:
Host name: ${HOST_NAME} -- The variable name with a "${" before and a "}" after
Database name: ${DATABASE_NAME} -- Do not type the name, press Ctrl+SPACE
Port Number: ${PORT_NUMBER}
User name: ${USER_NAME}
Password: ${PASSWORD}
Test the connection. If valid try a test run.
2. Check the parameters.
When you press the run button, Spoon prompts for some Run option (If you checked the "Don't show me anymore" in the past, use the drop-down just near by the Run menu).
Change the values of the parameters for those of Cust2. And check it runs for the other customer.
Change it on the Value column and the Default value column. You'll understand the difference in a short while, for the moment check it works with both.
3. Check it in command line.
Use pan from the command line.
The syntax should look like :
./pan.sh -file=your_transfo.ktr -param=HOST_NAME:cust3_host -param=DATABASE_NAME:cust3_db....
At this point, expect a bit of trial and error, because the syntax between = and : varies slightly with the OS and the PDI version. But you should get there within 4-6 tries.
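While testing from the command line, it can help to generate the pan call for each customer from the connection file instead of typing it by hand. A minimal sketch in Python, assuming the file layout from the question (name, host, database, user, password) and the -param=NAME:value form shown above; the helper name pan_command and the sample values are made up, and the exact flag syntax may need adjusting for your OS and PDI version.

```python
import csv
import io

# Hypothetical connection file, in the layout described in the question.
CONNECTIONS = """\
cust1, HOST_NAME, cust1_db, cust1_user, cust1_pw
cust2, HOST_NAME, cust2_db, cust2_user, cust2_pw
"""


def pan_command(row, ktr="your_transfo.ktr"):
    """Build the pan.sh argument list for one row of the connection file."""
    customer, host, database, user, password = (field.strip() for field in row)
    return [
        "./pan.sh",
        f"-file={ktr}",
        f"-param=HOST_NAME:{host}",
        f"-param=DATABASE_NAME:{database}",
        f"-param=USER_NAME:{user}",
        f"-param=PASSWORD:{password}",
    ]


# One command per customer, keyed by the customer name in the first column.
commands = {
    row[0].strip(): pan_command(row)
    for row in csv.reader(io.StringIO(CONNECTIONS))
    if row  # skip blank lines
}

print(" ".join(commands["cust2"]))
```

Each value is a plain argument list, so it could be handed to subprocess.run once the command has been checked by eye.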
4. Make a job
Due to the parallel computing paradigm of PDI, you cannot use the Set variable step in a single transformation. You need to make a job with two transformations: the first reads the csv file and defines the variables with the Set variable step. The second is the transformation you just developed and tested.
Don't expect it to run on the first try. Some versions of PDI are buggy and require, for example, clearing the default value of the parameters in the transformation. The Write to log step helps here: it writes a field to the log of the calling job. Of course, you first need to put the parameters/variables in a field with the Get variable step.
In particular, do not start with the full customer list! Set the system up with 2-3 customers before.
Write the full list of customer in your csv, and run.
Make a SELECT COUNT(customer) on your final load. This is important, because you will probably want to load as many customers as possible, and so continue the process even in case of failure. This is the default behavior (to the best of my memory), so you probably won't notice a failure in the log if there is a large number of customers.
5. Install the job
In principle, it is just a ./kitchen.sh.
However, if you want to automate the load, you will have a hard time checking that nothing went wrong. So open the transformation and use the System date (fixed) of the Get System Info step and write the result with the customer data. Alternatively, you can get this date in the main job and pass it along with the other variables.
If you have concerns about creating a new column in the database, store the list of customers loaded by day in another table or in a file, or have it sent to you by mail. From my experience, it's the only practical way to be able to answer a user who claims that their biggest customer was not loaded three weeks ago.
I run a similar scenario daily in my work. What we do is we use Batch files with named parameters for each client, this way we have the same package KJB/KTR's that run for a different client based on these parameters entirely.
What you want to do is set variables on a master job, that are used throughout the entire execution.
As for your question directly: in the connection creation tab you can use those variables in Host and DBname. Personally, we have the same user/pw set on every client DB, so we don't have to change those or pass user/pw as variables for each connection; we only send the host name and database with the named parameters. We also have a fixed scheduled run that executes the routine for every database; for this we use an "Execute for each input row" type job.
I have an SSIS package that has three Data Flow Tasks.
1st dataflow loads data to destination table1
2nd dataflow loads data to destination table2
3rd dataflow loads data to destination table3
I configured the default logging to a SQL Server table (ssiserrorlog) on error.
However, this only has the start date and end date details; I want to log the details to a custom SQL Server error log table like the one below.
How do I do this? I am new to SSIS.
You can use the Row Count component in each data flow to get the number of rows loaded.
"Duration" is just the DateDiff between Start and End Date. You could even make it a computed column in your log table, if you're not content to just calculate it at query time.
Use a Row Count transformation in each Data Flow Task before loading into the destination, and then use this value in the SSIS logging table.
As for duration, since you know the start time and end time, use the DateDiff function.
In SSIS, you will have system variables for start and end time. Use the system variables to capture the start and end time.
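To make the shape of one custom log row concrete, here is a small Python sketch. The field names are hypothetical (the target table in the question is not reproduced here), and the duration is plain elapsed seconds, which is roughly what a DateDiff in seconds between the start and end time would give.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogRow:
    """One custom log entry per Data Flow Task (hypothetical column names)."""

    task_name: str
    rows_loaded: int  # value captured by the Row Count transformation
    start_time: datetime
    end_time: datetime

    @property
    def duration_seconds(self) -> int:
        # Elapsed whole seconds between start and end.
        return int((self.end_time - self.start_time).total_seconds())


row = LogRow(
    task_name="Load table1",
    rows_loaded=1250,
    start_time=datetime(2020, 1, 1, 2, 0, 0),
    end_time=datetime(2020, 1, 1, 2, 1, 30),
)
print(row.task_name, row.rows_loaded, row.duration_seconds)
```

Because the duration is derived from the two timestamps, it never has to be stored separately; the computed-column suggestion above is the same idea.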
A correct solution for the row count and duration has been suggested. However, there are known datatype issues when using the System::StartTime variable to transfer the package or event start time from SSIS to custom log tables in SQL Server.
You will have to create a user variable (e.g. User::StartTime) and likely create an expression (depending on what is being used for the end time) in order to solve this aspect of the problem.
https://sqljunkieshare.com/2011/12/09/ssis-package-logging-custom-logging-2008-r2-and-2012/
My SSIS package has an execute SQL task which has a query that needs a datetime filter at runtime.
The value of this filter is supposed to be the last datetime in which the package ran successfully.
What is the standard/optimal methodology to retrieve, persist and use this lastrun datetime?
For that kind of thing, I have a "config" table in the database to store the value. Then this can be read and updated each time the package runs. You could also use a text file, but that is not as secure.
Edit:
I achieve this by invoking a SQL Task at the end of the Package that calls a stored procedure. This SP accepts a bit parameter indicating success (1) or failure (0). The SP uses GetDate() to record the time that the Proc ran (which is when the Package finishes).
As DeanOC posted, I always have a step in my package that stores this kind of stuff. It can be as simple as an "insert ... select current timestamp" kind of thing, or it may be the max of a timestamp column in the data I'm processing, so that on the next run I can filter by ... > StoredMaxTimestamp.
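The read-then-update cycle around such a config table can be sketched as follows. This is only an illustration in Python, using an in-memory SQLite database as a stand-in for the real one; the table name PackageConfig, its columns and the package name are invented for the example. The point it shows is that the stored value only moves forward when the run succeeded.

```python
import sqlite3


def create_config_table(conn):
    # Hypothetical config table: one row per package.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS PackageConfig ("
        " PackageName TEXT PRIMARY KEY,"
        " LastSuccessfulRun TEXT NOT NULL)"
    )


def get_last_run(conn, package_name):
    """Return the stored last successful run time, or None on the first run."""
    row = conn.execute(
        "SELECT LastSuccessfulRun FROM PackageConfig WHERE PackageName = ?",
        (package_name,),
    ).fetchone()
    return row[0] if row else None


def record_run(conn, package_name, succeeded, finished_at):
    """Store finished_at as the new watermark, but only for successful runs."""
    if not succeeded:
        return
    conn.execute(
        "INSERT OR REPLACE INTO PackageConfig (PackageName, LastSuccessfulRun)"
        " VALUES (?, ?)",
        (package_name, finished_at),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_config_table(conn)

first = get_last_run(conn, "MyPackage")  # nothing stored yet
record_run(conn, "MyPackage", True, "2020-01-01T02:00:00")
record_run(conn, "MyPackage", False, "2020-01-02T02:00:00")  # failed run, ignored
last = get_last_run(conn, "MyPackage")
print(first, last)
```

In the package, the read would happen in a task at the start and the write in the final task, as described above.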
I am working with SSIS 2008. I have a select query named sqlquery1 that returns some rows:
aq
dr
tb
This query is not implemented in the SSIS package at the moment.
I am calling a stored procedure from an OLE DB Source within a Data Flow Task. I would like to pass the data obtained from the query to the stored procedure parameter.
Example:
I would like to call the stored procedure by passing the first value aq
storedProcedure1 'aq'
then pass the second value dr
storedProcedure1 'dr'
I guess it would be something like a cycle. I need this because the data generated by the OLE DB Source through the stored procedure needs to be sent to another destination and this must be done for each record of the sqlquery1.
I would like to know how to call the query sqlquery1 and pass its output to call another stored procedure.
How do I need to do this in SSIS?
Conceptually, your solution will execute your source query to generate a result set and store that in a variable. You'll then iterate through those results and, for each row, call your stored procedure with that row's value and send the results into a new Excel file.
I'd envision your package looking something like this
An Execute SQL Task, named "SQL Load Recordset", attached to a Foreach Loop Container, named "FELC Shred Recordset". Nested inside there I have a File System Task, named "FST Copy Template", which precedes a Data Flow Task, named "DFT Generate Output".
Set up
As you're a beginner, I'm going to try and explain in detail. To save yourself some hassle, grab a copy of BIDSHelper. It's a free, open source tool that improves the design experience in BIDS/SSDT.
Variables
Click on the background of your Control Flow. With nothing selected, right-click and select Variables. In the new window that pops up, click the button that creates a New Variable 4 times. The reason for clicking on nothing is that until SQL Server 2012, the default behaviour of variable creation is to create them at the scope of the current object. This has resulted in many lost hairs for new and experienced developers alike. Variable names are case sensitive, so be aware of that as well.
Rename Variable to RecordSet. Change the data type from Int32 to Object.
Rename Variable1 to ParameterValue. Change the data type from Int32 to String.
Rename Variable2 to TemplateFile. Change the data type from Int32 to String. Set the value to the path of your template Excel file. I used C:\ssisdata\ShredRecordset.xlsx
Rename Variable3 to OutputFileName. Change the data type from Int32 to String. Here we're going to do something slightly advanced. Click on the variable and hit F4 to bring up the Properties window. Change the value of EvaluateAsExpression to True. In Expression, set it to "C:\\ssisdata\\ShredRecordset." + @[User::ParameterValue] + ".xlsx" (or whatever your file and path are). This configures the variable to change as the value of ParameterValue changes, which helps ensure we get a unique file name. You're welcome to change the naming convention as needed. Note that you need to escape the \ any time you are in an expression.
Connection Managers
I have made the assumption you are using an OLE DB connection manager. Mine is named FOO. If you are using ADO.NET the concepts will be similar but there will be nuances pertaining to parameters and such.
You will also need a second Connection Manager to handle Excel. If SSIS is temperamental about data types, Excel is flat out psychotic-stab-you-in-the-back-with-a-fork-while-you're-sleeping about data types. We're going to wait and let the data flow actually create this Connection Manager to ensure our types are good.
Source Query to Result Set
The SQL Load Recordset is an instance of the Execute SQL Task. Here I have a simple query to mimic your source.
SELECT 'aq' AS parameterValue
UNION ALL SELECT 'dr'
UNION ALL SELECT 'tb'
What's important to note on the General tab is that I have switched my ResultSet from None to Full result set. Doing this makes the Result Set tab go from being greyed out to usable.
On the Result Set tab, I have assigned the Variable Name to the variable we created above (User::RecordSet) and the Result Name is 0. That is important, as the default value, NewResultName, doesn't work.
FELC Shred Recordset
Grab a Foreach Loop Container and we will use that to "shred" the results that were generated in the preceding step.
Configure the enumerator as a Foreach ADO Enumerator. Use User::RecordSet as your ADO object source variable. Select "Rows in the first table" as your Enumeration mode.
On the Variable Mappings tab, you will need to select your variable User::ParameterValue and assign it the Index of 0. This will result in the zeroth element in your recordset object being assigned to the variable ParameterValue. It is important that you have data type agreement, as SSIS won't do implicit conversions here.
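If the shredding is hard to picture, this is roughly what the loop does on each pass, written as a Python sketch rather than SSIS. The list record_set is a stand-in for the User::RecordSet object, and output_file_name mirrors the OutputFileName expression defined in the Variables section; both names exist only in this example.

```python
# Stand-in for the User::RecordSet object filled by "SQL Load Recordset".
record_set = [("aq",), ("dr",), ("tb",)]


def output_file_name(parameter_value: str) -> str:
    """Mirror of the OutputFileName expression from the Variables section."""
    return "C:\\ssisdata\\ShredRecordset." + parameter_value + ".xlsx"


output_files = []
for row in record_set:
    # Index 0 of the current row is mapped to User::ParameterValue.
    parameter_value = row[0]
    # OutputFileName is re-evaluated because it depends on ParameterValue.
    output_files.append(output_file_name(parameter_value))

for name in output_files:
    print(name)
```

Each pass gets its own file name, which is what keeps one iteration from overwriting the output of another.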
FST Copy Template
This is a File System Task. We are going to copy our template Excel file so that we have a well-named output file (one with the parameter value in its name). Configure it as
IsDestinationPathVariable: True
DestinationVariable: User::OutputFileName
OverwriteDestination: True
Operation: Copy File
IsSourcePathVariable: True
SourceVariable: User::TemplateFile
DFT Generate Output
This is a Data Flow Task. I'm assuming you're just dumping results straight to a file so we'll just need an OLE DB Source and an Excel Destination
OLEDB dbo_storedProcedure1
This is where your data is pulled from your source system with the parameter we shredded in the Control Flow. I am going to write my query in here and use the ? to indicate it has a parameter.
Change your Data access mode to "SQL Command" and in the SQL command text that is available, put your query
EXECUTE dbo.storedProcedure1 ?
I click the Parameters... button and fill it out as shown
Parameters: @parameterValue
Variables: User::ParameterValue
Param direction: Input
Connect an Excel Destination to the OLE DB Source. Double-click it and, in the Excel Connection Manager section, click New... Determine whether you need the 2003 or 2007 format (.xls vs .xlsx) and whether you want your file to have header rows. For your File Path, put in the same value you used for your User::TemplateFile variable and click OK.
We now need to populate the name of the Excel Sheet. Click that New... button and it may bark that there is not sufficient information about mapping data types. Don't worry, that's semi-standard. It will then pop up a table definition something like
CREATE TABLE `Excel Destination` (
`name` NVARCHAR(35),
`number` INT,
`type` NVARCHAR(3),
`low` INT,
`high` INT,
`status` INT
)
The "table" name is going to be the worksheet name, or precisely, the named data set in the worksheet. I made mine Sheet1 and clicked OK. Now that the sheet exists, select it in the drop down. I went with the Sheet1$ as the target sheet name. Not sure if it makes a difference.
Click the Mappings tab and things should auto-map just fine so click OK.
Finally
At this point, if we ran the package it would overwrite the template file every time. The secret is we need to tell that Excel Connection Manager we just made that it needs to not have a hard coded name.
Click once on the Excel Connection Manager in the Connection Managers tab. In the Properties window, find the Expressions section and click the ellipsis (...). Here we will configure the Property ExcelFilePath, and the Expression we will use is
@[User::OutputFileName]
If your icons and such look different, that's to be expected. This was documented using SSIS 2012. Your work flow will be the same in 2005 and 2008/2008R2 just the skin is different.
If you run this package and it doesn't even start, and there is an error about ACE 12 or Jet 4.0 not being available, then you are on a 64-bit machine and need to tell BIDS/SSDT that you want to run in 32-bit mode.
Ensure the Run64BitRuntime value is False. This project setting can be found by right clicking on the project, expand the Configuration Properties and it will be an option under Debugging.
Further reading
A different example of shredding a recordset object can be found on How to automate the execution of a stored procedure with an SSIS package?