Custom Logging to SQL Server table in SSIS - sql-server

I have an SSIS package that has three Data Flow Tasks:
The 1st data flow loads data to destination table1.
The 2nd data flow loads data to destination table2.
The 3rd data flow loads data to destination table3.
I configured the default logging to a SQL Server table (ssiserrorlog) on error, but that only captures start date and end date details. I want to log the details to a custom SQL Server error log table like the one below.
How do I do this? I am new to SSIS.

You can use the Row Count component in each data flow to get the number of rows loaded.
"Duration" is just the DateDiff between Start and End Date. You could even make it a computed column in your log table, if you're not content to just calculate it at query time.

Use a Row Count Transformation in each of the data flow tasks before loading into the destination, and then write that value to the SSIS logging table.
As for the duration, since you know the start time and end time, use the DATEDIFF function, as sketched below.
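For example (a standalone sketch; the variables stand in for whatever start and end values the package captured):
DECLARE @StartTime datetime = '2024-01-01T02:00:00',
        @EndTime   datetime = '2024-01-01T02:03:30';
-- Duration in seconds between package start and end
SELECT DATEDIFF(SECOND, @StartTime, @EndTime) AS DurationSeconds;  -- 210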

In SSIS you have system variables such as System::StartTime. Use it to capture the start time, and capture the end time yourself when the package finishes.

A correct solution for the row count and duration has been suggested; however, there are known data type issues when using the System::StartTime variable to write package or event start times from SSIS to SQL Server.
You will have to create a user variable (e.g. User::StartTime) and likely an expression (depending on what is being used for the end time) in order to solve this aspect of the problem.
https://sqljunkieshare.com/2011/12/09/ssis-package-logging-custom-logging-2008-r2-and-2012/
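Tying the suggestions together, one hedged sketch: set User::StartTime with an expression such as (DT_DBTIMESTAMP)@[System::StartTime], capture an end time the same way, and map those variables plus the row counts to the ? placeholders of an Execute SQL Task that runs an INSERT against the custom log table (the table sketched earlier is assumed; depending on the connection type, the parameter markers and data type mapping may need adjusting):
-- Each ? is bound on the Execute SQL Task's Parameter Mapping page to the
-- matching user variable (package name, table name, row count, start, end).
INSERT INTO dbo.SSISCustomLog (PackageName, TableName, RowsLoaded, StartDate, EndDate)
VALUES (?, ?, ?, ?, ?);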

Related

SSIS User Defined Date or Default

I'm fairly new to SSIS and don't know all its features or which tasks to use for what I want. Many Google and stackoverflow.com searches have helped me get to know variables and parameters and how to set them.
BLUF (Bottom Line Up Front)
I have a view with data which I want to export to a file through a job that runs the package.
The data will be filtered by its LastUpdatedDate field, which has the data type datetimeoffset(7). The package should allow a user to run it with a specified date, or else use a value from another table (SSISJobRun).
Structure
/*Employee Table*/
Id int
Name varchar(255)
LastUpdatedDate datetimeoffset(7)
/*SSISJobRun Table*/
Id int
JobTypeId int
RunDate datetimeoffset(7)
What I have Done
Currently, I'm using SSDT for VS 2015 to create my SSIS packages and deploy them to my SSIS Catalog.
I have a project with 1 package. The package contains:
a Data Flow Task named EmployeeExport; this task contains an OLE DB Source and a Flat File Destination
a package level parameter named Filename_Export (this is so that the file path can be changed when it's run by a user; the parameter has a default value configured within the job that runs it daily)
All this runs perfectly fine.
Problem
I have also set another package level parameter named LastUpdatedDate. The intent is to let whoever (or whatever) runs the package define a date. However, if the date is null (if I decide to use a string) or is the default value 1899-12-30 00:00:00 (if I decide to use a date), I want to determine what date to use.
Specifically, if no real date is supplied by the user, I want the date to be the latest RunDate. For that case I use the following code:
SELECT TOP 1 LastUpdatedDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY LastUpdatedDate DESC
I've tried many different ways, and it works when I supply a date, but I couldn't get it to work when the date was blank (when I used a string) or the default (when I used a date).
Here's a few sources I've been looking through to figure out my issue
How to pass SSIS variables in ODBC SQLCommand expression?
How do I pass system variable value to the SQL statement in Execute SQL task?
http://mindmajix.com/ssis/how-to-execute-stored-procedure-in-ssis-execute-sql-task
.. and many more.
One last note: this date will be used to run two tasks, so if there is a way to keep it global, that would be great.
Lastly, I need the package to insert a row into the SSISJobRun table specifying when the task was run.
Thank you.
Use an Execute SQL Task, paste
SELECT TOP 1 LastUpdatedDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY LastUpdatedDate DESC
into the statement, and set the result set to Single row; on the Result Set page, choose the variable you set and change the result name to 0.
Unless the same task runs a second time within the same execution (inside a Foreach or For Loop container) or the variable is reassigned somewhere else in the package, the variable will keep that value.
If you need to check, right-click that Execute SQL Task, choose Edit Breakpoints and break on post-execution, then run the package, open the Watch window from the Debug menu, and drag the variable into the Watch window; you should see the value.
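A hedged sketch of folding the fallback logic into that same statement (the ? would be bound to the LastUpdatedDate parameter; RunDate comes from the SSISJobRun structure in the question):
DECLARE @Supplied datetimeoffset(7) = ?;
-- Treat NULL or the SSIS default date as "no date supplied"
-- and fall back to the most recent recorded run.
SELECT TOP 1 COALESCE(NULLIF(@Supplied, '1899-12-30 00:00:00'), RunDate) AS FilterDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY RunDate DESC;
For the last requirement, a final Execute SQL Task could run something like INSERT INTO SSISJobRun (JobTypeId, RunDate) VALUES (1, SYSDATETIMEOFFSET()); to record the run.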

How to run SSIS packages dynamically?

We have a large production MSSQL database (mdf approx. 400 GB) and I have a test database. All the tables, indexes, views, etc. are identical in each. I need to make sure that the data in the tables of these two databases stays consistent, so I need to insert all the new rows and update all the updated rows into the test DB from production every night.
I came up with the idea of using SSIS packages to keep the data consistent by checking for updated rows and new rows in all the tables. My SSIS flow is:
I have separate packages in SSIS for each table. In order, each one does the following:
1. I get the timestamp value in the table, in order to fetch only the last day's rows instead of the whole table.
2. I get the rows of the table in production.
3. Then I use the Lookup tool to compare this data with the test database table data.
4. Then I use a Conditional Split to determine whether each row is new or updated.
5_1. If the row is new, I insert it into the destination.
5_2. If the row is updated, I update it in the destination table.
The data flow is in the MTRule and STBranch package in the picture.
The problem is that I'm repeating this whole flow for each table, and I have more than 300 tables like this. It takes hours and hours. :(
What I'm asking is:
Is there any way in SSIS to do this dynamically ?
PS: Every single table has its own columns and PK values, but my data flow schema is always the same (below).
You can look into BimlScript, which lets you create packages dynamically based on metadata.
I believe the best way to achieve this is to use expressions. They empower you to dynamically set the source and destination.
One possible solution might be as follows:
create a table which stores all your table names and PK columns (see the sketch after this list)
define a package which loops through this table and builds a SQL statement
call your main package and pass the statement to it
use the statement as the data source for your data flow
if applicable, pass the destination table as a parameter as well (another column in your config table)
This is how I processed several really huge tables: the data had to be fetched from 20 tables and moved to one single table.
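A hedged sketch of that metadata table (all names are hypothetical):
CREATE TABLE dbo.TableConfig (
    TableName        sysname NOT NULL,  -- source table to copy
    PKColumn         sysname NOT NULL,  -- key used for lookups/updates
    DestinationTable sysname NOT NULL   -- target of the data flow
);
-- A Foreach Loop reads one row at a time, and an SSIS expression builds
-- the source statement, e.g.:
-- "SELECT * FROM " + @[User::TableName] + " WHERE LastUpdated > ?"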
Why do you need to use SSIS?
You are better off writing a stored procedure that takes the table name as a parameter and doing your CRUD there. Then call the stored procedure from a Foreach Loop container in SSIS.
In fact, you might be able to do everything with a stored procedure and schedule it in a SQL Agent job.
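A hedged sketch of such a procedure using dynamic SQL; the database names are hypothetical, it assumes every table has an Id key, and a real version would read the PK column from a config table and handle updates as well:
CREATE PROCEDURE dbo.SyncTable
    @TableName sysname
AS
BEGIN
    -- QUOTENAME guards the identifier supplied by the loop
    DECLARE @sql nvarchar(max) = N'
        INSERT INTO TestDb.dbo.' + QUOTENAME(@TableName) + N'
        SELECT s.*
        FROM ProdDb.dbo.' + QUOTENAME(@TableName) + N' AS s
        WHERE s.Id NOT IN (SELECT Id FROM TestDb.dbo.' + QUOTENAME(@TableName) + N');';
    EXEC sys.sp_executesql @sql;
END;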

How can I minimize validation intervals when changing the SQL in ADO NET Source Tasks

Part of an SSIS package is the data import from an external database via a SQL command embedded in an ADO.NET Source Data Flow component. Whenever I make even the slightest adjustment to the query (such as changing a column name), it takes ages (in this case 1-2 hours) until the program has finished validation. The query itself returns around 30,000 rows with 20 columns each.
Is there any way to cut these long intervals or is this something I have to live with?
I usually store the source queries in a table, and the first part of my package executes a select and stores the query returned from the table in a package variable, which is then used by the ADO.NET Source. So in my package, for the default value of the variable, I use the query that is stored in the database with a "where 1=2" appended at the end. Hence during design time it does execute the query but just returns the column metadata. Let me know if you have any questions.
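A hedged sketch of that pattern (the table, column, and query names are hypothetical):
-- First Execute SQL Task: fetch the real query text into a package variable
SELECT QueryText
FROM dbo.SourceQuery
WHERE QueryName = 'ExternalImport';

-- Design-time default value of that variable: the same query shape with an
-- impossible predicate, so validation only has to read column metadata:
-- SELECT Col1, Col2 FROM dbo.ExternalTable WHERE 1 = 2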

SSIS Package last run date in a variable

My SSIS package has an execute SQL task which has a query that needs a datetime filter at runtime.
The value of this filter is supposed to be the last datetime in which the package ran successfully.
What is the standard/optimal methodology to retrieve, persist and use this lastrun datetime?
For that kind of thing, I have a "config" table in the database to store the value. Then this can be read and updated each time the package runs. You could also use a text file, but that is not as secure.
Edit:
I achieve this by invoking a SQL Task at the end of the Package that calls a stored procedure. This SP accepts a bit parameter indicating success (1) or failure (0). The SP uses GetDate() to record the time that the Proc ran (which is when the Package finishes).
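A hedged sketch of such a procedure (the table and column names are hypothetical):
CREATE PROCEDURE dbo.SetLastRun
    @Succeeded bit
AS
BEGIN
    -- Only advance the high-water mark when the package succeeded
    IF @Succeeded = 1
        UPDATE dbo.PackageConfig
        SET LastSuccessfulRun = GETDATE()
        WHERE PackageName = 'MyPackage';
END;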
As DeanOC posted, I always have a step in my package that stores this kind of thing. It can be as simple as inserting the current timestamp, or it may be the max of a timestamp column in the data I'm processing, so that on the next run I can filter by ... > StoredMaxTimestamp.

How do you get an SSIS package to only insert new records when copying data between servers

I am copying some user data from one SqlServer to another. Call them Alpha and Beta. The SSIS package runs on Beta and it gets the rows on Alpha that meet a certain condition. The package then adds the rows to Beta's table. Pretty simple and that works great.
The problem is that I only want to add new rows into Beta. Normally I would just do something simple like....
INSERT INTO BetaPeople
SELECT * From AlphaPeople
where ID NOT IN (SELECT ID FROM BetaPeople)
But this doesn't work in an SSIS package. At least I don't know how and that is the point of this question. How would one go about doing this across servers?
Your example seems simple; it looks like you are only adding new people, not looking for changed data in existing records. In this case, store the last ID in the DB.
CREATE TABLE dbo.LAST (RW int, LastID Int)
go
INSERT INTO dbo.LAST (RW, LastID) VALUES (1,0)
Now you can use this to record the last ID of the rows transferred.
UPDATE dbo.LAST SET LastID = @myLastID WHERE RW = 1
When selecting OLEDB source, set data access mode to SQL Command and use
DECLARE @Last int
SET @Last = (SELECT LastID FROM dbo.LAST WHERE RW = 1)
SELECT * FROM AlphaPeople WHERE ID > @Last;
Note, I do assume that you are using ID int IDENTITY for your PK.
If you have to monitor for data changes in existing records, then add a "last changed" column to every table and store the time of the last transfer.
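For example, with a hypothetical LastChanged column on the source table and a hypothetical LastTransfer column added to the dbo.LAST table above:
DECLARE @LastTransferTime datetime;
SELECT @LastTransferTime = LastTransfer FROM dbo.LAST WHERE RW = 1;
-- Pull only rows touched since the previous transfer
SELECT * FROM AlphaPeople WHERE LastChanged > @LastTransferTime;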
A different technique would involve setting up a linked server on Beta pointing to Alpha and running your example without using SSIS. I would expect this to be much slower and more resource-intensive than the SSIS solution.
INSERT INTO dbo.BetaPeople
SELECT * FROM [Alpha].[myDB].[dbo].[AlphaPeople]
WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)
Add a Lookup between your source and destination.
Right-click the Lookup box to open the Lookup Transformation Editor.
Choose [Redirect rows to no match output].
Open Columns and map your key columns.
Add an entry with the table key as the lookup column, leaving the lookup operation as <ignore>.
Connect the Lookup box to the destination, choosing [Lookup No Match Output].
The simplest method I have used is as follows:
Query Alpha in a source task in a data flow and bring the records into the data flow.
Perform any needed transformations.
Before writing to the destination (Beta), perform a Lookup matching the ID column from Alpha to those in Beta. On the first page of the Lookup Transformation Editor, make sure you select "Redirect rows to no match output" from the dropdown list "Specify how to handle rows with no matching entries".
Link the Lookup task to the Destination. This will give you a prompt where you can specify that it is the unmatched rows that you want to insert.
This is the classical delta detection issue. The best solution is to use Change Data Capture, with or without SSIS. If what you are looking for is a one-off activity, there is no need to go for SSIS; use other means, such as a linked server, and compare with the existing records.
The following should solve the issue of loading changed and new records using SSIS:
Extract data from the source using a data flow.
Extract data from the target.
Match on the primary key, and split the rows into matched and unmatched records from the source and matched records from the target; call them Matched_Source, Unmatched_Source and Matched_Target.
Compare Matched_Source and Matched_Target, and split Matched_Source into Changed and Unchanged.
Null-load (truncate) a TempChanged table.
Add the Changed records to TempChanged.
Execute a SQL script/stored proc to delete records from the target whose primary key is in TempChanged, and add the records in TempChanged to the target (sketched below).
Add Unmatched_Source to the target.
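A hedged sketch of that final delete-and-insert step; the target table name and key column are hypothetical:
-- Remove the old versions of the changed rows...
DELETE t
FROM TargetTable AS t
JOIN TempChanged AS c ON c.Id = t.Id;
-- ...then add the new versions back in
INSERT INTO TargetTable
SELECT * FROM TempChanged;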
Another solution would be to use a temporary table.
In the properties for Beta's connection manager, change RetainSameConnection to true (by default SSIS runs each query in its own connection, which would mean the temporary table is dropped as soon as the task that created it completes).
Create a SQL Task using Beta's connection and use the following SQL to create your temporary table:
SELECT TOP 0 *
INTO ##beta_temp
FROM Beta
Next, create a data flow that pulls data from Alpha and loads it into ##beta_temp (you will need to run the SQL statement above in SSMS first so that Visual Studio can see the table at design time, and you will also need to set the DelayValidation property to true on the Data Flow Task).
Now you have two tables on the same server and you can just use your example SQL modified to use the temporary table.
INSERT INTO Beta
SELECT * FROM ##beta_temp
WHERE ID NOT IN (SELECT ID FROM Beta)
