SSIS : delete rows after an update or insert - sql-server

Here is the following situation:
I have a table of StudentsA which needs to be synchronized with another table, on a different server, StudentsB. It's a one-way sync from A to B.
Since the table StudentsA can hold a large number of rows, we have a table called StudentsSync (on the input server) containing the ID of StudentsA which have been modified since the last copy from StudentsA to StudentsB.
I made the following SSIS Data Flow task:
The only problem is that I need to DELETE the row from StudentsSync after a successful copy or update. Something like this:
Any idea how this can be achieved?

It can be achieved using 3 methods
1.If your target table in OutputDB has TimeStamp columns such as Create and modified TimeStamp then rows which have got updated or inserted can be obtained by writing a simple query. You need to write the below query in the execte sql task in Control Flow to delete those rows in Sync Table .
Delete from SyncTable
where keyColumn in (Select primary_key from target
where ModifiedTimeStamp >= GETDATE() or (ModifiedTimeStamp is null
and CreateTimeStamp>=GETDATE()))
I assume StudentsA's primary key is present in Sync table along with primary key of Target table. The above condition basically checks, if a new row is added then CreateTimeStamp column will have current date and modifiedTimeStamp will be null else if the values are updated then the modifiedTimeStamp will have current date
The above query will work if you have TimeStamp columns in your target table which i feel should be there if your loading data into Data Warehouse
2.You can use MERGE syntax to perform the update and insert in Control Flow with Execute SQL Task.No need to use Data Flow Task .The below query can be used even if you don't have any TimeStamp columns
DECLARE #Output TABLE ( ActionType VARCHAR(20), SourcePrimaryKey INT)
MERGE StudentsB AS TARGET
USING StudentsA AS SOURCE
ON (TARGET.CommonColumn = SOURCE.CommonColumn)
WHEN MATCHED
THEN
UPDATE SET TARGET.column = SOURCE.Column,TARGET.ModifiedTimeStamp=GETDATE()
WHEN NOT MATCHED BY TARGET THEN
INSERT (col1,col2,Col3)
VALUES (SOURCE.col1, SOURCE.col2, SOURCE.Col3)
OUTPUT $action,
INSERTED.PrimaryKey AS SourcePrimaryKey INTO #Output
Delete from SyncTable
where PrimaryKey in (Select SourcePrimaryKey from #Output
where ActionType ='INSERT' or ActionType='UPDATE')
The code is not tested as i'm running out of time .but at-least it should give you a fair idea how to proceed . .For furthur detail on MERGE syntax read this and this
3.Use Multicast Component to duplicate the dataset for Insert and Update .Connect a MULTICAST to lookmatch output and another multicast to Lookup No match output

Add a task after "Update existing entry" and after "Insert new entry" to add the student ID to a variable which will contain the list of IDs to delete.
Enclose all of the tasks in a sequence container.
After the sequence container executes add a task to delete all the records from the sync table that are in that variable you've been populating.

Related

Recording info in SQL Server trigger

I have a table called dsReplicated.matDB and a column fee_earner. When that column is updated, I want to record two pieces of information:
dsReplicated.matDB.mt_code
dsReplicated.matDB.fee_earner
from the row where fee_earner has been updated.
I've got the basic syntax for doing something when the column is updated but need a hand with the above to get this over the line.
ALTER TRIGGER [dsReplicated].[tr_mfeModified]
ON [dsReplicated].[matdb]
AFTER UPDATE
AS
BEGIN
IF (UPDATE(fee_earner))
BEGIN
print 'Matter fee earner changed to '
END
END
The problem with triggers in SQL server is that they are called one per SQL statement - not once per row. So if your UPDATE statement updates 10 rows, your trigger is called once, and the Inserted and Deleted pseudo tables inside the trigger each contain 10 rows of data.
In order to see if fee_earner has changed, I'd recommend using this approach instead of the UPDATE() function:
ALTER TRIGGER [dsReplicated].[tr_mfeModified]
ON [dsReplicated].[matdb]
AFTER UPDATE
AS
BEGIN
-- I'm just *speculating* here what you want to do with that information - adapt as needed!
INSERT INTO dbo.AuditTable (Id, TriggerTimeStamp, Mt_Code, Old_Fee_Earner, New_Fee_Earner)
SELECT
i.PrimaryKey, SYSDATETIME(), i.Mt_Code, d.fee_earner, i.fee_earner
FROM Inserted i
-- use the two pseudo tables to detect if the column "fee_earner" has
-- changed with the UPDATE operation
INNER JOIN Deleted d ON i.PrimaryKey = d.PrimaryKey
AND d.fee_earner <> i.fee_earner
END
The Deleted pseudo table contains the values before the UPDATE - so that's why I take the d.fee_earner as the value for the Old_Fee_Earner column in the audit table.
The Inserted pseudo table contains the values after the UPDATE - so that's why I take the other values from that Inserted pseudo-table to insert into the audit table.
Note that you really must have an unchangeable primary key in that table in order for this trigger to work. This is a recommended best practice for any data table in SQL Server anyway.

SSIS data flow - copy new data or update existing

I queried some data from table A(Source) based on certain condition and insert into temp table(Destination) before upsert into Crm.
If data already exist in Crm I dont want to query the data from table A and insert into temp table(I want this table to be empty) unless there is an update in that data or new data was created. So basically I want to query only new data or if there any modified data from table A which already existed in Crm. At the moment my data flow is like this.
clear temp table - delete sql statement
Query from source table A and insert into temp table.
From temp table insert into CRM using script component.
In source table A I have audit columns: createdOn and modifiedOn.
I found one way to do this. SSIS DataFlow - copy only changed and new records but no really clear on how to do so.
What is the best and simple way to achieve this.
The link you posted is basically saying to stage everything and use a MERGE to update your table (essentially an UPDATE/INSERT).
The only way I can really think of to make your process quicker (to a significant degree) by partially selecting from table A would be to add a "last updated" timestamp to table A and enforcing that it will always be up to date.
One way to do this is with a trigger; see here for an example.
You could then select based on that timestamp, perhaps keeping a record of the last timestamp used each time you run the SSIS package, and then adding a margin of safety to that.
Edit: I just saw that you already have a modifiedOn column, so you could use that as described above.
Examples:
There are a few different ways you could do it:
ONE
Include the modifiedOn column on in your final destination table.
You can then build a dynamic query for your data flow source in a SSIS string variable, something like:
"SELECT * FROM [table A] WHERE modifiedOn >= DATEADD(DAY, -1, '" + #[User::MaxModifiedOnDate] + "')"
#[User::MaxModifiedOnDate] (string variable) would come from an Execute SQL Task, where you would write the result of the following query to it:
SELECT FORMAT(CAST(MAX(modifiedOn) AS date), 'yyyy-MM-dd') MaxModifiedOnDate FROM DestinationTable
The DATEADD part, as well as the CAST to a certain degree, represent your margin of safety.
TWO
If this isn't an option, you could keep a data load history table that would tell you when you need to load from, e.g.:
CREATE TABLE DataLoadHistory
(
DataLoadID int PRIMARY KEY IDENTITY
, DataLoadStart datetime NOT NULL
, DataLoadEnd datetime
, Success bit NOT NULL
)
You would begin each data load with this (Execute SQL Task):
CREATE PROCEDURE BeginDataLoad
#DataLoadID int OUTPUT
AS
INSERT INTO DataLoadHistory
(
DataLoadStart
, Success
)
VALUES
(
GETDATE()
, 0
)
SELECT #DataLoadID = SCOPE_IDENTITY()
You would store the returned DataLoadID in a SSIS integer variable, and use it when the data load is complete as follows:
CREATE PROCEDURE DataLoadComplete
#DataLoadID int
AS
UPDATE DataLoadHistory
SET
DataLoadEnd = GETDATE()
, Success = 1
WHERE DataLoadID = #DataLoadID
When it comes to building your query for table A, you would do it the same way as before (with the dynamically generated SQL query), except MaxModifiedOnDate would come from the following query:
SELECT FORMAT(CAST(MAX(DataLoadStart) AS date), 'yyyy-MM-dd') MaxModifiedOnDate FROM DataLoadHistory WHERE Success = 1
So the DataLoadHistory table, rather than your destination table.
Note that this would fail on the first run, as there'd be no successful entries on the history table, so you'd need you insert a dummy record, or find some other way around it.
THREE
I've seen it done a lot where, say your data load is running every day, you would just stage the last 7 days, or something like that, some margin of safety that you're pretty sure will never be passed (because the process is being monitored for failures).
It's not my preferred option, but it is simple, and can work if you're confident in how well the process is being monitored.

SQL Update master table with new table data hourly based on no match on Composite PK

Using SQL Server 2008
I have an SSIS task that downloads a CSV file from FTP and renames the file every hour. After that I'm doing a bulk insert of the data into a new table called NEWFTPDATA.
The data in this file is for the current day up to the current hour. The table has a composite primary key consisting of 4 different columns.
The next step I need to complete is, using T-SQL, compare this new table to my existing master archive table and insert any rows that do not already exist based on matching (or rather not-matching on those 4 columns)
Since I'll be downloading this file hourly (for real-time reporting) for each subsequent run there will be duplicate data which I will not want to insert into the master table to avoid duplicating data.
I've found ways to do this based off of the existence of one particular column, but I can't seem to figure out how to do it based off of 4 columns needing to match.
The workflow should be as follows
Update MASTERTABLE from NEWFTPDATA where newftpdata.column1, newftpdata.column2, newftpdata.column3, newftpdata.column4 do not exist in MASTERTABLE
Hopefully I've supplied substantial information for this question. If any further details are required please let me know. Thank you.
you can use MERGE
MERGE MasterTable as dest
using newftpdata as src
on
dest.column1 = src.column1
and
dest.column2 = src.column2
and
dest.column3 = src.column3
and
dest.column4 = src.column4
WHEN NOT MATCHED then
INSERT (column1, column2, ...)
values ( Src.column1, Src.column2,....)

In SQL Server, how to get last inserted records ID's when batch update is done?

In SQL Server, i am inserting multiple records into table using batch update. How do i get back the ID's (unique primary key) which is being created after batch update?
If I insert one record, I can get the last inserted using IDENT(tableName). I am not sure how to get if I do batch update. Please help.
For example, I have student table, with ROLE NO and NAME. ROLE NO is auto incremented by 1, as soon I insert the names into DB using java program. I will add 3 rows at a time using batch update from my java code. In DB, it gets added with ROLE NO 2, 3 and 4. How do I get these newly generated ID in my java program, please help
I tried getting ids using getgeneratedkeys method after I do executebatch. I get exception. Is batch update + get generated keys supported.?
In SQL Server when you do an insert there is an extra option your query; OUTPUT. This will let you capture back the data you inserted into the table - including your id's. You have to insert them into a temporary table; so something like this (with your table/ column names will get you there.
declare #MyNewRoles Table (Name, RoleNo)
insert into tblMyTable
(Name)
Select
Name
Output
inserted.Name, Inserted.RoleNo
into #MyNewRoles
From tblMyTableOfNames
select * from #MyNewRoles
If you don't mind adding a field to your table, you could generate a unique ID for each batch transaction (for example, a random UUID), and store that in the table as well. Then, to find the IDs associated with a given transaction you would just need something like
select my_id from my_table where batch_id = ?

Merge query using two tables in SQL server 2012

I am very new to SQL and SQL server, would appreciate any help with the following problem.
I am trying to update a share price table with new prices.
The table has three columns: share code, date, price.
The share code + date = PK
As you can imagine, if you have thousands of share codes and 10 years' data for each, the table can get very big. So I have created a separate table called a share ID table, and use a share ID instead in the first table (I was reliably informed this would speed up the query, as searching by integer is faster than string).
So, to summarise, I have two tables as follows:
Table 1 = Share_code_ID (int), Date, Price
Table 2 = Share_code_ID (int), Share_name (string)
So let's say I want to update the table/s with today's price for share ZZZ. I need to:
Look for the Share_code_ID corresponding to 'ZZZ' in table 2
If it is found, update table 1 with the new price for that date, using the Share_code_ID I just found
If the Share_code_ID is not found, update both tables
Let's ignore for now how the Share_code_ID is generated for a new code, I'll worry about that later.
I'm trying to use a merge query loosely based on the following structure, but have no idea what I am doing:
MERGE INTO [Table 1]
USING (VALUES (1,23-May-2013,1000)) AS SOURCE (Share_code_ID,Date,Price)
{ SEEMS LIKE THERE SHOULD BE AN INNER JOIN HERE OR SOMETHING }
ON Table 2 = 'ZZZ'
WHEN MATCHED THEN UPDATE SET Table 1.Price = 1000
WHEN NOT MATCHED THEN INSERT { TO BOTH TABLES }
Any help would be appreciated.
http://msdn.microsoft.com/library/bb510625(v=sql.100).aspx
You use Table1 for target table and Table2 for source table
You want to do action, when given ID is not found in Table2 - in the source table
In the documentation, that you had read already, that corresponds to the clause
WHEN NOT MATCHED BY SOURCE ... THEN <merge_matched>
and the latter corresponds to
<merge_matched>::=
{ UPDATE SET <set_clause> | DELETE }
Ergo, you cannot insert into source-table there.
You could use triggers for auto-insertion, when you insert something in Table1, but that will not be able to insert proper Shared_Name - trigger just won't know it.
So you have two options i guess.
1) make T-SQL code block - look for Stored Procedures. I think there also is a construct to execute anonymous code block in MS SQ, like EXECUTE BLOCK command in Firebird SQL Server, but i don't know it for sure.
2) create updatable SQL VIEW, joining Table1 and Table2 to show last most current date, so that when you insert a row in this view the view's on-insert trigger would actually insert rows to both tables. And when you would update the data in the view, the on-update trigger would modify the data.

Resources