Incremental load in T-SQL with recorded history


I need to build an incremental load for my dimension tables in T-SQL that also keeps history. I tried the MERGE statement, but it does not work for me because it deletes rows that exist in the target but not in the source table.
Does anyone have a suggestion?
For example, this is my source table (my STAGE):
Cod Descript State
AAA Desc1 MI
BBB Desc 2 TX
CCC Desc 3 MA
On the first load my dimension will be identical to STAGE.
However, a value can later change in the source table, for example:
AAA CHANGEDESCRIPTION MI
So I need my dimension to end up like this (the last word marks each row as previous or current):
Cod Descript State
AAA Desc1 MI previous
AAA CHANGEDESCRIPTION MI current
BBB Desc 2 TX current
CCC Desc 3 MA current
This is my data warehouse, and I need both the current values and the full history.

Try this. The Aging column is always 0 for the current record; for a historical record it counts how many versions old that record is:
SELECT * INTO tbl_Target FROM (VALUES
('AAA','Desc1','MI',0),('BBB','Desc 2','TX',0),('CCC','Desc 3','MA',0)) as X(Cod, Descript, State, Aging);
GO
SELECT * INTO tbl_Staging FROM (VALUES ('AAA','Desc4','MI')) as X(Cod, Descript, State);
GO
UPDATE t SET Aging += 1
FROM tbl_Target as t
INNER JOIN tbl_Staging as s on t.Cod = s.Cod;
GO
INSERT INTO tbl_Target(Cod, Descript, State, Aging)
SELECT Cod, Descript, State, 0
FROM tbl_Staging;
GO
SELECT * FROM tbl_Target;
Please note that staging rows that are unchanged will still be recorded as changes (false changes). If your staging table can contain such rows, you have to filter them out in both the UPDATE and the INSERT.
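The filtering mentioned above can be sketched as follows. This is a minimal illustration only, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server; the temporary table name changed and the extra staging rows (BBB unchanged, DDD new) are assumptions added for the demo. The same SELECT shape should carry over to T-SQL, but that is not tested here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl_Target (Cod TEXT, Descript TEXT, State TEXT, Aging INTEGER);
CREATE TABLE tbl_Staging (Cod TEXT, Descript TEXT, State TEXT);
INSERT INTO tbl_Target VALUES
  ('AAA','Desc1','MI',0), ('BBB','Desc 2','TX',0), ('CCC','Desc 3','MA',0);
-- AAA is changed, BBB is unchanged, DDD is new (demo data, assumed)
INSERT INTO tbl_Staging VALUES
  ('AAA','Desc4','MI'), ('BBB','Desc 2','TX'), ('DDD','Desc 5','NY');
""")

# Keep only staging rows that are new or differ from the current (Aging = 0) row.
# Nullable columns would need a NULL-safe comparison instead of <>.
con.execute("""
CREATE TEMP TABLE changed AS
SELECT s.Cod, s.Descript, s.State
FROM tbl_Staging AS s
LEFT JOIN tbl_Target AS t ON t.Cod = s.Cod AND t.Aging = 0
WHERE t.Cod IS NULL OR t.Descript <> s.Descript OR t.State <> s.State
""")

with con:  # both statements commit together
    # Age every existing version of the codes that really changed.
    con.execute(
        "UPDATE tbl_Target SET Aging = Aging + 1 "
        "WHERE Cod IN (SELECT Cod FROM changed)"
    )
    # Insert the new current version (Aging = 0) for changed and new codes.
    con.execute(
        "INSERT INTO tbl_Target SELECT Cod, Descript, State, 0 FROM changed"
    )

rows = con.execute(
    "SELECT Cod, Descript, State, Aging FROM tbl_Target ORDER BY Cod, Aging"
).fetchall()
for row in rows:
    print(row)
```

Because the unchanged BBB row never reaches the changed table, it is neither aged nor duplicated.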

I just commented out the DELETE clause. Please tell me what you think:
MERGE DimTarget AS [Target] -- begin merge statement (a MERGE must end with a semicolon)
USING TableSource AS [Source]
ON [Target].ID = [Source].ID AND [Target].[IsCurrentRow] = 1
WHEN MATCHED AND -- current record exists but values are different
(
[Target].Descript <> [Source].Descript
)
THEN UPDATE SET -- expire the current row instead of overwriting it, so history is kept
[Target].[IsCurrentRow] = 0
-- , [Target].[ValidTo] = GETDATE()
WHEN NOT MATCHED BY TARGET -- record does not exist
THEN INSERT -- insert record
(
Descript
, [IsCurrentRow]
)
VALUES
(
[Source].Descript
, 1
)
--WHEN NOT MATCHED BY SOURCE -- record exists in target but not source
--THEN DELETE -- delete from target
OUTPUT $action AS Action, [Source].*; -- output results
-- Note: a changed row is only expired by this statement; its new version still has to be inserted in a separate step.
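For comparison, the same idea (expire the current row, then add the new version, and never delete) can be written as two plain statements. The sketch below is an assumption-laden illustration: it uses SQLite through Python's sqlite3 module rather than SQL Server, and the table contents are made up from the question's sample values. It is not the poster's MERGE, only a way to see the intended end state.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimTarget (ID TEXT, Descript TEXT, IsCurrentRow INTEGER);
CREATE TABLE TableSource (ID TEXT, Descript TEXT);
INSERT INTO DimTarget VALUES ('AAA','Desc1',1), ('BBB','Desc 2',1);
-- AAA changed, CCC is new, BBB is missing from the source on purpose
INSERT INTO TableSource VALUES ('AAA','CHANGEDESCRIPTION'), ('CCC','Desc 3');
""")

with con:  # run both steps in one transaction
    # Step 1: expire current rows whose description differs from the source.
    con.execute("""
        UPDATE DimTarget SET IsCurrentRow = 0
        WHERE IsCurrentRow = 1
          AND EXISTS (SELECT 1 FROM TableSource AS s
                      WHERE s.ID = DimTarget.ID
                        AND s.Descript <> DimTarget.Descript)
    """)
    # Step 2: insert a current row for every source ID that has none,
    # which covers both brand-new IDs and the IDs expired in step 1.
    con.execute("""
        INSERT INTO DimTarget (ID, Descript, IsCurrentRow)
        SELECT s.ID, s.Descript, 1
        FROM TableSource AS s
        WHERE NOT EXISTS (SELECT 1 FROM DimTarget AS t
                          WHERE t.ID = s.ID AND t.IsCurrentRow = 1)
    """)

dim_rows = con.execute(
    "SELECT ID, Descript, IsCurrentRow FROM DimTarget ORDER BY ID, IsCurrentRow"
).fetchall()
for row in dim_rows:
    print(row)
```

BBB is absent from the source but stays in the dimension, since neither step deletes anything.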

Related

How to perform dynamic update through MERGE statement in Snowflake?

I am performing updates from JSON data using a MERGE statement. Each record contains the primary key and only the column that was updated in the source system. Because only that column is present, the MERGE also sets the other columns to NULL. Is there a way to build the update statement dynamically for every row and execute it through MERGE?
create or replace table source_data as
select parse_json(COLUMN1)::variant datacol
from values
('{
"metadata":{"OperationName":"UPDATE"},
"data":{"id":"1234","status":"Active"}
}'),
('{
"metadata":{"OperationName":"UPDATE"},
"data":{"id":"1235","name":"Johny"}
}');
create or replace table employee_destination as
select column1::text as id,
column2::text as name,
column3::text as status
from values
('1234','John','Inactive'),
('1235','Jack','Active');
MERGE into employee_destination as Target using (select datacol:data:id as id,datacol:data:status as status,datacol:data:name name,datacol:metadata:OperationName as operation_name from SOURCE_DATA)
AS Source
ON Target.id = Source.id
when matched AND Source.operation_name = 'UPDATE'
THEN
update set Target.id = Source.id, Target.name = Source.name, Target.status = Source.status;
Current output:
1234 null Active
1235 Johny null
Expected output:
1234 John Active
1235 Johny Inactive
thanks.

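Try the statement below. COALESCE keeps the existing target value whenever the source record does not supply one:
MERGE into employee_destination as Target using (select datacol:data:id::string as id,datacol:data:status::string as status,datacol:data:name::string as name,datacol:metadata:OperationName::string as operation_name from SOURCE_DATA)
AS Source
ON Target.id = Source.id
when matched AND Source.operation_name = 'UPDATE'
THEN
update set
--Target.id = Source.id, -- no need to update the primary key
Target.name = COALESCE(Source.name,Target.name),
Target.status = COALESCE(Source.status,Target.status);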

Snowflake does not implement the full SQL MERGE statement?

I am trying to create a Snowflake task that executes a MERGE statement.
However, it seems that Snowflake does not recognize the “when not matched by target” or “when not matched by source” statements.
create or replace task MERGE_TEAM_TOUCHPOINT
warehouse = COMPUTE_WH
schedule = '1 minute'
when system$stream_has_data('TEAMTOUCHPOINT_CDC')
as
merge into dv.Team_Touchpoint as f using TeamTouchpoint_CDC as s
on s.uniqueid = f.uniqueid
when matched then
update set TEAMUNIQUEID = s.TEAMUNIQUEID,
TOUCHPOINTUNIQUEID = s.TOUCHPOINTUNIQUEID
when not matched by target then
insert (
ID,
UniqueID,
TEAMUNIQUEID,
TOUCHPOINTUNIQUEID
)
values (
s.ID,
s.UniqueID,
s.TEAMUNIQUEID,
s.TOUCHPOINTUNIQUEID
)
when not matched by source then delete;
How can I do this? Is there really no other way than creating a stored procedure in javascript to first truncate the table and then insert everything from the staging table?

A workaround suggested by a teammate:
Derive the match status from a full join of source and target, then check whether a.COL or b.COL is null:
merge into TARGET t
using (
select <COLUMN_LIST>,
iff(a.COL is null, 'NOT_MATCHED_BY_SOURCE', 'MATCHED_BY_SOURCE') SOURCE_MATCH,
iff(b.COL is null, 'NOT_MATCHED_BY_TARGET', 'MATCHED_BY_TARGET') TARGET_MATCH
from SOURCE a
full join TARGET b
on a.COL = b.COL
) s
on s.COL = t.COL
when matched and s.SOURCE_MATCH = 'NOT_MATCHED_BY_SOURCE' then
<DO_SOMETHING>
when not matched and s.TARGET_MATCH = 'NOT_MATCHED_BY_TARGET' then
<DO_SOMETHING_ELSE>
;
(same as in https://stackoverflow.com/a/69095225/132438)

Neither 'by target' nor 'by source' is a valid keyword in Snowflake's MERGE command; matching is always evaluated against the target (https://docs.snowflake.com/en/sql-reference/sql/merge.html). A MERGE can therefore UPDATE or DELETE only rows that are MATCHED, and INSERT only rows that are NOT MATCHED (that is, missing from the target). To remove target rows that have no counterpart in the source, you need to run the DELETE separately from the MERGE, which then handles the UPDATE (when MATCHED) and the INSERT (when NOT MATCHED).
You could run the two steps (1. DELETE; 2. MERGE with UPDATE and INSERT) within a single explicit transaction in a stored procedure, or as two separate transactions in two Tasks, the second one defined as an AFTER task of the first.
Alternatively, you can run an INSERT with the optional OVERWRITE parameter, which truncates the target table and then reloads it from the source table, all in a single transaction:
https://docs.snowflake.com/en/sql-reference/sql/insert.html#optional-parameters
Here is a reproducible example of the DELETE + MERGE(UPDATE&INSERT) approach:
USE DEV;
CREATE OR REPLACE TEMPORARY TABLE Public.My_Merge_Target (
Id INTEGER, Name VARCHAR
)
AS
SELECT column1, column2
FROM (VALUES (1, 'Stay as is'), (2, 'This name has to change'), (3, 'This needs to go'));
CREATE OR REPLACE TEMPORARY TABLE Public.My_Merge_Source (
Id INTEGER, Name VARCHAR
)
AS
SELECT column1, column2
FROM (VALUES (1, 'Stay as is'), (2, 'This is the new name for id=2'), (4, 'A new row'));
SELECT * FROM Public.My_Merge_Target ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This name has to change
3 | This needs to go
*/
SELECT * FROM Public.My_Merge_Source ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This is the new name for id=2
4 | A new row
*/
DELETE FROM Public.My_Merge_Target AS trg
USING (
SELECT t.Id FROM Public.My_Merge_Source AS s
RIGHT JOIN Public.My_Merge_Target AS t
ON s.Id = t.Id
WHERE s.Id IS NULL
) AS src
WHERE trg.Id = src.Id;
/*
-----------------------
number of rows deleted
-----------------------
1
-----------------------
*/
SELECT * FROM Public.My_Merge_Target ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This name has to change
*/
MERGE
INTO Public.My_Merge_Target AS trg
USING (
SELECT Id, Name
FROM Public.My_Merge_Source
) AS src
ON
trg.Id = src.Id
WHEN
MATCHED
AND (src.Name != trg.Name) THEN UPDATE
SET Name = src.Name
WHEN
NOT MATCHED THEN INSERT (Id, Name)
VALUES (src.Id, src.Name)
;
/*
-------------------------------------------------
number of rows inserted | number of rows updated
-------------------------------------------------
1 | 1
-------------------------------------------------
*/
SELECT * FROM Public.My_Merge_Target ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This is the new name for id=2
4 | A new row
*/
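The single-transaction variant mentioned earlier in this answer can be sketched as below. This is only an illustration: it runs on SQLite through Python's sqlite3 module, not on Snowflake, and it splits the MERGE into a separate UPDATE and INSERT. The table names and data are taken from the example above; everything else is an assumption of the sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE My_Merge_Target (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE My_Merge_Source (Id INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO My_Merge_Target VALUES
  (1,'Stay as is'), (2,'This name has to change'), (3,'This needs to go');
INSERT INTO My_Merge_Source VALUES
  (1,'Stay as is'), (2,'This is the new name for id=2'), (4,'A new row');
""")

with con:  # commits all three steps together, or rolls all of them back
    # 1. Remove target rows that no longer exist in the source.
    deleted = con.execute(
        "DELETE FROM My_Merge_Target "
        "WHERE Id NOT IN (SELECT Id FROM My_Merge_Source)"
    ).rowcount
    # 2. Update rows present on both sides whose name differs.
    updated = con.execute("""
        UPDATE My_Merge_Target
        SET Name = (SELECT s.Name FROM My_Merge_Source AS s
                    WHERE s.Id = My_Merge_Target.Id)
        WHERE EXISTS (SELECT 1 FROM My_Merge_Source AS s
                      WHERE s.Id = My_Merge_Target.Id
                        AND s.Name <> My_Merge_Target.Name)
    """).rowcount
    # 3. Insert source rows that are missing from the target.
    inserted = con.execute("""
        INSERT INTO My_Merge_Target (Id, Name)
        SELECT s.Id, s.Name FROM My_Merge_Source AS s
        WHERE NOT EXISTS (SELECT 1 FROM My_Merge_Target AS t WHERE t.Id = s.Id)
    """).rowcount

final = con.execute("SELECT Id, Name FROM My_Merge_Target ORDER BY Id").fetchall()
print(deleted, updated, inserted)
for row in final:
    print(row)
```

Keeping the steps in one transaction means readers of the target never observe the state after the DELETE but before the inserts and updates.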

SQL Merge and output in the same table

I'm merging two tables, and I want a row's Busena field to be marked as "updated" whenever the row is updated. My code:
MERGE [ITWORKS].[dbo].[Testine2] te
USING [ITWORKS].[dbo].[Testinus] bo
ON te.itemid = bo.itemid
AND te.itemname <> bo.itemname
WHEN MATCHED THEN
UPDATE
SET te.itemname = bo.itemname
OUTPUT
$action
into [ITWORKS].[dbo].[Testine2] (busena);
SELECT * FROM [ITWORKS].[dbo].[Testine2];
Result I get:
Itemid Itemname Busena
100001 TEST NULL
NULL NULL UPDATE
The result I want:
Itemid Itemname Busena
100001 TEST UPDATE

I want that if the cell is update the field would be marked as
"updated"
There is no reason to use output. Just set the column value in the update.
MERGE [ITWORKS].[dbo].[Testine2] te
USING [ITWORKS].[dbo].[Testinus] bo
ON te.itemid = bo.itemid
AND te.itemname <> bo.itemname
WHEN MATCHED THEN
UPDATE
SET te.itemname = bo.itemname,
te.Busena = 'UPDATE';
SELECT * FROM [ITWORKS].[dbo].[Testine2];

TSQL - MERGE statement with composite key

I have table OrderLines(OrderID int, LineIndex int, ) and table valued parameter of the same structure defining new order lines for one order.
So if I had the following OrderLines
1000 1 bread
1000 2 milk
1001 1 oil
1001 2 yogurt
1002 1 beef
1002 2 pork
and the following TVP
1001 1 yogurt
I want to get the following OrderLines
1000 1 bread
1000 2 milk
1001 1 yogurt
1002 1 beef
1002 2 pork
I.e. touch rows only for one Order.
So I wrote my query like this
MERGE
[OrderLines] AS [Target]
USING
(
SELECT
[OrderID], [LineIndex], [Data]
FROM
#OrderLines
)
AS [Source] ([OrderID], [LineIndex], [Data])
ON ([Target].[OrderID] = [Source].[OrderID]) AND ([Target].[LineIndex] = [Source].[LineIndex])
WHEN MATCHED THEN
UPDATE
SET
[Target].[Data] = [Source].[Data]
WHEN NOT MATCHED BY TARGET THEN
INSERT
([OrderID], [LineIndex], [Data])
VALUES
([Source].[OrderID], [Source].[LineIndex], [Source].[Data])
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
and it deletes all other (not mentioned) OrderLines for other Orders.
I tried
WHEN NOT MATCHED BY SOURCE AND ([Target].[OrderID] = [Source].[OrderID]) THEN
but got a syntax error.
How should I rewrite my query?

Just use the relevant subset of OrderLines as the target:
WITH AffectedOrderLines AS (
SELECT *
FROM OrderLines
WHERE OrderID IN (SELECT OrderID FROM #OrderLines)
)
MERGE
AffectedOrderLines AS [Target]
USING
(
SELECT
[OrderID], [LineIndex], [Data]
FROM
#OrderLines
)
AS [Source] ([OrderID], [LineIndex], [Data])
ON ([Target].[OrderID] = [Source].[OrderID]) AND ([Target].[LineIndex] = [Source].[LineIndex])
WHEN MATCHED THEN
UPDATE
SET
[Target].[Data] = [Source].[Data]
WHEN NOT MATCHED BY TARGET THEN
INSERT
([OrderID], [LineIndex], [Data])
VALUES
([Source].[OrderID], [Source].[LineIndex], [Source].[Data])
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
And here's a SQL Fiddle to test.

For starters, only columns from the target table can be used in the WHEN NOT MATCHED BY SOURCE additional merge condition (it's on MSDN).
And I think it's normal that you lose all extra entries from the target table, because they don't match anything in the source.
You should rewrite your query by first deleting the WHEN NOT MATCHED BY SOURCE clause and then deleting separately extra/unneeded rows.
Then capture the keys of all rows that the MERGE updates or inserts in the target table, by adding an OUTPUT clause that writes into a table variable:
DECLARE @OutputTable table( OrderId INT, OrderLine INT);
...Your entire MERGE
WHEN NOT MATCHED BY TARGET THEN
INSERT
([OrderID], [LineIndex], [Data])
VALUES
([Source].[OrderID], [Source].[LineIndex], [Source].[Data])
OUTPUT INSERTED.OrderId, INSERTED.LineIndex INTO @OutputTable;
Now @OutputTable holds the keys of every row that was either updated or inserted in the target table (notice the OUTPUT clause).
What remains is to find the target rows that belong to an order present in #OrderLines but are not in @OutputTable (so they were neither updated nor inserted by the MERGE statement), and delete them:
DELETE A
FROM [OrderLines] AS A
LEFT OUTER JOIN @OutputTable AS C
ON C.OrderId = A.OrderId AND C.OrderLine = A.LineIndex
WHERE A.OrderId IN (SELECT OrderId FROM #OrderLines)
AND C.OrderId IS NULL;
This deletes what you wanted to delete in the first place. The IN filter restricts the statement to orders that appear in #OrderLines, and the left join together with the IS NULL check acts as an anti semi join: it keeps only the target rows that were not affected by the MERGE statement (insert or update) but whose OrderID is still present in the source table (#OrderLines).
Should be right... Let me know after you test it.
You may want to wrap all this (MERGE + DELETE) inside a transaction, if you decide to go with this approach.
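The scoping rule behind both answers (touch only the orders that appear in the incoming set) can be checked with a small sketch. It is an assumption-based illustration on SQLite through Python's sqlite3 module, not T-SQL; NewLines is a hypothetical table standing in for the #OrderLines input, and the MERGE is split into DELETE, UPDATE and INSERT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE OrderLines (OrderID INTEGER, LineIndex INTEGER, Data TEXT,
                         PRIMARY KEY (OrderID, LineIndex));
CREATE TABLE NewLines (OrderID INTEGER, LineIndex INTEGER, Data TEXT,
                       PRIMARY KEY (OrderID, LineIndex));
INSERT INTO OrderLines VALUES
  (1000,1,'bread'), (1000,2,'milk'), (1001,1,'oil'),
  (1001,2,'yogurt'), (1002,1,'beef'), (1002,2,'pork');
INSERT INTO NewLines VALUES (1001,1,'yogurt');
""")

with con:
    # Delete lines of the affected orders only, when the new set lacks them.
    con.execute("""
        DELETE FROM OrderLines
        WHERE OrderID IN (SELECT OrderID FROM NewLines)
          AND NOT EXISTS (SELECT 1 FROM NewLines AS n
                          WHERE n.OrderID = OrderLines.OrderID
                            AND n.LineIndex = OrderLines.LineIndex)
    """)
    # Overwrite the data of lines that exist on both sides.
    con.execute("""
        UPDATE OrderLines
        SET Data = (SELECT n.Data FROM NewLines AS n
                    WHERE n.OrderID = OrderLines.OrderID
                      AND n.LineIndex = OrderLines.LineIndex)
        WHERE EXISTS (SELECT 1 FROM NewLines AS n
                      WHERE n.OrderID = OrderLines.OrderID
                        AND n.LineIndex = OrderLines.LineIndex)
    """)
    # Add lines that are new for the affected orders.
    con.execute("""
        INSERT INTO OrderLines (OrderID, LineIndex, Data)
        SELECT n.OrderID, n.LineIndex, n.Data FROM NewLines AS n
        WHERE NOT EXISTS (SELECT 1 FROM OrderLines AS o
                          WHERE o.OrderID = n.OrderID
                            AND o.LineIndex = n.LineIndex)
    """)

order_lines = con.execute(
    "SELECT OrderID, LineIndex, Data FROM OrderLines ORDER BY OrderID, LineIndex"
).fetchall()
for row in order_lines:
    print(row)
```

Orders 1000 and 1002 are untouched because their OrderID never appears in the incoming set.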

T-SQL Grouping Sets of Information

I have a problem which my limited SQL knowledge is keeping me from understanding.
First the problem:
I have a database that I need to run a report on. It contains the configurations of users' entitlements. The report needs to show a distinct list of these configurations and a count against each one.
So a line in my DB looks like this:
USER_ID SALE_ITEM_ID SALE_ITEM_NAME PRODUCT_NAME CURRENT_LINK_NUM PRICE_SHEET_ID
37715 547 CultFREE CultPlus 0 561
The line above is one row of a user's configuration; for every user ID there can be 1-5 of these rows. So a configuration is defined as the set of rows that share a common user ID, with variable attributes.
I need a distinct list of these configurations across the whole table: just one configuration set wherever more than one user has that configuration, plus a count of the users who have it.
I hope this is clear.
Any ideas?
I have tried various GROUP BYs and UNIONs, and also the GROUPING SETS function, to no avail.
I would be very grateful if anyone could give me some pointers!

Ouch that hurt ...
Ok so problem:
a row represents a configurable line
users may be linked to more than 1 row of configuration
configuration rows when grouped together form a configuration set
we want to figure out all of the distinct configuration sets
we want to know what users are using them.
Solution (it's a bit messy but the idea is there; copy and paste into SQL Server Management Studio) ...
-- ok so i imported the data to a table named SampleData ...
-- 1. import the data
-- 2. add a new column
-- 3. select all the values of the config in to the new column (Configuration_id)
--UPDATE [dbo].[SampleData]
--SET [Configuration_ID] = SALE_ITEM_ID + SALE_ITEM_NAME + [PRODUCT_NAME] + [CURRENT_LINK_NUM] + [PRICE_SHEET_ID] + [Configuration_ID]
-- 4. i then selected just the distinct values of those and found 6 distinct Configuration_id's
--SELECT DISTINCT [Configuration_ID] FROM [dbo].[SampleData]
-- 5. to make them a bit easier to read and work with i gave them int values instead
-- for me it was easy to do this manually but you might wanna do some trickery here to autonumber them or something
-- basic idea is to run the step 4 statement but select into a new table then add a new primary key column and set identity spec on it
-- that will generate u a bunch of incremental numbers for your config id's so u can then do something like ...
--UPDATE [dbo].[SampleData] sd
--SET Configuration_ID = (SELECT ID FROM TempConfigTable WHERE Config_ID = sd.Configuration_ID)
-- at this point you have all your existing rows with a unique ident for the values combined in each row.
-- so for example in my dataset i have several rows where only the user_id has changed but all look like this ...
--SALE_ITEM_ID SALE_ITEM_NAME PRODUCT_NAME CURRENT_LINK_NUM PRICE_SHEET_ID Configuration_ID
--54101 TravelFREE TravelPlus 0 56101 1
-- now you have a config id you can start to work on building sets up ...
-- each user is now matched with 1 or more config id
-- 6. we use a CTE (common table expression) to link the possibles (keeps the join small) ...
--WITH Temp (ConfigID)
--AS
--(
-- SELECT DISTINCT SD.Configuration_Id --SD2.Configuration_Id, SD3.Configuration_Id, SD4.Configuration_Id, SD5.Configuration_Id,
-- FROM [dbo].[SampleData] SD
--)
-- this extracts all the possible combinations using the CTE
-- on the basis of what you told me, max rows per user is 6, in the result set i have i only have 5 distinct configs
-- meaning i gain nothing by doing a 6th join.
-- cross joins basically give you every combination of unique values from the 2 tables but we joined back on the same table
-- so its every possible combination of Temp + Temp (ConfigID + ConfigID) ... per cross join so with 5 joins its every combination of
-- Temp + Temp + Temp + Temp + Temp .. good job temp only has 1 column with 5 values in it
-- 7. uncomment both this and the CTE above ... need to use them together
--SELECT DISTINCT T.ConfigID C1, T2.ConfigID C2, T3.ConfigID C3, T4.ConfigID C4, T5.ConfigID C5
--INTO [SETS]
--FROM Temp T
--CROSS JOIN Temp T2
--CROSS JOIN Temp T3
--CROSS JOIN Temp T4
--CROSS JOIN Temp T5
-- notice the INTO clause ... this dumps me out a new [SETS] table in my db
-- if i go add a primary key to this and set its ident spec i now have unique set id's
-- for each row in the table.
--SELECT *
--FROM [dbo].[SETS]
-- now here's where it gets interesting ... row 1 defines a set as being config id 1 and nothing else
-- row 2 defines set 2 as being config 1 and config 2 and nothing else ... and so on ...
-- the problem here of course is that 1,2,1,1,1 is technically the same set as 1,1,1,2,1 from our point of view
-- ok lets assign a set to each userid ...
-- 8. first we pull the distinct id's out ...
--SELECT DISTINCT USER_ID usr, null SetID
--INTO UserSets
--FROM SampleData
-- now we need to do bit a of operating on these that's a bit much for a single update or select so ...
-- 9. process findings in a loop
DECLARE @currentUser int
DECLARE @set int
-- while there's a userid not linked to a set
WHILE EXISTS (SELECT 1 FROM UserSets WHERE SetId IS NULL)
BEGIN
-- pick the next userid that is not linked yet
SELECT TOP 1 @currentUser = usr FROM UserSets WHERE SetId IS NULL
-- figure out a set to link it to
SET @set = (
SELECT TOP 1 ID
FROM [SETS]
-- shouldn't really do this ... basically need to refactor in to a table variable then compare to that
-- that way the table lookup on ur main data is only 1 per User_id
WHERE C1 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
AND C2 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
AND C3 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
AND C4 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
AND C5 IN (SELECT DISTINCT Configuration_id FROM SampleData WHERE USER_ID = @currentUser)
)
-- hopefully that worked
IF(@set IS NOT NULL)
BEGIN
-- tell the usersets table
UPDATE UserSets SET SetId = @set WHERE usr = @currentUser
SET @set = null
END
ELSE -- something went wrong ... set to 0 to prevent endless loop but any userid linked to set 0 is a problem u need to look at
UPDATE UserSets SET SetId = 0 WHERE usr = @currentUser
-- and round we go again ... until we are done
END
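As a cross-check on the goal stated at the top of this answer (distinct configuration sets and how many users share each), here is a small sketch of the grouping logic outside the database. It is not the T-SQL approach above; it is an assumed, simplified illustration in plain Python. The first two configuration rows reuse values shown in this thread, while user IDs 37716 and 37717 are invented for the demo.

```python
from collections import Counter, defaultdict

# (USER_ID, SALE_ITEM_ID, SALE_ITEM_NAME, PRODUCT_NAME, CURRENT_LINK_NUM, PRICE_SHEET_ID)
rows = [
    (37715, 547, "CultFREE", "CultPlus", 0, 561),
    (37715, 54101, "TravelFREE", "TravelPlus", 0, 56101),
    (37716, 54101, "TravelFREE", "TravelPlus", 0, 56101),  # same set, other row order
    (37716, 547, "CultFREE", "CultPlus", 0, 561),
    (37717, 547, "CultFREE", "CultPlus", 0, 561),
]

# Collect each user's configuration rows into a set, so row order is irrelevant.
per_user = defaultdict(set)
for user_id, *config in rows:
    per_user[user_id].add(tuple(config))

# A frozenset is hashable, so identical configuration sets count as one key.
set_counts = Counter(frozenset(configs) for configs in per_user.values())

for config_set, user_count in set_counts.items():
    print(user_count, sorted(config_set))
```

Using a set per user sidesteps the ordering problem noted above, where 1,2,1,1,1 and 1,1,1,2,1 describe the same configuration set.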

SELECT
USER_ID,
SALE_ITEM_ID, ETC...,
COUNT(*) WhateverYouWantToNameCount
FROM TableNAme
GROUP BY USER_ID, SALE_ITEM_ID, ETC...
