I have been tasked with building a history table in SQL. I have already built the base table, which contains multiple left joins among other things. The base table needs to be compared to another table: where the key matches, only the specific columns that have changed should be updated, and where the key doesn't match, new rows should be inserted.
Previously I have used other ETL tools that have GUI-style built-in SCD loaders, but I don't have that luxury in SQL Server, where the MERGE statement is meant to handle such operations. I have used MERGE before, but I get stuck when setting flags and date fields based on the operation performed.
Here is the BASE table:
KEY  CLIENT  QUANTITY  CONTRACT_NO  FC_COUNT  DELETE_FLAG  RECORD_UPDATED_DATE
345  A       1000      5015         1         N            31/12/9999
346  B       2000      9352         1         N            31/12/9999
347  C       3000      6903         1         N            31/12/9999
348  D       1000      7085         1         N            31/12/9999
349  E       1000      8488         1         N            31/12/9999
350  F       500       6254         1         N            31/12/9999
Here is the table I plan to merge with:
KEY  CLIENT  QUANTITY  CONTRACT_NO  FC_COUNT
345  A       1299      5015         1
346  B       2011      9352         1
351  Z       5987      5541         1
The results I'm looking for are:
KEY  CLIENT  QUANTITY  CONTRACT_NO  FC_COUNT  DELETE_FLAG  RECORD_UPDATED_DATE
345  A       1000      5015         1         N            06/07/2022
345  A       1299      5015         1         N            31/12/9999
346  B       2000      9352         1         N            06/07/2022
346  B       2011      9352         1         N            31/12/9999
347  C       3000      6903         1         Y            06/07/2022
348  D       1000      7085         1         Y            06/07/2022
349  E       1000      8488         1         Y            06/07/2022
350  F       500       6254         1         Y            06/07/2022
351  Z       5987      5541         1         N            31/12/9999
As you can see, the result shows the changes: the old records are closed off with a date, rows that were present previously but are now missing are marked with a delete flag, and there is a new row for the new key and its data.
Would this be a MERGE? Some direction on how to perform this sort of operation would be a great help. We have a lot of tables where we need to keep change history and this will help a lot going forward.
Code shell attempt:
-- Stage the incoming rows in a temp table
SELECT
    MAIN_KEY,
    CLIENT,
    QUANTITY,
    CONTRACT_NO,
    1 AS FC_COUNT,
    NULL AS DELETE_FLG_DD,
    GETDATE() AS RECORD_UPDATED_DATE
INTO #G1_DELTA
FROM [dwh].STG_DTL;

MERGE [dwh].[PRJ1_DELTA] AS TARGET
USING #G1_DELTA AS SOURCE
    ON TARGET.MAIN_KEY = SOURCE.MAIN_KEY
-- INSERT is only allowed in a WHEN NOT MATCHED [BY TARGET] clause
WHEN NOT MATCHED BY TARGET THEN INSERT
(
    MAIN_KEY,
    CLIENT,
    QUANTITY,
    CONTRACT_NO,
    FC_COUNT,
    DELETE_FLG_DD,
    RECORD_UPDATED_DATE
)
VALUES
(
    SOURCE.MAIN_KEY,
    SOURCE.CLIENT,
    SOURCE.QUANTITY,
    SOURCE.CONTRACT_NO,
    SOURCE.FC_COUNT,
    SOURCE.DELETE_FLG_DD,
    SOURCE.RECORD_UPDATED_DATE
);
To build a history table from your two tables, you first need to select the updated information from both of them.
The changes that need to be applied are:
"tab1.[DELETE_FLAG]", which should be set to 'Y' whenever the row has no match in tab2, and to 'N' otherwise
"tab1.[RECORD_UPDATED_DATE]", which should be set to the current date
"tab2.[DELETE_FLAG]", which is missing and should be initialized to 'N'
"tab2.[RECORD_UPDATED_DATE]", which is missing and should be initialized to your placeholder date, 9999-12-31.
Once these changes are made, you can apply UNION ALL to combine the rows from the two tables.
Then, to generate the table, you can wrap the combined result set in a CTE and use the INTO <table> clause in the final SELECT to create your "history" table.
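WITH cte AS (
    -- Existing rows: closed off today, flagged 'Y' when the key is missing from tab2
    SELECT tab1.[KEY],
           tab1.[CLIENT],
           tab1.[QUANTITY],
           tab1.[CONTRACT_NO],
           tab1.[FC_COUNT],
           CASE WHEN tab2.[KEY] IS NOT NULL
                THEN 'N'
                ELSE 'Y'
           END AS [DELETE_FLAG],
           CAST(GETDATE() AS DATE) AS [RECORD_UPDATED_DATE]
    FROM tab1
    LEFT JOIN tab2
           ON tab1.[KEY] = tab2.[KEY]
    UNION ALL
    -- Incoming rows: open records carrying the placeholder date
    SELECT tab2.[KEY],
           tab2.[CLIENT],
           tab2.[QUANTITY],
           tab2.[CONTRACT_NO],
           tab2.[FC_COUNT],
           'N' AS [DELETE_FLAG],
           CAST('9999-12-31' AS DATE) AS [RECORD_UPDATED_DATE]
    FROM tab2
)
SELECT *
INTO history
FROM cte
ORDER BY [KEY];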
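The query above builds the history table once. For repeated loads, one possible approach is to maintain the table with an UPDATE followed by an INSERT instead of MERGE. The following is only a sketch under several assumptions that are not in the question: the history table already exists as dbo.history with the columns of the expected result, open rows are the ones carrying 9999-12-31, dbo.tab2 holds the latest snapshot, and the compared columns are not nullable. The dbo schema and the variable names are hypothetical.

```sql
-- Sketch only: incremental maintenance of an existing history table.
-- dbo.history, dbo.tab2, @today and @open are assumed names.
DECLARE @today date = CAST(GETDATE() AS date);
DECLARE @open  date = '9999-12-31';   -- placeholder date marking the open row

BEGIN TRANSACTION;

-- 1. Close open rows whose key has disappeared from the source
--    or whose tracked columns differ from the source.
--    Note: <> does not detect changes to or from NULL.
UPDATE h
SET    h.RECORD_UPDATED_DATE = @today,
       h.DELETE_FLAG = CASE WHEN s.[KEY] IS NULL THEN 'Y' ELSE 'N' END
FROM   dbo.history AS h
LEFT JOIN dbo.tab2 AS s
       ON s.[KEY] = h.[KEY]
WHERE  h.RECORD_UPDATED_DATE = @open
  AND (   s.[KEY] IS NULL
       OR s.CLIENT      <> h.CLIENT
       OR s.QUANTITY    <> h.QUANTITY
       OR s.CONTRACT_NO <> h.CONTRACT_NO
       OR s.FC_COUNT    <> h.FC_COUNT);

-- 2. Insert an open row for every source key that has no open row now:
--    brand-new keys plus the keys closed in step 1.
INSERT INTO dbo.history
       ([KEY], CLIENT, QUANTITY, CONTRACT_NO, FC_COUNT,
        DELETE_FLAG, RECORD_UPDATED_DATE)
SELECT s.[KEY], s.CLIENT, s.QUANTITY, s.CONTRACT_NO, s.FC_COUNT,
       'N', @open
FROM   dbo.tab2 AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.history AS h
                   WHERE  h.[KEY] = s.[KEY]
                     AND  h.RECORD_UPDATED_DATE = @open);

COMMIT TRANSACTION;
```

Unchanged keys keep their open row untouched, so rerunning the load against the same snapshot should add nothing. Wrapping both statements in one transaction keeps a key from being left without an open row if the second step fails.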
Related
I am stuck on this SQL problem, which may be easier than I think. In a nutshell, how do I select the cost from the appropriate garage when GarageHistID in the GarageCosts table equals ID in the GarageHistory table?
GarageCosts
GarageID  Cost  Version  GarageHistID
950       213   1        455
950       342   3        NULL
GarageHistory
ID   VendorID  Version  GarageID
454  44        1        NULL
455  2         1        950
456  44        2        NULL
Expected Output:
VendorID  Cost  Version
2         213   1
44        0     1
44        0     2
This is just a left join, coalescing a NULL to zero.
SELECT
    gh.VendorID,
    ISNULL(gc.Cost, 0) AS Cost,
    gh.[Version]
FROM GarageHistory gh
LEFT JOIN GarageCosts gc
    ON gh.GarageID = gc.GarageID
    AND gh.[Version] = gc.[Version]
There is no (specific) need to have bi-directional keys in your two tables, but you could use either one for the join (along with Version).
The following query gives the exact results you mentioned in your question. You can use a LEFT JOIN to join the two tables on the GarageHistID field in GarageCosts and the ID field in GarageHistory:
SELECT
    gh.VendorID,
    ISNULL(gc.Cost, 0) AS Cost,
    gh.[Version]
FROM GarageHistory gh
LEFT JOIN GarageCosts gc
    ON gh.ID = gc.GarageHistID
ORDER BY gc.Cost DESC
Hi, could you please help me with the scenario below? I need to perform the update only after comparing all rows related to CustID 100 and deciding which ID has the latest date (100 or 400); that ID should remain active. My existing UPDATE statement decides based on a single-row comparison. How can this be done with recursion, firing the update in the same query?
TABLE
ID  CustID  MCustID  Date        MDate
1   100     200      2017-01-10  2017-01-15
2   100     300      2017-01-10  2017-01-07
3   100     400      2017-01-10  2017-01-21
CODE
UPDATE A
SET active = 0
FROM Cust A
INNER JOIN tblMatch M
    ON A.CustomerID = CASE WHEN M.Date < M.MDate THEN M.CustId ELSE M.MCustId END
Say I have a table in SQL Server, let's call it the purchase table, that represents user purchases.
Table name: purchase
purchase_id  buyer_member_id  song_id
1            101              1001
2            101              1002
3            102              1001
4            102              1003
5            103              1001
6            103              1003
7            103              1004
Now I am trying to compute some stats from this table. I want to know who has purchased both song 1001 and song 1003.
SELECT DISTINCT buyer_member_id
FROM purchase
WHERE buyer_member_id IN (SELECT buyer_member_id FROM purchase WHERE song_id = 1001)
  AND buyer_member_id IN (SELECT buyer_member_id FROM purchase WHERE song_id = 1003)
This works, but as we add more and more criteria, it becomes slower and slower. It's nearly impossible to research something like: find people who bought a, b and c but not d or f. I understand that, by its nature, using "where someid in (select someid from table where something)" is probably not the best way to do this.
Question is, is there a better way?
I call these "set-within-a-set" queries, and like to approach them using group by and having:
SELECT buyer_member_id
FROM purchase p
GROUP BY buyer_member_id
HAVING SUM(CASE WHEN song_id = 1001 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN song_id = 1003 THEN 1 ELSE 0 END) > 0;
The sum() counts the number of purchases that match each song. The > 0 says there is at least 1. And = 0 would say there are none.
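As a sketch of the "a and b but not d" case from the question, an exclusion can be added as one more HAVING condition using = 0. The choice of song 1004 as the excluded song is only an illustration based on the sample data.

```sql
-- Buyers who purchased both 1001 and 1003 but never 1004.
-- Song 1004 as the excluded song is an assumption for illustration.
SELECT buyer_member_id
FROM purchase
GROUP BY buyer_member_id
HAVING SUM(CASE WHEN song_id = 1001 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN song_id = 1003 THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN song_id = 1004 THEN 1 ELSE 0 END) = 0;
```

Against the sample rows this returns only buyer 102: buyer 101 never bought 1003, and buyer 103 also bought 1004. Each extra criterion costs one more conditional sum over the same single pass of the table, rather than another subquery.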
I have a table with 106 columns. One of those columns is a "Type" column with 16 types.
I want 16 rows, where the Type is distinct. So, row 1 has a type of "Construction", row 2 has a type of "Elevator PVT", etc.
Using Navicat.
From what I've found (and understood) so far, I can't use Distinct (because that looks across all rows), I can't use Group By (because that's for aggregating data, which I'm not looking to do), so I'm stuck.
Please be gentle- I'm really really new at this.
Below is a part of the table (how can I share this normally?)- it's really big so I didn't share the whole thing. Below is a partial result I'm looking for, where the Violation_Type is unique and the rest of the columns display.
Got it.. Sheesh... (took me forever, but got it...)
D_ID B_ID V_ID V_Type S_ID c_f d_y l_u p_s du_p
------ ------ ------- -------------- ------ ----- ------ ------ ----- ------
184 117 V 032 Elevator PVT 2 8 0 0
4 140 V 100 Construction 1 8 0 0
10 116 V 122 Electric 1 8 2005 0 0
11 117 V 033 Boiler Local 1 0 2005 0 0
You can use ROW_NUMBER for this:
SELECT *
FROM (
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY V_Type ORDER BY (SELECT NULL))
    FROM tbl
) t
WHERE rn = 1
Modify the ORDER BY depending on what row you want to prioritize.
From the documentation:
Returns the sequential number of a row within a partition of a result
set, starting at 1 for the first row in each partition.
This means that for every row within a partition (specified by the PARTITION BY clause), SQL Server assigns a number starting from 1, following the order specified in the ORDER BY clause.
ROW_NUMBER requires an ORDER BY clause. SELECT NULL tells SQL Server that we do not want to enforce a particular order. We just want the rows numbered by partition.
The WHERE rn = 1 keeps only the rows that have a ROW_NUMBER of 1. This gives you one row for every V_Type available.
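With ORDER BY (SELECT NULL), which row is kept for each type is arbitrary. If a repeatable choice matters, a real column can be used in the ORDER BY instead. The sketch below assumes that the lowest D_ID per type is the row to keep; that rule is an illustration, not something stated in the question.

```sql
-- Keep, for each V_Type, the row with the lowest D_ID.
-- Choosing D_ID as the tie-breaker is an assumption for illustration.
SELECT t.*
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY V_Type ORDER BY D_ID) AS rn
    FROM tbl
) AS t
WHERE t.rn = 1;
```

Changing ORDER BY D_ID to ORDER BY D_ID DESC would keep the highest D_ID per type instead. The result still includes the helper column rn, which can be dropped by listing the wanted columns explicitly.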
I have a table :
UNIQUE KEY  ID      Clicks  INSERTDATE
1           100001  10      2011-05-14 00:00:00.000
2           100001  20      2011-05-13 00:00:00.000
3           100001  30      2011-05-18 00:00:00.000
4           100002  10      2011-05-20 00:00:00.000
5           100002  15      2011-05-24 00:00:00.000
6           100002  10      2011-05-05 00:00:00.000
I have a threshold value for clicks, let's say 20.
I need to write a T-SQL query that removes the rows that do not meet the threshold on the cumulative sum of clicks for each ID.
So for the above example, ID "100001" has cumulative clicks of 60 (10+20+30), but since the threshold is 20, the last record, i.e. the one with the click value of 30, should be removed from the result.
However, the second record should still be included even though the sum at that point (10 + 20) is greater than my threshold.
EDIT :
Another major rule that needs to be applied is that the INSERTDATE has to be ordered before performing any calculations
Any help would be much appreciated.
If I understood the question correctly, you'd like to filter on the RunningTotal for a given Id, like so:
SELECT c1.*
FROM ClickTable c1
OUTER APPLY (
    -- total clicks of the earlier rows (lower pk) for the same id
    SELECT SUM(Clicks) AS RunningTotal
    FROM ClickTable
    WHERE pk < c1.pk
      AND id = c1.id
) c2
WHERE ISNULL(RunningTotal, 0) <= 20
This assumes that the table has a unique key column called pk.
Running sample: http://www.sqlfiddle.com/#!3/98173/11
Update
To order by Clicks instead of the primary key, just change the line
where pk < c1.pk
to
where Clicks < c1.Clicks
Running sample: http://www.sqlfiddle.com/#!3/31750/2
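The edit to the question asks for the rows to be ordered by INSERTDATE before any calculation. A possible variant, sketched here with a windowed SUM rather than OUTER APPLY, sums the clicks of all earlier rows per ID in INSERTDATE order. It assumes SQL Server 2012 or later, the ClickTable name used above, and no two rows with the same INSERTDATE within an ID; PriorTotal is a hypothetical column alias.

```sql
-- Sketch: keep a row while the clicks accumulated BEFORE it (by INSERTDATE)
-- are still within the threshold of 20.
SELECT t.ID, t.Clicks, t.INSERTDATE
FROM (
    SELECT c.ID,
           c.Clicks,
           c.INSERTDATE,
           SUM(c.Clicks) OVER (
               PARTITION BY c.ID
               ORDER BY c.INSERTDATE
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS PriorTotal          -- NULL for the first row of each ID
    FROM ClickTable AS c
) AS t
WHERE ISNULL(t.PriorTotal, 0) <= 20
ORDER BY t.ID, t.INSERTDATE;
```

For ID 100001 in the sample data, the rows dated 2011-05-13 and 2011-05-14 are kept (prior totals 0 and 20) and the row dated 2011-05-18 is dropped (prior total 30). The window function has to sit in a derived table because it cannot be referenced directly in a WHERE clause.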
I hope I read the question correctly. Seems too simple:
SELECT ID, SUM(Clicks) AS Clicks
FROM t1
WHERE Clicks <= 20 -- <== this is your threshold
GROUP BY ID
Would give you
ID      Clicks
100001  30
100002  35