I have a use case with 200 tables. I need to get the latest record from each of the 200 tables and store those records in a staging table. Then, for each staging record, I need to check whether it already exists
in the Final table and whether the status column for that record is open or closed.
Initial table:(generic schema for all 200 tables)
ID, timestamp, name
Staging Table:
ID, timestamp, name
Final Table:
ID, timestamp, name, status, count
My approach:
Ordering by timestamp with LIMIT 1 gives the latest record in each table.
UNION ALL those latest records from the 200 tables (200 SELECT statements combined with UNION ALL).
The staging table will now have up to 200 records.
Check each record against the Final table: if it exists and status = "open", increment the count;
if status = "closed" or there is no match, insert it
as a new record in the Final table.
I came across the T-SQL "IF NOT EXISTS () BEGIN END ELSE BEGIN END" construct and the WHILE loop, but I am not sure how to use them in this case.
The whole process runs every 15 minutes.
Is there a better approach, and how can I handle the last step of checking and inserting each row?
I am new to SQL.
More Info:
The initial tables are in Hive, where 200 different processes try to write simultaneously. Each write takes a table lock and the remaining processes have to wait, so I gave each process its own table. There will not be 200 records in staging every time; that is the worst case. Typically there are 0 to 10 at any given point, but all 200 tables have to be checked every 15 minutes. The staging table is brought from Hive into SQL Server and pushed to the Final table to serve other purposes.
Although it sounds very strange that you have 200 tables all with the same schema, the following MERGE statement should achieve what you want.
WITH STAGING_DATA ([ID], [TIMESTAMP], [NAME])
as
(
SELECT TOP 1 [ID], [TIMESTAMP], [NAME] FROM <TABLE_1> ORDER BY [TIMESTAMP] DESC
UNION ALL
SELECT TOP 1 [ID], [TIMESTAMP], [NAME] FROM <TABLE_2> ORDER BY [TIMESTAMP] DESC
UNION ALL
...
UNION ALL
SELECT TOP 1 [ID], [TIMESTAMP], [NAME] FROM <TABLE_N> ORDER BY [TIMESTAMP] DESC
)
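MERGE INTO <FINAL_TABLE> AS TARGET
USING (
SELECT [ID], [TIMESTAMP], [NAME] FROM STAGING_DATA
)
AS SOURCE ([ID], [TIMESTAMP], [NAME])
ON TARGET.[ID] = SOURCE.[ID] AND TARGET.[STATUS] = 'OPEN'
WHEN MATCHED THEN
-- an open row for this ID already exists: bump its counter
UPDATE SET TARGET.[COUNT] = ISNULL(TARGET.[COUNT], 0) + 1
WHEN NOT MATCHED BY TARGET THEN
-- no open row (closed or missing): insert a new open row
-- note: a MERGE statement must be terminated with a semicolon
INSERT ([ID], [TIMESTAMP], [NAME], [STATUS], [COUNT]) VALUES (SOURCE.[ID], SOURCE.[TIMESTAMP], SOURCE.[NAME], 'OPEN', 0);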
The STAGING_DATA CTE collects the data (the top 1 row from each table, ordered by timestamp descending) and the MERGE statement merges the result into your final table. The MERGE checks whether a row with the same ID and the status 'OPEN' already exists, in which case it just updates that row in the final table by incrementing the counter by 1. If no such row is found (or it has a status other than 'OPEN'), a new row is added to the final table.
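To see the three cases (open match, closed match, no match) in isolation, here is a small sketch that runs the same MERGE logic against table variables. The table variables @Final and @Staging and their sample rows are made up for illustration; they are not part of the original setup.

```sql
-- Hypothetical stand-ins for the final and staging tables.
DECLARE @Final TABLE ([ID] INT, [TIMESTAMP] DATETIME, [NAME] VARCHAR(50), [STATUS] VARCHAR(10), [COUNT] INT);
DECLARE @Staging TABLE ([ID] INT, [TIMESTAMP] DATETIME, [NAME] VARCHAR(50));

INSERT INTO @Final VALUES
    (1, '2021-01-01', 'a', 'OPEN', 3),    -- will be incremented
    (2, '2021-01-01', 'b', 'CLOSED', 5);  -- stays as is, a new OPEN row is added

INSERT INTO @Staging VALUES
    (1, '2021-01-02', 'a'),
    (2, '2021-01-02', 'b'),
    (3, '2021-01-02', 'c');               -- no match at all, inserted as new

MERGE INTO @Final AS TARGET
USING @Staging AS SOURCE
ON TARGET.[ID] = SOURCE.[ID] AND TARGET.[STATUS] = 'OPEN'
WHEN MATCHED THEN
    UPDATE SET TARGET.[COUNT] = ISNULL(TARGET.[COUNT], 0) + 1
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([ID], [TIMESTAMP], [NAME], [STATUS], [COUNT])
    VALUES (SOURCE.[ID], SOURCE.[TIMESTAMP], SOURCE.[NAME], 'OPEN', 0);

-- Expected rows: (1, OPEN, 4), (2, CLOSED, 5), (2, OPEN, 0), (3, OPEN, 0)
SELECT [ID], [STATUS], [COUNT]
FROM @Final
ORDER BY [ID], [STATUS];
```

Because the status filter sits in the ON clause, a staging row whose only counterpart is closed counts as "not matched" and ends up in the INSERT branch.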
ORDER BY with UNION ALL statements:
The ORDER BY worked together with UNION ALL inside the CTE when I tested it on SQL Server 2012, 2017 and 2019 with the following setup. If your server rejects it with a syntax error near UNION, wrap each SELECT TOP 1 ... ORDER BY in a derived table.
WITH STAGING_DATA ([ID], [TIMESTAMP], [NAME])
as
(
SELECT TOP 1 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] ASC
UNION ALL
SELECT TOP 1 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] DESC
UNION ALL
SELECT TOP 2 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] ASC
UNION ALL
SELECT TOP 2 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] DESC
)
SELECT [ID], [TIMESTAMP], [NAME] FROM STAGING_DATA
Your approach to insert into the staging table would work logically. Unfortunately, in SQL Server you cannot UNION queries that each contain their own ORDER BY, so the following WILL NOT WORK:
SELECT TOP(1) ID,[timestamp], [name] FROM dbo.TblA ORDER BY timestamp
UNION ALL
SELECT TOP(1) ID,[timestamp], [name] FROM dbo.TblB ORDER BY timestamp
UNION ALL
SELECT TOP(1) ID,[timestamp], [name] FROM dbo.TblC ORDER BY timestamp
If you want to do the UNION, you have to put the ORDER BY in a subquery and then do the UNION. It looks like this:
--INSERT INTO dbo.Staging (ID, [timestamp], [name])
SELECT q1.ID, q1.[timestamp], q1.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblA ORDER BY [timestamp] DESC) AS q1
UNION ALL
SELECT q2.ID, q2.[timestamp], q2.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblB ORDER BY [timestamp] DESC) AS q2
UNION ALL
SELECT q3.ID, q3.[timestamp], q3.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblC ORDER BY [timestamp] DESC) AS q3
It is very ugly for sure and I don't know if you would be better off with 200 separate INSERT statements, but let's just stick with this approach for now. So you can stage those records now:
INSERT INTO dbo.Staging (ID, [timestamp], [name])
SELECT q1.ID, q1.[timestamp], q1.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblA ORDER BY [timestamp] DESC) AS q1
UNION ALL
SELECT q2.ID, q2.[timestamp], q2.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblB ORDER BY [timestamp] DESC) AS q2
UNION ALL
SELECT q3.ID, q3.[timestamp], q3.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblC ORDER BY [timestamp] DESC) AS q3
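As an alternative sketch that avoids ORDER BY inside the UNION branches altogether, each row can be tagged with its source table and ranked with ROW_NUMBER(). The table variables @TblA and @TblB, the src column and the sample rows are assumptions for illustration only. This variant reads every row of every table, so whether it is preferable to TOP (1) per table depends on your indexes and data volume.

```sql
-- Hypothetical stand-ins for two of the 200 source tables.
DECLARE @TblA TABLE (ID INT, [timestamp] DATETIME, [name] VARCHAR(50));
DECLARE @TblB TABLE (ID INT, [timestamp] DATETIME, [name] VARCHAR(50));

INSERT INTO @TblA VALUES (1, '2021-01-01 10:00', 'a-old'), (2, '2021-01-01 10:15', 'a-new');
INSERT INTO @TblB VALUES (7, '2021-01-01 09:00', 'b-old'), (8, '2021-01-01 09:30', 'b-new');

WITH all_rows AS (
    -- plain UNION ALL, no TOP / ORDER BY needed in the branches
    SELECT 'TblA' AS src, ID, [timestamp], [name] FROM @TblA
    UNION ALL
    SELECT 'TblB' AS src, ID, [timestamp], [name] FROM @TblB
),
ranked AS (
    -- rn = 1 marks the newest row within each source table
    SELECT src, ID, [timestamp], [name],
           ROW_NUMBER() OVER (PARTITION BY src ORDER BY [timestamp] DESC) AS rn
    FROM all_rows
)
-- Expected rows: (2, 'a-new') and (8, 'b-new')
SELECT ID, [timestamp], [name]
FROM ranked
WHERE rn = 1;
```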
I assume you TRUNCATE the staging table before each run so it will only contain the records you are about to load into the final table. I myself prefer to use a combination of INNER JOINs and LEFT OUTER JOINs to find what doesn't exist and what already exists (makes debugging and development easier in my opinion, but others may disagree), but there is the MERGE approach (I will not show that here).
So to load the final table you can do something like:
-- increment existing open records
-- the INNER JOIN guarantees an existing open record that matches ID
UPDATE final SET final.[count] = final.[count] + 1
FROM dbo.Staging AS stage
INNER JOIN dbo.Final AS final ON final.ID = stage.ID AND final.[status] = 'open';
-- closed or no match: insert these as new open records
-- (run this after the UPDATE, otherwise the new rows would be incremented too)
-- the LEFT OUTER JOIN only looks at open records, so the WHERE clause keeps
-- staging rows that have either only closed records or no record at all;
-- joining to the closed rows instead would insert a duplicate on every later run
INSERT INTO dbo.Final(ID, [timestamp], [name], [status], [count])
SELECT stage.ID, stage.[timestamp], stage.[name], 'open', 1
FROM dbo.Staging AS stage
LEFT OUTER JOIN dbo.Final AS final ON final.ID = stage.ID AND final.[status] = 'open'
WHERE final.ID IS NULL;
I just matched on the ID value, but you can modify what is considered a match easily in the ON clause.
I am trying to load a standard Kimball SCD2 Dimension, using a merge statement which I got from the following website:
http://www.kimballgroup.com/2008/11/design-tip-107-using-the-sql-merge-statement-for-slowly-changing-dimension-processing/
This merge statement is the same except to handle new entities. This will be handled as a direct insert in the dataflow. This problem concerns only multiple versions of the same business key.
When I execute the merge statement SQL returns the error:
Msg 8672, Level 16, State 1, Line 3
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
I am using SQL Server 2012:
SOURCE DATASET
TARGET DATASET
This is what I expected:
Below you can find the script to reproduce the problem:
CREATE TABLE SANDBOX.EHN.SOURCE_SCD2 (
BUSINESS_KEY BIGINT
,DESCRIPTION_A VARCHAR(2)
,M_CRC BIGINT
,StartDATE DATE
,EndDATE DATE )
CREATE TABLE SANDBOX.EHN.TARGET_SCD2 (
BUSINESS_KEY BIGINT
,DESCRIPTION_A VARCHAR(2)
,M_CRC BIGINT
,StartDATE DATE
,EndDATE DATE )
select *
from SANDBOX.EHN.TARGET_SCD2
truncate table SANDBOX.EHN.TARGET_SCD2
INSERT INTO SANDBOX.EHN.SOURCE_SCD2 VALUES (1, 'B', 1, '2015-05-16', '2015-06-01')
INSERT INTO SANDBOX.EHN.SOURCE_SCD2 VALUES (1, 'C', 2, '2015-06-01', '2015-06-11')
INSERT INTO SANDBOX.EHN.SOURCE_SCD2 VALUES (1, 'D', 3, '2015-06-11', '9999-12-31')
INSERT INTO SANDBOX.EHN.TARGET_SCD2 VALUES (1, 'A', 0, '2015-01-16', '9999-12-31')
INSERT INTO SANDBOX.EHN.TARGET_SCD2
SELECT BUSINESS_KEY
,DESCRIPTION_A
,M_CRC
,StartDATE
,EndDATE
FROM (
MERGE SANDBOX.EHN.TARGET_SCD2 D
USING SANDBOX.EHN.SOURCE_SCD2 UPD
ON(D.BUSINESS_KEY = UPD.BUSINESS_KEY )
WHEN MATCHED AND D.EndDATE = '9999-12-31'
THEN UPDATE SET D.EndDATE = UPD.EndDATE
OUTPUT $Action Action_Out, UPD.BUSINESS_KEY
, UPD.DESCRIPTION_A
, UPD.M_CRC
, UPD.StartDATE
, UPD.EndDATE
)AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE'
Can you help me to fix this problem?
For the last update only, use:
INSERT INTO SANDBOX.EHN.TARGET_SCD2
SELECT BUSINESS_KEY
,DESCRIPTION_A
,M_CRC
,StartDATE
,EndDATE
FROM (
MERGE SANDBOX.EHN.TARGET_SCD2 D
USING SANDBOX.EHN.SOURCE_SCD2 UPD
ON(D.BUSINESS_KEY = UPD.BUSINESS_KEY AND UPD.EndDATE = '9999-12-31')
WHEN MATCHED AND D.EndDATE = '9999-12-31'
THEN UPDATE SET D.EndDATE = UPD.StartDATE
OUTPUT $Action Action_Out, UPD.BUSINESS_KEY
, UPD.DESCRIPTION_A
, UPD.M_CRC
, UPD.StartDATE
, UPD.EndDATE
)AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';
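The extra condition in the ON clause only avoids error 8672 if each business key has at most one source row with the open-ended EndDATE. A quick check of that assumption, sketched against the SOURCE_SCD2 table from the question's repro script:

```sql
-- Lists business keys that have more than one open-ended source row.
-- An empty result means every target row can match at most one source row.
SELECT BUSINESS_KEY, COUNT(*) AS open_ended_rows
FROM SANDBOX.EHN.SOURCE_SCD2
WHERE EndDATE = '9999-12-31'
GROUP BY BUSINESS_KEY
HAVING COUNT(*) > 1;
```

With the three sample rows from the question this returns no rows, because only the 'D' version ends on 9999-12-31.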
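If you want ALL SRC rows in your target table then I agree with Nick.McDermaid.
For ALL rows use:
-- close the current target row at the earliest new source StartDATE
UPDATE TRG
SET TRG.EndDATE = SRC.StartDATE
FROM SANDBOX.EHN.TARGET_SCD2 TRG
JOIN ( SELECT BUSINESS_KEY, MIN(StartDATE) AS StartDATE
FROM SANDBOX.EHN.SOURCE_SCD2
GROUP BY BUSINESS_KEY
) SRC
ON ( TRG.BUSINESS_KEY = SRC.BUSINESS_KEY
AND SRC.StartDATE > TRG.StartDATE )
WHERE TRG.EndDATE = '9999-12-31'; -- only touch the open-ended row
-- then append every source version
INSERT INTO SANDBOX.EHN.TARGET_SCD2 (BUSINESS_KEY, DESCRIPTION_A, M_CRC, StartDATE, EndDATE)
SELECT BUSINESS_KEY, DESCRIPTION_A, M_CRC, StartDATE, EndDATE
FROM SANDBOX.EHN.SOURCE_SCD2;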
I have written a stored procedure that runs a series of steps. I pass a single parameter as input to the procedure, but it does not seem to pick up the value. When I hard-code the input value in place of the parameter, the procedure works.
There is no mistake in the process flow, so something seems to be wrong in the procedure syntax.
Below is the stored procedure I used.
ALTER PROCEDURE [TransferIn]
@ponumber NVARCHAR = NULL
AS
BEGIN
--step 1 Delete Temp Pur_ID
IF EXISTS (
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'Pur_ID_IN')
DROP TABLE Pur_ID_IN;
-- =============================================
--step 2 select PO Number
--IF @ponumber IS NOT NULL
SELECT
ponumber, id
INTO Pur_ID_IN
FROM purchaseorder
WHERE potype IN (2, 4)
AND status = 0
AND ponumber = @ponumber;
-- =============================================
--step 3
--delete Temp. Tabel P_Test20_12_IN
IF EXISTS (
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'P_Test20_12_IN')
DROP TABLE P_Test20_12_IN;
-- =============================================
-- step 4 (Insert Data For Invoice To Temp Tabel After Group )
SELECT
ItemDescription, PurchaseOrderID,
SUM(QuantityOrdered) AS QuantityOrdered,
itemid, Price
INTO P_Test20_12_IN
FROM PurchaseOrderEntry
WHERE PurchaseOrderID IN (SELECT id FROM Pur_ID_IN)
GROUP BY
ItemDescription, StoreID, PurchaseOrderID,
itemid, Price;
--order by 3
-- =============================================
-- step 5 Delete Record From PurchaseOrderEntry
DELETE PurchaseOrderEntry
FROM PurchaseOrderEntry
WHERE PurchaseOrderID IN (SELECT id FROM Pur_ID_IN);
-- =============================================
INSERT INTO [W07].[dbo].[PurchaseOrderEntry] ([ItemDescription], [LastUpdated], [PurchaseOrderID], [QuantityOrdered], [ItemID], [Price])
SELECT
[ItemDescription],
GETDATE() AS [LastUpdated],
[PurchaseOrderID], [QuantityOrdered],
[ItemID], [Price]
FROM
P_Test20_12_IN;
END
The problem is
@ponumber nvarchar = null
Change it to
@ponumber nvarchar(max) = null
Note: if you do NOT specify the size of a char, nchar, varchar or nvarchar variable or parameter,
SQL Server defaults to a length of 1 character, so the passed value is silently truncated.
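A minimal sketch of that truncation, using two local variables (the variable names and the value 'PO-1001' are made up for illustration):

```sql
-- No length given: the variable is NVARCHAR(1) and keeps only the first character.
DECLARE @no_length NVARCHAR = N'PO-1001';
-- Explicit length: the full value is kept.
DECLARE @with_length NVARCHAR(100) = N'PO-1001';

-- no_length_value = 'P', with_length_value = 'PO-1001'
SELECT @no_length AS no_length_value,
       @with_length AS with_length_value;
```

The same happens to a procedure parameter declared without a length, which is why the comparison against ponumber in the WHERE clause never finds the full PO number.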
ALTER PROCEDURE [TransferIn]
(
@ponumber NVARCHAR(100)
)
AS BEGIN
SET NOCOUNT ON
IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL
DROP TABLE #temp
CREATE TABLE #temp (id INT PRIMARY KEY)
INSERT INTO #temp (id)
SELECT /*DISTINCT*/ id
FROM dbo.purchaseorder
WHERE potype IN (2, 4)
AND [status] = 0
AND ponumber = @ponumber
IF OBJECT_ID('tempdb.dbo.#temp2') IS NOT NULL
DROP TABLE #temp2
SELECT ItemDescription,
PurchaseOrderID,
SUM(QuantityOrdered) AS QuantityOrdered,
itemid,
Price
INTO #temp2
FROM PurchaseOrderEntry
WHERE PurchaseOrderID IN (SELECT * FROM #temp)
GROUP BY ItemDescription,
StoreID, --?
PurchaseOrderID,
itemid,
Price;
DELETE PurchaseOrderEntry
FROM PurchaseOrderEntry
WHERE PurchaseOrderID IN (SELECT * FROM #temp)
INSERT INTO [W07].[dbo].[PurchaseOrderEntry] ([ItemDescription], [LastUpdated], [PurchaseOrderID], [QuantityOrdered], [ItemID], [Price])
SELECT [ItemDescription],
GETDATE() AS [LastUpdated],
[PurchaseOrderID],
[QuantityOrdered],
[ItemID],
[Price]
FROM #temp2
END
I need to delete all rows from a source table and insert the deleted rows into a target table,
but ONLY if the deleted row doesn't exist yet in the target table.
Is it possible to do this in a single SQL statement?
The code below is what I have tried so far (it fails with an error).
Thank you!
create table #Target (column01 varchar(100)
,employee_number varchar(10)
)
Insert into #Target (column01, employee_number)
values ('2','222')
create table #Srs (column01 varchar(100)
,employee_number varchar(10)
)
Insert into #Srs (column01, employee_number)
values ('1','111')
,('2','222')
,('3','333')
,('4','444')
;with cteTable as (Select column01, employee_number from #Srs)
insert into #Target (column01, employee_number)
select * from (Delete from cteTable output deleted.column01, deleted.employee_number) t
where not exists (select 1
from #Target t1
where t1.employee_number = t.employee_number)
The row ('2','222') should not be inserted into #Target by the ";with cteTable ..." statement.
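Composable DML is quite limited.
You can do this if you change the definition of #Target, though: with IGNORE_DUP_KEY = ON on the primary key, rows whose employee_number already exists are discarded with a warning instead of failing the whole insert.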
CREATE TABLE #Target
(
column01 VARCHAR(100),
employee_number VARCHAR(10) PRIMARY KEY WITH (IGNORE_DUP_KEY=ON)
)
INSERT INTO #Target
(column01,
employee_number)
VALUES ('2',
'222')
CREATE TABLE #Srs
(
column01 VARCHAR(100),
employee_number VARCHAR(10)
)
INSERT INTO #Srs
(column01,
employee_number)
VALUES ('1', '111'),
('2', '222'),
('3', '333'),
('4', '444');
WITH cteTable
AS (SELECT column01,
employee_number
FROM #Srs)
INSERT INTO #Target
(column01,
employee_number)
SELECT * from (Delete from cteTable output deleted.column01, deleted.employee_number) t
Does it have to be only one statement? If not you can use this.
begin transaction;
insert into Target(column01, employee_number)
select column01, employee_number
from Srs with (updlock, holdlock)
except
select column01, employee_number
from Target;
delete from Srs;
commit transaction;
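Another two-statement sketch captures the deleted rows with OUTPUT ... INTO and then filters on employee_number only, which is the match condition used in the question. The table variables @Srs, @Target and @Deleted are stand-ins chosen so the example runs on its own; with real tables you would probably wrap both statements in a transaction as above.

```sql
-- Stand-ins for the question's #Srs and #Target tables.
DECLARE @Target TABLE (column01 VARCHAR(100), employee_number VARCHAR(10));
DECLARE @Srs TABLE (column01 VARCHAR(100), employee_number VARCHAR(10));
-- Holds the rows removed from the source.
DECLARE @Deleted TABLE (column01 VARCHAR(100), employee_number VARCHAR(10));

INSERT INTO @Target (column01, employee_number) VALUES ('2', '222');
INSERT INTO @Srs (column01, employee_number)
VALUES ('1', '111'), ('2', '222'), ('3', '333'), ('4', '444');

-- Step 1: delete every source row and remember what was deleted.
DELETE FROM @Srs
OUTPUT deleted.column01, deleted.employee_number
INTO @Deleted (column01, employee_number);

-- Step 2: copy only the rows whose employee_number is not in the target yet.
INSERT INTO @Target (column01, employee_number)
SELECT d.column01, d.employee_number
FROM @Deleted AS d
WHERE NOT EXISTS (SELECT 1
                  FROM @Target AS t
                  WHERE t.employee_number = d.employee_number);

-- Expected: @Target holds 111, 222, 333, 444 (222 only once); @Srs is empty.
SELECT column01, employee_number FROM @Target ORDER BY employee_number;
SELECT COUNT(*) AS remaining_source_rows FROM @Srs;
```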
I have 2 temp tables, one with a [Description] column and one with an [Institution] column, and I want to combine them into one table.
They look like this:
Table1; #T1
|Description|
blabla
blahblah
blagblag
Table2; #T2
|Institution|
Inst1
Inst2
Inst3
I want to get it like this:
Table3; #T3
|Description| |Institution|
blabla Inst1
blahblah Inst2
blagblag Inst3
They are already in sort order.
I just need to get them next to each other.
Last time I asked something almost the same.
I used this query
Create Table #T3
(
[From] Datetime
,[To] Datetime
)
INSERT INTO #T3
SELECT #T1.[From]
, MIN(#T2.[To])
FROM #T1
JOIN #T2 ON #T1.[From] < #T2.[To]
GROUP BY #T1.[From]
Select * from #T3
It worked for the date values, but it won't work here.
Thank you.
One thing that concerns me is that you say that the values "are already in sort order". There really is no default sort order -- if you don't specify a sort order, you are at the mercy of SQL Server to determine the order in which the data is returned. The solution below assumes that there is some way to sort the data such that the records "match up" (using the ORDER BY clauses).
Hope this helps,
John
-- Table 1 test data
Create Table #T1
(
[Description] nvarchar(30)
)
INSERT INTO #T1 ([Description]) VALUES ('desc1')
INSERT INTO #T1 ([Description]) VALUES ('desc2')
INSERT INTO #T1 ([Description]) VALUES ('desc3')
-- Table 2 test data
Create Table #T2
(
[Institution] nvarchar(30)
)
INSERT INTO #T2 (Institution) VALUES ('Inst1')
INSERT INTO #T2 (Institution) VALUES ('Inst2')
INSERT INTO #T2 (Institution) VALUES ('Inst3')
-- Create table 3
Create Table #T3
(
[Description] nvarchar(30),
[Institution] nvarchar(30)
);
-- Use CTE1 and CTE2 to add row numbers to the data; use the row numbers to join the tables
-- you must specify the sort order for the data in each table
WITH CTE1 (Description, RowNum) AS
(
SELECT [Description], ROW_NUMBER() OVER(ORDER BY [Description]) as RowNum
FROM #T1
),
CTE2 (Institution, RowNum) AS
(
SELECT Institution, ROW_NUMBER() OVER(ORDER BY Institution) as RowNum
FROM #T2
)
INSERT INTO #T3
SELECT CTE1.Description, CTE2.Institution
FROM CTE1
LEFT JOIN CTE2 ON CTE1.RowNum = CTE2.RowNum
Select * from #T3
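If the rows have no column that reproduces the intended order, one option is to record the insertion order in an IDENTITY column while the tables are filled and to join on that. This is only a sketch under that assumption: the RowNum columns and the table variables @T1 and @T2 are not part of the original tables, and the rows are inserted one at a time so that the identity values follow the insertion order.

```sql
-- Stand-ins for #T1 and #T2 with an added IDENTITY column that records insertion order.
DECLARE @T1 TABLE (RowNum INT IDENTITY(1,1) PRIMARY KEY, [Description] NVARCHAR(30));
DECLARE @T2 TABLE (RowNum INT IDENTITY(1,1) PRIMARY KEY, [Institution] NVARCHAR(30));

INSERT INTO @T1 ([Description]) VALUES (N'blabla');
INSERT INTO @T1 ([Description]) VALUES (N'blahblah');
INSERT INTO @T1 ([Description]) VALUES (N'blagblag');

INSERT INTO @T2 ([Institution]) VALUES (N'Inst1');
INSERT INTO @T2 ([Institution]) VALUES (N'Inst2');
INSERT INTO @T2 ([Institution]) VALUES (N'Inst3');

-- Pair the rows by their insertion position.
-- Expected: (blabla, Inst1), (blahblah, Inst2), (blagblag, Inst3)
SELECT t1.[Description], t2.[Institution]
FROM @T1 AS t1
LEFT JOIN @T2 AS t2 ON t2.RowNum = t1.RowNum
ORDER BY t1.RowNum;
```

Note that sorting the question's sample values alphabetically would put blagblag before blahblah, so ORDER BY [Description] would pair them differently from the order shown in the question.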