I have a use case with 200 tables. I need to get the latest record from each of the 200 tables and store them in a staging table. Then, for each staging record, I need to check whether it already exists
in the Final table and whether the status column for that record is open or closed.
Initial table:(generic schema for all 200 tables)
ID, timestamp, name
Staging Table:
ID, timestamp, name
Final Table:
ID, timestamp, name, status, count
My approach:
Ordering by timestamp with limit 1 will give the latest record in each table
Union all those latest records from the 200 tables (200 select statements with union)
The staging table will now have 200 records
Check whether each record already exists in the Final table: if it exists and status="open", increment
the count; if status="closed" or no match is found, it should be
inserted as a new record in the Final table
I came across the T-SQL "IF NOT EXISTS () BEGIN END ELSE BEGIN END" construct and the WHILE loop (not sure how to use them in this case)
This whole process happens every 15 minutes.
Is there a better approach, and how can I handle the last step of checking and inserting each row?
I am new to SQL.
More Info:
Those initial tables are in Hive, where 200 different processes try to write simultaneously. A table lock is taken for each write and the remaining processes have to wait, so I gave each process its own table. There will not be 200 records in staging every time; that is the worst case. Typically there will be between 0 and 10 at any given point, but all 200 tables have to be checked every 15 minutes. The staging table is brought from Hive into SQL Server and pushed to the Final table to serve another purpose.
Although it sounds very strange that you have 200 tables all with the same schema, the following MERGE statement should achieve what you want.
WITH STAGING_DATA ([ID], [TIMESTAMP], [NAME])
as
(
SELECT TOP 1 [ID], [TIMESTAMP], [NAME] FROM <TABLE_1> ORDER BY [TIMESTAMP] DESC
UNION ALL
SELECT TOP 1 [ID], [TIMESTAMP], [NAME] FROM <TABLE_2> ORDER BY [TIMESTAMP] DESC
UNION ALL
...
UNION ALL
SELECT TOP 1 [ID], [TIMESTAMP], [NAME] FROM <TABLE_N> ORDER BY [TIMESTAMP] DESC
)
MERGE INTO <FINAL_TABLE> AS TARGET
USING (
SELECT [ID], [TIMESTAMP], [NAME] FROM STAGING_DATA
)
AS SOURCE ([ID], [TIMESTAMP], [NAME])
ON TARGET.ID = SOURCE.ID AND TARGET.STATUS = 'OPEN'
WHEN MATCHED THEN
UPDATE SET [COUNT] = ISNULL([COUNT], 0) + 1
WHEN NOT MATCHED BY TARGET THEN
INSERT ([ID], [TIMESTAMP], [NAME], [STATUS], [COUNT]) VALUES ([ID], [TIMESTAMP], [NAME], 'OPEN', 0); -- a MERGE statement must end with a semicolon
The STAGING_DATA CTE collects the data (the top 1 row from each table, ordered by timestamp descending) and the MERGE statement merges that result into your final table. The MERGE checks whether a row with the same ID and the status 'OPEN' already exists, in which case it just updates that row in the final table by incrementing the counter by 1. If no such row is found (or the existing row has a status other than 'OPEN'), a new row is added to the final table.
ORDER BY with UNION ALL Statements:
The ORDER BY does work with the UNION ALL as long as they are within the CTE, at least when I tested it on SQL Server 2012, 2017 and 2019 with the following setup:
WITH STAGING_DATA ([ID], [TIMESTAMP], [NAME])
as
(
SELECT TOP 1 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] ASC
UNION ALL
SELECT TOP 1 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] DESC
UNION ALL
SELECT TOP 2 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] ASC
UNION ALL
SELECT TOP 2 [ID], [TIMESTAMP], [NAME]
FROM (VALUES
('1', '2021-01-01 00:00:00.000', 'Käser'),
('74', '2021-01-01 00:00:00.000', 'Valérie Maier'),
('2', '2021-01-01 00:00:00.000', 'Jäggi'),
('84', '2021-01-01 00:00:00.000', 'D'),
('83', '2021-01-01 00:00:00.000', 'Wyss')
) as DATA ([ID], [TIMESTAMP], [NAME])
ORDER BY [ID] DESC
)
SELECT [ID], [TIMESTAMP], [NAME] FROM STAGING_DATA
Your approach to inserting into the staging table would work logically. Unfortunately, in SQL Server you cannot UNION queries that each contain their own ORDER BY, so the following WILL NOT WORK
SELECT TOP(1) ID,[timestamp], [name] FROM dbo.TblA ORDER BY timestamp
UNION ALL
SELECT TOP(1) ID,[timestamp], [name] FROM dbo.TblB ORDER BY timestamp
UNION ALL
SELECT TOP(1) ID,[timestamp], [name] FROM dbo.TblC ORDER BY timestamp
If you want to do the UNION, you have to put the ORDER BY in a subquery and then do the UNION. It looks like this:
--INSERT INTO dbo.Staging (ID, [timestamp], [name])
SELECT q1.ID, q1.[timestamp], q1.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblA ORDER BY [timestamp] DESC) AS q1
UNION ALL
SELECT q2.ID, q2.[timestamp], q2.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblB ORDER BY [timestamp] DESC) AS q2
UNION ALL
SELECT q3.ID, q3.[timestamp], q3.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblC ORDER BY [timestamp] DESC) AS q3
It is very ugly for sure and I don't know if you would be better off with 200 separate INSERT statements, but let's just stick with this approach for now. So you can stage those records now:
INSERT INTO dbo.Staging (ID, [timestamp], [name])
SELECT q1.ID, q1.[timestamp], q1.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblA ORDER BY [timestamp] DESC) AS q1
UNION ALL
SELECT q2.ID, q2.[timestamp], q2.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblB ORDER BY [timestamp] DESC) AS q2
UNION ALL
SELECT q3.ID, q3.[timestamp], q3.[name] FROM
(SELECT TOP(1) ID, [timestamp], [name] FROM dbo.TblC ORDER BY [timestamp] DESC) AS q3
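Writing 200 of these branches by hand is error-prone, so the statement can be generated from a list of table names. The following is only a sketch of that idea in Python, using an in-memory SQLite database as a stand-in for SQL Server (so LIMIT 1 replaces TOP(1)); the table names tbl_a, tbl_b, tbl_c, the column name ts and the helper latest_rows_sql are assumptions made for the illustration.

```python
import sqlite3

def latest_rows_sql(tables):
    """Build one UNION ALL statement with a 'latest row' subquery per table."""
    # Table names cannot be bound as parameters, so they are quoted as identifiers.
    branches = [
        'SELECT id, ts, name FROM '
        f'(SELECT id, ts, name FROM "{t}" ORDER BY ts DESC LIMIT 1)'
        for t in tables
    ]
    return "\nUNION ALL\n".join(branches)

conn = sqlite3.connect(":memory:")
tables = ["tbl_a", "tbl_b", "tbl_c"]  # hypothetical names; tbl_c stays empty
for t in tables:
    conn.execute(f'CREATE TABLE "{t}" (id INTEGER, ts TEXT, name TEXT)')
conn.execute("CREATE TABLE staging (id INTEGER, ts TEXT, name TEXT)")

conn.executemany(
    'INSERT INTO "tbl_a" VALUES (?, ?, ?)',
    [(1, "2021-01-01 10:00:00", "old"), (2, "2021-01-01 10:15:00", "new")],
)
conn.execute('INSERT INTO "tbl_b" VALUES (?, ?, ?)', (7, "2021-01-01 09:00:00", "only"))

# Stage the latest row of every table in a single statement.
conn.execute("INSERT INTO staging (id, ts, name) " + latest_rows_sql(tables))
staged = conn.execute("SELECT id, ts, name FROM staging ORDER BY id").fetchall()
print(staged)
```

An empty source table simply contributes no row, which matches the "0 to 10 records per run" situation described in the question.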
I assume you TRUNCATE the staging table before each run so it will only contain the records you are about to load into the final table. I myself prefer to use a combination of INNER JOINs and LEFT OUTER JOINs to find what doesn't exist and what already exists (makes debugging and development easier in my opinion, but others may disagree), but there is the MERGE approach (I will not show that here).
So to load the final table you can do something like:
-- increment existing open records
-- the INNER JOIN guarantees an existing record that matches ID
UPDATE final SET final.[count] = final.[count] + 1
FROM dbo.Staging AS stage
INNER JOIN dbo.Final AS final ON final.ID = stage.ID AND final.[status] = 'open';
-- add new open records for IDs whose existing record is closed
-- same comment about the INNER JOIN; the values come from staging so the new record carries the latest data
INSERT INTO dbo.Final(ID, [timestamp], [name], [status], [count])
SELECT stage.ID, stage.[timestamp], stage.[name], 'open', 1
FROM dbo.Staging AS stage
INNER JOIN dbo.Final AS final ON final.ID = stage.ID AND final.[status] = 'closed';
-- no match, insert these records
-- the LEFT OUTER JOIN with the WHERE clause guarantees no matching record
INSERT INTO dbo.Final(ID, [timestamp], [name], [status], [count])
SELECT stage.ID, stage.[timestamp], stage.[name], 'open', 1
FROM dbo.Staging AS stage
LEFT OUTER JOIN dbo.Final AS final ON final.ID = stage.ID
WHERE final.ID IS NULL;
I just matched on the ID value, but you can modify what is considered a match easily in the ON clause.
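To see the rule "increment if open, otherwise insert" end to end, here is a small sketch in Python against an in-memory SQLite database (a stand-in for SQL Server, not the same dialect). It is a variation on the statements above rather than a copy: the "closed" and "no match" cases are folded into one INSERT with NOT EXISTS, and the column names ts and cnt are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (id INTEGER, ts TEXT, name TEXT);
CREATE TABLE final (id INTEGER, ts TEXT, name TEXT, status TEXT, cnt INTEGER);
INSERT INTO final VALUES (1, '2021-01-01 10:00', 'a', 'open', 1);
INSERT INTO final VALUES (2, '2021-01-01 10:00', 'b', 'closed', 4);
INSERT INTO staging VALUES (1, '2021-01-01 10:15', 'a');
INSERT INTO staging VALUES (2, '2021-01-01 10:15', 'b');
INSERT INTO staging VALUES (3, '2021-01-01 10:15', 'c');
""")

with conn:  # both statements commit together or not at all
    # Step 1: bump the counter of every open record that was staged.
    conn.execute("""
        UPDATE final SET cnt = cnt + 1
        WHERE status = 'open' AND id IN (SELECT id FROM staging)
    """)
    # Step 2: staged rows without an open record (closed or missing) become new open records.
    conn.execute("""
        INSERT INTO final (id, ts, name, status, cnt)
        SELECT s.id, s.ts, s.name, 'open', 1
        FROM staging AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM final AS f WHERE f.id = s.id AND f.status = 'open'
        )
    """)

result = conn.execute("SELECT id, status, cnt FROM final ORDER BY id, status").fetchall()
print(result)
```

The update runs first so that rows inserted in step 2 are not incremented in the same run.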
I have a table that stores data at one-minute intervals, and I'd like to create a SELECT statement that fetches the row at the :59 minute mark of each hour in a requested period, for example from 01.01.2020 to 01.02.2020.
How could I do this? I attach a sample of data from the table that the SELECT statement will query:
I think you're looking for something like this. In plain language the code says "For the range of start date to end date, select the hourly summary statistics for the test table without skipping any hours."
Table
drop table if exists test_table;
go
create table test_table(
ID int primary key not null,
date_dt datetime,
INP3D decimal(4, 3),
ID_device varchar(20));
Data
insert test_table(ID, date_dt, INP3D, ID_device) values
(1, '2020-08-21 13:44:34.590', 3.631, 'A1'),
(2, '2020-08-21 13:44:34.590', 1.269, 'A1'),
(3, '2020-08-21 13:44:34.590', 0.131, 'A1'),
(4, '2020-08-21 13:44:34.590', 8.169, 'A1');
--select * from test_table;
insert test_table(ID, date_dt, INP3D, ID_device) values
(5, '2020-08-21 11:44:34.590', 3.631, 'A1'),
(6, '2020-08-21 02:44:34.590', 1.269, 'A1'),
(7, '2020-08-22 11:44:34.590', 0.131, 'A1'),
(8, '2020-08-22 01:44:34.590', 8.169, 'A1');
Query
declare
@start_dt datetime='2020-08-21',
@end_dt datetime='2020-08-22';
with
hours_cte as (
select hours_n
from
(VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),
(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24)) v(hours_n)),
days_cte as (
-- hours_cte doubles as a small number list here, so at most 24 days are generated
select dateadd(d, hours_n-1, @start_dt) calc_day from hours_cte where hours_n<=datediff(d, @start_dt, @end_dt)+1)
select
dc.calc_day,
hc.hours_n,
count(t.ID) row_count, -- count(*) would report 1 for hours with no rows
isnull(avg(INP3D), 0) inp3d_avg,
isnull(sum(INP3D+0000.000),0) inp3d_sum
from days_cte dc
cross join hours_cte hc
-- half-open interval so a row exactly on the hour boundary is counted once
left join test_table t on t.date_dt >= dateadd(hour, (hours_n-1), dc.calc_day)
and t.date_dt < dateadd(hour, (hours_n), dc.calc_day)
group by
dc.calc_day,
hc.hours_n
order by
1,2;
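The "without skipping any hours" part is the key idea: every hour bucket is created first, and only then are the rows assigned to buckets. A minimal Python sketch of the same idea follows, using the timestamps from the sample data; the function name hourly_counts is made up for the illustration, and it assumes start is aligned to a full hour.

```python
from datetime import datetime, timedelta

def hourly_counts(timestamps, start, end):
    """Count timestamps per hour over [start, end), keeping empty hours at 0."""
    counts = {}
    hour = start
    while hour < end:  # create every bucket up front so none is skipped
        counts[hour] = 0
        hour += timedelta(hours=1)
    for ts in timestamps:
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        if bucket in counts:  # ignore rows outside the requested range
            counts[bucket] += 1
    return counts

sample = (
    [datetime(2020, 8, 21, 13, 44, 34, 590000)] * 4
    + [
        datetime(2020, 8, 21, 11, 44, 34, 590000),
        datetime(2020, 8, 21, 2, 44, 34, 590000),
        datetime(2020, 8, 22, 11, 44, 34, 590000),
        datetime(2020, 8, 22, 1, 44, 34, 590000),
    ]
)
counts = hourly_counts(sample, datetime(2020, 8, 21), datetime(2020, 8, 23))
print(len(counts), counts[datetime(2020, 8, 21, 13)])
```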
This?
SELECT * FROM your_table WHERE DATEPART(MINUTE, your_datetime) = 59
Datepart
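The filter is just "keep the rows whose minute component is 59, inside the requested period". As a rough illustration in Python, with generated one-minute readings standing in for the table (the names readings and at_59 are made up for the example):

```python
from datetime import datetime, timedelta

period_start = datetime(2020, 1, 1)
period_end = datetime(2020, 1, 1, 3)  # three hours, to keep the example small

# One reading per minute, like the table described in the question.
readings = [period_start + timedelta(minutes=i) for i in range(240)]

# Same condition as the SQL filter, plus the period restriction.
at_59 = [ts for ts in readings if period_start <= ts < period_end and ts.minute == 59]
print(at_59)
```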
I know this sounds weird, but is it possible to have a view that uses dynamic SQL to build itself? I know views are compiled, so most probably this is not possible. I could certainly do it with a stored procedure instead, but I just want to make sure it is not possible.
Here I have an example:
declare @Table1 as table (
Id int,
Name nvarchar(50),
Provider nvarchar(50)
)
insert @Table1 values (1, 'John', 'Provider1')
insert @Table1 values (2, 'Peter', 'Provider1')
insert @Table1 values (3, 'Marcus', 'Provider2')
declare @Table2 as table (
Id int,
Info nvarchar(50),
AnotherInfo nvarchar(50)
)
insert @Table2 values (1, 'Expense', '480140')
insert @Table2 values (1, 'Maintenance', '480130')
insert @Table2 values (2, 'Set Up Cost', '480150')
insert @Table2 values (2, 'Something', '480160')
--No columns from Table2
select a.Id, a.Name, a.Provider from @Table1 a left join @Table2 b on a.Id = b.Id
--With columns from Table2
select a.Id, a.Name, a.Provider, b.Info, b.AnotherInfo from @Table1 a left join @Table2 b on a.Id = b.Id
The first select returns what looks like repeated data, which is expected because of the left join. The problem is that to avoid it I would need a DISTINCT, and that is what I don't want to do. My example is short, but I have many more columns and the table is quite big.
Let's say I have a table with an ID Identity column, some data, and a datestamp. Like this:
1 data 5/1/2013 12:30
2 data 5/2/2013 15:32
3 data 5/2/2013 16:45
4 data 5/3/2013 9:32
5 data 5/5/2013 8:21
6 data 5/4/2013 9:36
7 data 5/6/2013 11:42
How do I write a query that will show me the one record that is timestamped 5/4? The table has millions of records. I've done some searching, but I don't know what to call what I'm searching for. :/
declare @t table(id int, bla char(4), timestamp datetime)
insert @t values
(1,'data','5/1/2013 12:30'),
(2,'data','5/2/2013 15:32'),
(3,'data','5/2/2013 16:45'),
(4,'data','5/3/2013 9:32'),
(5,'data','5/5/2013 8:21'),
(6,'data','5/4/2013 9:36'),
(7,'data','5/6/2013 11:42')
select timestamp
from
(
select rn1 = row_number() over (order by id),
rn2 = row_number() over (order by timestamp), timestamp
from @t
) a
where rn1 not in (rn2, rn2-1)
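The query compares each row's position when sorted by id (rn1) with its position when sorted by timestamp (rn2), and keeps the rows where rn1 is neither rn2 nor rn2-1. The same comparison can be traced in a few lines of Python on the sample data; this is only an illustration of the logic, not a replacement for the query.

```python
from datetime import datetime

rows = [
    (1, datetime(2013, 5, 1, 12, 30)),
    (2, datetime(2013, 5, 2, 15, 32)),
    (3, datetime(2013, 5, 2, 16, 45)),
    (4, datetime(2013, 5, 3, 9, 32)),
    (5, datetime(2013, 5, 5, 8, 21)),
    (6, datetime(2013, 5, 4, 9, 36)),
    (7, datetime(2013, 5, 6, 11, 42)),
]

# rn1: position when ordered by id; rn2: position when ordered by timestamp.
rn1 = {row_id: n for n, (row_id, _) in enumerate(sorted(rows), start=1)}
rn2 = {row_id: n for n, (row_id, _) in enumerate(sorted(rows, key=lambda r: r[1]), start=1)}

# Same predicate as the WHERE clause: rn1 not in (rn2, rn2 - 1).
out_of_order = [
    ts for row_id, ts in rows
    if rn1[row_id] not in (rn2[row_id], rn2[row_id] - 1)
]
print(out_of_order)
```

Row 5 (5/5) has rn1 = 5 and rn2 = 6, so it passes as "rn2 - 1" and is not reported; row 6 (5/4) has rn1 = 6 and rn2 = 5, so it is the one returned.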
In 2008 R2, this would be one way:
DECLARE @Table AS TABLE
(id INT , ladate DATETIME)
INSERT INTO @Table VALUES (1, '2013-05-01')
INSERT INTO @Table VALUES (2, '2013-05-02')
INSERT INTO @Table VALUES (3, '2013-05-03')
INSERT INTO @Table VALUES (4, '2013-05-05')
INSERT INTO @Table VALUES (5, '2013-05-04')
INSERT INTO @Table VALUES (6, '2013-05-06')
INSERT INTO @Table VALUES (7, '2013-05-07')
INSERT INTO @Table VALUES (8, '2013-05-08')
--I added the records in id order; if yours are not, make sure the query sorts them
SELECT t2.ladate FROM @Table T1
INNER JOIN @Table T2 ON T1.Id = T2.Id + 1
INNER JOIN @Table t3 ON t2.id = t3.id + 1
WHERE t3.ladate < t2.ladate AND t2.ladate > t1.ladate
-- I assumed that your ids are consecutive (1,2,3,4,5...) with none missing; if some are missing, you can use row_number() instead
I can't get zero values instead of NULL. Here is my SQL:
SELECT
S.STOCK_ID,
S.PRODUCT_NAME,
SUM(COALESCE(AMOUNT,0)) AMOUNT,
DATEPART(MM,INVOICE_DATE) AY
FROM
#DSN3_ALIAS#.STOCKS S
LEFT OUTER JOIN DAILY_PRODUCT_SALES DPS ON S.STOCK_ID = DPS.PRODUCT_ID
WHERE
MONTH(INVOICE_DATE) >= #attributes.startdate# AND
MONTH(INVOICE_DATE) < #attributes.finishdate+1#
GROUP BY
DATEPART(MM,INVOICE_DATE),
S.STOCK_ID,
S.PRODUCT_NAME
ORDER BY
S.PRODUCT_NAME
and my table:
<cfoutput query="get_sales_total" group="stock_id">
<tr height="20" class="color-row">
<td>#product_name#</td>
<cfoutput group="ay"><td><cfif len(amount)>#amount#<cfelse>0</cfif></td></cfoutput>
</tr>
</cfoutput>
The result I want:
And the result I get:
Thank you all for the help!
+ EDIT :
I have used the cross join technique and rewrote the SQL:
SELECT
SUM(COALESCE(AMOUNT,0)) AMOUNT,S.STOCK_ID,S.PRODUCT_NAME,DPS.AY
FROM
#DSN3_ALIAS#.STOCKS S
CROSS JOIN (SELECT DISTINCT <cfif attributes.time_type eq 2>DATEPART(MM,INVOICE_DATE) AY<cfelse>DATEPART(DD,INVOICE_DATE) AY</cfif>
FROM DAILY_PRODUCT_SALES) DPS
LEFT OUTER JOIN DAILY_PRODUCT_SALES DP ON S.STOCK_ID = DP.PRODUCT_ID AND
<cfif attributes.time_type eq 2>DATEPART(MM,DP.INVOICE_DATE)<cfelse>DATEPART(DD,DP.INVOICE_DATE)</cfif> = DPS.AY
WHERE
<cfif attributes.time_type eq 2>
MONTH(INVOICE_DATE) >= #attributes.startdate# AND
MONTH(INVOICE_DATE) < #attributes.finishdate+1#
<cfelse>
MONTH(INVOICE_DATE) = #attributes.startdate#
</cfif>
<cfif len(trim(attributes.product_cat)) and len(attributes.product_code)>
AND S.STOCK_CODE LIKE '#attributes.product_code#%'
</cfif>
GROUP BY DPS.AY,S.STOCK_ID,S.PRODUCT_NAME
ORDER BY DPS.AY,S.STOCK_ID,S.PRODUCT_NAME
and the result is:
Use CASE instead
SUM(CASE WHEN A IS NULL THEN 0 ELSE A END)
You can do it in the database as Lasse suggested, or you can wrap each output value in a Val function, like so:
<cfoutput group="ay"><td>#Val(amount)#</td></cfoutput>
The Val function will convert any non-numeric value to 0.
Can you use ISNULL instead, i.e.:
SUM(ISNULL(AMOUNT,0)) AMOUNT,
?
EDIT: Okay, given that the problem seems to be missing values rather than nulls as such, try something like this.
First, create a permanent reporting_framework table. This one is based on months and years, but you could extend it to days if you wished.
create table reporting_framework
([month] smallint, [year] smallint);
go
declare @year smallint;
declare @month smallint;
set @year=2000;
while @year<2500
begin
set @month=1;
while @month<13
begin
insert into reporting_framework ([month], [year]) values (@month, @year);
set @month=@month+1;
end
set @year=@year+1;
end
select * from reporting_framework;
(this gives you 6000 rows, from 2000 to 2499 - adjust to taste!)
Now we'll make a table of parts and a table of orders
create table parts
([part_num] integer, [description] varchar(100));
go
insert into parts (part_num, [description]) values (100, 'Widget');
insert into parts (part_num, [description]) values (101, 'Sprocket');
insert into parts (part_num, [description]) values (102, 'Gizmo');
insert into parts (part_num, [description]) values (103, 'Foobar');
create table orders
([id] integer, part_num integer, cost numeric(10,2), orderdate datetime);
go
insert into orders ([id], part_num, cost, orderdate) values
(1, 100, 49.99, '2011-10-30');
insert into orders ([id], part_num, cost, orderdate) values
(2, 101, 109.99, '2011-10-31');
insert into orders ([id], part_num, cost, orderdate) values
(3, 100, 47.99, '2011-10-31');
insert into orders ([id], part_num, cost, orderdate) values
(4, 102, 429.99, '2011-11-01');
insert into orders ([id], part_num, cost, orderdate) values
(5, 101, 111.17, '2011-11-01');
insert into orders ([id], part_num, cost, orderdate) values
(6, 101, 111.17, '2011-11-01');
insert into orders ([id], part_num, cost, orderdate) values
(7, 103, 21.00, '2011-09-15');
Now this is the table you base your query on, eg;
select rf.month, rf.year, p.description, sum(isnull(o.cost,0))
from reporting_framework rf cross join parts p
full outer join orders o
on rf.year=year(o.orderdate) and rf.month=month(o.orderdate)
and p.part_num=o.part_num
where rf.year='2011'
group by p.description, rf.month, rf.year
order by rf.year, rf.month, p.description
Does this example help? There are probably loads of better ways of doing this (hello StackOverflow) but it might get you started thinking about what your problem is.
Note the CROSS JOIN to get all parts/dates combinations and then the FULL OUTER JOIN to get the orders into it.
The 'where' clause is just controlling your date range.
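The reason this produces zeros instead of gaps is that the month/part grid exists before any order is looked at. A small Python sketch of the same idea, using the parts and orders from the example (the variable names are made up, and costs are kept as Decimal to avoid float rounding):

```python
from decimal import Decimal
from itertools import product

parts = {100: "Widget", 101: "Sprocket", 102: "Gizmo", 103: "Foobar"}
# (part_num, cost, year, month) taken from the orders inserted above
orders = [
    (100, "49.99", 2011, 10), (101, "109.99", 2011, 10), (100, "47.99", 2011, 10),
    (102, "429.99", 2011, 11), (101, "111.17", 2011, 11), (101, "111.17", 2011, 11),
    (103, "21.00", 2011, 9),
]

# "CROSS JOIN": one zero-valued cell for every (month, part) combination of 2011.
totals = {(month, parts[p]): Decimal("0") for month, p in product(range(1, 13), parts)}

# "OUTER JOIN + SUM": add each order into its cell; cells without orders stay 0.
for part_num, cost, year, month in orders:
    if year == 2011:
        totals[(month, parts[part_num])] += Decimal(cost)

print(totals[(10, "Widget")], totals[(1, "Widget")])
```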