Should this T-SQL be done using a UNION - sql-server

Using the table below (call it TableA), I need to create an SQL statement that selects two sets of data and combines them. First, I need to select the rows where Status = 1 and DateCreated is greater (meaning newer) than a specified date, which I'll call the StartDate. I also need to select the rows where Status = 0 and DateCreated is also greater than the specified date, BUT with these results sorted by DateCreated in descending order AND limited to 2 records.
So if my table data looks like this:
ID Status DateCreated
1 1 2013-05-01 14:00
2 1 2013-05-01 15:00
3 1 2013-05-01 16:00
4 0 2013-05-01 17:00
5 0 2013-05-01 18:00
6 0 2013-05-01 19:00
7 0 2013-05-01 20:00
and I set the @startDate to 2013-05-01 14:30, I want the result set to look like this:
2 1 2013-05-01 15:00
3 1 2013-05-01 16:00
6 0 2013-05-01 19:00
7 0 2013-05-01 20:00
Is this best done with a UNION that combines two results, or is there a better, more efficient way?

You should benchmark with your real data set for performance differences, but as an alternative you can write it using ROW_NUMBER() instead:
SELECT id, status, datecreated FROM (
SELECT id, status, datecreated,
ROW_NUMBER() OVER (PARTITION BY status ORDER BY DateCreated DESC) rn
FROM Table1 WHERE DateCreated > '2013-05-01 14:30'
) a
WHERE status = 1 OR rn < 3
ORDER BY DateCreated;

No need for UNION - just a WHERE clause translation of your requirements:
declare @t table (ID int not null,Status int not null,DateCreated datetime not null)
insert into @t(ID,Status,DateCreated) values
(1,1,'2013-05-01T14:00:00'),
(2,1,'2013-05-01T15:00:00'),
(3,1,'2013-05-01T16:00:00'),
(4,0,'2013-05-01T17:00:00'),
(5,0,'2013-05-01T18:00:00'),
(6,0,'2013-05-01T19:00:00'),
(7,0,'2013-05-01T20:00:00')
declare @startDate datetime
set @startDate ='2013-05-01T14:30:00'
;With Numbered as (
select *,ROW_NUMBER() OVER (PARTITION BY Status ORDER BY DateCreated desc) as rn
from @t
)
select * from Numbered
where
DateCreated > @startDate and
(
Status = 1 or
Status = 0 and rn <= 2
)
Admittedly, you only need the row numbers for Status 0, but there shouldn't be any harm in running it across all rows.

Try this one -
Query:
DECLARE @temp TABLE
(
ID INT
, [Status] INT
, DateCreated DATETIME
)
INSERT INTO @temp(ID, [Status], DateCreated)
VALUES
(1, 1, '20130501 14:00:00'),
(2, 1, '20130501 15:00:00'),
(3, 1, '20130501 16:00:00'),
(4, 0, '20130501 17:00:00'),
(5, 0, '20130501 18:00:00'),
(6, 0, '20130501 19:00:00'),
(7, 0, '20130501 20:00:00')
DECLARE @startDate DATETIME = '20130501 14:30:00'
SELECT
ID
, [Status]
, DateCreated
FROM (
SELECT
ID
, [Status]
, DateCreated
, ROW_NUMBER() OVER (PARTITION BY [Status] ORDER BY DateCreated DESC) AS rn
FROM @temp
) t
WHERE DateCreated > @startDate
AND (
[Status] = 1 -- keep every Status = 1 row ([Status] % 1 is always 0, so the original test never matched)
OR
rn < 3 -- otherwise keep only the two newest rows
)
ORDER BY t.DateCreated
Output:
ID Status DateCreated
----------- ----------- -----------------------
2 1 2013-05-01 15:00:00.000
3 1 2013-05-01 16:00:00.000
6 0 2013-05-01 19:00:00.000
7 0 2013-05-01 20:00:00.000
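For comparison with the ROW_NUMBER() answers, the UNION approach the question asks about could look like the sketch below. It assumes SQL Server 2008 or later and uses a hypothetical table variable @TableA as a stand-in for the real table. UNION ALL is used rather than UNION because the two branches cannot overlap (their Status values differ), so no duplicate removal is needed.

```sql
-- Hypothetical stand-in for TableA, loaded with the sample rows from the question
DECLARE @TableA TABLE (ID int NOT NULL, Status int NOT NULL, DateCreated datetime NOT NULL);
INSERT INTO @TableA (ID, Status, DateCreated) VALUES
(1, 1, '2013-05-01T14:00:00'),
(2, 1, '2013-05-01T15:00:00'),
(3, 1, '2013-05-01T16:00:00'),
(4, 0, '2013-05-01T17:00:00'),
(5, 0, '2013-05-01T18:00:00'),
(6, 0, '2013-05-01T19:00:00'),
(7, 0, '2013-05-01T20:00:00');

DECLARE @startDate datetime = '2013-05-01T14:30:00';

-- Branch 1: every Status = 1 row newer than the start date
SELECT ID, Status, DateCreated
FROM @TableA
WHERE Status = 1 AND DateCreated > @startDate
UNION ALL
-- Branch 2: only the two newest Status = 0 rows newer than the start date.
-- TOP with ORDER BY is wrapped in a derived table so it applies to this branch only.
SELECT ID, Status, DateCreated
FROM (
    SELECT TOP (2) ID, Status, DateCreated
    FROM @TableA
    WHERE Status = 0 AND DateCreated > @startDate
    ORDER BY DateCreated DESC
) AS newest
ORDER BY DateCreated;   -- sorts the combined result
```

With the sample data this should return IDs 2, 3, 6 and 7. Whether it beats the single-scan ROW_NUMBER() form depends on indexing and data volume, so it is worth comparing the execution plans on real data.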

Related

SQL query to count the number of records in a time range?

Suppose you have a table with a user ID and a date+time (for simplicity in steps of 1 hour).
The table here is ordered by user and timestamp.
Usr Date Comment
1 2022-11-29 12:00 <- Start of a sequence
1 2022-11-29 13:00
1 2022-11-29 14:00
1 2022-11-30 12:00 <- Start of a sequence
1 2022-11-30 16:00 <- Start of a sequence
2 2022-11-29 22:00 <- Start of a sequence
2 2022-11-29 23:00
2 2022-11-30 00:00 <- Start of a sequence
2 2022-11-30 01:00
3 2022-11-29 13:00 <- Start of a sequence
3 2022-11-29 14:00
3 2022-11-30 12:00 <- Start of a sequence
3 2022-11-30 13:00
3 2022-11-30 14:00
4 2022-11-30 12:00 <- Start of a sequence
4 2022-11-30 13:00
4 2022-11-30 14:00
5 2022-11-30 16:00 <- Start of a sequence
Expected result is the start of a sequence and its length.
For simplicity each gap is 1 hour.
The start of a new day (00:00) always starts a new sequence
Usr Date Length
1 2022-11-29 12:00 3
1 2022-11-30 12:00 1
1 2022-11-30 16:00 1
2 2022-11-29 22:00 2
2 2022-11-30 00:00 2
3 2022-11-29 13:00 2
3 2022-11-30 12:00 3
4 2022-11-30 12:00 3
5 2022-11-30 16:00 1
I found some code samples with dense_rank and row_number but didn't get the expected result.
I have a solution that loops over each record in the source table and builds the result table, but it is slow.
The query has to run on SQL Server 2012 or later.
This problem has many solutions; which one to choose depends on the query performance you need.
If speed matters, I would recommend calculating the duration at the moment of recording.
Example of a query that returns the required result:
IF OBJECT_ID(N'dbo.test', N'U') IS NOT NULL DROP TABLE dbo.test; -- DROP TABLE IF EXISTS needs SQL Server 2016+
CREATE TABLE dbo.test
(
Usr INT, [Date] DATETIME);
DECLARE @gap INT = 1;
INSERT INTO dbo.test (Usr, [Date])
VALUES (1, '2022-11-29T12:00:00')
, (1, '2022-11-29T13:00:00')
, (1, '2022-11-29T14:00:00')
, (1, '2022-11-30T12:00:00')
, (1, '2022-11-30T16:00:00')
, (2, '2022-11-29T22:00:00')
, (2, '2022-11-29T23:00:00')
, (2, '2022-11-30T00:00:00')
, (2, '2022-11-30T01:00:00')
, (3, '2022-11-29T13:00:00')
, (3, '2022-11-29T14:00:00')
, (3, '2022-11-30T12:00:00')
, (3, '2022-11-30T13:00:00')
, (3, '2022-11-30T14:00:00')
, (4, '2022-11-30T12:00:00')
, (4, '2022-11-30T13:00:00')
, (4, '2022-11-30T14:00:00')
, (5, '2022-11-30T16:00:00');
WITH lag_cte AS
(
SELECT *
, LAG([Date], 1, [Date]) OVER (PARTITION BY Usr, CAST([Date] AS DATE) ORDER BY [Date]) lead_date --previous time by usr and date
, DATEADD(HOUR, -@gap, [Date]) gap_date --calc same group time for comparison
, ROW_NUMBER() OVER (ORDER BY Usr, [Date]) rn --sort of identity
FROM dbo.test
)
SELECT lc.Usr
, MIN(lc.[Date]) AS [date]
, COUNT(1) AS [length]
FROM lag_cte lc
OUTER APPLY (--previous start of sequence
SELECT TOP 1 rn AS grouping_rn
FROM lag_cte li
WHERE li.Usr = lc.Usr
AND li.[Date] <= lc.[Date]
AND li.lead_date != li.gap_date --sequence start marker
ORDER BY li.[Date] DESC
) g
GROUP BY lc.usr, CAST(lc.[Date] AS DATE), g.grouping_rn
As @Arzanis mentioned, there are many solutions that provide the desired result. The example below should perform reasonably well with a composite primary key on Usr and Date.
WITH time_sequences AS (
SELECT
Usr
,Date
,LAG(Usr) OVER(PARTITION BY Usr ORDER BY Date) AS PrevUsr
,LAG(Date) OVER(PARTITION BY Usr ORDER BY Date) AS PrevDate
,LEAD(Usr) OVER(PARTITION BY Usr ORDER BY Date) AS NextUsr
,LEAD(Date) OVER(PARTITION BY Usr ORDER BY Date) AS NextDate
FROM dbo.test
)
,start_sequences AS (
SELECT
Usr
,Date
,'start' AS comment
,ROW_NUMBER() OVER(PARTITION BY Usr ORDER BY Date) AS seq
FROM time_sequences
WHERE PrevUsr IS NULL OR PrevDate <> DATEADD(hour, -1, Date) OR CAST(Date AS date) <> CAST(PrevDate AS date)
)
,end_sequences AS (
SELECT
Usr
,Date
,'end' AS comment
,ROW_NUMBER() OVER(PARTITION BY Usr ORDER BY Date) AS seq
FROM time_sequences
WHERE NextUsr IS NULL OR NextDate <> DATEADD(hour, 1, Date) OR CAST(Date AS date) <> CAST(NextDate AS date)
)
SELECT ss.Usr, ss.Date, DATEDIFF(hour, ss.Date, es.Date) + 1 AS SeqLength
FROM start_sequences AS ss
JOIN end_sequences AS es ON es.Usr = ss.Usr AND es.seq = ss.seq
ORDER BY
es.Usr
, es.Date;
your data
CREATE TABLE mytable(
Usr INTEGER NOT NULL
,Date DATEtime NOT NULL
,Comment VARCHAR(100)
);
INSERT INTO mytable(Usr,Date,Comment) VALUES
(1,'2022-11-29 12:00','<- Start of a sequence'),
(1,'2022-11-29 13:00',NULL),
(1,'2022-11-29 14:00',NULL),
(1,'2022-11-30 12:00','<- Start of a sequence'),
(1,'2022-11-30 16:00','<- Start of a sequence'),
(2,'2022-11-29 22:00','<- Start of a sequence'),
(2,'2022-11-29 23:00',NULL),
(2,'2022-11-30 00:00','<- Start of a sequence'),
(2,'2022-11-30 01:00',NULL),
(3,'2022-11-29 13:00','<- Start of a sequence'),
(3,'2022-11-29 14:00',NULL),
(3,'2022-11-30 12:00','<- Start of a sequence'),
(3,'2022-11-30 13:00',NULL),
(3,'2022-11-30 14:00',NULL),
(4,'2022-11-30 12:00','<- Start of a sequence'),
(4,'2022-11-30 13:00',NULL),
(4,'2022-11-30 14:00',NULL),
(5,'2022-11-30 16:00','<- Start of a sequence');
your query
select usr,
comment,
count(comment) Length
From (
SELECT
usr
,CASE
WHEN Comment IS NULL THEN (
SELECT TOP 1
cast(inner_table.Date as varchar(100))
FROM
mytable as inner_table
WHERE
inner_table.Usr = mytable.Usr
AND inner_table.Date < mytable.Date
AND inner_table.Comment IS NOT NULL
ORDER BY
inner_table.Date DESC
)
ELSE
Date
END as Comment
FROM
mytable) a
group by usr,comment
order by usr,comment
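Another common way to express this kind of "gaps and islands" problem is to flag each row that starts a sequence and then take a running sum of the flags as a sequence number. The sketch below is not taken from any of the answers; it assumes SQL Server 2012 or later (for LAG and the windowed SUM) and uses a hypothetical table variable @test holding only the rows of users 1 and 2.

```sql
-- Hypothetical sample: the rows of users 1 and 2 from the question
DECLARE @test TABLE (Usr int NOT NULL, [Date] datetime NOT NULL);
INSERT INTO @test (Usr, [Date]) VALUES
(1, '2022-11-29T12:00:00'), (1, '2022-11-29T13:00:00'), (1, '2022-11-29T14:00:00'),
(1, '2022-11-30T12:00:00'), (1, '2022-11-30T16:00:00'),
(2, '2022-11-29T22:00:00'), (2, '2022-11-29T23:00:00'),
(2, '2022-11-30T00:00:00'), (2, '2022-11-30T01:00:00');

WITH with_prev AS (
    -- previous timestamp of the same user (NULL for the user's first row)
    SELECT Usr, [Date],
           LAG([Date]) OVER (PARTITION BY Usr ORDER BY [Date]) AS PrevDate
    FROM @test
),
flagged AS (
    -- a row continues a sequence only if the previous row is exactly one hour
    -- earlier AND on the same calendar day; otherwise it starts a new one
    SELECT Usr, [Date],
           CASE WHEN PrevDate = DATEADD(HOUR, -1, [Date])
                 AND CAST(PrevDate AS date) = CAST([Date] AS date)
                THEN 0 ELSE 1 END AS IsStart
    FROM with_prev
),
grouped AS (
    -- running count of start flags = sequence number within the user
    SELECT Usr, [Date],
           SUM(IsStart) OVER (PARTITION BY Usr ORDER BY [Date]
                              ROWS UNBOUNDED PRECEDING) AS SeqNo
    FROM flagged
)
SELECT Usr, MIN([Date]) AS SeqStart, COUNT(*) AS SeqLength
FROM grouped
GROUP BY Usr, SeqNo
ORDER BY Usr, SeqStart;
```

For these nine rows the query should produce the five sequences the question lists for users 1 and 2 (lengths 3, 1, 1 and 2, 2). Counting rows instead of using DATEDIFF also keeps the length correct if the gap size ever changes.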

Add a Total row after each week

I am searching for a way to sum columns by week.
This is the initial table data.
Date WeekNo Col1 Col2 Col3
2020/07/01 27 1 4 3
2020/07/04 27 3 3 1
2020/07/06 28 1 1 1
2020/07/11 28 1 3 8
and I want to add a row total to every end of the week like this:
Date WeekNo Col1 Col2 Col3
2020/07/01 27 1 4 3
2020/07/04 27 3 3 1
TOTAL 27 4 7 4
2020/07/06 28 1 1 1
2020/07/11 28 1 3 8
TOTAL 28 2 4 9
I tried something similar to the code below, but I have multiple columns to sum.
Do you have other ideas, suggestions?
Also, grouping sets does not create a new row if there is no data for that week to sum (like 0 or NULL).
SELECT
YEAR(Date) AS OrderYear,
MONTH(Date) AS OrderMonth,
SUM(Col1) AS SumCol1
FROM tb
GROUP BY
GROUPING SETS
(
YEAR(Date), --1st grouping set
(YEAR(Date),MONTH(Date)) --2nd grouping set
)
Is this what you are after?
My solution uses the built-in WITH ROLLUP option of the GROUP BY clause.
-- create sample data
declare @data table
(
[Date] Date,
WeekNo int,
Col1 int,
Col2 int,
Col3 int
);
insert into @data ([Date], WeekNo, Col1, Col2, Col3) values
('2020-07-01', 27, 1, 4, 3),
('2020-07-04', 27, 3, 3, 1),
('2020-07-06', 28, 1, 1, 1),
('2020-07-11', 28, 1, 3, 8);
-- solution
select case when grouping(d.Date) = 0
then convert(nvarchar(10), d.Date) -- type conversion so all column values have the same type
else 'TOTAL'
end as 'Date',
d.WeekNo,
sum(d.Col1) as 'Col1',
sum(d.Col2) as 'Col2',
sum(d.Col3) as 'Col3'
from @data d
group by d.WeekNo, d.Date with rollup -- "roll up" the aggregations
having grouping(d.WeekNo) = 0; -- filter out aggregation across weeks
Gives me:
Date WeekNo Col1 Col2 Col3
---------- ----------- ----------- ----------- -----------
2020-07-01 27 1 4 3
2020-07-04 27 3 3 1
TOTAL 27 4 7 4
2020-07-06 28 1 1 1
2020-07-11 28 1 3 8
TOTAL 28 2 4 9
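The output above happens to come back in the desired order, but without an ORDER BY clause SQL Server does not guarantee any row order. A possible variant, sketched here under the same sample data, states the ordering explicitly and uses GROUPING SETS in place of WITH ROLLUP plus HAVING. The table variable @data and the output column name DateLabel are illustrative choices, not part of the original answer.

```sql
DECLARE @data TABLE ([Date] date, WeekNo int, Col1 int, Col2 int, Col3 int);
INSERT INTO @data ([Date], WeekNo, Col1, Col2, Col3) VALUES
('2020-07-01', 27, 1, 4, 3),
('2020-07-04', 27, 3, 3, 1),
('2020-07-06', 28, 1, 1, 1),
('2020-07-11', 28, 1, 3, 8);

SELECT CASE WHEN GROUPING(d.[Date]) = 1 THEN 'TOTAL'
            ELSE CONVERT(varchar(10), d.[Date], 111)   -- style 111 = yyyy/mm/dd
       END AS DateLabel,
       d.WeekNo,
       SUM(d.Col1) AS Col1,
       SUM(d.Col2) AS Col2,
       SUM(d.Col3) AS Col3
FROM @data AS d
GROUP BY GROUPING SETS (
    (d.WeekNo, d.[Date]),   -- detail rows
    (d.WeekNo)              -- one subtotal row per week
)
ORDER BY d.WeekNo,              -- weeks in order
         GROUPING(d.[Date]),    -- detail rows (0) before the TOTAL row (1)
         d.[Date];              -- detail rows by date
```

Sorting on GROUPING(d.[Date]) is what pushes each TOTAL row to the end of its week; sorting on the date alone would put it first, because the subtotal row has a NULL date.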
You only need to add (DATEPART(wk, Date)) as a grouping set after the one listing the other non-aggregated columns Date, Col1, Col2, Col3. By the way, you do not need a WeekNo column, since it can be computed with DATEPART(wk, Date).
So far so good, but ordering is tricky because the subtotal rows return NULL for the WeekNo column. I used two analytic functions (ROW_NUMBER() and FIRST_VALUE()) to overcome this problem:
SELECT COALESCE(Date,'TOTAL') As Date,
DATEPART(wk, Date) AS WeekNo,
SUM(Col1) AS Col1,
SUM(Col2) AS Col2,
SUM(Col3) AS Col3
FROM tb
GROUP BY GROUPING SETS ( ( Date,Col1, Col2, Col3 ),
(DATEPART(wk, Date))
)
ORDER BY CASE WHEN DATEPART(wk, Date) IS NULL
THEN
ROW_NUMBER() OVER (PARTITION BY DATEPART(wk, Date)
ORDER BY COALESCE(Date,'TOTAL') )
ELSE
DATEPART(wk, Date) -
FIRST_VALUE(DATEPART(wk, Date))
OVER ( ORDER BY COALESCE(Date,'TOTAL') ) + 1
END,
DATEPART(wk, Date)

SQL Server: fill a range with dates from overlapping intervals with priority

I need to fill the range from 2017-04-01 to 2017-04-30 with the data from this table, knowing that the highest priority records should prevail over those with lower priorities
id startValidity endValidity priority
-------------------------------------------
1004 2017-04-03 2017-04-30 1
1005 2017-04-10 2017-04-22 2
1010 2017-04-19 2017-04-23 3
1006 2017-04-24 2017-04-28 2
1008 2017-04-26 2017-04-28 3
In practice I would need to get a result like this:
id startValidity endValidity priority
--------------------------------------------
1004 2017-04-03 2017-04-09 1
1005 2017-04-10 2017-04-18 2
1010 2017-04-19 2017-04-23 3
1006 2017-04-24 2017-04-25 2
1008 2017-04-26 2017-04-28 3
1004 2017-04-29 2017-04-30 1
I can't think of a more elegant or more efficient solution right now:
-- Sample Table
declare @tbl table
(
id int,
startValidity date,
endValidty date,
priority int
)
-- Sample Data
insert into @tbl select 1004, '2017-04-03', '2017-04-30', 1
insert into @tbl select 1005, '2017-04-10', '2017-04-22', 2
insert into @tbl select 1010, '2017-04-19', '2017-04-23', 3
insert into @tbl select 1006, '2017-04-24', '2017-04-28', 2
insert into @tbl select 1008, '2017-04-26', '2017-04-28', 3
-- Query
; with
date_range as -- find the min and max date for generating list of dates
(
select start_date = min(startValidity), end_date = max(endValidty)
from @tbl
),
dates as -- gen the list of dates using recursive CTE
(
select rn = 1, date = start_date
from date_range
union all
select rn = rn + 1, date = dateadd(day, 1, d.date)
from dates d
where d.date < (select end_date from date_range)
),
cte as -- for each date, get the ID based on priority
(
-- numbering per id in date order keeps grp deterministic: consecutive dates of one id share a grp value
select *, grp = row_number() over(partition by t.id order by d.date) - rn
from dates d
outer apply
(
select top 1 x.id, x.priority
from @tbl x
where x.startValidity <= d.date
and x.endValidty >= d.date
order by x.priority desc
) t
)
-- final result
select id, startValidity = min(date), endValidty = max(date), priority
from cte
group by grp, id, priority
order by startValidity
I do not understand the purpose of the calendar CTE or table,
so I am not using any recursive CTE or calendar.
Maybe I haven't understood the requirement completely.
Try this with different sample data:
declare @tbl table
(
id int,
startValidity date,
endValidty date,
priority int
)
-- Sample Data
insert into @tbl select 1004, '2017-04-03', '2017-04-30', 1
insert into @tbl select 1005, '2017-04-10', '2017-04-22', 2
insert into @tbl select 1010, '2017-04-19', '2017-04-23', 3
insert into @tbl select 1006, '2017-04-24', '2017-04-28', 2
insert into @tbl select 1008, '2017-04-26', '2017-04-28', 3
;With CTE as
(
select * ,ROW_NUMBER()over(order by startValidity)rn
from @tbl
)
,CTE1 as
(
select c.id,c.startvalidity,isnull(dateadd(day,-1, c1.startvalidity)
,c.endValidty) Endvalidity
,c.[priority],c.rn
from cte c
left join cte c1
on c.rn+1=c1.rn
)
select id,startvalidity,Endvalidity,priority from cte1
union ALL
select id,startvalidity,Endvalidity,priority from
(
select top 1 id,ca.startvalidity,ca.Endvalidity,priority from cte1
cross apply(
select top 1
dateadd(day,1,endvalidity) startvalidity
,dateadd(day,-1,dateadd(month, datediff(month,0,endvalidity)+1,0)) Endvalidity
from cte1
order by rn desc)CA
order by priority
)t4
--order by startvalidity --if req

TSQL for Repeating rows within table

I have single table with below fields.
id name startdate enddate
1 u1 2013-01-15 00:00:00.000 2013-01-17 00:00:00.000
2 u2 2013-01-22 00:00:00.000 2013-01-23 00:00:00.000
3 u3 2013-01-23 00:00:00.000 2013-01-23 00:00:00.000
Now I want multiple rows per record, depending on the start and end dates. For the rows above, the first record should return three rows, since dates 15 to 17 span 3 days.
I am a bit confused about how to write the query. Is there a good way, or a sample, to achieve this?
Thanks.
You could use a CTE to solve that:
DECLARE @Id int
SELECT @Id = 1
;
WITH Multiple AS
(
SELECT 1 Sequence, Id, Name, StartDate, EndDate
FROM ( VALUES
(1, 'u1', '2013-01-15', '2013-01-17'),
(2, 'u2', '2013-01-22', '2013-01-23'),
(3, 'u3', '2013-01-23', '2013-01-23')
) AS Sample(Id, Name, StartDate, EndDate)
WHERE Id = @Id
UNION ALL
SELECT Sequence + 1, Id, Name, StartDate, EndDate
FROM Multiple
WHERE Id = @Id AND DATEADD(d, Sequence, StartDate) <= EndDate
)
SELECT *
FROM Multiple
If you have a 'Dates' table with a 'Date' column in there, just join your table to the 'Dates' on 'Dates.Date BETWEEN startdate AND enddate'.
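The 'Dates' table join described above might look like the following sketch. Both table variables are hypothetical stand-ins: @Sample holds the three rows from the question, and @Dates is filled with every day of January 2013 by numbering rows of sys.all_objects (any row source with at least 31 rows would do).

```sql
-- Hypothetical copy of the question's table
DECLARE @Sample TABLE (id int, name varchar(10), startdate datetime, enddate datetime);
INSERT INTO @Sample (id, name, startdate, enddate) VALUES
(1, 'u1', '20130115', '20130117'),
(2, 'u2', '20130122', '20130123'),
(3, 'u3', '20130123', '20130123');

-- Hypothetical Dates table covering January 2013
DECLARE @Dates TABLE ([Date] datetime NOT NULL PRIMARY KEY);

WITH n AS (
    -- 1, 2, 3, ... one number per row of the system catalog view
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects
)
INSERT INTO @Dates ([Date])
SELECT DATEADD(DAY, CAST(i AS int) - 1, '20130101')
FROM n
WHERE i <= 31;

-- One output row per calendar day inside each record's date range
SELECT s.id, s.name, d.[Date]
FROM @Sample AS s
JOIN @Dates AS d
  ON d.[Date] BETWEEN s.startdate AND s.enddate
ORDER BY s.id, d.[Date];
```

This should return three rows for u1 (15th to 17th), two for u2 and one for u3. A permanent calendar table avoids regenerating the dates on every run and, unlike a recursive CTE, is not subject to the MAXRECURSION limit on long ranges.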
I'm not entirely sure, but here is what I tried:
SELECT c1.*
FROM master..spt_values p
INNER JOIN tempQuery c1
ON RIGHT(CAST(c1.stdate AS DATE),2) <= (CASE WHEN p.number = 0 THEN 1 ELSE p.number END)
WHERE TYPE='p'
AND p.number BETWEEN RIGHT(CAST(c1.stdate AS DATE),2) AND RIGHT(CAST(c1.endate AS DATE),2)

SQL - 2 Counts in one query

I have 2 queries which return counts of different information in a table:
SELECT Date, COUNT(*) AS Total
FROM Table
WHERE Type = 7 AND Date >= '2010-01-01'
GROUP BY Date
HAVING COUNT(*) > 5000
ORDER BY Date
which returns the totals for all of the 'busy' dates:
Date Total
---------- -----------
2010-01-05 9466
2010-02-02 8747
2010-03-02 9010
2010-04-06 7916
2010-05-05 9342
2010-06-02 8723
2010-07-02 7829
2010-08-03 8411
2010-09-02 7687
2010-10-04 7706
2010-11-02 8567
2010-12-02 7645
and
SELECT Date, COUNT(*) AS Failures
FROM Table
WHERE Type = 7 AND ErrorCode = -2 AND Date >= '2010-01-01'
GROUP BY Date
ORDER BY Date
which returns the total failures (all of which happened on busy dates):
Date Failures
---------- -----------
2010-09-02 29
2010-10-04 16
2010-11-02 8
Is it possible to combine these into a single query to return one result?
E.g.:
Date Total Failures
---------- ----------- -----------
2010-01-05 9466
2010-02-02 8747
2010-03-02 9010
2010-04-06 7916
2010-05-05 9342
2010-06-02 8723
2010-07-02 7829
2010-08-03 8411
2010-09-02 7687 29
2010-10-04 7706 16
2010-11-02 8567 8
2010-12-02 7645
;With baseData As
(
SELECT
Date,
COUNT(*) AS Total,
COUNT(CASE WHEN ErrorCode = -2 THEN 1 END) AS Failures
FROM Table
WHERE Type = 7 AND Date >= '2010-01-01'
GROUP BY Date
)
SELECT
Date,
Total,
Failures,
CAST(Failures AS float)/Total AS Ratio
FROM baseData
WHERE Total > 5000 OR Failures > 0
ORDER BY Date
If you can refactor to the same where clause, this should be possible.
I haven't taken your HAVING(Count()) into consideration
SELECT [Date], COUNT(*) AS Total, SUM(CASE WHEN ErrorCode = -2 THEN 1 ELSE 0 END) AS Failures
FROM [Table]
WHERE [Type] = 7 AND [Date] >= '2010-01-01'
GROUP BY [Date]
ORDER BY [Date]
Edit: here is some test data
create table [Table]
(
[ErrorCode] int,
[Type] int,
[Date] datetime
)
insert into [table]([Date], [Type], [ErrorCode] )values ('1 Jan 2010', 7, 0)
insert into [table]([Date], [Type], [ErrorCode] )values ('1 Jan 2010', 7, -2)
insert into [table]([Date], [Type], [ErrorCode] )values ('2 Jan 2010', 7, -2)
insert into [table]([Date], [Type], [ErrorCode] )values ('2 Jan 2010', 8, -2)
insert into [table]([Date], [Type], [ErrorCode] )values ('2 Jan 2010', 7, 1)
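To bring back the "busy dates only" filter that this answer leaves out, a HAVING clause can be added to the same conditional-aggregation query. The sketch below is an illustration under assumptions: the table variable @Log and the @minTotal threshold are hypothetical (the question uses 5000; a small value is used here so the five test rows return something), and NULLIF turns a zero failure count into NULL to mimic the blank cells in the desired output.

```sql
-- Hypothetical stand-in for [Table], loaded with the test rows above
DECLARE @Log TABLE ([Date] datetime, [Type] int, ErrorCode int);
INSERT INTO @Log ([Date], [Type], ErrorCode) VALUES
('20100101', 7,  0),
('20100101', 7, -2),
('20100102', 7, -2),
('20100102', 8, -2),
('20100102', 7,  1);

DECLARE @minTotal int = 1;   -- assumption for the demo; the question uses 5000

SELECT [Date],
       COUNT(*) AS Total,
       -- count only the failures; show NULL instead of 0 when there are none
       NULLIF(SUM(CASE WHEN ErrorCode = -2 THEN 1 ELSE 0 END), 0) AS Failures
FROM @Log
WHERE [Type] = 7 AND [Date] >= '20100101'
GROUP BY [Date]
HAVING COUNT(*) > @minTotal   -- keep only the 'busy' dates
ORDER BY [Date];
```

Note that this drops failures that occur on non-busy dates. That matches the question's statement that all failures happened on busy dates, but it differs from filtering on Total > 5000 OR Failures > 0.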
Yes, you should be able to do a UNION ALL between the two queries.
