I am stuck with a problem related to SQL while implementing data densification.
I have a table containing weather data for different stations. The table updates every 10 minutes, but sometimes not all of the station data is available after an update. Additionally, I want to refine its granularity to 1 minute, i.e. every data point should be available for every minute.
I have implemented a CTE that builds a DateTime column with 1-minute granularity. Now I need to populate the NULL values with the station names and the sum of the rainfall data with respect to each station name and its previous data.
I am trying to use window functions with PARTITION BY, but it's not working out yet.
The SQL query is this:
DECLARE @FromDate DateTime = '2021-01-01 00:00:00.000',
@ToDate DateTime = '2021-08-01 00:00:00.000'
;WITH DateCte (DateTime) AS
(
SELECT @FromDate
UNION ALL
SELECT DATEADD(MINUTE, 1, DateTime)
FROM DateCte
WHERE DateTime < @ToDate
)
SELECT
DateTime,
DATEADD(mi, DATEDIFF(mi, 0, [LastObservations].[Local_Time]), 0) AS [Local Time],
[LastObservations].[Name_En],
[LastObservations].[Rainfall] AS [Value]
FROM
DateCte
LEFT OUTER JOIN
[ODW].[QMD].[LastObservations] ON DateCte.DateTime = DATEADD(mi, DATEDIFF(mi, 0, [LastObservations].[Local_Time]), 0)
AND [LastObservations].[Local_Time] >= '2021-01-01 00:00:00.000'
AND [LastObservations].[Local_Time] <= '2021-08-01 00:00:00.000'
ORDER BY
DateCte.DateTime
OPTION (MaxRecursion 0)
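A minimal sketch of one way this could be approached, not a tested solution for the real table. The idea is to cross join the minute series with the distinct station names so that every station gets a row for every minute, and then apply a windowed SUM partitioned by station. The table variable @Obs and its sample rows are hypothetical stand-ins for [ODW].[QMD].[LastObservations], and reading "sum of rainfall" as a per-station running total is an assumption.

```sql
-- Hypothetical sample data standing in for [ODW].[QMD].[LastObservations]
DECLARE @Obs TABLE (Name_En varchar(50), Local_Time datetime, Rainfall decimal(9,2));
INSERT INTO @Obs (Name_En, Local_Time, Rainfall)
VALUES ('Station A', '2021-01-01 00:00:00', 1.0),
       ('Station A', '2021-01-01 00:02:00', 0.5),
       ('Station B', '2021-01-01 00:01:00', 2.0);

-- Short range so the demo stays small
DECLARE @FromDate datetime = '2021-01-01 00:00:00',
        @ToDate   datetime = '2021-01-01 00:03:00';

WITH DateCte ([DateTime]) AS
(
    SELECT @FromDate
    UNION ALL
    SELECT DATEADD(MINUTE, 1, [DateTime])
    FROM DateCte
    WHERE [DateTime] < @ToDate
),
Stations AS
(
    -- One row per station, so the station name is never NULL after the join
    SELECT DISTINCT Name_En FROM @Obs
)
SELECT d.[DateTime],
       s.Name_En,
       o.Rainfall AS [Value],   -- NULL where no observation exists for that minute
       -- Assumed meaning of "sum": running total per station up to this minute
       SUM(ISNULL(o.Rainfall, 0)) OVER (PARTITION BY s.Name_En
                                        ORDER BY d.[DateTime]
                                        ROWS UNBOUNDED PRECEDING) AS RunningRainfall
FROM DateCte d
CROSS JOIN Stations s
LEFT JOIN @Obs o
       ON o.Name_En = s.Name_En
      AND DATEADD(mi, DATEDIFF(mi, 0, o.Local_Time), 0) = d.[DateTime]
ORDER BY s.Name_En, d.[DateTime]
OPTION (MAXRECURSION 0);
```

If a station can report more than once within the same minute, this would return one row per observation, so the observations would need to be aggregated per minute first.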
Related
I have a subscription table with a user ID, a subscription start date and a subscription end date. I also have a calendar table with a datestamp field, that is every single date starting from the first subscription date in my subscription table.
I am trying to write something that would give me a table with a date column and three numbers: number of total active (on that day), number of new subscribers, number of unsubscribers.
(N.B. I tried to insert sample tables using the suggested GitHub Flavoured Markdown but it just all goes into one row.)
Currently I am playing with a query that creates multiple joins between the two tables, one for each number:
select a.datestamp
,count(distinct case when b_sub.UserID is not null then b_sub.UserID end) as total_w_subscription
,count(distinct case when b_in.UserID is not null then b_in.UserID end) as total_subscribed
,count(distinct case when b_out.UserID is not null then b_out.UserID end) as total_unsubscribed
from Calendar as a
left join Subscription as b_sub -- all those with subscription on given date
on b_sub.sub_dt <= a.datestamp
and (b_sub.unsub_dt > a.datestamp or b_sub.unsub_dt is null)
left join Subscription as b_in -- all those that subscribed on given date
on b_in.sub_dt = a.datestamp
left join Subscription as b_out -- all those that unsubscribed on given date
on b_out.unsub_dt = a.datestamp
where a.datestamp > '2021-06-10'
group by a.datestamp
order by datestamp asc
;
I have indexed the date fields in both tables. If I only look at one day, it runs in 3 seconds. Two days already takes forever. The Sub table is over 2.6M records and ideally I'll need my timeline to begin sometime in 2012.
What would be the most time efficient way to do this?
You're on the right track. I created some table variables and assumed a data structure that has each subscription include a start and end date.
--Create @dates table variable for calendar
DECLARE @startDate DATETIME = '2018-01-01'
DECLARE @endDate DATETIME = '2021-06-18'
DECLARE @dates TABLE
(
reportingdate DATETIME
)
WHILE @startDate <= @endDate
BEGIN
INSERT INTO @dates SELECT @startDate
SET @startDate += 1
END
--Create @subscriptions table variable for subscriptions to join onto calendar
DECLARE @subscriptions TABLE
(
id INT
,startDate DATETIME
,endDate DATETIME
)
INSERT INTO @subscriptions
VALUES
(1,'2018-01-01 00:00:00.000','2019-10-07 00:00:00.000')
,(2,'2018-01-11 00:00:00.000','2019-12-21 00:00:00.000')
,(3,'2019-04-21 00:00:00.000','2020-03-19 00:00:00.000')
,(4,'2019-12-09 00:00:00.000','2020-05-14 00:00:00.000')
,(5,'2020-04-26 00:00:00.000','2020-07-06 00:00:00.000')
,(6,'2020-05-02 00:00:00.000',NULL)
,(7,'2020-08-31 00:00:00.000','2020-10-29 00:00:00.000')
,(8,'2020-12-13 00:00:00.000','2021-01-13 00:00:00.000')
,(9,'2021-02-12 00:00:00.000','2021-04-19 00:00:00.000')
,(10,'2021-06-10 00:00:00.000',NULL)
;
Then I join the subscription onto the calendar table.
--CTE to join subscription onto calendar and use ROW_NUMBER functions
WITH cte AS (
SELECT
s.id AS SubID
,d.ReportingDate
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate) AS asc_rn --used to identify 1st
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate DESC) AS desc_rn --used to identify last
,CASE WHEN s.endDate IS NULL THEN 1 ELSE 0 END AS ActiveSub
FROM @subscriptions s
LEFT JOIN @dates d ON
d.reportingdate BETWEEN s.startDate AND ISNULL(s.endDate,'9999-12-31')
)
I used ROW_NUMBER to identify the first and last date rows of the subscription, as well as checking if the subscription endDate is NULL (still active). I then query the CTE to count subscriptions grouped by day, as well as summing new and terminated subscriptions grouped by day.
--Query CTE using asc_rn, desc_rn, and ActiveSub to identify new subscribers and unsubscribers.
SELECT
ReportingDate
,COUNT(*) AS TotalSubscribers
,SUM(CASE WHEN asc_rn = 1 THEN 1 ELSE 0 END) AS NewSubscribers
,SUM(CASE WHEN desc_rn = 1 AND ActiveSub = 0 THEN 1 ELSE 0 END) AS UnSubscribers
FROM cte
GROUP BY ReportingDate
ORDER BY ReportingDate
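Since the question asks about speed on a 2.6M-row table, here is a hedged sketch of a different technique from the ROW_NUMBER approach above: treat each subscription as a "+1" event on its start date and a "-1" event on its end date, aggregate the events per day, and take a running sum. This avoids expanding every subscription into one row per active day. The table variables @subs and @cal and their rows are hypothetical, and the sketch assumes the calendar starts on or before the first subscription date, as the question states. It follows the question's definition, where a user is no longer active on the unsubscribe date itself; I have not benchmarked it.

```sql
-- Hypothetical sample data
DECLARE @subs TABLE (id int, startDate date, endDate date);
INSERT INTO @subs (id, startDate, endDate)
VALUES (1, '2021-06-01', '2021-06-03'),
       (2, '2021-06-02', NULL),
       (3, '2021-06-02', '2021-06-04');

DECLARE @cal TABLE (reportingdate date);
INSERT INTO @cal (reportingdate)
VALUES ('2021-06-01'), ('2021-06-02'), ('2021-06-03'), ('2021-06-04');

WITH SubEvents AS
(
    -- One row per subscribe event and one per unsubscribe event
    SELECT startDate AS EventDate, 1 AS NewSubs, 0 AS Unsubs FROM @subs
    UNION ALL
    SELECT endDate, 0, 1 FROM @subs WHERE endDate IS NOT NULL
),
Daily AS
(
    -- Events per calendar day; days without events count as 0
    SELECT c.reportingdate,
           ISNULL(SUM(e.NewSubs), 0) AS NewSubscribers,
           ISNULL(SUM(e.Unsubs), 0)  AS UnSubscribers
    FROM @cal c
    LEFT JOIN SubEvents e ON e.EventDate = c.reportingdate
    GROUP BY c.reportingdate
)
SELECT reportingdate,
       -- Running net change = number active on that day
       SUM(NewSubscribers - UnSubscribers) OVER (ORDER BY reportingdate
                                                 ROWS UNBOUNDED PRECEDING) AS TotalSubscribers,
       NewSubscribers,
       UnSubscribers
FROM Daily
ORDER BY reportingdate;
-- With the sample rows above, TotalSubscribers is 1, 3, 2, 1 for June 1 to June 4.
```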
In SAP B1, I want to count the rows created in the 3 hours before the current time.
I already have this logic:
SELECT COUNT(*)
FROM OIVL T0
WHERE T0.CreateDate >= DATEADD(hour, -12, GETDATE())
But the problem is that the CreateDate column contains only the date, without the time.
The time is there, but stored as a smallint.
Given the discussion in comments about how CreateTime is encoded by SAP B1, you're going to need to compute each row's datetime value to compare it with the current time, e.g.:
create table #Example (
CreateDate date,
CreateTime smallint
);
insert #Example (CreateDate, CreateTime) values ('2020-08-30', 2324);
select
CreateDate,
CreateTime,
CreateDateTime --2020-08-30 23:24:00.000
from #Example
outer apply (
select dateadd(mi, (CreateTime/100)*60+CreateTime%100, cast(CreateDate as datetime)) as CreateDateTime
) Calc
where CreateDateTime >= dateadd(hour, -12, getdate());
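Applied to the original COUNT, the same computation could look like the sketch below. It assumes, as in the example above, that CreateTime holds the time as hhmm in a smallint (2324 meaning 23:24). The table variable @Demo and the fixed @Now value are hypothetical, used only to make the demo repeatable; against the real table you would query OIVL and use GETDATE(). The extra filter on CreateDate is an optional addition so that the date column alone can narrow the rows before the per-row computation.

```sql
-- Fixed "current time" so the demo gives the same answer every run
DECLARE @Now datetime = '2020-08-31 06:00:00';

-- Hypothetical rows standing in for OIVL
DECLARE @Demo TABLE (CreateDate date, CreateTime smallint);
INSERT INTO @Demo (CreateDate, CreateTime)
VALUES ('2020-08-30', 2324),   -- 23:24, inside the 12-hour window
       ('2020-08-30', 1200),   -- 12:00, outside the window
       ('2020-08-31', 130);    -- 01:30, inside the window

SELECT COUNT(*) AS RecentRows
FROM @Demo d
CROSS APPLY (
    -- hhmm -> minutes since midnight, added to the date
    SELECT DATEADD(minute,
                   (d.CreateTime / 100) * 60 + d.CreateTime % 100,
                   CAST(d.CreateDate AS datetime)) AS CreateDateTime
) c
WHERE d.CreateDate >= CAST(DATEADD(hour, -12, @Now) AS date)  -- coarse filter on the date only
  AND c.CreateDateTime >= DATEADD(hour, -12, @Now);           -- exact filter
-- RecentRows = 2 for the sample rows above
```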
Starting data:
Desired results something like this:
So it calculates the number of hours until the end of the StartDateTime day if the EndDateTime is later than the end of that day. Then, for every full day in between, it calculates 24 hours (this could stretch over numerous days). Finally, when it gets to the EndDateTime, it calculates the time from midnight (morning) to EndDateTime.
I'm reading that I will probably need to use a recursive CTE, but I don't have any experience with recursion and am struggling.
This might get tricky, but I think it can be solved using a so-called numbers table, i.e. a table with a single column populated with a number sequence (0-based in our case).
The trick is to get the number of days between the start and end datetime. Using this value in the join between the data table and the numbers table creates the extra rows needed, one per day interval.
Of course, we also have to set the start and end datetime of each per-day interval properly (the CASE expressions in the CTE).
Then, for each per-day interval, we get the number of minutes and divide by 60 to get the proper decimal value.
Hope this helps.
Let's see the code:
-- input data
DECLARE @v_Dates TABLE
(
id varchar(20),
StartDateTime SMALLDATETIME,
EndDateTime SMALLDATETIME
)
INSERT INTO @v_Dates (id, StartDateTime, EndDateTime)
VALUES ('example 1', '02-17-2019 0:45', '02-19-19 12:30'),
('example 2', '02-21-2019 18:00', '02-22-19 12:15'),
('example 3', '02-22-2019 20:15', '02-22-19 20:30');
-- so called Number table which holds numbers 0 - 9999 in this case
DECLARE @v_Numbers TABLE
(
Number INT
);
-- populating the number table
INSERT INTO @v_Numbers
SELECT TOP 10000 ROW_NUMBER() OVER(ORDER by t1.number) - 1 as Number
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
-- we parse the dates into the per day intervals
;WITH IntervalsParsed(id, StartDateTime, EndDateTime, Number, IntervalStartDateTime, IntervalEndDateTime) AS
(
SELECT id
,StartDateTime
,EndDateTime
,Number
, IntervalStartDateTime = CASE
WHEN D.StartDateTime > DATEADD(day, DATEDIFF(day, 0, D.StartDateTime), N.Number) THEN D.StartDateTime
ELSE DATEADD(day, DATEDIFF(day, 0, D.StartDateTime), N.Number)
END
, IntervalEndDateTime = CASE
WHEN D.EndDateTime < DATEADD(day, DATEDIFF(day, 0, D.StartDateTime), N.Number + 1) THEN D.EndDateTime
ELSE DATEADD(day, DATEDIFF(day, 0, D.StartDateTime), N.Number + 1)
END
FROM @v_Dates D
--this join basically creates the needed number of rows
INNER JOIN @v_Numbers N ON DATEDIFF(day, D.StartDateTime, D.EndDateTime) + 1 > N.Number
)
-- final select
SELECT id
, StartDateTime
, EndDateTime
, IntervalStartDateTime
, IntervalEndDateTime
, Number
, DecimalValue = CAST( DATEDIFF(minute, IntervalStartDateTime, IntervalEndDateTime) AS DECIMAL)/60
FROM IntervalsParsed
ORDER BY id, Number
Just another option is an ad-hoc tally table in concert with a CROSS APPLY
Example
Select A.[column1]
,A.[StartDateTime]
,A.[EndDateTime]
,Hours = sum(1) / 60.0
From #YourTable A
Cross Apply (
Select Top (DateDiff(MINUTE,[StartDateTime],[EndDateTime]))
D=DateAdd(MINUTE,-1+Row_Number() Over (Order By (Select Null)),[StartDateTime])
From master..spt_values n1,master..spt_values n2
) B
Group By [column1],[StartDateTime],[EndDateTime],cast(D as Date)
Returns
This may be a little complicated, but here is one way to use a recursive CTE to get the output. You keep adding one day to the start date as long as the result does not pass the end date of your column. I also declared a static value (23:59:59) so that a full day rounds to a difference of 24 hours.
--Create a table
Select 'example1' exm, '2019-02-17 00:45:00' startdate, '2019-02-19 12:30:00' Enddate into #temp union all
Select 'example2' exm, '2019-02-21 18:00:00' startdate, '2019-02-22 12:15:00' Enddate union all
Select 'example3' exm, '2019-02-22 20:15:00' startdate, '2019-02-22 20:30:00' Enddate
Declare @datevalue time = '23:59:59'
;with cte as (select exm, startdate, enddate, case when datediff(day, startdate, enddate) = 0 then datediff(SECOND, startdate, enddate)
when datediff(day, startdate, enddate)>0 then
datediff(SECOND, cast(startdate as time), @datevalue)
end as Hoursn, cast(dateadd(day, 1,cast(startdate as date)) as smalldatetime) valueforhours from #temp
union all
select exm, startdate, enddate, case when datediff(day, valueforhours, enddate) = 0 then datediff(SECOND, valueforhours, enddate)
when datediff(day, valueforhours, enddate)>0 then datediff(SECOND, cast(valueforhours as time), @datevalue) end as Hoursn, case when datediff(day,valueforhours, enddate) > 0 then dateadd(day,1,valueforhours) end as valueforhours
from cte
where
valueforhours <= cast(enddate as date)
)
select exm, startdate, Enddate, round(Hoursn*1.0/3600,2) as [hours] from cte
order by exm
Output:
exm startdate Enddate hours
example1 2019-02-17 00:45:00 2019-02-19 12:30:00 23.250000
example1 2019-02-17 00:45:00 2019-02-19 12:30:00 24.000000
example1 2019-02-17 00:45:00 2019-02-19 12:30:00 12.500000
example2 2019-02-21 18:00:00 2019-02-22 12:15:00 6.000000
example2 2019-02-21 18:00:00 2019-02-22 12:15:00 12.250000
example3 2019-02-22 20:15:00 2019-02-22 20:30:00 0.250000
I'm trying to count holiday bookings (B.ID) for the dates 2 days either side of today.
It works, but my results are split up because I have to introduce the end date of the holiday too, which varies for each start date (holidays have different durations).
This separates out my counts. What I need is one count for each date. Is there a way of working round this? I kinda just want to exclude vwReturnDate from the GROUP BY, but I have to put it there as I've used it in my count.
In English I want - For each [date] count the number of [B.id] where [B.Depart] <= [date] and [vwReturnDate] > [date]
DECLARE @startDate DATE
DECLARE @endDate DATE
SET @startDate = Getdate()-2
SET @endDate = Getdate()+2;
WITH dates(Date) AS
( SELECT @startdate as Date
UNION ALL
SELECT DATEADD(d,1,[Date])
FROM dates
WHERE DATE < @enddate )
SELECT
[Date] as 'Calendar Date',
--CONVERT(VARCHAR(10), [Date],103) AS 'Date'
-- ,CONVERT(CHAR(2), [Date], 113) AS 'Day'
-- ,CONVERT(CHAR(4), [Date], 100) AS 'Month'
-- ,CONVERT(CHAR(4), [Date], 120) AS 'Year',
Case when B.Depart <= [date] AND vwR.ReturnDate >=[date] then count (B.ID) end AS 'Number of holidays live on date'
FROM [dates]
left join Booking B on B.depart=[Date]
inner join Quote Q on Q.ID=B.QuoteID
inner join vwReturnDate vwR on vwR.ID=B.ID
Group by [date], B.depart, vwR.ReturnDate
order by [date]
OPTION (MAXRECURSION 0)
GO
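One possible way to get a single count per date, sketched under assumptions: put the "live on this date" condition from the English description into the join itself, and group by the calendar date only. The return date then never needs to appear in the GROUP BY. The table variable @Booking and its rows are hypothetical and stand in for Booking joined to vwReturnDate; the fixed dates replace Getdate() only so the demo is repeatable.

```sql
-- Hypothetical data: Booking already combined with its return date
DECLARE @Booking TABLE (ID int, Depart date, ReturnDate date);
INSERT INTO @Booking (ID, Depart, ReturnDate)
VALUES (1, '2021-06-01', '2021-06-04'),
       (2, '2021-06-03', '2021-06-05'),
       (3, '2021-06-04', '2021-06-10');

-- Fixed window for the demo; the question uses Getdate()-2 and Getdate()+2
DECLARE @startDate date = '2021-06-02',
        @endDate   date = '2021-06-06';

WITH dates([Date]) AS
(
    SELECT @startDate
    UNION ALL
    SELECT DATEADD(day, 1, [Date])
    FROM dates
    WHERE [Date] < @endDate
)
SELECT d.[Date] AS [Calendar Date],
       COUNT(b.ID) AS [Number of holidays live on date]  -- COUNT(b.ID) ignores the NULLs from unmatched dates
FROM dates d
LEFT JOIN @Booking b
       ON b.Depart <= d.[Date]       -- already departed
      AND b.ReturnDate > d.[Date]    -- not yet returned
GROUP BY d.[Date]
ORDER BY d.[Date]
OPTION (MAXRECURSION 0);
-- With the sample rows above the counts are 1, 2, 2, 1, 1 for June 2 to June 6.
```

Because the range condition sits in the LEFT JOIN's ON clause rather than in a WHERE clause, dates with no live holidays still appear with a count of 0.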
I have a table with two fields - datetime and int. I want to do a group by on the datetime only on the date ignoring the hour and minute. The SELECT statement should return a date that maps to the sum of the int of a single day.
SELECT CAST(Datetimefield AS DATE) as DateField, SUM(intfield) as SumField
FROM MyTable
GROUP BY CAST(Datetimefield AS DATE)
As the asker didn't specify which version of SQL Server is in use (the date type isn't available in 2005), one could also use:
SELECT CONVERT(VARCHAR(10),date_column,112),SUM(num_col) AS summed
FROM table_name
GROUP BY CONVERT(VARCHAR(10),date_column,112)
I came here researching the options for doing this; however, I believe the method I use is the simplest:
SELECT COUNT(*),
DATEADD(dd, DATEDIFF(dd, 0, date_field),0) as dtgroup
FROM table_name
GROUP BY DATEADD(dd, DATEDIFF(dd, 0, date_field),0)
ORDER BY dtgroup ASC;
-- I like this as the data type and the format remains consistent with a date time data type
;with cte as(
select
cast(utcdate as date) UtcDay, DATEPART(hour, utcdate) UtcHour, count(*) as Counts
from dbo.mytable cd
where utcdate between '2014-01-14' and '2014-01-15'
group by
cast(utcdate as date), DATEPART(hour, utcdate)
)
select dateadd(hour, utchour, cast(utcday as datetime)) as UTCDateHour, Counts
from cte
Personally, I prefer the FORMAT function, which lets you change the date part very easily.
declare @format varchar(100) = 'yyyy/MM/dd'
select
format(the_date,@format),
sum(myfield)
from mytable
group by format(the_date,@format)
order by format(the_date,@format) desc;