SQL Calculate a Total Time in a specific State - sql-server

I have these datas in a Table
_____DateTime____|Variable__|Value
2017/03/29 23:00:00 | Variable1 | 1
2017/03/31 01:00:00 | Variable1 | 0
2017/03/31 02:00:00 | Variable1 | 1
2017/03/31 03:00:00 | Variable1 | 0
2017/03/31 04:00:00 | Variable2 | 1
2017/03/31 23:00:00 | Variable1 | 1
2017/04/01 01:00:00 | Variable1 | 0
And I would like to calculate the total duration where each variable was in state 1 between two date
For example between for Var1 2017/03/31 00:00:00 and 2017/04/01 00:00:00
The result is :
1 hour between 2017/03/31 00:00:00 and 2017/03/31 01:00:00
1 hour between 2017/03/31 02:00:00 and 2017/03/31 03:00:00
1 hour between 2017/03/31 23:00:00 and 2017/04/01 00:00:00
So the result I want for Var1 should be 3 hours
For example between for Var2 2017/03/31 00:00:00 and 2017/04/01 00:00:00
The result is :
1 hour between 2017/03/31 04:00:00 and 2017/04/01 00:00:00 (no value before but because it change to 1 I suppose that it was 0 before)
So the result I want for Var2 should be 20 hours
Variable|__Time in Value (second)
Variable1 | 180
Variable2 | 1200
If someone can help me.
Thanks in advance

For SQL Server 2012+ (because of lead() and concat())
Using a stacked cte to generate an hours table to inner join a subquery that uses the lead() window function to get the next date for status change partitioned by Variable.
To adapt for prior versions, use an outer apply() to get the next dt for each variable instead of lead(); and regular string concatenation with proper conversions instead of concat().
declare #fromdate datetime = '20170331 00:00:00';
declare #thrudate datetime = '20170401 00:00:00';
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, hours as (
select top ((datediff(hour, #fromdate, #thrudate)+1))
[DateHour]=dateadd(hour,(row_number() over (order by (select 1)) -1),#fromdate)
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by 1
)
select variable, value
, hours = count(h.datehour)
, start_dt = convert(varchar(20),min(h.datehour),120)
, end_dt = convert(varchar(20),end_dt,120)
, txt = concat(
count(h.datehour),' '
, case when count(h.datehour) < 2 then 'hour' else 'hours' end
, ' between '
, convert(varchar(20),min(h.datehour),120)
, ' and '
, convert(varchar(20),end_dt,120)
)
from hours h
inner join (
select
variable
, value
, start_dt = dt
, end_dt = case when coalesce(lead(dt) over (partition by variable order by dt),#thrudate) > #thrudate
then #thrudate
else coalesce(lead(dt) over (partition by variable order by dt),#thrudate)
end
from t
) s
on h.datehour >= s.start_dt
and h.datehour < s.end_dt
where h.datehour >= #fromdate
and h.datehour < #thrudate
and s.value = 1
group by variable, value, start_dt, end_dt
rextester demo: http://rextester.com/ZBWP22523
returns:
+-----------+-------+-------+---------------------+---------------------+--------------------------------------------------------------+
| variable | value | hours | start_dt | end_dt | txt |
+-----------+-------+-------+---------------------+---------------------+--------------------------------------------------------------+
| Variable1 | 1 | 1 | 2017-03-31 00:00:00 | 2017-03-31 01:00:00 | 1 hour between 2017-03-31 00:00:00 and 2017-03-31 01:00:00 |
| Variable1 | 1 | 1 | 2017-03-31 02:00:00 | 2017-03-31 03:00:00 | 1 hour between 2017-03-31 02:00:00 and 2017-03-31 03:00:00 |
| Variable1 | 1 | 1 | 2017-03-31 23:00:00 | 2017-04-01 01:00:00 | 1 hour between 2017-03-31 23:00:00 and 2017-04-01 01:00:00 |
| Variable2 | 1 | 20 | 2017-03-31 04:00:00 | 2017-04-01 00:00:00 | 20 hours between 2017-03-31 04:00:00 and 2017-04-01 00:00:00 |
+-----------+-------+-------+---------------------+---------------------+--------------------------------------------------------------+
If you need to do this often, you might consider creating an actual table for hours. Otherwise, using the stacked cte is as fast as most other options, and is much faster than a recursive cte as the number of values generated increases.
Number and Calendar table reference:
Generate a set or sequence without loops - 1 - Aaron Bertrand
Generate a set or sequence without loops - 2 - Aaron Bertrand
Generate a set or sequence without loops - 3 - Aaron Bertrand
The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
Creating a Date Table/Dimension in SQL Server 2008 - David Stein
Calendar Tables - Why You Need One - David Stein
Creating a date dimension or calendar table in SQL Server - Aaron Bertrand
TSQL Function to Determine Holidays in SQL Server - Tim Cullen
F_TABLE_DATE - Michael Valentine Jones

Related

Speed up a select in SQL Server

I have a table that has values like these :
Table 1 :
Name | DateTimeFrom | DateTimeTo
A | 2017-02-03 02:00 | 2017-02-10 23:55
B | 2017-01-03 14:00 | 2017-05-10 19:55
And another table that has values like these :
Table 2:
Name | Date | Hour | Value
A | 2017-01-01 | 00:00 | 0.25
A | 2017-01-01 | 00:15 | 0.25
A | 2017-01-01 | 00:30 | 0
A | 2017-01-01 | 00:45 | 0
A | 2017-01-01 | 01:00 | 0.25
[...] Contains values 0 or 0.25 every 15mins
Result :
Name | DateTimeFrom | DateTimeTo | Value
A | 2017-02-03 02:00 | 2017-02-10 23:55 | 345.0
B | 2017-01-03 14:00 | 2017-05-10 19:55 | 1202
I've created a view that contains all the columns from table 1 and the SUM of all the values from the table 2 according to the daterange on the table 1. The problem is that Table 2 contains more than 3 million rows and the SELECT takes about 10 mins...
Is there a way to speed up the process ?
I tried to create an index on the table 2 but I don't know which index (clustered ? on which columns ? ) i must create to lower the execution time.
Edit (here is the query) :
SELECT Name, DateTimeFrom, DateTimeTo FROM Table1
LEFT OUTER JOIN Table2 ON Table1.Name = Table2.Name AND Table1.DateTimeFrom <=
CAST(Table2.Date AS DATETIME) + CAST(Table2.Hour AS DATETIME)
AND (CASE WHEN Table1.DateTimeTo IS NULL THEN GETDATE() ELSE
Table1.DateTimeTo END) > CAST(Table2.Date AS DATETIME) + CAST(Table2.Hour AS DATETIME)
Op(Swapper) - Are you trying to only return the past 2 days?
Start with a non clustered index on table 2 date include value column.
Then add a filter for only the data set you need, no one can consume 3 million records. something like where datetimefrom > datediff(month, 1, sysdatetime()) (in the view definition)
A second thought, why compute this data over and over again via a view, consider materializing this data into a table.

Compute lag difference for different days

I need help computing a date difference across different rows with variable lag (specifically, rows that are not on the same day) without subqueries, joins, etc. I think this should be possible with some inline t-SQL aggregates that use OVER(PARTITION BY) clause, such as LAG, DENSE_RANK, etc., but I can't quite put a finger on it. This is for a SQL Server 2017 Developer's edition.
A clarifying example:
Consider a dataset with Job beginning and end dates (across various projects). Some jobs start and end on the same day (such as jobs 2 & 3, 4 & 5). I need to compute the idle time between consequent jobs that started on different days (per project). That is the days between last job's ending time and current job's beginning time. If the previous job started on the same day, then look further back in history of the same project. I.e. the jobs that started on the same day can be considered as parts of the same job.
UPDATE: I simplified the code/output by dropping time values (question's history has original dataset).
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t(Prj TINYINT, Beg DATE, Eñd DATE);
INSERT INTO #t SELECT 1, '1/1/17', '1/2/17';
INSERT INTO #t SELECT 1, '1/5/17', '1/7/17';
INSERT INTO #t SELECT 1, '1/5/17', '1/7/17';
INSERT INTO #t SELECT 1, '1/15/17', '1/15/17';
INSERT INTO #t SELECT 1, '1/15/17', '1/18/17';
INSERT INTO #t SELECT 1, '1/20/17', '1/24/17';
INSERT INTO #t SELECT 2, '2/2/17', '2/5/17';
INSERT INTO #t SELECT 2, '2/7/17', '2/9/17';
ALTER TABLE #t ADD Job INT NOT NULL IDENTITY (1,1) PRIMARY KEY;
A LAG(.,1) function uses precisely the previous job's ending time, which is not what I want. It yields incorrect idle duration for jobs 2 & 3, 4 & 5. Jobs 2 & 3 should both use the ending time of job 1. Jobs 4 & 5 should both use the ending time of job 3. The joined query computes idle duration correctly, but an inline calculation is desirable here (without joins, subqueries).
SELECT c.Job, c.Prj, c.Beg, c.Eñd,
-- in-line computation with OVER clause
PrvEñd_lg=LAG(c.Eñd,1) OVER(PARTITION BY c.Prj ORDER BY c.Beg),
Idle_lg=DATEDIFF(DAY, LAG(c.Eñd,1) OVER(PARTITION BY c.Prj ORDER BY c.Beg), c.Beg),
-- calculation over current and (joined) previous records
PrvEñd_j=MAX(p.Eñd),
IdleDur_j=DATEDIFF(DAY, MAX(p.Eñd), c.Beg)
FROM #t c LEFT JOIN #t p ON c.Prj=p.Prj AND c.Beg > p.Eñd
GROUP BY c.Job, c.Prj, c.Beg, c.Eñd
ORDER BY c.Prj, c.Beg
Job Prj Beg Eñd PrvEñd_lg Idle_lg PrvEñd_j IdleDur_j
1 1 2017-01-01 2017-01-02 NULL NULL NULL NULL
2 1 2017-01-05 2017-01-07 2017-01-02 3 2017-01-02 3
3 1 2017-01-05 2017-01-07 2017-01-07 -2 2017-01-02 3
4 1 2017-01-15 2017-01-15 2017-01-07 8 2017-01-07 8
5 1 2017-01-15 2017-01-18 2017-01-15 0 2017-01-07 8
6 1 2017-01-20 2017-01-24 2017-01-18 2 2017-01-18 2
7 2 2017-02-02 2017-02-05 NULL NULL NULL NULL
8 2 2017-02-07 2017-02-09 2017-02-05 2 2017-02-05 2
Please let me know, if I can further clarify any specific details.
Many thanks!
You can use a self-join.
select a.Job
, a.Prj
, a.Beg
, a.Eñd
, max(b.Eñd) as PrevEñd
, min(datediff(mi, b.Eñd, a.Beg) / (60*24.0)) as IdleDur
from #t as a
left join #t as b on a.Prj = b.Prj
and cast(a.Beg as date) > cast(b.Eñd as date)
group by a.Job
, a.Prj
, a.Beg
, a.Eñd
This produces the following output:
+-----+-----+---------------------+---------------------+---------------------+-----------+
| Job | Prj | Beg | Eñd | PrevEñd | IdleDur |
+-----+-----+---------------------+---------------------+---------------------+-----------+
| 1 | 1 | 2017-01-01 01:00:00 | 2017-01-02 02:00:00 | NULL | NULL |
| 2 | 1 | 2017-01-05 02:00:00 | 2017-01-07 03:00:00 | 2017-01-02 02:00:00 | 3.0000000 |
| 3 | 1 | 2017-01-05 03:00:00 | 2017-01-07 02:00:00 | 2017-01-02 02:00:00 | 3.0416666 |
| 4 | 1 | 2017-01-15 04:00:00 | 2017-01-15 03:00:00 | 2017-01-07 03:00:00 | 8.0416666 |
| 5 | 1 | 2017-01-15 15:00:00 | 2017-01-18 03:00:00 | 2017-01-07 03:00:00 | 8.5000000 |
| 6 | 1 | 2017-01-20 05:00:00 | 2017-01-24 02:00:00 | 2017-01-18 03:00:00 | 2.0833333 |
| 7 | 2 | 2017-02-02 06:00:00 | 2017-02-05 03:00:00 | NULL | NULL |
| 8 | 2 | 2017-02-07 07:00:00 | 2017-02-09 02:00:00 | 2017-02-05 03:00:00 | 2.1666666 |
+-----+-----+---------------------+---------------------+---------------------+-----------+

SQL Query with Average and Grouping

I just want to ask you guys, especially those with MsSQL knowledge, regarding my query.
My goal is to get the average delivery time and group my data by delivery date and route id daily/weekly/monthly.
Here's my query:
SELECT
RouteID,
CONVERT(date, [DeliveryDate]) AS delivery_date,
AVG(
DATEDIFF(
day,
CONVERT(date, [UnloadDate]),
CONVERT(date, [DeliveryDate])
)
) as Averate_Delivery_Time
FROM [CARGODB].[dbo].[Cargo_Transactions]
WHERE
[DeliveryDate] IS NOT NULL AND
[UnloadDate] != 0 AND
[StageID] = 'D' AND
( CONVERT(date, [DeliveryDate]) LIKE '%2016%' or
CONVERT(date, [DeliveryDate]) LIKE '%2017%')
GROUP BY CONVERT(date, [DeliveryDate]), [RouteID]
ORDER BY CONVERT(date, [DeliveryDate]) DESC
I am not confident if the average delivery time is correct so if you think it's wrong or there are other things in my query that needs to be corrected, please let me know.
UPDATE:
I was able to get the right query:
SELECT [RouteID],
CAST(DATEPART(YEAR,[DeliveryDate]) as varchar) + ' Week ' +
CAST(DATEPART(WEEK,[DeliveryDate]) AS varchar) AS week_name,
AVG(DATEDIFF(day, CONVERT(date, [UnloadDate]), CONVERT(date,
[DeliveryDate]))) as Average_Delivery_Days
FROM [CARGODB].[dbo].[Cargo_Transactions]
WHERE [DeliveryDate] IS NOT NULL AND [DeliveryDate] != 0
AND CONVERT(date, [DeliveryDate]) BETWEEN '2016-01-01' AND GETDATE()
AND [UnloadDate] IS NOT NULL AND [UnloadDate] != 0 AND [DeliveryDate] >
[UnloadDate]
AND [Deleted] = 0 and [StageID] = 'D'
GROUP BY DATEPART(YEAR,[DeliveryDate]), DATEPART(WEEK,[DeliveryDate]),
[RouteID]
ORDER BY DATEPART(YEAR,[DeliveryDate]), DATEPART(WEEK,[DeliveryDate]),
Average_Delivery_Days desc
But I have a more complicated query to do now. I have this sample data:
RouteID | week_name | yearnum | weeknum | Average_Delivery_Days
=======================================================================
MK | 2016 Week 2 | 2016 | 2 | 1
-----------------------------------------------------------------------
TSM | 2016 Week 2 | 2016 | 2 | 1
-----------------------------------------------------------------------
E | 2016 Week 2 | 2016 | 2 | 1
-----------------------------------------------------------------------
A | 2016 Week 2 | 2016 | 2 | 1
-----------------------------------------------------------------------
D | 2016 Week 2 | 2016 | 2 | 1
-----------------------------------------------------------------------
MP | 2016 Week 2 | 2016 | 2 | 1
-----------------------------------------------------------------------
CTN | 2016 Week 3 | 2016 | 3 | 9
-----------------------------------------------------------------------
BIS | 2016 Week 3 | 2016 | 3 | 8
-----------------------------------------------------------------------
C | 2016 Week 3 | 2016 | 3 | 1
-----------------------------------------------------------------------
PN | 2016 Week 4 | 2016 | 4 |10
-----------------------------------------------------------------------
How can I make the above data be like:
MK and TSM are merged into 1 new routeID like Manila1
E, A, and D are merged into another as Manila2
MP, CTN, AND BIS as Visayas
C and PN as Mindanao
and so on..
And the average delivery days will be changed as well.
Your help is highly appreciated. Thank you!

How to sum up values in a column for each week number?

I need to sum up values from Money column for each WeekNumber.
Now I have view:
WeekNumber | DayTime | Money
---------------------------------------
1 | 2012-01-01 | 20.4
1 | 2012-01-02 | 30.5
1 | 2012-01-03 | 55.1
2 | 2012-02-01 | 67.3
2 | 2012-02-02 | 33.4
3 | 2012-03-01 | 11.8
3 | 2012-03-04 | 23.9
3 | 2012-03-05 | 34.3
4 | 2012-04-01 | 76.6
4 | 2012-04-02 | 90.3
Tsql:
SELECT datepart(week,DayTime) AS WeekNumber, DayTime, Money FROM dbo.Transactions
In conclusion, I would like to get something like this:
WeekNumber | DayTime | Sum
---------------------------------------
1 | 2012-01-01 | 106
2 | 2012-02-02 | 100.7
3 | 2012-03-03 | 470
4 | 2012-04-01 | 166.9
DayTime should be random for each Week Number but exists in column DayTime from view above.
Please, be free to write your ideas. Thanks.
SELECT datepart(week,DayTime) AS WeekNumber
, MIN(DayTime) DayTime --<-- Instead of random get first date from your data in that week
, SUM(Money) AS [Sum]
FROM dbo.Transactions
GROUP BY datepart(week,DayTime)
Try this
SELECT datepart(week,DayTime) AS WeekNumber, SUM(Money) FROM dbo.Transactions GROUP BY WeekNumber
As you will have number of rows for each week you cannot get DayTime with the same table. There are other ways to add that too like JOIN
Change your SQL to sum the money column. Like this
SELECT
datepart(week,DayTime) AS WeekNumber,
DayTime, Money = SUM(Money)
FROM dbo.Transactions
GROUP BY datepart(week,DayTime),DayTime
SELECT datepart(week, DayTime) AS WeekNumber
,MIN(DayTime)
,SUM(MONEY)
FROM dbo.Transactions
GROUP BY datepart(week, DayTime)

Recursive first day of each month for current getdate

Using T-SQL, I want a new column that will show me the first day of each month, for the current year of getdate().
After that I need to count the rows on this specific date. Should I do it with CTE or a temp table?
If 2012+, you can use DateFromParts()
To Get a List of Dates
Select D = DateFromParts(Year(GetDate()),N,1)
From (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) N(N)
Returns
D
2017-01-01
2017-02-01
2017-03-01
2017-04-01
2017-05-01
2017-06-01
2017-07-01
2017-08-01
2017-09-01
2017-10-01
2017-11-01
2017-12-01
Edit For Trans Count
To get Transactions (assuming by month). It becomes a small matter of a left join to created Dates
-- This is Just a Sample Table Variable for Demonstration.
-- Remove this and Use your actual Transaction Table
--------------------------------------------------------------
Declare #Transactions table (TransDate date,MoreFields int)
Insert Into #Transactions values
('2017-02-18',6)
,('2017-02-19',9)
,('2017-03-05',5)
Select TransMonth = A.MthBeg
,TransCount = count(B.TransDate)
From (
Select MthBeg = DateFromParts(Year(GetDate()),N,1)
,MthEnd = EOMonth(DateFromParts(Year(GetDate()),N,1))
From (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) N(N)
) A
Left Join #Transactions B on TransDate between MthBeg and MthEnd
Group By A.MthBeg
Returns
TransMonth TransCount
2017-01-01 0
2017-02-01 2
2017-03-01 1
2017-04-01 0
2017-05-01 0
2017-06-01 0
2017-07-01 0
2017-08-01 0
2017-09-01 0
2017-10-01 0
2017-11-01 0
2017-12-01 0
For an adhoc table of months for a given year:
declare #year date = dateadd(year,datediff(year,0,getdate() ),0)
;with Months as (
select
MonthStart=dateadd(month,n,#year)
from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) t(n)
)
select MonthStart
from Months
rextester demo: http://rextester.com/POKPM51023
returns:
+------------+
| MonthStart |
+------------+
| 2017-01-01 |
| 2017-02-01 |
| 2017-03-01 |
| 2017-04-01 |
| 2017-05-01 |
| 2017-06-01 |
| 2017-07-01 |
| 2017-08-01 |
| 2017-09-01 |
| 2017-10-01 |
| 2017-11-01 |
| 2017-12-01 |
+------------+
The first part: dateadd(year,datediff(year,0,getdate() ),0) adds the number of years since 1900-01-01 to the date 1900-01-01. So it will return the first date of the year. You can also swap year for other levels of truncation: year, quarter, month, day, hour, minute, second, et cetera.
The second part uses a common table expression and the table value constructor (values (...),(...)) to source numbers 0-11, which are added as months to the start of the year.
Not sure why you require recursive... But for first day of month you can try query like below:
Select Dateadd(day,1,eomonth(Dateadd(month, -1,getdate())))
declare #year date = dateadd(year,datediff(year,0,getdate() ),0)
;WITH months(MonthNumber) AS
(
SELECT 0
UNION ALL
SELECT MonthNumber+1
FROM months
WHERE MonthNumber < 11
)
select dateadd(month,MonthNumber,#year)
from months

Resources