Calculating sum of differences from each group - sql-server

I have the following table:
Sensor | building | Date_time | Current_value
1 | 1 | 20.08.2017 | 20
1 | 1 | 21.08.2017 | 25
1 | 1 | 22.08.2017 | 35
2 | 1 | 20.08.2017 | 120
2 | 1 | 21.08.2017 | 200
2 | 1 | 22.08.2017 | 210
3 | 2 | 20.08.2017 | 20
3 | 2 | 21.08.2017 | 25
3 | 2 | 22.08.2017 | 85
5 | 2 | 20.08.2017 | 320
5 | 2 | 21.08.2017 | 400
5 | 2 | 22.08.2017 | 410
The sensor ID is assumed to be unique, as is the building ID.
I need to calculate the total value for each building over any given timeframe by subtracting the MIN value from the MAX value for each sensor, then summing those differences per building.
In the above sample it would be
Sensor 1: (35 - 20)=15
Sensor 2: (210-120)=90
Building 1 = 15+90 = 105
Sensor 3: (85 - 20) = 65
Sensor 5: (410 - 320) = 90
Building 2 = 65+90 = 155
Any pointers in the right direction are greatly appreciated!

You are asking how to calculate the difference between the min and max values per sensor, and then aggregate those differences per building:
with diffs as (
SELECT Building,Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
FROM SomeTable
GROUP BY Building, Sensor
)
SELECT Building,sum(diff)
FROM diffs
GROUP BY Building
If you want to restrict the time period, you'll have to do so inside the CTE:
with diffs as (
SELECT Building, Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
FROM SomeTable
WHERE Date_Time between @start and @end
GROUP BY Building, Sensor
)
SELECT Building, sum(diff)
FROM diffs
GROUP BY Building
You can convert this query into a user-defined function that can be used in other queries:
create function fn_TotalDiffs(@start datetime2(0), @end datetime2(0))
returns table
as
return (
with diffs as (
select Building, Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
from SomeTable
where Date_Time between @start and @end
group by Building, Sensor
)
select Building, sum(diff) as Total
from diffs
group by Building
)
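The function can then be used like any other table source. For example (a sketch; the dates here are just chosen to match the sample data above):
select Building, Total
from dbo.fn_TotalDiffs('2017-08-20', '2017-08-22')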

Another option using window function min/max over()
Example
Select Building
,Total = sum(R1)
From (
Select Distinct
Building
,R1 = max([Current_value]) over (Partition By Building,Sensor)
-min([Current_value]) over (Partition By Building,Sensor)
From YourTable
Where Date_time between @Date1 and @Date2
) A
Group By Building
Returns
Building Total
1 105
2 155
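A note on this version (my observation, not part of the original answer): the DISTINCT is what collapses each (Building, Sensor) partition to a single row before the outer SUM; without it, every reading would contribute its sensor's difference once per row. Beware, though, that if two sensors in the same building happened to have identical differences, the DISTINCT would merge them; including Sensor in the inner select list would guard against that.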

Related

Maximum Daisy Chain Length

I have a bunch of value pairs (Before, After) by users in a table. In ideal scenarios these values should form an unbroken chain. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
| 1 | 30 | 40 |
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
Unfortunately, these records originate in multiple different tables and are imported into my investigation table. The other values in the table do not lend themselves to ordering (e.g. CreatedDate) due to some quirks in the system saving them out of order.
I need to produce a list of users with gaps in their data. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
// Row Deleted (30->40)
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
I've looked at the other Daisy Chaining questions on SO (and online in general), but they all appear to address a problem space where one value in the pair is always lower than the other in a predictable fashion. In my case, there can be increases or decreases.
Is there a way to quickly calculate the longest chain that can be created? I do have a CreatedAt column that would provide some (very rough) relative ordering: when the dates are more than about 10 seconds apart, we could consider them orderable.
Are you not therefore simply after this, to get the first row where the "chain" is broken?
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable NE
WHERE NE.After = YT.Before)
AND YT.Before != 0;
If you want the last row where the "chain" is broken, just swap the aliases on the columns in the WHERE of the NOT EXISTS.
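That is, a sketch of the swapped version (my reading of "swap the aliases"):
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.YourTable NE
                  WHERE NE.Before = YT.After)
AND YT.After != 0;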
The following performs hierarchical recursion on your example data and calculates a "chain" count column called h_level:
;with recur_cte([UserId], [Before], [After], h_level) as (
    select [UserId], [Before], [After], 0
    from dbo.test_table
    where [Before] is null
    union all
    select tt.[UserId], tt.[Before], tt.[After], rc.h_level + 1
    from dbo.test_table tt
    join recur_cte rc on tt.UserId = rc.UserId
                     and tt.[Before] = rc.[After]
    where tt.[Before] < tt.[After])
select * from recur_cte;
Results:
UserId Before After h_level
1 NULL 10 0
1 10 20 1
1 20 30 2
1 30 40 3
1 30 52 3
Is this helpful? Could you further define which rows to exclude?
If you want users that have more than one chain:
select t.UserID
from <T> as t left outer join <T> as t2
on t2.UserID = t.UserID and t2.Before = t.After
where t2.UserID is null
group by t.UserID
having count(*) > 1;
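(To spell out the logic as I read it: the LEFT JOIN isolates the terminal links, i.e. rows whose After value no other row for the same user continues from; a user with more than one terminal link must have more than one chain, and therefore a gap.)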

Reconstructing Balances By Weekly Transaction Sums

I am looking for some advice or pointers on how to construct this. I have spent the last year self-learning SQL. I am at work, and I only have access to the query interface in Report Builder, which for me means no procedures, no CREATE TABLE, and no IDE :(. So those are the limitations!
I am trying to reconstruct account balances. I have no intervening balances; I have the current balance and a table full of the transaction history.
My current approach is to sum the transactions by posting week (which I have done) in my CTE named
[SUMTRANSREF]
+--------------+------------+-----------+
| TNCY-SYS-REF | POSTING-WK | SUM-TRANS |
+--------------+------------+-----------+
| 1 | 47 | 37.95 |
| 1 | 46 | 37.95 |
| 1 | 45 | 37.95 |
| 2 | 47 | 50.00 |
| 2 | 46 | 25.00 |
| 2 | 45 | 25.00 |
+--------------+------------+-----------+
I then get the current balances in another CTE called
[CBAL]
+--------------+-------------+-----------+
| TNCY-SYS-REF | CUR-BALANCE | CURR-WEEK |
+--------------+-------------+-----------+
| 1 | 27.52 | 47 |
| 2 | 52.00 | 47 |
+--------------+-------------+-----------+
Now I am assuming I could create intervening CTEs to sum and then splice them all together, but is there a smarter (more automated) way?
Ideally my result should be
+--------------+-------------+----------+----------+
| TNCY-SYS-REF | CUR-BALANCE | BAL-WK46 | BAL-Wk45 |
+--------------+-------------+----------+----------+
| 1 | 27.52 | -10.43 | -48.38 |
| 2 | 52.00 | 2.00 | -48.00 |
+--------------+-------------+----------+----------+
I am just uncertain because each column requires the sum of the intervening transactions:
So BAL-WK46 is (CUR-BALANCE) - SUM(week 47 transactions)
So BAL-WK45 is (CUR-BALANCE) - SUM(weeks 46+47 transactions)
So BAL-WK44 is (CUR-BALANCE) - SUM(weeks 45+46+47 transactions)
and so on.
Normally I have an idea where to start but I am flummoxed by this one.
Any help you can give would be appreciated. Thank you
Here is some T-SQL that gets the result you require; it should be easy enough to play with to get exactly what you want. It makes use of a recursive CTE and a PIVOT:
IF OBJECT_ID('Tempdb..#SUMTRANSREF') IS NOT NULL
DROP TABLE #SUMTRANSREF
IF OBJECT_ID('Tempdb..#CBAL') IS NOT NULL
DROP TABLE #CBAL
IF OBJECT_ID('Tempdb..#TEMP') IS NOT NULL
DROP TABLE #TEMP
CREATE TABLE #SUMTRANSREF
(
[TNCY-SYS-REF] int,
[POSTING-WK] int,
[SUM-TRANS] float
)
CREATE TABLE #CBAL
(
[TNCY-SYS-REF] int ,
[CUR-BALANCE] float , [CURR-WEEK] int
)
INSERT INTO #SUMTRANSREF
VALUES (1 ,47 , 37.95),
(1 ,46 , 37.95),
(1 ,45 , 37.95),
(2 ,47 , 50.00),
(2 ,46 , 25.00),
(2 ,45 , 25.00 )
INSERT INTO #CBAL
VALUES (1,27.52,47),(2,52.00,47);
WITH CBAL AS
(SELECT * FROM #CBAL),
SUMTRANSREF AS(SELECT * FROM #SUMTRANSREF),
RecursiveTotals([TNCY-SYS-REF],[CURR-WEEK],[CUR-BALANCE],RunningBalance)
AS
(
select C.[TNCY-SYS-REF], C.[CURR-WEEK],C.[CUR-BALANCE],C.[CUR-BALANCE] + S.RunningTotal RunningBalance from CBAL C
JOIN (select *,-SUM([SUM-TRANS]) OVER (PARTITION BY [TNCY-SYS-REF] ORDER BY [POSTING-WK] DESC) RunningTotal
from SUMTRANSREF) S
ON C.[CURR-WEEK]=S.[POSTING-WK] AND C.[TNCY-SYS-REF]=S.[TNCY-SYS-REF]
UNION ALL
select RT.[TNCY-SYS-REF], RT.[CURR-WEEK] -1 [CURR_WEEK],RT.[CUR-BALANCE],RT.[CUR-BALANCE] + S.RunningTotal RunningBalance FROM RecursiveTotals RT
JOIN (select *,-SUM([SUM-TRANS]) OVER (PARTITION BY [TNCY-SYS-REF] ORDER BY [POSTING-WK] DESC) RunningTotal
from #SUMTRANSREF) S ON RT.[TNCY-SYS-REF] = S.[TNCY-SYS-REF] AND RT.[CURR-WEEK]-1 = S.[POSTING-WK]
)
select [TNCY-SYS-REF],[CUR-BALANCE],[46] as 'BAL-WK46',[45] as 'BAL-WK45',[44] as 'BAL-WK44'
FROM (
select [TNCY-SYS-REF],[CUR-BALANCE],RunningBalance,BalanceWeek from (SELECT *,R.[CURR-WEEK]-1 'BalanceWeek' FROm RecursiveTotals R
) RT) AS SOURCETABLE
PIVOT
(
AVG(RunningBalance)
FOR BalanceWeek in ([46],[45],[44])
) as PVT
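If I have traced the running totals correctly, the script above should return, for the sample data:
TNCY-SYS-REF  CUR-BALANCE  BAL-WK46  BAL-WK45  BAL-WK44
1             27.52        -10.43    -48.38    -86.33
2             52.00        2.00      -23.00    -48.00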

Altering vs adding column - setting all dates in month to one end date

I'm trying to add a rank by sales and also change the date column to a 'month end' field that would have one month-end date per month, if that makes sense?
Would you alter the table and add a column, or could you just rename the date field and use SET and CASE to make all March dates = 3-31-18 and all April dates = 4-30-18?
I got this far:
UPDATE table1
SET DATE=EOMONTH(DATE) AS MONTH_END;
ALTER TABLE table1
ADD COLUMN RANK INT AFTER sales;
UPDATE table1
SET RANK=
RANK() OVER(PARTITION BY cust ORDER BY sales DESC);
LIMIT 2
Can I do two SETs in a row like that without adding another UPDATE? I'm looking for the top 2 within each month - would this work? I feel like this is right and the most efficient query, but it's not working - any help appreciated!!
orig table
+------+----------+-------+--+
| CUST | DATE | SALES | |
+------+----------+-------+--+
| 36 | 3-5-2018 | 50 | |
| 37 | 3-15-18 | 100 | |
| 38 | 3-25-18 | 65 | |
| 37 | 4-5-18 | 95 | |
| 39 | 4-21-18 | 500 | |
| 40 | 4-25-18 | 199 | |
+------+----------+-------+--+
desired output
+------+-----------+-------+------+
| CUST | Month End | SALES | Rank |
+------+-----------+-------+------+
| | | | |
| 37 | 3-31-18 | 100 | 1 |
| 38 | 3-31-18 | 65 | 2 |
| 39 | 4-30-18 | 500 | 1 |
| 40 | 4-30-18 | 199 | 2 |
+------+-----------+-------+------+
Based on your expected output I think this may work as well.
create table Salesdate (Cust int, Dates date, Sales int)
insert into Salesdate values
(36 , '2018-03-05' , 50 )
,(37 , '2018-03-15' , 100 )
,(38 , '2018-03-25' , 65 )
,(37 , '2018-04-05' , 95 )
,(40 , '2018-04-25' , 199 )
,(39 , '2018-04-21' , 500 )
Update the same Dates column to the last day of its month (EOMONTH returns the last day of the month for a given date); you can add a separate column or update the existing one, as you prefer:
Update Salesdate
set Dates = eomonth(Dates)
Add a column called rank to the table:
Alter table Salesdate
add rank int
Update the rank column that was just added:
update Salesdate
set Salesdate.[rank] = tbl.Ranked
from (select Cust, Sales, Dates,
             rank() over (partition by Dates order by Sales desc) as Ranked
      from Salesdate) tbl
where tbl.Cust = Salesdate.Cust
and tbl.Sales = Salesdate.Sales
and tbl.Dates = Salesdate.Dates
This last step may not be necessary: if you want your final table to hold only ranks 1 and 2, you can delete the other rows, or simply filter them out in the select list instead. Please note that RANK() may skip numbers when the sales amounts within the same month-end date are not unique.
;With cte as (
select * from Salesdate)
delete from cte
where [RANK] > 2
select * from Salesdate
order by dates, [RANK]
Output
Cust Dates Sales rank
37 2018-03-31 100 1
38 2018-03-31 65 2
39 2018-04-30 500 1
40 2018-04-30 199 2
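As a side note (my addition, not part of the original answer): since RANK() can skip numbers on ties, DENSE_RANK() is a drop-in replacement in the ranking subquery above if consecutive ranks are needed:
rank() over (partition by Dates order by Sales desc)       -- 1, 1, 3 on a tie
dense_rank() over (partition by Dates order by Sales desc) -- 1, 1, 2 on a tie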

SQL group by date difference with previous row

I'm looking for a way to group daily datetime rows in order to build date-range intervals.
My table is something like:
id | A | B | Date
1 | 1 | 2 | 1/10/2010
2 | 1 | 2 | 2/10/2010
3 | 1 | 2 | 3/10/2010
4 | 1 | 3 | 4/10/2010
5 | 1 | 3 | 5/10/2010
6 | 1 | 2 | 6/10/2010
7 | 1 | 2 | 7/10/2010
8 | 1 | 2 | 8/10/2010
My first try was:
SELECT A, B, MIN(DATE), MAX(date)
FROM table
GROUP BY A, B
After grouping by A and B and taking MIN and MAX of the date in my select, I get invalid results due to the repetition of B = 2:
All six rows with A = 1 and B = 2:
id | A | B | Date
1 | 1 | 2 | 1/10/2010
2 | 1 | 2 | 2/10/2010
3 | 1 | 2 | 3/10/2010
6 | 1 | 2 | 6/10/2010
7 | 1 | 2 | 7/10/2010
8 | 1 | 2 | 8/10/2010
collapse into a single, invalid interval:
A | B | min(Date) | max(Date)
1 | 2 | 1/10/2010 | 8/10/2010
I'm looking for a way to calculate a third member for the GROUP BY that separates the two runs of B = 2...
So the expected intervals results:
A B Start Date End Date
.. | 1 | 2 | 1/10/2010 | 3/10/2010
.. | 1 | 3 | 4/10/2010 | 5/10/2010
.. | 1 | 2 | 6/10/2010 | 8/10/2010
I need to support SQL Server 2008
Thank you in advance for your help
The following is an easy way to deal with "islands and gaps" where you need to find gaps in consecutive dates:
SELECT A, B, StartDate = MIN([Date]), EndDate = MAX([Date])
FROM
(
SELECT *,
RN = DATEDIFF(DAY, 0, [Date]) - ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY [Date])
FROM myTable
) AS T
GROUP BY A, B, RN;
To break it down into slightly simpler-to-understand logic: you assign each date a number (DATEDIFF(DAY, 0, [Date]) here) and each date a row number (partitioned by A and B here), then any time there's a gap in the dates, the difference between those two will change.
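To illustrate with the six sample rows where A = 1 and B = 2 (writing the day number of 1/10/2010 as n):
Date      | day number | row number | RN
1/10/2010 | n          | 1          | n - 1
2/10/2010 | n + 1      | 2          | n - 1
3/10/2010 | n + 2      | 3          | n - 1
6/10/2010 | n + 5      | 4          | n + 1
7/10/2010 | n + 6      | 5          | n + 1
8/10/2010 | n + 7      | 6          | n + 1
RN is constant within each run of consecutive dates and jumps at every gap, so grouping by A, B, RN produces exactly the three expected intervals.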
There are a variety of resources you can use to understand different approaches to "islands and gaps" problems. Here is one that might help you with tackling other varieties of this in the future: https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/

Window function to count occurrences in last 10 minutes

I can use a traditional subquery approach to count the occurrences in the last ten minutes. For example, this:
drop table if exists [dbo].[readings]
go
create table [dbo].[readings](
[server] [int] NOT NULL,
[sampled] [datetime] NOT NULL
)
go
insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go
-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server, sampled,
       (select count(*)
        from readings r2
        where r2.server = r1.server
          and r2.sampled <= r1.sampled
          and r2.sampled > dateadd(minute,-10,r1.sampled)) as countinlast10minutes
from readings r1
order by server, sampled
go
How can I use a window function to obtain the same result ? I've tried this:
select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled
But the result is just the running count. Is there any system variable that refers to the current row pointer? Something like currentrow.sampled?
This isn't a very pleasing answer, but one possibility is to first create a helper table with all the minutes:
CREATE TABLE #DateTimes(datetime datetime primary key);
WITH E1(N) AS
(
SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1)) V(N)
) -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
,R(StartRange, EndRange)
AS (SELECT MIN(sampled),
MAX(sampled)
FROM readings)
,N(N)
AS (SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) AS N
FROM E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM N,
R;
And then with that in place you could use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
WITH T1 AS
( SELECT Server,
MIN(sampled) AS StartRange,
MAX(sampled) AS EndRange
FROM readings
GROUP BY Server )
SELECT Server,
sampled,
Cnt
FROM T1
CROSS APPLY
( SELECT r.sampled,
COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
FROM #DateTimes N
LEFT JOIN readings r
ON r.sampled = N.datetime
AND r.server = T1.server
WHERE N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE CA.sampled IS NOT NULL
ORDER BY sampled
The above assumes that there is at most one sample per minute and that all the times are exact minutes. If this isn't true it would need another table expression pre-aggregating by datetimes rounded to the minute.
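A minimal sketch of that pre-aggregation (my addition, using the readings table from the question), rounding each sample down to its minute and counting per minute:
select server,
       dateadd(minute, datediff(minute, 0, sampled), 0) as sampled_minute,
       count(*) as cnt
from readings
group by server, dateadd(minute, datediff(minute, 0, sampled), 0)
The window count over the helper table would then become SUM(cnt) OVER (... ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) instead of COUNT(r.sampled).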
As far as I know, there is no simple exact replacement for your subquery using window functions.
Window functions operate on a set of rows and let you work with them based on partitions and order.
What you are trying to do isn't the kind of partitioning that window functions can express; generating the partitions we would need in order to use window functions here would just result in overly complicated code.
I would suggest cross apply() as an alternative to your subquery.
I am not sure if you meant to restrict your results to within 9 minutes, but with sampled > dateadd(...) that is what is happening in your original subquery.
Here is what a window function could look like based on partitioning your samples into 10 minute windows, along with a cross apply() version.
select
r.server
, r.sampled
, CrossApply = x.CountRecent
, OriginalSubquery = (
select count(*)
from readings s
where s.server=r.server
and s.sampled <= r.sampled
/* doesn't include 10 minutes ago */
and s.sampled > dateadd(minute,-10,r.sampled)
)
, Slices = count(*) over(
/* partition by server, 10 minute slices, not the same thing*/
partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
order by sampled
)
from readings r
cross apply (
select CountRecent=count(*)
from readings i
where i.server=r.server
/* changed to >= */
and i.sampled >= dateadd(minute,-10,r.sampled)
and i.sampled <= r.sampled
) as x
order by server,sampled
results: http://rextester.com/BMMF46402
+--------+---------------------+------------+------------------+--------+
| server | sampled | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
| 1 | 01.01.2017 08:00:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:02:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:05:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:30:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:31:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:37:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:40:00 | 4 | 3 | 1 |
| 1 | 01.01.2017 08:41:00 | 4 | 3 | 2 |
| 1 | 01.01.2017 09:07:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 09:08:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 09:09:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 09:11:00 | 4 | 4 | 1 |
+--------+---------------------+------------+------------------+--------+
Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that can be used in window aggregates. I'm thinking this would lead to much simpler and more natural SQL:
select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)
We can already use expressions like this:
select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)
so I'm thinking it would be great if we could reference a column that's in the current row.
