Calculate row difference within groups - sql-server

I'm looking for help with calculating the difference between consecutive ordered rows within groups in SQL (Microsoft SQL server).
I have a table like this:
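| ID | School_ID | Enrollment_Start_Date | Order |
|----|-----------|-----------------------|-------|
| 1 | 56 | 1/1/2018 | 10 |
| 1 | 56 | 5/5/2018 | 24 |
| 1 | 56 | 7/7/2018 | 35 |
| 1 | 103 | 4/4/2019 | 26 |
| 1 | 103 | 3/3/2019 | 19 |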
I want to calculate the difference between consecutive Order values, grouped by ID and School_ID and ordered by Enrollment_Start_Date.
So I want something like this:
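| ID | School_ID | Enrollment_Start_Date | Order | Diff | Note |
|----|-----------|-----------------------|-------|------|------|
| 1 | 56 | 1/1/2018 | 10 | 10 | nothing to subtract from 10 |
| 1 | 56 | 5/5/2018 | 24 | 14 | 24-10 |
| 1 | 56 | 7/7/2018 | 35 | 11 | 35-24 |
| 1 | 103 | 3/3/2019 | 19 | 19 | nothing to subtract from 19 |
| 1 | 103 | 4/4/2019 | 26 | 7 | 26-19 |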
I have hundreds of IDs, and each ID can have at most 6 Enrollment_Start_Date values, so I'm looking for a generalizable implementation.

Use the LAG(<column>) analytic function to obtain the "previous" column value, as defined by the OVER clause, then subtract the current value from it and multiply the result by -1 (i.e. current minus previous). If there is no previous value (LAG returns NULL), take the current value.
Pseudo code would be:
If previous_order_value exists:
-1 * (previous_order_value - current_order_value)
Else
current_order_value
where previous_order_value is based on the same id & school_id and is sorted by enrollment_start_date in ascending order
SQL Code:
select
id,
school_id,
enrollment_start_date,
[order],
coalesce(-1 * (lag([order]) over (partition by id, school_id order by enrollment_start_date ) - [order]), [order]) as diff
from yourtable
Also note that ORDER is a reserved keyword in SQL Server, which is why the column name has to be wrapped in [ ] in the query. I suggest using some other name for this column, if possible.
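To sanity-check what the LAG/COALESCE expression returns, here is a small Python model of it (not T-SQL). The function name `lag_diff` and the tuple layout are hypothetical, chosen only for illustration. It sorts rows by the partition columns and then by the window's ORDER BY column, and treats a missing previous value the way COALESCE does.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def lag_diff(rows):
    """Model of
    coalesce([order] - lag([order]) over (partition by id, school_id
             order by enrollment_start_date), [order])
    rows: iterable of (id, school_id, start_date, order) tuples.
    Returns a list of (id, school_id, start_date, order, diff) tuples.
    """
    # Sort by the PARTITION BY columns, then by the window's ORDER BY column.
    ordered = sorted(rows, key=itemgetter(0, 1, 2))
    result = []
    for _, group in groupby(ordered, key=itemgetter(0, 1)):
        previous = None  # LAG returns NULL for the first row of each partition
        for rid, school, start, order in group:
            diff = order if previous is None else order - previous
            result.append((rid, school, start, order, diff))
            previous = order
    return result


sample = [
    (1, 56, date(2018, 1, 1), 10),
    (1, 56, date(2018, 5, 5), 24),
    (1, 56, date(2018, 7, 7), 35),
    (1, 103, date(2019, 4, 4), 26),
    (1, 103, date(2019, 3, 3), 19),
]
for row in lag_diff(sample):
    print(row)
```

The Diff values come out as 10, 14, 11 for School_ID 56 and 19, 7 for School_ID 103, matching the expected output in the question.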

Use the lag() analytic function to get the difference between two rows, and CASE WHEN to return the original value of the order column where there is no previous row:
with cte as
(
select 1 as id, 56 as sclid, '2018-01-01' as s_date, 10 as orders
union all
select 1,56,'2018-05-05',24 union all
select 1,56,'2018-07-07',35 union all
select 1,103,'2019-04-04',26 union all
select 1,103,'2019-03-03',19
) select t.*,
case when ( lag([orders])over(partition by id,sclid order by s_date ) -[orders] )
is null then [orders] else
( lag([orders])over(partition by id,sclid order by s_date ) -[orders] )*(-1) end
as diff
from cte t
output
id sclid s_date orders diff
1 56 2018-01-01 10 10
1 56 2018-05-05 24 14
1 56 2018-07-07 35 11
1 103 2019-03-03 19 19
1 103 2019-04-04 26 7

Use LAG(COLUMN_NAME)
Query
SELECT id, School_ID, Enrollment_Start_Date, cOrder,
ISNULL((cOrder - (LAG(cOrder) OVER(PARTITION BY id, School_ID ORDER BY Enrollment_Start_Date))),cOrder)Diff
FROM Table1
Sample Output
| id | School_ID | Enrollment_Start_Date | cOrder | Diff |
|----|-----------|-----------------------|--------|------|
| 1 | 56 | 2018-01-01 | 10 | 10 |
| 1 | 56 | 2018-05-05 | 24 | 14 |
| 1 | 56 | 2018-07-07 | 35 | 11 |
| 1 | 103 | 2019-03-03 | 19 | 19 |
| 1 | 103 | 2019-04-04 | 26 | 7 |

Related

Maximum Daisy Chain Length

I have a bunch of value pairs (Before, After) by users in a table. In ideal scenarios these values should form an unbroken chain. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
| 1 | 30 | 40 |
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
Unfortunately, these records originate in multiple different tables and are imported into my investigation table. The other values in the table do not lend themselves to ordering (e.g. CreatedDate) due to some quirks in the system saving them out of order.
I need to produce a list of users with gaps in their data. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
// Row Deleted (30->40)
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
I've looked at the other daisy-chaining questions on SO (and online in general), but they all seem to deal with a problem space where one value in the pair is always lower than the other in a predictable fashion. In my case, values can increase or decrease.
Is there a way to quickly calculate the longest chain that can be created? I do have a CreatedAt column that provides some (very rough) relative ordering: when the dates are more than about 10 seconds apart, we could consider the rows orderable.
Aren't you simply after this, to get the first row where the "chain" is broken?
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable NE
WHERE NE.UserID = YT.UserID
AND NE.After = YT.Before)
AND YT.Before != 0;
If you want the last row before the "chain" is broken, just swap the aliases on the columns in the WHERE clause of the NOT EXISTS.
The following performs hierarchical recursion on your example data and calculates a "chain" count column called 'h_level'. Note that it starts from rows whose Before is NULL (not 0) and only follows increasing steps (Before < After).
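A rough Python model of the NOT EXISTS check, useful for reasoning about which rows it flags. It is a sketch under one stated assumption: the lookup is scoped to the same UserId, which matters for tables holding several users. `chain_breaks` is a hypothetical name.

```python
def chain_breaks(rows):
    """rows: (user_id, before, after) tuples.
    Returns rows whose Before value is not the After value of any row for the
    same user (assumption: lookup scoped per user), skipping chain starts
    where Before == 0, like the YT.Before != 0 filter."""
    afters = {(user, after) for user, _, after in rows}
    return [
        (user, before, after)
        for user, before, after in rows
        if before != 0 and (user, before) not in afters
    ]


gapped = [
    (1, 0, 10), (1, 10, 20), (1, 20, 30),
    # (1, 30, 40) deleted
    (1, 40, 30), (1, 30, 52), (1, 52, 0),
]
print(chain_breaks(gapped))  # [(1, 40, 30)]
```

With the deleted row restored, every Before value has a matching After, so nothing is reported.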
;with recur_cte([UserId], [Before], [After], h_level) as (
select [UserId], [Before], [After], 0
from dbo.test_table
where [Before] is null
union all
select tt.[UserId], tt.[Before], tt.[After], rc.h_level+1
from dbo.test_table tt join recur_cte rc on tt.UserId=rc.UserId
and tt.[Before]=rc.[After]
where tt.[Before]<tt.[after])
select * from recur_cte;
Results:
UserId Before After h_level
1 NULL 10 0
1 10 20 1
1 20 30 2
1 30 40 3
1 30 52 3
Is this helpful? Could you further define which rows to exclude?
If you want users that have more than one chain:
select t.UserID
from <T> as t left outer join <T> as t2
on t2.UserID = t.UserID and t2.Before = t.After
where t2.UserID is null
group by t.UserID
having count(*) > 1;
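The same LEFT JOIN / IS NULL idea can be modeled in Python: a row ends a chain when its After value is not the Before value of any row for the same user, and a user with more than one chain end has more than one chain. The function name and the sample rows for user 2 are made up for illustration. A perfectly closed loop has no end at all, so it is not reported.

```python
from collections import Counter


def users_with_multiple_chains(rows):
    """rows: (user_id, before, after). Returns users with more than one chain end."""
    befores = {(user, before) for user, before, _ in rows}
    # Rows whose After is not the Before of any row for the same user end a chain,
    # like the rows kept by LEFT JOIN ... WHERE t2.UserID IS NULL.
    ends = Counter(user for user, _, after in rows if (user, after) not in befores)
    # GROUP BY UserID HAVING COUNT(*) > 1
    return sorted(user for user, count in ends.items() if count > 1)


data = [
    (1, 0, 10), (1, 10, 20), (1, 20, 0),    # one closed loop
    (2, 0, 5), (2, 5, 9), (2, 100, 110),    # two separate chains
]
print(users_with_multiple_chains(data))  # [2]
```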

Altering vs adding column - setting all dates in month to one end date

I'm trying to add rank by sales and also change the date column to a 'month end' field that would have one month end date per month - if that makes sense?
Would you alter table and add column or could you just rename the date field and use set and case to make all March dates = 3-31-18 and all April 4-30-18?
I got this far:
UPDATE table1
SET DATE=EOMONTH(DATE) AS MONTH_END;
ALTER TABLE table1
ADD COLUMN RANK INT AFTER sales;
UPDATE table1
SET RANK=
RANK() OVER(PARTITION BY cust ORDER BY sales DESC);
LIMIT 2
Can I do two SETs in a row like that without adding another UPDATE? I'm looking for the top 2 within each month; would this work? I feel like this is the right and most efficient query, but it's not working. Any help appreciated!
Original table:
+------+----------+-------+--+
| CUST | DATE | SALES | |
+------+----------+-------+--+
| 36 | 3-5-2018 | 50 | |
| 37 | 3-15-18 | 100 | |
| 38 | 3-25-18 | 65 | |
| 37 | 4-5-18 | 95 | |
| 39 | 4-21-18 | 500 | |
| 40 | 4-25-18 | 199 | |
+------+----------+-------+--+
Desired output:
+------+-----------+-------+------+
| CUST | Month End | SALES | Rank |
+------+-----------+-------+------+
| 37 | 3-31-18 | 100 | 1 |
| 38 | 3-31-18 | 65 | 2 |
| 39 | 4-30-18 | 500 | 1 |
| 40 | 4-30-18 | 199 | 2 |
+------+-----------+-------+------+
Based on your expected output I think this may work as well.
create table Salesdate (Cust int, Dates date, Sales int)
insert into Salesdate values
(36 , '2018-03-05' , 50 )
,(37 , '2018-03-15' , 100 )
,(38 , '2018-03-25' , 65 )
,(37 , '2018-04-05' , 95 )
,(40 , '2018-04-25' , 199 )
,(39 , '2018-04-21' , 500 )
Update the Dates column to the last day of the month (EOMONTH returns the last day of the month for a given date). You can add a separate column instead, or update the existing column, as you prefer.
Update Salesdate
set Dates = eomonth(Dates)
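EOMONTH(date) with no month offset returns the last day of that date's month. A Python equivalent for checking expected values, using the standard `calendar.monthrange` (the helper name `eomonth` is hypothetical):

```python
import calendar
from datetime import date


def eomonth(d):
    """Last day of d's month, like T-SQL EOMONTH(d) with no month offset."""
    last_day = calendar.monthrange(d.year, d.month)[1]  # (weekday, days_in_month)
    return d.replace(day=last_day)


print(eomonth(date(2018, 3, 5)))   # 2018-03-31
print(eomonth(date(2018, 4, 21)))  # 2018-04-30
print(eomonth(date(2020, 2, 10)))  # 2020-02-29 (leap year)
```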
Add a column called rank to the table.
Alter table Salesdate
add rank int
Update the column rank which was just added.
update Salesdate
set Salesdate.[rank] = tbl.Ranked from
(select Cust, Sales, Dates , rank() over (Partition by Dates order by Sales Desc)
Ranked from Salesdate ) tbl
where tbl.Cust = salesdate.Cust
and tbl.Sales = salesdate.Sales
and tbl.dates = salesdate.Dates
--Not sure if this step is necessary: if you want your final table to contain only ranks 1 and 2, you can delete the other rows, or you can simply filter them out in the SELECT. Please note that RANK may skip numbers when sales amounts are tied within a month.
;With cte as (
select * from Salesdate)
delete from cte
where [RANK] > 2
select * from Salesdate
order by dates, [RANK]
Output
Cust Dates Sales rank
37 2018-03-31 100 1
38 2018-03-31 65 2
39 2018-04-30 500 1
40 2018-04-30 199 2
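To illustrate the note about RANK skipping numbers, here is a small Python model of RANK() and DENSE_RANK() over one month's sales in descending order. The tied value of 100 is hypothetical; the sample data has no ties. With RANK, deleting rows where rank > 2 keeps both tied 100s and drops 65; with DENSE_RANK, 65 would survive as rank 2.

```python
def rank_desc(values):
    """RANK() ... ORDER BY value DESC: ties share a rank, then numbers are skipped."""
    ordered = sorted(values, reverse=True)
    # rank = 1 + number of values strictly greater than this one
    return [(v, 1 + sum(1 for w in ordered if w > v)) for v in ordered]


def dense_rank_desc(values):
    """DENSE_RANK() ... ORDER BY value DESC: ties share a rank, no gaps."""
    ordered = sorted(values, reverse=True)
    distinct = sorted(set(values), reverse=True)
    return [(v, distinct.index(v) + 1) for v in ordered]


march_sales = [50, 100, 65, 100]  # hypothetical: a second customer also sold 100
print(rank_desc(march_sales))        # [(100, 1), (100, 1), (65, 3), (50, 4)]
print(dense_rank_desc(march_sales))  # [(100, 1), (100, 1), (65, 2), (50, 3)]
```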

Partition based on specified value

I'm trying to write a query that partitions based on the value 90. Below is my table:
create table #temp(StudentID char(2), Status int)
insert #temp values('S1',75 )
insert #temp values('S1',85 )
insert #temp values('S1',90)
insert #temp values('S1',85)
insert #temp values('S1',83)
insert #temp values('S1',90 )
insert #temp values('S1',85)
insert #temp values('S1',90)
insert #temp values('S1',93 )
insert #temp values('S1',93 )
insert #temp values('S1',93 )
Required output:
ID Status Result
S1 75 0
S1 85 0
S1 90 0
S1 85 1
S1 83 1
S1 90 1
S1 85 2
S1 90 2
S1 93 3
S1 93 3
S1 93 3
Does anyone have a solution for partitioning based on status 90? Result should be 0, 1, 2, 3, etc., incrementing after each occurrence of the value 90.
Assuming that the actual question is "How can I find ranges/islands of incrementing values", the answer could use LAG to compare the current Status value with the previous one based on some order. If the previous value is 90, you have a new island:
create table #temp (ID int identity PRIMARY KEY, StudentID char(2), Status int)
insert into #temp (StudentID,Status)
values
('S1',75),
('S1',85),
('S1',90),
('S1',85),
('S1',83),
('S1',90),
('S1',85),
('S1',90),
('S1',93),
('S1',93),
('S1',93);
select
* ,
case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID)
when 90 then 1 else 0 end as NewIsland
from #temp
This returns:
+----+-----------+--------+-----------+
| ID | StudentID | Status | NewIsland |
+----+-----------+--------+-----------+
| 1 | S1 | 75 | 0 |
| 2 | S1 | 85 | 0 |
| 3 | S1 | 90 | 0 |
| 4 | S1 | 85 | 1 |
| 5 | S1 | 83 | 0 |
| 6 | S1 | 90 | 0 |
| 7 | S1 | 85 | 1 |
| 8 | S1 | 90 | 0 |
| 9 | S1 | 93 | 1 |
| 10 | S1 | 93 | 0 |
| 11 | S1 | 93 | 0 |
+----+-----------+--------+-----------+
You can create an Island ID from this by summing all NewIsland values up to and including the current row, using SUM with the ROWS clause of OVER:
with islands as
(
select
* ,
case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID)
when 90 then 1 else 0 end as NewIsland
from #temp
)
select * ,
SUM(NewIsland) OVER (PARTITION BY StudentID ORDER BY ID ROWS UNBOUNDED PRECEDING) AS Result
from islands
This produces:
+----+-----------+--------+-----------+--------+
| ID | StudentID | Status | NewIsland | Result |
+----+-----------+--------+-----------+--------+
| 1 | S1 | 75 | 0 | 0 |
| 2 | S1 | 85 | 0 | 0 |
| 3 | S1 | 90 | 0 | 0 |
| 4 | S1 | 85 | 1 | 1 |
| 5 | S1 | 83 | 0 | 1 |
| 6 | S1 | 90 | 0 | 1 |
| 7 | S1 | 85 | 1 | 2 |
| 8 | S1 | 90 | 0 | 2 |
| 9 | S1 | 93 | 1 | 3 |
| 10 | S1 | 93 | 0 | 3 |
| 11 | S1 | 93 | 0 | 3 |
+----+-----------+--------+-----------+--------+
BTW this is a case of the wider Gaps & Islands problem in SQL.
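The two steps above (flag a row when the previous Status is 90, then take a running total of the flags) can be modeled in plain Python to check the expected Result column. This sketch assumes the statuses of one StudentID are already in ID order; `island_ids` is a hypothetical name.

```python
def island_ids(statuses, marker=90):
    """Model of CASE LAG(Status,1,0) ... WHEN 90 THEN 1 ELSE 0 END followed by
    SUM(NewIsland) OVER (... ROWS UNBOUNDED PRECEDING), for one StudentID."""
    result = []
    running = 0
    previous = 0  # LAG's default value (third argument) for the first row
    for status in statuses:
        new_island = 1 if previous == marker else 0
        running += new_island  # the running total includes the current row
        result.append(running)
        previous = status
    return result


statuses = [75, 85, 90, 85, 83, 90, 85, 90, 93, 93, 93]
print(island_ids(statuses))  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```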
UPDATE
LAG and OVER are available in all supported SQL Server versions, i.e. SQL Server 2012 and later. OVER is also available in SQL Server 2008, but LAG is not. In those versions, different and slower techniques were used to calculate islands: The SQL of Gaps and Islands in Sequences
In most cases ROW_NUMBER() is used to calculate the row ordering, which results in one extra CTE. This can be avoided if the desired ordering is the same as the ID, or any other unique incrementing column. The following query returns the same results as the query that uses LAG :
select
* ,
case when exists (select ID
from #temp t1
where t1.StudentID=t2.StudentID
and t1.ID=t2.ID-1
and t1.status=90) then 1
else 0 end
as NewIsland
from #temp t2
This query returns 1 if there's a row with the same StudentID, an ID (or ROW_NUMBER) one less than the current row and Status 90, i.e. the same as LAG(Status, 1) = 90.
After that, we just need to SUM the previous values. While SUM OVER was available in 2008, it only supported PARTITION BY, so we need to use another subquery:
;with islands as
(
select
* ,
case when exists (select ID from #temp t1 where t1.StudentID=t2.StudentID and t1.ID=t2.ID-1 and t1.status=90) then 1
else 0 end
as NewIsland
from #temp t2
)
select * ,
(select ISNULL(SUM(NewIsland),0)
from islands i1
where i1.StudentID=i2.StudentID
and i1.ID<=i2.ID) AS Result
from islands i2
This sums all NewIsland values for rows of the same StudentID with an ID less than or equal to the current one.
Performance
All those subqueries result in a lot of repeated scans. Surprisingly though, the older query is faster than the query with LAG, because the LAG query has to sort temporary results multiple times and filter by Status, with a 45% vs 55% execution plan cost.
Things change dramatically when an index is added :
create table #temp ( ID int identity PRIMARY KEY, StudentID char(2), Status int,
INDEX IX_TMP(StudentID,ID,Status))
The multiple sorts disappear and the costs become 80% vs 20%. The query just scans the index values once without sorting the intermediate results.
The subquery version wasn't able to take advantage of the index.
UPDATE 2
uzi suggested that removing LAG and summing only up to the previous row would be better :
select * ,
ISNULL(SUM(case when status = 90 then 1 else 0 end)
OVER (PARTITION BY StudentID
ORDER BY ID ROWS
BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS Result
from #temp;
Semantically, this is the same thing - for each row find all previous ones, calculate 1 for the 90s and 0 for the other rows, and sum them.
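uzi's variant counts the 90s strictly before each row. A Python sketch of that computation (hypothetical name `islands_exclusive_sum`, statuses of one StudentID assumed to be in ID order). It reports 0 for the first row, where SQL's empty window frame would return NULL unless wrapped in ISNULL.

```python
from itertools import accumulate


def islands_exclusive_sum(statuses, marker=90):
    # CASE WHEN status = 90 THEN 1 ELSE 0 END
    flags = [1 if s == marker else 0 for s in statuses]
    # Running total over UNBOUNDED PRECEDING .. CURRENT ROW
    inclusive = list(accumulate(flags))
    # Subtracting the current row's flag gives UNBOUNDED PRECEDING .. 1 PRECEDING
    return [total - flag for total, flag in zip(inclusive, flags)]


statuses = [75, 85, 90, 85, 83, 90, 85, 90, 93, 93, 93]
print(islands_exclusive_sum(statuses))  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```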
The server generates similar execution plans in both cases. The LAG version uses two Stream Aggregate operators, while the version without it uses one. The end result for this limited data set was essentially the same, though.
For a larger data set the results may differ, e.g. if the server has to spill data to tempdb because it doesn't fit in memory.
Perhaps this is not a very good solution, but it works.
SELECT StudentID ID
, Marks Status
, CASE
WHEN Marks = 90
THEN SUM(q) OVER(order by row) - 1
ELSE SUM(q) OVER(order by row)
END Result
FROM (
SELECT row_number() OVER(order by StudentID desc) row
, *
, CASE
WHEN Marks = 90
THEN 1
ELSE 0
END q
FROM #temp
) a
You could simply use a subquery:
select *,
coalesce((select sum(case when Marks = 90 then 1 else 0 end)
from table
where studentid = t.studentid and
? < t.?) , 0) as Result
from table t;
Here, ? (e.g. id) stands for the column(s) that define your actual data ordering.

TSQL Divide set in groups

I have a table with 3 fields: wk, cor, id
"wk" is the week, "cor" groups items from same location, "id" is the id of each item to retrieve from warehouse.
Given a certain number of items to retrieve, I must take almost the same quantity of items from each group ("cor" represents groups) to balance warehouse performance, respecting the week precedence (before going to the following week, the previous one must be exhausted).
If you follow the link, the image may make this clearer:
Data sample
rows are taken in this order:
yellow, orange, green, gray (this last one starts with "cor 2" because "cor 1" was the last used in week 28)
The RES column (done by hand in the sample) represents the right order I should take items; currently this is obtained with a cursor, which is very very slow and I'd like to do something better, if possible; I've tried with windowed functions, cte, recursive cte but was not able to get anything right.
With this script you can have the same table
CREATE TABLE #t (wk int, cor int, id int)
INSERT INTO #t
(
wk
,cor
,id
)
VALUES
(28,1,4044534),
(28,1,6778322),
(28,1,7921336),
(28,1,4326390),
(28,2,2669622),
(28,2,6580257),
(28,2,1179795),
(28,3,3980111),
(28,3,2549129),
(28,3,6763533),
(29,1,6023538),
(29,1,8219574),
(29,1,3836858),
(29,2,3355314),
(29,2,148847),
(29,2,8083320),
(29,3,1359966),
(29,3,8746308)
The expected result:
All fields are given while the RES field must be calculated and represents the order in which items will be taken out (explained below the table).
+----+-----+---------+-----+
| wk | cor | id | RES |
+----+-----+---------+-----+
| 28 | 1 | 4044534 | 1 |
| 28 | 1 | 6778322 | 4 |
| 28 | 1 | 7921336 | 7 |
| 28 | 1 | 4326390 | 10 |
| 28 | 2 | 2669622 | 2 |
| 28 | 2 | 6580257 | 5 |
| 28 | 2 | 1179795 | 8 |
| 28 | 3 | 3980111 | 3 |
| 28 | 3 | 2549129 | 6 |
| 28 | 3 | 6763533 | 9 |
| 29 | 1 | 6023538 | 11 |
| 29 | 1 | 8219574 | 14 |
| 29 | 1 | 3836858 | 17 |
| 29 | 2 | 3355314 | 12 |
| 29 | 2 | 148847 | 15 |
| 29 | 2 | 8083320 | 18 |
| 29 | 3 | 1359966 | 13 |
| 29 | 3 | 8746308 | 16 |
+----+-----+---------+-----+
The algo is like that:
The older week must be exhausted first (in the sample, wk 28 must be finished before taking items from wk 29).
Items must be distributed equally among "cor"s, so if 10 items are required they must come out like this: 3 from cor1, 3 from cor2, 3 from cor3. The last one may come from any cor, because 10 is obviously not divisible by 3.
If 11 items are required, week 28 only contains 10 items, so the last one will be taken from week 29 with the same principle: distribute the exits equally among cors, even when weeks change. If the last article from week 28 was taken from cor 1, the next one in week 29 will be taken from cor 2.
Does this solve your problem?
DROP TABLE IF EXISTS #temp
DROP TABLE IF EXISTS #temp2
CREATE TABLE #temp (idx INT PRIMARY KEY IDENTITY(1,1), wk int, cor int, id int)
INSERT INTO #temp
VALUES
(28,1,4044534),
(28,1,6778322),
(28,1,7921336),
(28,1,4326390),
(28,2,2669622),
(28,2,6580257),
(28,2,1179795),
(28,3,3980111),
(28,3,2549129),
(28,3,6763533),
(29,1,6023538),
(29,1,8219574),
(29,1,3836858),
(29,2,3355314),
(29,2,148847),
(29,2,8083320),
(29,3,1359966),
(29,3,8746308)
SELECT wk, cor, id
, ROW_NUMBER() OVER (ORDER BY wk, RES, idx) as RES
FROM (
SELECT idx
, wk
, cor
, id
, ROW_NUMBER() OVER (PARTITION BY wk, cor ORDER BY idx) AS RES
FROM #temp
) AS t
ORDER BY idx
You don't have the correct information in your test data to support the desired output.
If however, you were to have an identity column that represents the insertion order, you could use something like the following...
WITH cte_RankOrder AS (
SELECT
t.rn, t.wk, t.cor, t.id,
RankOrder = DENSE_RANK() OVER (PARTITION BY t.wk, t.cor ORDER BY t.rn, t.wk)
FROM
#t t
)
SELECT
ro.rn, ro.wk, ro.cor, ro.id,
RES = ROW_NUMBER() OVER (ORDER BY wk, ro.RankOrder, ro.cor)
FROM
cte_RankOrder ro
ORDER BY ro.rn;
results...
rn wk cor id RES
----------- ----------- ----------- ----------- --------------------
1 28 1 4044534 1
2 28 1 6778322 4
3 28 1 7921336 7
4 28 1 4326390 10
5 28 2 2669622 2
6 28 2 6580257 5
7 28 2 1179795 8
8 28 3 3980111 3
9 28 3 2549129 6
10 28 3 6763533 9
11 29 1 6023538 11
12 29 1 8219574 14
13 29 1 3836858 17
14 29 2 3355314 12
15 29 2 148847 15
16 29 2 8083320 18
17 29 3 1359966 13
18 29 3 8746308 16
HTH, Jason
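A Python model of what the cte_RankOrder query computes, assuming the list order stands in for the rn identity column: number the items within each (wk, cor) pair, then sort by (wk, that number, cor). Like the query and the expected table, it restarts each week at the lowest cor rather than carrying the cor rotation over from the previous week, as the third rule in the question describes. `take_order` is a hypothetical name.

```python
def take_order(rows):
    """rows: list of (wk, cor, id) in insertion order (stands in for rn).
    Returns the RES value for each row, in the same order."""
    seen = {}  # running count of items per (wk, cor)
    keyed = []
    for rn, (wk, cor, _item_id) in enumerate(rows, start=1):
        seen[(wk, cor)] = seen.get((wk, cor), 0) + 1
        rank_order = seen[(wk, cor)]  # like RankOrder in cte_RankOrder
        keyed.append(((wk, rank_order, cor), rn))
    res = [0] * len(rows)
    # like ROW_NUMBER() OVER (ORDER BY wk, RankOrder, cor)
    for row_number, (_, rn) in enumerate(sorted(keyed), start=1):
        res[rn - 1] = row_number
    return res


rows = [
    (28, 1, 4044534), (28, 1, 6778322), (28, 1, 7921336), (28, 1, 4326390),
    (28, 2, 2669622), (28, 2, 6580257), (28, 2, 1179795),
    (28, 3, 3980111), (28, 3, 2549129), (28, 3, 6763533),
    (29, 1, 6023538), (29, 1, 8219574), (29, 1, 3836858),
    (29, 2, 3355314), (29, 2, 148847), (29, 2, 8083320),
    (29, 3, 1359966), (29, 3, 8746308),
]
print(take_order(rows))
# [1, 4, 7, 10, 2, 5, 8, 3, 6, 9, 11, 14, 17, 12, 15, 18, 13, 16]
```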

How to get the sum of each column as new records in SQL Server

I have a question about SQL Server. I have a table something like this:
productname |Level| January | Feburary | March | total
------------x-----x-----------x----------x-------x------
Rin | L1 | 10 | 20 | 30 | 60
Rin | L2 | 5 | 10 | 10 | 25
Rin | L3 | 20 | 5 | 5 | 30
Pen | L1 | 5 | 6 | 10 | 21
Pen | L2 | 10 | 10 | 20 | 40
Pen | L3 | 30 |10 | 40 | 80
based on above table data I want output like below
productname |Level| January | Feburary | March | total
------------x-----x-----------x----------x-------x------
Rin | L1 | 10 | 20 | 30 | 60
Rin | L2 | 5 | 10 | 10 | 25
Rin | L3 | 20 | 5 | 5 | 30
RinTotal |All | 35 | 35 | 45 | 115
Pen | L1 | 5 | 6 | 10 | 21
Pen | L2 | 10 | 10 | 20 | 40
Pen | L3 | 30 | 10 | 40 | 80
PenTotal | All | 45 | 26 | 70 |141
I tried the query below:
SELECT productname
,LEVEL
,sum(january) AS January
,sum(Feburary) AS Feburary )
,Sum(march) AS March
,Sum(total) AS total
FROM test
UNION
SELECT *
FROM test
but it doesn't give the exact output. Please point me in the right direction on how to achieve this in SQL Server.
Please try this:
SELECT * FROM TEST
UNION
SELECT PRODUCTNAME+'TOTAL','ALL' AS LEVEL,SUM(JANUARY)AS JANUARY,SUM(FEBURARY)AS FEBURARY,SUM(MARCH)AS MARCH,SUM(TOTAL)AS TOTAL
FROM TEST GROUP BY PRODUCTNAME
This really belongs in the front end. Group subtotals and such are usually really simple from most reporting tools. Also, don't get lazy and use select *, you should always be explicit in your columns. Since you have a specific order I added a couple of extra columns to use for sorting.
Also don't be afraid to add some white space and formatting to your queries. It makes your life a lot easier to read and later debug.
I think something like this should get you close. Notice I changed to a UNION ALL. When using UNION it will exclude duplicates. Since you know for a fact that there are no duplicate rows a UNION ALL will eliminate the need to check for duplicates.
select productname + 'Total' as productname
, 'All' as level
, sum(january) as January
, sum(Feburary) as Feburary
, Sum(march) as March
, Sum(total) as total
, productname as SortName
, 1 as SortOrder
from test
group by productname
union ALL
select productname
, level
, January
, Feburary
, March
, Total
, productname as SortName
, 0 as SortOrder
from test
order by SortName, SortOrder
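A Python model of the UNION ALL query above, to show where the subtotal rows land: detail rows get SortOrder 0, one subtotal row per product gets SortOrder 1, and rows are ordered by (SortName, SortOrder). `with_subtotals` is a hypothetical name. SQL makes no ordering promise for ties within a product; this sketch keeps the input order because Python's sort is stable.

```python
from collections import defaultdict


def with_subtotals(rows):
    """rows: (productname, level, january, feburary, march, total) tuples."""
    sums = defaultdict(lambda: [0, 0, 0, 0])
    keyed = []
    for name, level, *months in rows:
        keyed.append(((name, 0), (name, level, *months)))  # detail row, SortOrder 0
        for i, value in enumerate(months):
            sums[name][i] += value
    for name, totals in sums.items():
        keyed.append(((name, 1), (name + "Total", "All", *totals)))  # SortOrder 1
    keyed.sort(key=lambda pair: pair[0])  # ORDER BY SortName, SortOrder
    return [row for _, row in keyed]


data = [
    ("Rin", "L1", 10, 20, 30, 60), ("Rin", "L2", 5, 10, 10, 25),
    ("Rin", "L3", 20, 5, 5, 30), ("Pen", "L1", 5, 6, 10, 21),
    ("Pen", "L2", 10, 10, 20, 40), ("Pen", "L3", 30, 10, 40, 80),
]
for row in with_subtotals(data):
    print(row)
```

Because the query sorts by product name, Pen and its PenTotal row (45, 26, 70, 141) come before Rin and RinTotal (35, 35, 45, 115).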
I would do this using GROUP BY ROLLUP, which adds a subtotal row per productname (plus a grand-total row, which is filtered out below).
SELECT *
FROM (SELECT productname=productname + CASE WHEN level IS NULL THEN 'Total'
ELSE '' END,
Level=Isnull(level, 'ALL'),
Sum(january) AS January,
Sum(feburary) AS Feburary,
Sum(march) AS March,
Sum(total) AS total
FROM Yourtable
GROUP BY rollup ( productname, level )) a
WHERE productname IS NOT NULL
