LAG() with a condition in SQL Server

I have a table like this:
Number Price Type Date Time
------ ----- ---- ---------- ---------
23456 0,665 SV 2014/02/02 08:00:02
23457 1,3 EC 2014/02/02 07:50:45
23460 0,668 SV 2014/02/02 07:36:34
For each EC row I need the previous and next SV price. In this case the query is simple:
Select Lag(price, 1, price) over (order by date desc, time desc),
Lead(price, 1, price) over (order by date desc, time desc)
from ITEMS
But there are some special cases where two or more consecutive rows are of type EC:
Number Price Type Date Time
------ ----- ---- ---------- ---------
23456 0,665 SV 2014/02/02 08:00:02
23457 1,3 EC 2014/02/02 07:50:45
23658 2,4 EC 2014/02/02 07:50:45
23660 2,4 EC 2014/02/02 07:50:48
23465 0,668 SV 2014/02/02 07:36:34
Can I use LEAD/LAG in these cases? If not, do I have to use a subquery?

Your question (and Anon's excellent answer) is an instance of the SQL "gaps and islands" problem. In this answer, I will examine the "row_number() magic" in detail.
I've made a simple example based on events in a ballgame. For each event, we'd like to print the previous and next quarter-related message (the start or end of a quarter):
create table TestTable (id int identity, event varchar(64));
insert TestTable values
('Start of Q1'),
('Free kick'),
('Goal'),
('End of Q1'),
('Start of Q2'),
('Penalty'),
('Miss'),
('Yellow card'),
('End of Q2');
Here's a query showing off the "row_number() magic" approach:
; with grouped as
(
select *
, row_number() over (order by id) as rn1
, row_number() over (
partition by case when event like '%of Q[1-4]' then 1 end
order by id) as rn2
from TestTable
)
, order_in_group as
(
select *
, rn1-rn2 as group_nr
, row_number() over (partition by rn1-rn2 order by id) as rank_asc
, row_number() over (partition by rn1-rn2 order by id desc)
as rank_desc
from grouped
)
select *
, lag(event, rank_asc) over (order by id) as last_event_of_prev_group
, lead(event, rank_desc) over (order by id) as first_event_of_next_group
from order_in_group
order by
id
The first CTE, called "grouped", calculates two row_number()s. The first numbers every row in the table 1, 2, 3, and so on. The second numbers the pause announcements (start and end of a quarter) in one sequence and the other events in a separate sequence. Within each consecutive run of same-kind rows both numbers grow by one per row, so their difference, rn1 - rn2, stays constant for each section of the game. It's helpful to check the difference in the example output: it's in the group_nr column. You'll see that in this example each value corresponds to one section of the game.
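To see the rn1 - rn2 arithmetic outside SQL Server, here is a minimal Python sketch (an illustration, not T-SQL) that walks the nine events in id order and keeps one running counter per partition, the way the two ROW_NUMBER() calls do. The names `EVENTS`, `is_pause` and `group_numbers` are hypothetical.

```python
# The nine TestTable events in id order (id = list position + 1).
EVENTS = [
    "Start of Q1", "Free kick", "Goal", "End of Q1",
    "Start of Q2", "Penalty", "Miss", "Yellow card", "End of Q2",
]


def is_pause(event):
    """Mirror of: event LIKE '%of Q[1-4]' (the string ends in 'of Q1' .. 'of Q4')."""
    return any(event.endswith(f"of Q{q}") for q in range(1, 5))


def group_numbers(events):
    """Return rn1 - rn2 for each row, in id order."""
    counters = {True: 0, False: 0}  # one ROW_NUMBER() sequence per partition
    result = []
    for rn1, event in enumerate(events, start=1):  # ROW_NUMBER() OVER (ORDER BY id)
        key = is_pause(event)
        counters[key] += 1
        rn2 = counters[key]  # ROW_NUMBER() within the pause / non-pause partition
        result.append(rn1 - rn2)
    return result


print(group_numbers(EVENTS))
```

Each run of consecutive pause or non-pause rows shares one value (here 0, 1, 2, 3 and 5), which is what the group_nr column shows.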
The second CTE called "order_in_group" determines the position of the current row within its island or gap. For an island with 3 rows, the positions are 1 2 3 for the ascending order, and 3 2 1 for the descending order.
Finally, we know enough to tell lag() and lead() how far to jump. We have to lag rank_asc rows to find the final row of the previous section. To find the first row of the next section, we have to lead rank_desc rows.
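The jump can be sketched in Python as well (again an illustration, not T-SQL). This minimal sketch recomputes group_nr, derives rank_asc and rank_desc within each group, and then steps back rank_asc rows and forward rank_desc rows, returning None where LAG/LEAD would return NULL. The names `neighbours` and `is_pause` are hypothetical.

```python
EVENTS = [
    "Start of Q1", "Free kick", "Goal", "End of Q1",
    "Start of Q2", "Penalty", "Miss", "Yellow card", "End of Q2",
]


def is_pause(event):
    # Mirror of: event LIKE '%of Q[1-4]'
    return any(event.endswith(f"of Q{q}") for q in range(1, 5))


def neighbours(events):
    """For each row return (event, last_event_of_prev_group, first_event_of_next_group)."""
    # group_nr = rn1 - rn2, as computed in the "grouped" CTE
    counters = {True: 0, False: 0}
    groups = []
    for rn1, event in enumerate(events, start=1):
        key = is_pause(event)
        counters[key] += 1
        groups.append(rn1 - counters[key])

    n = len(events)
    rows = []
    for i, event in enumerate(events):
        # Positions sharing this group_nr, in id order (PARTITION BY rn1-rn2 ORDER BY id)
        members = [j for j in range(n) if groups[j] == groups[i]]
        rank_asc = members.index(i) + 1
        rank_desc = len(members) - members.index(i)
        prev_i = i - rank_asc   # LAG(event, rank_asc) OVER (ORDER BY id)
        next_i = i + rank_desc  # LEAD(event, rank_desc) OVER (ORDER BY id)
        prev_event = events[prev_i] if prev_i >= 0 else None  # NULL when out of range
        next_event = events[next_i] if next_i < n else None
        rows.append((event, prev_event, next_event))
    return rows


for row in neighbours(EVENTS):
    print(row)
```

For example, "Miss" is the second row of its group, so it looks back two rows to "Start of Q2" and forward two rows to "End of Q2".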
Hope this helps clarify the "magic" of gaps and islands. Here is a working example at SQL Fiddle.

Yes, you can use LEAD/LAG. You just need to precalculate how far to jump with a little ROW_NUMBER() magic.
DECLARE @a TABLE ( number int, price money, type varchar(2),
date date, time time)
INSERT @a VALUES
(23456,0.665,'SV','2014/02/02','08:00:02'),
(23457,1.3 ,'EC','2014/02/02','07:50:45'),
(23658,2.4 ,'EC','2014/02/02','07:50:45'),
(23660,2.4 ,'EC','2014/02/02','07:50:48'),
(23465,0.668,'SV','2014/02/02','07:36:34');
; WITH a AS (
SELECT *,
ROW_NUMBER() OVER(ORDER BY [date] DESC, [time] DESC) x,
ROW_NUMBER() OVER(PARTITION BY
CASE [type] WHEN 'SV' THEN 1 ELSE 0 END
ORDER BY [date] DESC, [time] DESC) y
FROM @a)
, b AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY x-y ORDER BY x ASC) z1,
ROW_NUMBER() OVER(PARTITION BY x-y ORDER BY x DESC) z2
FROM a)
SELECT *,
CASE [type] WHEN 'SV' THEN
LAG(price,z1,price) OVER(PARTITION BY [type] ORDER BY x)
ELSE LAG(price,z1,price) OVER(ORDER BY x)
END,
CASE [type] WHEN 'SV' THEN
LEAD(price,z2,price) OVER(PARTITION BY [type] ORDER BY x)
ELSE LEAD(price,z2,price) OVER(ORDER BY x)
END
FROM b
ORDER BY x
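Here is a minimal Python sketch (not T-SQL) of the same jump arithmetic on the OP's five rows, assuming they are already sorted by [date] DESC, [time] DESC so that list position + 1 plays the role of x. It only covers the EC branch of the CASE, where LAG/LEAD run over the whole sequence and fall back to the row's own price, like the third argument of LAG(price, z1, price). The names `ROWS` and `ec_neighbour_prices` are hypothetical.

```python
# OP's rows as (number, price, type), sorted by [date] DESC, [time] DESC.
# 23457 and 23658 share the timestamp 07:50:45, so their relative order is
# arbitrary in SQL Server too; both are EC, so the result is the same.
ROWS = [
    (23456, 0.665, "SV"),  # 08:00:02
    (23660, 2.4, "EC"),    # 07:50:48
    (23457, 1.3, "EC"),    # 07:50:45
    (23658, 2.4, "EC"),    # 07:50:45
    (23465, 0.668, "SV"),  # 07:36:34
]


def ec_neighbour_prices(rows):
    """Map each EC number to (LAG(price, z1, price), LEAD(price, z2, price))."""
    n = len(rows)
    counters = {}
    groups = []
    for x, (_, _, typ) in enumerate(rows, start=1):
        key = typ == "SV"  # CASE [type] WHEN 'SV' THEN 1 ELSE 0 END
        counters[key] = counters.get(key, 0) + 1  # y
        groups.append(x - counters[key])          # x - y
    result = {}
    for i, (number, price, typ) in enumerate(rows):
        if typ != "EC":
            continue
        members = [j for j in range(n) if groups[j] == groups[i]]
        z1 = members.index(i) + 1             # ROW_NUMBER() ... ORDER BY x ASC
        z2 = len(members) - members.index(i)  # ROW_NUMBER() ... ORDER BY x DESC
        lag_price = rows[i - z1][1] if i - z1 >= 0 else price    # default: own price
        lead_price = rows[i + z2][1] if i + z2 < n else price
        result[number] = (lag_price, lead_price)
    return result


print(ec_neighbour_prices(ROWS))
```

Every EC row gets 0.665 (the SV row above the run in this descending order) and 0.668 (the SV row below it).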

Here is yet another way of achieving the same result, using conditional MAX/MIN functions windowed over an ordinal. The ordinal can be set up based on whatever columns fit the purpose, but in this case I believe the OP intends them to be Date and Time.
DROP TABLE IF EXISTS #t;
CREATE TABLE #t (
Number INT,
Price MONEY,
Type CHAR(2),
Date DATE,
Time TIME(0)
);
INSERT INTO #t VALUES
(23456, 0.666, 'SV', '2014/02/02', '10:00:02'),
(23457, 1.4 , 'EC', '2014/02/02', '09:50:45'),
(23658, 2.5 , 'EC', '2014/02/02', '09:50:45'),
(23660, 2.5 , 'EC', '2014/02/02', '09:50:48'),
(23465, 0.669, 'SV', '2014/02/02', '09:36:34'),
(23456, 0.665, 'SV', '2014/02/02', '08:00:02'),
(23457, 1.3 , 'EC', '2014/02/02', '07:50:45'),
(23658, 2.4 , 'EC', '2014/02/02', '07:50:45'),
(23660, 2.4 , 'EC', '2014/02/02', '07:50:48'),
(23465, 0.668, 'SV', '2014/02/02', '07:36:34'), -- which one of these?
(23465, 0.670, 'SV', '2014/02/02', '07:36:34'); --
WITH time_ordered AS (
SELECT *, DENSE_RANK() OVER (ORDER BY Date, Time) AS ordinal FROM #t
)
SELECT
*,
CASE WHEN Type = 'EC'
THEN MAX(CASE WHEN ordinal = preceding_non_EC_ordinal THEN Price END)
OVER (PARTITION BY preceding_non_EC_ordinal ORDER BY ordinal ASC) END AS preceding_price,
CASE WHEN Type = 'EC'
THEN MIN(CASE WHEN ordinal = following_non_EC_ordinal THEN Price END)
OVER (PARTITION BY following_non_EC_ordinal ORDER BY ordinal DESC) END AS following_price
FROM (
SELECT
*,
MAX(CASE WHEN Type <> 'EC' THEN ordinal END)
OVER (ORDER BY ordinal ASC) AS preceding_non_EC_ordinal,
MIN(CASE WHEN Type <> 'EC' THEN ordinal END)
OVER (ORDER BY ordinal DESC) AS following_non_EC_ordinal
FROM time_ordered
) t
ORDER BY Date, Time
Note that the example given by the OP has been extended to show that interspersed sequences of EC rows yield the intended result. The two earliest rows are both of type SV and share the same Date and Time, which makes the preceding price ambiguous; in this case the larger price (the MAX) is picked. Setting up the ordinal to include the Price is a possible way to change this behavior.
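The same idea as a minimal Python sketch (not T-SQL), assuming the rows of #t as tuples: rows with the same Date and Time share an ordinal, as with DENSE_RANK(), and for each EC row we look up the nearest non-EC ordinal at or before it and at or after it, then take MAX or MIN of the prices at that ordinal. The names `ROWS` and `neighbour_prices` are hypothetical.

```python
# Rows of #t as (Number, Price, Type, Date, Time).
ROWS = [
    (23456, 0.666, "SV", "2014/02/02", "10:00:02"),
    (23457, 1.4, "EC", "2014/02/02", "09:50:45"),
    (23658, 2.5, "EC", "2014/02/02", "09:50:45"),
    (23660, 2.5, "EC", "2014/02/02", "09:50:48"),
    (23465, 0.669, "SV", "2014/02/02", "09:36:34"),
    (23456, 0.665, "SV", "2014/02/02", "08:00:02"),
    (23457, 1.3, "EC", "2014/02/02", "07:50:45"),
    (23658, 2.4, "EC", "2014/02/02", "07:50:45"),
    (23660, 2.4, "EC", "2014/02/02", "07:50:48"),
    (23465, 0.668, "SV", "2014/02/02", "07:36:34"),
    (23465, 0.670, "SV", "2014/02/02", "07:36:34"),
]


def neighbour_prices(rows):
    """Return (Number, preceding_price, following_price) per row, in input order."""
    # DENSE_RANK() OVER (ORDER BY Date, Time): equal timestamps share an ordinal
    stamps = sorted({(r[3], r[4]) for r in rows})
    rank = {stamp: k for k, stamp in enumerate(stamps, start=1)}
    ordinals = [rank[(r[3], r[4])] for r in rows]
    non_ec = [o for o, r in zip(ordinals, rows) if r[2] != "EC"]

    def prices_at(ordinal):
        # CASE WHEN ordinal = ... THEN Price END, over every row at that ordinal
        return [r[1] for o, r in zip(ordinals, rows) if o == ordinal]

    result = []
    for o, r in zip(ordinals, rows):
        if r[2] != "EC":
            result.append((r[0], None, None))  # the outer CASE yields NULL
            continue
        prev_o = max((k for k in non_ec if k <= o), default=None)  # preceding_non_EC_ordinal
        next_o = min((k for k in non_ec if k >= o), default=None)  # following_non_EC_ordinal
        preceding = max(prices_at(prev_o)) if prev_o is not None else None
        following = min(prices_at(next_o)) if next_o is not None else None
        result.append((r[0], preceding, following))
    return result


for row in neighbour_prices(ROWS):
    print(row)
```

The later EC run gets 0.669 and 0.666; the earlier run gets 0.670 and 0.665, where 0.670 is the larger of the two tied SV prices at 07:36:34.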
An SQLFiddle can be found here: http://sqlfiddle.com/#!18/85117/1

Anon's solution is wonderful and Andomar's explanation of it is also great, but there is a difficulty in using this approach on large data sets: you can get conflicts in what Andomar called group_nr (rn1 - rn2), where two different groups (one of pause events, one of other events) end up with the same group number. This skews the ROW_NUMBER() calculation (which is partitioned by group_nr) and produces incorrect results whenever such a conflict arises.
Posting because I ran into this myself after working through this solution and finding errors.
My fix was to implement this version:
; with grouped as
(
select *
, row_number() over (order by id) as rn1
, row_number() over (
partition by case when event like '%of Q[1-4]' then 1 end
order by id) as rn2
from TestTable
)
, order_in_group as
(
select *
-- negate the group number of pause events so it cannot collide with other groups
, CASE
WHEN event like '%of Q[1-4]' THEN -(rn1 - rn2)
ELSE rn1 - rn2
END as group_nr
from grouped
)
, final_grouping AS
(SELECT *
, row_number() over (partition by group_nr order by id) AS rank_asc
, row_number() over (partition by group_nr order by id desc) AS rank_desc
FROM order_in_group
)
select *
, lag(event, rank_asc) over (order by id) as last_event_of_prev_group
, lead(event, rank_desc) over (order by id) as first_event_of_next_group
from final_grouping
order by
id
;
Changing the pause events' group_nr values to negatives means they can no longer collide with the group numbers of the other events, no matter how large the data set is.
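A minimal Python sketch (not T-SQL) of why the collision happens and how negating helps, using a hypothetical sequence of "play" and "pause" rows. A pause row's rn1 - rn2 counts the non-pause rows before it, while a non-pause row's counts the pause rows before it, so the two kinds can land on the same number. This sketch negates rn1 - rn2 for pause rows; `group_numbers` and `KINDS` are hypothetical names.

```python
def group_numbers(kinds, negate_pauses=False):
    """kinds: 'pause' / 'play' per row in id order; returns group_nr per row."""
    counters = {"pause": 0, "play": 0}  # rn2 runs separately per kind
    result = []
    for rn1, kind in enumerate(kinds, start=1):
        counters[kind] += 1
        diff = rn1 - counters[kind]  # rn1 - rn2
        result.append(-diff if negate_pauses and kind == "pause" else diff)
    return result


# Hypothetical sequence chosen to provoke a collision.
KINDS = ["play", "play", "pause", "play", "pause", "play"]
print(group_numbers(KINDS))
print(group_numbers(KINDS, negate_pauses=True))
```

Without negation the pause row at position 3 and the play row at position 6 both get group number 2, so they would be ranked as one group; with negation the pause groups become -2 and -3 and every group is distinct.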

Related

How to use Pivot with RowNumber and date

I have a SQL Server table like this:
How can I split the Reading column into 2 columns based on the row number?
I have tried this:
WITH pivot_data AS
(
SELECT
date, CurrentMeterSNID,
1 + ((ROW_NUMBER() OVER (PARTITION BY CurrentMeterSNID ORDER BY date desc) - 1) % 2) rownum,
Reading
FROM
INF_Facility_ElectricalRecord
)
SELECT
date, CurrentMeterSNID, [1], [2]
FROM
pivot_data
PIVOT
(MAX(Reading) FOR rownum IN ([1], [2])) AS p;
but the result that I get is:
I get NULL values; how can I replace each NULL with the reading from the following day?
Actually, you are not doing a PIVOT. You just want to conditionally display the value in a different column, and for that you can use a CASE expression.
For the second requirement (showing the subsequent day's value instead of NULL), you can use the LEAD() or LAG() window function in the ELSE branch of the CASE:
select date, CurrentMeterSNID,
[1] = case when rownum2 = 1
then reading
else lead(reading) over(partition by CurrentMeterSNID order by date)
end,
[2] = case when rownum2 = 2
then reading
else lead(reading) over(partition by CurrentMeterSNID order by date)
end
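from (
select date, CurrentMeterSNID, Reading,
-- rownum2 alternates 1, 2, 1, 2, ... per meter (same expression as in the question)
1 + ((row_number() over (partition by CurrentMeterSNID order by date desc) - 1) % 2) as rownum2
from INF_Facility_ElectricalRecord
) t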
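A minimal Python sketch (not T-SQL) of what the CASE and LEAD() combination produces for a single meter. The question's table was not included, so the readings below are made up. The alternating number is 1 + ((rn - 1) % 2) with rn counted by date descending, as in the question; `READINGS` and `split_columns` are hypothetical names.

```python
from collections import defaultdict

# Made-up rows: (date, CurrentMeterSNID, Reading); the question's table was not shown.
READINGS = [
    ("2018-01-01", 7, 100),
    ("2018-01-02", 7, 110),
    ("2018-01-03", 7, 125),
]


def split_columns(readings):
    """Return (date, meter, [1], [2]) rows following the CASE / LEAD() query."""
    by_meter = defaultdict(list)
    for day, meter, reading in readings:
        by_meter[meter].append((day, reading))
    out = []
    for meter in sorted(by_meter):
        rows = sorted(by_meter[meter])  # PARTITION BY CurrentMeterSNID ORDER BY date
        n = len(rows)
        for i, (day, reading) in enumerate(rows):
            rn = n - i                  # ROW_NUMBER() ... ORDER BY date DESC
            alt = 1 + ((rn - 1) % 2)    # alternates 1, 2, 1, 2, ...
            lead = rows[i + 1][1] if i + 1 < n else None  # LEAD(reading), NULL at the end
            col1 = reading if alt == 1 else lead
            col2 = reading if alt == 2 else lead
            out.append((day, meter, col1, col2))
    return out


for row in split_columns(READINGS):
    print(row)
```

The last date still has a NULL (None) in one column, because LEAD() has no later row to read.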
As long as you are displaying every date in that query, you can't have what you want.
So you have to pick the MAX(date), in other words the row where the row number is 1.
WITH pivot_data AS(
SELECT date,CurrentMeterSNID,
1 + ((row_number() over(partition by CurrentMeterSNID ORDER by date desc) - 1) % 2) rownum,
Reading
FROM dbo.Table_1 )
, T2 AS
(
SELECT CurrentMeterSNID, date, [1], [2]
FROM pivot_data PIVOT (max(Reading) FOR rownum IN ([1],[2])) AS p
)
SELECT CurrentMeterSNID, Max(date), MAX([1]), Max([2])
FROM T2
GROUP BY CurrentMeterSNID

How to obtain Additions and Deductions from table

I have this table where I store sale orders. Once a sale order is punched it is not finalized and often needs editing later: if more items are added and it is saved again, the sale order is updated with a transaction number higher than the previous one, to keep track of the changes. Here is sample data in which a sale order was punched, items were then added twice (changing the amount each time), and in the last row items were cancelled and the amount changed again.
I want to calculate the total amount added to the sale order each time new items were added, and also the total worth of the items that were cancelled.
CREATE TABLE SaleOrder
(
TransactionNo Int,
SaleOrderDate DATE,
Code VARCHAR(25),
Quantity INT,
TotalAmount Numeric(18,2),
Remarks VARCHAR(25)
)
INSERT INTO SaleOrder VALUES (NULL, '2018-10-01', 'SO-001-OCT-18', 6, '2500', 'Hello');
INSERT INTO SaleOrder VALUES (1, '2018-10-01', 'SO-001-OCT-18', 8, '2600', 'Hello');
INSERT INTO SaleOrder VALUES (2, '2018-10-01', 'SO-001-OCT-18', 12, '3400', 'Hello');
INSERT INTO SaleOrder VALUES (3, '2018-10-01', 'SO-001-OCT-18', 9, '2900', 'Hello');
This is the result I expect:
Code SaleOrderDate Quantity InitialAmount Addition Cancellation
SO-001-OCT-18 2018-10-01 9 2500.00 900.00 500.00
I have written this query but it's not helping that much.
;WITH CTE AS (
SELECT
[TransactionNo], [Code], [SaleOrderDate], [Quantity], [TotalAmount],
CAST('Oct 1 2018 10:16AM' AS DATE) AS [DateFrom], CAST('Oct 4 2018 10:16AM' AS DATE) AS [DateTo]
FROM [SaleOrder]
GROUP BY
[TransactionNo], [Code], [SaleOrderDate], [TotalAmount], Quantity
)
SELECT
[D].[TransactionNo], [D].[Code], [D].[SaleOrderDate], [D].[Quantity], [D].TotalAmount,
--CAST('Oct 4 2018 4:06PM' AS DATE) AS [DateFrom],
--CAST('Oct 4 2018 4:06PM' AS DATE) AS [DateTo],
[D].[Balance], [D].[Balance]-ISNULL(NULLIF([D].TotalAmount, 0),0) [Opening]
FROM(
SELECT *,
SUM(TotalAmount) OVER (PARTITION BY [Code] ORDER BY [TransactionNo], [SaleOrderDate]) AS [Balance]
FROM CTE
)D
WHERE [SaleOrderDate] BETWEEN CAST('Oct 1 2018 10:16AM' AS DATE) AND CAST('Oct 4 2018 10:16AM' AS DATE)
ORDER BY [SaleOrderDate]
Use the LAG() window function to get the previous value, and compare it with the current one to determine whether the change is an addition or a cancellation.
; WITH cte as
(
SELECT *,
row_no = ROW_NUMBER() OVER (PARTITION BY Code ORDER BY TransactionNo DESC),
Addition = CASE WHEN TotalAmount > LAG(TotalAmount) OVER (PARTITION BY Code ORDER BY TransactionNo)
THEN TotalAmount - LAG(TotalAmount) OVER (PARTITION BY Code ORDER BY TransactionNo)
ELSE 0
END,
Cancellation = CASE WHEN TotalAmount < LAG(TotalAmount) OVER (PARTITION BY Code ORDER BY TransactionNo)
THEN LAG(TotalAmount) OVER (PARTITION BY Code ORDER BY TransactionNo) - TotalAmount
ELSE 0
END
FROM SaleOrder
)
SELECT Code,
SaleOrderDate,
Quantity = MAX (CASE WHEN row_no = 1 then Quantity END),
InitialAmount = MAX (CASE WHEN TransactionNo IS NULL THEN TotalAmount END),
Addition = SUM (Addition),
Cancellation = SUM (Cancellation)
FROM cte
GROUP BY Code, SaleOrderDate
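A minimal Python sketch (not T-SQL) of the LAG() comparison on the question's four rows, assuming a NULL TransactionNo sorts first, as it does in an ascending SQL Server ORDER BY. `ROWS` and `summarize` are hypothetical names.

```python
# The question's rows as (TransactionNo, Quantity, TotalAmount); None = NULL.
ROWS = [(None, 6, 2500), (1, 8, 2600), (2, 12, 3400), (3, 9, 2900)]


def summarize(rows):
    """Return (latest Quantity, InitialAmount, Addition, Cancellation)."""
    # ORDER BY TransactionNo with NULL first, as SQL Server sorts ascending
    ordered = sorted(rows, key=lambda r: (r[0] is not None, r[0] or 0))
    addition = cancellation = 0
    # Each pair is (LAG(TotalAmount), TotalAmount)
    for (_, _, prev), (_, _, cur) in zip(ordered, ordered[1:]):
        if cur > prev:
            addition += cur - prev
        elif cur < prev:
            cancellation += prev - cur
    initial = next(amount for no, _, amount in ordered if no is None)
    return ordered[-1][1], initial, addition, cancellation


print(summarize(ROWS))
```

The result (9, 2500, 900, 500) matches the expected row in the question: +100 and +800 are additions, and the drop from 3400 to 2900 is a 500 cancellation.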
Are you trying to do this?
SELECT
Code
, MAX(SaleOrderDate) SaleOrderDate
, MAX(Quantity) Quantity
, MAX(InitialAmount) InitialAmount
, SUM(Addition) Addition
, ABS(SUM(Cancellation)) Cancellation
FROM (
SELECT
Code
, CASE WHEN rn = cnt THEN SaleOrderDate END SaleOrderDate
, CASE WHEN rn = cnt THEN Quantity END Quantity
, InitialAmount
, CASE WHEN Diff > 0 THEN Diff ELSE 0 END Addition
, CASE WHEN Diff < 0 THEN Diff ELSE 0 END Cancellation
FROM (
SELECT *
, CASE WHEN TransactionNo IS NULL THEN TotalAmount END InitialAmount
, LEAD(TotalAmount) OVER(PARTITION BY Code ORDER BY TransactionNo) nxtPrice
, LEAD(TotalAmount) OVER(PARTITION BY Code ORDER BY TransactionNo) - TotalAmount Diff
, COUNT(*) OVER(PARTITION BY Code) cnt
, ROW_NUMBER() OVER(PARTITION BY Code ORDER BY TransactionNo) rn -- TransactionNo orders the edits; all rows share one SaleOrderDate
FROM SaleOrder
) D
) C
GROUP BY
Code

SQL LAG Days since last order

Hi, I am trying to create a windowed query in SQL that shows the number of days since the last order for each customer.
It currently shows the number of days between every pair of consecutive orders.
What do I need to change in my query so that it only shows the days between the last and the previous order for each customer? Right now it shows this for every order the customer made.
Query:
SELECT klantnr,besteldatum,
DATEDIFF(DAY,LAG(besteldatum) OVER(PARTITION BY klantnr ORDER BY besteldatum),besteldatum) AS DaysSinceLastOrder
FROM bestelling
GROUP BY klantnr,besteldatum;
You can use row_number() to number the rows by besteldatum, newest first, for each klantnr, and keep only the latest row per customer (it already holds the days since the previous order), using a derived table (subquery) or a common table expression.
Derived table version:
select klantnr, besteldatum, DaysSinceLastOrder
from (
select klantnr, besteldatum
, DaysSinceLastOrder = datediff(day,lag(besteldatum) over (partition by klantnr order by besteldatum),besteldatum)
, rn = row_number() over (partition by klantnr order by besteldatum desc)
from bestelling
group by klantnr, besteldatum
) t
where rn = 1
Common table expression version:
;with cte as (
select klantnr, besteldatum
, DaysSinceLastOrder = datediff(day,lag(besteldatum) over (partition by klantnr order by besteldatum),besteldatum)
, rn = row_number() over (partition by klantnr order by besteldatum desc)
from bestelling
group by klantnr, besteldatum
)
select klantnr, besteldatum, DaysSinceLastOrder
from cte
where rn = 1
If you want one row per customer, rn = 1 is the proper filter. If you want the latest n rows per customer, use rn <= n.
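A minimal Python sketch (not T-SQL) of the LAG() plus ROW_NUMBER() filter, with made-up orders; duplicate dates are collapsed first, like the GROUP BY klantnr, besteldatum. `ORDERS`, `days_since_last_order` and its `n` parameter (standing in for the rn <= n filter) are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Made-up orders: (klantnr, besteldatum).
ORDERS = [
    (1, date(2020, 1, 1)),
    (1, date(2020, 1, 10)),
    (1, date(2020, 1, 10)),  # duplicate date, collapsed by the GROUP BY
    (1, date(2020, 2, 1)),
    (2, date(2020, 3, 5)),
]


def days_since_last_order(orders, n=1):
    """Return (klantnr, besteldatum, DaysSinceLastOrder) for the latest n dates per customer."""
    by_customer = defaultdict(set)
    for klantnr, besteldatum in orders:
        by_customer[klantnr].add(besteldatum)  # GROUP BY klantnr, besteldatum
    result = []
    for klantnr in sorted(by_customer):
        dates = sorted(by_customer[klantnr])  # PARTITION BY klantnr ORDER BY besteldatum
        rows = []
        for i, d in enumerate(dates):
            prev = dates[i - 1] if i > 0 else None                # LAG(besteldatum)
            days = (d - prev).days if prev is not None else None  # DATEDIFF(DAY, prev, d)
            rows.append((klantnr, d, days))
        rows.reverse()           # ROW_NUMBER() ... ORDER BY besteldatum DESC
        result.extend(rows[:n])  # WHERE rn <= n
    return result


for row in days_since_last_order(ORDERS):
    print(row)
```

A customer with a single order keeps its row, with None where DATEDIFF would return NULL.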

CASE when statement SQL Server

I have a stored procedure in the works; however, I'm missing a few pieces of data when I run it. I know why, but I'm having a hard time figuring out how to fix it. This is my code:
INSERT INTO tempIntake (Pop, PlanID, PopFull, ApptDate, [1stAppt], Followup, Rn, UserID)
SELECT Pop, PlanID, PopFull, InterviewDate, [1stAppt], Followup, rn, @UserID
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PlanID ORDER BY AddedDate ASC) AS rn
FROM VInfo
WHERE AddedDate IS NOT NULL
) t
WHERE rn = 1 AND interviewdate >= @fromDate AND interviewDate <= @toDate
Here is what I'm trying to do. For some PlanIDs I don't have an AddedDate. Because I'm filtering on AddedDate, the rows where it is NULL do not come up, even though I need them to be shown. In that case I'd like to use a dummy date such as '1/1/2016', so that when the actual AddedDate becomes available in the table, it will be used instead of the dummy date.
If AddedDate can't be later than GETDATE(), you can order by ISNULL(AddedDate, GETDATE()) and remove the WHERE AddedDate IS NOT NULL condition:
Insert into tempIntake(Pop, PlanID, PopFull, ApptDate, [1stAppt], Followup, Rn, UserID)
select
Pop, PlanID, PopFull, InterviewDate, [1stAppt], Followup, rn, @UserID
from
(Select
*,
row_number() over (partition by PlanID order BY ISNULL(AddedDate,GETDATE())) as rn
from
VInfo) t
where
rn = 1
and interviewdate >= @fromDate
and interviewDate <= @toDate
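A minimal Python sketch (not T-SQL) of why the ISNULL(AddedDate, GETDATE()) ordering works: a missing date is replaced by "now" only for sorting, so a real past AddedDate still comes first and a plan with no date at all still returns a row. The rows and the `first_per_plan` name are made up for illustration; ties keep the first row seen here, whereas ROW_NUMBER() would pick one arbitrarily.

```python
from datetime import datetime

# Made-up VInfo rows: (PlanID, AddedDate); None stands for NULL.
ROWS = [
    (10, None),
    (10, datetime(2016, 3, 1)),
    (20, None),
]


def first_per_plan(rows, now):
    """Keep, per PlanID, the row that ROW_NUMBER() ... ORDER BY ISNULL(AddedDate, now) numbers 1."""
    best = {}
    for plan_id, added in rows:
        sort_key = added if added is not None else now  # ISNULL(AddedDate, GETDATE())
        if plan_id not in best or sort_key < best[plan_id][0]:
            best[plan_id] = (sort_key, added)
    return {plan_id: added for plan_id, (_, added) in best.items()}


print(first_per_plan(ROWS, now=datetime(2020, 1, 1)))
```

Plan 10 keeps its real 2016 date, and plan 20 still appears with a NULL AddedDate instead of being filtered out.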

SQL Server, first of each time series

A table 'readings' has a list of dates and values:
[Date] [Value]
2015-03-19 00:30:00 1.2
2015-03-19 00:40:00 1.2
2015-03-19 00:50:00 0.1
2015-03-19 01:00:00 0.1
2015-03-19 01:10:00 2
2015-03-19 01:20:00 0.5
2015-03-19 01:30:00 0.5
I need to get the most recent instance where the value is below a set point (in this case 1.0), but I only want the start (the earliest datetime) of the consecutive run of readings below 1.
So with the above data I want to return 2015-03-19 01:20:00: the most recent block of times where value < 1 runs from 01:20 to 01:30, and I want the start of that block.
This SQL just returns the most recent date, rather than the first date of the period in which the value has been low (so it returns 2015-03-19 01:30:00):
select top 1 *
from readings where value <=1
order by [date] desc
I can't work out how to group the consecutive dates so that I only get the first date of each group.
It is SQL Server, the real data isn't at exactly ten-minute intervals, and the readings table has about 70,000 rows, so it is fairly large.
Thanks, Charli
SELECT * FROM (
SELECT [Date]
,Value
,ROW_NUMBER() OVER (PARTITION BY cast([Date] AS DATE) ORDER BY [Date] ASC) AS RN FROM #table WHERE value <= 1
) t WHERE t.RN = 1
Select Max( [date] )
From [dbo].[readings]
Where ( [value] <= 1 )
You seem to want the minimum date for each set of consecutive records having a value that is less than 1. The query below returns exactly these dates:
SELECT MIN([Date])
FROM (
SELECT [Date], [Value],
ROW_NUMBER() OVER (ORDER BY [Date]) -
COUNT(CASE WHEN [Value] < 1 THEN 1 END) OVER (ORDER BY [Date]) AS grp
FROM mytable) AS t
WHERE Value < 1
GROUP BY grp
The calculated field grp identifies runs of consecutive records having Value < 1.
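A minimal Python sketch (not T-SQL) of the grp arithmetic on the question's seven readings: ROW_NUMBER() minus the running count of low values stays constant inside each run of values below 1 and grows whenever a higher value interrupts the run. `READINGS` and `run_starts` are hypothetical names.

```python
# The question's readings as (Date, Value); ISO strings sort chronologically.
READINGS = [
    ("2015-03-19 00:30:00", 1.2),
    ("2015-03-19 00:40:00", 1.2),
    ("2015-03-19 00:50:00", 0.1),
    ("2015-03-19 01:00:00", 0.1),
    ("2015-03-19 01:10:00", 2.0),
    ("2015-03-19 01:20:00", 0.5),
    ("2015-03-19 01:30:00", 0.5),
]


def run_starts(readings, threshold=1.0):
    """Map grp -> MIN([Date]) for each run of consecutive values below threshold."""
    starts = {}
    low_count = 0  # COUNT(CASE WHEN [Value] < 1 THEN 1 END) OVER (ORDER BY [Date])
    for rn, (stamp, value) in enumerate(sorted(readings), start=1):  # ROW_NUMBER()
        if value < threshold:  # WHERE Value < 1
            low_count += 1
            grp = rn - low_count
            starts.setdefault(grp, stamp)  # rows arrive in date order, so first = MIN
    return starts


starts = run_starts(READINGS)
print(starts)
print(starts[max(starts)])  # start of the latest run (largest grp)
```

The two runs get grp 2 and 3, and the start of the latest run is 2015-03-19 01:20:00, the value the question asks for.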
Note: The above query will work for SQL Server 2012+.
Edit:
To get the date value of the last group you can modify the above query to:
SELECT TOP 1 MIN([Date])
FROM (
SELECT [Date], [Value],
ROW_NUMBER() OVER (ORDER BY [Date]) -
COUNT(CASE WHEN [Value] < 1 THEN 1 END) OVER (ORDER BY [Date]) AS grp
FROM mytable) AS t
WHERE Value < 1
GROUP BY grp
ORDER BY grp DESC
