How to group by on consecutive values in SQL

I have a table in SQL Server 2014 with sample data as follows.
WK_NUM | NET_SPRD_LCL
10     | 0
11     | 1500
12     | 3600
13     | 3800
14     | 4000
I am trying to code a bonus structure at work where I need to group on WK_NUM. So, if I see NET_SPRD_LCL > 3500 for two consecutive WK_NUMs WHERE WK_NUM < 27, I need to output 2000. In this example, since NET_SPRD_LCL for WK_NUM 12 and 13 are both greater than 3500, the SQL should output 2000 and exit. So, it should ignore the fact that WK_NUM 13 and 14 also satisfy the condition that NET_SPRD_LCL > 3500.
I would appreciate any help with this.

Assuming the week numbers are strictly consecutive (1, 2, 3, 4, 5, ...) and NOT merely increasing (1, 3, 5, 8, 12, ...),
then, if you don't need to know which pair of consecutive records it was:
-- "table" is a placeholder for your table name
Select case when exists
(Select * from table f
join table n
on n.Wk_Num = f.Wk_Num + 1
and n.NET_SPRD_LCL > 3500
and f.NET_SPRD_LCL > 3500
and n.Wk_Num < 27)
then 2000 else null end
If you do need to identify the pair of records, then:
Select f.wk_Num firstWorkNbr, f.NET_SPRD_LCL firstNetSpread,
n.wk_Num nextWorkNbr, n.NET_SPRD_LCL nextNetSpread
from table f
join table n
on n.Wk_Num = f.Wk_Num + 1
and n.NET_SPRD_LCL > 3500
and f.NET_SPRD_LCL > 3500
and n.Wk_Num < 27
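-- keep only the earliest qualifying pair
Where not exists
(Select * from table f0
join table n0
on n0.Wk_Num = f0.Wk_Num + 1
and n0.NET_SPRD_LCL > 3500
and f0.NET_SPRD_LCL > 3500
and n0.Wk_Num < 27
and f0.Wk_Num < f.Wk_Num)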
On the other hand, if "consecutive" simply means increasing, then it's a bit harder. You need to use a subquery to determine the next record:
Select case when exists
(Select * from table f
join table n
on n.Wk_Num = (Select Min(Wk_Num) from table
Where Wk_Num > f.Wk_Num)
and n.NET_SPRD_LCL > 3500
and f.NET_SPRD_LCL > 3500
and n.Wk_Num < 27)
then 2000 else null end
And if you need to fetch the data for the first pair of records that qualifies (the 2000 at the end is unnecessary, since nothing is returned when there is no qualifying pair):
Select f.wk_Num firstWorkNbr, f.NET_SPRD_LCL firstNetSpread,
n.wk_Num nextWorkNbr, n.NET_SPRD_LCL nextNetSpread, 2000 outValue
from table f
join table n
on n.Wk_Num = (Select Min(Wk_Num) from table
Where Wk_Num > f.Wk_Num)
and n.NET_SPRD_LCL > 3500
and f.NET_SPRD_LCL > 3500
and n.Wk_Num < 27
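-- keep only the earliest qualifying pair
Where not exists
(Select * from table f0
join table n0
on n0.Wk_Num = (Select Min(Wk_Num) from table
Where Wk_Num > f0.Wk_Num)
and n0.NET_SPRD_LCL > 3500
and f0.NET_SPRD_LCL > 3500
and n0.Wk_Num < 27
and f0.Wk_Num < f.Wk_Num)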
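
For anyone who wants to try the self-join idea without a SQL Server instance, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not the query above verbatim: the table name weekly and the helper names are made up for illustration, SQLite stands in for SQL Server, and the earliest pair is picked with ORDER BY ... LIMIT 1 rather than NOT EXISTS. It only covers the strictly consecutive case (Wk_Num + 1).

```python
import sqlite3

# Sample rows from the question: (WK_NUM, NET_SPRD_LCL)
SAMPLE = [(10, 0), (11, 1500), (12, 3600), (13, 3800), (14, 4000)]


def first_qualifying_pair(rows, threshold=3500, max_week=27):
    """Return (first_week, next_week) for the earliest qualifying pair, or None."""
    conn = sqlite3.connect(":memory:")
    # "weekly" is a hypothetical table name used only for this sketch.
    conn.execute("CREATE TABLE weekly (WK_NUM INTEGER, NET_SPRD_LCL INTEGER)")
    conn.executemany("INSERT INTO weekly VALUES (?, ?)", rows)
    pair = conn.execute(
        """
        SELECT f.WK_NUM, n.WK_NUM
        FROM weekly AS f
        JOIN weekly AS n
          ON n.WK_NUM = f.WK_NUM + 1          -- strictly consecutive weeks
         AND f.NET_SPRD_LCL > :threshold
         AND n.NET_SPRD_LCL > :threshold
         AND n.WK_NUM < :max_week
        ORDER BY f.WK_NUM                      -- earliest pair first
        LIMIT 1
        """,
        {"threshold": threshold, "max_week": max_week},
    ).fetchone()
    conn.close()
    return pair


def bonus(rows):
    """2000 if any qualifying pair exists, otherwise None."""
    return 2000 if first_qualifying_pair(rows) else None
```

With the sample data, first_qualifying_pair(SAMPLE) picks weeks 12 and 13, and the later pair (13, 14) is ignored because only the first row is kept.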

First of all, when you say you want your query to 'output' and 'exit', it makes me think you are approaching T-SQL as a procedural language, which it is not. Good T-SQL queries are nearly always set-based.
In any case, before the query, here is a setup script so that others can work with the data when building queries:
CREATE TABLE #t (WK_NUM INT, NET_SPRD_LCL INT);
INSERT INTO #t VALUES
(10, 0),
(11, 1500),
(12, 3600),
(13, 3800),
(14, 4000);
You say you are using SQL Server 2014, which means you have the relevant window functions at your disposal. The one I am using (LAG) will have superior performance to subqueries. If you insist on subqueries, they can be greatly improved by using TOP (1) with ORDER BY and an appropriate index instead of a MIN function over the whole dataset. With tiny amounts of data you won't notice a difference, but on a real business system it will be obvious.
Adjusted to provide the 2000 bonus on the correct line after OP's clarification:
WITH cteTemp AS
(
SELECT WK_NUM
, thisValue = NET_SPRD_LCL
, lastValue = LAG(NET_SPRD_LCL) OVER(ORDER BY WK_NUM)
FROM #t
WHERE WK_NUM < 27
)
, cteBonusWeek AS
(
SELECT TOP (1)
WK_NUM
, bonus = 2000
FROM cteTemp
WHERE thisValue > 3500 AND lastValue > 3500
ORDER BY WK_NUM
)
SELECT t.WK_NUM
, t.NET_SPRD_LCL
, bonus = COALESCE(b.bonus, 0)
FROM #t AS t
LEFT JOIN cteBonusWeek AS b
ON b.WK_NUM = t.WK_NUM;
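
The same LAG idea can be tried outside SQL Server. The following is a rough sketch in Python with the standard-library sqlite3 module, assuming an SQLite build with window functions (3.25 or later). The table name t, the CTE names and the function name are illustrative only, and LIMIT 1 stands in for TOP (1).

```python
import sqlite3

# Sample rows from the question: (WK_NUM, NET_SPRD_LCL)
SAMPLE = [(10, 0), (11, 1500), (12, 3600), (13, 3800), (14, 4000)]


def bonus_by_week(rows, threshold=3500, max_week=27, bonus=2000):
    """Return (WK_NUM, NET_SPRD_LCL, bonus) rows; the bonus lands on the first
    week whose own value and previous week's value both exceed the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (WK_NUM INTEGER, NET_SPRD_LCL INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH prev AS (
            SELECT WK_NUM,
                   NET_SPRD_LCL AS this_value,
                   LAG(NET_SPRD_LCL) OVER (ORDER BY WK_NUM) AS last_value
            FROM t
            WHERE WK_NUM < :max_week
        ),
        bonus_week AS (
            SELECT WK_NUM, :bonus AS bonus
            FROM prev
            WHERE this_value > :threshold AND last_value > :threshold
            ORDER BY WK_NUM
            LIMIT 1                      -- SQLite's counterpart of TOP (1)
        )
        SELECT t.WK_NUM, t.NET_SPRD_LCL, COALESCE(b.bonus, 0) AS bonus
        FROM t
        LEFT JOIN bonus_week AS b ON b.WK_NUM = t.WK_NUM
        ORDER BY t.WK_NUM
        """,
        {"threshold": threshold, "max_week": max_week, "bonus": bonus},
    ).fetchall()
    conn.close()
    return result
```

Note that LAG looks at the previous row in WK_NUM order, so this treats "consecutive" as "next row present in the table", not as WK_NUM + 1.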

Related

SQL Server script not working as expected

I have this little script that shall return the first number in a column of type int which is not used yet.
SELECT t1.plu + 1 AS plu
FROM tovary t1
WHERE NOT EXISTS (SELECT 1 FROM tovary t2 WHERE t2.plu = t1.plu + 1)
AND t1.plu > 0;
this returns the unused numbers like
3
11
22
27
...
The problem is, that when I make a simple select like
SELECT plu
FROM tovary
WHERE plu > 0
ORDER BY plu ASC;
the results are
1
2
10
20
...
Why isn't the first script returning some of the free numbers, like 4, 5, 6 and so on?

Compiling a formal answer from the comments.
Credit to Larnu:
It seems what the OP really needs here is an (inline) Numbers/Tally (table) which they can then use a NOT EXISTS against their table.
Sample data
create table tovary
(
plu int
);
insert into tovary (plu) values
(1),
(2),
(10),
(20);
Solution
The tally table is isolated in a common table expression, First1000, which produces the numbers 1 to 1000. The amount of generated numbers can be scaled up as needed.
with First1000(n) as
(
select row_number() over(order by (select null))
from ( values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0) ) a(n) -- 10^1
cross join ( values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0) ) b(n) -- 10^2
cross join ( values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0) ) c(n) -- 10^3
)
select top 20 f.n as Missing
from First1000 f
where not exists ( select 'x'
from tovary
where plu = f.n)
order by f.n; -- TOP without ORDER BY does not guarantee which rows come back
Using top 20 in the query above to limit the output. This gives:
Missing
-------
3
4
5
6
7
8
9
11
12
13
14
15
16
17
18
19
21
22
23
24
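
To see side by side why the original script only reports the first free number of each gap while a numbers table reports all of them, here is a small sketch in Python with the standard-library sqlite3 module. SQLite is used as a stand-in for SQL Server, and the numbers are generated with a recursive CTE instead of the cross-joined VALUES lists above; the function names are illustrative.

```python
import sqlite3


def make_db(plus=(1, 2, 10, 20)):
    """Build an in-memory tovary table holding the used plu values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tovary (plu INTEGER)")
    conn.executemany("INSERT INTO tovary VALUES (?)", [(p,) for p in plus])
    return conn


def gap_starts(conn):
    """The question's query: plu + 1 for every used plu whose successor is free.
    Each used value can contribute at most one number, so only gap starts appear."""
    sql = """
        SELECT t1.plu + 1 AS plu
        FROM tovary t1
        WHERE NOT EXISTS (SELECT 1 FROM tovary t2 WHERE t2.plu = t1.plu + 1)
          AND t1.plu > 0
        ORDER BY plu
    """
    return [row[0] for row in conn.execute(sql)]


def missing_numbers(conn, upper=1000, limit=20):
    """Numbers-table approach: test every candidate 1..upper against tovary."""
    sql = """
        WITH RECURSIVE numbers(n) AS (
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM numbers WHERE n < :upper
        )
        SELECT n
        FROM numbers
        WHERE NOT EXISTS (SELECT 1 FROM tovary WHERE plu = numbers.n)
        ORDER BY n
        LIMIT :limit
    """
    return [row[0] for row in conn.execute(sql, {"upper": upper, "limit": limit})]
```

The first function can never return 4 because 4 is not "some used value plus one"; the second one checks 4 directly.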

Choose row that equal to the max value from a query

I want to know who has the most friends in the app I own, based on transactions: a friend is anyone the user paid or was paid by.
I can't get the query to show only the users with the maximum number of friends (there may be one or several, and the number can change, so I can't use a fixed limit).
;with relationships as
(
select
paid as 'auser',
Member_No as 'afriend'
from Payments$
union all
select
member_no as 'auser',
paid as 'afriend'
from Payments$
),
DistinctRelationships AS (
SELECT DISTINCT *
FROM relationships
)
select
afriend,
count(*) cnt
from DistinctRelationShips
GROUP BY
afriend
order by
count(*) desc
I just can't figure it out. I've tried count, max(count) and where = max; nothing worked.
The table has two columns, "Member_No" and "Paid": the member pays the money, and Paid is the one who received it.
Member_No | Paid
14        | 18
17        | 1
12        | 20
12        | 11
20        | 8
6         | 3
2         | 4
9         | 20
8         | 10
5         | 20
14        | 16
5         | 2
12        | 1
14        | 10
It's from Excel, but I loaded it into sql-server.
It's just a sample, there are 1000 more rows

It seems like you are massively over-complicating this. There is no need for self-joining.
Just unpivot each row so you have both sides of the relationship, then group by one side and count the distinct values of the other side:
SELECT
-- for just the first then SELECT TOP (1)
-- for all that tie for the top place use SELECT TOP (1) WITH TIES
v.Id,
Relationships = COUNT(DISTINCT v.Other),
TotalTransactions = COUNT(*)
FROM Payments$ p
CROSS APPLY (VALUES
(p.Member_No, p.Paid),
(p.Paid, p.Member_No)
) v(Id, Other)
GROUP BY
v.Id
ORDER BY
COUNT(DISTINCT v.Other) DESC;
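
As a portable way to experiment with the "unpivot, then count distinct" idea, here is a sketch in Python with the standard-library sqlite3 module. SQLite has no CROSS APPLY or TOP ... WITH TIES, so this version swaps in UNION ALL for the unpivot and a comparison against MAX for the ties; the table name payments and the function name are assumptions made for the example.

```python
import sqlite3

# The sample rows from the question, read as (Member_No, Paid) pairs.
SAMPLE = [
    (14, 18), (17, 1), (12, 20), (12, 11), (20, 8), (6, 3), (2, 4),
    (9, 20), (8, 10), (5, 20), (14, 16), (5, 2), (12, 1), (14, 10),
]


def most_connected(payments):
    """Return every (user, friend_count) tied for the highest friend count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (Member_No INTEGER, Paid INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", payments)
    rows = conn.execute(
        """
        WITH pairs AS (                       -- both directions of each payment
            SELECT Member_No AS id, Paid AS other FROM payments
            UNION ALL
            SELECT Paid, Member_No FROM payments
        ),
        counts AS (
            SELECT id, COUNT(DISTINCT other) AS friends
            FROM pairs
            GROUP BY id
        )
        SELECT id, friends
        FROM counts
        WHERE friends = (SELECT MAX(friends) FROM counts)   -- keeps all ties
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows
```

On the sample rows this returns a single user, 20, with four distinct counterparties (12, 8, 9 and 5). When several users share the top count, all of them are returned.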

How do you select a number of random rows from different AgeGroup?

I am trying to create a for loop in python to connect it to Snowflake since Snowflake does not support loops.
I want to select a number of random rows from different AgeGroups. eg. 1500 rows from AgeGroup "30-40", 1200 rows from AgeGroup "40-50" , 875 rows from AgeGroup "50-60".
Any ideas how to do it or an alternative method for a loop in Snowflake?

Have you looked at Snowflake's stored procedures? They are written in JavaScript and would allow you to loop natively in Snowflake:
https://docs.snowflake.net/manuals/sql-reference/stored-procedures-overview.html

What do you mean by "Snowflake doesn't have loops"? SQL has "loops" if you can find them...
The following query does what you asked for:
WITH POPULATION AS ( /* 10,000 persons with random age 0-100 */
SELECT 'Person ' || SEQ2() ID, ABS(RANDOM()) % 100 AGE
FROM TABLE(GENERATOR(ROWCOUNT => 10000))
)
SELECT
ID,
AGE,
CASE
WHEN AGE < 30 THEN '0-30'
WHEN AGE < 40 THEN '30-40'
WHEN AGE < 50 THEN '40-50'
WHEN AGE < 60 THEN '50-60'
ELSE '60-100'
END AGE_GROUP,
ROW_NUMBER() OVER (PARTITION BY AGE_GROUP ORDER BY RANDOM()) DRAW_ORDER
FROM POPULATION
QUALIFY DRAW_ORDER <= DECODE(AGE_GROUP, '30-40', 1500, '40-50', 1200, '50-60', 875, 0);
Addendum:
As pointed out by waldente, a simpler and more efficient way is to use SAMPLE:
WITH
POPULATION_30_40 AS (SELECT * FROM POPULATION WHERE AGE >= 30 AND AGE < 40),
POPULATION_40_50 AS (SELECT * FROM POPULATION WHERE AGE >= 40 AND AGE < 50),
POPULATION_50_60 AS (SELECT * FROM POPULATION WHERE AGE >= 50 AND AGE < 60)
SELECT * FROM POPULATION_30_40 SAMPLE(1500 ROWS) UNION ALL
SELECT * FROM POPULATION_40_50 SAMPLE(1200 ROWS) UNION ALL
SELECT * FROM POPULATION_50_60 SAMPLE(875 ROWS)

If you want to draw n random samples from each group you could create a subquery containing a row number that is randomly distributed within each group, and then select the top n rows from each group.
If you have a table like this:
USER DATE
1 2018-11-04
1 2018-11-04
1 2018-12-07
1 2018-10-09
1 2018-10-09
1 2018-11-07
1 2018-11-09
1 2018-11-09
2 2019-11-02
2 2019-10-02
2 2019-11-03
2 2019-11-06
3 2019-11-10
3 2019-11-13
3 2019-11-15
This query could be used to return two random rows for User 2 and 3, and 3 random rows for user 1:
SELECT User, Date
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY User ORDER BY RANDOM()) as random_row
FROM Users)
WHERE
(User = 3 AND random_row < 3) OR
(User = 2 AND random_row < 3) OR
(User = 1 AND random_row < 4);
So in your case partition on and filter age_group instead of User.
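
Since the question mentions driving this from Python, here is a self-contained sketch of the same ROW_NUMBER() ... ORDER BY RANDOM() technique using the standard-library sqlite3 module as a stand-in for Snowflake (SQLite 3.25 or later is assumed for window functions). The table names population and quota, the column names and the function name are hypothetical; the per-group sizes live in a small quota table instead of a chain of OR conditions.

```python
import sqlite3


def stratified_sample(people, quotas):
    """people: iterable of (id, age_group); quotas: dict age_group -> row count.
    Returns up to quotas[g] randomly chosen (id, age_group) rows per group g."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE population (id INTEGER, age_group TEXT)")
    conn.execute("CREATE TABLE quota (age_group TEXT, n INTEGER)")
    conn.executemany("INSERT INTO population VALUES (?, ?)", people)
    conn.executemany("INSERT INTO quota VALUES (?, ?)", list(quotas.items()))
    rows = conn.execute(
        """
        SELECT id, age_group
        FROM (
            SELECT p.id, p.age_group, q.n,
                   ROW_NUMBER() OVER (
                       PARTITION BY p.age_group
                       ORDER BY RANDOM()      -- random order inside each group
                   ) AS draw_order
            FROM population AS p
            JOIN quota AS q ON q.age_group = p.age_group
        )
        WHERE draw_order <= n                 -- keep the first n of each group
        """
    ).fetchall()
    conn.close()
    return rows
```

Groups that have no entry in quotas drop out through the inner join, and a group smaller than its quota simply returns all of its rows.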

Snowflake supports random and deterministic table sampling. For example,
to return a sample of a table in which each row has a 10% probability of being included:
SELECT * FROM testtable SAMPLE (10);
https://docs.snowflake.net/manuals/sql-reference/constructs/sample.html

Using t-sql to select aggregate when date difference is not just equal but small

I have a table where I want to select the maximum of a column, but only across rows whose dates are equal or close together (let's say within 3 days). When two subsequent dates are very close, the data are likely spurious, and I want to get the highest state when that happens.
My data looks similar to this
CREATE TABLE #TestingResults (
IDNumber varchar(100),
DateSeen date,
[state] int)
INSERT INTO #TestingResults VALUES
('A','2015-04-21',2),
('A','2015-05-08',2),
('A','2015-07-01',3),
('B','2014-06-18',100), -- this is the one I want
('B','2014-06-19',2),
('B','2014-07-31',2),
('B','2014-08-11',3),
('B','2014-09-24',3),
('B','2014-10-24',3),
('B','2014-11-24',3),
('B','2014-12-15',3),
('B','2015-01-12',3),
('B','2015-01-13',400), -- this is the one I want
('B','2015-04-06',10), -- either will do
('B','2015-04-07',10),
('B','2015-07-06',3), -- either will do
('B','2015-07-07',3),
('B','2015-10-12',3),
('C','2012-02-20',3),
('C','2012-03-12',3),
('C','2012-04-02',3),
('C','2012-11-21',3)
What I really want is something like this where I take the maximum of state when the difference between dates is < 3 (note, some of the data may have the same state even when the differences in date are small ...) :
IDNumber DateSeen state
A 2015-04-21 2
A 2015-05-08 2
A 2015-07-01 3
-- if there are observations < 3 days apart, take MAX
B 2014-06-18 100
B 2014-07-31 2
B 2014-08-11 3
B 2014-09-24 3
B 2014-10-24 3
B 2014-11-24 3
B 2014-12-15 3
-- if there are observations < 3 days apart, take MAX
B 2015-01-13 400
-- if there are observations < 3 days apart, take MAX
B 2015-04-07 10
-- if there are observations < 3 days apart, take MAX
B 2015-07-07 3
B 2015-10-12 3
C 2012-02-20 3
C 2012-03-12 3
C 2012-04-02 3
C 2012-11-21 3
I guess I could create another table variable to hold intermediate results and then query it, but as you can see, IDNumber='B' has several such clusters in its sequence of dates, so I am thinking there should be a smarter way.
Thanks!

After your clarifying comments (thanks for that!), I would do this as follows:
SELECT ISNULL(high.IDNumber, results.IDNumber) AS IDNumber,
ISNULL(high.DateSeen, results.DateSeen) AS DateSeen,
ISNULL(high.[state], results.[state]) AS [state]
FROM #TestingResults results
OUTER APPLY
(
SELECT TOP 1 IDNumber, DateSeen, [state]
FROM #TestingResults highest
WHERE highest.DateSeen < results.DateSeen
AND highest.IDNumber = results.IDNumber
AND DATEDIFF(DAY,highest.DateSeen,results.DateSeen) <=3
ORDER BY [state] DESC, [DateSeen] DESC
) high
WHERE NOT EXISTS
(
SELECT 1
FROM #TestingResults nearFuture
WHERE nearFuture.DateSeen > results.DateSeen
AND nearFuture.IDNumber = results.IDNumber
AND DATEDIFF(DAY,results.DateSeen,nearFuture.DateSeen) <=3
)
This is almost certainly not the most elegant way to achieve this (I suspect it could be done more efficiently with window functions, a recursive CTE or similar), but I believe it gives you the behaviour and results you desire.

This should do it using a recursive CTE:
WITH TestingResults AS (
SELECT
*
,ROW_NUMBER() OVER(ORDER BY IDNumber, DateSeen) AS RowNum
FROM #TestingResults
), Data AS (
SELECT
tmp1.IDNumber,
tmp1.DateSeen,
tmp1.state,
tmp1.RowNum,
tmp1.RowNum AS GroupID
FROM (
SELECT
*
,ABS(DATEDIFF(DAY, DateSeen, LAG(DateSeen, 1, NULL) OVER(PARTITION BY IDNumber ORDER BY DateSeen))) AS AbsPrev
FROM TestingResults
) AS tmp1
WHERE tmp1.AbsPrev IS NULL OR tmp1.AbsPrev >= 3 --the first date in a sequence
UNION ALL
SELECT
r.IDNumber,
r.DateSeen,
r.state,
r.RowNum,
d.GroupID
FROM Data d
INNER JOIN TestingResults r ON
r.IDNumber = d.IDNumber
AND DATEDIFF(DAY, d.DateSeen, r.DateSeen) < 3
AND d.RowNum+1 = r.RowNum
)
SELECT MIN(d.IDNumber) AS IDNumber, MAX(d.DateSeen) AS DateSeen, MAX(d.state) AS state
FROM Data d
GROUP BY d.GroupID
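
The grouping can also be expressed without recursion, as a "gaps and islands" pattern: flag each row that starts a new cluster, then take a running sum of the flags as the group id. Below is a hedged sketch in Python with the standard-library sqlite3 module (SQLite 3.25 or later assumed). It follows the recursive CTE's output convention, returning the latest date and the highest state per cluster, which differs slightly from the question's desired output where the date of the maximum row is kept. The table name results, the julianday date arithmetic and the function name are specific to this sketch.

```python
import sqlite3


def collapse_close_dates(rows, max_gap_days=3):
    """rows: (IDNumber, 'YYYY-MM-DD', state). Rows of the same IDNumber that are
    fewer than max_gap_days after the previous row join that row's cluster."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (IDNumber TEXT, DateSeen TEXT, state INTEGER)")
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    out = conn.execute(
        """
        WITH flagged AS (
            SELECT IDNumber, DateSeen, state,
                   CASE WHEN julianday(DateSeen)
                             - julianday(LAG(DateSeen) OVER (
                                   PARTITION BY IDNumber ORDER BY DateSeen)) < :gap
                        THEN 0      -- close to the previous row: same cluster
                        ELSE 1      -- first row, or far from the previous row
                   END AS starts_group
            FROM results
        ),
        grouped AS (
            SELECT IDNumber, DateSeen, state,
                   SUM(starts_group) OVER (
                       PARTITION BY IDNumber ORDER BY DateSeen) AS grp
            FROM flagged
        )
        SELECT IDNumber, MAX(DateSeen) AS last_seen, MAX(state) AS max_state
        FROM grouped
        GROUP BY IDNumber, grp
        ORDER BY IDNumber, last_seen
        """,
        {"gap": max_gap_days},
    ).fetchall()
    conn.close()
    return out
```

Like the recursive version, this chains rows: a run of dates each fewer than three days after the previous one ends up in a single cluster, even if the first and last date are further apart.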

SQL: return IDs whose val=min(exp)

Given a table like
pkg#, time
0, 20
1, 23
2, 34
3, 35
4, 59
I want to know which pkg# has the max/min time difference to its successor pkg (the gap between 2 consecutive pkgs).
In this case, pkg-2 has the min time difference (1), and pkg-3 has the max time difference (24).
What's the SQL that can return the pkg# for the min/max time difference to its next pkg?

If you are on SQL SERVER 2012 or above, you can try LEAD function here to get the next row value to align in your current row:
SELECT *, LEAD([time]) OVER(ORDER BY [pkg#]) as nexttime
FROM [your_table]
will yield something like this:
pkg time nexttime
0 20 23
1 23 34
2 34 35
3 35 59
4 59 NULL
Comparing these two columns should give you what you want. (Note that the last row will have nexttime = NULL since there is no following row to take a value from, so just filter it out when querying.)
Assuming the result above is stored as new_table, to get the max diff:
select top 1 *, nexttime-time as diff
from new_table
where nexttime is not null
order by (nexttime-time) desc
To get the min diff, order by nexttime-time ascending instead (drop the desc).
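
The LEAD steps can be combined into one query that returns both extremes at once. Here is a minimal sketch in Python with the standard-library sqlite3 module (SQLite 3.25 or later assumed, standing in for SQL Server); the table name packets, the CTE name gaps and the function name are made up for this example.

```python
import sqlite3

# Sample rows from the question: (pkg, time)
SAMPLE = [(0, 20), (1, 23), (2, 34), (3, 35), (4, 59)]


def extreme_gaps(rows):
    """Return (pkg, diff) rows whose gap to the next pkg is the smallest or largest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packets (pkg INTEGER, time INTEGER)")
    conn.executemany("INSERT INTO packets VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH gaps AS (
            SELECT pkg,
                   LEAD(time) OVER (ORDER BY pkg) - time AS diff
            FROM packets
        )
        SELECT pkg, diff
        FROM gaps
        WHERE diff = (SELECT MIN(diff) FROM gaps)   -- MIN/MAX skip the NULL
           OR diff = (SELECT MAX(diff) FROM gaps)   -- diff of the last pkg
        ORDER BY diff, pkg
        """
    ).fetchall()
    conn.close()
    return result
```

The last pkg has no successor, so its diff is NULL and it never matches either comparison. If several pkgs tie for the minimum or maximum, all of them are returned.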

A slight twist on @xbb's answer:
CREATE TABLE #t ( Pkg INT, Time INT );
INSERT #t ( Pkg, Time )
VALUES ( 0, 20 ),
( 1, 23 ),
( 2, 34 ),
( 3, 35 ),
( 4, 59 );
SELECT Pkg
, Time
, Time - LAG(Time) OVER ( ORDER BY Pkg ) AS TimeSincePrevious
, ABS(time - LEAD(Time) OVER ( ORDER BY Pkg )) AS TimeUntilNext
FROM #t;
DROP TABLE #t;
Will yield the result:
Pkg Time TimeSincePrevious TimeUntilNext
0 20 NULL 3
1 23 3 11
2 34 11 1
3 35 1 24
4 59 24 NULL

Take a look at the solution below. I decomposed the query into three steps:
WITH Ordered AS
(
SELECT ROW_NUMBER() OVER (ORDER BY pkg) rowNum, pkg, [time] FROM Test
),
Diffs AS
(
SELECT T1.pkg,
T2.[time]-T1.[time] diff,
MIN(T2.[time]-T1.[time]) OVER () minimum,
MAX(T2.[time]-T1.[time]) OVER () maximum
FROM Ordered T1
JOIN Ordered T2 ON T1.rowNum = T2.rowNum-1
)
SELECT pkg, diff FROM Diffs
WHERE diff=minimum OR diff=maximum
ORDER by diff
The steps are:
1. Number the rows.
2. Join each row to the next one (offset 1) and calculate diff, MIN and MAX.
3. Filter out the rows whose diff equals neither the minimum nor the maximum.
The query may return more rows if ties occur. Ties can be removed by replacing the final SELECT with:
...
SELECT MIN(pkg) pkg, diff FROM Diffs
WHERE diff=minimum OR diff=maximum
GROUP BY diff
ORDER by diff
