SQL Server: how to find maximum length continuous range?

I have a table with a single signed integer column, NUM.
Each row holds a random number, and any number may appear in the table an arbitrary number of times.
I need to find the longest continuous range (one with no missing numbers) that is present in the table.
A number counts as missing if it lies between MIN(NUM) and MAX(NUM) (the SQL aggregate functions) but does not appear in the table.

This sounds like a typical gaps-and-islands problem:
SELECT TOP 1 MIN(num) num_from, MAX(num) num_upto, COUNT(DISTINCT num) num_count
FROM (
    SELECT num, SUM(num_changed) OVER (ORDER BY num) num_groupno
    FROM (
        SELECT num, CASE WHEN LAG(num) OVER (ORDER BY num) BETWEEN num - 1 AND num THEN 0 ELSE 1 END num_changed
        FROM (VALUES
            (1),
            (2),
            (3),
            (5),
            (6),
            (7),
            (7),
            (8),
            (10)
        ) v(num)
    ) cte1
) cte2
GROUP BY num_groupno
ORDER BY COUNT(DISTINCT num) DESC
Result:
num_from num_upto num_count
5 8 4
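To make the grouping trick easier to follow, here is a minimal Python sketch of the same flag-and-running-sum idea outside the database. The function name longest_run is hypothetical (it is not part of the answer), and the sketch assumes the values fit in memory.

```python
from itertools import groupby


def longest_run(nums):
    """Return (start, end, distinct_count) of the longest run of consecutive integers."""
    group_no = 0
    tagged = []  # (group number, value) pairs, like cte2 in the query
    prev = None
    for n in sorted(nums):
        # Same test as the CASE expression: a new island starts unless the
        # previous value is n - 1 or n itself (a duplicate).
        if prev is None or not (n - 1 <= prev <= n):
            group_no += 1  # plays the role of the running SUM(num_changed)
        tagged.append((group_no, n))
        prev = n

    best = None
    for _, members in groupby(tagged, key=lambda pair: pair[0]):
        values = sorted({value for _, value in members})
        candidate = (values[0], values[-1], len(values))
        if best is None or candidate[2] > best[2]:
            best = candidate
    return best


print(longest_run([1, 2, 3, 5, 6, 7, 7, 8, 10]))  # (5, 8, 4)
```

An empty input returns None, and on ties the earliest island wins, which is one possible reading of TOP 1 without a tie-breaker.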

--make test data
select 1 as val into #test;
insert #test (val)
values (1),(1),(2),(3),(4),(4),(5),(7),(8),(9),(10),(11),(12),(13);
select * from #test;
--With command to find start and end of 'ranges'
--then join start of range to its corresponding end, with length
--then list the longest ranges (with ties)
;WITH LB AS (SELECT t1.val from #test t1 LEFT JOIN #test t2 on t1.val - 1 = t2.val WHERE t2.val is null),
UB AS (SELECT t1.val from #test t1 LEFT JOIN #test t2 on t1.val + 1 = t2.val WHERE t2.val is null),
Ranges AS (SELECT DISTINCT LB.val s, Q.val e,q.val-lb.val + 1 cnt FROM LB
CROSS APPLY
(SELECT TOP 1 val FROM UB WHERE UB.val >= LB.val ORDER BY UB.val) Q)
SELECT TOP 1 with ties * FROM Ranges order by cnt DESC
drop table #test;

Related

Get all Numbers that are not existing from a Microsoft SQL-VarChar-Column

I have a table with a column of ID numbers that do not increase in single steps.
So some numbers are unused, and those are the ones I need. The column is a VarChar column.
For example:
Used numbers in the table = 2, 5, 7, 9, 10 etc.
So I need a query that gives me = 1, 3, 4, 6, 8 etc.
Pseudo-code, something like:
select numbers from Table NOT IN (select numbers from table)!
I have tried NOT IN and NOT EXISTS, but nothing works.
Can someone help me achieve this?
EDIT: The range of numbers is from 0 to 99999999!

DECLARE @Table AS TABLE
(
    Id VARCHAR(5)
)
INSERT INTO @Table
VALUES
 ('1')
,('3')
,('5')
,('7')
,('10')
DECLARE @Range AS TABLE
(
    RangeId VARCHAR(10)
)
-- Build the candidate numbers 1..1,000,000 as strings
INSERT INTO @Range
SELECT TOP (1000000) n = CONVERT(VARCHAR(10), ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1)
-- Candidates without a matching Id are the missing numbers
SELECT
    MissingId = RangeId
FROM
    @Range AS R
    LEFT OUTER JOIN @Table AS T ON T.Id = R.RangeId
WHERE
    CONVERT(INT, R.RangeId) <= (SELECT MAX(CONVERT(INT, Id)) FROM @Table)
    AND T.Id IS NULL
-- Sort numerically; sorting the VARCHAR column would put '10' before '2'
ORDER BY CONVERT(INT, RangeId)

As you don't mention what the upper limit is, and recursive Common Table Expressions are inherently slow, you would likely be better off with a Tally to achieve this:
CREATE TABLE dbo.YourTable (ID int);
INSERT INTO dbo.YourTable (ID)
VALUES(1),(3),(5),(7),(9),(11),(13),(15),(216); --Big jump on purpose
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP (SELECT MAX(ID) FROM dbo.YourTable) --Limit the tally for performance
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3) --1000 rows, add more Ns for more rows
SELECT I AS ID
FROM Tally T
LEFT JOIN dbo.YourTable YT ON T.I = YT.ID
WHERE YT.ID IS NULL;
Warning: Based on the comment on another answer:
"This is the right Direction. When I do it in my Situation, I get only 100 Numbers. But the Numbers have 8 Digits and there are MUCH more than 100 Numbers!"
8 digits means you have IDs with a value of 10,000,000 (10 million) or more. Creating over 10 million rows in a tally will be very IO intensive. I strongly suggest running this in batches.
Edit2: OK, the max (from a comment on this answer) is 99,999,999! This is information that definitely should have been in the question. This process must be batched or you will kill your transaction log.

Without using a CTE, you can use the master.dbo.spt_values trick. I'm not sure of the actual purpose of this table in master, but it contains the values we need. Give it a try. If your IDs are larger than the highest number in spt_values, divide your max ID by the number of values in spt_values to get a batch count, and replace number+1 with number+1+(@currentbatch*@maxsptvalues) (the first batch is batch 0). I haven't tested this or written the code for it, but something of that sort should work. You can run it in a WHILE loop, for instance.
IF OBJECT_ID('tmptbl') IS NOT NULL
    DROP TABLE tmptbl
GO
SELECT * INTO tmptbl
FROM
(
    SELECT '1' [id]
    UNION
    SELECT '3'
    UNION
    SELECT '5' ) t
DECLARE @maxid INT = 0
-- id is a string column, so convert it before taking the maximum
SELECT @maxid = MAX(CONVERT(INT, id)) FROM tmptbl
SELECT number+1
FROM master.dbo.spt_values
WHERE number < @maxid
AND Type = 'p'
AND NOT EXISTS ( SELECT 1
                 FROM dbo.tmptbl
                 WHERE CONVERT(INT,[id]) = (number+1))
ORDER BY number
The Result:
2,4
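The batching arithmetic described above was never written out, so here is a rough Python sketch of it. Everything in it is illustrative: missing_ids is a hypothetical name, and the default batch_size of 2048 is an assumption standing in for the number of type 'P' rows in spt_values, which you should check on your own instance.

```python
def missing_ids(used, max_id, batch_size=2048):
    """Yield every integer in 1..max_id that is absent from `used`, batch by batch."""
    used = {int(x) for x in used}  # the ids are stored as strings, as in the VARCHAR column
    batch = 0                      # the first batch is batch 0
    while batch * batch_size < max_id:
        offset = batch * batch_size
        for number in range(batch_size):     # plays the role of spt_values.number
            candidate = number + 1 + offset  # number + 1 + (currentbatch * maxsptvalues)
            if candidate > max_id:
                break
            if candidate not in used:
                yield candidate
        batch += 1


print(list(missing_ids(['1', '3', '5'], 5)))  # [2, 4]
```

Because it is a generator, each batch can be consumed (or written out) before the next one is produced, which is the point of batching a 99,999,999-wide range.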

T-SQL - Next row with greater value, continuously

I have the table described below, from which I need to select every row whose [Value] is at least 5 points (for example) greater than the [Value] of the previously selected row, with rows ordered by [Id]. Starting with the first row ([Id] 1), the desired output would be:
[Id] [Value]
---------------
1 1
4 12
8 21
Code:
declare @Data table
(
    [Id] int not null identity(1, 1) primary key,
    [Value] int not null
);
insert into @Data ([Value])
select 1 [Value]
union all
select 5
union all
select 3
union all
select 12
union all
select 8
union all
select 9
union all
select 16
union all
select 21;
select [t1].*
from @Data [t1];
Edit:
So, based on JNevill's and Hogan's answers, I ended up with this:
;with [cte1]
as (
    select [t1].[Id],
           [t1].[Value],
           cast(1 as int) [rank]
    from @Data [t1]
    where [t1].[Id] = 1
    union all
    select [t2].[Id],
           [t2].[Value],
           cast(row_number() over (order by [t2].id) as int) [rank]
    FROM [cte1] [t1]
    inner join @Data [t2] on [t2].[value] - [t1].[value] > 5
                          and [t2].[Id] > [t1].[Id]
    where [t1].[rank] = 1
)
select [t1].[Id],
       [t1].[Value]
from [cte1] [t1]
where [t1].[rank] = 1;
which is working. Alan Burstein's answer is correct too (but applicable only to MSSQL 2012+, because of the LAG function). I will run some performance tests (I'm on the 2016 version) to see how each performs on my real data (approx. 30 million records).

If you are on 2012+ you can use LAG, which will provide a better-performing solution than a recursive CTE. I'm including your sample data so you can just copy/paste/test...
-- Your sample data
DECLARE @Data TABLE
(
    Id int not null identity(1, 1) primary key,
    Value int not null
);
insert into @Data ([Value])
select 1 [Value] union all select 5 union all select 3 union all select 12 union all
select 8 union all select 9 union all select 16 union all select 21;
-- Solution using window functions
WITH
prevRows AS
(
    SELECT t1.Id, t1.Value, prevDiff = LAG(t1.Value, 1) OVER (ORDER BY t1.id) - t1.Value
    FROM @Data t1
),
NewPrev AS
(
    SELECT t1.Id, t1.Value, NewDiff = Value - LAG(t1.Value,1) OVER (ORDER BY t1.id)
    FROM prevRows t1
    WHERE prevDiff <= -5 OR prevDiff IS NULL
)
SELECT t1.Id, t1.Value
FROM NewPrev t1
WHERE NewDiff >= 5 OR NewDiff IS NULL;

I believe the best way to pull this off is using a recursive CTE. A recursive CTE is a special type of CTE that refers back to itself. It's made up of two parts:
1. The recursive seed/anchor, which establishes the beginning of the recursion. In your case, the record with ID=1.
2. The recursive term/member, which is the statement that refers back to itself by the name of the CTE. Here we pull through the next record whose value is more than 5 above the previously found record, taking records by ID sorted ascending.
Code:
WITH recCTE AS
(
    /*Select first record for recursive seed/anchor*/
    SELECT
        id,
        value,
        cast(1 as INT) as [rank]
    FROM [table]
    WHERE id = 1
    UNION ALL
    /*find the next value that is more than 5 from the current value*/
    SELECT
        t.id,
        t.value,
        cast(ROW_NUMBER() OVER (ORDER BY t.id) as INT)
    FROM
        recCTE INNER JOIN [table] t
            ON t.value - recCTE.value > 5
            AND t.id > recCTE.id
    WHERE recCTE.[rank] = 1
)
SELECT id, value FROM recCTE WHERE [rank] = 1;
I've made use of the Row_Number() window function to find the rank of each matching record by ID sorted ascending. With the WHERE clause in the recursive term, only the first found record (rank 1) that is more than 5 above the previously found record carries on into the next recursive step. The final SELECT applies the same rank filter so that only those records are returned.
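If it helps to see the selection rule without the recursion, here is a small Python sketch of what each recursive step is doing. It is only an illustration: pick_rows and min_gap are hypothetical names, and it uses the same "more than 5" comparison as the join condition above.

```python
def pick_rows(rows, min_gap=5):
    """rows: iterable of (id, value) pairs; returns the selected (id, value) pairs."""
    selected = []
    for row_id, value in sorted(rows):  # walk the rows in id order
        # Anchor: the first row is always kept.
        # Recursive step: keep the first later row whose value exceeds
        # the last kept value by more than min_gap.
        if not selected or value - selected[-1][1] > min_gap:
            selected.append((row_id, value))
    return selected


data = list(enumerate([1, 5, 3, 12, 8, 9, 16, 21], start=1))
print(pick_rows(data))  # [(1, 1), (4, 12), (8, 21)]
```

Note that 16 is skipped because it is compared against 12 (the last kept value), not against 9 (the row just before it).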

You can do it with a recursive CTE
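-- Outline in standard SQL: SQL Server does not accept ORDER BY / FETCH FIRST
-- inside a CTE written like this, so treat it as a sketch of the idea
with find_values as
(
    -- Find first value
    SELECT t.Value
    FROM @Table t
    ORDER BY t.ID ASC
    FETCH FIRST 1 ROW ONLY
    UNION ALL
    -- Find next value
    SELECT t.Value
    FROM @Table t
    CROSS JOIN find_values
    WHERE t.Value >= find_values.Value + 5
    ORDER BY t.ID ASC
    FETCH FIRST 1 ROW ONLY
)
SELECT *
FROM find_values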

SQL Server: How do I get the highest value not set of an int column?

Let's take an example. These are the rows of the table I want to get the data from:
The column I'm talking about is the reference one. The user can set this value on the web form, but the system I'm developing must suggest the lowest reference value not yet in use.
As you can see, the smallest value of this column is 35. I could just take the smallest reference and add 1, but in that case the value 36 is already used. So the value I want is 37.
Is there a way to do this without checking values in a loop? This table will grow very large.

This is for 2012+
DECLARE @Tbl TABLE (id int, reference int)
INSERT INTO @Tbl
( id, reference )
VALUES
(1, 49),
(2, 125),
(3, 35),
(4, 1345),
(5, 36),
(6, 7784)
SELECT
    MIN(A.reference) + 1 Result
FROM
(
    SELECT
        *,
        LEAD(reference) OVER (ORDER BY reference) Tmp
    FROM
        @Tbl
) A
WHERE
    A.reference - A.Tmp != -1
Result: 37
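As a rough illustration of what the LEAD comparison is doing, the same logic can be sketched in Python. The name first_gap_with_lead is hypothetical, and the sample values are the six reference values used elsewhere in this thread.

```python
def first_gap_with_lead(references):
    """Return the lowest unused value after the start of the first gap, or None."""
    ordered = sorted(references)
    # zip pairs each value with its successor, like LEAD(reference) OVER (ORDER BY reference).
    for current, following in zip(ordered, ordered[1:]):
        if current - following != -1:  # the successor is not current + 1
            return current + 1         # MIN(reference) + 1, since we scan in ascending order
    return None                        # no gap between rows; the query yields NULL here


print(first_gap_with_lead([49, 125, 35, 1345, 36, 7784]))  # 37
```

The last row never qualifies because it has no successor (its LEAD is NULL), so a table without any gaps returns no suggestion at all; you may want to fall back to MAX(reference) + 1 in that case.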

Here is yet another place where the tally table is going to prove invaluable. In fact it is so useful I keep a view on my system that looks like this.
create View [dbo].[cteTally] as
WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), --10 rows
    E2(N) AS (SELECT 1 FROM E1 a cross join E1 b), --10^2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a cross join E2 b), --10^4 or 10,000 rows max
    cteTally(N) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
select N from cteTally
Next of course we need some sample data and table to hold it.
create table #Something
(
id int identity
, reference int
, description varchar(10)
)
insert #Something (reference, description)
values (49, 'data1')
, (125, 'data2')
, (35, 'data3')
, (1345, 'data4')
, (36, 'data5')
, (7784, 'data6')
Now comes the magic of the tally table.
select top 1 t.N
from cteTally t
left join #Something s on t.N = s.reference
where t.N >= (select MIN(reference) from #Something)
and s.id is null
order by t.N

This is ugly, but should get the job done:
select
top 1 reference+1
from
[table]
where
reference+1 not in (select reference from [table])
order by reference

I used a common table expression to get the next value. I first left outer joined the table to itself (shifting the key in the join by +1). I then looked only at rows that had no corresponding match (b.ID is null). The minimum a.Reference + 1 gives us the answer we are looking for.
create table MyTable
(
ID int identity,
Reference int,
Description varchar(20)
)
insert into MyTable values (10,'Data')
insert into MyTable values (11,'Data')
insert into MyTable values (12,'Data')
insert into MyTable values (15,'Data')
-- Find gap
;with Gaps as
(
select a.Reference+1 as 'GapID'
from MyTable a
left join MyTable b on a.Reference = b.Reference-1
where b.ID is null
)
select min(GapID) as 'NewReference'
from Gaps
NewReference
------------
13
I hope the code was clearer than my description.
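For anyone who finds the self-join hard to picture, here is the same idea as a short Python sketch. The function name next_free_reference is made up for this illustration, and a set lookup stands in for the left join.

```python
def next_free_reference(references):
    """Return the smallest r + 1 that is not itself a used reference, or None if empty."""
    used = set(references)
    # r + 1 is a gap candidate when no row has Reference = r + 1
    # (the LEFT JOIN ... WHERE b.ID IS NULL filter).
    candidates = [r + 1 for r in used if r + 1 not in used]
    return min(candidates) if candidates else None


print(next_free_reference([10, 11, 12, 15]))  # 13
```

Unlike a scan that starts at 1, this only ever suggests values directly after an existing reference, so numbers below the current minimum are never returned.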

CREATE TABLE #T(ID INT , REFERENCE INT, [DESCRIPTION] VARCHAR(50))
INSERT INTO #T
SELECT 1,49 , 'data1' UNION ALL
SELECT 2,125 , 'data2' UNION ALL
SELECT 3,35 , 'data3' UNION ALL
SELECT 4,1345, 'data4' UNION ALL
SELECT 5,36 , 'data5' UNION ALL
SELECT 6,7784, 'data6'
SELECT TOP 1 REFERENCE + 1
FROM #T T1
WHERE
NOT EXISTS
(
SELECT 1 FROM #T T2 WHERE T2.REFERENCE = T1.REFERENCE + 1
)
ORDER BY T1.REFERENCE
--- OR
SELECT MIN(REFERENCE) + 1
FROM #T T1
WHERE
NOT EXISTS
(
SELECT 1 FROM #T T2 WHERE T2.REFERENCE = T1.REFERENCE + 1
)

How about using a tally table? It would be better to use a persisted numbers table than the CTE, but the code below illustrates the concept.
For further reading as to why you should use a persisted table, check out the following link: sql-auxiliary-table-of-numbers
DECLARE @START int = 1, @END int = 1000
CREATE TABLE #TEST(UsedValues INT)
INSERT INTO #TEST(UsedValues) VALUES
(1),(3),(5),(7),(9),(11),(13),(15),(17)
;With NumberSequence( Number ) as
(
    Select @START as Number
    union all
    Select Number + 1
    from NumberSequence
    where Number < @END
)
SELECT MIN(Number)
FROM NumberSequence n
LEFT JOIN #TEST t
    ON n.Number = t.UsedValues
WHERE UsedValues IS NULL
OPTION ( MAXRECURSION 1000 )

You could try using an ascending order:
SELECT DISTINCT reference
FROM [Resultsados]
ORDER BY [reference] ASC;
As far as I know, there is no way to do this without a loop. To prevent duplicate values from being returned, be sure to use DISTINCT.

Insert row for each integer between 0 and <value> without cursor

I have a source table with id and count.
id count
a 5
b 2
c 31
I need to populate a destination table with each integer up to the count for each id.
id value
a 1
a 2
a 3
a 4
a 5
b 1
b 2
c 1
c 2
etc...
My current solution is like so:
INSERT INTO destination (id, value)
SELECT
    source.id,
    sequence.number
FROM
    (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS sequence(number)
INNER JOIN
    source ON sequence.number <= source.[count]
This solution has an upper limit and is plain lame. Is there any way to replace the sequence with a set of all integers? Or is there another solution that does not use looping?

this should work:
WITH r AS (
SELECT id, count, 1 AS n FROM SourceTable
UNION ALL
SELECT id, count, n+1 FROM r WHERE n<count
)
SELECT id,n FROM r
order by id,n
OPTION (MAXRECURSION 0)

Unfortunately, there is no set of all integers in SQL Server. However, using a little trickery, you can easily generate such a set:
select N from (
select ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
from sys.all_objects t1, sys.all_objects t2
) AS numbers
where N between 1 and 1000000
will generate a set of all numbers from 1 through 1000000. If you need more than a few million numbers, add sys.all_objects to the cross join a third time.
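To show how a generated number set plugs into the question's join (sequence.number <= source.count), here is a minimal Python sketch. The name expand_counts is hypothetical, and a plain range stands in for the generated set of numbers.

```python
def expand_counts(source):
    """source: list of (id, count) pairs. Returns (id, value) rows with value in 1..count."""
    limit = max((count for _, count in source), default=0)
    numbers = range(1, limit + 1)  # stand-in for the generated number set
    # Equivalent of: source INNER JOIN numbers ON numbers.N <= source.count
    return [(row_id, n) for row_id, count in source for n in numbers if n <= count]


rows = expand_counts([('a', 5), ('b', 2), ('c', 31)])
print(len(rows))  # 38
print(rows[:7])   # [('a', 1), ('a', 2), ('a', 3), ('a', 4), ('a', 5), ('b', 1), ('b', 2)]
```

The only requirement on the number set is that it reaches at least the largest count in the source table.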

You can find many examples in this page:
DECLARE @table TABLE (ID VARCHAR(1), counter INT)
INSERT INTO @table SELECT 'a', 5
INSERT INTO @table SELECT 'b', 3
INSERT INTO @table SELECT 'c', 31
;WITH cte (ID, counter) AS (
    SELECT id, 1
    FROM @table
    UNION ALL
    SELECT c.id, c.counter + 1
    FROM cte AS c
    INNER JOIN @table AS t
        ON t.id = c.id
    WHERE c.counter + 1 <= t.counter
)
SELECT *
FROM cte
ORDER BY ID, Counter

generate fixed number of rows in a table

I couldn't word the question properly, so I couldn't search for what I want. All I need is a dummy table with a single column of, say, GUIDs, which I use for some other purposes. Rather than writing the same insert .. newID() n times, I'm wondering if there is an elegant solution.
A similar question would be how to populate a blank table that has an int column with, say, the numbers 1 to n.
Row1: 1
Row2: 2
.......
Row100:100

Instead of a recursive CTE, I recommend a set-based approach from any object you know already has more than 100 rows.
--INSERT dbo.newtable(ID, GUID)
SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY [object_id]), NEWID()
FROM sys.all_columns ORDER BY [object_id];
For plenty of other ideas, see this series:
http://www.sqlperformance.com/generate-a-set-1
http://www.sqlperformance.com/generate-a-set-2
http://www.sqlperformance.com/generate-a-set-3

You can do it recursively.
For numbers, for example:
WITH r AS (
SELECT 1 AS n
UNION ALL
SELECT n+1 FROM r WHERE n+1<=100
)
SELECT * FROM r

This method is blisteringly fast. If you need to generate a numbers table from nothing, it's probably the "best" means available.
WITH
t0(i) AS (SELECT 0 UNION ALL SELECT 0), -- 2 rows
t1(i) AS (SELECT 0 FROM t0 a, t0 b), -- 4 rows
t2(i) AS (SELECT 0 FROM t1 a, t1 b), -- 16 rows
t3(i) AS (SELECT 0 FROM t2 a, t2 b), -- 256 rows
--t4(i) AS (SELECT 0 FROM t3 a, t3 b), -- 65,536 rows
--t5(i) AS (SELECT 0 FROM t4 a, t4 b), -- 4,294,967,296 rows
n(i) AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) FROM t3)
SELECT i FROM n WHERE i BETWEEN 1 AND 100
Regarding performance:
Using SQL Server 2022, on a Xeon box from 2016, with SET STATISTICS TIME ON to measure query time I got these numbers:
(With t4 and t5 commented-out), it generates 256 rows in "0ms".
(With t4 uncommented) it generates 65,536 rows in 53ms.
(With t5 uncommented in an INSERT ... SELECT) it generated and inserted 4bn rows into an on-disk table in about 65 minutes.
That's 66 million rows per minute, or about a million rows per second, nice!
Explanation:
The first CTE, t0 generates 2 rows.
Each subsequent CTE performs a CROSS JOIN of the previous CTE; a CROSS JOIN is a Cartesian Product which effectively squares the number of rows in each CTE step.
So having t0 through t3 means performing the Cartesian product three times, thus generating 2^(2^3) = 256 rows.
SELECT 0 FROM t0 a, t0 b is the same thing as SELECT 0 FROM t0 AS a CROSS JOIN t0 AS b.
Note that the results start at 1 and not 0 because ROW_NUMBER() starts at 1. To start at 0 do SELECT ( i - 1 ) FROM n in the outermost query.
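If the squaring is hard to see from the SQL, this small Python sketch counts the rows at each step. It is only an illustration of the arithmetic: square_rows is a hypothetical helper, and itertools.product stands in for the CROSS JOIN.

```python
from itertools import product


def square_rows(rows):
    """Cross join a row set with itself, as `FROM t a, t b` does."""
    return [0 for _ in product(rows, rows)]


t0 = [0, 0]           # 2 rows
t1 = square_rows(t0)  # 4 rows
t2 = square_rows(t1)  # 16 rows
t3 = square_rows(t2)  # 256 rows

n = list(range(1, len(t3) + 1))  # ROW_NUMBER() over t3 starts at 1
first_hundred = [i for i in n if 1 <= i <= 100]

print(len(t1), len(t2), len(t3))  # 4 16 256
```

Each extra step squares the count again, which is why t4 and t5 jump to 65,536 and 4,294,967,296 rows.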

One way;
;with guids( i, guid ) as
(
select 1 as i, newid()
union all
select i + 1, newid()
from guids
where i < 100
)
select guid from guids option (maxrecursion 100)

Just adding this as it wasn't listed:
A quick way to get 10 rows:
SELECT ROW_NUMBER() OVER(
ORDER BY N1.N)
, LOWER(NEWID())
FROM (VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N1(N) -- 10
If you want it to be based on a variable:
DECLARE @N int = 10;
WITH Numbers(number)
AS (SELECT ROW_NUMBER() OVER(
           ORDER BY N1.N)
    FROM (VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N1(N) -- 10
    CROSS JOIN(VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N2(N)-- 100
    --CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N3 (N) -- 1,000
    --CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N4 (N) -- 10,000
    --CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N5 (N) -- 100,000
    -- Etc....
   )
SELECT *
     , LOWER(NEWID())
FROM Numbers
WHERE number <= @N;