generate fixed number of rows in a table - sql-server

I couldn't word the question well enough to search for it. All I need is a dummy table with a single column of, say, GUIDs, which I use for some other purposes. Is there an elegant solution that doesn't involve writing the same INSERT ... NEWID() n times?
A similar question would be: how do I populate a blank table that has an int column with, say, the numbers 1 to n?
Row1: 1
Row2: 2
.......
Row100:100

Instead of a recursive CTE, I recommend a set-based approach from any object you know already has more than 100 rows.
--INSERT dbo.newtable(ID, GUID)
SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY [object_id]), NEWID()
FROM sys.all_columns ORDER BY [object_id];
For plenty of other ideas, see this series:
http://www.sqlperformance.com/generate-a-set-1
http://www.sqlperformance.com/generate-a-set-2
http://www.sqlperformance.com/generate-a-set-3

You can do it recursively.
For numbers, for example:
WITH r AS (
SELECT 1 AS n
UNION ALL
SELECT n+1 FROM r WHERE n+1<=100
)
SELECT * FROM r
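One caveat with the recursive form: SQL Server stops a recursive CTE after 100 recursion levels by default, so the query above works for 100 rows but fails for larger ranges unless the limit is raised. A minimal sketch for 1 to 1000 (the upper bound is just an illustrative value):

```sql
-- Generating 1..1000 needs 999 recursion levels, which exceeds the default limit of 100.
WITH r AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM r WHERE n + 1 <= 1000
)
SELECT n
FROM r
OPTION (MAXRECURSION 1000); -- MAXRECURSION 0 removes the limit entirely
```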

This method is blisteringly fast. If you need to generate a numbers table from nothing, it's probably the "best" means available.
WITH
t0(i) AS (SELECT 0 UNION ALL SELECT 0), -- 2 rows
t1(i) AS (SELECT 0 FROM t0 a, t0 b), -- 4 rows
t2(i) AS (SELECT 0 FROM t1 a, t1 b), -- 16 rows
t3(i) AS (SELECT 0 FROM t2 a, t2 b), -- 256 rows
--t4(i) AS (SELECT 0 FROM t3 a, t3 b), -- 65,536 rows
--t5(i) AS (SELECT 0 FROM t4 a, t4 b), -- 4,294,967,296 rows
n(i) AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) FROM t3)
SELECT i FROM n WHERE i BETWEEN 1 AND 100
Regarding performance:
Using SQL Server 2022, on a Xeon box from 2016, with SET STATISTICS TIME ON to measure query time I got these numbers:
(With t4 and t5 commented-out), it generates 256 rows in "0ms".
(With t4 uncommented) it generates 65,536 rows in 53ms.
(With t5 uncommented in an INSERT FROM) it generated and inserted 4bn rows to a TABLE on-disk in about 65 minutes.
That's 66 million rows per minute, or about a million rows per second, nice!
Explanation:
The first CTE, t0 generates 2 rows.
Each subsequent CTE performs a CROSS JOIN of the previous CTE; a CROSS JOIN is a Cartesian Product which effectively squares the number of rows in each CTE step.
So having t0 through t3 means performing the Cartesian product three times, thus generating 2^(2^3) = 256 rows.
SELECT 0 FROM t0 a, t0 b is the same thing as SELECT 0 FROM t0 AS a CROSS JOIN t0 AS b.
Note that the results start at 1 and not 0 because ROW_NUMBER() starts at 1. To start at 0 do SELECT ( i - 1 ) FROM n in the outermost query.
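To see the squaring described in the explanation, the row count of each CTE can be checked directly. This is only an illustrative sketch; the column aliases (t0_rows and so on) are made up for the example:

```sql
WITH
t0(i) AS (SELECT 0 UNION ALL SELECT 0),
t1(i) AS (SELECT 0 FROM t0 AS a CROSS JOIN t0 AS b),
t2(i) AS (SELECT 0 FROM t1 AS a CROSS JOIN t1 AS b),
t3(i) AS (SELECT 0 FROM t2 AS a CROSS JOIN t2 AS b)
SELECT
    (SELECT COUNT(*) FROM t0) AS t0_rows, -- 2
    (SELECT COUNT(*) FROM t1) AS t1_rows, -- 4
    (SELECT COUNT(*) FROM t2) AS t2_rows, -- 16
    (SELECT COUNT(*) FROM t3) AS t3_rows; -- 256
```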

One way:
;with guids( i, guid ) as
(
select 1 as i, newid()
union all
select i + 1, newid()
from guids
where i < 100
)
select guid from guids option (maxrecursion 100)

Just adding this as it wasn't listed:
A quick way to get 10 rows:
SELECT ROW_NUMBER() OVER(
ORDER BY N1.N)
, LOWER(NEWID())
FROM (VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N1(N) -- 10
If you want it to be based on a variable:
DECLARE @N int = 10;
WITH Numbers(number)
AS (SELECT ROW_NUMBER() OVER(
ORDER BY N1.N)
FROM (VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N1(N) -- 10
CROSS JOIN(VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N2(N)-- 100
--CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N3 (N) -- 1,000
--CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N4 (N) -- 10,000
--CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N5 (N) -- 100,000
-- Etc....
)
SELECT *
, LOWER(NEWID())
FROM Numbers
WHERE number <= @N;

Related

SQL Server: how to find maximum length continuous range?

I have a table containing one integer column (signed) NUM.
In each row, this table contains a random number. Each number is found in the table an arbitrary number of times.
I need to find the maximum length of a continuous (without missing numbers) range,
present in the table, missed is considered.
The numbers lie in the range min(NUM) to max(NUM) (where min and max are the SQL functions).
This sounds like a typical gaps-and-islands problem:
SELECT TOP 1 MIN(num) num_from, MAX(num) num_upto, COUNT(DISTINCT num) num_count
FROM (
SELECT num, SUM(num_changed) OVER (ORDER BY num) num_groupno
FROM (
SELECT num, CASE WHEN LAG(num) OVER (ORDER BY num) BETWEEN num - 1 AND num THEN 0 ELSE 1 END num_changed
FROM (VALUES
(1),
(2),
(3),
(5),
(6),
(7),
(7),
(8),
(10)
) v(num)
) cte1
) cte2
GROUP BY num_groupno
ORDER BY COUNT(DISTINCT num) DESC
Result:
num_from num_upto num_count
5 8 4
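To make the grouping step easier to follow, here is the inner part of the same query run on its own, as a sketch over the same sample values. A row gets num_changed = 1 when it starts a new island, and the running sum turns those flags into a group number:

```sql
SELECT num,
       num_changed,
       SUM(num_changed) OVER (ORDER BY num) AS num_groupno
FROM (
    -- LAG returns NULL on the first row, so the CASE falls through to 1 there
    SELECT num,
           CASE WHEN LAG(num) OVER (ORDER BY num) BETWEEN num - 1 AND num
                THEN 0 ELSE 1 END AS num_changed
    FROM (VALUES (1),(2),(3),(5),(6),(7),(7),(8),(10)) AS v(num)
) AS cte1
ORDER BY num;
-- 1, 2, 3       -> num_groupno 1
-- 5, 6, 7, 7, 8 -> num_groupno 2
-- 10            -> num_groupno 3
```

Grouping by num_groupno and counting distinct values then gives 3, 4 and 1, so the island from 5 to 8 comes out on top.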
--make test data
select 1 as val into #test;
insert #test (val)
values (1),(1),(2),(3),(4),(4),(5),(7),(8),(9),(10),(11),(12),(13);
select * from #test;
--With command to find start and end of 'ranges'
--then join start of range to its corresponding end, with length
--then list the longest ranges (with ties)
;WITH LB AS (SELECT t1.val from #test t1 LEFT JOIN #test t2 on t1.val - 1 = t2.val WHERE t2.val is null),
UB AS (SELECT t1.val from #test t1 LEFT JOIN #test t2 on t1.val + 1 = t2.val WHERE t2.val is null),
Ranges AS (SELECT DISTINCT LB.val s, Q.val e,q.val-lb.val + 1 cnt FROM LB
CROSS APPLY
(SELECT TOP 1 val FROM UB WHERE UB.val >= LB.val ORDER BY UB.val) Q)
SELECT TOP 1 with ties * FROM Ranges order by cnt DESC
drop table #test;

How do i get the next value and previous in a column

Suppose I have an ID column that has the values 1, 5, 7. What SQL statement can I use to get the next or previous value in the column, relative to a given value?
Example : The next value after 1 is 5.
Example 2 : the value before 7 is 5
Without any window functions or CTEs:
select
t.id,
(select max(t1.id) from tbl t1 where t1.id < t.id) as previd,
(select min(t2.id) from tbl t2 where t2.id > t.id) as nextid
from tbl t
There are some great Windowed functions for working across records in later versions of SQL Server (2012+). But in 2008 these aren't available. Instead you could use an OUTER APPLY. This will allow you to filter a sub query using values from your main query.
Outer Apply Query Example
WITH SampleValues AS
(
/* This CTE creates some sample values.
*/
SELECT
r.n
FROM
(
VALUES
(1),
(5),
(7)
) AS r(n)
)
SELECT
o.n,
prev.Previous_n,
[next].Next_n
FROM
SampleValues AS o
OUTER APPLY
(
-- Find the previous value by looking for the TOP 1 before the current.
SELECT TOP 1
o.n AS Original_n,
p.n AS Previous_n
FROM
SampleValues AS p
WHERE
p.n < o.n
ORDER BY
p.n DESC
) AS prev
OUTER APPLY
(
-- Find the next value by looking for the TOP 1 after the current.
SELECT TOP 1
o.n AS Original_n,
n.n AS Next_n
FROM
SampleValues AS n
WHERE
n.n > o.n
ORDER BY
n.n ASC
)AS [next]
;
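For SQL Server 2012 and later, the windowed functions mentioned above make this much shorter. A minimal sketch using the same sample values (the column aliases are only illustrative):

```sql
SELECT n,
       LAG(n)  OVER (ORDER BY n) AS Previous_n, -- NULL for the first row
       LEAD(n) OVER (ORDER BY n) AS Next_n      -- NULL for the last row
FROM (VALUES (1), (5), (7)) AS r(n);
-- n = 1: Previous_n NULL, Next_n 5
-- n = 5: Previous_n 1,    Next_n 7
-- n = 7: Previous_n 5,    Next_n NULL
```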

SELECT NOT IN goes extremely slow while separate SELECTS do not take long

I am using a function to return all the dates between a startdate and an enddate. The function works fine and fast (around 300 records returned).
SELECT thedate FROM dbo.ExplodeDates('20141216','20151011')
In another table, I am checking if my reports are received so I get distinct values of my report date. This query also takes less than a second to complete (around 200 records returned).
SELECT DISTINCT(reportdate) FROM dbo.MyReportTable
But when I use these two like the following, the query becomes unresponsive:
SELECT thedate FROM dbo.ExplodeDates('20141216','20151011')
WHERE thedate NOT IN
(SELECT DISTINCT(reportdate) FROM dbo.MyReportTable)
Here is the code to ExplodeDates function:
CREATE FUNCTION [dbo].[ExplodeDates](@startdate datetime, @enddate datetime)
returns table as
return (
with
N0 as (SELECT 1 as n UNION ALL SELECT 1)
,N1 as (SELECT 1 as n FROM N0 t1, N0 t2)
,N2 as (SELECT 1 as n FROM N1 t1, N1 t2)
,N3 as (SELECT 1 as n FROM N2 t1, N2 t2)
,N4 as (SELECT 1 as n FROM N3 t1, N3 t2)
,N5 as (SELECT 1 as n FROM N4 t1, N4 t2)
,N6 as (SELECT 1 as n FROM N5 t1, N5 t2)
,nums as (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as num FROM N6)
SELECT DATEADD(day,num-1,@startdate) as thedate
FROM nums
WHERE num <= DATEDIFF(day,@startdate,@enddate) + 1
);
GO
Sometimes using a temp table helps with performance. For example:
IF object_id('tempdb..#TMP') IS NOT NULL BEGIN DROP TABLE #TMP END
GO
SELECT thedate INTO #TMP FROM dbo.ExplodeDates('20141216','20151011')
SELECT thedate from #TMP
WHERE thedate NOT IN
(SELECT DISTINCT(reportdate) FROM dbo.MyReportTable)
You can also try a second temp table, #TMP2, for MyReportTable to see whether it helps with performance.
Remember to drop the temp table when you are done:
IF object_id('tempdb..#TMP') IS NOT NULL BEGIN DROP TABLE #TMP END
GO
This is just an educated guess without details such as the difference in query plans, but instead of the function you could create a date table with one row per day. That will most likely work a lot better than a function with a dynamic tally table that has to calculate a huge number of DATEADDs every time.
You might also want to test fetching the dates into a temp table and using that in the query. NOT EXISTS will most likely work better than NOT IN.
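As a sketch of the NOT EXISTS suggestion, using the function and table names from the question (the aliases d and r are arbitrary):

```sql
SELECT d.thedate
FROM dbo.ExplodeDates('20141216', '20151011') AS d
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.MyReportTable AS r
    WHERE r.reportdate = d.thedate
);
```

Note that the two forms are not equivalent when reportdate can be NULL: NOT IN returns no rows at all if the subquery yields a NULL, while NOT EXISTS simply ignores those rows.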

Return Regex Matches from a sql query

I have a table that stores html templates which contain markup with placeholders in key locations, something like this ...
<div>
<div>{FirstName}</div>
<div>{LastName}</div>
</div>
I want to write a query that returns from the table all of the placeholders used from all rows.
SELECT Template
FROM MyTable
WHERE ????
So for the above example the result I want is ...
{FirstName}
{LastName}
I have seen people using regex in SQL but can't figure out how to only return the matches and not the whole column value.
It's also worth noting that I want a result per match ideally but if I got a comma separated list per row that matched or something that would do.
I would approach this using a numbers table, which are very useful anyway, so if you don't have one, I would consider creating one, but for the sake of a complete answer I will assume you don't have one and can't create one. In such scenarios you can generate a list of numbers on the fly quite easily using:
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
--N4 (N) AS (SELECT 1 FROM N3 AS N1 CROSS JOIN N3 AS N2)
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT Number
FROM Numbers;
This starts with a table of 10 rows created with a table value constructor (N1), it then joins this table with itself to get a table of 100 rows (N2), then joins N2 to itself to get 10,000 rows (N3), this can be repeated as required, before finally using ROW_NUMBER() to get a sequential number in each row. Aaron Bertrand has done a pretty comprehensive series on generating a set or sequence without loops, and this method comes out on top (as a method of creating the table on the fly).
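If creating a permanent numbers table is an option after all, the same CTE chain can populate it once. This is only a sketch; dbo.Numbers is a hypothetical table name chosen for the example:

```sql
-- dbo.Numbers is a hypothetical name; 10,000 rows is just the size N3 happens to produce.
CREATE TABLE dbo.Numbers (Number int NOT NULL PRIMARY KEY);

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2)
INSERT INTO dbo.Numbers (Number)
SELECT ROW_NUMBER() OVER (ORDER BY N)
FROM N3;
```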
Once you have this numbers table you can join it to your template to find the position of each "{" using SUBSTRING:
SELECT t.Template,
StartPosition = n.Number
FROM dbo.T
INNER JOIN Numbers n
ON SUBSTRING(t.Template, n.Number, 1) = '{';
With your example this will return 16, and 43. Then you can use CHARINDEX to find the "}" that follows each "{":
SELECT t.Template,
StartPosition = n.Number,
EndPosition = CHARINDEX('}', t.template, n.Number) + 1
FROM dbo.T
INNER JOIN Numbers n
ON SUBSTRING(t.Template, n.Number, 1) = '{';
Then you can use SUBSTRING again to extract the term between each start and end position. So a full working example would be:
DECLARE @T TABLE (Template NVARCHAR(MAX));
INSERT @T (Template)
VALUES ('<div>
<div>{FirstName}</div>
<div>{LastName}</div>
</div>');
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
--N4 (N) AS (SELECT 1 FROM N3 AS N1 CROSS JOIN N3 AS N2)
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT t.Template,
StartPosition = n.Number,
EndPosition = CHARINDEX('}', t.template, n.Number) + 1,
Term = SUBSTRING(t.template, n.Number, CHARINDEX('}', t.template, n.Number) + 1 - n.Number)
FROM @T t
INNER JOIN Numbers n
ON SUBSTRING(t.Template, n.Number, 1) = '{';
See this:
CREATE TABLE #temp(id int identity(1,1), template nvarchar(max))
INSERT INTO #temp(template)
SELECT REPLICATE(N'<div>
<div>{FirstName}</div>
<div>{LastName}</div>
</div>',1000)
;WITH cte AS(
SELECT id,
SUBSTRING(template,CHARINDEX(N'{',template),CHARINDEX(N'}',template)-CHARINDEX(N'{',template)+1) as match,
SUBSTRING(template,CHARINDEX(N'}',template)+1,LEN(template)) as templateRest
FROM #temp
UNION ALL
SELECT id,
SUBSTRING(templateRest,CHARINDEX(N'{',templateRest),CHARINDEX(N'}',templateRest)-CHARINDEX(N'{',templateRest)+1) as match,
SUBSTRING(templateRest,CHARINDEX(N'}',templateRest)+1,LEN(templateRest)) as templateRest
FROM cte
WHERE templateRest LIKE N'%}%'
)
SELECT t.id, t.template, c.match
-- Distinct matches only:
-- SELECT DISTINCT t.id, t.template, c.match
FROM cte AS c
INNER JOIN #temp AS t
ON c.id = t.id
OPTION(MAXRECURSION 1000) -- if needed, this value could still be raised
DROP TABLE #temp
GO
You can filter it for the template and retrieve all matches.

Insert row for each integer between 0 and <value> without cursor

I have a source table with id and count.
id count
a 5
b 2
c 31
I need to populate a destination table with each integer up to the count for each id.
id value
a 1
a 2
a 3
a 4
a 5
b 1
b 2
c 1
c 2
etc...
My current solution is like so:
INSERT INTO destination (id,value)
SELECT
source.id,
sequence.number
FROM
(VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS sequence(number)
INNER JOIN
source ON sequence.number <= source.count
This solution has an upper limit and is plain lame. Is there any way to replace the sequence with a set of all integers? Or is there another solution that does not use looping?
This should work:
WITH r AS (
SELECT id, count, 1 AS n FROM SourceTable
UNION ALL
SELECT id, count, n+1 FROM r WHERE n<count
)
SELECT id,n FROM r
order by id,n
OPTION (MAXRECURSION 0)
Unfortunately, there is no set of all integers in SQL Server. However, using a little trickery, you can easily generate such a set:
select N from (
select ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
from sys.all_objects t1, sys.all_objects t2
) AS numbers
where N between 1 and 1000000
will generate a set of all numbers from 1 through 1000000. If you need more than a few million numbers, add sys.all_objects to the cross join a third time.
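Joined back to the question's tables, that generated set could be used roughly like this. This is a sketch that assumes the source and destination tables and column names from the question:

```sql
WITH numbers AS (
    SELECT ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
    FROM sys.all_objects AS t1
    CROSS JOIN sys.all_objects AS t2
)
INSERT INTO destination (id, value)
SELECT s.id, n.N
FROM source AS s
INNER JOIN numbers AS n
    ON n.N <= s.[count]; -- one row per integer from 1 up to the id's count
```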
You can find many examples in this page:
DECLARE @table TABLE (ID VARCHAR(1), counter INT)
INSERT INTO @table SELECT 'a', 5
INSERT INTO @table SELECT 'b', 3
INSERT INTO @table SELECT 'c', 31
;WITH cte (ID, counter) AS (
SELECT id, 1
FROM @table
UNION ALL
SELECT c.id, c.counter +1
FROM cte AS c
INNER JOIN @table AS t
ON t.id = c.id
WHERE c.counter + 1 <= t.counter
)
SELECT *
FROM cte
ORDER BY ID, Counter
