CREATE TABLE #A (UpperLimit NUMERIC(4))
CREATE TABLE #B (Id NUMERIC(4), Amount NUMERIC(4))
INSERT INTO #A VALUES
(1000), (2000), (3000)
INSERT INTO #B VALUES
(1, 3100),
(2, 1900),
(3, 1800),
(4, 1700),
(5, 900),
(6, 800)
Given these 2 tables, I want to join Table A to B ON B.Amount < A.UpperLimit but each record from Table B can only be used once, so the desired output would be:
I could easily do this by plopping Table B's records into a temp table, cursor over table A taking top record < UpperLimit and Deleting that record from the temp table or some other programmatic solution, but I'd like to avoid that and I'm pretty sure this could be done with a "normal" (recursive CTE? Partition?) query.
You can get the desired output with the recursive CTE below:
WITH
DATA AS
(
SELECT * FROM #A A1 INNER JOIN #B B1 ON A1.UpperLimit >= B1.Amount
),
MA AS
(
SELECT MIN(UpperLimit) AS MinLimit, MAX(UpperLimit) AS MaxLimit FROM #A
),
RESULT AS
(
-- Get the first record corresponding with maximum upper limit
SELECT *
FROM DATA D1
WHERE NOT EXISTS
(SELECT 1
FROM DATA D2
WHERE D2.UpperLimit = D1.UpperLimit AND D2.Amount > D1.Amount)
AND D1.UpperLimit = (SELECT MaxLimit FROM MA)
-- Recursively get the remaining records for the other upper limits
UNION ALL
SELECT D1.*
FROM RESULT R1 INNER JOIN DATA D1
ON (R1.UpperLimit > D1.UpperLimit AND R1.Id != D1.Id)
WHERE D1.UpperLimit >= (SELECT MinLimit FROM MA)
AND NOT EXISTS
(SELECT 1
FROM DATA D2
WHERE D2.UpperLimit = D1.UpperLimit AND D2.Amount > D1.Amount AND D2.Id != R1.Id)
)
SELECT DISTINCT * FROM RESULT ORDER BY UpperLimit DESC;
Demo: https://dbfiddle.uk/Y-m0K6Mk
Might be a bit lengthy but hopefully clear enough.
with a as
(select -- order and number rows in table A in some way
row_number() over (order by UpperLimit) as RnA,
*
from #a),
b as
(select -- order and number rows in table B in the same way
row_number() over (order by Amount) as RnB,
*
from #b),
m as
(select -- get and number all possible pairs of values from both tables considering the restriction
row_number() over (order by a.UpperLimit desc, b.Amount desc) as RnM,
*
from a
join b on
b.Amount < a.UpperLimit),
r as
(select -- use recursion to get all possible combinations of the value pairs with metrics of interest for comparison
convert(varchar(max), RnA) as ListA,
convert(varchar(max), RnB) as ListB,
RnA,
RnB,
1 as CountB,
convert(int, Amount) as SumB
from m
where RnM = 1
union all
select
r.ListA + ' ' + convert(varchar(max), m.RnA),
r.ListB + ' ' + convert(varchar(max), m.RnB),
m.RnA,
m.RnB,
r.CountB + 1,
r.SumB + convert(int, m.Amount)
from m
join r on
m.RnA < r.RnA and
m.RnB < r.RnB),
e as
(select top(1) -- select combinations of interest using metrics
ListA,
ListB
from r
order by CountB desc, SumB desc),
ea as
(select -- turn id list into table for table A
ea.Rn,
ea.Value
from e
cross apply(select row_number() over (order by (select null)) as Rn, Value from string_split(e.ListA, ' ')) as ea),
eb as
(select -- turn id list into table for table B
eb.Rn,
eb.Value
from e
cross apply(select row_number() over (order by (select null)) as Rn, Value from string_split(e.ListB, ' ')) as eb)
select -- get output table with actual values from the original tables
a.UpperLimit,
b.Amount,
b.Id
from ea
join eb on
ea.Rn = eb.Rn
join a on
ea.Value = a.RnA
join b on
eb.Value = b.RnB;
You can use an APPLY with a TOP 1 for this. Each row in the outer table gets only one row from the APPLY.
SELECT
*
FROM #A a
OUTER APPLY (
SELECT TOP (1) *
FROM #B b
WHERE b.Amount < a.UpperLimit
) b;
To simulate an inner-join (rather than a left-join) use CROSS APPLY.
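A hedged variation on the APPLY above: TOP (1) without an ORDER BY returns an arbitrary qualifying row, so the sketch below adds ORDER BY b.Amount DESC to pick the largest amount under each limit. The alias pick is only illustrative. This does not by itself stop the same #B row from being matched to more than one #A row: with the sample data, Id 2 (1900) is the top row for both the 2000 and the 3000 limit.

```sql
-- Deterministic version of the TOP (1) APPLY: largest Amount below each UpperLimit.
-- Assumes the #A and #B temp tables from the question already exist.
SELECT a.UpperLimit, pick.Id, pick.Amount
FROM #A AS a
CROSS APPLY (
    SELECT TOP (1) b.Id, b.Amount
    FROM #B AS b
    WHERE b.Amount < a.UpperLimit
    ORDER BY b.Amount DESC   -- without this, the row returned is arbitrary
) AS pick;
```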
This query returns something very close to the desired outcome.
WITH CTE AS (SELECT B.*,
ROW_NUMBER() OVER (PARTITION BY B.Amount ORDER BY B.Amount DESC) AS RowNum
FROM #B B),
cc AS (SELECT A.UpperLimit, CTE.*
FROM #A A
LEFT JOIN CTE ON CTE.Amount < A.UpperLimit AND CTE.RowNum = 1),
cc2 AS (SELECT *, MAX(Amount) OVER (PARTITION BY cc.UpperLimit) AS l1 FROM cc)
SELECT UpperLimit, Id, Amount
FROM cc2
WHERE Amount = l1
This query uses three common table expressions. The first numbers the rows of table B with ROW_NUMBER() and a PARTITION BY clause, the second joins table A to table B on the given condition, and the third keeps, for each limit in table A, only the row with the largest amount below that limit, so each limit appears once.
I am using a function to return all the dates between a startdate and an enddate. The function works fine and fast (around 300 records returned).
SELECT thedate FROM dbo.ExplodeDates('20141216','20151011')
In another table, I am checking if my reports are received so I get distinct values of my report date. This query also takes less than a second to complete (around 200 records returned).
SELECT DISTINCT(reportdate) FROM dbo.MyReportTable
But when I use these two like the following, the query becomes unresponsive:
SELECT thedate FROM dbo.ExplodeDates('20141216','20151011')
WHERE thedate NOT IN
(SELECT DISTINCT(reportdate) FROM dbo.MyReportTable)
Here is the code to ExplodeDates function:
CREATE FUNCTION [dbo].[ExplodeDates](@startdate datetime, @enddate datetime)
returns table as
return (
with
N0 as (SELECT 1 as n UNION ALL SELECT 1)
,N1 as (SELECT 1 as n FROM N0 t1, N0 t2)
,N2 as (SELECT 1 as n FROM N1 t1, N1 t2)
,N3 as (SELECT 1 as n FROM N2 t1, N2 t2)
,N4 as (SELECT 1 as n FROM N3 t1, N3 t2)
,N5 as (SELECT 1 as n FROM N4 t1, N4 t2)
,N6 as (SELECT 1 as n FROM N5 t1, N5 t2)
,nums as (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as num FROM N6)
SELECT DATEADD(day,num-1,@startdate) as thedate
FROM nums
WHERE num <= DATEDIFF(day,@startdate,@enddate) + 1
);
GO
Sometimes using a temp table helps with performance. For example:
IF object_id('tempdb..#TMP') IS NOT NULL BEGIN DROP TABLE #TMP END
GO
SELECT thedate INTO #TMP FROM dbo.ExplodeDates('20141216','20151011')
SELECT thedate from #TMP
WHERE thedate NOT IN
(SELECT DISTINCT(reportdate) FROM dbo.MyReportTable)
You can also try a second temp table, #TMP2, for MyReportTable to see whether it helps with performance.
Remember to drop the temp table when you are done with it:
IF object_id('tempdb..#TMP') IS NOT NULL BEGIN DROP TABLE #TMP END
GO
Just an educated guess without details such as the difference in query plans, but instead of the function you could create a date table with one row per day. That will most likely work a lot better than a function with a dynamic tally table that has to calculate a huge number of DATEADDs every time.
You might also want to test fetching the dates into a temp table and using that in the SQL. NOT EXISTS will most likely work better than NOT IN.
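A minimal sketch of the NOT EXISTS suggestion, using the function and table names from the question (the aliases d and r are only illustrative). Besides often producing a better plan, NOT EXISTS behaves differently from NOT IN when reportdate can be NULL: a single NULL in the subquery makes NOT IN return no rows at all, whereas NOT EXISTS simply ignores it.

```sql
-- Dates in the range that have no matching report, written with NOT EXISTS.
SELECT d.thedate
FROM dbo.ExplodeDates('20141216', '20151011') AS d
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.MyReportTable AS r
    WHERE r.reportdate = d.thedate   -- correlated on the date
);
```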
I build a string from a set of numbers. When I use the ROW_NUMBER function, the ORDER BY clause does not work:
DECLARE @text VARCHAR(MAX)
IF OBJECT_ID('tempdb..#numbers') IS NOT NULL DROP TABLE #numbers
SELECT CAST(ROW_NUMBER() OVER (ORDER BY name) AS INT) AS number INTO #numbers FROM master..spt_values
SET @text = ''
;WITH
numbers (number)
AS
(
SELECT CAST(ROW_NUMBER() OVER (ORDER BY name) AS INT) AS number FROM master..spt_values
),
a
AS
(
SELECT number FROM numbers WHERE number < 10
),
b
AS
(
SELECT number FROM numbers WHERE number < 10
)
SELECT @text = @text + LTRIM(STR(a.number*b.number))
FROM a
CROSS JOIN b
ORDER BY a.number, b.number DESC
SELECT @text
result "9"
SET @text = ''
;WITH
numbers (number)
AS
(
SELECT number FROM #numbers
),
a
AS
(
SELECT number FROM numbers WHERE number < 10
),
b
AS
(
SELECT number FROM numbers WHERE number < 10
)
SELECT @text = @text + LTRIM(STR(a.number*b.number))
FROM a
CROSS JOIN b
ORDER BY a.number, b.number DESC
SELECT @text
result "9876543211816141210864227242118151296336322824201612844540353025201510554484236302418126635649423528211477264564840322416881726354453627189"
Where is the difference?
I expect this is related to this issue. In summary, when you use variable concatenation, e.g.
SELECT @Variable = @Variable + someField
FROM Table
ORDER BY AnotherField;
the results depend on the physical implementation and internal access paths. I am currently struggling to find benchmark tests on the internet, but I think the fastest reliable approach in SQL Server is to use the XML extensions to concatenate rows into a single value:
WITH Numbers AS (SELECT * FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) t (Number))
SELECT [Text] = (SELECT LTRIM(STR(a.number*b.number))
FROM Numbers AS A
CROSS JOIN Numbers AS B
ORDER BY A.Number, b.Number DESC
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)');
N.B. I have also removed the reference to master..spt_values and replaced it with a table value constructor, since reading the system table just adds unnecessary reads to generate a sequence from 1 to 9.
If you need more numbers for your sequence I would still not use system tables. Use Itzik Ben-Gan's stacked CTE approach, as described in this article:
DECLARE @Numbers INT = 100000;
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
N4 (N) AS (SELECT 1 FROM N3 AS N1 CROSS JOIN N3 AS N2),
Numbers (Number) AS (SELECT TOP (@Numbers) ROW_NUMBER() OVER(ORDER BY N) FROM N4)
SELECT Number
FROM Numbers;
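As a possible alternative to the FOR XML PATH concatenation shown earlier in this answer, SQL Server 2017 and later provide STRING_AGG, whose WITHIN GROUP clause lets you state the concatenation order explicitly. This is only a sketch under the assumption that your version supports it; the column alias [Text] mirrors the XML example.

```sql
-- Ordered concatenation with STRING_AGG (SQL Server 2017+).
WITH Numbers AS (
    SELECT Number
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS t (Number)
)
SELECT [Text] = STRING_AGG(CONVERT(VARCHAR(MAX), a.Number * b.Number), '')
                WITHIN GROUP (ORDER BY a.Number, b.Number DESC)
FROM Numbers AS a
CROSS JOIN Numbers AS b;
```

Converting to VARCHAR(MAX) before aggregating avoids the 8,000-byte limit that applies when the input is a non-MAX string type.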
Do not use CAST on ROW_NUMBER(). This returns the same result as your second query:
DECLARE @text VARCHAR(MAX) = ''
;WITH
numbers (number)
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY name) AS number FROM master..spt_values
),
a
AS
(
SELECT number FROM numbers WHERE number < 10
),
b
AS
(
SELECT number FROM numbers WHERE number < 10
)
SELECT @text = @text + LTRIM(STR(a.number*b.number))
FROM a
CROSS JOIN b
ORDER BY a.number, b.number DESC
Also, don't define the same CTE twice; use aliases instead:
DECLARE @text VARCHAR(MAX) = ''
;WITH
numbers (number)
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY name) AS number FROM master..spt_values
),
a
AS
(
SELECT number FROM numbers WHERE number < 10
)
SELECT @text = @text + LTRIM(STR(a.number*b.number))
FROM a AS a
CROSS JOIN a AS b
ORDER BY a.number, b.number DESC
SELECT @text
I have a table that stores html templates which contain markup with placeholders in key locations, something like this ...
<div>
<div>{FirstName}</div>
<div>{LastName}</div>
</div>
I want to write a query that returns from the table all of the placeholders used from all rows.
SELECT Template
FROM MyTable
WHERE ????
So for the above example the result I want is ...
{FirstName}
{LastName}
I have seen people using regex in SQL but can't figure out how to only return the matches and not the whole column value.
It's also worth noting that I want a result per match ideally but if I got a comma separated list per row that matched or something that would do.
I would approach this using a numbers table, which are very useful anyway, so if you don't have one, I would consider creating one, but for the sake of a complete answer I will assume you don't have one and can't create one. In such scenarios you can generate a list of numbers on the fly quite easily using:
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
--N4 (N) AS (SELECT 1 FROM N3 AS N1 CROSS JOIN N3 AS N2)
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT Number
FROM Numbers;
This starts with a table of 10 rows created with a table value constructor (N1), it then joins this table with itself to get a table of 100 rows (N2), then joins N2 to itself to get 10,000 rows (N3), this can be repeated as required, before finally using ROW_NUMBER() to get a sequential number in each row. Aaron Bertrand has done a pretty comprehensive series on generating a set or sequence without loops, and this method comes out on top (as a method of creating the table on the fly).
Once you have this numbers table you can join it to your template to find the position of each "{" using SUBSTRING:
SELECT t.Template,
StartPosition = n.Number
FROM dbo.T
INNER JOIN Numbers n
ON SUBSTRING(t.Template, n.Number, 1) = '{';
With your example this will return 16, and 43. Then you can use CHARINDEX to find the "}" that follows each "{":
SELECT t.Template,
StartPosition = n.Number,
EndPosition = CHARINDEX('}', t.template, n.Number) + 1
FROM dbo.T
INNER JOIN Numbers n
ON SUBSTRING(t.Template, n.Number, 1) = '{';
Then you can use SUBSTRING again to extract the term between each start and end position. So a full working example would be:
DECLARE @T TABLE (Template NVARCHAR(MAX));
INSERT @T (Template)
VALUES ('<div>
<div>{FirstName}</div>
<div>{LastName}</div>
</div>');
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
N3 (N) AS (SELECT 1 FROM N2 AS N1 CROSS JOIN N2 AS N2),
--N4 (N) AS (SELECT 1 FROM N3 AS N1 CROSS JOIN N3 AS N2)
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N) FROM N3)
SELECT t.Template,
StartPosition = n.Number,
EndPosition = CHARINDEX('}', t.template, n.Number) + 1,
Term = SUBSTRING(t.template, n.Number, CHARINDEX('}', t.template, n.Number) + 1 - n.Number)
FROM @T t
INNER JOIN Numbers n
ON SUBSTRING(t.Template, n.Number, 1) = '{';
See this:
CREATE TABLE #temp(id int identity(1,1), template nvarchar(max))
INSERT INTO #temp(template)
SELECT REPLICATE(N'<div>
<div>{FirstName}</div>
<div>{LastName}</div>
</div>',1000)
;WITH cte AS(
SELECT id,
SUBSTRING(template,CHARINDEX(N'{',template),CHARINDEX(N'}',template)-CHARINDEX(N'{',template)+1) as match,
SUBSTRING(template,CHARINDEX(N'}',template)+1,LEN(template)) as templateRest
FROM #temp
UNION ALL
SELECT id,
SUBSTRING(templateRest,CHARINDEX(N'{',templateRest),CHARINDEX(N'}',templateRest)-CHARINDEX(N'{',templateRest)+1) as match,
SUBSTRING(templateRest,CHARINDEX(N'}',templateRest)+1,LEN(templateRest)) as templateRest
FROM cte
WHERE templateRest LIKE N'%}%'
)
SELECT t.id, t.template, c.match
-- Distinct matches only:
-- SELECT DISTINCT t.id, t.template, c.match
FROM cte AS c
INNER JOIN #temp AS t
ON c.id = t.id
OPTION(MAXRECURSION 1000) -- if needed, this value could still be raised
DROP TABLE #temp
GO
You can filter it for the template and retrieve all matches.
I couldn't word the question well enough to search for what I want. All I need is a dummy table with a single column of, say, GUIDs, which I use for some other purposes. Without actually writing the same insert .. newID() n times, I'm wondering if there is an elegant solution.
A similar question would be: how do I populate a blank table with an int column containing, say, the numbers 1 to n?
Row1: 1
Row2: 2
.......
Row100:100
Instead of a recursive CTE, I recommend a set-based approach from any object you know already has more than 100 rows.
--INSERT dbo.newtable(ID, GUID)
SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY [object_id]), NEWID()
FROM sys.all_columns ORDER BY [object_id];
For plenty of other ideas, see this series:
http://www.sqlperformance.com/generate-a-set-1
http://www.sqlperformance.com/generate-a-set-2
http://www.sqlperformance.com/generate-a-set-3
You can do it recursively.
For numbers, for example:
WITH r AS (
SELECT 1 AS n
UNION ALL
SELECT n+1 FROM r WHERE n+1<=100
)
SELECT * FROM r
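One caveat worth sketching: SQL Server stops a recursive CTE after 100 levels of recursion by default, so the query above is fine for 1 to 100 but fails with an error once you go well beyond that. A hedged variation with a variable upper bound (the variable name @n is an assumption for illustration) lifts the limit with the MAXRECURSION hint:

```sql
-- Recursive number generator with a configurable upper bound.
DECLARE @n int = 1000;   -- hypothetical upper bound

WITH r AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM r WHERE n + 1 <= @n
)
SELECT n
FROM r
OPTION (MAXRECURSION 0);  -- 0 removes the default limit of 100 recursion levels
```

Because MAXRECURSION 0 removes the safety net entirely, the WHERE clause in the recursive member is the only thing that ends the recursion.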
This method is blisteringly fast. If you need to generate a numbers table from nothing, it's probably the "best" means available.
WITH
t0(i) AS (SELECT 0 UNION ALL SELECT 0), -- 2 rows
t1(i) AS (SELECT 0 FROM t0 a, t0 b), -- 4 rows
t2(i) AS (SELECT 0 FROM t1 a, t1 b), -- 16 rows
t3(i) AS (SELECT 0 FROM t2 a, t2 b), -- 256 rows
--t4(i) AS (SELECT 0 FROM t3 a, t3 b), -- 65,536 rows
--t5(i) AS (SELECT 0 FROM t4 a, t4 b), -- 4,294,967,296 rows
n(i) AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) FROM t3)
SELECT i FROM n WHERE i BETWEEN 1 AND 100
Regarding performance:
Using SQL Server 2022, on a Xeon box from 2016, with SET STATISTICS TIME ON to measure query time I got these numbers:
(With t4 and t5 commented-out), it generates 256 rows in "0ms".
(With t4 uncommented) it generates 65,536 rows in 53ms.
(With t5 uncommented in an INSERT FROM) it generated and inserted 4bn rows to a TABLE on-disk in about 65 minutes.
That's 66 million rows per minute, or about a million rows per second, nice!
Explanation:
The first CTE, t0 generates 2 rows.
Each subsequent CTE performs a CROSS JOIN of the previous CTE; a CROSS JOIN is a Cartesian Product which effectively squares the number of rows in each CTE step.
So having t0 through t3 means squaring the row count three times, starting from 2 rows: 2 → 4 → 16 → 256, i.e. 2^(2^3) = 256 rows.
SELECT 0 FROM t0 a, t0 b is the same thing as SELECT 0 FROM t0 AS a CROSS JOIN t0 AS b.
Note that the results start at 1 and not 0 because ROW_NUMBER() starts at 1. To start at 0 do SELECT ( i - 1 ) FROM n in the outermost query.
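To tie this back to the original question, here is a minimal sketch that uses the same stacked CTEs to fill a table with 100 numbered GUIDs. The temp table #Guids and its column names are assumptions made for illustration, not something from the question.

```sql
-- Hypothetical target table; name and columns are illustrative only.
CREATE TABLE #Guids (Id int NOT NULL, Guid uniqueidentifier NOT NULL);

WITH
t0(i) AS (SELECT 0 UNION ALL SELECT 0),           -- 2 rows
t1(i) AS (SELECT 0 FROM t0 a CROSS JOIN t0 b),    -- 4 rows
t2(i) AS (SELECT 0 FROM t1 a CROSS JOIN t1 b),    -- 16 rows
t3(i) AS (SELECT 0 FROM t2 a CROSS JOIN t2 b),    -- 256 rows
n(i)  AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM t3)
INSERT INTO #Guids (Id, Guid)
SELECT i, NEWID()          -- NEWID() is evaluated once per row
FROM n
WHERE i <= 100;

SELECT Id, Guid FROM #Guids ORDER BY Id;

DROP TABLE #Guids;
```

For more than 256 rows, add the t4 level as in the commented-out line of the original query and select from it instead of t3.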
One way:
;with guids( i, guid ) as
(
select 1 as i, newid()
union all
select i + 1, newid()
from guids
where i < 100
)
select guid from guids option (maxrecursion 100)
Just adding this as it wasn't listed:
A quick way to get 10 rows:
SELECT ROW_NUMBER() OVER(
ORDER BY N1.N)
, LOWER(NEWID())
FROM (VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N1(N) -- 10
If you want it to be based on a variable:
DECLARE @N int = 10;
WITH Numbers(number)
AS (SELECT ROW_NUMBER() OVER(
ORDER BY N1.N)
FROM (VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N1(N) -- 10
CROSS JOIN(VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS N2(N)-- 100
--CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N3 (N) -- 1,000
--CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N4 (N) -- 10,000
--CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N5 (N) -- 100,000
-- Etc....
)
SELECT *
, LOWER(NEWID())
FROM Numbers
WHERE number <= @N;