Why do window functions not work in CROSS APPLY? - sql-server

Here is a simple example. I always thought that the outer ROW_NUMBER and the one in the CROSS APPLY clause were supposed to generate the same output (in my example I expect rn = crn). Could you please explain why that is not the case?
CREATE TABLE #tmp ( id INT, name VARCHAR(200) );
INSERT INTO #tmp
VALUES ( 1, 'a' ),
( 2, 'a' ),
( 3, 'a' ),
( 4, 'b' ),
( 5, 'b' ),
( 6, 'c' ),
( 7, 'a' );
SELECT name,
ROW_NUMBER() OVER ( PARTITION BY name ORDER BY id ) AS rn,
a.crn
FROM #tmp
CROSS APPLY (
SELECT ROW_NUMBER() OVER ( PARTITION BY name ORDER BY id ) AS crn
) a;
OUTPUT:
name rn crn
a 1 1
a 2 1
a 3 1
a 4 1
b 1 1
b 2 1
c 1 1

The query in the CROSS APPLY is applied to each row of #tmp separately. For each such row it sees only that one row, so the row number it computes is always 1.
Maybe this article on Microsoft's Technet would give you more insight into how CROSS APPLY works. An excerpt that highlights what I wrote in previous paragraph:
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input.

Note that APPLY uses the columns of your main query as parameters.
SELECT ROW_NUMBER() OVER ( PARTITION BY name ORDER BY id ) AS crn
The above query does not have a FROM clause, so name and id are treated as literals taken from the current outer row. To illustrate, for the first row of #tmp, the query inside the CROSS APPLY is effectively:
SELECT ROW_NUMBER() OVER ( PARTITION BY (SELECT 'a') ORDER BY (SELECT 1)) AS crn
which returns:
crn
--------------------
1
This is the result of your CROSS APPLY for every row.
To achieve the desired result:
SELECT
t.name,
ROW_NUMBER() OVER ( PARTITION BY t.name ORDER BY t.id ) AS rn,
a.crn
FROM #tmp t
CROSS APPLY(
SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id ) AS crn
FROM #tmp
) a
WHERE t.id = a.id
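The per-row evaluation described above can be modelled outside the database. The following is a minimal Python sketch, not SQL Server's actual execution: the helper row_numbers is a hypothetical stand-in for ROW_NUMBER() OVER (PARTITION BY name ORDER BY id), and the data mirrors #tmp.

```python
# Rows of #tmp as (id, name) pairs.
rows = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "b"), (6, "c"), (7, "a")]


def row_numbers(rowset):
    """Model of ROW_NUMBER() OVER (PARTITION BY name ORDER BY id).

    Returns a dict mapping each id to its row number within its name partition.
    """
    counters = {}
    numbered = {}
    for row_id, name in sorted(rowset):  # ids are unique, so this orders by id
        counters[name] = counters.get(name, 0) + 1
        numbered[row_id] = counters[name]
    return numbered


# Outer window function: evaluated once over the whole table.
rn = row_numbers(rows)

# APPLY without a FROM clause: evaluated once per outer row, over a one-row set.
crn = {row[0]: row_numbers([row])[row[0]] for row in rows}

# APPLY over the full table, filtered back to the outer row (the fixed query).
crn_fixed = {row[0]: row_numbers(rows)[row[0]] for row in rows}

print(rn)
print(crn)
print(crn_fixed)
```

In this model crn is 1 for every id because each call only ever sees a single row, while crn_fixed matches rn because the window is computed over the whole table before the match on id.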

Related

row number without using the ROW_NUMBER window function

Suppose you have a table with non-unique values such as this:
CREATE TABLE accounts ( fname VARCHAR(20), lname VARCHAR(20))
GO
INSERT accounts VALUES ('Fred', 'Flintstone')
INSERT accounts VALUES ('Fred', 'Flintstone')
INSERT accounts VALUES ('Fred', 'Flintstone')
SELECT * FROM accounts
GO
Now using a ROW_NUMBER function, you can get a unique incrementing row number.
select *, ROW_NUMBER() over(order by (select null)) as rn
from accounts
But how do we do this without using the ROW_NUMBER function? I tried giving each row a unique ID using NEWID() and then counting the rows as shown below, but it did not work: it gives me non-unique numbers that do not start at 1.
Note that I do not want to alter the table to add a new column.
;with cte as
(select *
from accounts as e
cross apply (select newid()) as a(id)
)
select *, (select count(*)+1 from cte as c1 where c.id > c1.id) as rn
from cte as c
order by rn
SQL Fiddle for toying around is http://sqlfiddle.com/#!18/c270f/3/0
The following demonstrates why your code fails, but does not provide an alternative to Row_Number().
A column, TopId, is added to the final select that should get the minimum value generated by NewId() and report it in every row. Instead, a new value is generated for each row.
-- Sample data.
declare @Samples as Table ( FName VarChar(20), LName VarChar(20) );
insert into @Samples ( FName, LName ) values
( 'Fred', 'Flintstone' ), ( 'Fred', 'Flintstone' ), ( 'Fred', 'Flintstone' );
select * from @Samples;
-- Cross apply NewId() in a CTE.
;with cte as
( select *
from @Samples as S
cross apply ( select NewId() ) as Ph( Id ) )
select *, ( select count(*) from cte as c1 where c1.Id >= c.Id ) as RN,
-- The following column should output the minimum Id value from the table for every row.
-- Instead, it generates a new unique identifier for each row.
( select top 1 id from cte order by id ) as TopId
from cte as c
order by RN;
The execution plan shows that the CTE is treated as a view that is being evaluated repeatedly, thus generating conflicting Id values.
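A rough way to picture the repeated evaluation is to treat the CTE as a function that is called again at every reference. The Python sketch below is only a model under that assumption (the function name cte and the use of uuid4 in place of NewId() are illustrative, not how SQL Server is implemented).

```python
import uuid

samples = [("Fred", "Flintstone")] * 3


def cte():
    """Stand-in for the CTE body: every evaluation generates fresh ids."""
    return [(fname, lname, uuid.uuid4()) for fname, lname in samples]


# Two references to the "CTE" are two separate evaluations, so ids never line up.
first_reference = cte()
second_reference = cte()
ids_match = [a[2] == b[2] for a, b in zip(first_reference, second_reference)]

# Evaluating once and reusing that single result makes the counting trick work.
materialized = cte()
rn = {
    row[2]: sum(1 for other in materialized if other[2] <= row[2])
    for row in materialized
}

print(ids_match)
print(sorted(rn.values()))
```

With one shared evaluation the counts form a gap-free sequence; with two independent evaluations the comparison c1.Id >= c.Id is made between unrelated values, which is why the original query produces duplicates and gaps.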
How about this:
SELECT
src.*,
SUM(DummyVal) OVER(ORDER BY (SELECT NULL) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RowId -- SQL Server requires ORDER BY when a ROWS frame is given
FROM (
SELECT a.*, 1 AS DummyVal
FROM MyTable a
) src
It's still a window function, though, not sure if that matters.
You can create a function yourself to compute the row number.
In this example, I had to calculate an index for a lesson within a course.
Window function version:
SELECT *, ROW_NUMBER() OVER(PARTITION BY courseId) AS row_num FROM lessons;
I created a helper function to compute the row number without a window function:
DELIMITER $$
CREATE FUNCTION getRowNumber (lessonId int, courseId int)
RETURNS int
DETERMINISTIC
BEGIN
DECLARE count int;
select count(l2.id) into count from lessons l2 where l2.courseId=courseId
and l2.id<=lessonId;
RETURN count;
END$$
DELIMITER ;
So the final query is:
SELECT l.*, getRowNumber(l.id,l.courseId) as row_num FROM lessons l;
This gave the same result as the first query.
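The helper function above boils down to a correlated count: for each lesson, count the lessons in the same course whose id is not greater. As a hedged sketch, the same idea can be tried with Python's built-in sqlite3 module; SQLite is swapped in here for MySQL purely so the example is self-contained, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lessons (id INTEGER PRIMARY KEY, courseId INTEGER)")
conn.executemany(
    "INSERT INTO lessons (id, courseId) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 10), (5, 20)],
)

# Correlated subquery: number of lessons in the same course with id <= this id.
rows = conn.execute(
    """
    SELECT l.id, l.courseId,
           (SELECT COUNT(*)
              FROM lessons l2
             WHERE l2.courseId = l.courseId
               AND l2.id <= l.id) AS row_num
      FROM lessons l
     ORDER BY l.courseId, l.id
    """
).fetchall()

for row in rows:
    print(row)
```

Because the count is recomputed for every row, this approach does quadratic work within each course; it is a substitute for ROW_NUMBER only when window functions are unavailable.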
MySQL:
SELECT @rownum := @rownum + 1 AS rank, a.*
FROM accounts a,(SELECT @rownum := 0) r;
In ORACLE it would simply be
SELECT ROWNUM, a.*
FROM accounts a;
Both work without a window function.

Union doesn't preserve order (not stable)

I execute the query below expecting the first set followed by the second set, without duplicates, but the order is totally random.
Expected results: john, mark, dave, robert, kirk
select *
from (
select Name
from (values ('john'),('mark'),('dave')) X(Name)
union
select Name
from (values ('robert'),('mark'),('kirk')) X(Name)
) q
This is an alternative query which I expected to give ordered (stable) results, but I get the same results. UNION ALL appends the second set as I expected, but applying DISTINCT afterwards breaks the ordering.
select Distinct Name
from (
select Name
from (values ('john'),('mark'),('dave')) X(Name)
union all
select Name
from (values ('robert'),('mark'),('kirk')) X(Name)
) q
What is the solution for getting an ordered and distinct set?
WITH q AS
(
-- Original data
SELECT Name FROM (VALUES ('john'),('mark'),('dave')) X(Name)
UNION ALL
SELECT Name FROM (VALUES ('robert'),('mark'),('kirk')) X(Name)
), r AS
(
-- Add the sequence column for ordering
-- ** This relies on the incidental (natural) row order, which is not guaranteed **
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Seq, * from q
), s AS
(
-- Use RN to filter out the duplicates
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Seq) AS RN FROM r
)
SELECT Name FROM s WHERE RN = 1 ORDER BY Seq
If you want the first set first, then every element in the second set that is not in the first set later, then you have to specify it. And you will also need to add an ORDER BY if you want to guarantee things in the first set being listed before the second set.
select q.name
from (
select Name,n
from
(values ('john'),('mark'),('dave')) X(Name)
cross join (values (1)) y(n)
union all
select Name,n
from
(
select Name
from (values ('robert'),('mark'),('kirk')) X(Name)
except
select Name
from
(values ('john'),('mark'),('dave')) X(Name)
) Z(Name)
Cross join (values (2)) y(n)
) q
order by q.n
If the data comes from VALUES constructors, there is no need for an ORDER BY clause in practice: no index can change how this particular query is evaluated, and UNION ALL just concatenates the result sets without sorting them, so the rows come back in the order written.
However, if you are reading from a table, the order can most certainly change depending on several factors such as table indexes, the columns being returned, new data being introduced, etc. So if you want your results ordered in a particular way, you need to specify an ORDER BY clause.
For your example you don't need an ORDER BY clause, as shown below:
select Name
from (
select Name
from (values ('john'),('mark'),('dave')) X(Name)
union all
( select Name
from (values ('robert'),('mark'),('kirk')) X(Name)
except
select Name
from (values ('john'),('mark'),('dave')) X(Name))
) q
Is it reasonable for you to add a SortOrder column to your inner SELECT statements and then order by that? Something like:
-- SELECT DISTINCT cannot be ordered by a column outside the select list, so group by Name instead
select Name
from (
select SortOrder, Name
from (values (1, 'john'),(2, 'mark'),(3, 'dave')) X(SortOrder, Name)
union all
select SortOrder, Name
from (values (4, 'robert'),(2, 'mark'),(5, 'kirk')) X(SortOrder, Name)
) q
group by Name
order by min(SortOrder) ASC
The only way to guarantee order is by using ORDER BY clause, this is documented in BOL:
https://msdn.microsoft.com/en-us/library/ms188385.aspx
Order the result set of a query by the specified column list and, optionally, limit the rows returned to a specified range. The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.
If you want rows returned based on order in the UNION, then you could do something like this:
select Name
from (
select 1 as sort1, Name
from (values ('john'),('mark'),('dave')) X(Name)
union
select 2 as sort1, Name
from (values ('robert'),('mark'),('kirk')) X(Name)
) q
order by sort1, Name
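To make the "only ORDER BY guarantees order" point concrete, here is a small sketch using Python's built-in sqlite3 module (SQLite stands in for SQL Server so the example is self-contained). The table names and the seq column are assumptions added for illustration: each name carries an explicit position, duplicates are collapsed with GROUP BY, and the result is ordered by the first position at which each name appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (seq INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO names (seq, Name) VALUES (?, ?)",
    [
        (1, "john"), (2, "mark"), (3, "dave"),     # first set
        (4, "robert"), (5, "mark"), (6, "kirk"),   # second set
    ],
)

# Keep one row per name and order by the earliest explicit position.
ordered_distinct = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM names GROUP BY Name ORDER BY MIN(seq)"
    )
]

print(ordered_distinct)
```

The ordering here does not depend on how the engine happens to evaluate the union, because it is stated in the query itself.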

T-SQL - Next row with greater value, continuously

I have the table described below, from which I need to select all rows whose [Value] is greater, by at least 5 points for example, than the [Value] of the previously returned row (ordered by [Id]). Starting with the first row, [Id] 1, the desired output would be:
[Id] [Value]
---------------
1 1
4 12
8 21
Code:
declare @Data table
(
[Id] int not null identity(1, 1) primary key,
[Value] int not null
);
insert into @Data ([Value])
select 1 [Value]
union all
select 5
union all
select 3
union all
select 12
union all
select 8
union all
select 9
union all
select 16
union all
select 21;
select [t1].*
from @Data [t1];
Edit:
So, based on JNevill's and Hogan's answers, I ended up with this:
;with [cte1]
as (
select [t1].[Id],
[t1].[Value],
cast(1 as int) [rank]
from @Data [t1]
where [t1].[Id] = 1
union all
select [t2].[Id],
[t2].[Value],
cast(row_number() over (order by [t2].id) as int) [rank]
FROM [cte1] [t1]
inner join @Data [t2] on [t2].[value] - [t1].[value] > 5
and [t2].[Id] > [t1].[Id]
where [t1].[rank] = 1
)
select [t1].[Id],
[t1].[Value]
from [cte1] [t1]
where [t1].[rank] = 1;
which works. Alan Burstein's answer is correct too (but applicable only to MSSQL 2012+, because of the LAG function). I will run some performance tests (I'm on the 2016 version) and see how each performs over my real data (approx. 30 million records).
If you are on 2012+ you can use LAG, which will provide a better-performing solution than a recursive CTE. I'm including your sample data so you can just copy/paste/test...
-- Your sample data
DECLARE @Data TABLE
(
Id int not null identity(1, 1) primary key,
Value int not null
);
insert into @Data ([Value])
select 1 [Value] union all select 5 union all select 3 union all select 12 union all
select 8 union all select 9 union all select 16 union all select 21;
-- Solution using window functions
WITH
prevRows AS
(
SELECT t1.Id, t1.Value, prevDiff = LAG(t1.Value, 1) OVER (ORDER BY t1.id) - t1.Value
FROM @Data t1
),
NewPrev AS
(
SELECT t1.Id, t1.Value, NewDiff = Value - LAG(t1.Value,1) OVER (ORDER BY t1.id)
FROM prevRows t1
WHERE prevDiff <= -5 OR prevDiff IS NULL
)
SELECT t1.Id, t1.Value
FROM NewPrev t1
WHERE NewDiff >= 5 OR NewDiff IS NULL;
I believe the best way to pull this off is using a recursive CTE. A Recursive CTE is a special type of CTE that refers back to itself. It's made up of two parts.
The recursive seed/anchor which establishes the beginning of the recursion. In your case, record with ID=1.
The recursive term/member which is the statement that refers back to itself by the name of the CTE. Here we pull through the next record that is greater than 5 from the previous found record according to the ID sorted ascending.
Code:
-- [table] is a placeholder for your table name; T-SQL does not use the RECURSIVE keyword
WITH recCTE AS
(
/*Select first record for recursive seed/anchor*/
SELECT
id,
value,
cast(1 as INT) as [rank]
FROM [table]
WHERE id = 1
UNION ALL
/*find the next value that is more than 5 from the current value*/
SELECT
[table].id,
[table].value,
cast(ROW_NUMBER() OVER (ORDER BY [table].id) as INT) as [rank]
FROM
recCTE INNER JOIN [table]
ON [table].value - recCTE.value > 5
AND [table].id > recCTE.id
WHERE recCTE.[rank]=1
)
/*keep only the first match found at each step*/
SELECT id, value FROM recCTE WHERE [rank] = 1;
I've made use of the Row_Number() Window Function to find the rank of the matching record by ID sorted Ascending. With the WHERE clause in the recursive term we only grab the first found record that is 5 more than the previous found record. Then we head into the next recursive step.
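The step-by-step selection that the recursion performs can be checked against a plain loop. This Python sketch is only a reference model of the requirement, not a replacement for the SQL; the function name pick_rows and the inclusive min_gap threshold are assumptions (the question says "at least 5", while the CTE above uses a strict "> 5"; both give the same rows for the sample data).

```python
def pick_rows(rows, min_gap=5):
    """Walk the rows in id order, keeping a row only when its value is at
    least min_gap above the value of the most recently kept row."""
    kept = []
    for row_id, value in sorted(rows):
        if not kept or value - kept[-1][1] >= min_gap:
            kept.append((row_id, value))
    return kept


# Sample data from the question: ids 1..8 assigned in insertion order.
data = list(enumerate([1, 5, 3, 12, 8, 9, 16, 21], start=1))

result = pick_rows(data)
print(result)
```

Each kept row becomes the new comparison point, which is the same dependency that forces the SQL solutions to either recurse or make several passes.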
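You can do it with a recursive CTE. The following is a sketch of the idea; SQL Server does not accept FETCH FIRST or ORDER BY inside the CTE members as written: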
with find_values as
(
-- Find first value
SELECT Value
FROM #Table
ORDER BY ID ASC
FETCH FIRST 1 ROW ONLY
UNION ALL
-- Find next value
SELECT Value
FROM #Table
CROSS JOIN find_values
WHERE Value >= find_values.Value + 5
ORDER BY ID ASC
FETCH FIRST 1 ROW ONLY
)
SELECT *
FROM find_values

need empty row in output between rows with data Report Builder 3.0 T-SQL

I have reports that pull 50 random records. I would like to insert a blank row in the output between each row of data. For example, rows 1, 3, 5, 7, ... are populated with data, and the even-numbered rows are empty.
Thanks
A CROSS APPLY against a single row of NULLs yields as many all-NULL rows as the original table has rows.
We can then generate a ROW_NUMBER in both SELECTs and sort by it to interleave the data rows with the blank ones.
select C.id, C.name, ROW_NUMBER() OVER ( ORDER BY C.id) as seq from TableC C
UNION ALL
SELECT T.id, T.name, seq as seq
FROM
(
select T.id, T.name ,ROW_NUMBER() OVER ( ORDER by C.id ) as seq from TableC C
cross apply ( select NULL as id,NULL as name ) T
) T
ORDER BY seq
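The interleaving idea can be sketched with Python's built-in sqlite3 module (SQLite instead of SQL Server, so that it runs anywhere). This variant is an assumption-laden illustration: it uses id directly as the sequence and adds a hypothetical is_blank column as a tie-breaker, so that each blank row is guaranteed to sort after its data row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableC (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO TableC (id, name) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "z")],
)

# One data row and one all-NULL row per source row, sharing the same seq.
rows = conn.execute(
    """
    SELECT id, name
      FROM (
            SELECT id, name, id AS seq, 0 AS is_blank FROM TableC
            UNION ALL
            SELECT NULL, NULL, id, 1 FROM TableC
           )
     ORDER BY seq, is_blank
    """
).fetchall()

for row in rows:
    print(row)
```

Sorting by seq alone leaves the order of each data/blank pair undefined, which is why the tie-breaker column is included here. Note that this also emits a blank row after the last data row.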
A very simple solution could be:
Select id+Temp id, name+Temp name from TableA A
Cross Apply
(select '' as 'Temp'
union all
Select null) X
If you have a large number of columns, build the column_name + Temp concatenations dynamically and use them in dynamic SQL.

Insert row for each integer between 0 and <value> without cursor

I have a source table with id and count.
id count
a 5
b 2
c 31
I need to populate a destination table with each integer up to the count for each id.
id value
a 1
a 2
a 3
a 4
a 5
b 1
b 2
c 1
c 2
etc...
My current solution is like so:
INSERT INTO destination (id,value)
SELECT
source.id,
sequence.number
FROM
(VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS sequence(number)
INNER JOIN
source ON sequence.number <= source.count
This solution has an upper limit and is plain lame. Is there any way to replace the sequence with a set of all integers? Or is there another solution that does not use looping?
This should work:
WITH r AS (
SELECT id, count, 1 AS n FROM SourceTable
UNION ALL
SELECT id, count, n+1 FROM r WHERE n<count
)
SELECT id,n FROM r
order by id,n
OPTION (MAXRECURSION 0)
Unfortunately, there is no set of all integers in SQL Server. However, using a little trickery, you can easily generate such a set:
select N from (
select ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
from sys.all_objects t1, sys.all_objects t2
) AS numbers
where N between 1 and 1000000
will generate a set of all numbers from 1 through 1000000. If you need more than a few million numbers, add sys.all_objects to the cross join a third time.
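Once a set of numbers exists, filling the destination is a join on number <= count. The sketch below uses Python's built-in sqlite3 module, so the numbers come from a recursive CTE rather than from sys.all_objects (which is SQL Server specific); the column name cnt is an assumption standing in for count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO source (id, cnt) VALUES (?, ?)",
    [("a", 5), ("b", 2), ("c", 31)],
)

# Build 1..MAX(cnt) once, then join every source row to the numbers it needs.
rows = conn.execute(
    """
    WITH RECURSIVE numbers(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers
         WHERE n < (SELECT MAX(cnt) FROM source)
    )
    SELECT s.id, numbers.n
      FROM source AS s
      JOIN numbers ON numbers.n <= s.cnt
     ORDER BY s.id, numbers.n
    """
).fetchall()

print(len(rows))
print(rows[:7])
```

The number set is sized from the data itself, which removes the fixed upper limit of the hard-coded VALUES list in the question.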
You can find many examples in this page:
DECLARE @table TABLE (ID VARCHAR(1), counter INT)
INSERT INTO @table SELECT 'a', 5
INSERT INTO @table SELECT 'b', 3
INSERT INTO @table SELECT 'c', 31
;WITH cte (ID, counter) AS (
SELECT id, 1
FROM @table
UNION ALL
SELECT c.id, c.counter +1
FROM cte AS c
INNER JOIN @table AS t
ON t.id = c.id
WHERE c.counter + 1 <= t.counter
)
SELECT *
FROM cte
ORDER BY ID, Counter

Resources