In SQL Server 2019, analytic functions are not returning the results that I would expect in the context of recursive common table expressions. Consider the following non-recursive T-SQL query:
WITH SourceData (RowNum, Uniform, RowVal) AS (
    SELECT 1, 'A', 'A' UNION ALL
    SELECT 2, 'A', 'B' UNION ALL
    SELECT 3, 'A', 'C' UNION ALL
    SELECT 4, 'A', 'D'
),
RecursiveCte0 (RowNum, Uniform, RowVal, MinVal, SomeSum, RowNumCalc, RecursiveLevel) AS (
    SELECT RowNum, Uniform, RowVal, RowVal, RowNum, CAST(RowNum AS BIGINT), 0
    FROM SourceData
),
RecursiveCte1 (RowNum, Uniform, RowVal, MinVal, SomeSum, RowNumCalc, RecursiveLevel) AS (
    SELECT * FROM RecursiveCte0
    UNION ALL
    SELECT
        RowNum, Uniform, RowVal,
        MIN(MinVal) OVER (PARTITION BY Uniform),
        SUM(RowNum) OVER (PARTITION BY Uniform),
        ROW_NUMBER() OVER (PARTITION BY Uniform ORDER BY RowNum),
        RecursiveLevel + 1
    FROM RecursiveCte0
)
SELECT *
FROM RecursiveCte1
ORDER BY RecursiveLevel, RowNum;
Results:
RowNum Uniform RowVal MinVal SomeSum RowNumCalc RecursiveLevel
1 A A A 1 1 0
2 A B B 2 2 0
3 A C C 3 3 0
4 A D D 4 4 0
1 A A A 10 1 1
2 A B A 10 2 1
3 A C A 10 3 1
4 A D A 10 4 1
As expected, the MIN, SUM, and ROW_NUMBER functions generate the appropriate values based on all rows from RecursiveCte0. I would expect the following recursive query to be logically identical to the non-recursive version above, but it produces different results:
WITH SourceData (RowNum, Uniform, RowVal) AS (
    SELECT 1, 'A', 'A' UNION ALL
    SELECT 2, 'A', 'B' UNION ALL
    SELECT 3, 'A', 'C' UNION ALL
    SELECT 4, 'A', 'D'
),
RecursiveCte (RowNum, Uniform, RowVal, MinVal, SomeSum, RowNumCalc, RecursiveLevel) AS (
    SELECT RowNum, Uniform, RowVal, RowVal, RowNum, CAST(RowNum AS BIGINT), 0
    FROM SourceData
    UNION ALL
    SELECT
        RowNum, Uniform, RowVal,
        MIN(MinVal) OVER (PARTITION BY Uniform),
        SUM(RowNum) OVER (PARTITION BY Uniform),
        ROW_NUMBER() OVER (PARTITION BY Uniform ORDER BY RowNum),
        RecursiveLevel + 1
    FROM RecursiveCte
    WHERE RecursiveLevel < 1
)
SELECT *
FROM RecursiveCte
ORDER BY RecursiveLevel, RowNum;
Results:
RowNum Uniform RowVal MinVal SomeSum RowNumCalc RecursiveLevel
1 A A A 1 1 0
2 A B B 2 2 0
3 A C C 3 3 0
4 A D D 4 4 0
1 A A A 1 1 1
2 A B B 2 1 1
3 A C C 3 1 1
4 A D D 4 1 1
For each of the three analytic functions, it appears that the grouping is only being applied within the context of each individual row, rather than all of the rows at that level. This unexpected behavior also happens if I partition over (SELECT NULL). I would expect the analytic functions to apply to the entire recursion level, as per MSDN:
Analytic and aggregate functions in the recursive part of the CTE are
applied to the set for the current recursion level and not to the set
for the CTE. Functions like ROW_NUMBER operate only on the subset of
data passed to them by the current recursion level and not the entire
set of data passed to the recursive part of the CTE.
Why do these two queries produce different results? Is there a way to effectively use analytic functions with recursive common table expressions?
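One workaround, offered here only as a sketch, is to keep the window functions out of the recursive member entirely and apply them after the recursion has finished, partitioned by RecursiveLevel. This reproduces the output of the non-recursive query above, but it only helps when the windowed values do not need to feed back into the next recursion step (the CTE and column names are reused from the question):
WITH SourceData (RowNum, Uniform, RowVal) AS (
    SELECT 1, 'A', 'A' UNION ALL
    SELECT 2, 'A', 'B' UNION ALL
    SELECT 3, 'A', 'C' UNION ALL
    SELECT 4, 'A', 'D'
),
RecursiveCte (RowNum, Uniform, RowVal, RecursiveLevel) AS (
    -- The recursion itself carries no window functions.
    SELECT RowNum, Uniform, RowVal, 0
    FROM SourceData
    UNION ALL
    SELECT RowNum, Uniform, RowVal, RecursiveLevel + 1
    FROM RecursiveCte
    WHERE RecursiveLevel < 1
)
SELECT
    RowNum, Uniform, RowVal,
    -- Window functions run after the recursion, over the whole level at once;
    -- level 0 keeps the per-row values used by the anchor member in the question.
    CASE WHEN RecursiveLevel = 0 THEN RowVal
         ELSE MIN(RowVal) OVER (PARTITION BY RecursiveLevel, Uniform) END AS MinVal,
    CASE WHEN RecursiveLevel = 0 THEN RowNum
         ELSE SUM(RowNum) OVER (PARTITION BY RecursiveLevel, Uniform) END AS SomeSum,
    CASE WHEN RecursiveLevel = 0 THEN CAST(RowNum AS BIGINT)
         ELSE ROW_NUMBER() OVER (PARTITION BY RecursiveLevel, Uniform ORDER BY RowNum) END AS RowNumCalc,
    RecursiveLevel
FROM RecursiveCte
ORDER BY RecursiveLevel, RowNum;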
Sometimes the Snowflake SQL compiler tries to be too smart for its own good. This is a follow-up to a previous question here, where a clever solution was provided for my given use case, but I have run into some limitations with that solution.
A brief background: I have a JS-UDTF that takes 3 float arguments and returns rows representing a series, GENERATE_SERIES(FLOAT,FLOAT,FLOAT), and a SQL-UDTF, GENERATE_SERIES(INT,INT,INT), that casts the params to floats, invokes the JS-UDTF, and then casts the result back to ints. My original version of this wrapper UDTF was:
CREATE OR REPLACE FUNCTION generate_series(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
RETURNS TABLE (GS_VALUE INTEGER)
AS
$$
SELECT GS_VALUE::INTEGER AS GS_VALUE FROM table(generate_series(FIRST_VALUE::DOUBLE,LAST_VALUE::DOUBLE,STEP_VALUE::DOUBLE))
$$;
This wrapper would fail in most cases where the inputs were not constants, e.g.:
WITH report_params AS (
SELECT
1::integer as first_value,
3::integer as last_value,
1::integer AS step_value
)
SELECT
*
FROM
report_params, table(
generate_series(
first_value,
last_value,
step_value
)
)
This would return the error:
SQL compilation error: Unsupported subquery type cannot be evaluated
The solution provided to trick the SQL compiler into behaving was to encapsulate the function params in a VALUES table and cross join the inner UDTF:
CREATE OR REPLACE FUNCTION generate_series_int(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
RETURNS TABLE (GS_VALUE INTEGER)
AS
$$
SELECT GS_VALUE::INTEGER AS GS_VALUE
FROM (VALUES (first_value, last_value, step_value)),
table(generate_series(first_value::double,last_value::double,step_value::double))
$$;
This worked nicely for most invocations; however, I've discovered a situation where the SQL compiler is at it again. Here is a simplified example that reproduces the problem:
WITH report_params AS (
SELECT
1::integer AS first_value,
DATEDIFF('DAY','2020-01-01'::date,'2020-02-01'::date)::integer AS last_value,
1::integer AS step_value
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
This results in the error:
SQL compilation error: Invalid expression [CORRELATION(SYS_VW.LAST_VALUE_3)] in VALUES clause
The error seems obvious enough (I think): the compiler is trying to embed the function code into the outer query, treating the function like a macro before runtime.
The answer at this point might just be that I am asking too much of Snowflake's current capabilities, but in the interest of learning, and of continuing to build out what I think is a very helpful UDF library, I am curious whether there is a solution I am missing.
The major problem is that you have written a correlated subquery.
WITH report_params AS (
SELECT * FROM VALUES
(1, 30, 1)
v(first_value,last_value, step_value)
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
because when you add a second row to your CTE
WITH report_params AS (
SELECT * FROM VALUES
(1, 30, 1),
(2, 40, 2)
v(first_value,last_value, step_value)
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
it becomes more obvious that this is correlated, and it is not so obvious how Snowflake should execute it.
For the above data, the ideal execution would look like this (if it were valid SQL):
WITH report_params AS (
SELECT *
,mod(v.first_value,v.step_value) as mod_offset
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), report_ranges AS (
SELECT min(first_value) as mmin,
max(last_value) as mmax
FROM report_params
WHERE first_value <= last_value AND step_value > 0
), all_range AS (
SELECT
row_number() over (order by seq8()) + rr.mmin - 1 as seq
FROM report_ranges rr,
TABLE(GENERATOR( ROWCOUNT => (rr.mmax - rr.mmin) + 1 ))
)
SELECT
ar.seq
,rp.id, rp.first_value, rp.last_value, rp.step_value, rp.mod_offset
FROM all_range as ar
JOIN report_params as rp ON ar.seq BETWEEN rp.first_value AND rp.last_value AND mod(ar.seq, rp.step_value) = rp.mod_offset
ORDER BY 2,1;
but if you're generating the SQL in a stored procedure (or externally), the min and max values could be substituted in as literals:
WITH report_params AS (
SELECT *
,mod(v.first_value,v.step_value) as mod_offset
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), all_range AS (
SELECT
row_number() over (order by seq8()) + 3 /*min*/ - 1 as seq
FROM TABLE(GENERATOR( ROWCOUNT => (20/*max*/ - 3/*min*/) + 1 ))
)
SELECT
ar.seq
,rp.id
,rp.first_value, rp.last_value, rp.step_value, rp.mod_offset
FROM all_range as ar
JOIN report_params as rp ON ar.seq BETWEEN rp.first_value AND rp.last_value AND mod(ar.seq, rp.step_value) = rp.mod_offset
ORDER BY 2,1;
giving:
SEQ ID FIRST_VALUE LAST_VALUE STEP_VALUE MOD_OFFSET
5 0 5 20 1 0
6 0 5 20 1 0
7 0 5 20 1 0
8 0 5 20 1 0
9 0 5 20 1 0
10 0 5 20 1 0
11 0 5 20 1 0
12 0 5 20 1 0
13 0 5 20 1 0
14 0 5 20 1 0
15 0 5 20 1 0
16 0 5 20 1 0
17 0 5 20 1 0
18 0 5 20 1 0
19 0 5 20 1 0
20 0 5 20 1 0
3 1 3 15 3 0
6 1 3 15 3 0
9 1 3 15 3 0
12 1 3 15 3 0
15 1 3 15 3 0
4 2 4 15 3 1
7 2 4 15 3 1
10 2 4 15 3 1
13 2 4 15 3 1
5 3 5 15 3 2
8 3 5 15 3 2
11 3 5 15 3 2
14 3 5 15 3 2
The part I cannot guess at is that it feels like you are either trying to hide some complexity behind the JS table functions, or have made things overly complex for an unstated reason.
[edit speaking to the 1-9 comment]
The major difference between generate_series and GENERATOR is that the former is almost a UDF or CTE, and in Snowflake you have to put the GENERATOR in its own sub-select or you will get messed-up results.
with s1 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
), s2 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
)
select s1.seq as a, s2.seq as b
from s1, s2
order by 1,2;
gives 9 rows with the two sequences mixed, as you would want,
whereas
with s1 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
)
SELECT
row_number() over (order by seq8()) -1 as a
,s1.seq as b
FROM
TABLE(GENERATOR( ROWCOUNT => 3 )), s1;
gives 1-9, because the GENERATOR (the creator of rows) has been crossed with the other data before the row-numbering code has run.
Another version of the originally provided solution is:
WITH report_params AS (
SELECT *
,trunc(div0((last_value-first_value),step_value)) as steps
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), large_range AS (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 1000 ))
)
select rp.id
,rp.first_value + (lr.seq*rp.step_value) as val
from report_params as rp
join large_range as lr on lr.seq <= rp.steps
order by 1,2;
which I like more, as the nature of the mixing is clearer. But it still speaks to the mindset difference between Snowflake and other RDBMSs. In Postgres there is no cost to doing per-row operations, because it was born of an era when everything was a per-row operation, but Snowflake has no per-row operations; precisely because it cannot do things on each row, it can process many rows independently. It means all per-row expressions need to be moved to the front and then joined. That is what the above is trying to show.
I'm trying to generate the numbers in the "x" column based on the values in the "eq" field, in such a way that a number is assigned to every record until a value of "1" is met, after which the next row should reset and start counting again. I've tried row_number, but the problem is that I only have ones and zeros in the column I need to evaluate, and the examples I've seen using row_number relied on a column with growing values. I also tried rank, but I haven't managed to make it work.
nInd Fecha Tipo #Inicio #contador_I #Final #contador_F eq x
1 18/03/2002 I 18/03/2002 1 null null 0 1
2 20/07/2002 F 18/03/2002 1 20/07/2002 1 1 2
3 19/08/2002 I 19/08/2002 2 20/07/2002 1 0 1
4 21/12/2002 F 19/08/2002 2 21/12/2002 2 1 2
5 17/03/2003 I 17/03/2003 3 21/12/2002 2 0 1
6 01/04/2003 I 17/03/2003 4 21/12/2002 2 0 2
7 07/04/2003 I 17/03/2003 5 21/12/2002 2 0 3
8 02/06/2003 F 17/03/2003 5 02/06/2003 3 0 4
9 31/07/2003 F 17/03/2003 5 31/07/2003 4 0 5
10 31/08/2003 F 17/03/2003 5 31/08/2003 5 1 6
11 01/09/2005 I 01/09/2005 6 31/08/2003 5 0 1
12 05/09/2005 I 01/09/2005 7 31/08/2003 5 0 2
13 31/12/2005 F 01/09/2005 7 31/12/2005 6 0 3
14 14/01/2006 F 01/09/2005 7 14/01/2006 7 1 4
There is another solution available:
select
    nind, eq, row_number() over (partition by s order by nind) as x
from (
    select
        nind, eq, coalesce((
            select sum(eq) + 1 from mytable pre where pre.nInd < mytable.nInd)
        , 1) s --running sum of eq: this defines the group
    from mytable) g
The inner subquery creates groups sequentially for each occurrence of 1 in eq. Then we can use row_number() over partition to get our counter.
Here is an example using SQL Server.
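On SQL Server 2012 and later, the same grouping can also be computed without the correlated subquery by using a windowed running sum. This is only a sketch, reusing the mytable, nInd, and eq names from the question:
-- The running sum of eq over the strictly preceding rows defines the group;
-- ROW_NUMBER() then restarts the counter inside each group.
select
    nInd, eq,
    row_number() over (partition by grp order by nInd) as x
from (
    select
        nInd, eq,
        coalesce(sum(eq) over (order by nInd
                               rows between unbounded preceding and 1 preceding), 0) as grp
    from mytable
) g
order by nInd;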
I have two answers here. One is based on ROW_NUMBER() and the other is based on what appears to be your index (nInd). I wasn't sure whether there would be gaps in your index, so I made the ROW_NUMBER() version as well.
My table format was as follows -
myIndex int identity(1,1) NOT NULL
number int NOT NULL
First one is ROW_NUMBER()...
WITH rn AS (SELECT *, ROW_NUMBER() OVER (ORDER BY myIndex) AS rn, COUNT(*) AS max
FROM counting c GROUP BY c.myIndex, c.number)
,cte (myIndex, number, level, row) AS (
SELECT r.myIndex, r.number, 1, r.rn + 1 FROM rn r WHERE r.rn = 1
UNION ALL
SELECT r1.myIndex, r1.number,
CASE WHEN r1.number = 0 AND r2.number = 1 THEN 1
ELSE c.level + 1
END,
row + 1
FROM cte c
JOIN rn r1
ON c.row = r1.rn
JOIN rn r2
ON c.row - 1 = r2.rn
)
SELECT c.myIndex, c.number, c.level FROM cte c OPTION (MAXRECURSION 0);
Now the index...
WITH cte (myIndex, number, level) AS (
SELECT c.myIndex + 1, c.number, 1 FROM counting c WHERE c.myIndex = 1
UNION ALL
SELECT c1.myIndex + 1, c1.number,
CASE WHEN c1.number = 0 AND c2.number = 1 THEN 1
ELSE c.level + 1
END
FROM cte c
JOIN counting c1
ON c.myIndex = c1.myIndex
JOIN counting c2
ON c.myIndex - 1 = c2.myIndex
)
SELECT c.myIndex - 1 AS myIndex, c.number, c.level FROM cte c OPTION (MAXRECURSION 0);
The answer that I have now uses a cursor.
I know that if there is a solution without a cursor it will be better for performance, but here is a quick demo of my solution:
-- Create DBTest
use master
Go
Create Database DBTest
Go
use DBTest
GO
-- Create table
Create table Tabletest
(nInd int , eq int)
Go
-- insert dummy data
insert into Tabletest (nInd,eq)
values (1,0),
(2,1),
(3,0),
(4,1),
(5,0),
(6,0),
(7,0),
(8,0),
(9,1),
(8,0),
(9,1)
Create table #Tabletest (nInd int ,eq int ,x int )
go
DECLARE @nInd int , @eq int , @x int
set @x = 1
DECLARE db_cursor CURSOR FOR
SELECT nInd , eq
FROM Tabletest
order by nInd
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @nInd , @eq
WHILE @@FETCH_STATUS = 0
BEGIN
    if (@eq = 0)
    begin
        insert into #Tabletest (nInd ,eq ,x) values (@nInd , @eq , @x)
        set @x = @x + 1
    end
    else if (@eq = 1)
    begin
        insert into #Tabletest (nInd ,eq ,x) values (@nInd , @eq , @x)
        set @x = 1
    end
    FETCH NEXT FROM db_cursor INTO @nInd , @eq
END
CLOSE db_cursor
DEALLOCATE db_cursor
select * from #Tabletest
The end result set will be as follows:
Hope it helps.
Looking at this a slightly different way (which might not be applicable, but eliminates the need for cursors or recursive CTEs), it looks like you are building ordered groups within your dataset. So, start by finding those groups, then determine the ordering within each of them.
The real key is to determine the rules for finding the correct grouping. Based on your description and comments, I'm guessing each group runs from the start (ordered by the nInd column) and ends at each row with an eq value of 1, so you can do something like:
;with ends(nInd, ord) as (
--Find the ending row for each set
SELECT nInd, row_number() over(order by nInd)
FROM mytable
WHERE eq=1
), ranges(sInd, eInd) as (
--Find the previous ending row for each ending row, forming a range for the group
SELECT coalesce(s.nInd,0), e.nInd
FROM ends s
right join ends e on s.ord=e.ord-1
)
Then, using these group ranges, you can find the final ordering of each:
select t.nInd, t.Fecha, t.eq
,[x] = row_number() over(partition by sInd order by nInd)
from ranges r
join mytable t on r.sInd < t.nInd
and t.nInd <= r.eInd
order by t.nInd
This is a bit of a weird question, and I know it would probably be easier to not do it in SQL, but it will make my life a lot easier.
Basically I have a single column result-set, and I need to turn that into 3 columns, not based on any criteria.
eg.
1
2
3
4
5
6
7
into:
1 2 3
4 5 6
7
It will always be a fixed 3 column result I need in this case.
Currently I am using a cursor and inserting into a table variable, which seems a bit terrible. There must be a better way.
Thanks
Try this:
declare #t table(n int)
insert @t(n) values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
select [0],[1],[2]
from
(
select n
, (ROW_NUMBER() over (order by n) - 1) % 3 c
, (ROW_NUMBER() over (order by n) - 1) / 3 r
from @t
) x
pivot (max(n) for c in ([0], [1], [2])) p
It's possible, but man is this an ugly requirement. This really belongs in the presentation tier, not in the SQL.
WITH original As
(
    SELECT MyColumn, row_number() over (order by MyColumn) as ordinal
    FROM RestOfOriginalQueryHere
),
Grouped As
(
    SELECT MyColumn, (ordinal - 1) / 3 As row, (ordinal - 1) % 3 As col
    FROM original
)
SELECT g1.MyColumn, g2.MyColumn, g3.MyColumn
FROM Grouped g1
LEFT JOIN Grouped g2 on g2.row = g1.row and g2.col = 1
LEFT JOIN Grouped g3 on g3.row = g1.row and g3.col = 2
WHERE g1.col = 0
I would like to filter out duplicate rows so that, for each unique combination of rid and did, the row with the maximum active and the minimum modified is picked. A self join? Or is there a better approach performance-wise?
Example:
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:40:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Output expected is
1 1 2010-09-07 11:37:44.850 1 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Commenting on the first answer: the suggestion does not work for the dataset below (where active = 0 and modified is the minimum for that row):
id rid modified active did
1 1 2010-09-07 11:37:44.850 1 1
2 1 2010-09-07 11:38:44.000 1 1
3 1 2010-09-07 11:39:44.000 1 1
4 1 2010-09-07 11:36:44.000 0 1
5 2 2010-09-07 11:41:44.000 1 1
6 1 2010-09-07 11:42:44.000 1 2
Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.
;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id
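For reference, the ties-returning variant mentioned above only swaps the ranking function. A sketch, assuming YourTable is the real table rather than the inline test CTE:
-- RANK() gives tied rows the same rank within each (rid, did) partition,
-- so WHERE rnk = 1 returns every row that ties for the top spot.
SELECT id, rid, modified, active, did
FROM (
    SELECT id, rid, modified, active, did,
           RANK() OVER (PARTITION BY rid, did ORDER BY active DESC, modified ASC) AS rnk
    FROM YourTable
) ranked
WHERE rnk = 1
ORDER BY id;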
select id, rid, min(modified), max(active), did from foo group by rid, did order by id;
You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:
SELECT
X.*
FROM
ParentTable P
CROSS APPLY (
SELECT TOP 1 *
FROM YourTable T
WHERE P.rid = T.rid AND P.did = T.did
ORDER BY active DESC, modified
) X
Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.
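For completeness, a sketch of that substitution, deriving the distinct (rid, did) pairs inline instead of from a dedicated parent table:
-- The same CROSS APPLY, but the parent set is derived on the fly,
-- at the cost of an extra scan of YourTable.
SELECT
    X.*
FROM
    (SELECT DISTINCT rid, did FROM YourTable) P
    CROSS APPLY (
        SELECT TOP 1 *
        FROM YourTable T
        WHERE P.rid = T.rid AND P.did = T.did
        ORDER BY T.active DESC, T.modified
    ) X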
Also, here is my crazy, single scan magic query which can often outperform other methods:
SELECT
    id       = Convert(int, Substring(Packed, 10, 4)),
    rid,
    modified = Convert(datetime, Substring(Packed, 2, 8)),
    Active   = Convert(bit, 1 - Substring(Packed, 1, 1)),
    did
FROM
(
    SELECT
        rid,
        did,
        -- active is packed inverted so that MIN() sorts active DESC before modified ASC;
        -- datetime occupies 8 bytes, so all 8 are packed to keep the time portion.
        Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(8), modified) + Convert(binary(4), id))
    FROM
        YourTable
    GROUP BY
        rid,
        did
) X
This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.