I have:
Name of the table is "myTable" and maximum number value is 7.
index  id   number  letter  date
0      999  1       a       1/1/99
1      999  2       a       1/2/99
2      999  3       a       1/3/99
3      999  3       b       1/4/99
4      999  4       a       1/5/99
5      999  4       b       1/6/99
6      999  5       a       1/7/99
7      888  1       a       2/1/99
8      888  1       b       2/2/99
9      888  1       c       2/3/99
10     888  2       a       2/4/99
11     888  2       b       2/5/99
12     888  3       a       2/6/99
13     888  4       a       2/7/99
I'd like:
index  id   1       2       3       4       5       6  7
0      999  1/1/99  1/2/99  1/4/99  1/6/99  1/7/99
1      888  2/3/99  2/5/99  2/6/99  2/7/99
So rows with the highest letter for the same number are included in the output, and null values are empty strings or null.
The number value can be 1 through n, and the letters a through zzz. And there are way more than the one example id I have in there.
EDIT: I'm also given the maximum value for the number column and I've updated the illustration to show this.
Been at this for three days and tried many different approaches, any help is greatly appreciated.
SOLUTION: I was able to solve this problem by creating a stored procedure that I could pass some variables that would run a loop to create the query. One of the variables that get passed to the stored procedure is an integer that tells me how many columns I need to create, and then I build a pretty complex query using Pivots and Joins to get the output I need.
I'll post more details on this solution as soon as I wrap up the project and have a moment to generalize everything.
This seems to work just fine...?
Create Table #T([Index] Int, ID Int, Number Int, Letter Varchar(10), [Date] DateTime)
Insert Into #T([Index], ID, Number, Letter, [Date])
Values
(0 ,999 ,1 , 'a', '1/1/99'),
(1 ,999 ,2 , 'a', '1/2/99'),
(2 ,999 ,3 , 'a', '1/3/99'),
(3 ,999 ,3 , 'b', '1/4/99'),
(4 ,999 ,4 , 'a', '1/5/99'),
(5 ,999 ,4 , 'b', '1/6/99'),
(6 ,999 ,5 , 'a', '1/7/99'),
(7 ,888 ,1 , 'a', '2/1/99'),
(8 ,888 ,1 , 'b', '2/2/99'),
(9 ,888 ,1 , 'c', '2/3/99'),
(10 ,888 ,2 , 'a', '2/4/99'),
(11 ,888 ,2 , 'b', '2/5/99'),
(12 ,888 ,3 , 'a', '2/6/99'),
(13 ,888 ,4 , 'a', '2/7/99')
Select
    [Index] = Row_Number() Over (Order By ID) - 1,
    ID, [1], [2], [3], [4], [5], [6], [7]
From
(
    Select
        ID, Number = T.Number, [Date]
    From
        #T As T
    Where
        [Index] = (Select Top (1) HighestLetter.[Index]
                   From #T As HighestLetter
                   Where HighestLetter.ID = T.ID
                     And HighestLetter.Number = T.Number
                   Order By HighestLetter.Letter Desc)
) As Highest
Pivot
(
    Max([Date])
    For Number In ([1], [2], [3], [4], [5], [6], [7])
) As Piv
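One thing to watch with Order By Letter Desc: if the letters continue past z as aa, ab, and so on (an assumption; the question only says "a through zzz"), plain string ordering ranks 'z' above 'aa'. The Python sketch below is not the T-SQL above. It mirrors the same two steps (keep the highest-letter row per id and number, then pivot into columns 1..max_number) while ordering letters by length first. The names pivot_latest and letter_rank are made up for the illustration.

```python
# (id, number, letter, date) rows copied from the question's sample data.
ROWS = [
    (999, 1, "a", "1/1/99"), (999, 2, "a", "1/2/99"), (999, 3, "a", "1/3/99"),
    (999, 3, "b", "1/4/99"), (999, 4, "a", "1/5/99"), (999, 4, "b", "1/6/99"),
    (999, 5, "a", "1/7/99"), (888, 1, "a", "2/1/99"), (888, 1, "b", "2/2/99"),
    (888, 1, "c", "2/3/99"), (888, 2, "a", "2/4/99"), (888, 2, "b", "2/5/99"),
    (888, 3, "a", "2/6/99"), (888, 4, "a", "2/7/99"),
]


def letter_rank(letter):
    # Assumption: letters run a..z, aa..zz, aaa..zzz, so a longer string is higher.
    return (len(letter), letter)


def pivot_latest(rows, max_number):
    best = {}  # (id, number) -> (letter, date) of the highest letter seen so far
    for id_, number, letter, date in rows:
        key = (id_, number)
        if key not in best or letter_rank(letter) > letter_rank(best[key][0]):
            best[key] = (letter, date)
    ids = list(dict.fromkeys(row[0] for row in rows))  # ids in first-seen order
    return [
        [index, id_]
        + [best.get((id_, n), (None, None))[1] for n in range(1, max_number + 1)]
        for index, id_ in enumerate(ids)
    ]


result = pivot_latest(ROWS, max_number=7)
for row in result:
    print(row)
```

Numbers with no row for an id come back as None, which corresponds to the NULL or empty cells in the desired output.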
Sometimes the Snowflake SQL compiler tries to be too smart for its own good. This is a follow-up to a previous question here, where a clever solution was provided for my use case, but I have since run into some limitations of that solution.
Brief background: I have a JS-UDTF that takes 3 float arguments and returns rows representing a series, GENERATE_SERIES(FLOAT,FLOAT,FLOAT), and a SQL-UDTF, GENERATE_SERIES(INT,INT,INT), that casts the params to floats, invokes the JS-UDTF, and then casts the result back to ints. My original version of this wrapper UDTF was:
CREATE OR REPLACE FUNCTION generate_series(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
RETURNS TABLE (GS_VALUE INTEGER)
AS
$$
SELECT GS_VALUE::INTEGER AS GS_VALUE FROM table(generate_series(FIRST_VALUE::DOUBLE,LAST_VALUE::DOUBLE,STEP_VALUE::DOUBLE))
$$;
This fails in most cases where the inputs are not constants, e.g.:
WITH report_params AS (
SELECT
1::integer as first_value,
3::integer as last_value,
1::integer AS step_value
)
SELECT
*
FROM
report_params, table(
generate_series(
first_value,
last_value,
step_value
)
)
This returns the error:
SQL compilation error: Unsupported subquery type cannot be evaluated
The suggested solution, which tricks the SQL compiler into behaving, was to wrap the function params in a VALUES table and cross-join the inner UDTF:
CREATE OR REPLACE FUNCTION generate_series_int(FIRST_VALUE INTEGER, LAST_VALUE INTEGER, STEP_VALUE INTEGER)
RETURNS TABLE (GS_VALUE INTEGER)
AS
$$
SELECT GS_VALUE::INTEGER AS GS_VALUE
FROM (VALUES (first_value, last_value, step_value)),
table(generate_series(first_value::double,last_value::double,step_value::double))
$$;
This worked nicely for most invocations; however, I've discovered a situation where the SQL compiler is at it again. Here is a simplified example that reproduces the problem:
WITH report_params AS (
SELECT
1::integer AS first_value,
DATEDIFF('DAY','2020-01-01'::date,'2020-02-01'::date)::integer AS last_value,
1::integer AS step_value
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
This results in the error:
SQL compilation error: Invalid expression [CORRELATION(SYS_VW.LAST_VALUE_3)] in VALUES clause
The error seems obvious enough (I think): the compiler appears to be embedding the function code into the outer query, treating the function like a macro, before runtime.
The answer at this point might just be that I am asking too much of Snowflake's current capabilities, but in the interest of learning and continuing to build out what I think is a very helpful UDF library, I am curious whether there is a solution I am missing.
The major problem is that you have written a correlated subquery.
WITH report_params AS (
SELECT * FROM VALUES
(1, 30, 1)
v(first_value,last_value, step_value)
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
When you add a second row to your CTE:
WITH report_params AS (
SELECT * FROM VALUES
(1, 30, 1),
(2, 40, 2)
v(first_value,last_value, step_value)
)
SELECT
*
FROM
report_params, table(
COMMON.FN.generate_series(
first_value,
last_value,
step_value
)
);
it becomes more obvious that this is correlated, and it is not so obvious how Snowflake should execute it.
For the above data it would ideally look like this (if it were valid SQL):
WITH report_params AS (
SELECT *
,mod(v.first_value,v.step_value) as mod_offset
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), report_ranges AS (
SELECT min(first_value) as mmin,
max(last_value) as mmax
FROM report_params
WHERE first_value <= last_value AND step_value > 0
), all_range AS (
SELECT
row_number() over (order by seq8()) + rr.mmin - 1 as seq
FROM report_ranges rr,
TABLE(GENERATOR( ROWCOUNT => (rr.mmax - rr.mmin) + 1 ))
)
SELECT
ar.seq
,rp.id, rp.first_value, rp.last_value, rp.step_value, rp.mod_offset
FROM all_range as ar
JOIN report_params as rp ON ar.seq BETWEEN rp.first_value AND rp.last_value AND mod(ar.seq, rp.step_value) = rp.mod_offset
ORDER BY 2,1;
but if you're generating it in a stored procedure (or externally), the min and max could be substituted in directly:
WITH report_params AS (
SELECT *
,mod(v.first_value,v.step_value) as mod_offset
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), all_range AS (
SELECT
row_number() over (order by seq8()) + 3 /*min*/ - 1 as seq
FROM TABLE(GENERATOR( ROWCOUNT => (20/*max*/ - 3/*min*/) + 1 ))
)
SELECT
ar.seq
,rp.id
,rp.first_value, rp.last_value, rp.step_value, rp.mod_offset
FROM all_range as ar
JOIN report_params as rp ON ar.seq BETWEEN rp.first_value AND rp.last_value AND mod(ar.seq, rp.step_value) = rp.mod_offset
ORDER BY 2,1;
giving:
SEQ ID FIRST_VALUE LAST_VALUE STEP_VALUE MOD_OFFSET
5 0 5 20 1 0
6 0 5 20 1 0
7 0 5 20 1 0
8 0 5 20 1 0
9 0 5 20 1 0
10 0 5 20 1 0
11 0 5 20 1 0
12 0 5 20 1 0
13 0 5 20 1 0
14 0 5 20 1 0
15 0 5 20 1 0
16 0 5 20 1 0
17 0 5 20 1 0
18 0 5 20 1 0
19 0 5 20 1 0
20 0 5 20 1 0
3 1 3 15 3 0
6 1 3 15 3 0
9 1 3 15 3 0
12 1 3 15 3 0
15 1 3 15 3 0
4 2 4 15 3 1
7 2 4 15 3 1
10 2 4 15 3 1
13 2 4 15 3 1
5 3 5 15 3 2
8 3 5 15 3 2
11 3 5 15 3 2
14 3 5 15 3 2
What I cannot guess is why: it feels like you are either trying to hide some complexity behind the JS table functions, or have made things overly complex for an unstated reason.
[edit speaking to the 1-9 comment]
The major difference between generate_series and GENERATOR is that the former behaves almost like a UDF or CTE, whereas in Snowflake you have to put the GENERATOR in its own sub-select or you will get messed-up results.
with s1 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
), s2 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
)
select s1.seq as a, s2.seq as b
from s1, s2
order by 1,2;
gives 9 rows with the two sequences crossed, as you would expect,
whereas
with s1 as (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 3 ))
)
SELECT
row_number() over (order by seq8()) -1 as a
,s1.seq as b
FROM
TABLE(GENERATOR( ROWCOUNT => 3 )), s1;
gives 1-9, because the GENERATOR (the creator of rows) has been crossed with the other data before the sequence code has run.
Another version of the original solution provided is
WITH report_params AS (
SELECT *
,trunc(div0((last_value-first_value),step_value)) as steps
FROM VALUES
(0, 5, 20, 1),
(1, 3, 15, 3),
(2, 4, 15, 3),
(3, 5, 15, 3)
v(id, first_value,last_value, step_value)
), large_range AS (
SELECT
row_number() over (order by seq8()) -1 as seq
FROM
TABLE(GENERATOR( ROWCOUNT => 1000 ))
)
select rp.id
,rp.first_value + (lr.seq*rp.step_value) as val
from report_params as rp
join large_range as lr on lr.seq <= rp.steps
order by 1,2;
which I like more, as the nature of the mixing is clearer. But it still speaks to the difference in mindset between Snowflake and other relational databases. In Postgres there is no cost to doing per-row operations, because it was born in an era when everything was a per-row operation. Snowflake has no per-row options, and because it does not do things row by row, it can process many rows independently. This means every per-row expression needs to be moved to the front and then joined, which is what the above is trying to show.
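To make the "generate one shared sequence, then join" idea concrete outside Snowflake, here is a small Python sketch (an illustration only, not Snowflake code). It expands the same four parameter rows both ways: by filtering one shared range with the modulo test, and by computing first_value + seq * step_value for every seq up to steps. The function names are invented for this example, and it assumes positive integer steps with first_value <= last_value.

```python
# (id, first_value, last_value, step_value) rows from the VALUES list above.
PARAMS = [(0, 5, 20, 1), (1, 3, 15, 3), (2, 4, 15, 3), (3, 5, 15, 3)]


def series_by_mod(params):
    # One shared range covering every row, filtered by BETWEEN plus a mod test.
    lo = min(first for _, first, _, _ in params)
    hi = max(last for _, _, last, _ in params)
    return sorted(
        (id_, seq)
        for seq in range(lo, hi + 1)
        for id_, first, last, step in params
        if first <= seq <= last and seq % step == first % step
    )


def series_by_steps(params, rowcount=1000):
    # One shared 0..rowcount-1 sequence, joined on seq <= steps.
    out = []
    for id_, first, last, step in params:
        steps = (last - first) // step  # whole number of steps that fit in the range
        out.extend(
            (id_, first + seq * step) for seq in range(rowcount) if seq <= steps
        )
    return sorted(out)


by_mod = series_by_mod(PARAMS)
by_steps = series_by_steps(PARAMS)
print(by_mod == by_steps)
print([value for id_, value in by_steps if id_ == 1])
```

Both functions build the row source once and only then combine it with the parameters, which is the shape the SQL versions above have.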
Which of the two alternatives is better?
ROW_NUMBER() OVER (PARTITION BY...)
or
COUNT(1) OVER (PARTITION BY ...)
I could not find any such question.
Edit:
DBMS: SQL-SERVER (version >= 2008)
In my case the over partition is guaranteed by a single field:
ROW_NUMBER() OVER (PARTITION BY ELEMENT ORDER BY EMPLOYEE)
COUNT(1) OVER (PARTITION BY ELEMENT ORDER BY EMPLOYEE)
ELEMENT EMPLOYEE ROW_NUMBER COUNT
0000001 00000003 1 1
0000001 00000004 2 2
0000001 00000005 3 3
0000003 00000045 1 1
0000003 00000046 2 2
COUNT(1) behaves differently when values in the ORDER BY columns are repeated within a partition.
The following is an example in SQL Server:
IF OBJECT_ID('tempdb..#Example') IS NOT NULL
DROP TABLE #Example
CREATE TABLE #Example (
Number INT,
GroupNumber INT)
INSERT INTO #Example (
Number,
GroupNumber)
VALUES
(NULL, 1),
(100, 1),
(101, 1),
(102, 1),
(103, 1),
(NULL, 2),
(NULL, 2),
(NULL, 2),
(200, 2),
(201, 2),
(202, 2),
(300, 3),
(301, 3),
(301, 3),
(301, 3),
(302, 3)
SELECT
E.*,
RowNumber = ROW_NUMBER() OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC),
CountOver = COUNT(1) OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC)
FROM
#Example AS E
Result:
Number GroupNumber RowNumber CountOver
----------- ----------- -------------------- -----------
NULL 1 1 1
100 1 2 2
101 1 3 3
102 1 4 4
103 1 5 5
NULL 2 1 3 Here
NULL 2 2 3
NULL 2 3 3
200 2 4 4
201 2 5 5
202 2 6 6
300 3 1 1
301 3 2 4 Here
301 3 3 4
301 3 4 4
302 3 5 5
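This is because it's a count and not a row number: with an ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the ORDER BY value are all counted together. You should use the one that's appropriate to your needs.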
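The tie behaviour can also be reproduced without a SQL Server instance. The sketch below runs groups 2 and 3 of the sample through SQLite from Python. It assumes SQLite 3.25 or later for window functions, and the table and column names are shortened for the example. It also adds a COUNT with an explicit ROWS frame, which counts physical rows rather than ties and so yields 1..n per partition like ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (num INTEGER, grp INTEGER)")
conn.executemany(
    "INSERT INTO example (num, grp) VALUES (?, ?)",
    [(None, 2), (None, 2), (None, 2), (200, 2), (201, 2), (202, 2),
     (300, 3), (301, 3), (301, 3), (301, 3), (302, 3)],
)

rows = conn.execute(
    """
    SELECT grp, num,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY num) AS row_num,
           -- No frame given: tied rows are counted together.
           COUNT(1) OVER (PARTITION BY grp ORDER BY num) AS count_default,
           -- Explicit ROWS frame: one more row counted per physical row.
           COUNT(1) OVER (PARTITION BY grp ORDER BY num
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS count_rows
    FROM example
    ORDER BY grp, row_num
    """
).fetchall()

for row in rows:
    print(row)
```

The columns are grp, num, row_num, count_default, count_rows; count_default repeats on the tied rows just as CountOver does in the result above.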
I have data that looks like ID and Col1, where the value 01 in Col1 denotes the start of a related group of rows lasting until the next 01.
Sample Data:
ID Col1
1 01
2 02
3 02
---------
4 01
5 02
6 03
7 03
----------
8 01
9 03
----------
10 01
I need to calculate GroupTotal, which provides a running total of '01' values from Col1, and also GroupID, which is an incrementing ID that resets at every instance of '01' in Col1. Row order must be preserved by ID.
Desired Results:
ID Col1 GroupTotal GroupID
1 01 1 1
2 02 1 2
3 02 1 3
----------------------------
4 01 2 1
5 02 2 2
6 03 2 3
7 03 2 4
----------------------------
8 01 3 1
9 03 3 2
----------------------------
10 01 4 1
I've been messing with OVER, PARTITION BY etc. and cannot crack either.
Thanks
I believe what the OP is saying is that the only data available is a table with the id and col1 data, and that the desired results are what is currently posted in the question.
If that is the case, you just need the following.
Sample Data Setup:
create table #grp_tbl (id int, col1 int)
insert into #grp_tbl (id, col1)
values (1, 1),(2, 2),(3, 2),(4, 1),(5, 2),(6, 3),(7, 3),(8, 1),(9, 3),(10, 1)
Answer:
declare @max_id int = (select max(id) from #grp_tbl)
; with grp_cnt as
(
    --getting the range of ids that are in each group
    --and ranking them
    select gt.id
    , lead(gt.id - 1, 1, @max_id) over (order by gt.id asc) as id_max --max id in the group
    , row_number() over (order by gt.id asc) as grp_ttl
    from #grp_tbl as gt
    where 1=1
    and gt.col1 = 1
)
--ranking the range of ids inside each group
select gt.id
, gt.col1
, gc.grp_ttl as group_total
, row_number() over (partition by gc.grp_ttl order by gt.id asc) as group_id
from #grp_tbl as gt
left join grp_cnt as gc on gt.id between gc.id and gc.id_max
Final Results:
id col1 group_total group_id
1 1 1 1
2 2 1 2
3 2 1 3
4 1 2 1
5 2 2 2
6 3 2 3
7 3 2 4
8 1 3 1
9 3 3 2
10 1 4 1
If I understood correctly, this is what you want:
CREATE TABLE #tmp
([ID] int, [Col1] int, [GroupTotal] int, [GroupID] int)
;
INSERT INTO #tmp
([ID], [Col1], [GroupTotal], [GroupID])
VALUES
(1, 01, 1, 1),
(2, 02, 1, 2),
(3, 02, 1, 3),
(4, 01, 2, 1),
(5, 02, 2, 2),
(6, 03, 2, 3),
(7, 03, 2, 4),
(8, 01, 3, 1),
(9, 03, 3, 2),
(10, 01, 4, 1)
;
select *, row_number() over (partition by Grp order by ID) as GrpID
from (
    select ID, Col1, [GroupTotal],
           sum(case when Col1 = '01' then 1 else 0 end) over (order by ID) as Grp,
           [GroupID]
    from #tmp
) as X
The SUM over the CASE identifies the groups: 1 is added every time Col1 = 01, and the resulting running total is then used by ROW_NUMBER to partition the groups.
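The same running-sum-then-ROW_NUMBER pattern can be tried outside SQL Server. The sketch below runs it in SQLite from Python against only the ID and Col1 columns, so both totals are computed rather than read from the table. Window functions need SQLite 3.25 or later, and the table name t and the lowercase column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 INTEGER)")
conn.executemany(
    "INSERT INTO t (id, col1) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 1), (5, 2),
     (6, 3), (7, 3), (8, 1), (9, 3), (10, 1)],
)

grouped = conn.execute(
    """
    SELECT id, col1, group_total,
           -- Restart the numbering inside each group found by the running sum.
           ROW_NUMBER() OVER (PARTITION BY group_total ORDER BY id) AS group_id
    FROM (
        -- Running count of group starts: goes up by 1 on every col1 = 1 row.
        SELECT id, col1,
               SUM(CASE WHEN col1 = 1 THEN 1 ELSE 0 END) OVER (ORDER BY id) AS group_total
        FROM t
    ) AS x
    ORDER BY id
    """
).fetchall()

for row in grouped:
    print(row)
```

Each printed tuple is (id, col1, group_total, group_id), in the same column order as the desired results in the question.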
I'm not really sure what you are after, but you are on the right track with partitioning functions. The following calculates a running total of GroupId within each GroupTotal. I'm sure that's not what you want, but it shows you how you can achieve it.
select *, SUM(GroupId) over (partition by grouptotal order by id)
from #tmp
order by grouptotal, id
I use SQL Server 2012 and have a table like the one below:
CREATE TABLE #T (Id INT, [Type] CHAR(1), Quantity INT, Price MONEY, UnitPrice AS (Price/Quantity))
INSERT INTO #T VALUES
(1, 'I', 30, 1500),
(2, 'O', 5, NULL),
(3, 'O', 20, NULL),
(4, 'O', 2, NULL),
(5, 'I', 10, 2500),
(6, 'I', 8, 1000),
(7, 'O', 3, NULL),
(8, 'O', 10, NULL),
(9, 'I', 12, 3600)
My table has a Type column with the values 'I' and 'O'. 'I' rows have a unit price, and each 'O' row uses the unit price of the last 'I' row before it. I want to calculate RunningTotalPrice (the running sum of Quantity * UnitPrice over the rows).
The following code calculates RunningTotalQuantity:
SELECT *,
SUM(CASE WHEN [Type] = 'I' Then Quantity ELSE -Quantity END)OVER (ORDER BY Id) AS QuantityRunningTotal
FROM #T
and the result of this query is:
Id Type Quantity Price UnitPrice QuantityRunningTotal
1 I 30 1500/00 50/00 30
2 O 5 NULL NULL 25
3 O 20 NULL NULL 5
4 O 2 NULL NULL 3
5 I 10 2500/00 250/00 13
6 I 8 1000/00 125/00 21
7 O 3 NULL NULL 18
8 O 10 NULL NULL 8
9 I 12 3600/00 300/00 20
I want to have the following result:
Id Type Quantity Price UnitPrice QuantityRunningTotal Price RunningTotalPrice
1 I 30 1500/00 50/00 30 1500/00 1500/00
2 O 5 NULL 50/00 25 250/00 1250/00
3 O 20 NULL 50/00 5 1000/00 250/00
4 O 2 NULL 50/00 3 100/00 150/00
5 I 10 2500/00 250/00 13 2500/00 2650/00
6 I 8 1000/00 125/00 21 1000/00 3650/00
7 O 3 NULL 125/00 18 375/00 3275/00
8 O 10 NULL 125/00 8 1250/00 2025/00
9 I 12 3600/00 300/00 20 3600/00 5625/00
In this result, each NULL UnitPrice is filled with the last existing UnitPrice from the preceding records.
Then Price (Quantity * UnitPrice) is calculated, followed by the running total of Price.
Unfortunately, LEAD and LAG can't be used to fetch the last non-NULL value, so you need to use OUTER APPLY to get the previous UnitPrice for rows where the type is 'O':
SELECT  t.ID,
        t.[Type],
        t.Quantity,
        t.Price,
        t.UnitPrice,
        SUM(CASE WHEN t.[Type] = 'I' THEN t.Quantity ELSE -t.Quantity END) OVER (ORDER BY t.Id) AS QuantityRunningTotal,
        CASE WHEN t.[Type] = 'I' THEN t.Price ELSE t.Quantity * p.UnitPrice END AS Price2,
        SUM(CASE WHEN t.[Type] = 'I' THEN t.Price ELSE -t.Quantity * p.UnitPrice END) OVER (ORDER BY t.Id) AS RunningTotalPrice
FROM    #T AS t
        OUTER APPLY
        (   -- most recent earlier row that has a unit price
            SELECT  TOP 1 t2.UnitPrice
            FROM    #T AS t2
            WHERE   t2.ID < t.ID
            AND     t2.UnitPrice IS NOT NULL
            ORDER BY t2.ID DESC
        ) AS p;
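As a cross-check of the expected numbers, the carry-forward logic can also be written procedurally. This Python sketch is not a replacement for the query. It walks the rows in Id order, remembers the unit price of the last 'I' row, and accumulates both running totals. The function name running_totals is invented for the example, and it assumes the first row is an 'I' row.

```python
# (id, type, quantity, price) rows from the question; price is None for 'O' rows.
ROWS = [
    (1, "I", 30, 1500), (2, "O", 5, None), (3, "O", 20, None),
    (4, "O", 2, None), (5, "I", 10, 2500), (6, "I", 8, 1000),
    (7, "O", 3, None), (8, "O", 10, None), (9, "I", 12, 3600),
]


def running_totals(rows):
    out = []
    last_unit_price = None  # unit price of the most recent 'I' row
    qty_total = 0
    price_total = 0
    for id_, type_, quantity, price in sorted(rows):  # ids are unique, so this orders by id
        if type_ == "I":
            last_unit_price = price / quantity
            line_price = price
            sign = 1
        else:
            # Assumes an earlier 'I' row exists to supply the unit price.
            line_price = quantity * last_unit_price
            sign = -1
        qty_total += sign * quantity
        price_total += sign * line_price
        out.append((id_, last_unit_price, qty_total, line_price, price_total))
    return out


totals = running_totals(ROWS)
for row in totals:
    print(row)
```

Each tuple holds (id, unit price in effect, running quantity, line price, running price), which can be compared column by column with the desired result in the question.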