Using GROUP BY and OVER clause on different columns - snowflake-cloud-data-platform

I have a table like this:

col1 | col2 | col3
-----|------|-----
 a   |  2   | 10
 b   | -1   | 10
 a   | 10   | 10

The goal is to get this output:

col1 | col2 | col3
-----|------|-----
 a   | 12   | 30
 b   | -1   | 30
I tried the following query:

select col1,
       sum(col2),
       sum(col3) OVER()
from t1
group by col1

But I got the error: col3 is not a valid group by expression.
Kindly suggest an alternate solution. Thanks in advance.

You may try taking the sum of SUM(col3) over the entire table:
SELECT
    col1,
    SUM(col2),
    SUM(SUM(col3)) OVER()
FROM t1
GROUP BY col1;
If you want to use SUM as a window function, then you need to sum something that is available after GROUP BY has been evaluated. SUM(col3) is available, while col3 is not.
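As a quick check, here is a self-contained sketch (the sample rows are inlined in a CTE; the t1 name is just for illustration) showing the nested aggregate/window pattern producing the desired output:

-- Sketch: inline the sample data and apply the nested aggregate/window pattern.
WITH t1 (col1, col2, col3) AS (
    SELECT 'a',  2, 10 UNION ALL
    SELECT 'b', -1, 10 UNION ALL
    SELECT 'a', 10, 10
)
SELECT
    col1,
    SUM(col2)              AS col2,   -- per-group sum: a -> 12, b -> -1
    SUM(SUM(col3)) OVER () AS col3    -- grand total of the group sums: 30
FROM t1
GROUP BY col1;
-- Expected result:
-- a | 12 | 30
-- b | -1 | 30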

Related

Sum the result of a query

I have made a query with the code below, which works fine. I then need to sum the diff column, which will give me the total amount. It also needs to ignore the first row in the sum to reflect the true result.
select col,
col - coalesce(lag(col) over (order by id), 0) as diff
from t;
+------+------+
| COL  | DIFF |
+------+------+
| 1200 |    0 |
| 1200 |    0 |
| 1202 |    2 |
| 1204 |    2 |
| 1204 |    0 |
| 1208 |    4 |
+------+------+
This is the query result; I need the sum of the diff column, which in this case would be 8.
(Added by the OP as a comment, and appended to the question:)
Below is the query. How can I improve its performance?
SELECT SUM(result)
FROM (
    SELECT col1,
           col1 - COALESCE(LAG(col1) OVER (ORDER BY col1), 0) AS result
    FROM t
    WHERE CAST(t_stamp AS TIME) BETWEEN '07:00' AND '15:00'
      AND DATEPART(DAY,   t_stamp) = '20'
      AND DATEPART(MONTH, t_stamp) = '05'
      AND DATEPART(YEAR,  t_stamp) = '2020'
) sub
You don't need this much complexity. You can simply use the SUM aggregate function with a RANGE frame:

DECLARE @table TABLE (col1 int);
INSERT INTO @table VALUES (1),(2),(3),(4),(5);

SELECT col1,
       SUM(col1) OVER (ORDER BY col1 RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS diff
FROM @table
ORDER BY col1;
+------+------+
| col1 | diff |
+------+------+
|    1 |   15 |
|    2 |   14 |
|    3 |   12 |
|    4 |    9 |
|    5 |    5 |
+------+------+
** UPDATE: Based on the OP's edit, updating the answer **
SELECT SUM(diff) AS diff_sum
FROM
(
    SELECT col1,
           ISNULL(col1 - LAG(col1) OVER (ORDER BY col1), 0) AS diff
    FROM t   -- the OP's table containing the COL values shown above
) AS T
Result set
+----------+
| diff_sum |
+----------+
| 8 |
+----------+

Understanding SQL Merge statement?

I have a source table whose data is identical to my target table. When I try to run a merge statement, it fails with the error
merge can't update a target row multiple times.
So my question is: since the tables are identical, why didn't SQL simply succeed with 0 rows affected instead of failing? Please help me understand this.
By the way, my syntax is correct, because the initial insert succeeded; the problem only appears when I re-run the statement.
Thank you.
The target table and the source table have the same data.
WHEN MATCHED AND ISNULL(T.VALUE,'') <> ISNULL(S.VALUE,'')
COL1 | COL2 | COL3  | VALUE | DATE
-----|------|-------|-------|-----------
  1  |  A   | TYPE  |   3   | 2019-01-02
  2  |  B   | KIND  |   4   | 2019-01-03
  1  |  A   | COLOR |   0   | 2019-01-02
  2  |  B   | KIND  |   0   | 2019-01-03
MERGE TargetTable T
USING
(
SELECT COL1,
COL2,
COL3,
VALUE,
DATE
FROM SourceTable S
) s
ON
(
S.COL1 = T.COL1
AND S.COL2 = T.COL2
AND S.COL3 = T.COL3
AND S.DATE = T.DATE
)
WHEN MATCHED AND
(
ISNULL(S.VALUE,'') <> ISNULL(T.VALUE,'')
)
THEN UPDATE
SET
T.VALUE = S.VALUE
WHEN NOT MATCHED
THEN INSERT VALUES
(
S.COL1
,S.COL2
,S.COL3
,S.VALUE
,S.DATE
);
For a better understanding of MERGE:
MERGE is a DML (data manipulation language) statement.
It is also called UPSERT (update-insert).
It tries to match a source (table / view / query) to a target (table / updatable view) based on your defined conditions, and then, based on the matching results, it inserts, updates, or deletes rows in the target table.
MERGE (Transact-SQL)
create table src (i int, j int);
create table trg (i int, j int);
insert into src values (1,1),(2,2),(3,3);
insert into trg values (2,20),(3,30),(4,40);
merge into trg
using src
on src.i = trg.i
when not matched by target then insert (i,j) values (src.i,src.j)
when not matched by source then update set trg.j = -1
when matched then update set trg.j = trg.j + src.j
;
select * from trg order by i
+---+----+
| i | j  |
+---+----+
| 1 |  1 |
| 2 | 22 |
| 3 | 33 |
| 4 | -1 |
+---+----+
Source : Stackoverflow SQL Merge
I couldn't reproduce the error, but I found something interesting.
SQL DEMO
As you mention, the first merge runs perfectly, but in my case the second merge says it updated 2 rows.
So I modified the second merge to detect which rows were updated.
WHEN MATCHED AND
(
ISNULL(S.VALUE,'') <> ISNULL(T.VALUE,'')
)
THEN UPDATE
SET T.VALUE = S.VALUE + 10
OUTPUT
+------+------+-------+-------+---------------------+
| COL1 | COL2 | COL3 | VALUE | DATE |
+------+------+-------+-------+---------------------+
| 1 | A | TYPE | 3 | 02/01/2019 00:00:00 |
| 2 | B | KIND | 10 | 03/01/2019 00:00:00 |
| 1 | A | COLOR | 0 | 02/01/2019 00:00:00 |
| 2 | B | KIND | 14 | 03/01/2019 00:00:00 |
+------+------+-------+-------+---------------------+
Because you have two rows with exactly the same match keys (COL1, COL2, COL3, DATE), the system is telling you it doesn't know which source row should update which target row.
But that doesn't explain why my demo works as expected.
So my suggestion is to add a primary key to your table to make sure the merge happens on the right rows.
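For illustration, here is a minimal sketch (the table names #tgt and #src are made up) of how duplicate join keys in the source trigger this error:

-- Hypothetical repro: two source rows match the same target row on the join key.
CREATE TABLE #tgt (k int, v int);
CREATE TABLE #src (k int, v int);
INSERT INTO #tgt VALUES (1, 10);
INSERT INTO #src VALUES (1, 20), (1, 30);   -- duplicate key 1 in the source

MERGE #tgt AS T
USING #src AS S
    ON S.k = T.k
WHEN MATCHED THEN
    UPDATE SET T.v = S.v;
-- Fails with an error along the lines of: "The MERGE statement attempted to
-- UPDATE or DELETE the same row more than once ... a target row matched more
-- than one source row."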

How to select all PK's (column 1) where the MAX(ISNULL(value, 0)) in column 3 grouped by a value in column 2?

I couldn't find an answer to my question, since the questions similar to this one don't use a nullable int for the max value or need just one column out of it.
My table is as follows:
| ContractId | ContractNumber | ContractVersion |
+------------+----------------+-----------------+
|          1 |             11 | NULL            |
|          2 |             11 | 1               |
|          3 |             11 | 2               |
|          4 |             11 | 3               | -- get this one
|          5 |             24 | NULL            |
|          6 |             24 | 1               | -- get this one
|          7 |             75 | NULL            | -- get this one
The first version is NULL and all following versions get a number starting with 1.
Now I only want to get the rows of the latest contracts (as shown in the comments next to the rows).
So for each ContractNumber I want to select the ContractId of the latest ContractVersion.
The MAX() function won't work since it's a nullable int.
So I was thinking of using ISNULL(ContractVersion, 0) in combination with the MAX() function, but I don't know how.
I tried the following code:
SELECT
ContractNumber,
MAX(ISNULL(ContractVersion, 0))
FROM
Contracts
GROUP BY
ContractNumber
...which returned all of the latest version numbers combined with the ContractNumber, but I need the ContractId. When I add ContractId in the SELECT and the GROUP BY, I'm getting all the versions again.
The result should be:
| ContractId |
+------------+
| 4 |
| 6 |
| 7 |
It's just a simple application of ROW_NUMBER() when you want to select rows based on a min/max:
declare @t table (ContractId int, ContractNumber int, ContractVersion int)
insert into @t(ContractId,ContractNumber,ContractVersion) values
(1,11,NULL ),
(2,11, 1 ),
(3,11, 2 ),
(4,11, 3 ),
(5,24,NULL ),
(6,24, 1 ),
(7,75,NULL )
;With Numbered as (
select *,ROW_NUMBER() OVER (
PARTITION BY ContractNumber
order by ContractVersion desc) rn
from @t
)
select
*
from
Numbered
where rn = 1
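If you only want the ContractId column, as in the expected result, the final select over the same Numbered CTE can be narrowed (this sketch reuses the @t table variable declared above):

;With Numbered as (
    select ContractId,
           ROW_NUMBER() OVER (PARTITION BY ContractNumber
                              order by ContractVersion desc) rn
    from @t
)
select ContractId
from Numbered
where rn = 1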
This will also work, using RANK() with the NULLs mapped to 0:

SELECT ContractId, ContractNumber
FROM (SELECT *, RANK() OVER (PARTITION BY ContractNumber
                             ORDER BY ISNULL(ContractVersion, 0) DESC) AS rnk
      FROM Contracts) ranked
WHERE rnk = 1;

Window function to count occurrences in last 10 minutes

I can use a traditional subquery approach to count the occurrences in the last ten minutes. For example, this:
drop table if exists [dbo].[readings]
go
create table [dbo].[readings](
[server] [int] NOT NULL,
[sampled] [datetime] NOT NULL
)
go
insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go
-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server, sampled,
       (select count(*)
        from readings r2
        where r2.server = r1.server
          and r2.sampled <= r1.sampled
          and r2.sampled > dateadd(minute, -10, r1.sampled)) as countinlast10minutes
from readings r1
order by server, sampled
go
How can I use a window function to obtain the same result? I've tried this:
select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled
But the result is just the running count. Is there a system variable that refers to the current row pointer, something like currentrow.sampled?
This isn't a very pleasing answer, but one possibility is to first create a helper table with all the minutes:
CREATE TABLE #DateTimes(datetime datetime primary key);
WITH E1(N) AS
(
SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1)) V(N)
) -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
,R(StartRange, EndRange)
AS (SELECT MIN(sampled),
MAX(sampled)
FROM readings)
,N(N)
AS (SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) AS N
FROM E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM N,
R;
And then, with that in place, you could use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW:
WITH T1 AS
( SELECT Server,
MIN(sampled) AS StartRange,
MAX(sampled) AS EndRange
FROM readings
GROUP BY Server )
SELECT Server,
sampled,
Cnt
FROM T1
CROSS APPLY
( SELECT r.sampled,
COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
FROM #DateTimes N
LEFT JOIN readings r
ON r.sampled = N.datetime
AND r.server = T1.server
WHERE N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE CA.sampled IS NOT NULL
ORDER BY sampled
The above assumes that there is at most one sample per minute and that all the times are exact minutes. If that isn't true, it would need another table expression that pre-aggregates the readings by datetimes rounded to the minute.
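A sketch of what that pre-aggregating expression might look like (the PerMinute CTE and its column names are made up); it could replace the direct reference to readings in the LEFT JOIN above, summing samples_in_minute instead of counting rows:

-- Sketch: collapse readings to one row per server and minute.
WITH PerMinute AS
(   SELECT server,
           DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0) AS sampled_minute,
           COUNT(*) AS samples_in_minute
    FROM readings
    GROUP BY server,
             DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0) )
SELECT server, sampled_minute, samples_in_minute
FROM PerMinute;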
As far as I know, there is not a simple exact replacement for your subquery using window functions.
Window functions operate on a set of rows and allow you to work with them based on partitions and order.
What you are trying to do isn't the type of partitioning that window functions can work with.
Generating the partitions we would need in order to use window functions in this instance would just result in overly complicated code.
I would suggest cross apply() as an alternative to your subquery.
I am not sure if you meant to restrict your results to within 9 minutes, but with sampled > dateadd(...) that is what is happening in your original subquery.
Here is what a window function could look like based on partitioning your samples into 10 minute windows, along with a cross apply() version.
select
r.server
, r.sampled
, CrossApply = x.CountRecent
, OriginalSubquery = (
select count(*)
from readings s
where s.server=r.server
and s.sampled <= r.sampled
/* doesn't include 10 minutes ago */
and s.sampled > dateadd(minute,-10,r.sampled)
)
, Slices = count(*) over(
/* partition by server, 10 minute slices, not the same thing*/
partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
order by sampled
)
from readings r
cross apply (
select CountRecent=count(*)
from readings i
where i.server=r.server
/* changed to >= */
and i.sampled >= dateadd(minute,-10,r.sampled)
and i.sampled <= r.sampled
) as x
order by server,sampled
results: http://rextester.com/BMMF46402
+--------+---------------------+------------+------------------+--------+
| server | sampled | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
| 1 | 01.01.2017 08:00:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:02:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:05:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:30:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:31:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:37:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:40:00 | 4 | 3 | 1 |
| 1 | 01.01.2017 08:41:00 | 4 | 3 | 2 |
| 1 | 01.01.2017 09:07:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 09:08:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 09:09:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 09:11:00 | 4 | 4 | 1 |
+--------+---------------------+------------+------------------+--------+
Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that could be used in window aggregates. I think this would lead to much simpler and more natural SQL:
select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)
We can already use expressions like this:
select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)
so I'm thinking it would be great if we could reference a column from the current row.

How to do recursive select in PostgreSQL with array as an argument

I am trying to implement an easy recursive function in PostgreSQL, but I cannot finish it...
I have a table MyTable with columns Col1 and Col2. The data looks like this:
Col1 | Col2
1 | 2
2 | 5
2 | 6
3 | 7
4 | 5
4 | 2
5 | 3
I would like to write a function which takes an array of Col1 values as a parameter, e.g. (1,2), and gives me back values from Col2 like this:
1 | 2
2 | 5
2 | 6
and after that, it does the same again with the results (2, 5, 6), so:
1 | 2
2 | 5
2 | 6
5 | 3
(2 is already in, and key '6' does not exist)
and again (3):
1 | 2
2 | 5
2 | 6
5 | 3
3 | 7
and for (7), nothing, because the value 7 does not exist in Col1.
It is an easy recursion, but I have no idea how to implement it. So far I have something like this:
with recursive aaa(params) as (
select Col1, Col2
from MyTable
where Col1 = params -- I need an array here
union all
select Col1, Col2
from aaa
)
select * from aaa;
But of course it does not work.
Thanks in advance.
The basic pattern for recursion is to have your base case as the first part of the union, and in the second part to join the recursion result to whatever you need to produce the next level of results. In your case it would look like this:
WITH RECURSIVE aaa(col1, col2) AS (
SELECT col1, col2 FROM mytable
WHERE col1 = ANY (ARRAY[1,2]) -- initial case based on an array
UNION -- regular union because we only want new values
SELECT child.col1, child.col2
FROM aaa, mytable AS child -- join the source table to the result
WHERE aaa.col2 = child.col1 -- the recursion condition
)
SELECT * FROM aaa;
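Since the question asks for a function taking an array, here is a sketch (the function name find_reachable is made up) that wraps the recursive query above:

-- Sketch: wrap the recursive query in an SQL function taking an int array.
CREATE OR REPLACE FUNCTION find_reachable(params int[])
RETURNS TABLE (col1 int, col2 int)
LANGUAGE sql STABLE AS $$
    WITH RECURSIVE aaa(col1, col2) AS (
        SELECT m.col1, m.col2
        FROM mytable AS m
        WHERE m.col1 = ANY (params)      -- base case from the array argument
        UNION                            -- plain UNION so only new rows are added
        SELECT child.col1, child.col2
        FROM aaa
        JOIN mytable AS child ON aaa.col2 = child.col1
    )
    SELECT * FROM aaa;
$$;

-- Usage:
-- SELECT * FROM find_reachable(ARRAY[1,2]);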
