Snowflake : IN operator - snowflake-cloud-data-platform

I want something like the following in my query:
select * from table a
where a.id in (select id, max(date) from table a group by id)
This raises an error, because IN behaves like = and the subquery returns two columns.
How can I do this?
Example:
id date
1 2022-31-01
1 2022-21-03
2 2022-01-01
2 2022-02-01
I need only one record per id, the one with the max date. The table has more columns than just id and date,
so I need something like this in Snowflake:
select * from table a
where id in (select id, max(date) from table a group by id)
-----------------------
All the solutions work if I select from the table directly,
but I have a CASE expression in a view, and it produces duplicate records.
Example:
create or replace view v_test
as
select * from
(
select id,lastdatetime,*,
case when start_date < timestamp and timestamp < end
and move_date = '9999-12-31' then 'Y'
else 'N' end as IND
from table a
) a
So if anyone selects from the view where IND = 'Y', more than one record comes back.
What I want is to select only the latest record per ID where IND = 'Y', i.e. the one with max(lastdatetime).
How do I incorporate this logic in the view?

I think you are trying to get the latest record for each id?
select *
from table a
qualify row_number() over (partition by id order by date desc) = 1

So let's look at your sub-select,
using this "data" CTE for the examples:
with data (id, _date) as (
select column1, to_date(column2, 'yyyy-dd-mm') from values
(1, '2022-31-01'),
(1, '2022-21-03'),
(2, '2022-01-01'),
(2, '2022-02-01')
)
select id, max(_date)
from data
group by 1;
it gives:
ID MAX(_DATE)
1 2022-03-21
2 2022-01-02
which makes it seem you want "the last date, per id",
which can classically (ANSI SQL) be written:
select d.*
from data as d
join (
select
id,
max(_date) as max_date
from data
group by 1
) as c
on d.id = c.id and d._date = c.max_date
;
ID _DATE
1 2022-03-21
2 2022-01-02
which gives you all the columns of those rows. BUT if you have many rows with the same last date, you will get all of them in the output.
Another method is to use ROW_NUMBER to pick one and only one row, which is the style of answer Mike has given:
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify row_number() over (partition by id order by _date desc) =1 ;
gives:
ID _DATE EXTRA
1 2022-03-21 extra_b_double_a
2 2022-01-02 extra_d
Now if you want "all rows of the last day", your method (the join) works, albeit the QUALIFY style is faster. You can use RANK or, as here, DENSE_RANK; both give 1 to every row tied on the latest date:
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify dense_rank() over (partition by id order by _date desc) =1 ;
ID _DATE EXTRA
1 2022-03-21 extra_b_double_a
1 2022-03-21 extra_b_double_b
2 2022-01-02 extra_d
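To see the tie behaviour of the two ranking functions side by side, here is a minimal sketch in Python. It assumes the standard-library sqlite3 module as a stand-in for Snowflake: SQLite has no QUALIFY, so the filter moves to an outer WHERE, and the dates are stored as ISO text in a column named d (both are my assumptions, not part of the original data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, d TEXT, extra TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "2022-01-31", "extra_a"),
        (1, "2022-03-21", "extra_b_double_a"),
        (1, "2022-03-21", "extra_b_double_b"),
        (2, "2022-01-01", "extra_c"),
        (2, "2022-01-02", "extra_d"),
    ],
)


def latest(ranking_function):
    # The window function is computed in a sub-query and filtered outside,
    # which is what QUALIFY does in a single step on Snowflake.
    sql = f"""
        SELECT id, d, extra
        FROM (
            SELECT id, d, extra,
                   {ranking_function}() OVER (PARTITION BY id ORDER BY d DESC) AS rnk
            FROM data
        )
        WHERE rnk = 1
        ORDER BY id, extra
    """
    return conn.execute(sql).fetchall()


one_per_id = latest("ROW_NUMBER")  # exactly one row per id, ties broken arbitrarily
all_ties = latest("DENSE_RANK")    # every row that shares the latest date
print(len(one_per_id), len(all_ties))
```

ROW_NUMBER keeps one of the two tied rows for id 1 (which one is not defined unless you add a tie-breaker to the ORDER BY), while DENSE_RANK keeps both.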
Now the last thing that it almost seems you are asking for is "how do I find the ID with the most recent date (here 1) and get all rows for that":
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify id = last_value(id) over (order by _date);
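A hedged sketch of the same idea, again assuming Python's sqlite3 as a stand-in for Snowflake. The window frame is spelled out here because in SQLite (and in standard SQL) an ORDER BY without an explicit frame stops at the current row, which would make LAST_VALUE return each row's own id; writing the frame avoids depending on the engine's default. The column name d and the ISO text dates are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, d TEXT, extra TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "2022-01-31", "extra_a"),
        (1, "2022-03-21", "extra_b_double_a"),
        (1, "2022-03-21", "extra_b_double_b"),
        (2, "2022-01-01", "extra_c"),
        (2, "2022-01-02", "extra_d"),
    ],
)

rows = conn.execute(
    """
    SELECT id, d, extra
    FROM (
        SELECT id, d, extra,
               -- id of the row with the greatest date across the whole table
               LAST_VALUE(id) OVER (
                   ORDER BY d
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS newest_id
        FROM data
    )
    WHERE id = newest_id
    ORDER BY d, extra
    """
).fetchall()
print(rows)
```

If two different ids shared the latest date, which one wins would be arbitrary, so a tie-breaker in the ORDER BY would be needed.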

Here is an example of how to use the in operator with a subquery:
select * from table1 t1 where t1.id in (select t2.id from table2 t2);

IN can match on both columns at once if you compare a tuple:
select *
from tab AS a
where (a.id, a.date) in (select id, max(date) from tab group by id);
For sample data:
CREATE TABLE tab (id, date)
AS
SELECT column1, to_date(column2, 'yyyy-dd-mm')
FROM VALUES
(1, '2022-31-01'),
(1, '2022-21-03'),
(2, '2022-01-01'),
(2, '2022-02-01');
Output:
ID DATE
1 2022-03-21
2 2022-01-02
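The tuple form of IN can be tried locally. This is a minimal sketch assuming Python's sqlite3 (which also accepts row values on the left of IN) rather than Snowflake; the extra column and the ISO text dates are assumptions added to show that the other columns come along.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, date TEXT, extra TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (1, "2022-01-31", "a"),
        (1, "2022-03-21", "b"),
        (2, "2022-01-01", "c"),
        (2, "2022-01-02", "d"),
    ],
)

rows = conn.execute(
    """
    SELECT a.id, a.date, a.extra
    FROM tab AS a
    -- both columns are compared as a pair against the (id, max date) pairs
    WHERE (a.id, a.date) IN (SELECT id, MAX(date) FROM tab GROUP BY id)
    ORDER BY a.id
    """
).fetchall()
print(rows)
```

Like the join version, this returns every row that shares the max date for an id, so ties are not collapsed.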

How to optimize the query so the manual work becomes dynamic and efficient

I have a table like the one below (SQL fiddle).
I am able to get the output shown further down via XML, but I am not sure how to get it properly for a larger number of users (approx. 0.2M users).
Later I want to get the top 3 Names by their counts for each id, so RANK or ORDER BY clauses will come into the SQL, and I am not sure how many iterations it will take when the data covers a large number of users.
Working code that I have tried:
-----------SQL Raw Table Creation------------------------
CREATE TABLE tb
(
Id INT,
Name VARCHAR(50) NOT NULL
);
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'aa');
INSERT INTO tb (Id, Name) VALUES (1, 'bb');
INSERT INTO tb (Id, Name) VALUES (1, 'cc');
INSERT INTO tb (Id, Name) VALUES (1, 'cc');
INSERT INTO tb (Id, Name) VALUES (1, 'dd');
INSERT INTO tb (Id, Name) VALUES (1, 'dd');
INSERT INTO tb (Id, Name) VALUES (1, 'dd');
INSERT INTO tb (Id, Name) VALUES (2, 'aa');
INSERT INTO tb (Id, Name) VALUES (2, 'bb');
INSERT INTO tb (Id, Name) VALUES (2, 'bb');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (2, 'ee');
INSERT INTO tb (Id, Name) VALUES (3, 'aa');
INSERT INTO tb (Id, Name) VALUES (3, 'bb');
INSERT INTO tb (Id, Name) VALUES (3, 'cc');
INSERT INTO tb (Id, Name) VALUES (3, 'dd');
INSERT INTO tb (Id, Name) VALUES (3, 'dd');
INSERT INTO tb (Id, Name) VALUES (3, 'dd');
-----------------Want to RANK or get only top 3 rows for each Id when group by Name--------------
select f.* into #t1
from(
select f.*
from(
select f.*
from (
select top 3 id,name,count(name) as total
from tb
where id = 1
group by id,name
order by id,total desc
)f
Union
select top 3 id,name,count(name) as total
from tb
where id = 2
group by id,name
order by id,total desc
)f
Union
select top 3 id,name,count(name) as total
from tb
where id = 3
group by id,name
order by id,total desc
) f
/* Output is moved in temp table #t1 which looks like
id name total
1 aa 5
1 cc 2
1 dd 3
2 aa 1
2 bb 2
2 ee 4
3 bb 1
3 cc 1
3 dd 3
*/
---------Final Joining for each Top3Names and RespectiveTotal -----
select a.id as ID, a.listStr as Top3Names , b.Total as RespectiveTotal
from
(
SELECT id,STUFF((SELECT ',' + name
FROM #t1 EE
WHERE EE.id=E.id
FOR XML PATH('')), 1, 1, '') AS listStr
FROM #t1 E
GROUP BY E.id
)a
left Join
(
SELECT id,STUFF((SELECT ',' + cast(total as Varchar)
FROM #t1 EE
WHERE EE.id=E.id
FOR XML PATH('')), 1, 1, '') AS Total
FROM #t1 E
GROUP BY E.id
)b
on a.id=b.id
Output:
ID Top3Names RespectiveTotal
1 aa,cc,dd 5,2,3
2 aa,bb,ee 1,2,4
3 bb,cc,dd 1,1,3
Here I am using UNION for each ID, which is not the correct way to do it. I want an optimized way. I am also using a temp table to store my results; is that a good approach? Please suggest a correct solution or alternatives so that I can test them on a larger data set.

On my SQL Server machine, for the given sample data, your query stats look like this:
Total Logical Reads: 13
Total CPU Time: 00:00:00.007
If you are using SQL Server 2017+ you can use the STRING_AGG function:
SELECT
id
, STRING_AGG(name,',') WITHIN GROUP (order by name asc) Top3Names
, STRING_AGG(countx,',') WITHIN GROUP (order by name asc) RespectiveTotal
FROM (
SELECT
id
, name
, count(*) countx
, ROW_NUMBER() over (partition by id order by count(*) desc) rownumber
FROM tb
GROUP BY name, id
) result1
WHERE
result1.rownumber < 4
GROUP BY id
Stats are like :
Total Logical Reads: 1
Total CPU Time: 00:00:00.000
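For SQL Server 2016 and earlier (cte here is the same ranked sub-query as result1 above):
;WITH cte AS (
SELECT
id
, name
, count(*) countx
, ROW_NUMBER() over (partition by id order by count(*) desc) rownumber
FROM tb
GROUP BY name, id
)
select id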
, STUFF((
SELECT ',' + t1.Name
FROM cte t1
WHERE t1.id = t2.id
and t1.rownumber < 4
ORDER BY t1.name
FOR XML PATH('')), 1, LEN(','), '') AS Top3Names
, STUFF((
SELECT ',' + cast(t1.countx as varchar(50))
FROM cte t1
WHERE t1.id = t2.id --and t1.name = t2.name
and t1.rownumber < 4
ORDER BY t1.name
FOR XML PATH('')), 1, LEN(','), '') AS RespectiveTotal
from cte t2
group by id
Stats are like :
Total Logical Reads: 7
Total CPU Time: 00:00:00.006
So regardless of SQL Server version, this will improve performance; you will get the best performance on SQL Server 2017 or above with the STRING_AGG query.
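The rank-then-aggregate idea can also be checked outside SQL Server. This is a minimal sketch assuming Python's sqlite3, with the final string concatenation done in Python instead of STRING_AGG. One assumption is added: name is used as a tie-breaker in the ROW_NUMBER ordering, because id 3 has three names with a count of 1 and the pick would otherwise be arbitrary.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (id INTEGER, name TEXT NOT NULL)")
# Same rows as the INSERT statements in the question, written compactly.
sample = {
    1: "aa aa aa aa aa bb cc cc dd dd dd",
    2: "aa bb bb ee ee ee ee",
    3: "aa bb cc dd dd dd",
}
conn.executemany(
    "INSERT INTO tb VALUES (?, ?)",
    [(i, n) for i, names in sample.items() for n in names.split()],
)

ranked = conn.execute(
    """
    SELECT id, name, total
    FROM (
        SELECT id, name, total,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY total DESC, name) AS rn
        FROM (SELECT id, name, COUNT(*) AS total FROM tb GROUP BY id, name)
    )
    WHERE rn <= 3
    ORDER BY id, name
    """
).fetchall()

# Collapse the (at most three) rows of each id into comma-separated strings.
result = []
for key, grp in groupby(ranked, key=lambda r: r[0]):
    grp = list(grp)
    result.append(
        (key, ",".join(r[1] for r in grp), ",".join(str(r[2]) for r in grp))
    )
print(result)
```

No UNION per id and no temp table are needed; the window function ranks all ids in a single pass.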

TSQL: Group by one column, count all rows and keep value on second column based on row_number

I have a query that returns an Id, a Name and the Row_Number() based on some rules.
The query looks like this:
SELECT
tm.id AS Id,
pn.Name AS Name,
ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM
#tempTable AS tm
LEFT JOIN
names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
The output of the above query looks like the table below with the dummy data
CREATE TABLE people
(
id int,
name varchar(55),
row int
);
INSERT INTO people
VALUES (1, 'John', 1), (1, 'John', 2), (2, 'Mary', 1),
(3, 'Jeff', 1), (4, 'Bill', 1), (4, 'Bill', 2),
(4, 'Bill', 3), (4, 'Billy', 4), (5, 'Bobby', 1),
(5, 'Bob', 2), (5, 'Bob' , 3), (5, 'Bob' , 4);
What I am trying to do is group by the id field and count all rows, but for the name, use the one with row = 1.
My attempt is below, but, obviously, I get separate rows for each name since I include x.name in the GROUP BY.
SELECT
x.id,
x.name,
COUNT(*) AS Value
FROM
(SELECT
tm.id AS Id,
pn.Name AS Name,
ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM
#tempTable AS tm
LEFT JOIN
names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
) x
GROUP BY
x.id, x.name
ORDER BY
COUNT(*) DESC
The desired results from the dummy data are:
id name count
------------------
1 John 2
2 Mary 1
3 Jeff 1
4 Bill 4
5 Bobby 4

You can use FIRST_VALUE() window function to get the name of the row with row number = 1 and with the keyword DISTINCT there is no need to GROUP BY:
SELECT DISTINCT tm.id AS Id
, FIRST_VALUE(pn.Name) OVER (PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Name
, COUNT(*) OVER (PARTITION BY tm.id) AS counter
FROM #tempTable AS tm
LEFT JOIN names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
If you can't use FIRST_VALUE() then you can do it with conditional aggregation:
SELECT id,
MAX(CASE WHEN Row = 1 THEN Name END) AS NAME,
COUNT(*) AS Counter
FROM (
SELECT tm.id AS Id
, pn.Name AS Name
, ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM #tempTable AS tm
LEFT JOIN names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
) t
GROUP BY id
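The conditional aggregation can be verified against the dummy data from the question. This is a minimal sketch assuming Python's sqlite3 instead of SQL Server; the row column is renamed rn here (my choice, to stay clear of keywords).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, rn INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "John", 1), (1, "John", 2), (2, "Mary", 1),
        (3, "Jeff", 1), (4, "Bill", 1), (4, "Bill", 2),
        (4, "Bill", 3), (4, "Billy", 4), (5, "Bobby", 1),
        (5, "Bob", 2), (5, "Bob", 3), (5, "Bob", 4),
    ],
)

rows = conn.execute(
    """
    SELECT id,
           -- the CASE is NULL for every row except rn = 1, so MAX picks that name
           MAX(CASE WHEN rn = 1 THEN name END) AS name,
           COUNT(*) AS counter
    FROM people
    GROUP BY id
    ORDER BY id
    """
).fetchall()
print(rows)
```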

This could be one solution to your problem: group on both id and the target name (case when p.row = 1 then p.name end) for the counting. Adding WITH ROLLUP to the grouping will "roll up" the count aggregations. Another aggregation on just id can then be used to merge the row values from the intermediate data set (visible in the fiddle).
with cte as
(
select p.id,
case when p.row = 1 then p.name end as name,
count(1) as cnt
from people p
group by p.id, case when p.row = 1 then p.name end with rollup
having grouping(p.id) = 0
)
select cte.id,
max(cte.name) as name,
max(cte.cnt) as [count]
from cte
group by cte.id;
Fiddle
This would be another solution: do a regular count query with grouping on id and fetch the required name afterwards with a cross apply.
with cte as
(
select p.id,
count(1) as cnt
from people p
group by p.id
)
select cte.id,
n.name,
cte.cnt as [count]
from cte
cross apply ( select p.name
from people p
where p.id = cte.id
and p.row = 1 ) n;
Fiddle

Only show value of Max rows with partition by?

The title might be a bit off, but I'm trying to blank out the values of a row without removing the actual row.
This is my table:
SELECT ID,CustomerID,Weight FROM Orders
What I am trying to accomplish is this:
Keep Weight only on the row with the MAX() value of ID per CustomerID, and show NULL in Weight on every other row.
Is it possible to do this in one line, with a PARTITION BY?
Something like:
SELECT MAX(ID) over (partition by CustomerID,Weight)....
I know this is wrong, but if it is possible without a join or CTE, in one line of the select statement, that would be great.

One possible approach is using ROW_NUMBER:
SELECT
ID,
CustomerID,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY ID DESC) = 1 THEN [Weight]
ELSE Null
END AS [Weight]
FROM #Orders
ORDER BY ID
Input:
CREATE TABLE #Orders (
ID int,
CustomerID int,
[Weight] int
)
INSERT INTO #Orders
(ID, CustomerID, [Weight])
VALUES
(1, 11, 100),
(2, 11, 17),
(3, 11, 35),
(4, 22, 26),
(5, 22, 78),
(6, 22, 10030)
Output:
ID CustomerID Weight
1 11 NULL
2 11 NULL
3 11 35
4 22 NULL
5 22 NULL
6 22 10030

Try this
;WITH CTE
AS
(
SELECT
MAX_ID = MAX(ID) OVER(PARTITION BY CustomerId),
ID,
CustomerId,
Weight
FROM Orders
)
SELECT
ID,
CustomerId,
Weight = CASE WHEN ID = MAX_ID THEN Weight ELSE NULL END
FROM CTE

You can try this.
SELECT ID,CustomerId,CASE WHEN ID= MAX(ID) OVER(PARTITION BY CustomerId) THEN Weight ELSE NULL END AS Weight FROM Orders
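For anyone who wants to check this one-liner against the sample input from the first answer, here is a minimal sketch assuming Python's sqlite3 instead of SQL Server. The lower-case table and column names are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, weight INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 11, 100), (2, 11, 17), (3, 11, 35), (4, 22, 26), (5, 22, 78), (6, 22, 10030)],
)

rows = conn.execute(
    """
    SELECT id, customer_id,
           -- weight survives only on the row holding the largest id per customer
           CASE WHEN id = MAX(id) OVER (PARTITION BY customer_id) THEN weight END AS weight
    FROM orders
    ORDER BY id
    """
).fetchall()
print(rows)
```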

Row based Condition in Left Join

Is there a way to write a row-based condition in a LEFT JOIN?
If no row exists that matches the column condition, it should take the next available row.
I have the structure below:
create table Report
(
id int,
name varchar(10)
)
create table ReportData
(
report_id int references report(id),
flag bit,
path varchar(50)
)
insert into Report values (1, 'a');
insert into Report values (2, 'b');
insert into Report values (3, 'c');
insert into ReportData values (1, 0, 'xx');
insert into ReportData values (2, 0, 'yy');
insert into ReportData values (2, 1, 'yy');
insert into ReportData values (3, 1, 'zz');
insert into ReportData values (3, 1, 'mm');
I need some output like
1 a 0 xx
2 b 0 yy
3 c 1 zz

You can use ROW_NUMBER for this:
;WITH ReportData_Rn AS (
SELECT report_id, flag, path,
ROW_NUMBER() OVER (PARTITION BY report_id ORDER BY path) AS rn
FROM ReportData
)
SELECT t1.id, t1.name, t2.flag, t2.path
FROM Report AS t1
JOIN ReportData_Rn AS t2 ON t1.id = t2.report_id AND t2.rn = 1
The above query treats the row with the alphabetically smallest path as the first record of each report_id partition. You may amend the ORDER BY clause of the ROW_NUMBER() window function as you wish.
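As one possible way to amend the ORDER BY so that it matches the expected output in the question (prefer flag = 0, otherwise take the first row inserted), here is a minimal sketch assuming Python's sqlite3. The seq column is hypothetical: the original ReportData table has no column that records insertion order, so one is added here only to make "first" well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER, name TEXT)")
# seq is a hypothetical auto-assigned column recording insertion order.
conn.execute(
    "CREATE TABLE report_data ("
    "seq INTEGER PRIMARY KEY, report_id INTEGER, flag INTEGER, path TEXT)"
)
conn.executemany("INSERT INTO report VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany(
    "INSERT INTO report_data (report_id, flag, path) VALUES (?, ?, ?)",
    [(1, 0, "xx"), (2, 0, "yy"), (2, 1, "yy"), (3, 1, "zz"), (3, 1, "mm")],
)

rows = conn.execute(
    """
    SELECT r.id, r.name, d.flag, d.path
    FROM report AS r
    LEFT JOIN (
        SELECT report_id, flag, path,
               -- flag 0 sorts first; seq breaks ties by insertion order
               ROW_NUMBER() OVER (PARTITION BY report_id ORDER BY flag, seq) AS rn
        FROM report_data
    ) AS d ON d.report_id = r.id AND d.rn = 1
    ORDER BY r.id
    """
).fetchall()
print(rows)
```

The LEFT JOIN keeps reports that have no data rows at all; they would come back with NULL flag and path.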

SELECT id,name,flag,path
FROM
(
SELECT Report.id,Report.name,ReportData.flag,ReportData.path,
row_number() over(partition by ReportData.report_id order by flag) as rownum
FROM Report
JOIN ReportData on Report.id = ReportData.report_id
) tmp
WHERE tmp.rownum=1

A simpler alternative to the left join, using Oracle's rowid and rownum pseudo-columns:
SELECT id, name, flag, path
FROM report, reportdata
WHERE reportdata.rowid = (SELECT rowid
FROM reportdata
WHERE id = report_id
AND rownum = 1);

You can achieve this without using row_number().
Have a look at this SQL Fiddle
select r.id, r.name, d.flag, d.path from report r
inner join reportdata d
on r.id = d.report_id group by d.report_id
PS: I didn't believe the result at first. I was just building the query and hadn't used d.report_id in the select clause, and it still worked. I will update this answer once I find out why this query worked :)

Use PARTITION BY:
create table #Report
(
id int,
name varchar(10)
)
create table #ReportData
(
report_id int ,
flag bit,
path varchar(50)
)
insert into #Report values (1, 'a');
insert into #Report values (2, 'b');
insert into #Report values (3, 'c');
insert into #ReportData values (1, 0, 'xx');
insert into #ReportData values (2, 0, 'yy');
insert into #ReportData values (2, 1, 'yy');
insert into #ReportData values (3, 1, 'zz');
insert into #ReportData values (3, 1, 'mm');
;WITH T AS
(
Select
R.id,
r.name,
RD.flag,
RD.path,
ROW_NUMBER () OVER(PARTITION BY R.id ORDER BY R.id) AS PartNo
FROM #Report R
LEFT JOIN #ReportData RD ON R.id=RD.report_id
)
SELECT
T.id,
T.name,
T.flag,
T.path
FROM T WHERE T.PartNo=1

SQL - return one row for every match

I am not entirely sure how to word what I am looking for, which makes searching difficult. What I am trying to do is return a single record for every distinct column match.
Table Structure:
ItemHolderId Name
------------ --------------------------------------------------
1 Holder A
2 Holder B
ItemId Data ItemHolderId
----------- -------------------------------------------------- ------------
1 Item A 1
2 Item B 1
3 Item C 1
4 Item D 1
5 Item E 2
6 Item F 2
7 Item G 2
I am looking to select a single item for each item holder id. So it would only select Item A and Item E for example. Order doesn't matter just one record for each matched column. I hope I am explaining this in a sensible manner.
Thanks for your time.

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your ItemHolderId - and have SQL Server number all your rows starting at 1 for each of those partitions, ordered by some other criteria (you need some criteria - which one you use is up to you).
So try something like this:
;WITH PartitionedComponents AS
(
SELECT
ih.ItemHolderID, ih.Name, d.Data,
ROW_NUMBER() OVER(PARTITION BY ih.ItemHolderID ORDER BY d.Data DESC) AS 'RowNum'
FROM
dbo.ItemHolder ih
INNER JOIN
dbo.ItemHolderData d ON ih.ItemHolderID = d.ItemHolderID
WHERE
ComponentId IN (.....)
AND ConsoleTimeStamp <= (threshold)
)
SELECT
ItemHolderID, Name, Data
FROM
PartitionedComponents
WHERE
RowNum = 1
Here, I am selecting only the first entry for each "partition" (i.e. for each ItemHolderId), ordered in a descending fashion by the "Data" column.
Does that approach what you're looking for??

You can use row_number() to partition the data by item holder, and then grab each record with a '1' as the [rank]. This way you can control the partition and sort order of the data to decide which record is given the value '1', and thus returned...
/* Setup tables for query */
create table #tbl1 (ItemHolderId int, Name varchar(32))
create table #tbl2 (ItemId int, Data varchar(32), ItemHolderId int)
insert into #tbl1 values (1, 'Holder A'), (2, 'Holder B')
insert into #tbl2 values (1, 'Item A', 1), (2, 'Item B', 1), (3, 'Item C', 1), (4, 'Item D', 1)
insert into #tbl2 values (5, 'Item E', 2), (6, 'Item F', 2), (7, 'Item G', 2)
/* Select data */
select t2.*, row_number() over (partition by t1.ItemHolderId order by t2.ItemHolderId) as [rank]
into #temp
from #tbl1 t1 inner join #tbl2 t2 on t1.ItemHolderId = t2.ItemHolderId
select ItemId, Data, ItemHolderId from #temp where [rank] = 1
drop table #temp

I think you can also use GROUP BY on the itemholderid column; I guess that will help.
select
rownum itemId,
tmp.name ,
tmp.itemholderid
from
(select
min(t.name) name,
t.itemholderid
from
table_name t
group by t.itemholderid
)tmp;
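Along the same GROUP BY lines, one row per holder can be picked by its smallest ItemId. This is a minimal sketch assuming Python's sqlite3 and the sample data from the question; the snake_case table and column names are my own, and choosing MIN(item_id) is just one arbitrary but deterministic pick, since the question says order doesn't matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_holder (item_holder_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE item (item_id INTEGER, data TEXT, item_holder_id INTEGER)")
conn.executemany(
    "INSERT INTO item_holder VALUES (?, ?)", [(1, "Holder A"), (2, "Holder B")]
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [
        (1, "Item A", 1), (2, "Item B", 1), (3, "Item C", 1), (4, "Item D", 1),
        (5, "Item E", 2), (6, "Item F", 2), (7, "Item G", 2),
    ],
)

rows = conn.execute(
    """
    SELECT h.name, i.item_id, i.data
    FROM item_holder AS h
    JOIN item AS i ON i.item_holder_id = h.item_holder_id
    -- keep only the item with the smallest id within each holder
    WHERE i.item_id IN (SELECT MIN(item_id) FROM item GROUP BY item_holder_id)
    ORDER BY h.item_holder_id
    """
).fetchall()
print(rows)
```

This relies on item_id being unique across holders, which holds for the sample data.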