Need help in using Analytical function - snowflake-cloud-data-platform

SELECT id, pid,
case WHEN listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)= '' THEN 'null'
ELSE listagg(distinct apn_nbr,',') within group(order by apn_nbr)
END as apn_nbr
FROM (SELECT max(f1.pid) as pid,f1.id,apn_nbr,date
FROM table_1 f1
JOIN table_2 d1
ON f1.process_id = d1.process_id
WHERE apn_nbr is not null
and id=1234576
// AND pid='5812900'
GROUP BY id,apn_nbr)
group by id,pid
When I run the above query, I get results like the following:
ID PID APN_NBR
220247111 64306012133 228887143,130050106,220247111,220247143
220247111 57558164496 105450046,105450314,136010476,136150077,184060007,186930609
For a single ID, I get different values in the PID and APN_NBR columns. I need only the last NOT NULL record to be displayed in the result.
When I try to use
QUALIFY rank() over (partition by ID, pid order by date desc) = 1
the listagg values are no longer comma-separated; I get only one value (i.e. the first one) in the APN_NBR column.
Can anyone guide me on this logic?
Thanks in advance :)
Sample Records:
ID PID APN_NBR
30247521 5533433057558 130050044,130050106,195050142,960109430,960228707,960542787,960542788
30247521 5533433059643 105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915
34690213 1594308114486 960513957,970020828
34690213 5943081144866
I want to display only one row for each order, i.e. the second row for each order.

Please see if this helps -
Data-set used -
select column1,column2,column3 from
values
(30247521,5533433057558,'130050044,130050106,195050142,960109430,960228707,960542787,960542788'),
(30247521,5533433059643,'105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915'),
(34690213,1594308114486,'960513957,970020828'),
(34690213,5943081144866,NULL);
+----------+---------------+-------------------------------------------------------------------------------------------+
| COLUMN1 | COLUMN2 | COLUMN3 |
|----------+---------------+-------------------------------------------------------------------------------------------|
| 30247521 | 5533433057558 | 130050044,130050106,195050142,960109430,960228707,960542787,960542788 |
| 30247521 | 5533433059643 | 105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915 |
| 34690213 | 1594308114486 | 960513957,970020828 |
| 34690213 | 5943081144866 | NULL |
+----------+---------------+-------------------------------------------------------------------------------------------+
Query to get result -
select column1,column2,nvl2(column3, column3,lag(column3) over (order by column1))
as column3 from
values
(30247521,5533433057558,'130050044,130050106,195050142,960109430,960228707,960542787,960542788'),
(30247521,5533433059643,'105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915'),
(34690213,1594308114486,'960513957,970020828'),
(34690213,5943081144866,NULL) ;
+----------+---------------+-------------------------------------------------------------------------------------------+
| COLUMN1 | COLUMN2 | COLUMN3 |
|----------+---------------+-------------------------------------------------------------------------------------------|
| 30247521 | 5533433057558 | 130050044,130050106,195050142,960109430,960228707,960542787,960542788 |
| 30247521 | 5533433059643 | 105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915 |
| 34690213 | 1594308114486 | 960513957,970020828 |
| 34690213 | 5943081144866 | 960513957,970020828 |
+----------+---------------+-------------------------------------------------------------------------------------------+

So if you have the data in your last table:
select ID, PID, APN_NBR
from values
(30247521, 5533433057558, '130050044,130050106,195050142,960109430,960228707,960542787,960542788'),
(30247521, 5533433059643, '105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915'),
(34690213, 1594308114486, '960513957,970020828'),
(34690213, 5943081144866, NULL)
t(ID, PID, APN_NBR);
ID PID APN_NBR
30247521 5533433057558 130050044,130050106,195050142,960109430,960228707,960542787,960542788
30247521 5533433059643 105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915
34690213 1594308114486 960513957,970020828
34690213 5943081144866 null
and you want all the rows with APN_NBR that are NULL removed, then eliminate them with a WHERE clause:
select ID, PID, APN_NBR
from values
(30247521, 5533433057558, '130050044,130050106,195050142,960109430,960228707,960542787,960542788'),
(30247521, 5533433059643, '105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915'),
(34690213, 1594308114486, '960513957,970020828'),
(34690213, 5943081144866, NULL)
t(ID, PID, APN_NBR)
WHERE APN_NBR IS NOT NULL;
gives:
ID PID APN_NBR
30247521 5533433057558 130050044,130050106,195050142,960109430,960228707,960542787,960542788
30247521 5533433059643 105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915
34690213 1594308114486 960513957,970020828
These results can now be pruned to a single row per ID, with preference given to the larger PID, via a QUALIFY clause, which runs after the WHERE clause. You should use ROW_NUMBER here instead of RANK (even though I said earlier that you can use RANK), because RANK can return two first-place rows when rows tie, which may or may not be what you want. On the other hand, ROW_NUMBER will silently choose one of the tied rows, and that choice might differ from execution to execution.
select ID, PID, APN_NBR
from values
(30247521, 5533433057558, '130050044,130050106,195050142,960109430,960228707,960542787,960542788'),
(30247521, 5533433059643, '105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915'),
(34690213, 1594308114486, '960513957,970020828'),
(34690213, 5943081144866, NULL)
t(ID, PID, APN_NBR)
WHERE APN_NBR IS NOT NULL
QUALIFY row_number() over (partition by ID order by PID desc) = 1;
ID PID APN_NBR
34690213 1594308114486 960513957,970020828
30247521 5533433059643 105450046,105450314,136010476,136150077,184060007,186930609,196051036,960113678,970010915
These might not be exactly the results you want, but this is how to use WHERE and QUALIFY/ROW_NUMBER to filter, order, and restrict the rows shown. If you experiment with small toy datasets like the one above and internalize how these functions work, you should be able to apply them to the data you do have and the transformations you want to apply.
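To see the RANK versus ROW_NUMBER difference concretely, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Snowflake. SQLite has no QUALIFY, so the same filter is written as an outer WHERE over a subquery. The table name t, the helper keep_first, the shortened APN_NBR strings, and the two tied rows for id 99 are assumptions added for illustration, and the sketch assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, pid INTEGER, apn_nbr TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (30247521, 5533433057558, "130050044,130050106"),
        (30247521, 5533433059643, "105450046,105450314"),
        (34690213, 1594308114486, "960513957,970020828"),
        (34690213, 5943081144866, None),
        # Hypothetical rows: two rows tie on the largest PID for id 99.
        (99, 7, "a"),
        (99, 7, "b"),
    ],
)


def keep_first(window_fn):
    """Keep the rows numbered 1 per id, using the given window function."""
    sql = f"""
        SELECT id, pid, apn_nbr
        FROM (
            SELECT id, pid, apn_nbr,
                   {window_fn}() OVER (PARTITION BY id ORDER BY pid DESC) AS rn
            FROM t
            WHERE apn_nbr IS NOT NULL  -- the WHERE runs before the window function
        )
        WHERE rn = 1                   -- stands in for Snowflake's QUALIFY ... = 1
        ORDER BY id, apn_nbr
    """
    return conn.execute(sql).fetchall()


by_row_number = keep_first("row_number")
by_rank = keep_first("rank")
print(by_row_number)
print(by_rank)
```

With the tied rows present, by_rank keeps both id 99 rows, while by_row_number keeps only one of them, and which one is not guaranteed.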
Given you have a grouping in the outer SELECT, you can use HAVING to apply a post-grouping filter, like so:
SELECT id, pid,
case WHEN listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)= '' THEN 'null'
ELSE listagg(distinct apn_nbr,',') within group(order by apn_nbr)
END as apn_nbr
FROM (SELECT max(f1.pid) as pid,f1.id,apn_nbr,date
FROM table_1 f1
JOIN table_2 d1
ON f1.process_id = d1.process_id
WHERE apn_nbr is not null
and id=1234576
// AND pid='5812900'
GROUP BY id,apn_nbr)
group by id,pid
HAVING APN_NBR IS NOT NULL
QUALIFY row_number() over (partition by ID order by PID desc) = 1;
Or you could just add another layer of SELECT to apply the pattern shown:
SELECT id, pid, apn_nbr
FROM (
    SELECT id, pid,
        CASE WHEN listagg(DISTINCT apn_nbr, ';') WITHIN GROUP (ORDER BY apn_nbr) = '' THEN 'null'
             ELSE listagg(DISTINCT apn_nbr, ',') WITHIN GROUP (ORDER BY apn_nbr)
        END AS apn_nbr
    FROM (
        SELECT max(f1.pid) AS pid, f1.id, apn_nbr, date
        FROM table_1 f1
        JOIN table_2 d1
            ON f1.process_id = d1.process_id
        WHERE apn_nbr IS NOT NULL
            AND id = 1234576
            // AND pid='5812900'
        GROUP BY id, apn_nbr
    )
    GROUP BY id, pid
)
WHERE apn_nbr IS NOT NULL
QUALIFY row_number() OVER (PARTITION BY id ORDER BY pid DESC) = 1;
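The "aggregate first, then keep one row per ID" shape of that query can be tried on a toy dataset. The following is a rough sketch only, run against SQLite through Python's sqlite3 module rather than Snowflake: group_concat stands in for LISTAGG, an outer WHERE on a row_number() column stands in for QUALIFY, and the table apn_rows and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apn_rows (id INTEGER, pid INTEGER, apn_nbr TEXT)")
conn.executemany(
    "INSERT INTO apn_rows VALUES (?, ?, ?)",
    [
        (1, 10, "A1"), (1, 10, "A2"), (1, 10, "A1"),  # duplicate A1 on purpose
        (1, 20, "B1"), (1, 20, "B2"),
        (2, 30, "C1"),
        (2, 40, None),  # the largest PID for id 2 has no APN value
    ],
)

sql = """
SELECT id, pid, apn_list
FROM (
    SELECT id, pid, apn_list,
           row_number() OVER (PARTITION BY id ORDER BY pid DESC) AS rn
    FROM (
        -- Step 1: one row per (id, pid) with the distinct values concatenated.
        SELECT id, pid, group_concat(DISTINCT apn_nbr) AS apn_list
        FROM apn_rows
        GROUP BY id, pid
    )
    -- Step 2: drop empty lists before numbering (covers NULL and '').
    WHERE apn_list IS NOT NULL AND apn_list <> ''
)
-- Step 3: keep the row with the largest remaining PID per id.
WHERE rn = 1
ORDER BY id
"""

# group_concat does not guarantee element order, so sort the pieces for display.
result = [(i, p, sorted(a.split(","))) for i, p, a in conn.execute(sql)]
print(result)
```

Because the empty list is filtered out before the rows are numbered, id 2 falls back to its next-largest PID instead of disappearing from the result.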

Related

Need the correct logic when using LISTAGG

SELECT id, pid,
listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)as APN_OL
FROM
(SELECT max(pid) as pid,
f1.id,
apn_nbr,
date
FROM table1 f1
JOIN table2 f2
ON f1.process_id = f2.process_id
WHERE apn_nbr is not null AND id=1227521
GROUP BY id,apn_nbr,date)
GROUP BY id,pid;
When I run the above query, I get records like the following:
ID PID APN_NBR
1227521 964306012133700 238885004,130050106,195050142,960109430
1227521 816449643060121 105450046,105450314,136010476,136150077
I want to display all the APN_NBR values in a single row (i.e. the values from Row 1 and Row 2 should be displayed in a single row). So I tried the below logic:
select id,LISTAGG(DISTINCT APN_OL,',') AS APN_NEW
from
(SELECT id, pid,
listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)as APN_OL
FROM
(SELECT max(pid) as pid,
f1.id,
apn_nbr,
date
FROM table1 f1
JOIN table2 f2
ON f1.process_id = f2.process_id
WHERE apn_nbr is not null AND id=1227521
GROUP BY id,apn_nbr,date)
GROUP BY id,pid)
GROUP BY id;
When I use the above query, I get the APN_NBR values in a single row.
However, I need to add pid to the SELECT statement in order to join with
another table. I need to join based on the ID and PID columns.

So you could move the second LISTAGG into the first (unless you want both semicolon-separated and comma-separated values), and then make an array of the PIDs and join to all the matching rows, like so:
select
a.id,
a.APN_OL as apn_new,
b.<stuff from pids stuff>
from
(
SELECT
id,
array_agg(distinct pid) as pids,
listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)as APN_OL
FROM (
SELECT
max(pid) as pid,
f1.id,
apn_nbr
FROM table1 f1
JOIN table2 f2
ON f1.process_id = f2.process_id
WHERE apn_nbr is not null AND id = 1227521
GROUP BY id, apn_nbr
)
GROUP BY id
) as A
JOIN table_with_pids_details as B
on ARRAY_CONTAINS(b.pid::variant, a.pids);
Or, if the PIDs all have the same values and you just need one of them, then ANY_VALUE() can be helpful:
select
a.id,
a.APN_OL as apn_new,
b.<stuff from pids stuff>
from
(
SELECT
id,
ANY_VALUE(distinct pid) as random_pid,
listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)as APN_OL
FROM (
SELECT
max(pid) as pid,
f1.id,
apn_nbr
FROM table1 f1
JOIN table2 f2
ON f1.process_id = f2.process_id
WHERE apn_nbr is not null AND id = 1227521
GROUP BY id, apn_nbr
)
GROUP BY id
) as A
JOIN table_with_pids_details as B
on b.pid = a.random_pid;

Need guidance in using LISTAGG

SELECT id,pid,apn_nbr
FROM(
SELECT id, pid,case WHEN listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)= '' THEN 'null'
ELSE listagg(distinct apn_nbr,',') within group(order by apn_nbr) END as apn_nbr
FROM
(SELECT max(f1.pid) as pid,
f1.id,
apn_nbr,
date
FROM table_1 f1
JOIN table_2 d1 ON
f1.process_id = d1.process_id
WHERE apn_nbr is not null and id=1234576
GROUP BY id,apn_nbr,date)
group by id,pid)
WHERE APN_NBR IS NOT NULL
QUALIFY row_number() over (partition by ID order by PID desc) = 1;
The result I'm getting when I run the above query is:
ID PID APN_NBR
228887143 91616341263 108051468,145010014,147010037,960049392,960057955,960098393,960098621,960169763,960183667,960247935,960290544,960290545,960326343,960545263,970002302
228887146 52655416407 108010224,184070159,960010235,960018534,960070069,960082736,960086586,960111804,960169763,960450519,960537135,960537137,970020211,970033955
228887148 50304710850 111011119,136010478,137750338,184700156,188320007,960032041,960072024,960264356,960300892,960457665,970003002,970004388
228887150 72523300271 182050695,960529661,960538276,970110690
228887187 272662636613 108010505,148050070
pid_ind and qid_ind columns are coming from table_2
I need to use the below conditions in the SELECT statement.
iff( pid_ind = 'TRUE', concat(apn_nbr, ':', 'high'), NULL)
AS pid_ind,
iff( qid_ind = 'TRUE', concat(apn_nbr, ':', 'low'), NULL)
AS qid_ind
When I add these conditions to the SELECT statement and add the column names to the GROUP BY,
I no longer get the values separated by commas.
Can anyone guide me on this logic?
The final result should look like this:
ID PID APN_NBR pid_ind qid_ind
228887143 91616341263 108051468,145010014,147010037,960049392 108051468,145010014,147010037,960049392:increased NULL
228887146 52655416407 108010224,184070159,960010235,960018534 108010224,184070159,960010235,960018534:increased NULL
228887148 50304710850 111011119,136010478,137750338,184700156 111011119,136010478,137750338,184700156:increased NULL
228887150 72523300271 182050695,960529661,960538276,970110690 NULL 182050695,960529661,960538276,970110690:decreased
228887187 272662636613 108010505,148050070 NULL 108010505,148050070:decreased

Well, if that is what you want, doing it after the listagg makes sense, but then the two flags will need to be propagated out from table_1/table_2 on the inside. I also assume (and hope) that they are the same for all the rows being aggregated.
SELECT
id,
pid,
apn_nbr,
iff(pid_ind = 'TRUE', apn_nbr||':high', NULL) AS pid_ind,
iff(qid_ind = 'TRUE', apn_nbr||':low', NULL) AS qid_ind
FROM (
SELECT
id,
pid,
pid_ind,
qid_ind,
case
WHEN listagg(DISTINCT apn_nbr, ';') within GROUP(ORDER BY apn_nbr)= '' THEN 'null'
ELSE listagg(distinct apn_nbr,',') within group(order by apn_nbr)
END as apn_nbr
FROM (
SELECT
max(f1.pid) as pid,
f1.id,
apn_nbr,
date,
pid_ind,
qid_ind
FROM table_1 f1
JOIN table_2 d1
ON f1.process_id = d1.process_id
WHERE apn_nbr is not null and id=1234576
GROUP BY id, apn_nbr, date, pid_ind, qid_ind
)
group by id,pid,pid_ind, qid_ind
)
WHERE APN_NBR IS NOT NULL
QUALIFY row_number() over (partition by ID order by PID desc) = 1;
But if pid_ind and qid_ind are not 100% the same for each id and date, this will fragment your data into more groupings, of which all but the last will disappear via that ROW_NUMBER. The ROW_NUMBER probably wants to ORDER BY date, given that date is the grouping column you are sorting by and then dropping. That in turn means you could filter the data ahead of time to keep only the rows for the latest date, all the way back in the innermost query.

Get newest record per group from subquery

I would like to get the latest record based on date for each email from my query.
This query produces multiple records for each email. Let's call this output, table C.
My question is: How to filter from the alias table C only the most recent record.
+-------------------+-----+------------+
| email | id | date |
+-------------------+-----+------------+
| hello#example.com | 123 | 2020-06-21 |
+-------------------+-----+------------+
| hello#example.com | 123 | 2020-06-15 |
+-------------------+-----+------------+
Desired result is:
+-------------------+-----+------------+
| email | id | date |
+-------------------+-----+------------+
| hello#example.com | 123 | 2020-06-21 |
+-------------------+-----+------------+
My starting query (that produces multiple email records) is the following:
SELECT DISTINCT
Email,
ID,
Date
FROM [TABLE_A] AS a
LEFT JOIN (
select *
from [TABLE_B]
where ID = '123'
) AS b
ON a.Email = b.Key
My attempt:
SELECT c.Email, c.ID, c.Date
FROM (
SELECT DISTINCT
Email,
ID,
Date
FROM [TABLE_A] AS a
LEFT JOIN (
select *
from [TABLE_B]
where ID = '123'
) AS b ON a.Email = b.Key
) AS c
INNER JOIN (
SELECT Email, max(Date) as MaxDate
FROM c
GROUP BY Email
) tm on c.Email = tm.Email and c.Date = tm.Date
Looks like SQL cannot 'see' table C as I am getting an error:
invalid object name

You can use WITH TIES in concert with row_number()
Example
Select Top 1 with ties *
From YourTable
Order By Row_Number() over (Partition By Id Order By [Date] Desc)
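As for the "invalid object name" error itself: a derived-table alias such as c is only visible to the query level that defines it, so a second subquery cannot select FROM c. One common way around this is to define c once as a common table expression. The sketch below illustrates the idea in SQLite through Python's sqlite3 module, not SQL Server; the table and column names (table_a, table_b, email_key, rec_date) are hypothetical stand-ins for the ones in the question, and the join compares against the max_date alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (email TEXT);
    CREATE TABLE table_b (email_key TEXT, id TEXT, rec_date TEXT);
    INSERT INTO table_a VALUES ('hello#example.com');
    INSERT INTO table_b VALUES ('hello#example.com', '123', '2020-06-21');
    INSERT INTO table_b VALUES ('hello#example.com', '123', '2020-06-15');
    """
)

sql = """
WITH c AS (
    -- The original multi-row query, named once so it can be referenced twice.
    SELECT DISTINCT a.email, b.id, b.rec_date
    FROM table_a AS a
    LEFT JOIN (SELECT * FROM table_b WHERE id = '123') AS b
           ON a.email = b.email_key
)
SELECT c.email, c.id, c.rec_date
FROM c
INNER JOIN (
    SELECT email, max(rec_date) AS max_date
    FROM c
    GROUP BY email
) AS tm
   ON c.email = tm.email AND c.rec_date = tm.max_date
"""

latest = conn.execute(sql).fetchall()
print(latest)
```

The WITH clause is also available in SQL Server, so the same structure should carry over with the original bracketed table names.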

SQL Server group by but select 'top' date

I have a table in SQL server like so (Note the ID field is not unique):
-----------------------------------
| ID | IsAdamBrown | DateComplete |
| 1 | TRUE | 2017-01-01 |
| 1 | TRUE | 2017-01-03 |
-----------------------------------
I'd like to select one row for all the unique IDs in the table and the most recent 'DateComplete' for that ID.
My desired output in this case would be:
-----------------------------------
| ID | IsAdamBrown | DateComplete |
| 1 | TRUE | 2017-01-03 |
-----------------------------------
I've tried:
SELECT DISTINCT DateComplete, ID, IsAdamBrown
FROM thisTable
WHERE IsAdamBrown IS NOT NULL
GROUP BY DateComplete, ID, IsAdamBrown
ORDER BY DateComplete DESC
Unfortunately I still get the two date rows back. In MySQL I would group by just the first two columns, and the ORDER BY would make sure the DateComplete was the most recent. SQL Server's requirement that the SELECT fields match the GROUP BY makes this impossible.
How can I get a single row back for each ID with the most recent DateComplete?

SELECT id,
isadambrown,
Max(datecomplete) AS DateComplete
FROM thistable
GROUP BY id,
isadambrown
ORDER BY Max(datecomplete) DESC

You can get this with GROUP BY and MAX() of DateComplete:
SELECT ID, IsAdamBrown, MAX(DateComplete) AS DateComplete
FROM thisTable
WHERE IsAdamBrown IS NOT NULL
GROUP BY ID, IsAdamBrown
ORDER BY MAX(DateComplete) DESC

You can use LIMIT (note that LIMIT is MySQL syntax; SQL Server uses TOP instead):
SELECT ID, IsAdamBrown, DateComplete
FROM thisTable
WHERE IsAdamBrown IS NOT NULL
GROUP BY ID, IsAdamBrown
ORDER BY DateComplete LIMIT 1

You can use this. I hope it will work for you.
SELECT ID, IsAdamBrown, DateComplete
FROM thisTable a
WHERE DateComplete IN
(
SELECT MAX(DateComplete) FROM thisTable b WHERE a.ID = b.ID GROUP BY b.ID
) ORDER BY DateComplete DESC

You can use ROW_NUMBER() to number the rows within each ID, and a subquery to keep only the first row, the one with the most recent DateComplete. This sorts the rows of each ID by DateComplete, newest first, and then returns the first row for each unique ID:
SELECT X.ID, X.IsAdamBrown, X.DateComplete
FROM ( SELECT ID, IsAdamBrown, DateComplete,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY DateComplete DESC) RN
FROM thisTable
WHERE IsAdamBrown IS NOT NULL ) X
WHERE X.RN=1
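The GROUP BY answers and the ROW_NUMBER answer only agree while IsAdamBrown is constant within an ID. The following sketch, run in SQLite through Python's sqlite3 module as a stand-in for SQL Server, shows where they diverge; the table name this_table, the snake_case column names, and the rows for ID 2 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE this_table (id INTEGER, is_adam_brown TEXT, date_complete TEXT)"
)
conn.executemany(
    "INSERT INTO this_table VALUES (?, ?, ?)",
    [
        (1, "TRUE", "2017-01-01"),
        (1, "TRUE", "2017-01-03"),
        # Hypothetical rows: the flag differs within the same ID.
        (2, "TRUE", "2017-02-01"),
        (2, "FALSE", "2017-02-05"),
    ],
)

# GROUP BY id, flag: one row per (id, flag) pair, so ID 2 appears twice.
grouped = conn.execute(
    """
    SELECT id, is_adam_brown, max(date_complete)
    FROM this_table
    WHERE is_adam_brown IS NOT NULL
    GROUP BY id, is_adam_brown
    ORDER BY id, is_adam_brown
    """
).fetchall()

# ROW_NUMBER per id: exactly one row per ID, taken from the newest record.
numbered = conn.execute(
    """
    SELECT id, is_adam_brown, date_complete
    FROM (
        SELECT id, is_adam_brown, date_complete,
               row_number() OVER (PARTITION BY id ORDER BY date_complete DESC) AS rn
        FROM this_table
        WHERE is_adam_brown IS NOT NULL
    )
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(grouped)
print(numbered)
```

If the requirement is strictly one row per ID, the ROW_NUMBER form is the safer of the two whenever the flag column can vary.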

TSQL Find If All Records In A Group Have The Same Value

I have a table that contains a field which is used for grouping and another field which holds data. I want a good way to find any GroupColumn value where every DataColumn value contains a specific value.
Example
+-------------+------------+
| GroupColumn | DataColumn |
+-------------+------------+
| GroupA | Data1 |
| GroupA | Data2 |
| GroupA | Data3 |
| GroupB | Data1 |<---These two values are the same
| GroupB | Data1 |<---for the same group
| GroupC | Data1 |
| GroupC | Data2 |
| GroupC | Data2 |
| GroupC | Data3 |
+-------------+------------+
Desired Output
Group B
In the example above the DataColumn changes for GroupA and GroupC, but for GroupB both values in the DataColumn are the same so I would want this result returned.
Current Solution
I have 2 current solutions based around the same theme, but I feel that this is something that SQL should be able to do in an easier fashion.
Group everything in the table, count the times GroupColumn appears and put this into a table. Do the same, but apply a condition. Join the 2 tables and see where the 2 counts do not match.
SELECT GROUPCOLUMN, COUNT(*) [TOTAL] INTO #ALL
FROM #TABLE
GROUP BY GROUPCOLUMN
SELECT GROUPCOLUMN, COUNT(*) [TOTAL] INTO #SOME
FROM #TABLE
WHERE DATACOLUMN = 'DATA1'
GROUP BY GROUPCOLUMN
SELECT * FROM #ALL A
INNER JOIN #SOME S ON A.GROUPCOLUMN = S.GROUPCOLUMN
WHERE S.TOTAL = A.TOTAL
Use a SUM and a CASE to check for the specific value and count everything and check in a sub-query.
SELECT * FROM
(SELECT GROUPCOLUMN, SUM(CASE WHEN DATACOLUMN = 'DATA1' THEN 1 ELSE 0 END) [VALUE], COUNT(*) [TOTAL] FROM #TABLE (NOLOCK)
GROUP BY GROUPCOLUMN) A
WHERE A.VALUE = A.TOTAL
Is there a better way to do this in SQL?
Thanks in advance.
Ninja

You are looking for the HAVING clause:
SELECT GROUPCOLUMN
FROM #TABLE (NOLOCK)
GROUP BY GROUPCOLUMN
HAVING Count(*) = Count(case when DATACOLUMN = 'DATA1' then 1 end)

It sounds like you are looking for each group that has a single distinct value in DATACOLUMN:
SELECT GROUPCOLUMN
FROM #TABLE
GROUP BY GROUPCOLUMN
HAVING COUNT(DISTINCT DATACOLUMN) = 1
Note that COUNT(DISTINCT ...) does not count NULL as a distinct value.

Comparing COUNT(*) with COUNT(DISTINCT DATACOLUMN) finds the opposite case: groups in which every DATACOLUMN value is different (GroupA in the example above). Like this:
SELECT GROUPCOLUMN
FROM #TABLE
GROUP BY GROUPCOLUMN
HAVING Count(*) = Count(DISTINCT DATACOLUMN)
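The three HAVING conditions above do not select the same groups, which is easy to check on the sample data from the question. This is a small sketch in SQLite through Python's sqlite3 module rather than T-SQL; the table name t, the snake_case column names, and the helper groups_having are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (group_col TEXT, data_col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("GroupA", "Data1"), ("GroupA", "Data2"), ("GroupA", "Data3"),
        ("GroupB", "Data1"), ("GroupB", "Data1"),
        ("GroupC", "Data1"), ("GroupC", "Data2"),
        ("GroupC", "Data2"), ("GroupC", "Data3"),
    ],
)


def groups_having(condition):
    """Return the groups that satisfy the given HAVING condition."""
    sql = (
        "SELECT group_col FROM t GROUP BY group_col "
        f"HAVING {condition} ORDER BY group_col"
    )
    return [row[0] for row in conn.execute(sql)]


# Every row in the group equals one specific value.
all_equal_data1 = groups_having(
    "count(*) = count(CASE WHEN data_col = 'Data1' THEN 1 END)"
)
# The group holds a single distinct value, whatever that value is.
single_distinct = groups_having("count(DISTINCT data_col) = 1")
# Every row in the group holds a different value.
all_different = groups_having("count(*) = count(DISTINCT data_col)")

print(all_equal_data1, single_distinct, all_different)
```

On this data the first two conditions return GroupB and the third returns GroupA, so the choice depends on whether you are testing for one specific value, for any single repeated value, or for no repeats at all.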
