Take the Supersets of Arrays - snowflake-cloud-data-platform

I have a column in a Snowflake table that returns arrays.
For example:
['A','U']
['A','P','U']
['A','P']
['P','U']
['M','S']
['S']
I need to remove the subsets and keep only the supersets, so the result is just:
['A','P','U']
['M','S']
Is there an easy way to do this?

Not the best solution, but if you can't find a better one, try something like this:
SELECT *
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS row_num
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY c) AS row_num
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY c ORDER BY b) AS row_num
FROM (
SELECT a, b, c,
ROW_NUMBER() OVER (PARTITION BY b ORDER BY b) AS row_num
from supertset
) where row_num = 1
)
)
)
QUALIFY row_num = 1
For reference:
https://docs.snowflake.com/en/sql-reference/constructs/qualify.html

For each array, we look for the array that contains it: a cross join pairs every array with every other one, we keep the candidates with the largest intersection, and among those we pick the largest array:
select distinct x
from (
select b.x
from data a
cross join data b
qualify 1 = row_number() over(
partition by a.id
order by array_size(array_intersection(a.x, b.x)) desc
, array_size(b.x) desc
)
)
This might not work with a different sample dataset, but the rules that the question includes are not enough to determine the right solution in other situations.
Data setup:
with data as (
select row_number() over(order by 1) id, parse_json(value) x
from table(split_to_table($$['A','U']
['A','P','U']
['A','P']
['P','U']
['M','S']
['S']$$, '\n'))
)
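For comparison, here is a minimal Python sketch of the same filtering done as a direct containment check rather than the largest-intersection heuristic used in the query. The function name `keep_supersets` is made up for illustration, and the sketch assumes element order inside an array does not matter.

```python
def keep_supersets(arrays):
    """Return the arrays that are not a strict subset of any other array."""
    # Treat each array as a set; dict.fromkeys drops exact duplicates but keeps order.
    unique = list(dict.fromkeys(frozenset(a) for a in arrays))
    # `s < other` is Python's strict-subset test on sets.
    return [sorted(s) for s in unique if not any(s < other for other in unique)]


# The sample arrays from the question.
sample = [["A", "U"], ["A", "P", "U"], ["A", "P"], ["P", "U"], ["M", "S"], ["S"]]
result = keep_supersets(sample)
print(result)
```

On the question's sample this keeps ['A', 'P', 'U'] and ['M', 'S']. Unlike the query, it also keeps arrays that merely overlap with a larger one without being contained in it, which may or may not be what is wanted.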

Related

How to avoid SELECT COUNT in a SELECT statement for partial percentages

I want to calculate the partial percentage for each SiteName here but I would
need to calculate the aggregate count of my rows.
The following query works but is there a way to do this without using the SELECT
within the SELECT statement or declaring a variable for this? I only have read access so
I can't declare variables.
SELECT
ServiceSiteName
, COUNT(*) AS [Alarms Resolved]
-- How can I avoid this.
, (SELECT COUNT(1) FROM [C1Datastore].[dbo].[Fct_AlertCRM_Incident]
WHERE Conditions A, B, C) AS [Total Count]
, COUNT(*) / (SELECT COUNT(1) FROM [C1Datastore].[dbo].[Fct_AlertCRM_Incident]
WHERE Conditions A, B, C) AS [% Count]
FROM TableX
WHERE Conditions A, B, C
GROUP BY
ServiceSiteName
ORDER BY [Alarms Resolved] DESC
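The key is to use SUM(COUNT(*)) OVER (), which is much shorter. The window function runs after the GROUP BY, and the empty OVER () makes it sum the per-site counts across all groups. (Partitioning by ServiceSiteName would only return each site's own count, so the percentage would always be 1.) Multiplying by 1.0 avoids integer division.
SELECT
ServiceSiteName
, COUNT(*) AS [Alarms Resolved]
-- Use this
, SUM(COUNT(*)) OVER () AS [TOTAL]
, COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS [% Count]
FROM TableX
WHERE Conditions A, B, C
GROUP BY
ServiceSiteName
ORDER BY [Alarms Resolved] DESC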
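The arithmetic behind the partial percentage can be sketched in a few lines of Python: count per site, take one total over all sites, and divide. The site names below are invented sample data, not values from the question.

```python
from collections import Counter

# Hypothetical filtered rows: one site name per resolved alarm.
alarms = ["North", "North", "South", "North", "East", "South"]

counts = Counter(alarms)       # plays the role of COUNT(*) ... GROUP BY ServiceSiteName
total = sum(counts.values())   # one grand total across every group

# (site, alarms resolved, total, share), largest count first.
report = [(site, n, total, n / total) for site, n in counts.most_common()]
for line in report:
    print(line)
```

Note that `n / total` is a true division in Python; in T-SQL the same expression on two integers truncates unless one side is converted to a decimal type.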

Compare N and N-1 record, using Row_NUMBER().. Performance issue

I am trying to compare the [N] and [N-1] records, partitioned by c_id and ordered by b_id DESC. The query executes fine but runs really slowly because of the ROW_NUMBER() function. The most expensive operations in the plan are the Index Seek, the Sort for the partition, and the Hash Match. Is there any way to rewrite the query to improve performance?
Select [N].*,[N-1].* from
(
(
Select p_id,b_id,c_id,cust_name,TNP,TRQI,First_One_Latest
from (
Select p_id, b_id, c_id, cust_name, TNP, TRQI,
ROW_NUMBER() OVER (Partition By c_id ORDER BY b_id DESC) First_One_Latest
from data.Counterparty_Credit_Risk
where ISNUMERIC(c_id)=1
and cust_src_system=N'SET1'
) As FIRST
where FIRST.First_One_Latest=1
) As [N]
INNER JOIN
(
Select p_id,b_id,c_id,cust_name,TNP,TRQI,First_One_Latest
from (
Select p_id,b_id,c_id,cust_name,TNP,TRQI
,ROW_NUMBER() OVER (Partition By c_id ORDER BY b_id DESC) First_One_Latest
from data.Counterparty_Credit_Risk
where ISNUMERIC(c_id)=1
and cust_src_system=N'SET1'
) As Second
where Second.First_One_Latest=2
) As [N-1]
ON N.c_id = [N-1].c_id
)

SQL Simple Join with two tables, but one is random

I am stuck with this. I have a simple setup with two tables: one holds email addresses and the other holds voucher codes. I want to join them into a third table so that each email address gets one random voucher code.
Unfortunately, there are no identical IDs to match the two on. What I have so far returns no result:
Select
A.Email,
B.CouponCode
FROM Emailaddresses as A
JOIN CouponCodes as B
on A.Email = B.CouponCode
A hint would be great as search did not bring me any further yet.
Edit - sample data:
Table A (Addresses)
-------------------------
Column A         | Column B
-------------------------
email1#gmail.com | True
email2#gmail.com |
email3#gmail.com | True
email4#gmail.com |
Table B (Voucher)
-------------------
ABCD1234
ABCD5678
ABCD9876
ABCD5432
Table C
-------------------------
Column A         | Column B
-------------------------
email1#gmail.com | ABCD1234
email2#gmail.com | ABCD5678
email3#gmail.com | ABCD9876
email4#gmail.com | ABCD5432
Joining without proper keys is not a good solution, but for your case you can try this (note: not tested, just a quick suggestion). Both CTEs must share a single WITH clause, separated by a comma:
;with cte_email as (
select row_number() over (order by Email) as rownum, Email
from Emailaddresses
),
cte_coupon as (
select row_number() over (order by CouponCode) as rownum, CouponCode
from CouponCodes
)
select a.Email,b.CouponCode
from cte_email a
join cte_coupon b
on a.rownum = b.rownum
You want to randomly join records, one email with one coupon each. So create random row numbers and join on these:
select
e.email,
c.couponcode
from (select t.*, row_number() over (order by newid()) as rn from emailaddresses t) e
join (select t.*, row_number() over (order by newid()) as rn from CouponCodes t) c
on c.rn = e.rn;
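The idea of numbering both sides in random order and joining on the number can be sketched in Python as "shuffle both lists, then pair them positionally". The helper name `pair_randomly` and the `seed` parameter are assumptions added for illustration; the sample values come from the question.

```python
import random


def pair_randomly(emails, coupons, seed=None):
    """Pair each email with one coupon by shuffling both lists and zipping them."""
    rng = random.Random(seed)
    # sample(..., k=len(...)) returns a shuffled copy, like ORDER BY NEWID().
    shuffled_emails = rng.sample(emails, k=len(emails))
    shuffled_coupons = rng.sample(coupons, k=len(coupons))
    # zip stops at the shorter list, like the inner join on the row number.
    return list(zip(shuffled_emails, shuffled_coupons))


emails = ["email1#gmail.com", "email2#gmail.com", "email3#gmail.com", "email4#gmail.com"]
coupons = ["ABCD1234", "ABCD5678", "ABCD9876", "ABCD5432"]
pairs = pair_randomly(emails, coupons, seed=42)
for email, coupon in pairs:
    print(email, coupon)
```

Every email appears once and no coupon is reused. If there are fewer coupons than emails, the extra emails are simply left without a pairing.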
Assign a row number to the rows of both tables and join on that row number.
Query
;with cte as(
select [rn] = row_number() over(
order by [Column_A]
), *
from [Table_A]
),
cte2 as(
select [rn] = row_number() over(
order by [Column_A]
), *
from [Table_B]
)
select t1.[Column_A] as [Email_Id], t2.[Column_A] as [Coupon]
from cte t1
join cte2 t2
on t1.rn = t2.rn;

SQL select the row with max value using row_number() or rank()

I have data of following kind:
RowId Name Value
1 s1 12
22 s1 3
13 s1 4
10 s2 14
22 s2 5
3 s2 100
I want to have the following output:
RowId Name Value
1 s1 12
3 s2 100
I am currently using temp tables to get this in two steps. I have been trying to use the row_number() and rank() functions but have not been successful.
Can someone please help me with syntax as I feel row_number() and rank() will make it cleaner?
Edit:
I changed the rowId to make it a general case
Edit:
I am open to ideas better than row_number() and rank() if there are any.
If you use rank(), you can get multiple results when a name has more than one row with the same max value. If that is what you want, switch row_number() to rank() in the following examples.
For the highest value per name (top 1 per group), using row_number()
select sub.RowId, sub.Name, sub.Value
from (
select *
, rn = row_number() over (
partition by Name
order by Value desc
)
from t
) as sub
where sub.rn = 1
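As a rough illustration of the difference between the two functions, the following Python sketch picks the top row per name from the question's sample data. With `with_ties=False` it behaves like the row_number() query (one row per name); with `with_ties=True` it behaves like rank() and keeps every row that shares the maximum. The function name and the flag are hypothetical, and which tied row row_number() keeps in SQL is arbitrary, whereas this sketch keeps the first one seen.

```python
def top_per_name(rows, with_ties=False):
    """Return the max-value row(s) per name from (row_id, name, value) tuples."""
    best = {}  # name -> list of rows currently holding the max value
    for row in rows:
        _, name, value = row
        current = best.get(name)
        if current is None or value > current[0][2]:
            best[name] = [row]       # new maximum for this name
        elif with_ties and value == current[0][2]:
            current.append(row)      # rank()-style: keep tied rows too
    return [row for name in sorted(best) for row in best[name]]


rows = [(1, "s1", 12), (22, "s1", 3), (13, "s1", 4),
        (10, "s2", 14), (22, "s2", 5), (3, "s2", 100)]
top = top_per_name(rows)
print(top)
```

On the sample data both modes return the rows (1, 's1', 12) and (3, 's2', 100), because no name has a tie for its maximum.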
I cannot say that there are any 'better' alternatives, but there are alternatives. Performance may vary.
cross apply version:
select distinct
x.RowId
, t.Name
, x.Value
from t
cross apply (
select top 1
*
from t as i
where i.Name = t.Name
order by i.Value desc
) as x;
top with ties using row_number() version:
select top 1 with ties
*
from t
order by
row_number() over (
partition by Name
order by Value desc
)
This inner join version has the same issue as using rank() instead of row_number() in that you can get multiple results for the same name if a name has more than one row with the same max value.
inner join version:
select t.*
from t
inner join (
select MaxValue = max(value), Name
from t
group by Name
) as m
on t.Name = m.Name
and t.Value = m.MaxValue;
If you really want to use ROW_NUMBER() you can do it this way:
With Cte As
(
Select *,
Row_Number() Over (Partition By Name Order By Value Desc) RN
From YourTable
)
Select RowId, Name, Value
From Cte
Where RN = 1;
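Unless I'm missing something, why use row_number() or rank() at all? If the RowId is not needed in the output, a plain aggregate is enough:
select name, max(value) as value
from t
group by name
Note that adding rowid to the group by does not help: each (rowid, name) pair in the sample forms its own group, so every row would come back.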

CREATE VIEW with SELECT results from CTE

I'm having a hard time creating a VIEW from the results of the SELECT statement on my CTE. The following is my syntax:
CREATE VIEW fooView AS
;WITH CTE AS (
SELECT a, b, c, ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY a) AS RN
FROM foo)
SELECT * FROM CTE WHERE RN > 1
I have also tried this which also doesn't work:
;WITH CTE AS (
SELECT a, b, c, ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY a) AS RN
FROM foo)
CREATE VIEW fooView AS
SELECT * FROM CTE WHERE RN > 1
Anybody want to help me out with this?
