SQL rank starting with 2 - snowflake-cloud-data-platform

select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
where email='abc#gmail.com'
The result of the above SQL statement looks like this:
TRANSACTION_DT |SUBSCRIPTION_ID |KEY_ID |EMAIL |PURCHASE_PRODUCT_ID |RNK
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |143510 |1
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |139724 |1
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |143510 |3
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |139724 |3
But when I change the SQL statement to this:
select * from (
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
) t
where email='abc#gmail.com'
My table looks like this:
TRANSACTION_DT |SUBSCRIPTION_ID |KEY_ID |EMAIL |PURCHASE_PRODUCT_ID |RNK
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |143510 |3
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |139724 |3
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |139724 |5
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |143510 |5
I want to get only the rows where rnk = 1, but in the second table the rank starts at 3, so filtering on rnk = 1 returns nothing. Also, can anyone tell me why the order of PURCHASE_PRODUCT_ID changes in row 3?

The second query ranks ALL rows and only then throws away the rows that do not match the email address, so rows for other email addresses in the same partition have already taken rank 1 (you have already "lost 1").
The filter needs to happen before the ranking function, not after it.
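A minimal way to see this outside Snowflake is Python's built-in sqlite3 module (SQLite has supported window functions such as RANK() since 3.25). The two 2022-01-01 rows for other#gmail.com are hypothetical: they stand in for newer rows of other email addresses in the same SUBSCRIPTION_ID/KEY_ID partition, and the timestamps are shortened to dates. Ranking before filtering reproduces the 3/5 ranks from the question, while filtering first gives 1/3:

```python
import sqlite3

# Hypothetical stand-in for DC.BW_BOOKINGS: the four abc#gmail.com rows from
# the question (dates only) plus two made-up newer rows for another email
# in the same SUBSCRIPTION_ID / KEY_ID partition.
ROWS = [
    ("2022-01-01", "S107283", 122693, "other#gmail.com", 143510),
    ("2022-01-01", "S107283", 122693, "other#gmail.com", 139724),
    ("2021-07-14", "S107283", 122693, "abc#gmail.com", 143510),
    ("2021-07-14", "S107283", 122693, "abc#gmail.com", 139724),
    ("2020-07-14", "S107283", 122693, "abc#gmail.com", 143510),
    ("2020-07-14", "S107283", 122693, "abc#gmail.com", 139724),
]

RANK = ("rank() over (partition by SUBSCRIPTION_ID, KEY_ID "
        "order by TRANSACTION_DT desc) as rnk")


def ranks(sql):
    """Run sql against an in-memory copy of ROWS; return the sorted rnk values."""
    con = sqlite3.connect(":memory:")
    con.execute("create table BW_BOOKINGS (TRANSACTION_DT, SUBSCRIPTION_ID, "
                "KEY_ID, EMAIL, PURCHASE_PRODUCT_ID)")
    con.executemany("insert into BW_BOOKINGS values (?, ?, ?, ?, ?)", ROWS)
    values = sorted(row[0] for row in con.execute(sql))
    con.close()
    return values


# Like the question's first query: WHERE runs first, then the ranking.
filter_then_rank = ranks(
    f"select {RANK} from BW_BOOKINGS where EMAIL = 'abc#gmail.com'")

# Like the question's second query: rank every row, then filter outside.
rank_then_filter = ranks(
    f"select rnk from (select EMAIL, {RANK} from BW_BOOKINGS) t "
    "where EMAIL = 'abc#gmail.com'")

print(filter_then_rank)  # [1, 1, 3, 3]
print(rank_then_filter)  # [3, 3, 5, 5]
```

The jump from 1 to 3 (rather than 2) is RANK()'s normal behavior after a tie; DENSE_RANK() would number the same rows 1, 1, 2, 2.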
So if you want only the RANK = 1 rows, you can use QUALIFY:
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
qualify rnk = 1
If you want only the RANK = 1 rows for email X, do the filtering in the WHERE clause, then QUALIFY to keep the wanted rows:
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
where email='abc#gmail.com'
qualify rnk = 1
Following Greg's note, "if you don't want the rank you don't need it", the RANK() call can go straight into the QUALIFY clause instead of being selected:
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID
from "DC"."BW_BOOKINGS"
where email='abc#gmail.com'
qualify rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) = 1
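SQLite has no QUALIFY. Because Snowflake applies WHERE before computing window functions and QUALIFY after them, the WHERE + QUALIFY query can be sketched as a subquery with the WHERE inside and the rank filter outside. The sketch below runs that through Python's sqlite3 module; the other#gmail.com row is hypothetical, and the UTC offsets are dropped from the timestamps:

```python
import sqlite3

ROWS = [
    ("2021-07-14 09:42:47.710", "S107283", 122693, "abc#gmail.com", 143510),
    ("2021-07-14 09:42:47.710", "S107283", 122693, "abc#gmail.com", 139724),
    ("2020-07-14 09:22:14.033", "S107283", 122693, "abc#gmail.com", 143510),
    ("2020-07-14 09:22:14.033", "S107283", 122693, "abc#gmail.com", 139724),
    # Hypothetical newer row for a different email in the same partition.
    ("2022-01-01 00:00:00.000", "S107283", 122693, "other#gmail.com", 143510),
]

# WHERE goes inside the subquery (before ranking); the rank filter that
# QUALIFY would do goes in the outer query (after ranking).
SQL = """
select TRANSACTION_DT, PURCHASE_PRODUCT_ID
from (
    select TRANSACTION_DT, PURCHASE_PRODUCT_ID,
           rank() over (partition by SUBSCRIPTION_ID, KEY_ID
                        order by TRANSACTION_DT desc) as rnk
    from BW_BOOKINGS
    where EMAIL = 'abc#gmail.com'
) t
where rnk = 1
order by PURCHASE_PRODUCT_ID
"""


def latest_for_email():
    con = sqlite3.connect(":memory:")
    con.execute("create table BW_BOOKINGS (TRANSACTION_DT, SUBSCRIPTION_ID, "
                "KEY_ID, EMAIL, PURCHASE_PRODUCT_ID)")
    con.executemany("insert into BW_BOOKINGS values (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(SQL).fetchall()
    con.close()
    return rows


for row in latest_for_email():
    print(row)
```

Both 2021-07-14 rows tie on TRANSACTION_DT, so RANK() gives both of them 1 and both are returned; ROW_NUMBER() would keep only one of them, chosen arbitrarily unless the ORDER BY breaks the tie.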

Related

Altering a QUALIFY with an additional criterion

In Snowflake I have this original query which, for a given consumer_ID, produces a list of unique store IDs.
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
The original purpose was to provide a list that does not duplicate store_id for a given consumer_id. Suppose now I also need to ensure this list does not duplicate business_id as well for a given consumer_ID. Is there an easy way to modify the above?

SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER
(PARTITION BY t.consumer_id
,t.store_id
,t.business_id
ORDER BY t.campaign_id) = 1
The PARTITION BY clause forms windows from the combination of all the expressions in the clause.
This will deduplicate by the combination of consumer_id, store_id, and business_id. If this is not what you need, please update the question with sample input and output to clarify.
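As a sketch of what such a partition does, the query below (run through Python's sqlite3 module, with a subquery standing in for QUALIFY, which SQLite lacks) deduplicates a few made-up rows by two and by three columns. The rows are hypothetical; with them, adding business_id to the partition changes nothing, because each store belongs to a single business, so business 10 still appears twice for consumer 1:

```python
import sqlite3

# Hypothetical rows: (consumer_id, business_id, store_id, campaign_id).
ROWS = [(1, 10, 100, 1000), (1, 10, 100, 1001),
        (1, 10, 101, 1002), (2, 20, 200, 2000)]


def dedupe(partition_cols):
    """Keep the lowest campaign_id in each window formed by partition_cols."""
    con = sqlite3.connect(":memory:")
    con.execute("create table campaigns_mini "
                "(consumer_id, business_id, store_id, campaign_id)")
    con.executemany("insert into campaigns_mini values (?, ?, ?, ?)", ROWS)
    sql = f"""
        select consumer_id, business_id, store_id, campaign_id
        from (
            select *, row_number() over (
                partition by {partition_cols} order by campaign_id) as rn
            from campaigns_mini
        ) t
        where rn = 1
        order by campaign_id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


print(dedupe("consumer_id, store_id"))
print(dedupe("consumer_id, store_id, business_id"))
```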

So if I make up some data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,100,1001),
(1,10,101,1002),
(2,20,200,2000)
)
and use your existing SQL:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
I get:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |101 |1002
1 |10 |100 |1000
2 |20 |200 |2000
We get no repeated store for each consumer, but, as you note, you don't want the business repeated either.
If we partition by business_id instead of store_id, we get fewer rows:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
ORDER BY 1;
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
So if we want "no repeated business_id AND no repeated store_id", the QUALIFY Greg proposed will not help, because it keeps the first row for each distinct combination of consumer, business, and store:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id, t.store_id ORDER BY t.campaign_id) = 1
which gives:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |10 |101 |1002
2 |20 |200 |2000
So the next idea is to keep only the rows that are first in both sets:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
AND ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
which works for this data!
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
But then for this data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,101,1001),
(1,20,101,1002)
)
there is only one row with business 20 (for store 101), but the first row for store 101 is on campaign 1001, so both of those rows are discarded:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
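Both data sets can be checked with Python's sqlite3 module. SQLite has no QUALIFY, so in this sketch the two ROW_NUMBER() values are computed in a subquery and the AND condition moves to the outer WHERE; the rows are the made-up data sets from above:

```python
import sqlite3

# The two made-up data sets: (consumer_id, business_id, store_id, campaign_id).
DATA_1 = [(1, 10, 100, 1000), (1, 10, 100, 1001),
          (1, 10, 101, 1002), (2, 20, 200, 2000)]
DATA_2 = [(1, 10, 100, 1000), (1, 10, 101, 1001), (1, 20, 101, 1002)]

# Emulates: QUALIFY rn_store = 1 AND rn_business = 1
SQL = """
select consumer_id, business_id, store_id, campaign_id
from (
    select *,
           row_number() over (partition by consumer_id, store_id
                              order by campaign_id) as rn_store,
           row_number() over (partition by consumer_id, business_id
                              order by campaign_id) as rn_business
    from campaigns_mini
) t
where rn_store = 1 and rn_business = 1
order by campaign_id
"""


def both_first(rows):
    """Keep rows that are first for their store AND first for their business."""
    con = sqlite3.connect(":memory:")
    con.execute("create table campaigns_mini "
                "(consumer_id, business_id, store_id, campaign_id)")
    con.executemany("insert into campaigns_mini values (?, ?, ?, ?)", rows)
    result = con.execute(SQL).fetchall()
    con.close()
    return result


print(both_first(DATA_1))  # [(1, 10, 100, 1000), (2, 20, 200, 2000)]
print(both_first(DATA_2))  # [(1, 10, 100, 1000)]
```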
So if we prune in two layers, for this data:
select * from (
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
)
QUALIFY ROW_NUMBER() OVER (PARTITION BY consumer_id, store_id ORDER BY campaign_id) = 1
works:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |20 |101 |1002
But if you flip the order of the two QUALIFY layers, you are back to just one row.
So, as a general problem, this pattern cannot safely solve it for all data.
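The order dependence can be reproduced with Python's sqlite3 module by nesting one ROW_NUMBER() subquery inside another in place of the two QUALIFY layers (SQLite has no QUALIFY). first_per is a hypothetical helper that builds one pruning layer; the data is the second made-up set above:

```python
import sqlite3

# Second made-up data set: (consumer_id, business_id, store_id, campaign_id).
DATA_2 = [(1, 10, 100, 1000), (1, 10, 101, 1001), (1, 20, 101, 1002)]


def first_per(source, key):
    """SQL keeping the lowest campaign_id per (consumer_id, key) of source."""
    return f"""
        select consumer_id, business_id, store_id, campaign_id
        from (
            select *, row_number() over (
                partition by consumer_id, {key} order by campaign_id) as rn
            from {source} as src
        ) as t
        where rn = 1
    """


def two_layer(first_key, second_key):
    """Prune by first_key in the inner layer, then by second_key outside."""
    con = sqlite3.connect(":memory:")
    con.execute("create table campaigns_mini "
                "(consumer_id, business_id, store_id, campaign_id)")
    con.executemany("insert into campaigns_mini values (?, ?, ?, ?)", DATA_2)
    inner = first_per("campaigns_mini", first_key)
    outer = first_per(f"({inner})", second_key)
    rows = con.execute(outer + " order by campaign_id").fetchall()
    con.close()
    return rows


print(two_layer("business_id", "store_id"))  # [(1, 10, 100, 1000), (1, 20, 101, 1002)]
print(two_layer("store_id", "business_id"))  # [(1, 10, 100, 1000)]
```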

How to partition to spread values?

I have a table with these columns: Customers, Sequence, ID, and many other columns (not important).
Sample data:
Sequence ID
-----------
214906 2613
214906 2614
214906 2615
214907 2613
214907 2614
214907 2615
214908 2613
214908 2614
214908 2615
214000 2613
213004 4444
111111 5555
111111 5556
111112 5556
111112 5555
How can I get the desired result below?
214906 2613
214907 2614
214908 2615
214000 2613
213004 4444
111111 5555
111112 5556
I tried various things with ROW_NUMBER() OVER (PARTITION BY Sequence), but it did not help, because I need to take row 1 in the first group, row 2 in the second group, and so on. In other words, I need to somehow spread the Sequences across the IDs. I cannot partition by ID either, because an ID can appear more than once in the table.

I hope I understand you correctly. I use the count of IDs per sequence as a grouping factor (using COUNT(*) with an OVER clause without ORDER BY), followed by the appropriate ranking and row numbering:
Input:
CREATE TABLE #Data (
Sequence int,
ID int
)
INSERT INTO #Data
(Sequence, ID)
VALUES
(214906, 2613),
(214906, 2614),
(214906, 2615),
(214907, 2613),
(214907, 2614),
(214907, 2615),
(214908, 2613),
(214908, 2614),
(214908, 2615),
(214000, 2613),
(213004, 4444),
(111111, 5555),
(111111, 5556),
(111112, 5556),
(111112, 5555)
T-SQL:
;WITH SequenceCTE AS (
SELECT
*,
COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
FROM #Data
), RankCTE AS (
SELECT
*,
DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence ORDER BY SequenceCnt, ID) AS RankNo,
ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID ORDER BY Sequence, ID) AS RowNo
FROM SequenceCTE
)
SELECT Sequence, ID
FROM RankCTE
WHERE RankNo = RowNo
Output:
----------------
Sequence ID
----------------
214000 2613
213004 4444
111111 5555
111112 5556
214906 2613
214907 2614
214908 2615
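To try the query without SQL Server, the same CTEs can run in SQLite through Python's sqlite3 module. In this sketch only the T-SQL specifics (the leading semicolon and the # temp-table prefix) are dropped, and an ORDER BY is added so the output order is stable:

```python
import sqlite3

# (Sequence, ID) rows from the question.
DATA = [
    (214906, 2613), (214906, 2614), (214906, 2615),
    (214907, 2613), (214907, 2614), (214907, 2615),
    (214908, 2613), (214908, 2614), (214908, 2615),
    (214000, 2613), (213004, 4444),
    (111111, 5555), (111111, 5556), (111112, 5556), (111112, 5555),
]

SQL = """
with SequenceCTE as (
    -- Number of IDs per sequence, used to group sequences of equal size.
    select *, count(*) over (partition by Sequence) as SequenceCnt
    from Data
), RankCTE as (
    select *,
           -- Position of the ID within its sequence.
           dense_rank() over (partition by SequenceCnt, Sequence
                              order by SequenceCnt, ID) as RankNo,
           -- Position of the sequence among same-size sequences with this ID.
           row_number() over (partition by SequenceCnt, ID
                              order by Sequence, ID) as RowNo
    from SequenceCTE
)
select Sequence, ID from RankCTE
where RankNo = RowNo
order by Sequence
"""


def spread(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table Data (Sequence int, ID int)")
    con.executemany("insert into Data values (?, ?)", rows)
    result = con.execute(SQL).fetchall()
    con.close()
    return result


for row in spread(DATA):
    print(row)
```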
Update (special case with one ID in a sequence):
;WITH SequenceCTE AS (
SELECT
*,
COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
FROM #Data
), RankCTE AS (
SELECT
*,
CASE
WHEN SequenceCnt = 1 THEN 1
ELSE DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence ORDER BY SequenceCnt, ID)
END AS RankNo,
CASE
WHEN SequenceCnt = 1 THEN 1
ELSE ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID ORDER BY Sequence, ID)
END AS RowNo
FROM SequenceCTE
)
SELECT Sequence, ID
FROM RankCTE
WHERE RankNo = RowNo
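The difference the update makes shows up when two single-ID sequences share an ID. In the sketch below (Python's sqlite3 module), the extra row (213999, 2613) is hypothetical: the original query gives it RowNo 1 and (214000, 2613) RowNo 2 within the SequenceCnt = 1 group, so 214000 is dropped, while the updated query keeps both:

```python
import sqlite3

DATA = [
    (214906, 2613), (214906, 2614), (214906, 2615),
    (214907, 2613), (214907, 2614), (214907, 2615),
    (214908, 2613), (214908, 2614), (214908, 2615),
    (214000, 2613), (213004, 4444),
    (111111, 5555), (111111, 5556), (111112, 5556), (111112, 5555),
    (213999, 2613),  # hypothetical second single-ID sequence using ID 2613
]


def spread_sql(special_case):
    """Build the answer's query; special_case=True applies the update."""
    rank_no = ("dense_rank() over (partition by SequenceCnt, Sequence "
               "order by SequenceCnt, ID)")
    row_no = ("row_number() over (partition by SequenceCnt, ID "
              "order by Sequence, ID)")
    if special_case:
        # Single-ID sequences always get RankNo = RowNo = 1.
        rank_no = f"case when SequenceCnt = 1 then 1 else {rank_no} end"
        row_no = f"case when SequenceCnt = 1 then 1 else {row_no} end"
    return f"""
        with SequenceCTE as (
            select *, count(*) over (partition by Sequence) as SequenceCnt
            from Data
        ), RankCTE as (
            select *, {rank_no} as RankNo, {row_no} as RowNo
            from SequenceCTE
        )
        select Sequence, ID from RankCTE
        where RankNo = RowNo
        order by Sequence
    """


def spread(special_case):
    con = sqlite3.connect(":memory:")
    con.execute("create table Data (Sequence int, ID int)")
    con.executemany("insert into Data values (?, ?)", DATA)
    rows = con.execute(spread_sql(special_case)).fetchall()
    con.close()
    return rows


before = spread(False)
after = spread(True)
print((214000, 2613) in before)  # False
print((214000, 2613) in after)   # True
```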

SQL Server, How to group rows that are near in time

I have a table that has a time value and a user ID, and I want to group rows that are near in time (less than 2 minutes between consecutive rows), grouped by user ID.
Here is an example:
CreatedAt | User ID
'16:01:01' | '01'
'16:02:20' | '01'
'16:03:20' | '01'
'16:04:20' | '01'
'16:05:20' | '02'
'16:06:20' | '02'
'16:07:20' | '02'
'16:08:20' | '02'
'16:14:02' | '02'
'16:15:01' | '02'
'16:20:02' | '03'
The result should be:
User ID = 01
'16:01:01'
'16:02:20'
'16:03:20'
'16:04:20'
User ID = 02
'16:05:20'
'16:06:20'
'16:07:20'
'16:08:20'
'16:14:02'
'16:15:01'
User ID = 03
'16:20:02'
I'm not even sure whether this is doable in SQL or whether I have to code it (I have a few million rows in my database, so coding it is not the most efficient way).
Thanks for your help.

This assigns a "group number" to each set. I'm not sure what this really achieves, but it might help you achieve what you want in your presentation layer:
WITH VTE AS(
SELECT CONVERT(time(0), V.CreatedAt) AS CreatedAt, UserID
FROM (VALUES ('16:01:01','01'),
('16:02:20','01'),
('16:03:20','01'),
('16:04:20','01'),
('16:05:20','02'),
('16:06:20','02'),
('16:07:20','02'),
('16:08:20','02'),
('16:14:02','02'),
('16:15:01','02'),
('16:20:02','03')) V(CreatedAt, UserID)),
TimeDiff AS(
SELECT *,
CASE WHEN DATEDIFF(SECOND,LAG(CreatedAt,1,CreatedAt) OVER (PARTITION BY UserID ORDER BY CreatedAt ASC),CreatedAt) <= 120 THEN 1 ELSE 0 END AS Succession
FROM VTE)
SELECT TD.CreatedAt,
TD.UserID,
COUNT(CASE WHEN TD.Succession = 0 THEN 1 END) OVER (PARTITION BY UserID ORDER BY TD.CreatedAt
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GroupNumber
FROM TimeDiff TD;
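The same gaps-and-islands logic can be sketched in SQLite through Python's sqlite3 module. SQLite has no DATEDIFF, so this sketch converts the times to seconds with strftime('%s', ...), and it uses COALESCE(LAG(...), current value) to mirror LAG(CreatedAt, 1, CreatedAt). As in the T-SQL, a user's first row is compared with itself, so group numbers start at 0, and a new number begins after a gap of more than 120 seconds (user 02's 16:14:02 row):

```python
import sqlite3

# (CreatedAt, UserID) sample from the question.
EVENTS = [
    ("16:01:01", "01"), ("16:02:20", "01"), ("16:03:20", "01"),
    ("16:04:20", "01"), ("16:05:20", "02"), ("16:06:20", "02"),
    ("16:07:20", "02"), ("16:08:20", "02"), ("16:14:02", "02"),
    ("16:15:01", "02"), ("16:20:02", "03"),
]

SQL = """
with Secs as (
    -- SQLite has no DATEDIFF; convert each time to seconds instead.
    select CreatedAt, UserID,
           cast(strftime('%s', CreatedAt) as integer) as s
    from Events
), TimeDiff as (
    -- Succession = 1 when the previous row of the same user is <= 120 s earlier.
    select *,
           case when s - coalesce(lag(s) over (partition by UserID
                                               order by s), s) <= 120
                then 1 else 0 end as Succession
    from Secs
)
select CreatedAt, UserID,
       -- Running count of group starts = group number.
       count(case when Succession = 0 then 1 end) over (
           partition by UserID order by s
           rows between unbounded preceding and current row) as GroupNumber
from TimeDiff
order by UserID, s
"""


def group_numbers():
    con = sqlite3.connect(":memory:")
    con.execute("create table Events (CreatedAt text, UserID text)")
    con.executemany("insert into Events values (?, ?)", EVENTS)
    rows = con.execute(SQL).fetchall()
    con.close()
    return rows


for created_at, user_id, group in group_numbers():
    print(user_id, created_at, group)
```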

SQL Server : join with top record selection

Clientcode Emailaddress Accountcode clientname phoneno
----------------------------------------------------------------
AAA ragu#bib.com 100 Berjeya 90909090
AAA ragu1#bib.com 100 Berjeya 90909090
AAABBB jkkjkj#bib.com 200 Berjeya sooo 3222
CCCC dfdf#bib.com 200 Berjeya klkl 123
dddd sdsdsd#bib.com 33300 Berjeya penn 33333
This is the data in my table. I need to remove all but one of the email addresses that share the same client code and account code. For example, ragu#bib.com and ragu1#bib.com have the same client code and account code but different email addresses; I need to show only one of them, along with all the other records. Please suggest a suitable query for this.

You can use TOP (1) WITH TIES as below:
Select top (1) with ties * from yourtable
order by row_number() over(partition by ClientCode,AccountCode order by EmailAddress)
With a subquery, you can do it like this:
Select * from (
Select *, RowN = Row_Number() over(partition by ClientCode, AccountCode order by EmailAddress) from yourtable
) a where a.RowN = 1
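SQLite has no TOP (1) WITH TIES, so the sketch below runs only the subquery form through Python's sqlite3 module, on the rows from the question (yourtable is the answer's placeholder name, and the column split of the client names is assumed from the layout):

```python
import sqlite3

# Rows from the question: (ClientCode, EmailAddress, AccountCode, ClientName, PhoneNo).
ROWS = [
    ("AAA", "ragu#bib.com", 100, "Berjeya", "90909090"),
    ("AAA", "ragu1#bib.com", 100, "Berjeya", "90909090"),
    ("AAABBB", "jkkjkj#bib.com", 200, "Berjeya sooo", "3222"),
    ("CCCC", "dfdf#bib.com", 200, "Berjeya klkl", "123"),
    ("dddd", "sdsdsd#bib.com", 33300, "Berjeya penn", "33333"),
]

# Number the rows within each (ClientCode, AccountCode) and keep the first.
SQL = """
select ClientCode, EmailAddress, AccountCode, ClientName, PhoneNo
from (
    select *, row_number() over (partition by ClientCode, AccountCode
                                 order by EmailAddress) as RowN
    from yourtable
) a
where a.RowN = 1
order by ClientCode
"""


def one_email_per_client():
    con = sqlite3.connect(":memory:")
    con.execute("create table yourtable "
                "(ClientCode, EmailAddress, AccountCode, ClientName, PhoneNo)")
    con.executemany("insert into yourtable values (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(SQL).fetchall()
    con.close()
    return rows


for row in one_email_per_client():
    print(row)
```

ORDER BY EmailAddress decides which address survives; in SQLite's default binary collation ragu#bib.com sorts before ragu1#bib.com because '#' comes before '1'.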

T-SQL Query to remove duplicate records in the output based on one particular column

I am running SQL Server 2014 and I have the following T-SQL query:
USE MYDATABASE
SELECT *
FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
RESERVATIONLIST mentioned in the code above is a view. The query gives me the following output (extract):
ID NAME DOA DOD Nights Spent MTH
--------------------------------------------------------------------
251 AH 2015-01-12 2015-01-15 3 JANUARY 2015
258 JV 2015-01-28 2015-02-03 4 JANUARY 2015
258 JV 2015-01-28 2015-02-03 2 FEBRUARY 2015
The above output consists of around 12,000 records.
I need to modify my query so that it eliminates all duplicate IDs and gives me the following results:
ID NAME DOA DOD Nights Spent MTH
--------------------------------------------------------------------
251 AH 2015-01-12 2015-01-15 3 JANUARY 2015
258 JV 2015-01-28 2015-02-03 4 JANUARY 2015
I tried something like this, but it's not working:
USE MYDATABASE
SELECT *
FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015', 'FEBRUARY 2015')
GROUP BY [ID]
HAVING COUNT ([MTH]) > 1

The following query returns one row per ID:
SELECT * FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) rn FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1
Note: this returns an arbitrary row from the rows having the same ID. If you want a specific row, you have to define it in the ORDER BY. For example:
SELECT * FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DOA DESC) rn FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1
This returns a row having max(DOA). If several rows share the same DOA (as the two rows for ID 258 do), add a tie-breaking column to the ORDER BY.
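In the sample, both rows for ID 258 have DOA 2015-01-28, so ORDER BY DOA DESC alone leaves the choice between them open. The sketch below (Python's sqlite3 module, with a table standing in for the RESERVATIONLIST view) adds [Nights Spent] DESC as a hypothetical tie-breaker, which happens to keep the JANUARY 2015 row the question asks for:

```python
import sqlite3

# Sample rows from the question: (ID, NAME, DOA, DOD, Nights Spent, MTH).
ROWS = [
    (251, "AH", "2015-01-12", "2015-01-15", 3, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 4, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 2, "FEBRUARY 2015"),
]

SQL = """
select ID, NAME, DOA, DOD, [Nights Spent], MTH
from (
    select *, row_number() over (
        partition by ID
        order by DOA desc, [Nights Spent] desc) as rn  -- hypothetical tie-breaker
    from RESERVATIONLIST
    where MTH in ('JANUARY 2015', 'FEBRUARY 2015')
) t
where rn = 1
order by ID
"""


def latest_per_id():
    con = sqlite3.connect(":memory:")
    con.execute("create table RESERVATIONLIST "
                "(ID, NAME, DOA, DOD, [Nights Spent], MTH)")
    con.executemany("insert into RESERVATIONLIST values (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(SQL).fetchall()
    con.close()
    return rows


for row in latest_per_id():
    print(row)
```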

You are trying to use GROUP BY, which IMHO is the right way to go. Group by all the columns that are constant for a reservation, and roll up (aggregate) the others. Depending on the values of DOA and DOD, I can see two solutions:
SELECT ID,NAME,DOA,DOD,SUM([Nights Spent]) as Nights,
min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME,DOA,DOD
OR
SELECT ID,NAME,min(DOA) as firstDOA,max(DOD) as lastDOD,SUM([Nights Spent]) as Nights,
min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME
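A quick check of the first GROUP BY version in SQLite via Python's sqlite3 module, with a table standing in for the RESERVATIONLIST view and loaded with the three sample rows. One thing it shows: MTH is text, so MIN/MAX compare it alphabetically, and 'FEBRUARY 2015' comes out as firstRes for ID 258:

```python
import sqlite3

# Sample rows from the question: (ID, NAME, DOA, DOD, Nights Spent, MTH).
ROWS = [
    (251, "AH", "2015-01-12", "2015-01-15", 3, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 4, "JANUARY 2015"),
    (258, "JV", "2015-01-28", "2015-02-03", 2, "FEBRUARY 2015"),
]

# Constant columns in GROUP BY; nights summed, months rolled up with MIN/MAX.
SQL = """
select ID, NAME, DOA, DOD, sum([Nights Spent]) as Nights,
       min(MTH) as firstRes, max(MTH) as lastRes
from RESERVATIONLIST
group by ID, NAME, DOA, DOD
order by ID
"""


def rollup():
    con = sqlite3.connect(":memory:")
    con.execute("create table RESERVATIONLIST "
                "(ID, NAME, DOA, DOD, [Nights Spent], MTH)")
    con.executemany("insert into RESERVATIONLIST values (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(SQL).fetchall()
    con.close()
    return rows


for row in rollup():
    print(row)
```

Storing the month as a real date (or ordering by a date column) would avoid the alphabetical comparison.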