I have a table with data:
Table: Customers
Columns: Sequence, ID, many other columns (not important)
Sample data:
Sequence ID
-----------
214906 2613
214906 2614
214906 2615
214907 2613
214907 2614
214907 2615
214908 2613
214908 2614
214908 2615
214000 2613
213004 4444
111111 5555
111111 5556
111112 5556
111112 5555
How can I get the desired result below?
214906 2613
214907 2614
214908 2615
214000 2613
213004 4444
111111 5555
111112 5556
I tried various things with ROW_NUMBER() OVER (PARTITION BY Sequence), but it did not help, because I need to take row 1 in the first group, row 2 in the second group, and so on. In other words, I need to somehow spread the Sequences across the IDs. I cannot partition by ID either, because an ID may appear more than once in the table.
I hope I understand you correctly. I use the count of IDs per sequence as a grouping factor (using COUNT(*) with an OVER clause without ORDER BY), and then apply the appropriate ranking and row numbering:
Input:
CREATE TABLE #Data (
Sequence int,
ID int
)
INSERT INTO #Data
(Sequence, ID)
VALUES
(214906, 2613),
(214906, 2614),
(214906, 2615),
(214907, 2613),
(214907, 2614),
(214907, 2615),
(214908, 2613),
(214908, 2614),
(214908, 2615),
(214000, 2613),
(213004, 4444),
(111111, 5555),
(111111, 5556),
(111112, 5556),
(111112, 5555)
T-SQL:
;WITH SequenceCTE AS (
SELECT
*,
COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
FROM #Data
), RankCTE AS (
SELECT
*,
DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence ORDER BY SequenceCnt, ID) AS RankNo,
ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID ORDER BY Sequence, ID) AS RowNo
FROM SequenceCTE
)
SELECT Sequence, ID
FROM RankCTE
WHERE RankNo = RowNo
Output:
----------------
Sequence ID
----------------
214000 2613
213004 4444
111111 5555
111112 5556
214906 2613
214907 2614
214908 2615
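To check the logic outside SQL Server, here is a minimal sketch that runs the same two CTEs against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (for window functions). The table name `Data` stands in for the `#Data` temp table, and an ORDER BY is added because the output above has no guaranteed row order.

```python
import sqlite3

# Sample rows from the question: (Sequence, ID)
ROWS = [
    (214906, 2613), (214906, 2614), (214906, 2615),
    (214907, 2613), (214907, 2614), (214907, 2615),
    (214908, 2613), (214908, 2614), (214908, 2615),
    (214000, 2613), (213004, 4444),
    (111111, 5555), (111111, 5556),
    (111112, 5556), (111112, 5555),
]

# Same CTEs as the T-SQL answer; "Data" stands in for the #Data temp table.
QUERY = """
WITH SequenceCTE AS (
    SELECT *, COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
    FROM Data
), RankCTE AS (
    SELECT *,
        -- position of the ID within its own sequence
        DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence
                           ORDER BY SequenceCnt, ID) AS RankNo,
        -- position of the sequence among same-size sequences containing that ID
        ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID
                           ORDER BY Sequence, ID) AS RowNo
    FROM SequenceCTE
)
SELECT Sequence, ID FROM RankCTE
WHERE RankNo = RowNo
ORDER BY Sequence
"""


def spread(rows):
    """Load rows into an in-memory table and return the 'diagonal' pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Data (Sequence INTEGER, ID INTEGER)")
        conn.executemany("INSERT INTO Data VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for sequence, id_ in spread(ROWS):
        print(sequence, id_)
```

The n-th sequence in a group of same-size sequences keeps its n-th ID. This is why the approach relies on the sequences in a group sharing the same set of IDs.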
Update (special case: sequences that contain only one ID, which should always be kept, even when several of them share the same ID):
;WITH SequenceCTE AS (
SELECT
*,
COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
FROM #Data
), RankCTE AS (
SELECT
*,
CASE
WHEN SequenceCnt = 1 THEN 1
ELSE DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence ORDER BY SequenceCnt, ID)
END AS RankNo,
CASE
WHEN SequenceCnt = 1 THEN 1
ELSE ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID ORDER BY Sequence, ID)
END AS RowNo
FROM SequenceCTE
)
SELECT Sequence, ID
FROM RankCTE
WHERE RankNo = RowNo
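The update matters when two single-ID sequences share an ID. Here is a minimal sketch, again using Python's sqlite3 module (SQLite 3.25+ assumed). Instead of the two CASE expressions, it uses the equivalent filter `WHERE SequenceCnt = 1 OR RankNo = RowNo`. The extra row (213005, 4444) is hypothetical test data, not from the question.

```python
import sqlite3

# Hypothetical data: two single-ID sequences that share ID 4444.
ROWS = [
    (214906, 2613), (214906, 2614),
    (214907, 2613), (214907, 2614),
    (213004, 4444), (213005, 4444),
]

BASE = """
WITH SequenceCTE AS (
    SELECT *, COUNT(*) OVER (PARTITION BY Sequence) AS SequenceCnt
    FROM Data
), RankCTE AS (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY SequenceCnt, Sequence
                           ORDER BY SequenceCnt, ID) AS RankNo,
        ROW_NUMBER() OVER (PARTITION BY SequenceCnt, ID
                           ORDER BY Sequence, ID) AS RowNo
    FROM SequenceCTE
)
SELECT Sequence, ID FROM RankCTE
"""

# First version of the answer: the second single-ID row for 4444 gets RowNo = 2.
ORIGINAL = BASE + "WHERE RankNo = RowNo ORDER BY Sequence"
# Equivalent to the CASE expressions in the update: single-ID sequences always pass.
UPDATED = BASE + "WHERE SequenceCnt = 1 OR RankNo = RowNo ORDER BY Sequence"


def run(query, rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Data (Sequence INTEGER, ID INTEGER)")
        conn.executemany("INSERT INTO Data VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(ORIGINAL, ROWS))
    print(run(UPDATED, ROWS))
```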
Currently I'm doing this:
select
ProductID = ROW_NUMBER() OVER (PARTITION BY PRODUCTID ORDER BY PRODUCTID),
TransactionDate,
TransactionAmount
from ProductsSales
order by ProductID
The results are like this:
ProductID
TransactionDate
TransactionAmount
1
2022-11-06
30
2
2022-11-12
30
3
2022-11-28
30
2
2022-11-03
10
3
2022-11-10
10
4
2022-11-15
10
3
2022-11-02
50
The duplicated IDs are numbered sequentially, but what I need is this:
ProductID |TransactionDate |TransactionAmount
1 |2022-11-06 |30
1.1 |2022-11-12 |30
1.2 |2022-11-28 |30
2 |2022-11-03 |10
2.1 |2022-11-10 |10
2.2 |2022-11-15 |10
3 |2022-11-02 |50
Is this possible?
Assuming your PRODUCTID field is already numeric, this should work:
WITH _ProductIdSorted AS
(
SELECT
CONCAT
(
PRODUCTID,
'.',
ROW_NUMBER() OVER (PARTITION BY PRODUCTID ORDER BY TransactionDate) - 1
) AS ProductId,
TransactionDate,
TransactionAmount
FROM ProductsSales
)
SELECT
REPLACE(ProductId, '.0', '') AS ProductId,
TransactionDate,
TransactionAmount
FROM _ProductIdSorted;
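By the way, the ORDER BY inside your ROW_NUMBER() orders by the partition column itself, so it is nondeterministic. The sort in my answer can be nondeterministic too: if a product has two transactions on the same TransactionDate, their suffixes may come out in either order. Based on your post, though, the order of the rows within a partition doesn't seem to matter to you.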
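Here is a minimal sketch of the same ROW_NUMBER()-based suffix, run through Python's sqlite3 module (SQLite 3.25+ assumed). It makes two assumptions. First, the underlying ProductID values (1, 2, 3) are inferred from the desired output. Second, because older SQLite versions lack CONCAT, the label is built with `||` and a CASE expression instead of CONCAT plus REPLACE. With the CASE, the first row of each product gets the plain ID explicitly rather than by string replacement.

```python
import sqlite3

# Hypothetical rows (ProductID, TransactionDate, TransactionAmount);
# the ProductID values are inferred from the desired output in the question.
ROWS = [
    (1, "2022-11-06", 30), (1, "2022-11-12", 30), (1, "2022-11-28", 30),
    (2, "2022-11-03", 10), (2, "2022-11-10", 10), (2, "2022-11-15", 10),
    (3, "2022-11-02", 50),
]

QUERY = """
SELECT
    CASE WHEN rn = 1 THEN CAST(ProductID AS TEXT)   -- first row: plain ID
         ELSE ProductID || '.' || (rn - 1) END       -- later rows: ID.1, ID.2, ...
        AS ProductLabel,
    TransactionDate,
    TransactionAmount
FROM (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS rn
    FROM ProductsSales
)
ORDER BY ProductID, TransactionDate
"""


def label_products(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ProductsSales "
            "(ProductID INTEGER, TransactionDate TEXT, TransactionAmount INTEGER)"
        )
        conn.executemany("INSERT INTO ProductsSales VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in label_products(ROWS):
        print(row)
```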
In Snowflake I have this original query which, for a given consumer_ID, produces a list of unique store IDs.
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
The original purpose was to provide a list that does not duplicate store_id for a given consumer_id. Now suppose I also need to ensure that this list does not duplicate business_id for a given consumer_id either. Is there an easy way to modify the query above?
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER
(PARTITION BY t.consumer_id
,t.store_id
,t.business_id
ORDER BY t.campaign_id) = 1
The partition by clause forms windows by the combination of all the expressions in the clause.
This will deduplicate by the combination of consumer_id, store_id, and business_id. If this is not what you need, please update with sample input and output to clarify.
So if I make up some data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,100,1001),
(1,10,101,1002),
(2,20,200,2000)
)
and use your existing SQL:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
I get
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |101 |1002
1 |10 |100 |1000
2 |20 |200 |2000
We get no repeated store for a consumer, but, as you note, you don't want the business repeated either.
If we partition by business_id instead of store_id, we get fewer rows:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
ORDER BY 1;
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
So if we want "no repeating business_id AND no repeating store_id", the QUALIFY Greg has proposed will not help, as it keeps the first row for each distinct combination of consumer, business, and store:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id, t.store_id ORDER BY t.campaign_id) = 1
which gives:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |10 |101 |1002
2 |20 |200 |2000
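To reproduce the three results above outside Snowflake, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ assumed). SQLite has no QUALIFY, so each filter is written as the equivalent subquery with `WHERE rn = 1`. The helper name `first_per` is hypothetical.

```python
import sqlite3

# The made-up rows from above: (consumer_id, business_id, store_id, campaign_id)
ROWS = [(1, 10, 100, 1000), (1, 10, 100, 1001), (1, 10, 101, 1002), (2, 20, 200, 2000)]


def first_per(partition_cols, rows):
    """Keep the lowest campaign_id per partition (QUALIFY ... = 1 as a subquery).

    partition_cols is interpolated into the SQL, so only pass trusted column names.
    """
    query = f"""
        SELECT consumer_id, business_id, store_id, campaign_id FROM (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY {partition_cols} ORDER BY campaign_id) AS rn
            FROM campaigns_mini
        ) WHERE rn = 1
        ORDER BY consumer_id, campaign_id
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE campaigns_mini (consumer_id INTEGER, business_id INTEGER, "
            "store_id INTEGER, campaign_id INTEGER)"
        )
        conn.executemany("INSERT INTO campaigns_mini VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(first_per("consumer_id, store_id", ROWS))               # store only
    print(first_per("consumer_id, business_id", ROWS))            # business only
    print(first_per("consumer_id, business_id, store_id", ROWS))  # all three
```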
So the next idea is to keep only the rows that are first in both sets:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
AND ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
which for this data works!
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
but then for this data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,101,1001),
(1,20,101,1002)
)
There is only one row with business 20 (on store 101). However, the first row for store 101 is campaign 1001, and that row is not the first row for business 10, so both rows for store 101 are discarded:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
So if we use two layers to do the prune, for this data:
select * from (
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
)
QUALIFY ROW_NUMBER() OVER (PARTITION BY consumer_id, store_id ORDER BY campaign_id) = 1
works:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |20 |101 |1002
But if you flip the order of the two QUALIFY layers, you are back to just one row.
So, as a general problem, it cannot be safely solved for all data cases with this pattern.
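Here is a minimal sketch that reproduces the three outcomes on the second made-up dataset with Python's sqlite3 module (SQLite 3.25+ assumed). As before, each QUALIFY is replaced by a subquery filter, and the `layered` and `run` helpers are hypothetical. It shows the single-pass AND filter losing the business-20 row, and shows that the two-layer prune depends on which column is pruned first.

```python
import sqlite3

# Second made-up dataset: (consumer_id, business_id, store_id, campaign_id)
ROWS = [(1, 10, 100, 1000), (1, 10, 101, 1001), (1, 20, 101, 1002)]

# Both row numbers computed in one pass; keep rows that are first in both sets.
BOTH_AT_ONCE = """
SELECT consumer_id, business_id, store_id, campaign_id FROM (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY consumer_id, store_id ORDER BY campaign_id) AS rn_store,
        ROW_NUMBER() OVER (PARTITION BY consumer_id, business_id ORDER BY campaign_id) AS rn_business
    FROM campaigns_mini
) WHERE rn_store = 1 AND rn_business = 1
ORDER BY campaign_id
"""


def layered(first_col, second_col):
    """Prune by first_col, then prune the surviving rows by second_col."""
    return f"""
    SELECT consumer_id, business_id, store_id, campaign_id FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY consumer_id, {second_col} ORDER BY campaign_id) AS rn2
        FROM (
            SELECT consumer_id, business_id, store_id, campaign_id FROM (
                SELECT *, ROW_NUMBER() OVER (
                    PARTITION BY consumer_id, {first_col} ORDER BY campaign_id) AS rn1
                FROM campaigns_mini
            ) WHERE rn1 = 1
        )
    ) WHERE rn2 = 1
    ORDER BY campaign_id
    """


def run(query, rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE campaigns_mini (consumer_id INTEGER, business_id INTEGER, "
            "store_id INTEGER, campaign_id INTEGER)"
        )
        conn.executemany("INSERT INTO campaigns_mini VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(BOTH_AT_ONCE, ROWS))                        # AND filter
    print(run(layered("business_id", "store_id"), ROWS))  # business first
    print(run(layered("store_id", "business_id"), ROWS))  # store first
```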
Clientcode Emailaddress Accountcode clientname phoneno
----------------------------------------------------------------
AAA ragu#bib.com 100 Berjeya 90909090
AAA ragu1#bib.com 100 Berjeya 90909090
AAABBB jkkjkj#bib.com 200 Berjeya sooo 3222
CCCC dfdf#bib.com 200 Berjeya klkl 123
dddd sdsdsd#bib.com 33300 Berjeya penn 33333
This is the data in my table. When rows have the same client code and account code but different email addresses, I need to keep only one of them. For example, ragu#bib.com and ragu1#bib.com have the same client code and account code, but the email addresses differ; I need to show only one of these email addresses, along with all the other records. Please suggest a suitable query for this.
You can use TOP (1) WITH TIES as below:
Select top (1) with ties * from yourtable
order by row_number() over(partition by ClientCode,AccountCode order by EmailAddress)
With a subquery, you can do it like this:
Select * from (
Select *, RowN = Row_Number() over(partition by ClientCode, AccountCode order by EmailAddress) from yourtable
) a where a.RowN = 1
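TOP (1) WITH TIES is SQL Server syntax, so this minimal sketch runs the subquery variant instead, through Python's sqlite3 module (SQLite 3.25+ assumed). It loads the sample rows from the question into the answer's placeholder table `yourtable`. The column types, including phone numbers stored as text, are assumptions.

```python
import sqlite3

# Sample rows from the question: (ClientCode, EmailAddress, AccountCode, ClientName, PhoneNo)
ROWS = [
    ("AAA", "ragu#bib.com", 100, "Berjeya", "90909090"),
    ("AAA", "ragu1#bib.com", 100, "Berjeya", "90909090"),
    ("AAABBB", "jkkjkj#bib.com", 200, "Berjeya sooo", "3222"),
    ("CCCC", "dfdf#bib.com", 200, "Berjeya klkl", "123"),
    ("dddd", "sdsdsd#bib.com", 33300, "Berjeya penn", "33333"),
]

# Number the rows within each (ClientCode, AccountCode) pair and keep the first.
QUERY = """
SELECT ClientCode, EmailAddress, AccountCode, ClientName, PhoneNo FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY ClientCode, AccountCode ORDER BY EmailAddress) AS RowN
    FROM yourtable
) WHERE RowN = 1
ORDER BY ClientCode
"""


def dedupe(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE yourtable (ClientCode TEXT, EmailAddress TEXT, "
            "AccountCode INTEGER, ClientName TEXT, PhoneNo TEXT)"
        )
        conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in dedupe(ROWS):
        print(row)
```

Ordering by EmailAddress decides which address survives. Any other column works as long as it gives a stable choice.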
I am using this code to find duplicates
Code:
select * from (
select donrId,
donrFirstName,
donrLastName,
donrBirthDate,
ROW_NUMBER() over (
partition by donrFirstName,
donrBirthDate order by donrLastName
) as SequenceNumber
from donors ) as dd
where dd.SequenceNumber > 1
order by donrId
Problem:
I can't filter the partitioned result set so that it returns both consecutive numbers, e.g. 1 and 2, for each group of duplicates.
Desired Result:
donrFirstName |donrLastName |donrBirthDate |SequenceNumber
---------------------------------------------------------------
king |kong |25/05/2017 |1
king |kong |25/05/2017 |2
Your query returns only the records with a sequence number > 1. To return all records of each duplicate group, starting with number 1, you can use the COUNT(*) window function, like this:
SELECT
donrId, donrFirstName, donrLastName, donrBirthDate, SequenceNumber
FROM
(SELECT
donrId, donrFirstName, donrLastName, donrBirthDate,
ROW_NUMBER() OVER (PARTITION BY donrFirstName, donrBirthDate ORDER BY donrLastName) AS SequenceNumber,
COUNT(*) OVER (PARTITION BY donrFirstName, donrBirthDate) AS cnt
FROM
donors) AS dd
WHERE
dd.cnt > 1
ORDER BY
donrId
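Here is a minimal sketch of the COUNT(*) OVER approach using Python's sqlite3 module (SQLite 3.25+ assumed). The donor rows are hypothetical, modeled on the desired result, and dates are stored as ISO text.

```python
import sqlite3

# Hypothetical donors: (donrId, donrFirstName, donrLastName, donrBirthDate)
ROWS = [
    (1, "king", "kong", "2017-05-25"),
    (2, "king", "kong", "2017-05-25"),
    (3, "queen", "bee", "1990-01-01"),
]

# cnt is the size of each (first name, birth date) group; keep groups larger than 1.
QUERY = """
SELECT donrId, donrFirstName, donrLastName, donrBirthDate, SequenceNumber
FROM (
    SELECT donrId, donrFirstName, donrLastName, donrBirthDate,
        ROW_NUMBER() OVER (PARTITION BY donrFirstName, donrBirthDate
                           ORDER BY donrLastName) AS SequenceNumber,
        COUNT(*) OVER (PARTITION BY donrFirstName, donrBirthDate) AS cnt
    FROM donors
) AS dd
WHERE dd.cnt > 1
ORDER BY donrId
"""


def find_duplicates(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE donors (donrId INTEGER, donrFirstName TEXT, "
            "donrLastName TEXT, donrBirthDate TEXT)"
        )
        conn.executemany("INSERT INTO donors VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in find_duplicates(ROWS):
        print(row)
```

Both "king kong" rows tie on donrLastName, so which of them gets SequenceNumber 1 is not guaranteed. Add donrId to the ORDER BY if you need a stable numbering.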
I have a SQL statement.
SELECT
ID, LOCATION, CODE,MAX(DATE),FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY
ID, LOCATION, CODE
I need the rows with the date nearest to 11-11-2012, but the query returns all the rows. What am I doing wrong? Thanks
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-08 00:00:00.000 0
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
14 CAR STREET,UDUPI 234 2012-08-14 00:00:00.000 0
279 MADHUGIRI 234 2012-08-08 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
I want to show only the rows with dates less than or equal to the given date. The required result is:
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
;WITH x AS
(
SELECT ID, Location, Code, Date, Flag,
rn = ROW_NUMBER() OVER
(PARTITION BY ID, Location, Code ORDER BY [Date] DESC)
FROM dbo.TABLE1 AS t1
WHERE [Date] <= '20121111'
AND ID IN (14, 279)
AND EXISTS (SELECT 1 FROM #TEMP_CODE WHERE CODE = t1.CODE)
)
SELECT ID, Location, Code, Date, Flag
FROM x WHERE rn = 1;
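Here is a minimal sketch of this "latest row on or before a cutoff, per group" query, run through Python's sqlite3 module (SQLite 3.25+ assumed). It uses the sample rows from the question. `TEMP_CODE` stands in for the `#TEMP_CODE` temp table, and dates are stored as ISO text so that string comparison orders them correctly. The `latest_per_group` helper is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, Location, Code, Date, Flag); dates as ISO text.
ROWS = [
    (14, "CAR STREET,UDUPI", 234, "2012-08-08", 0),
    (14, "CAR STREET,UDUPI", 234, "2012-08-10", 1),
    (14, "CAR STREET,UDUPI", 234, "2012-08-14", 0),
    (279, "MADHUGIRI", 234, "2012-08-08", 1),
    (279, "MADHUGIRI", 234, "2012-08-11", 0),
]

# Number rows newest-first within each group, then keep the newest (rn = 1).
QUERY = """
WITH x AS (
    SELECT ID, Location, Code, [Date], Flag,
        ROW_NUMBER() OVER (
            PARTITION BY ID, Location, Code ORDER BY [Date] DESC) AS rn
    FROM TABLE1 AS t1
    WHERE [Date] <= ?
      AND ID IN (14, 279)
      AND EXISTS (SELECT 1 FROM TEMP_CODE WHERE CODE = t1.CODE)
)
SELECT ID, Location, Code, [Date], Flag
FROM x WHERE rn = 1
ORDER BY ID
"""


def latest_per_group(rows, codes, cutoff):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE TABLE1 (ID INTEGER, Location TEXT, Code INTEGER, "
            "[Date] TEXT, Flag INTEGER)"
        )
        conn.execute("CREATE TABLE TEMP_CODE (CODE INTEGER)")
        conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?, ?, ?)", rows)
        conn.executemany("INSERT INTO TEMP_CODE VALUES (?)", [(c,) for c in codes])
        return conn.execute(QUERY, (cutoff,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_group(ROWS, [234], "2012-11-11"):
        print(row)
```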
This yields:
ID LOCATION CODE [Date] FLAG
--- ---------------- ---- ---------- ----
14 CAR STREET,UDUPI 234 2012-08-14 0
279 MADHUGIRI 234 2012-08-11 0
This disagrees with your required results, but I think those are wrong (for ID 14, the latest date on or before 2012-11-11 is 2012-08-14, not 2012-08-10), so you should check them.
Use a subquery to get the max date for each ID, and then join that to your table:
SELECT
ID, LOCATION, CODE, DATE, FLAG
FROM
TABLE1
JOIN (
SELECT ID AS SubID, MAX(DATE) AS SubDATE
FROM TABLE1
WHERE DATE <= '20121111'
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY ID
) AS SUB ON ID = SubID AND DATE = SubDATE
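Add ORDER BY DATE DESC LIMIT 0,2 (LIMIT is MySQL syntax; in SQL Server you can use SET ROWCOUNT 2, as below, or TOP (2)).
With ORDER BY DATE DESC, the dates closest to the cutoff in your WHERE clause come first, and the limit returns only the top 2 rows. Note that this gives the 2 most recent rows overall, not necessarily one row per ID.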
SET ROWCOUNT 2
SELECT
ID, LOCATION, CODE, DATE, FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
ORDER BY DATE DESC
SET ROWCOUNT 0 -- reset the row limit for later statements