SQL Server : join with top record selection


Clientcode  Emailaddress    Accountcode  clientname    phoneno
----------------------------------------------------------------
AAA         ragu#bib.com    100          Berjeya       90909090
AAA         ragu1#bib.com   100          Berjeya       90909090
AAABBB      jkkjkj#bib.com  200          Berjeya sooo  3222
CCCC        dfdf#bib.com    200          Berjeya klkl  123
dddd        sdsdsd#bib.com  33300        Berjeya penn  33333
This is the data in my table. Where several rows share the same client code and account code, I need to keep only one of the email addresses. For example, ragu#bib.com and ragu1#bib.com have the same client code and account code but different email addresses; I need to show only one of them, along with all the other records. Please suggest a suitable query.

You can use TOP (1) WITH TIES as below:
SELECT TOP (1) WITH TIES *
FROM yourtable
ORDER BY ROW_NUMBER() OVER (PARTITION BY ClientCode, AccountCode ORDER BY EmailAddress)
Or, with a subquery:
SELECT *
FROM (
    SELECT *, RowN = ROW_NUMBER() OVER (PARTITION BY ClientCode, AccountCode ORDER BY EmailAddress)
    FROM yourtable
) a
WHERE a.RowN = 1
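
The subquery form can be tried without a SQL Server instance. The following is a minimal sketch that assumes Python's bundled SQLite (3.25 or later, for window functions) as a stand-in; SQLite has no TOP ... WITH TIES, so only the ROW_NUMBER() subquery is shown, and the column types are guesses based on the sample data in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the question only shows sample values.
conn.execute("""
    CREATE TABLE yourtable (
        ClientCode TEXT, EmailAddress TEXT, AccountCode INTEGER,
        ClientName TEXT, PhoneNo TEXT
    )
""")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        ("AAA", "ragu#bib.com", 100, "Berjeya", "90909090"),
        ("AAA", "ragu1#bib.com", 100, "Berjeya", "90909090"),
        ("AAABBB", "jkkjkj#bib.com", 200, "Berjeya sooo", "3222"),
        ("CCCC", "dfdf#bib.com", 200, "Berjeya klkl", "123"),
        ("dddd", "sdsdsd#bib.com", 33300, "Berjeya penn", "33333"),
    ],
)

# Number the rows inside each (ClientCode, AccountCode) group and keep row 1.
rows = conn.execute("""
    SELECT ClientCode, EmailAddress, AccountCode, ClientName, PhoneNo
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY ClientCode, AccountCode
                   ORDER BY EmailAddress
               ) AS RowN
        FROM yourtable
    ) AS a
    WHERE a.RowN = 1
    ORDER BY ClientCode
""").fetchall()

for row in rows:
    print(row)
```

Ordering by EmailAddress inside the window makes the surviving row predictable: for the AAA / 100 group, the address that sorts first is the one kept.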

Related

SQL rank starting with 2

select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
where email='abc#gmail.com'
The above SQL statement result table looks like this
TRANSACTION_DT |SUBSCRIPTION_ID |KEY_ID |EMAIL |PURCHASE_PRODUCT_ID |RNK
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |143510 |1
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |139724 |1
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |143510 |3
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |139724 |3
But when I change the SQL statement to this
select * from (
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
) t
where email='abc#gmail.com'
My table looks like this:
TRANSACTION_DT |SUBSCRIPTION_ID |KEY_ID |EMAIL |PURCHASE_PRODUCT_ID |RNK
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |143510 |3
2021-07-14 09:42:47.710 -0700 |S107283 |122693 |abc#gmail.com |139724 |3
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |139724 |5
2020-07-14 09:22:14.033 -0700 |S107283 |122693 |abc#gmail.com |143510 |5
I want to get results from the table only when rnk=1, but the rank in the second result starts at 3, so I cannot filter on rnk=1. Also, can anyone tell me why the order of PURCHASE_PRODUCT_ID changes in row 3?

The second query ranks ALL rows and only then throws away the rows that do not match the email address, so the rows ranked 1 have already been discarded.
The filter needs to happen before the ranking function, not after it.
So if you want only RANK = 1, you can use QUALIFY:
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
qualify rnk = 1
If you want only the RANK 1 rows for email X, do the filtering in the WHERE clause, then use QUALIFY to keep the wanted rows:
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID,
rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) as rnk
from "DC"."BW_BOOKINGS"
where email='abc#gmail.com'
qualify rnk = 1
Greg's note, "if you don't want the rank you don't need it" looks like:
select
TRANSACTION_DT,
SUBSCRIPTION_ID,
KEY_ID,
EMAIL,
PURCHASE_PRODUCT_ID
from "DC"."BW_BOOKINGS"
where email='abc#gmail.com'
qualify rank () over (partition by SUBSCRIPTION_ID,KEY_ID order by TRANSACTION_DT desc) = 1
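
The difference between filtering before and after ranking can be reproduced on a small scale. This is a rough sketch that assumes Python's bundled SQLite as a stand-in for Snowflake; SQLite has no QUALIFY, so the comparison uses WHERE inside versus outside a subquery. The two `other#gmail.com` rows are made up for illustration so that another email shares the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bw_bookings (
        transaction_dt TEXT, subscription_id TEXT, key_id INTEGER,
        email TEXT, purchase_product_id INTEGER
    )
""")
conn.executemany(
    "INSERT INTO bw_bookings VALUES (?, ?, ?, ?, ?)",
    [
        # Two hypothetical rows from a different email in the same partition.
        ("2022-01-01 00:00:00", "S107283", 122693, "other#gmail.com", 1),
        ("2022-01-01 00:00:00", "S107283", 122693, "other#gmail.com", 2),
        ("2021-07-14 09:42:47", "S107283", 122693, "abc#gmail.com", 143510),
        ("2021-07-14 09:42:47", "S107283", 122693, "abc#gmail.com", 139724),
        ("2020-07-14 09:22:14", "S107283", 122693, "abc#gmail.com", 143510),
        ("2020-07-14 09:22:14", "S107283", 122693, "abc#gmail.com", 139724),
    ],
)

RANK_EXPR = """RANK() OVER (PARTITION BY subscription_id, key_id
                           ORDER BY transaction_dt DESC)"""

# WHERE runs before the window function: only the abc rows are ranked.
filter_then_rank = conn.execute(f"""
    SELECT purchase_product_id, {RANK_EXPR} AS rnk
    FROM bw_bookings
    WHERE email = 'abc#gmail.com'
    ORDER BY rnk, purchase_product_id
""").fetchall()

# The window function runs inside the subquery over every row;
# the outer WHERE then drops the rows that held rank 1.
rank_then_filter = conn.execute(f"""
    SELECT purchase_product_id, rnk FROM (
        SELECT email, purchase_product_id, {RANK_EXPR} AS rnk
        FROM bw_bookings
    ) AS t
    WHERE email = 'abc#gmail.com'
    ORDER BY rnk, purchase_product_id
""").fetchall()

print([rnk for _, rnk in filter_then_rank])
print([rnk for _, rnk in rank_then_filter])
```

With this data the first list starts at 1 and the second starts at 3, matching the two result sets in the question.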

Altering a QUALIFY with an additional criterion

In Snowflake I have this original query which, for a given consumer_ID, produces a list of unique store IDs.
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
The original purpose was to provide a list that does not duplicate store_id for a given consumer_id. Suppose now I also need to ensure this list does not duplicate business_id as well for a given consumer_ID. Is there an easy way to modify the above?

SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER
(PARTITION BY t.consumer_id
,t.store_id
,t.business_id
ORDER BY t.campaign_id) = 1
The partition by clause forms windows by the combination of all the expressions in the clause.
This will deduplicate by the combination of consumer_id, store_id, and business_id. If this is not what you need, please update with sample input and output to clarify.

So if I make up some data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,100,1001),
(1,10,101,1002),
(2,20,200,2000)
)
and use your existing SQL:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
I get
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |101 |1002
1 |10 |100 |1000
2 |20 |200 |2000
The store is not repeated for the consumer, but as you note, you don't want the business repeated either.
If we partition by business_id instead of store_id, we get fewer rows:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
ORDER BY 1;
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
So if we want "no repeating business_id AND no repeating stores", the QUALIFY that Greg has proposed will not help, because it keeps the first row for each distinct combination of consumer, business, and store:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id, t.store_id ORDER BY t.campaign_id) = 1
which gives:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |10 |101 |1002
2 |20 |200 |2000
So the next idea is to keep only the rows that are first in both partitions:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
AND ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
which works for this data:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
2 |20 |200 |2000
but then for this data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,101,1001),
(1,20,101,1002)
)
the only row for business 20 is for store 101, but the first row for store 101 is campaign 1001, which is not the first row for business 10, so both of those rows are discarded:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
So if we use two layers to do the prune, for this data:
select * from (
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
)
QUALIFY ROW_NUMBER() OVER (PARTITION BY consumer_id, store_id ORDER BY campaign_id) = 1
works:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |20 |101 |1002
but if you flip the order of the two QUALIFY clauses, you are back to just one row.
So as a general problem, it cannot be safely solved for all data cases with this pattern.
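
The order dependence described above can be checked directly. This is a sketch only: it assumes Python's bundled SQLite in place of Snowflake, so each QUALIFY is rewritten as a subquery plus WHERE, and the helper name `two_layer_dedupe` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE campaigns_mini (
        consumer_id INTEGER, business_id INTEGER,
        store_id INTEGER, campaign_id INTEGER
    )
""")
# The second data set from the answer above.
conn.executemany(
    "INSERT INTO campaigns_mini VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 1000), (1, 10, 101, 1001), (1, 20, 101, 1002)],
)


def two_layer_dedupe(first_key, second_key):
    """Keep the first campaign per (consumer, first_key), then per (consumer, second_key)."""
    # first_key and second_key are fixed column names passed below, never user input.
    sql = f"""
        SELECT consumer_id, business_id, store_id, campaign_id FROM (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY consumer_id, {second_key} ORDER BY campaign_id) AS rn2
            FROM (
                SELECT *, ROW_NUMBER() OVER (
                    PARTITION BY consumer_id, {first_key} ORDER BY campaign_id) AS rn1
                FROM campaigns_mini
            ) AS inner_pass
            WHERE rn1 = 1
        ) AS outer_pass
        WHERE rn2 = 1
        ORDER BY campaign_id
    """
    return conn.execute(sql).fetchall()


business_then_store = two_layer_dedupe("business_id", "store_id")
store_then_business = two_layer_dedupe("store_id", "business_id")

print(business_then_store)
print(store_then_business)
```

Pruning by business first keeps two rows, while pruning by store first keeps only one, which is why the result depends on the order of the layers.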

How to filter columns with multiple values for each ID in SQL Server

I have the result set below, and I want to select a single record when the same ID has 2 records with different values for the Age and status columns.
ID, name, and country come from table A; Age and status come from table B.
ID  name    country  Age                 status
----------------------------------------------
1   Prasad  India    NULL                NULL
2   John    USA      NULL                NULL
3   GREG    AUS      NULL                NULL
4   RAVI    India    NULL                NULL
4   RAVI    India    18 Years and Above  1

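Go with this:
Select *
From
(
Select t2.*,
-- SQL Server sorts NULLs first in ascending order, so Age ascending ranks the NULL row first;
-- use Age desc if the populated row should be kept instead
ROW_NUMBER() over(partition by ID order by name,country,Age, status desc) as rn
From yourtable t2
) t -- SQL Server requires an alias on a derived table
Where rn = 1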
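
Which of the two ID 4 rows survives depends on how NULLs sort in the window's ORDER BY. The sketch below assumes the populated row is the one wanted (the question does not say so explicitly) and uses Python's bundled SQLite, which, like SQL Server, treats NULL as the lowest value by default; sorting Age descending therefore puts the populated row first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE yourtable (
        ID INTEGER, name TEXT, country TEXT, Age TEXT, status INTEGER
    )
""")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Prasad", "India", None, None),
        (2, "John", "USA", None, None),
        (3, "GREG", "AUS", None, None),
        (4, "RAVI", "India", None, None),
        (4, "RAVI", "India", "18 Years and Above", 1),
    ],
)

# NULL is the lowest value, so DESC ranks the non-NULL Age row first.
rows = conn.execute("""
    SELECT ID, name, country, Age, status FROM (
        SELECT t2.*,
               ROW_NUMBER() OVER (
                   PARTITION BY ID ORDER BY Age DESC, status DESC
               ) AS rn
        FROM yourtable AS t2
    ) AS t
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

for row in rows:
    print(row)
```

IDs with only a NULL row still appear once, because ROW_NUMBER() assigns 1 to the single row in their partition.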

Removing Duplicates of two columns in a query

I have a select * query which gives lots of row and lots of columns of results. I have an issue with duplicates of one column A when given the same value of another column B that I would like to only include one of.
Basically I have a column that tells me the "name" of object and another that tells me the "number". Sometimes I have an object "name" with more than one entry for a given object "number". I only want distinct "numbers" within a "name" but I want the query to give the entire table when this is true and not just these two columns.
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 3 443 76
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bill 1 443 76
Bill 2 54 1856
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 2 209 17
This example above is not fine, I only want one of the Bob 2's.

Try this if you are using SQL Server 2005 or above:
With ranked_records AS
(
select *,
ROW_NUMBER() OVER(Partition By name, number Order By name) [ranked]
from MyTable
)
select * from ranked_records
where ranked = 1

If you just want the Name and number, then
SELECT DISTINCT Name, Number FROM Table1
If you want to know how many of each there are, then
SELECT Name, Number, COUNT(*) FROM Table1 GROUP BY Name, Number

Use a Common Table Expression (CTE) and the ROW_NUMBER() OVER (PARTITION BY ...) syntax as follows:
WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Name, Number ORDER BY Name, Number) AS R
FROM
dbo.ATable
)
SELECT
*
FROM
CTE
WHERE
R = 1

WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Plant, BatchNumber ORDER BY Plant, BatchNumber) AS R
FROM dbo.StatisticalReports WHERE dbo.StatisticalReports."FermBatchStartTime" >= DATEADD(d,-90, getdate())
)
SELECT
*
FROM
CTE
WHERE
R = 1
ORDER BY Plant, FermBatchStartTime -- the base table name is not in scope outside the CTE
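
One detail of the ROW_NUMBER() answers above: when the window's ORDER BY repeats the partition columns, every row in a partition ties, so which duplicate is kept is not defined. The sketch below is a variation rather than the answers' exact query; it assumes Python's bundled SQLite as a stand-in for SQL Server and adds ColumnC as a tie-breaker, which is an assumption about which duplicate should win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Name TEXT, Number INTEGER, ColumnC INTEGER, ColumnD INTEGER)"
)
# The "not fine" example from the question: two rows for Bob / 2.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [("Bob", 1, 93, 12), ("Bob", 2, 432, 546), ("Bob", 2, 209, 17)],
)

# Full rows are returned; ColumnC breaks the tie inside each (Name, Number) group.
rows = conn.execute("""
    WITH ranked_records AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY Name, Number ORDER BY ColumnC
               ) AS ranked
        FROM MyTable
    )
    SELECT Name, Number, ColumnC, ColumnD
    FROM ranked_records
    WHERE ranked = 1
    ORDER BY Name, Number
""").fetchall()

for row in rows:
    print(row)
```

With the tie-breaker in place, the Bob / 2 row with the smaller ColumnC value is kept on every run.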

SQL Server 2008 how to select top [column value] and random record?

I'm using SQL Server 2008. I want to select random rows, where the number of rows to return depends on a column value in another table. How can I do this?
My SQL statement is something like this, but it is wrong:
select top b.number a.name, a.link_id
from A a
left join B b on b.link_id = a.link_id
order by newid()
Here are my tables and the expected result.
Table A:
name link_id
james 100
albert 100
susan 100
simon 101
tom 101
fion 101
Table B:
link_id number
100 2
101 1
Expected result:
when run 1st time, result may be:
name link_id
james 100
susan 100
fion 101
2nd time result may be:
albert 100
susan 100
simon 101
3rd time could be:
james 100
albert 100
fion 101
Explanation
Referring to Table B, link_id 100 has number 2,
meaning that 2 random records should be selected from Table A for link_id = 100,
and 1 random record should be selected for link_id = 101.

You can use the ROW_NUMBER() function:
SELECT A.name, A.link_id
FROM(
SELECT name,link_id, ROW_NUMBER()OVER(PARTITION BY link_id ORDER BY NEWID()) rn
FROM dbo.tblA
) AS A
JOIN dbo.tblB AS B
ON A.link_id = B.link_id
WHERE A.rn <= B.number;
Here is a SqlFiddle to show this in action: http://sqlfiddle.com/#!3/92eac/2

Try this:
SELECT a.*
FROM b
CROSS APPLY
(
SELECT TOP (b.number) a.*
FROM a
WHERE a.link_id = b.link_id
ORDER BY
NEWID()
) a
Also see: SQLFiddle
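
The ROW_NUMBER() approach can be checked locally with the sample tables from the question. This is a minimal sketch assuming Python's bundled SQLite in place of SQL Server, so RANDOM() stands in for NEWID(); the CROSS APPLY variant is not shown because SQLite does not support it.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (name TEXT, link_id INTEGER)")
conn.execute("CREATE TABLE tblB (link_id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?)",
    [("james", 100), ("albert", 100), ("susan", 100),
     ("simon", 101), ("tom", 101), ("fion", 101)],
)
conn.executemany("INSERT INTO tblB VALUES (?, ?)", [(100, 2), (101, 1)])

# Shuffle the rows inside each link_id group, then keep as many as tblB.number allows.
rows = conn.execute("""
    SELECT a.name, a.link_id
    FROM (
        SELECT name, link_id,
               ROW_NUMBER() OVER (PARTITION BY link_id ORDER BY RANDOM()) AS rn
        FROM tblA
    ) AS a
    JOIN tblB AS b ON a.link_id = b.link_id
    WHERE a.rn <= b.number
""").fetchall()

per_link = Counter(link_id for _, link_id in rows)
print(rows)
print(dict(per_link))
```

The names returned vary from run to run, but the number of rows per link_id always matches the number column in the second table.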
