Distinct Counts in a Window Function - sql-server

I'm trying to get a distinct count of rows within a window function that has multiple levels of partitioning. Below is a sample of my data.
PRODUCT_ID
KEY_ID
STORECLUSTER
1000078
120
LLNY
1000078
202
LLF
1000078
202
LLNY
1000078
202
LLNY
I want to look at each PRODUCT_ID and then each unique KEY_ID and determine how many unique STORECLUSTERS there are per KEY_ID. For example PRODUCT_ID 1000078 has two unique KEY_ID's (120 and 202) of which 120 has 1 unique STORECLUSTER and 202 has 2 unqiue STORECLUSTER's. I've tried using a RANK() and DENSE_RANK() but I can't seem to get the partitioning correct. I would like to get a table that looks like this:
PRODUCT_ID
KEY_ID
STORECLUSTER
STORECLUSTER_COUNT
1000078
120
LLNY
1
1000078
202
LLF
2
1000078
202
LLNY
2
1000078
202
LLNY
2

Unfortunately, SQL Server does not support COUNT(DISTINCT as a window function.
So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.
The partitioning clause is the equivalent of GROUP BY in a normal aggregate, then the value you are DISTINCTing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.
SELECT
PRODUCT_ID,
KEY_ID,
STORECLUSTER,
STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
FROM YourTable t
) t;
db<>fiddle

Related

Altering a QUALIFY with an additional criterion

In Snowflake I have this original query which, for a given consumer_ID, produces a list of unique store IDs.
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
The original purpose was to provide a list that does not duplicate store_id for a given consumer_id. Suppose now I also need to ensure this list does not duplicate business_id as well for a given consumer_ID. Is there an easy way to modify the above?
SELECT
t.consumer_id
, t.business_id
, t.store_id
, t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER
(PARTITION BY t.consumer_id
,t.store_id
,t.business_id
ORDER BY t.campaign_id) = 1
The partition by clause forms windows by the combination of all the expressions in the clause.
This will deduplicate by the combination of consumer_id, store_id, and business_id. If this is not what you need, please update with sample input and output to clarify.
So if I make up some data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,100,1001),
(1,10,101,1002),
(2,20,200,2000)
)
and use your exist SQL
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
I get
CONSUMER_ID
BUSINESS_ID
STORE_ID
CAMPAIGN_ID
1
10
101
1002
1
10
100
1000
2
20
200
2000
we get the Store not repeated for the Consumer, but as you note you don't want the business repeated ether..
If we change to using business_id instead of store_id we see we get less rows:
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
ORDER BY 1;
CONSUMER_ID
BUSINESS_ID
STORE_ID
CAMPAIGN_ID
1
10
100
1000
2
20
200
2000
So if we want "no repeating business_id AND no repeating stores" using the Qualify Greg's has proposed will not help, as we are keeping the first for the distinct set of consumer,business, & store:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id, t.store_id ORDER BY t.campaign_id) = 1
which gives:
CONSUMER_ID |BUSINESS_ID |STORE_ID |CAMPAIGN_ID
1 |10 |100 |1000
1 |10 |101 |1002
2 |20 |200 |2000
So the next thing is to think why not keep the only the first of the two sets:
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.store_id ORDER BY t.campaign_id) = 1
AND ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
which for this data works!
CONSUMER_ID
BUSINESS_ID
STORE_ID
CAMPAIGN_ID
1
10
100
1000
2
20
200
2000
but then for this data:
WITH campaigns_mini(consumer_id, business_id, store_id, campaign_id) as (
select * from values
(1,10,100,1000),
(1,10,101,1001),
(1,20,101,1002)
)
there is only one row with business 20, for store 101, but the first 101 store is on campaign 1001, so both those rows are discarded.
CONSUMER_ID
BUSINESS_ID
STORE_ID
CAMPAIGN_ID
1
10
100
1000
So if we use two layers to do the prune, for this data:
select * from (
SELECT
t.consumer_id
,t.business_id
,t.store_id
,t.campaign_id
FROM campaigns_mini AS t
QUALIFY ROW_NUMBER() OVER (PARTITION BY t.consumer_id, t.business_id ORDER BY t.campaign_id) = 1
)
QUALIFY ROW_NUMBER() OVER (PARTITION BY consumer_id, store_id ORDER BY campaign_id) = 1
works:
CONSUMER_ID
BUSINESS_ID
STORE_ID
CAMPAIGN_ID
1
10
100
1000
1
20
101
1002
but if your flip those orders of QUALIFY you are back to just one row..
so as a general problem it cannot be safely solve for all data cases with this pattern...

sql server 2012 merge values from two tables

I have two tables
tblA(sn, ID int pk, name varchar(50), amountA decimal(18,2))
and
tblB(ID int fk, amountB decimal(18,2))
here: tblA occures only once and tblB may occure multiple time
I need the query to display data like:
sn ID name AmountA amountB Balance
1 1001 abc 5000.00 5000.00
2 1002 xyz 10000.00
1002 4000.00 6000.00 (AmountA-AmountB)
3 1003 pqr 15000.00
1003 4000.00
1003 3000.00
1003 2000.00 6000.00 (AmountA-sum(AmountB))
Please ask if any confusion
I tried using lag and lead function but I couldnot get the desire result, Please help.
Since you are using SQL Server 2012, you can use a partition with an aggregate function (SUM):
SELECT t.sn,
t.ID,
t.name,
t.credits AS AmountA,
t.debits AS amountB,
SUM(t.credits - t.debits) OVER (PARTITION BY t.ID ORDER BY t.debits, t.credits) AS Balance
FROM
(
SELECT sn,
ID,
name,
AmountA AS credits,
0 AS debits
FROM tblA
UNION ALL
SELECT 0 AS sn,
ID,
NULL AS name,
0 AS credits,
amountB AS debits
FROM tblB
) t
ORDER BY t.ID,
t.debits,
t.credits
Explanation:
Since the records in tables A and B each represent a single transaction (i.e. a credit or debit), using a UNION query to bring both sets of data into a single table works well here. After this, I compute a rolling sum using the difference between credit and debit, for each record, for each ID partition group. The ordering is chosen such that credits appear at the top of each partition while debits appear on the bottom.

Can I select data ordered by a field a way that it values would be different in consecutive rows?

I have a table of products with manufacturers field and I need to extract data so that two manufacturers with same id won't stay together.
For example:
id prod_id manf_id
1 100 300
2 101 300
3 102 400
4 103 400
5 103 500
So that result would look smth like:
1 100 300
3 102 400
5 103 500
2 101 300
4 103 400
It doesn't matter too much in the example above if there'll be sequences with ids that has same neighbours (300-400-300) but it would be more interesting to see more complex logic so that a single id would have only one neighbour id of the same type (300-400-500).
If such ordering could not be applied - show data with same consecutive values (300-300-300).
Something like this.
SELECT Row_number()OVER(partition BY manf_id ORDER BY id) rn, *
FROM Yourtable
ORDER BY rn,id
Try this.
;with cte as (
select id,prod_id,manuf_id,ROW_NUMBER() over(partition by manuf_id order by id) as row_no
from products
)
select id,prod_id,manuf_id from cte
order by row_no

Removing Duplicates of two columns in a query

I have a select * query which gives lots of row and lots of columns of results. I have an issue with duplicates of one column A when given the same value of another column B that I would like to only include one of.
Basically I have a column that tells me the "name" of object and another that tells me the "number". Sometimes I have an object "name" with more than one entry for a given object "number". I only want distinct "numbers" within a "name" but I want the query to give the entire table when this is true and not just these two columns.
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 3 443 76
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bill 1 443 76
Bill 2 54 1856
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 2 209 17
This example above is not fine, I only want one of the Bob 2's.
Try it if you are using SQL 2005 or above:
With ranked_records AS
(
select *,
ROW_NUMBER() OVER(Partition By name, number Order By name) [ranked]
from MyTable
)
select * from ranked_records
where ranked = 1
If you just want the Name and number, then
SELECT DISTINCT Name, Number FROM Table1
If you want to know how many of each there are, then
SELECT Name, Number, COUNT(*) FROM Table1 GROUP BY Name, Number
By using a Common Table Expression (CTE) and the ROW_NUMBER OVER PARTION syntax as follows:
WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Name, Number ORDER BY Name, Number) AS R
FROM
dbo.ATable
)
SELECT
*
FROM
CTE
WHERE
R = 1
WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Plant, BatchNumber ORDER BY Plant, BatchNumber) AS R
FROM dbo.StatisticalReports WHERE dbo.StatisticalReports. \!"FermBatchStartTime\!" >= DATEADD(d,-90, getdate())
)
SELECT
*
FROM
CTE
WHERE
R = 1
ORDER BY dbo.StatisticalReports.Plant, dbo.StatisticalReports.FermBatchStartTime

T-SQL getting all unique groups with their usage count

How do I find the unique groups that are present in my table, and display how often that type of group is used?
For example (SQL Server 2008R2)
So, I would like to find out how many times the combination of
PMI 100
RT 100
VT 100
is present in my table and for how many itemid's it is used;
These three form a group because together they are assigned to a single itemid. The same combination is assigned to id 2527 and 2529, so therefore this group is used at least twice. (usagecount = 2)
(and I want to know that for all types of groups that are appearing)
The entire dataset is quite large, about 5.000.000 records, so I'd like to avoid using a cursor.
The number of code/pct combinations per itemid varies between 1 and 6.
The values in the "code" field are not known up front, there are more than a dozen values on average
I tried using pivot, but I got stuck eventually and I also tried various combinations of GROUP-BY and counts.
Any bright ideas?
Example output:
code pct groupid usagecount
PMI 100 1 234
RT 100 1 234
VT 100 1 234
CD 5 2 567
PMI 100 2 567
VT 100 2 567
PMI 100 3 123
PT 100 3 123
VT 100 3 123
RT 100 4 39
VT 100 4 39
etc
Just using a simple group:
SELECT
code
, pct
, COUNT(*)
FROM myTable
GROUP BY
code
, pct
Not too sure if that's more like what you're looking for:
select
uniqueGrp
, count(*)
from (
select distinct
itemid
from myTable
) as I
cross apply (
select
cast(code as varchar(max)) + cast(pct as varchar(max)) + '_'
from myTable
where myTable.itemid = I.itemid
order by code, pct
for xml path('')
) as x(uniqueGrp)
group by uniqueGrp
Either of these should return each combination of code and percentage with a group id for the code and the total number of instances of the code against it. You can use them for also adding the number of instances of the specific code/pct combo too for determining % contribution etc
select
distinct
t.code, t.pct, v.groupcol, v.vol
from
[tablename] t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
or
select
t.code, t.pct, v.groupcol, v.vol
from
(select code, pct from [tablename] group by code, pct) t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
Grouping by Code, and Pct should be enough I think. See the following :
select code,pct,count(p.*)
from [table] as p
group by code,pct

Resources