Distinct with Count and SQL Server 2005 - sql-server

I'm working on a query that returns the top 3 selling products, where each of the three has a distinct artist. I'm stuck on getting the unique artist.
Simplified table schema:
Product: ProductID, Product Name, Artist Name
OrderItem: ProductID, Qty
So the results would look like this:
PID, artist, qty
34432, 'Jimi Hendrix', 6543
54833, 'stevie ray vaughan', 2344
12344, 'carrie underwood', 1

Use this:
with summed_sales_of_each_product as
(
select p.artist_name, p.product_id, sum(i.qty) as total
from product p join order_item i
on i.product_id = p.product_id
group by p.artist_name, p.product_id
),
each_artist_top_selling_product as
(
select x_in.artist_name, x_in.product_id, x_in.total
from summed_sales_of_each_product x_in where total =
(select max(x_out.total)
from summed_sales_of_each_product x_out
where x_out.artist_name = x_in.artist_name)
)
select top 3
artist_name, product_id, total
from each_artist_top_selling_product
order by total desc
But you cannot stop at that query. What if two products by the same artist tie for highest selling? With data like this...
beatles yesterday 1000
beatles something 1000
elvis jailbreak rock 800
nirvana lithium 600
tomjones sexbomb 400
...the above query returns the following:
beatles yesterday 1000
beatles something 1000
elvis jailbreak rock 800
Which one should you choose, yesterday or something? Since you cannot arbitrarily choose one over the other, you must list both. Also, what if the 10 highest-selling products all belong to the beatles and are tied, each with a quantity of 1000? Reporting the same artist across the whole top 3 is the very thing you are trying to avoid, so you have to amend the query so that the top 3 report looks like this:
beatles yesterday 1000
beatles something 1000
elvis jailbreak rock 800
nirvana lithium 600
The amended query:
with summed_sales_of_each_product as
(
select p.artist_name, p.product_id, sum(i.qty) as total
from product p join order_item i
on i.product_id = p.product_id
group by p.artist_name, p.product_id
),
each_artist_top_selling_product as
(
select x_in.artist_name, x_in.product_id, x_in.total
from summed_sales_of_each_product x_in
where x_in.total =
(select max(x_out.total)
from summed_sales_of_each_product x_out
where x_out.artist_name = x_in.artist_name)
),
top_3_total as
(
select distinct top 3 total
from each_artist_top_selling_product
order by total desc
)
select artist_name, product_id, total
from each_artist_top_selling_product
where total in (select total from top_3_total)
order by total desc
What if the beatles have another product with a qty of 900? Will the above query still work? Yes. The top_3_total CTE only looks at rows that have already been filtered down to each artist's top quantity. So this source data...
beatles yesterday 1000
beatles something 1000
beatles and i love her 900
elvis jailbreak rock 800
nirvana lithium 600
tomjones sexbomb 400
...will still produce the following:
beatles yesterday 1000
beatles something 1000
elvis jailbreak rock 800
nirvana lithium 600
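To check the tie handling without a SQL Server instance, here is a rough sketch that runs the same three-CTE idea in SQLite through Python's sqlite3 module. It is only an approximation: SQLite uses LIMIT where T-SQL uses TOP, the product ids and the split of order rows are made up, and product_name is added purely for readable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT, artist_name TEXT);
    CREATE TABLE order_item (product_id INTEGER, qty INTEGER);
    -- hypothetical sample rows matching the totals discussed above
    INSERT INTO product VALUES
        (1, 'yesterday', 'beatles'), (2, 'something', 'beatles'),
        (3, 'and i love her', 'beatles'), (4, 'jailbreak rock', 'elvis'),
        (5, 'lithium', 'nirvana'), (6, 'sexbomb', 'tomjones');
    INSERT INTO order_item VALUES
        (1, 600), (1, 400), (2, 1000), (3, 900), (4, 800), (5, 600), (6, 400);
""")

query = """
WITH summed AS (                      -- total sold per product
    SELECT p.artist_name, p.product_id, p.product_name, SUM(i.qty) AS total
    FROM product p JOIN order_item i ON i.product_id = p.product_id
    GROUP BY p.artist_name, p.product_id, p.product_name
),
artist_best AS (                      -- each artist's best product(s), ties kept
    SELECT s.artist_name, s.product_name, s.total
    FROM summed s
    WHERE s.total = (SELECT MAX(x.total) FROM summed x
                     WHERE x.artist_name = s.artist_name)
),
top_3_total AS (                      -- the three highest distinct totals
    SELECT DISTINCT total FROM artist_best ORDER BY total DESC LIMIT 3
)
SELECT artist_name, product_name, total
FROM artist_best
WHERE total IN (SELECT total FROM top_3_total)
ORDER BY total DESC, product_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The 900-qty beatles product never reaches top_3_total because artist_best has already dropped it, which is why four rows come back rather than three.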

If I have understood your schema correctly, you should be able to do it like this:
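select top 3 * from (
select p.ProductId, p.ArtistName, sum(o.qty) as qty from Product p, OrderItem o
where p.ProductId = o.ProductId
group by p.ProductId, p.ArtistName
) as sales -- a derived table needs an alias and cannot have its own ORDER BY without TOP
order by qty desc -- highest totals first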

I don't know what you want to do if an Artist has two top-ranked products with identical sales--this will return two in case of a tie.
If you want to add another criterion, such as "most recent", you have to add it to both subqueries.
select top 3 sales_by_item.ProductID,
sales_by_item.Artist,
sales_by_item.Qty
from
(
-- total quantity sold per product
select x.ProductID, x.Artist, sum(y.Qty) as Qty
from Product x
inner join OrderItem y
on x.ProductID = y.ProductID
group by x.ProductID, x.Artist
) sales_by_item
inner join
(
-- best per-product total for each artist
select Artist, max(Qty) as maxqty
from
(
select x.ProductID, x.Artist, sum(y.Qty) as Qty
from Product x
inner join OrderItem y
on x.ProductID = y.ProductID
group by x.ProductID, x.Artist
) totals
group by Artist
) max_by_artist
on sales_by_item.Artist = max_by_artist.Artist
and sales_by_item.Qty = max_by_artist.maxqty
order by sales_by_item.Qty desc
Edited to make subquery names more descriptive

Second attempt. I'm not in a position to test this code, and I'm not sure I've got the "partition by" clause configured correctly. The idea is:
The inner query gets the sum of Qty for every product/artist and uses the row_number() function to number them starting with the largest, restarting the numbering for each artist. (This can be done, but my syntax may be off.)
The outer query picks out the first (largest) item for each artist and returns only the first three (ordered by Qty).
If an artist's top two products tie for total Qty, I arbitrarily break the tie in favor of the "earliest" album.
(I try to avoid using "Top n", but it's late and I don't want to tackle another row_number() function.)
SELECT top 3
ProductId
,ArtistName
,Qty
from (-- Products + Artists by total qty
select
pr.ProductId
,pr.ArtistName
,sum(oi.Qty) Qty
-- number each artist's products, largest total first; ties go to the lowest ProductId
,row_number() over (partition by pr.ArtistName order by sum(oi.Qty) desc, pr.ProductId) Ranking
from Product pr
inner join OrderItem oi
on oi.ProductID = pr.ProductID
group by pr.ProductId, pr.ArtistName) BestSellers
where Ranking = 1
order by Qty desc

Analyzing your request, it sounds like the results should be the highest product quantity for the top three artists. So, if Jimi Hendrix has the top 10 product quantities and Stevie Ray Vaughan is 11th, you want Jimi with his highest product then Stevie with his highest product.
With ProductRanksForArtists As
(
Select P.ProductId, P.ArtistName, Sum(O.Qty) As Total
, ROW_NUMBER() OVER( PARTITION BY P.ArtistName ORDER BY Sum(O.Qty) DESC ) As ProductRank
From Product As P
Join OrderItem As O
On O.ProductId = P.ProductId
Group By P.ProductId, P.ArtistName
)
, HighestProductForArtists As
(
Select ProductId, ArtistName, Total
, ROW_NUMBER() OVER( ORDER BY Total DESC ) As TotalRank
From ProductRanksForArtists
Where ProductRank = 1
)
Select ProductId, ArtistName, Total
From HighestProductForArtists
Where TotalRank <= 3
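For anyone who wants to try the "best product per artist, then top three" idea without SQL Server, below is a small sketch in SQLite via Python's sqlite3. Assumptions: SQLite 3.25 or later for window functions, LIMIT standing in for TOP, the totals computed in a separate CTE before ranking, ProductID added as a tie-break, and sample rows that are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ArtistName TEXT);
    CREATE TABLE OrderItem (ProductID INTEGER, Qty INTEGER);
    -- hypothetical data: Jimi Hendrix owns the two best-selling products
    INSERT INTO Product VALUES
        (1, 'Jimi Hendrix'), (2, 'Jimi Hendrix'), (3, 'Stevie Ray Vaughan'),
        (4, 'Carrie Underwood'), (5, 'Nirvana');
    INSERT INTO OrderItem VALUES
        (1, 5000), (1, 1543), (2, 6000), (3, 2344), (4, 1), (5, 500);
""")

query = """
WITH totals AS (                      -- quantity sold per product
    SELECT p.ProductID, p.ArtistName, SUM(o.Qty) AS Total
    FROM Product p JOIN OrderItem o ON o.ProductID = p.ProductID
    GROUP BY p.ProductID, p.ArtistName
),
ranked AS (                           -- rank products within each artist
    SELECT ProductID, ArtistName, Total,
           ROW_NUMBER() OVER (PARTITION BY ArtistName
                              ORDER BY Total DESC, ProductID) AS ProductRank
    FROM totals
)
SELECT ProductID, ArtistName, Total
FROM ranked
WHERE ProductRank = 1                 -- one row per artist
ORDER BY Total DESC
LIMIT 3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Jimi's second product (6000) outsells everyone else's best, yet it is filtered out by ProductRank = 1, so the other artists still get their slots.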

Try this
Select top 3 artist, count(artist) from tablename group by artist order by count(artist) desc

Related

Top 2 of each group ordered SQL Server

I have 8 records
Name Value Product
--------------------------
Abraham 4 A
Lincoln 6 B
Abraham 4 C
Lincoln 2 D
Lincoln 3 E
Lincoln 2 F
Abraham 1 G
Abraham 9 H
Abraham has 4 records and Lincoln too.
I need, from the SQL, the top 2 values from Abraham and the top 2 values from Lincoln
I've tried:
SELECT TOP 2 WITH TIES
NAME,
VALUE,
PRODUCT
FROM
blabla
JOIN
blabla...
ORDER BY
NAME
This takes 2 of each name, but not the highest values, because I'm not ordering by VALUE desc.
But I can't add VALUE desc to the ORDER BY because of the TOP WITH TIES.
What I need is a way to make TOP WITH TIES apply only to ORDER BY NAME (if that is possible, for example by restricting the ties to the first ORDER BY column), while returning only the 2 max values for each name.
The final result I need:
Abraham 9 H
Abraham 4 C
Lincoln 6 B
Lincoln 3 E
PS: This is only a simplified version of what I want. The original query is over 100 lines with unions and so on, so I thought it better to simplify.
You can do this with ROW_NUMBER and partitioning.
select Name
, Value
, Product
from
(
select Name
, Value
, Product
, RowNum = ROW_NUMBER() over(partition by Name order by value desc)
from SomeTable
) x
where x.RowNum <= 2
order by x.Name
, x.Value desc
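A quick way to sanity-check the ROW_NUMBER approach against the eight sample rows is to run it in SQLite from Python (window functions need SQLite 3.25+). Note one assumption of mine: the extra "Product desc" tie-break is only there so the 4/A versus 4/C tie resolves to C as in the expected result; plain ROW_NUMBER picks either one arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (Name TEXT, Value INTEGER, Product TEXT)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?, ?)",
    [("Abraham", 4, "A"), ("Lincoln", 6, "B"), ("Abraham", 4, "C"),
     ("Lincoln", 2, "D"), ("Lincoln", 3, "E"), ("Lincoln", 2, "F"),
     ("Abraham", 1, "G"), ("Abraham", 9, "H")],
)

query = """
SELECT Name, Value, Product
FROM (
    SELECT Name, Value, Product,
           -- restart numbering for every Name, highest Value first
           ROW_NUMBER() OVER (PARTITION BY Name
                              ORDER BY Value DESC, Product DESC) AS RowNum
    FROM SomeTable
) x
WHERE x.RowNum <= 2
ORDER BY x.Name, x.Value DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```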
A solution is a CROSS APPLY with TOP and ORDER BY. You start with the list of names and call a "function" that returns the TOP for each name.
SELECT
N.Name,
V.Value,
V.Product
FROM
(SELECT DISTINCT Y.Name FROM YourTable AS Y) AS N
CROSS APPLY (
SELECT TOP 2 WITH TIES
P.Value,
P.Product
FROM
YourTable AS P
WHERE
N.Name = P.Name
ORDER BY
P.Value DESC
) AS V
Keep in mind that WITH TIES can make TOP return more than the requested number of rows (2 in your example) when values tie.
If you need to show names that have no products (not possible in this example, because both come from the same table), you can switch the CROSS APPLY to an OUTER APPLY, which behaves like a LEFT JOIN: it returns one row of NULL values for such a name (not 2 rows!).

SQL Server 2008 Is it Possible to Have Select Top Return Nulls

(Select top 1 pvd.Code from PatientVisitDiags pvd
where pvd.PatientVisitId = pv.PatientVisitId
Order By pvd.Listorder) as "DX1",
(Select top 1 a.code from (Select top 2 pvd.Code,pvd.ListOrder from PatientVisitDiags pvd
where pvd.PatientVisitId = pv.PatientVisitId
Order By pvd.Listorder)a order by a.ListOrder DESC ) as "DX2",
(Select top 1 a.code from (Select top 3 pvd.Code,pvd.ListOrder from PatientVisitDiags pvd
where pvd.PatientVisitId = pv.PatientVisitId
Order By pvd.Listorder)a order by a.ListOrder DESC ) as "DX3",
(Select top 1 a.code from (Select top 4 pvd.Code,pvd.ListOrder from PatientVisitDiags pvd
where pvd.PatientVisitId = pv.PatientVisitId
Order By pvd.Listorder)a order by a.ListOrder DESC ) as "DX4",
(Select top 1 a.code from (Select top 5 pvd.Code,pvd.ListOrder from PatientVisitDiags pvd
where pvd.PatientVisitId = pv.PatientVisitId
Order By pvd.Listorder)a order by a.ListOrder DESC ) as "DX5"
The above code is what I am using currently (It is not optimal but is only being used once for a one time Data Export).
In the database that we are currently exporting from, there is a table PatientVisitDiags that has columns "ListOrder" and "Code". There can be between 1 and 5 codes. The ListOrder holds the number of that code. For example:
ListOrder|Code |
1 |M51.27 |
2 |M54.17 |
3 |G83.4 |
I am trying to export the Code to its corresponding Column in the new table(DX1,DX2..etc). If I sort by ListOrder I can get them in the order I need (Row 1 to DX1 | Row 2 to DX2 etc.) However when I run the above SQL code, If the source table only has 3 Codes DX4 and DX5 will repeat DX3. For Example:
DX1 |DX2 |DX3 |DX4 |DX5
M51.27 |M54.17 |G83.4 |G83.4 |G83.4
Is there a way to have TOP return NULL values if you select TOP more rows than exist? SQL Server 2008 does not support OFFSET/FETCH, which is what I would normally have used to select individual rows.
TL;DR
ID | Name
1 | Joe
2 | Eric
3 | Steve
4 | John
If I have a table like above and run
SELECT TOP 5 Name FROM Table
Is there any way to return the following?
Joe
Eric
Steve
John
NULL
What you're really doing is pivoting. So pivot! Try this little query:
WITH Top5 AS (
SELECT TOP 5
Dx = 'DX' + Convert(varchar(11), Row_Number() OVER (ORDER BY pvd.Listorder)),
pvd.Code
FROM dbo.PatientVisitDiags pvd
WHERE pvd.PatientVisitId = @patientVisitId
ORDER BY pvd.ListOrder -- make TOP 5 deterministic
)
SELECT *
FROM
Top5 t
PIVOT (Max(Code) FOR Dx IN (DX1, DX2, DX3, DX4, DX5)) p
;
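SQLite has no PIVOT operator, so the sketch below swaps in conditional aggregation (MAX over CASE) to show the same effect: slots without a matching row come back as NULL instead of repeating DX3. It runs through Python's sqlite3 (SQLite 3.25+ for ROW_NUMBER); the visit id 1 and the three codes are just the sample values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PatientVisitDiags (PatientVisitId INTEGER, ListOrder INTEGER, Code TEXT)"
)
conn.executemany(
    "INSERT INTO PatientVisitDiags VALUES (?, ?, ?)",
    [(1, 1, "M51.27"), (1, 2, "M54.17"), (1, 3, "G83.4")],
)

query = """
SELECT MAX(CASE WHEN rn = 1 THEN Code END) AS DX1,
       MAX(CASE WHEN rn = 2 THEN Code END) AS DX2,
       MAX(CASE WHEN rn = 3 THEN Code END) AS DX3,
       MAX(CASE WHEN rn = 4 THEN Code END) AS DX4,
       MAX(CASE WHEN rn = 5 THEN Code END) AS DX5
FROM (
    -- number the codes of one visit in ListOrder sequence
    SELECT Code, ROW_NUMBER() OVER (ORDER BY ListOrder) AS rn
    FROM PatientVisitDiags
    WHERE PatientVisitId = ?
) numbered
"""
row = conn.execute(query, (1,)).fetchone()
print(row)
```

With only three codes present, DX4 and DX5 have no row to aggregate, so MAX returns NULL (None in Python) for them.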
To answer your second question about getting an unpivoted rowset, basically do the same thing but provide the 5 rows somehow and left join to the desired data.
WITH Data AS (
SELECT TOP 5
Seq = Row_Number() OVER(ORDER BY ID),
Name
FROM dbo.[Table] -- Table is a reserved word, so it needs brackets
ORDER BY ID
)
SELECT
n.Seq,
t.Name
FROM
(VALUES
(1), (2), (3), (4), (5) -- or a numbers-generating CTE perhaps
) n (Seq)
LEFT JOIN Data t
ON n.Seq = t.Seq
;
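The "provide the 5 rows and left join" idea can be tried out in SQLite from Python as a rough check. The table name people is a stand-in for the question's placeholder table, and the numbers list is a CTE built from VALUES; window functions need SQLite 3.25+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Joe"), (2, "Eric"), (3, "Steve"), (4, "John")],
)

query = """
WITH n(Seq) AS (VALUES (1), (2), (3), (4), (5)),   -- the five slots we always want
data AS (
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS Seq, Name
    FROM people
)
SELECT n.Seq, d.Name
FROM n
LEFT JOIN data d ON d.Seq = n.Seq                  -- unmatched slots yield NULL
ORDER BY n.Seq
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```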
Side note
The fact that you're doing this:
where pvd.PatientVisitId = pv.PatientVisitId
tells me you're not using ANSI joins. Stop. Don't do that any more. Put this join condition in the ON clause of a JOIN. It's the year 2016... why are you using join syntax from the last century?
Oh, and prefix the schema on the table names. Look it up--you'll find actual performance reasons why you should do that. It's not just about the time taken to find the correct schema, but also about the execution plan cache...
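One at a time, answering the last question:
create a table holding a handful of NULL rows and union it in
select top (5) col
from
(
select col from table1
union all -- plain UNION would collapse the NULL rows into one
select nulCol from nullTable
) tt
-- NULLs sort first by default in SQL Server, so push them to the end
order by case when tt.col is null then 1 else 0 end, tt.col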

Select all rows for the first N distinct child table rows

It seems like this should be a common problem with a simple solution, but I haven't found it.
I would like to compute child_order, which is the order of appearance of distinct child table rows as is shown below in the following data:
child_order PK1 PK2 ACCESS ACCESS_ID
1 99 Al NULL NULL
2 55 Charles Accounts 1
2 55 Charles Desktop 2
2 55 Charles Printer 3
2 55 Charles Servers 4
2 55 Charles VMs 5
3 66 Charles Desktop 2
3 66 Charles VMs 5
4 22 Chris Desktop 2
4 22 Chris Printer 3
4 22 Chris Servers 4
5 89 Evan Desktop 2
Retrieved by a query like:
SELECT sub1.*
FROM (
SELECT ??? as child_order, sub2.*
FROM (
SELECT ct.PK1, ct.PK2, pt1.ACCESS, pt1.ACCESS_ID
FROM child_table ct
LEFT JOIN some_linktable lt ON lt.child_id = ct.id
LEFT JOIN parent_table1 pt1 ON lt.parent_id = pt1.id
WHERE ct.PK2 IN ('Charles', 'Evan', 'Al', 'Chris')
ORDER BY ct.PK2, pt1.ACCESS -- Order must be preserved
) sub2
) sub1
WHERE child_order < 10 AND (other_conditions)
I can use subqueries, aggregates, analytics, etc. but not really CTEs/"WITH" statements or temporary tables because of the complexity of generating SQL for them dynamically.
Specifically, I am generating pagination SQL (for several DBMSs) for search results from a query joining several tables.
I am trying to figure out how to simply show the top N rows, not counting repeats due to a join (e.g. Chris counts as only one row. Access shows "Desktop, Printer, Servers").
I've tried DENSE_RANK() OVER (ORDER BY PK1, PK2), but of course I get ranking in PK1 PK2 order, which is useless for the WHERE clause. Al, for example, would get a value higher than 1.
I've tried DENSE_RANK() OVER (ORDER BY PK2, ACCESS), but it enumerates only the search terms, not the child table rows.
I've tried DENSE_RANK() OVER (PARTITION BY PK2, ACCESS ORDER BY (SELECT NULL)) (to get DENSE_RANK to use the row order it is given, which is how I want to rank values) but only "1" is returned.
I'll omit my other "try random stuff"-phase attempts.
I would like to avoid having a SELECT DISTINCT PK1, PK2 WHERE (search) ORDER BY (sortorder) subquery because there may be zero or very many primary key fields so dynamic SQL generation would be tricky and, additionally, I suspect the performance would suck with all the WHERE sub3.field1 = sub2.field1 AND sub3.field2 = sub2.field2... checks.
Despite your misgivings about SELECT DISTINCT, this might be the best choice for a sub-query:
SELECT row_number() OVER (ORDER BY PK2) AS child_order, PK1, PK2
FROM (
SELECT DISTINCT PK1, PK2
FROM child_table
WHERE PK2 IN ('Charles', 'Evan', 'Al', 'Chris')
ORDER BY PK2
LIMIT 9) sub2
The child_order field depends only on table child_table and you want only 9 rows of them, so compute the child_order in a sub-query on that table only. After you have that you can join to the other tables. If you have an index on child_table(PK1, PK2) this should be a very fast index-only search. It takes some of the filtering and the limiting inside, so the enveloping query is much simpler:
SELECT sub1.child_order, PK1, PK2, pt1.ACCESS, pt1.ACCESS_ID
FROM child_table ct
JOIN (
SELECT row_number() OVER (ORDER BY PK2) AS child_order, PK1, PK2
FROM (
SELECT DISTINCT PK1, PK2
FROM child_table
WHERE PK2 IN ('Charles', 'Evan', 'Al', 'Chris')
ORDER BY PK2
LIMIT 9) sub2
) sub1 USING (PK1, PK2)
LEFT JOIN some_linktable lt ON lt.child_id = ct.id
LEFT JOIN parent_table1 pt1 ON lt.parent_id = pt1.id
WHERE <other conditions>
ORDER BY sub1.child_order, pt1.ACCESS; -- Faster to order by int
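Here is a cut-down sketch of that approach, run in SQLite through Python's sqlite3 (3.25+ for ROW_NUMBER). The table contents are a made-up subset of the question's data, the page size is 2 instead of 9 so the cut-off is visible, and I added PK1 to the ORDER BY as an assumed tie-break so the numbering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE child_table (id INTEGER PRIMARY KEY, PK1 INTEGER, PK2 TEXT);
    CREATE TABLE parent_table1 (id INTEGER PRIMARY KEY, ACCESS TEXT);
    CREATE TABLE some_linktable (child_id INTEGER, parent_id INTEGER);
    -- hypothetical subset of the sample data
    INSERT INTO child_table VALUES (1, 99, 'Al'), (2, 55, 'Charles'), (3, 22, 'Chris');
    INSERT INTO parent_table1 VALUES (2, 'Desktop'), (3, 'Printer'), (5, 'VMs');
    INSERT INTO some_linktable VALUES (2, 2), (2, 5), (3, 2), (3, 3);
""")

query = """
SELECT sub1.child_order, ct.PK1, ct.PK2, pt1.ACCESS
FROM child_table ct
JOIN (
    -- number the distinct child rows first, before any join multiplies them
    SELECT ROW_NUMBER() OVER (ORDER BY PK2, PK1) AS child_order, PK1, PK2
    FROM (
        SELECT DISTINCT PK1, PK2
        FROM child_table
        ORDER BY PK2, PK1
        LIMIT 2
    ) sub2
) sub1 ON sub1.PK1 = ct.PK1 AND sub1.PK2 = ct.PK2
LEFT JOIN some_linktable lt ON lt.child_id = ct.id
LEFT JOIN parent_table1 pt1 ON lt.parent_id = pt1.id
ORDER BY sub1.child_order, pt1.ACCESS
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Charles contributes two joined rows but still counts as a single child, and Chris falls outside the two-child page.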

SQL MULTIPLE JOIN-LEFT OUTER JOIN [Find Values between date range]

I have 4 tables to use JOIN query in MSSQL
1. Sales: ItemID, AreaCode, IndID,Cost
2. Ind: AreaCode, IndID, Insite,InLocation
3. ItemPrice: ItemID, AreaID, IndID, ActivationDate, Price
4. Invoice: ItemID, IndID, AreaId, InvoiceDate
I want to take IndID and AreaCode from table 1, use them to find the matching records in the Ind table, and from those records look up the rows in the same table with the same Insite and AreaCode where InLocation = ''.
From those records, get the IndID.
Using that IndID and ItemID, find the Price for that item in the ItemPrice table. The price that applies to a given invoice is the one whose ActivationDate is the most recent on or before the invoice date.
For example, if the ActivationDate is 1st Jan with a Price of $5 and the invoice date is 2nd Jan, the price is $5. But if the price is updated to $7 with an ActivationDate of 2nd Jan and an invoice is created on 3rd Jan, the price should be $7.
I.e.
Getting this:
Looking for something like below
SELECT DISTINCT s.AreaCode,s.IndID,s.cost,
id.IndID, id.AreaCode, id.InLocation, id.InSite,
id2.IndID, id2.AreaCode, id2.InLocation, id2.InSite,
ip.ItemId,ip.price,ip.ActivationDate,
iv.InvoiceDate
from Sales s
LEFT OUTER JOIN Ind id
ON s.IndID= id.IndID
AND s.AreaCode=id.AreaCode
LEFT OUTER JOIN Ind id2
ON id.AreaCode=id2.AreaCode
AND id.IndID=id2.IndID
AND id.InSite=id2.InSite
AND id.InLocation = ''
LEFT OUTER JOIN ItemPrice ip
ON s.ItemId=ip.ItemId
AND id2.AreaCode=ip.AreaCode
AND id2.IndID=ip.IndID
LEFT OUTER JOIN Invoice iv
ON s.ItemId=iv.ItemId
AND iv.InvoiceDate >= ip.ActivationDate
Finally cracked.
select *,
(select top 1 price
from itemPrice
where itemPrice.ItemID = t.ItemID -- only consider prices for the invoiced item
and itemPrice.ActivationDate <= t.InvoiceDate
order by itemPrice.ActivationDate desc) as Price
from Invoice t
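To see the "latest price on or before the invoice date" lookup in action, here is a small sketch in SQLite via Python's sqlite3. Everything about the data is assumed: the year, the item ids, and the second item exist only for illustration, dates are stored as ISO text so they compare correctly as strings, and LIMIT 1 replaces TOP 1. The subquery is also correlated on ItemID, which is my assumption about the intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemPrice (ItemID INTEGER, ActivationDate TEXT, Price INTEGER);
    CREATE TABLE Invoice (ItemID INTEGER, InvoiceDate TEXT);
    -- hypothetical rows: item 1 goes from $5 to $7 on 2 Jan
    INSERT INTO ItemPrice VALUES
        (1, '2016-01-01', 5), (1, '2016-01-02', 7), (2, '2016-01-01', 20);
    INSERT INTO Invoice VALUES
        (1, '2016-01-01'), (1, '2016-01-03'), (2, '2016-01-03');
""")

query = """
SELECT t.ItemID, t.InvoiceDate,
       (SELECT ip.Price
        FROM ItemPrice ip
        WHERE ip.ItemID = t.ItemID
          AND ip.ActivationDate <= t.InvoiceDate
        ORDER BY ip.ActivationDate DESC      -- most recent activation wins
        LIMIT 1) AS Price
FROM Invoice t
ORDER BY t.ItemID, t.InvoiceDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```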

Selecting top 3 distinct records from a table

I have a table like this which contains the columns: country, woeid, sometopic, LastDateTime
Venezuela 23424982 Metoo
Venezuela 23424982 Chaderton
India 25424282 BossAgain
World 1 EL AVIADOR
Venezuela 23424982 ChicagoBurning and so on...
I want each country to appear only once, and only the top 3 rows.
I want the result to look like this, ordered by LastDateTime:
Venezuela 23424982 Chaderton
India 25424282 BossAgain
World 1 EL AVIADOR
I tried like this:
select distinct(country), woeid, sometopic, * from PopularTrends order by LastModifiedTime desc
but this did not work.
Any pointers?
Try this,
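SELECT TOP 3 COUNTRY,WOEID,SOMETOPIC,LASTDATETIME
FROM (SELECT *,
Row_number()
OVER(
PARTITION BY COUNTRY
ORDER BY LASTDATETIME DESC) AS RN -- newest row per country gets RN = 1
FROM #Yourtable)A
WHERE RN = 1
ORDER BY LASTDATETIME DESC -- keep the 3 most recently updated countries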
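A rough, runnable version of the "one row per country, then the newest three" idea, using SQLite from Python (3.25+ for ROW_NUMBER, LIMIT in place of TOP). The timestamps and the extra Japan row are invented so that the three-row cut-off actually removes something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PopularTrends (country TEXT, woeid INTEGER, sometopic TEXT, LastDateTime TEXT)"
)
conn.executemany(
    "INSERT INTO PopularTrends VALUES (?, ?, ?, ?)",
    [("Venezuela", 23424982, "Metoo", "2014-01-01 10:00"),
     ("Venezuela", 23424982, "Chaderton", "2014-01-01 12:00"),
     ("India", 25424282, "BossAgain", "2014-01-01 11:00"),
     ("World", 1, "EL AVIADOR", "2014-01-01 09:00"),
     ("Venezuela", 23424982, "ChicagoBurning", "2014-01-01 08:00"),
     ("Japan", 23424856, "Hanami", "2014-01-01 07:00")],  # hypothetical 4th country
)

query = """
SELECT country, woeid, sometopic
FROM (
    SELECT *,
           -- newest topic within each country gets rn = 1
           ROW_NUMBER() OVER (PARTITION BY country
                              ORDER BY LastDateTime DESC) AS rn
    FROM PopularTrends
) a
WHERE rn = 1
ORDER BY LastDateTime DESC
LIMIT 3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```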
