GROUP BY doesn't contain specific column - sql-server

I have the following statement in MSSQL:
SELECT a, b, MAX(t)
FROM [table]
GROUP BY a, b
I also want to show columns c and d for each row in the result, taken from the row that holds that maximum t. How can I do that?

It sounds like you're looking for ROW_NUMBER() or RANK(). When several rows tie for the highest t, ROW_NUMBER() picks one of them arbitrarily, while RANK() returns all of them. Something like:
;With Ranked as (
SELECT a,b,c,d,t,
ROW_NUMBER() OVER (PARTITION BY a,b
ORDER BY t desc) as rn
FROM [table]
)
SELECT * from Ranked where rn = 1
This returns one row for each unique combination of a and b, taking the other values from the row with the highest t value (and, as noted, this variant keeps only one arbitrary row when there are ties).
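To see the tie behaviour described above, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or newer is needed for window functions); the table name samples and its rows are hypothetical. The same CTE runs once with ROW_NUMBER() and once with RANK().

```python
import sqlite3

# Hypothetical table "samples" standing in for the question's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (a INTEGER, b INTEGER, c TEXT, d TEXT, t INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "c1", "d1", 10),
        (1, 1, "c2", "d2", 30),  # highest t for group (1, 1)
        (1, 2, "c3", "d3", 5),
        (1, 2, "c4", "d4", 5),   # ties with the row above for group (1, 2)
    ],
)


def rows_with_max_t(rank_fn):
    """Return the rows holding the maximum t per (a, b), using the given ranking function."""
    sql = f"""
        WITH ranked AS (
            SELECT a, b, c, d, t,
                   {rank_fn}() OVER (PARTITION BY a, b ORDER BY t DESC) AS rn
            FROM samples
        )
        SELECT a, b, c, d, t FROM ranked WHERE rn = 1 ORDER BY a, b, c
    """
    return conn.execute(sql).fetchall()


one_per_group = rows_with_max_t("ROW_NUMBER")  # ties: a single, arbitrary row
all_ties = rows_with_max_t("RANK")             # ties: every tied row is kept
print(one_per_group)
print(all_ties)
```

With ROW_NUMBER() the (1, 2) group returns a single row, and which of the two tied rows you get is not defined; with RANK() both tied rows come back.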

Related

How to limit result for each item in IN list

I found similar questions, but none was clear enough to solve my problem.
I need to find the top 3 for each of the groups.
This query works, but only for one group. I need it to work for an 'IN (...)' list.
SELECT top 3 a.make, a.model, a.colour, b.price, c.dealerCode FROM Dealers c
INNER JOIN Cars a on a.make=c.make
INNER JOIN Prices b on a.make=b.make
WHERE c.dealerName='A' and a.make='VW' and b.price_range='X1'
Ideally I want to pass my arguments as IN lists, like this, and get the top 3 for each of them:
WHERE c.dealerName IN ('A', 'B', 'C') and a.make IN ('VW','FIAT','Volvo') and b.price_range ='X1'
Although this would work with IN, it would only give me the top 3 of the entire list, while I need the top 3 of each dealer/make combination.
(slightly simplified example)
(for the moment I need just the top 3 of each, in whatever order they come)
I suspect I need to use GROUP BY, but I can't get it to work.
Thanks
It seems you're looking for something like this. In a common table expression (CTE), a ROW_NUMBER() sequence is assigned within each unique combination (PARTITION BY) of c.dealerName, a.make and b.price_range, with no particular ordering, i.e. ORDER BY (SELECT NULL). To remove duplicates, only one row is kept per (dealerName, make, price_range) triplet. On the de-duplicated result of the first CTE, a second row number is assigned within each unique combination of dealerName and make. Finally, the outer query picks the 'top 3' for each (dealerName, make) group by limiting the dm_rn row number to <= 3.
with
rn_cte as (
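select c.dealerName, c.dealerCode, a.make, a.model, a.colour, b.price, b.price_range, -- explicit list: "select *" would repeat "make" from all three tables, which a CTE rejects
row_number() over (partition by c.dealerName, a.make, b.price_range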
order by (select null)) rn
from Dealers c
join Cars a on a.make=c.make
join Prices b on a.make=b.make
where c.dealerName IN ('A', 'B', 'C')
and a.make IN ('VW','FIAT','Volvo')
and b.price_range ='X1'),
dm_cte as (
select *, row_number() over (partition by dealerName, make
order by (select null)) dm_rn
from rn_cte
where rn=1)
select make, model, colour, price, dealerCode
from dm_cte
where dm_rn<=3
order by dealerName, make, price_range;
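The core "top N per group" idea can be sketched in a single step: number the joined rows within each (dealerName, make) partition and keep rn <= 3. This is a simplified variant, not the two-CTE query above. It runs on SQLite through Python's sqlite3 module (3.25+ assumed), orders each partition by model only so the output is predictable, and all sample rows are hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dealers (dealerName TEXT, dealerCode TEXT, make TEXT);
    CREATE TABLE Cars    (make TEXT, model TEXT, colour TEXT);
    CREATE TABLE Prices  (make TEXT, price_range TEXT, price INTEGER);
    INSERT INTO Dealers VALUES ('A', 'D1', 'VW'), ('A', 'D1', 'FIAT'), ('B', 'D2', 'VW');
    INSERT INTO Cars VALUES ('VW', 'Golf', 'red'), ('VW', 'Polo', 'blue'),
                            ('VW', 'Passat', 'black'), ('VW', 'Up', 'white'),
                            ('FIAT', 'Panda', 'green');
    INSERT INTO Prices VALUES ('VW', 'X1', 20000), ('FIAT', 'X1', 12000), ('VW', 'X2', 30000);
""")

TOP_3_PER_DEALER_MAKE = """
    WITH numbered AS (
        SELECT c.dealerName, c.dealerCode, a.make, a.model, a.colour, b.price,
               ROW_NUMBER() OVER (PARTITION BY c.dealerName, a.make
                                  ORDER BY a.model) AS rn  -- restarts at 1 per group
        FROM Dealers c
        JOIN Cars a ON a.make = c.make
        JOIN Prices b ON a.make = b.make
        WHERE c.dealerName IN ('A', 'B', 'C')
          AND a.make IN ('VW', 'FIAT', 'Volvo')
          AND b.price_range = 'X1'
    )
    SELECT dealerName, make, model, colour, price, dealerCode
    FROM numbered
    WHERE rn <= 3
    ORDER BY dealerName, make, model
"""

rows = conn.execute(TOP_3_PER_DEALER_MAKE).fetchall()
per_group = Counter((r[0], r[1]) for r in rows)  # rows kept per (dealer, make)
print(per_group)
```

Dealer A and dealer B each have four VW rows, and both are cut to three; the single FIAT row is kept as is.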

only display one row when key field is the same

I have created a key field (C) by concatenating two columns (A and B). I want to run a query that returns only the first row for each distinct value of column C.
Sample data:
A B C D
10022 Blue 10022Blue Buggy
10300 Red 10300Red Noodle
10300 Red 10300Red Sammy
So I only want one row to show for 10300Red.
Cheers
One way to do it is with a CTE and ROW_NUMBER():
;WITH CTE AS
(
SELECT A,
B,
C,
D,
ROW_NUMBER() OVER(PARTITION BY C ORDER BY (SELECT NULL)) rn
FROM [Table]
)
SELECT A, B, C, D
FROM CTE
WHERE rn = 1
Note: you did say you want the "first" record, but you didn't specify an order for the records. Since tables in a relational database are unordered by nature, "first" is simply an arbitrary row unless you give an ORDER BY, hence ORDER BY (SELECT NULL).
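A runnable sketch of the ROW_NUMBER() approach on the sample data from the question, assuming Python's sqlite3 module (SQLite 3.25+) in place of SQL Server. The table name items is hypothetical, and the window is ordered by D rather than (SELECT NULL) only so the kept row is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (A INTEGER, B TEXT, C TEXT, D TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (10022, "Blue", "10022Blue", "Buggy"),
        (10300, "Red", "10300Red", "Noodle"),
        (10300, "Red", "10300Red", "Sammy"),
    ],
)

FIRST_ROW_PER_KEY = """
    WITH cte AS (
        SELECT A, B, C, D,
               ROW_NUMBER() OVER (PARTITION BY C ORDER BY D) AS rn  -- 1 = kept row per C
        FROM items
    )
    SELECT A, B, C, D FROM cte WHERE rn = 1 ORDER BY C
"""

result = conn.execute(FIRST_ROW_PER_KEY).fetchall()
print(result)
```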
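Do it this way (leaving out column D, whose values differ between the duplicate rows, so DISTINCT can collapse them):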
select distinct A, B, C from tablename
You can build the result set by grouping on C, then joining the grouped result back to the main table to get the full row:
SELECT
A.*
FROM
YourTable A INNER JOIN
(
SELECT
G.C,
MAX(G.D) D
FROM
YourTable G
GROUP BY
G.C
) B ON A.C = B.C AND A.D = B.D
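The group-then-join-back pattern on the same sample data, sketched with Python's sqlite3 module as an assumed stand-in for SQL Server; the SQL is the answer's, with an ORDER BY added for a stable result. Here MAX(D) keeps the Sammy row for 10300Red, and if two rows shared the same C and the same maximum D, both would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (A INTEGER, B TEXT, C TEXT, D TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (10022, "Blue", "10022Blue", "Buggy"),
        (10300, "Red", "10300Red", "Noodle"),
        (10300, "Red", "10300Red", "Sammy"),
    ],
)

GROUP_THEN_JOIN = """
    SELECT A.*
    FROM YourTable A
    INNER JOIN (
        SELECT G.C, MAX(G.D) AS D      -- one (C, largest D) pair per key
        FROM YourTable G
        GROUP BY G.C
    ) B ON A.C = B.C AND A.D = B.D     -- join back to fetch the full row
    ORDER BY A.C
"""

joined = conn.execute(GROUP_THEN_JOIN).fetchall()
print(joined)
```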

DISTINCT Query for One Column but not any other columns

I have a small two-column table; let's say the columns are A and B. Column A needs to be distinct so that no value is displayed twice. Column B needs to have every value selected, so if there are multiple B values for one value of A, all of them are displayed. How can I write a query that does this?
While the duplicates are now gone, there is a lot of blank space in my dropdown.
You could use a CTE to simplify it:
WITH CTE AS
(
SELECT A, B,
RN = ROW_NUMBER() OVER (PARTITION BY A ORDER BY A, B)
FROM dbo.TableName
)
SELECT A = CASE WHEN RN = 1 THEN Cast(A as varchar(50)) ELSE '' END,
B
FROM CTE
ORDER BY CTE.A, CTE.RN -- keeps each blanked row directly under the first row of its group
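A sketch of the blank-repeated-values idea, assuming Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server. The sample rows are hypothetical, CAST(A AS TEXT) replaces the varchar(50) cast, and the CASE result is aliased A_display to keep it apart from the source column. The ORDER BY matters: without it the blank rows are not guaranteed to follow their first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (A INTEGER, B TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?)",
    [(1, "apple"), (1, "avocado"), (2, "banana")],
)

SHOW_A_ONCE = """
    WITH cte AS (
        SELECT A, B,
               ROW_NUMBER() OVER (PARTITION BY A ORDER BY B) AS rn
        FROM TableName
    )
    SELECT CASE WHEN rn = 1 THEN CAST(A AS TEXT) ELSE '' END AS A_display,  -- A shown once
           B
    FROM cte
    ORDER BY cte.A, cte.rn
"""

display_rows = conn.execute(SHOW_A_ONCE).fetchall()
print(display_rows)
```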

SQL syntax for complex GROUP BY with OVER statement: calculating Gini coefficient for multiple sets

I want to calculate the Gini coefficient for a number of sets, contained in a two-column table (here called #cits) that holds a value and a set ID. I have been experimenting with different Gini coefficient calculations, described here (StackExchange query) and here (StackOverflow question with some good replies). Both examples calculate only one coefficient for one table, whereas I would like to compute one per set with a GROUP BY clause.
The #cits table contains two columns, c and cid, being the value and set-ID respectively.
Here is my current try (incomplete):
select count(c) as numC,
sum(c) as totalC,
(select row_number() over(order by c asc, cid) id, c from #cits) as a
from #cits group by cid
Selecting numC and totalC works well, of course, but the next line is giving me a headache. I can see that the syntax is wrong, but I can't figure out how to assign the row_number() to each c within each cid.
EDIT:
Based on the suggestions, I used partition, like so:
select cid,sumC = sum(a.id * a.c)
into #srep
from (
select cid,row_number() over (partition by cid order by c asc) id,
c
from #cits
) as a
group by a.cid
select count(c) as numC,
sum(c) as totalC, b.sumC
into #gtmp
from #cits a
join #srep b
on a.cid = b.cid
group by a.cid,b.sumC
select
gini = 2 * sumC / (totalC * numC) - (numC - 1) / numC
from #gtmp
This almost works. I get a result, but it is > 1, which is unexpected, as the Gini coefficient should be between 0 and 1. As stated in the comments, I would have preferred a one-query solution as well, but that is not a major issue.
You can partition the data so that row numbering starts over for each ID, but I'm not sure this is what you're after.
I'm assuming you want cid displayed, since you are grouping by it.
select count(c) as numC
, sum(c) as totalC
, row_number() over(partition by cID order by sum(c) asc) as a -- c itself is not grouped, so order by an aggregate
, cid
from #cits group by cid
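Note that you don't need the subquery.
That said, this likely isn't what you want: after grouping there is only one row per cid, so the row number is always 1, as the output shows.
Output: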
NumC TotalC A CID
24 383 1 1
15 232 1 2
If I'm understanding correctly, you need numC and totalC for each c in a cid set, as well as the position of each c within that set. This should get you what you need:
select
rn.cid,
rn.c,
row_number() over (partition by rn.cid order by rn.c) as id,
agg.numC,
agg.totalC
from #cits rn
left outer join
(
select
cid,
count(c) as numC,
sum(c) as totalC
from #cits
group by cid
) agg
on rn.cid = agg.cid
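For reference, a commonly used rank-based form of the Gini coefficient, with the values of a set sorted ascending and ranked i = 1..n, is G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n. For non-negative values with a positive total it stays between 0 and (n - 1) / n. The query in the question subtracts (numC - 1) / numC instead, which adds exactly 2 / n to the result and can push it above 1 (for the set {0, 1} it gives 1.5 instead of 0.5). Also, if c is an integer column, T-SQL evaluates both divisions there as integer division. The sketch below combines the per-row id, numC and totalC from the answer above into one query; it assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in, and the table name cits and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cits (c INTEGER, cid INTEGER)")
conn.executemany(
    "INSERT INTO cits VALUES (?, ?)",
    [(5, 1), (5, 1), (5, 1),  # cid 1: perfectly equal values
     (0, 2), (1, 2)],         # cid 2: maximally unequal for n = 2
)

GINI_PER_CID = """
    WITH ranked AS (
        SELECT cid, c,
               ROW_NUMBER() OVER (PARTITION BY cid ORDER BY c) AS id  -- rank i within each set
        FROM cits
    ),
    agg AS (
        SELECT cid,
               COUNT(*) * 1.0   AS numC,    -- * 1.0 forces real (not integer) division
               SUM(c) * 1.0     AS totalC,
               SUM(id * c) * 1.0 AS sumC
        FROM ranked
        GROUP BY cid
    )
    SELECT cid,
           2 * sumC / (totalC * numC) - (numC + 1) / numC AS gini
    FROM agg
    ORDER BY cid
"""

gini = dict(conn.execute(GINI_PER_CID).fetchall())
print(gini)
```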

Delete duplicates MS-SQL with minimum date and multiple keys

I have a big table with duplicates, with the following columns:
number (primary key),
group (primary key),
Division (primary key),
dateChange.
Example:
1,2,3,20121015
1,2,3,20120101
1,2,3,20110101
2,2,2,20121010
2,2,2,20120101
The result should be:
1,2,3,20121015
2,2,2,20121010
I have tried many combinations, including grouping by the key columns with the minimum "dateChange",
but nothing seems to work perfectly.
I want to have something like this:
delete from table where (number, group, division, dateChange) not in
(select number, group, division, max(dateChange) from table
group by number, group, division)
But I don't think this is valid MS-SQL syntax (SQL Server does not accept a multi-column tuple on the left of NOT IN).
Any help would be much appreciated!
To delete all rows except the latest one for each (number, group, Division) combination:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY number, [group], Division
ORDER BY dateChange DESC) RN
FROM YourTable)
DELETE FROM cte
WHERE RN > 1
The following should work.
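delete t
from [table] t
inner join (select
number, [group], division, dateChange, row_number() over
(partition by number, [group], division order by dateChange desc) as ranker
from [table]) Z
on t.number = Z.number and t.[group] = Z.[group]
and t.division = Z.division -- without this, rows from another division could be deleted
and t.dateChange = Z.dateChange and Z.ranker != 1
-- note: rows that are exact duplicates (same dateChange too) all match the ranker > 1 row and are all deleted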
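A sketch of the "keep only the latest row per key" delete on the sample data from the question. It assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for SQL Server. SQLite cannot delete through a CTE the way SQL Server can, so this variant numbers the rows the same way and deletes by rowid instead; YourTable is the placeholder name used in the CTE answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE YourTable (number INTEGER, "group" INTEGER, '
             'Division INTEGER, dateChange INTEGER)')  # "group" must be quoted
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, 2, 3, 20121015),
        (1, 2, 3, 20120101),
        (1, 2, 3, 20110101),
        (2, 2, 2, 20121010),
        (2, 2, 2, 20120101),
    ],
)

DELETE_ALL_BUT_LATEST = """
    DELETE FROM YourTable
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY number, "group", Division
                                      ORDER BY dateChange DESC) AS rn
            FROM YourTable
        )
        WHERE rn > 1  -- every row except the newest one per key
    )
"""

conn.execute(DELETE_ALL_BUT_LATEST)
remaining = conn.execute(
    'SELECT number, "group", Division, dateChange FROM YourTable ORDER BY number'
).fetchall()
print(remaining)
```

Because each physical row has its own rowid, exact duplicate rows are handled correctly here: one of them survives.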
