Select subset of duplicate records in SQL Server - sql-server

I need to select a subset of duplicate records in SQL Server 2016. Below is the data set and the code used. I need to select only the duplicates highlighted in red. Basically, I need only those duplicate records that have matching LName, FName, DateOfBirth, and StreetAddress values and have NULL in Source. At the same time, I need only those records that also match, on the fields mentioned above, a record with a Source value of "Company XYZ".
IF OBJECT_ID('tempdb..#Dataset') IS NOT NULL DROP TABLE #Dataset
GO
create table #Dataset
(
ID int not null,
LName varchar(50) null,
Fname varchar(50) null,
DateOfBirth varchar(50) null,
StreetAddress varchar(50) null,
Source varchar(50) null,
)
insert into #Dataset (ID, LName, Fname, DateOfBirth, StreetAddress, Source)
values
('1', 'John', 'Ganske', '37171', ' 1223 Sunrise St', 'Company XYZ'),
('2', 'John', 'Ganske', '37171', ' 1233 Sunrise St', 'Company XYZ'),
('4', 'Brent', 'Paine', '20723', ' 5443 Fox Dr', Null),
('3', 'Brent', 'Paine', '20723', ' 5443 Fox Dr', 'Company XYZ'),
('5', 'Adam', 'Smith', '22805', ' 1254 Lake Ridge Ct', Null),
('6', 'Adam', 'Smith', '22805', ' 1254 Lake Ridge Ct', Null),
('7', 'Adam', 'Smith', '22805', ' 1254 Lake Ridge Ct', 'Company XYZ'),
('8', 'Timothy', 'Johnson', '36165', ' 1278 Lee H-W', Null),
('9', 'Timothy', 'Johnson', '36165', ' 1278 Lee H-W', Null),
('10', 'Judy', 'Wilson', '32579', ' 5678 Dotties Dr', 'Company XYZ'),
('12', 'Peter', 'Pan', '37507', NULL, Null),
('11', 'Peter', 'Pan', '37507', NULL, 'Company XYZ');
--select * from #Dataset
select d.ID, d.LName, d.Fname, d.DateOfBirth, d.StreetAddress, d.Source
from #Dataset d
inner join (select
LName, Fname, DateOfBirth, StreetAddress
from #Dataset
--where Source is not null
group by
LName, Fname, DateOfBirth, StreetAddress
having count(*) > 1 ) b
on d.LName = b.LName
and
d.Fname = b.Fname
and
d.DateOfBirth = b.DateOfBirth
and
d.StreetAddress = b.StreetAddress
left outer join (select min(ID) as ID from #Dataset
group by LName, Fname, DateOfBirth, StreetAddress
having count(*) > 1 ) c
on d.ID = c.ID
My output looks like this one below:

You could use ROW_NUMBER:
WITH cte AS (
SELECT *,ROW_NUMBER() OVER(PARTITION BY LName,Fname,DateOfBirth,StreetAddress
ORDER BY ID DESC) rn
FROM #Dataset
)
SELECT *
FROM cte
WHERE rn > 1
ORDER BY ID;
EDIT:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY LName, Fname, DateOfBirth, StreetAddress
ORDER BY ID DESC) rn,
SUM(CASE WHEN Source = 'Company XYZ' THEN 1 ELSE 0 END)
OVER(PARTITION BY LName, Fname, DateOfBirth, StreetAddress) AS cnt
FROM #Dataset
)
SELECT *
FROM cte
WHERE rn > 1
AND cnt > 0
AND [Source] IS NULL
ORDER BY ID;
EDIT 2:
WITH cte AS (
SELECT *,
SUM(CASE WHEN Source IS NULL THEN 1 ELSE 0 END) OVER(PARTITION BY LName, Fname, DateOfBirth, StreetAddress) c1,
SUM(CASE WHEN Source = 'Company XYZ' THEN 1 ELSE 0 END) OVER(PARTITION BY LName, Fname, DateOfBirth, StreetAddress) AS c2,
COUNT(*) OVER(PARTITION BY LName, Fname, DateOfBirth, StreetAddress) c3
FROM #Dataset
)
SELECT *
FROM cte
WHERE c1 > 0
AND c2 > 0
AND c3 > 1
AND Source IS NULL
ORDER BY ID;
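As a rough alternative to the windowed counts, the same requirement can be read as "NULL-source rows for which a matching 'Company XYZ' row exists". The sketch below assumes the #Dataset temp table from the question is still in scope, and assumes that two NULL StreetAddress values should count as a match (which is how PARTITION BY treats them). The INTERSECT inside EXISTS is there only to get that NULL-safe comparison; a plain = would drop the Peter Pan row.

```sql
-- Sketch only: NULL-source rows that have a "Company XYZ" twin.
SELECT d.ID, d.LName, d.Fname, d.DateOfBirth, d.StreetAddress, d.Source
FROM #Dataset AS d
WHERE d.Source IS NULL
  AND EXISTS (
        SELECT 1
        FROM #Dataset AS x
        WHERE x.Source = 'Company XYZ'
          -- INTERSECT treats NULL = NULL as a match, unlike the = operator
          AND EXISTS (SELECT d.LName, d.Fname, d.DateOfBirth, d.StreetAddress
                      INTERSECT
                      SELECT x.LName, x.Fname, x.DateOfBirth, x.StreetAddress)
      )
ORDER BY d.ID;
```

On the sample data this should return IDs 4, 5, 6 and 12, the same rows as the EDIT 2 query.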

Related

Repeated data on inserted rows

--demo setup
drop table if exists dbo.product
go
create table dbo.Product
(
ProductId int,
ProductTitle varchar(55),
ProductCategory varchar(255),
Loaddate datetime
)
insert into dbo.Product
values (1, 'Table', 'ABCD', '3/4/2018'),
(1, 'Table', 'ABCD', '3/5/2018'),
(1, 'Table', 'ABCD', '3/6/2018'),
(1, 'Table', 'XYZ', '3/7/2018'),
(1, 'Table', 'XYZ', '3/8/2018'),
(1, 'Table', 'XYZ', '3/9/2018'),
(1, 'Table', 'GHI', '3/10/2018'),
(1, 'Table', 'GHI', '3/11/2018'),
(1, 'Table', 'XYZ', '3/12/2018'),
(1, 'Table', 'XYZ', '3/13/2018')
SELECT
product.productid,
product.producttitle,
product.productcategory,
MIN(product.loaddate) AS BeginDate,
-- ,max(product.LoadDate) as BeginDate1
CASE
WHEN MAX(product.loaddate) = MAX(oa.enddate1)
THEN '12/31/9999'
ELSE MAX(product.loaddate)
END AS EndDate
FROM
dbo.product product
CROSS APPLY
(SELECT MAX(subproduct.loaddate) EndDate1
FROM dbo.product subproduct
WHERE subproduct.productid = product.productid) oa
GROUP BY
productid, producttitle, productcategory
Output:
productid  producttitle  productcategory  BeginDate                EndDate
1          Table         ABCD             2018-03-04 00:00:00.000  2018-03-06 00:00:00.000
1          Table         XYZ              2018-03-07 00:00:00.000  9999-12-31 00:00:00.000
1          Table         GHI              2018-03-10 00:00:00.000  2018-03-11 00:00:00.000
Desired output:
productid  producttitle  productcategory  BeginDate                EndDate
1          Table         ABCD             2018-03-04 00:00:00.000  2018-03-06 00:00:00.000
1          Table         XYZ              2018-03-07 00:00:00.000  2018-03-09 00:00:00.000
1          Table         GHI              2018-03-10 00:00:00.000  2018-03-11 00:00:00.000
1          Table         XYZ              2018-03-12 00:00:00.000  9999-12-31 00:00:00.000
The last two inserted rows repeat the data from Loaddate '3/7/2018'-'3/9/2018'; this doesn't happen if none of the newly inserted rows repeats earlier data. The only thing that changes is the LoadDate, which gives me incorrect output. How can I get the desired output?
Well, first of all, you need a sequence number over all your records. If you already have a primary key, that's good. In the example you gave us there's no such column, so let's generate it.
Then we pair each row with the next one to get start and end dates for each product category change. The other step is to group these category changes into runs.
Finally, we do a simple GROUP BY:
;
with cte as ( select *,
row_number() over(partition by ProductId order by Loaddate) as rn
from product
), cte2 as ( select t1.ProductId,
t1.ProductTitle,
t1.ProductCategory,
t1.Loaddate as BeginDate,
case
when t1.ProductCategory <> t2.ProductCategory
then t1.Loaddate
else coalesce(t2.Loaddate, null)
end as EndDate,
row_number() over(order by t1.ProductId, t1.Loaddate) as rn_overall,
row_number() over(partition by t1.ProductId, t1.ProductCategory order by t1.Loaddate) as rn_category
from cte as t1
left join cte as t2
on t2.ProductId = t1.ProductId
and t2.rn = t1.rn + 1
), cte3 as ( select *,
min(rn_overall) over (partition by ProductId, ProductCategory, rn_overall - rn_category) as product_group
from cte2
)
select ProductId, ProductTitle, ProductCategory,
min(BeginDate) as BeginDate,
case
when max(case when EndDate is null then 1 else 0 end) = 0
then max(EndDate)
else null
end as EndDate
from cte3
group by ProductId, ProductTitle, ProductCategory, product_group
order by ProductId, BeginDate
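For comparison, here is a shorter sketch of the same "gaps and islands" idea. It assumes the dbo.Product demo table from the question exists, and it swaps the self-join for the difference of two ROW_NUMBER() values: within one unbroken run of a category that difference stays constant, so it can serve as the run identifier. The names numbered, island and LastLoad are made up for this example. Unlike the query above, it writes 9999-12-31 for the still-open last run, as the desired output asks.

```sql
WITH numbered AS (
    SELECT ProductId, ProductTitle, ProductCategory, Loaddate,
           -- constant within each consecutive run of one category
           ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY Loaddate)
         - ROW_NUMBER() OVER (PARTITION BY ProductId, ProductCategory
                              ORDER BY Loaddate) AS island,
           -- latest load per product, used to spot the open-ended run
           MAX(Loaddate) OVER (PARTITION BY ProductId) AS LastLoad
    FROM dbo.Product
)
SELECT ProductId, ProductTitle, ProductCategory,
       MIN(Loaddate) AS BeginDate,
       CASE WHEN MAX(Loaddate) = MAX(LastLoad)
            THEN CAST('99991231' AS datetime)   -- run is still current
            ELSE MAX(Loaddate)
       END AS EndDate
FROM numbered
GROUP BY ProductId, ProductTitle, ProductCategory, island
ORDER BY ProductId, BeginDate;
```

On the sample rows this should give four intervals: ABCD 3/4 to 3/6, XYZ 3/7 to 3/9, GHI 3/10 to 3/11, and XYZ 3/12 to 9999-12-31.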

Use min() when row_number() is descending

I have a @Tickets table variable with 2 open tickets for JFK.
declare @Tickets table
(
Airport varchar(10),
TicketNum varchar(10),
Created date,
Modified date,
LastModified date,
Modified_By varchar(10),
TicketStatus varchar(10),
AssignedTo varchar(10)
)
insert into @Tickets
select 'JFK', '001', '9/25/2021', '9/26/2021', '9/29/2021', 'Jimmy', 'Open', 'Ralph' union
select 'JFK', '002', '9/28/2021', '9/28/2021', '9/30/2021', 'Mary', 'Open', 'Andrew'
select Airport, lastmodified, assignedto, Modified_By
from
(
select airport, lastmodified, assignedto, Modified_By,
row_number() over(partition by airport order by lastmodified desc) rn
from @Tickets
) src
where rn = 1
The query above returns the last modified date of JFK tickets (9/30/2021), the last person that modified any of the JFK tickets (Mary) and the owner of the ticket that Mary last modified (Andrew).
Airport lastmodified assignedto Modified_By
JFK 2021-09-30 Andrew Mary
What I can't figure out is how to also show min(Created), i.e. when the first ticket was created.
The complete result should be
Airport First_Created lastmodified assignedto Modified_By
JFK 2021-09-25 2021-09-30 Andrew Mary
How can I plug in min(Created) as 'First_Created' in the query above?
I'm sure I could use two CTEs like below and then a join, but I prefer not to use joins unless there are no other options:
declare @Tickets table
(
Airport varchar(10),
TicketNum varchar(10),
Created date,
Modified date,
LastModified date,
Modified_By varchar(10),
TicketStatus varchar(10),
AssignedTo varchar(10)
)
insert into @Tickets
select 'JFK', '001', '9/25/2021', '9/26/2021', '9/29/2021', 'Jimmy', 'Open', 'Ralph' union
select 'EWR', '001', '9/25/2021', '9/26/2021', '9/29/2021', 'Jimmy', 'Open', 'Ralph' union
select 'STI', '001', '9/25/2021', '9/26/2021', '9/29/2021', 'Jimmy', 'Open', 'Ralph' union
select 'JFK', '002', '9/28/2021', '9/28/2021', '9/30/2021', 'Mary', 'Open', 'Andrew'
;with cte as
(
select Airport, lastmodified, assignedto, Modified_By
from
(
select airport, lastmodified, assignedto, Modified_By,
row_number() over(partition by airport order by lastmodified desc) rn
from @Tickets
) src
where rn = 1
), cte2 as
(
select airport, min(created) as 'created' from @Tickets group by airport
)
select cte.Airport, lastmodified, assignedto, Modified_By, created
from cte inner join cte2 on
cte.Airport = cte2.Airport
You could try something like this. The query selects the latest modified row (partitioned by Airport) and uses CROSS APPLY to fetch the earliest Created date.
with last_mod_cte as (
select top 1 with ties *
from @Tickets
order by row_number() over(partition by airport order by lastmodified desc))
select lm.Airport, min_created.dt First_Created, lm.LastModified, lm.AssignedTo, lm.Modified_By
from last_mod_cte lm
cross apply (select min(Created)
from @Tickets t
where lm.Airport=t.Airport) min_created(dt);
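If the aim is to avoid a join or apply altogether, a windowed MIN() can sit next to the ROW_NUMBER() in the original derived table, because both are evaluated before the rn = 1 filter is applied. This is only a sketch against the two JFK rows from the question; the table variable is redeclared so the batch runs on its own, and the date literals are written as yyyymmdd to avoid depending on the session's date format.

```sql
declare @Tickets table
(
    Airport varchar(10), TicketNum varchar(10), Created date, Modified date,
    LastModified date, Modified_By varchar(10), TicketStatus varchar(10),
    AssignedTo varchar(10)
);

insert into @Tickets values
    ('JFK', '001', '20210925', '20210926', '20210929', 'Jimmy', 'Open', 'Ralph'),
    ('JFK', '002', '20210928', '20210928', '20210930', 'Mary',  'Open', 'Andrew');

select Airport, First_Created, LastModified, AssignedTo, Modified_By
from
(
    select Airport, LastModified, AssignedTo, Modified_By,
           -- earliest Created across the whole airport, repeated on every row
           min(Created) over (partition by Airport) as First_Created,
           row_number() over (partition by Airport order by LastModified desc) as rn
    from @Tickets
) src
where rn = 1;
-- JFK  2021-09-25  2021-09-30  Andrew  Mary
```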

Update column with 4 consecutive purchases

I need to update the Result column to yes on all of a user's rows if that user made 4 consecutive purchases without receiving a bonus in between. How can this be done? Please see my code below.
-- drop table #Test
CREATE TABLE #Test (UserID int, TheType VARCHAR(10), TheDate DATETIME, Result VARCHAR(10))
INSERT INTO #Test
SELECT 1234, 'Bonus', GETDATE(), NULL
UNION
SELECT 1234, 'Purchase', GETDATE()-1, NULL
UNION
SELECT 1234, 'Purchase', GETDATE()-2, NULL
UNION
SELECT 1234, 'Purchase', GETDATE()-3, NULL
UNION
SELECT 1234, 'Purchase', GETDATE()-4, NULL
UNION
SELECT 1234, 'Bonus', GETDATE()-5, NULL
UNION
SELECT 1234, 'Purchase', GETDATE()-6, NULL
UNION
SELECT 1234, 'Bonus', GETDATE()-7, NULL
SELECT * FROM #Test ORDER BY TheDate
Again, please note that the purchases need to be consecutive (By TheDate)
You can do it as below:
;WITH CTE1
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY TheDate) RowId,
ROW_NUMBER() OVER (PARTITION BY UserID,TheType ORDER BY TheDate) PurchaseRowId,
*
FROM #Test
), CTE2
AS
(
SELECT
MIN(A.RowId) MinId,
MAX(A.RowId) MaxId
FROM
CTE1 A
GROUP BY
A.TheType,
A.RowId - A.PurchaseRowId
)
SELECT
A.UserID ,
A.TheType ,
A.TheDate ,
CASE WHEN B.MinId IS NULL THEN NULL ELSE 'YES' END Result
FROM
CTE1 A LEFT JOIN
CTE2 B ON A.RowId >= B.MinId AND A.RowId <= B.MaxId AND (B.MaxId - B.MinId) > 2
--AND A.TheType = 'Purchase'
ORDER BY A.TheDate
Result:
UserID TheType TheDate Result
----------- ---------- ----------------------- ------
1234 Bonus 2017-06-06 11:06:03.130 NULL
1234 Purchase 2017-06-07 11:06:03.130 NULL
1234 Bonus 2017-06-08 11:06:03.130 NULL
1234 Purchase 2017-06-09 11:06:03.130 YES
1234 Purchase 2017-06-10 11:06:03.130 YES
1234 Purchase 2017-06-11 11:06:03.130 YES
1234 Purchase 2017-06-12 11:06:03.130 YES
1234 Bonus 2017-06-13 11:06:03.130 NULL
First you have to derive a grouping column, then group by it (having count = 4), and inner join the result with the original table.
drop table if exists #Test;
create table #Test
(
UserID int
, TheType varchar(10)
, TheDate date
, Result varchar(10)
);
insert into #Test
select 1234, 'Bonus', getdate(), null
union
select 1234, 'Purchase', getdate() - 1, null
union
select 1234, 'Purchase', getdate() - 2, null
union
select 1234, 'Purchase', getdate() - 3, null
union
select 1234, 'Purchase', getdate() - 4, null
union
select 1234, 'Bonus', getdate() - 5, null
union
select 1234, 'Purchase', getdate() - 6, null
union
select 1234, 'Bonus', getdate() - 7, null;
drop table if exists #temp;
select
*
, lag(t.TheDate, 1) over ( order by t.TheDate ) as Lag01
, lag(t.TheType, 1) over ( order by t.TheDate ) as LagType
into
#temp
from #Test t;
with cteHierarchy
as
(
select
UserID
, TheType
, TheDate
, Result
, Lag01
, t.TheDate as Root
from #temp t
where t.LagType <> t.TheType
union all
select
t.UserID
, t.TheType
, t.TheDate
, t.Result
, t.Lag01
, cte.Root as Root
from #temp t
inner join cteHierarchy cte on t.Lag01 = cte.TheDate
and t.TheType = cte.TheType
)
update test
set
Result = 'Yes'
from (
select
t.Root
, count(t.UserID) as Cnt
, t.UserID
from cteHierarchy t
group by t.UserID, t.Root
having count(t.UserID) = 4
) tt
inner join #Test test on tt.UserID = test.UserID
select * from #Test t
order by t.TheDate;
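The "derive a group, count it, join back" steps can also be sketched without recursion by using the difference of two ROW_NUMBER() values as the run identifier. The version below is an illustration, not a drop-in replacement: it rebuilds #Test with fixed dates taken from the result listing above, it treats "4 consecutive purchases" as "at least 4" (an assumption), and it marks every row of a qualifying user, as the question asks. The names marked, qualifying and island are invented for the example.

```sql
DROP TABLE IF EXISTS #Test;   -- SQL Server 2016+
CREATE TABLE #Test (UserID int, TheType varchar(10), TheDate date, Result varchar(10));

INSERT INTO #Test (UserID, TheType, TheDate) VALUES
    (1234, 'Bonus',    '20170606'), (1234, 'Purchase', '20170607'),
    (1234, 'Bonus',    '20170608'), (1234, 'Purchase', '20170609'),
    (1234, 'Purchase', '20170610'), (1234, 'Purchase', '20170611'),
    (1234, 'Purchase', '20170612'), (1234, 'Bonus',    '20170613');

WITH marked AS (
    SELECT UserID, TheType,
           -- stays constant across an unbroken run of the same type
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY TheDate)
         - ROW_NUMBER() OVER (PARTITION BY UserID, TheType ORDER BY TheDate) AS island
    FROM #Test
),
qualifying AS (
    SELECT DISTINCT UserID
    FROM marked
    WHERE TheType = 'Purchase'
    GROUP BY UserID, island
    HAVING COUNT(*) >= 4          -- assumption: four or more in a row
)
UPDATE t
SET Result = 'Yes'
FROM #Test AS t
JOIN qualifying AS q ON q.UserID = t.UserID;

SELECT * FROM #Test ORDER BY TheDate;
```

With these rows, the run from 2017-06-09 to 2017-06-12 has four purchases, so all eight rows for user 1234 should end up with Result = 'Yes'.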

Grouping similar items recursively

I have been reading the following Microsoft article on recursive queries using CTEs and just can't seem to wrap my head around how to use it to group common items.
I have a table that contains the following columns:
ID
FirstName
LastName
DateOfBirth
BirthCountry
GroupID
What I need to do is start with the first person in the table and iterate through the table and find all the people that have the same (LastName and BirthCountry) or have the same (DateOfBirth and BirthCountry).
Now the tricky part is that I have to assign them the same GroupID and then, for each person in that GroupID, I need to see if anyone else has the same information and then put them in the same GroupID.
I think I could do this with multiple cursors but it is getting tricky.
Here is sample data and output.
ID FirstName LastName DateOfBirth BirthCountry GroupID
----------- ---------- ---------- ----------- ------------ -----------
1 Jonh Doe 1983-01-01 Grand 100
2 Jack Stone 1976-06-08 Grand 100
3 Jane Doe 1982-02-08 Grand 100
4 Adam Wayne 1983-01-01 Grand 100
5 Kay Wayne 1976-06-08 Grand 100
6 Matt Knox 1983-01-01 Hay 101
John Doe and Jane Doe are in the same Group (100) because they have the same (LastName and BirthCountry).
Adam Wayne is in Group (100) because he has the same (BirthDate and BirthCountry) as John Doe.
Kay Wayne is in Group (100) because she has the same (LastName and BirthCountry) as Adam Wayne who is already in Group (100).
Matt Knox is in a new group (101) because he does not match anyone in previous groups.
Jack Stone is in a group (100) because he has the same (BirthDate and BirthCountry) as Kay Wayne who is already in Group (100).
Data scripts:
CREATE TABLE #Tbl(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO #Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL),
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL);
Here's what I came up with. I have rarely written recursive queries so it was some good practice for me. By the way Kay and Adam do not share a birth country in your sample data.
with data as (
select
LastName, DateOfBirth, BirthCountry,
row_number() over (order by LastName, DateOfBirth, BirthCountry) as grpNum
from T group by LastName, DateOfBirth, BirthCountry
), r as (
select
d.LastName, d.DateOfBirth, d.BirthCountry, d.grpNum,
cast('|' + cast(d.grpNum as varchar(8)) + '|' as varchar(1024)) as equ
from data as d
union all
select
d.LastName, d.DateOfBirth, d.BirthCountry, r.grpNum,
cast(r.equ + cast(d.grpNum as varchar(8)) + '|' as varchar(1024))
from r inner join data as d
on d.grpNum > r.grpNum
and charindex('|' + cast(d.grpNum as varchar(8)) + '|', r.equ) = 0
and (d.LastName = r.LastName or d.DateOfBirth = r.DateOfBirth)
and d.BirthCountry = r.BirthCountry
), g as (
select LastName, DateOfBirth, BirthCountry, min(grpNum) as grpNum
from r group by LastName, DateOfBirth, BirthCountry
)
select t.*, dense_rank() over (order by g.grpNum) + 100 as GroupID
from T as t
inner join g
on g.LastName = t.LastName
and g.DateOfBirth = t.DateOfBirth
and g.BirthCountry = t.BirthCountry
For the recursion to terminate it's necessary to keep track of the equivalences (via string concatenation) so that at each level it only needs to consider newly discovered equivalences (or connections, transitivities, etc.) Notice that I've avoided using the word group to avoid bleeding into the GROUP BY concept.
http://rextester.com/edit/TVRVZ10193
EDIT: I used an almost arbitrary numbering for the equivalences, but if you wanted them to appear in a sequence based on the lowest ID within each block, that's easy to do. Instead of using row_number(), say min(ID) as grpNum, presuming, of course, that IDs are unique.
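If the recursive CTE is hard to follow, the same transitive grouping can be sketched as a loop that keeps lowering each row's label to the smallest label among its direct matches until nothing changes. This is a different technique from the query above (iterative label propagation, not recursion), shown only as an illustration. It assumes the #Tbl sample table from the question and unique IDs, and it overwrites GroupID as a working column; @changed is a made-up variable name.

```sql
-- Step 1: every row starts in its own group, labelled by its ID.
UPDATE #Tbl SET GroupID = ID;

-- Step 2: pull each label down to the smallest label among direct matches;
-- repeat until a pass changes no rows.
DECLARE @changed int = 1;
WHILE @changed > 0
BEGIN
    UPDATE a
    SET GroupID = m.MinGroup
    FROM #Tbl AS a
    CROSS APPLY (
        SELECT MIN(b.GroupID) AS MinGroup
        FROM #Tbl AS b
        WHERE b.BirthCountry = a.BirthCountry
          AND (b.LastName = a.LastName OR b.DateOfBirth = a.DateOfBirth)
    ) AS m
    WHERE m.MinGroup < a.GroupID;

    SET @changed = @@ROWCOUNT;   -- must directly follow the UPDATE
END;

-- Step 3: renumber the surviving labels as 100, 101, ...
SELECT ID, FirstName, LastName, DateOfBirth, BirthCountry,
       DENSE_RANK() OVER (ORDER BY GroupID) + 99 AS GroupID
FROM #Tbl
ORDER BY ID;
```

On the six sample rows the loop should settle after a few passes, with IDs 1 to 5 in group 100 and ID 6 in group 101. Each pass rescans the table, so this may be slow on large data.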
I assume GroupID is the output you want and that it starts from 100.
Even if the GroupID values come from another table, that is not a problem.
First, sorry for my earlier "no cursor" comments. A cursor or some other RBAR (row-by-row) operation is required for this task. It has been a very long time since I met such a requirement; it took me a long while, and I ended up using an RBAR operation.
If I manage to do it with a set-based method later, I will come back and edit this answer.
Most importantly, the RBAR operation makes the script easier to understand, and I think it will work for other sample data too.
Please also give feedback about the performance and how it works with other sample data.
Also, you may notice that the IDs in my script are not in serial order; this does not matter, I did it that way in order to test.
I use PRINT for debugging purposes; you can remove it.
SET NOCOUNT ON
DECLARE @Tbl TABLE(
ID INT,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DateOfBirth DATE,
BirthCountry VARCHAR(50),
GroupID INT NULL
);
INSERT INTO @Tbl VALUES
(1, 'Jonh', 'Doe', '1983-01-01', 'Grand', NULL) ,
(2, 'Jack', 'Stone', '1976-06-08', 'Grand', NULL),
(3, 'Jane', 'Doe', '1982-02-08', 'Grand', NULL),
(4, 'Adam', 'Wayne', '1983-01-01', 'Grand', NULL),
(5, 'Kay', 'Wayne', '1976-06-08', 'Grand', NULL),
(6, 'Matt', 'Knox', '1983-01-01', 'Hay', NULL),
(7, 'Jerry', 'Stone', '1976-06-08', 'Hay', NULL)
DECLARE @StartGroupid INT = 100
DECLARE @id INT
DECLARE @Groupid INT
DECLARE @Maxid INT
DECLARE @i INT = 1
DECLARE @MinGroupID int=@StartGroupid
DECLARE @MaxGroupID int=@StartGroupid
DECLARE @LastGroupID int
SELECT @maxid = max(id)
FROM @tbl
WHILE (@i <= @maxid)
BEGIN
SELECT @id = id
,@Groupid = Groupid
FROM @Tbl a
WHERE id = @i
if(@Groupid is not null and @Groupid<@MinGroupID)
set @MinGroupID=@Groupid
if(@Groupid is not null and @Groupid>@MaxGroupID)
set @MaxGroupID=@Groupid
if(@Groupid is not null)
set @LastGroupID=@Groupid
UPDATE A
SET groupid =case
when @id=1 and b.groupid is null then @StartGroupid
when @id>1 and b.groupid is null then @MaxGroupID+1--(Select max(groupid)+1 from @tbl where id<@id)
when @id>1 and b.groupid is not null then @MinGroupID --(Select min(groupid) from @tbl where id<@id)
end
FROM @Tbl A
INNER JOIN @tbl B ON b.id = @ID
WHERE (
(
a.BirthCountry = b.BirthCountry
and a.DateOfBirth = b.dateofbirth
)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.dateofbirth = b.dateofbirth)
)
--if(@id=7) --@id=2,@id=3 and so on (for debug
--break
SET @i = @i + 1
SET @ID = @I
END
SELECT *
FROM @Tbl
An alternate method, although it still returns 56,000 rows without the rownum=1 filter. See if it works with other sample data, or whether you can optimize it further.
;with CTE as
(
select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,@StartGroupid GroupID
,1 rn
FROM @Tbl A where a.id=1
UNION ALL
Select a.ID,a.FirstName,a.LastName,a.DateOfBirth,a.BirthCountry
,case when ((a.BirthCountry = b.BirthCountry and a.DateOfBirth = b.dateofbirth)
or (a.LastName = b.LastName and a.BirthCountry = b.BirthCountry)
or (a.LastName = b.LastName and a.dateofbirth = b.dateofbirth)
) then b.groupid else b.groupid+1 end
, b.rn+1
FROM @tbl A
inner join CTE B on a.id>1
where b.rn<@Maxid
)
,CTE1 as
(select * ,row_number()over(partition by id order by groupid )rownum
from CTE )
select * from cte1
where rownum=1
Maybe you can run it this way:
SELECT FirstName,
LastName,
GroupID,
COUNT(GroupID) AS Cnt
FROM table_name
GROUP BY
FirstName,
LastName,
GroupID
HAVING COUNT(GroupID) >= 2
ORDER BY GroupID

SQL Server: Rank by sum of points and order by ranking

I have a game table with these fields:
ID Name Email Points
----------------------------------
1 John john@aaa.com 120
2 Test bob@aaa.com 100
3 John john@bbb.com 80
4 Bob bob@aaa.com 50
5 John john@aaa.com 80
I want to group them by email (the email identifies the player, so rows 2 and 4 are the same player even though the names differ). The results should also include the sum of points and the last entered name, ranked from the highest sum of points to the lowest.
The result I want from the sample table is:
Ranking Name Points Games_Played Average_Points
------------------------------------------------------------------------------------------
1 John 200 2 100
2 Bob 150 2 75
3 John 80 1 80
I managed to get the ranking, the sum of points, and the average points, but to get the last entered name I think I need to join the table to itself again, and that seems a little wrong.
Any ideas how to do this?
Displaying the Name while grouping by email forces you to use an aggregate such as MIN(Name), and can lead to duplicate names.
Select Rank() over (order by Points desc) as Rank
,Name,Points,Games_Played,Average_Points
from
(
Select Min(Name) as Name,Email,Sum(Points) as Points
,Count(*) as Games_Played,AVG(Points) as Average_Points
From #a Group by Email
) a
order by Rank
SQLFiddle
In the Fiddle there are two commented lines you should uncomment to see the behavior on identical results.
You can use Ranking Functions from SQL-Server 2005 upwards:
WITH Points
AS (SELECT Sum_Points = Sum(points) OVER (
partition BY email),
Games_Played = Count(ID) OVER (
partition BY email),
Average_Points = AVG(Points) OVER (
partition BY email),
Rank = DENSE_RANK() OVER (
Partition BY email Order By Points DESC),
*
FROM dbo.Game)
SELECT Ranking=DENSE_RANK()OVER(ORDER BY Sum_Points DESC),
Name,
Points=Sum_Points,
Games_Played,
Average_Points
FROM Points
WHERE Rank = 1
Order By Sum_Points DESC;
Note that the result is different because, when an email has more than one name, I show the row with the highest points, so "Test" appears instead of "Bob".
Below are separate solutions for SQL Server 2012+, 2005 to 2008 R2, and 2000:
2012+
CREATE TABLE #PlayerPoints
( ID INT PRIMARY KEY
, Name VARCHAR(10) NOT NULL
, Email VARCHAR(20) NOT NULL
, Points INT NOT NULL);
INSERT INTO #PlayerPoints (ID, Name, Email, Points)
VALUES
(1, 'John', 'john@aaa.com', 120)
, (2, 'Test', 'bob@aaa.com', 100)
, (3, 'John', 'john@bbb.com', 80)
, (4, 'Bob', 'bob@aaa.com', 50)
, (5, 'John', 'john@aaa.com', 80);
WITH BaseData
AS
(SELECT ID
, Email
, Points
-- FIRST_VALUE over descending ID picks the name on the most recent row;
-- LAST_VALUE with this frame would just return each row's own name
, LastRecordName = FIRST_VALUE(Name) OVER
(PARTITION BY Email
ORDER BY ID DESC
ROWS UNBOUNDED PRECEDING)
FROM #PlayerPoints)
SELECT Email
, LastRecordName = MAX(LastRecordName)
, Points = SUM(Points)
, Games_Played = COUNT(*)
, Average_Points = AVG(Points)
FROM BaseData
GROUP BY Email
ORDER BY Points DESC;
2005 to 2008 R2
CREATE TABLE #PlayerPoints
( ID INT PRIMARY KEY
, Name VARCHAR(10) NOT NULL
, Email VARCHAR(20) NOT NULL
, Points INT NOT NULL);
INSERT INTO #PlayerPoints (ID, Name, Email, Points)
VALUES
(1, 'John', 'john@aaa.com', 120)
, (2, 'Test', 'bob@aaa.com', 100)
, (3, 'John', 'john@bbb.com', 80)
, (4, 'Bob', 'bob@aaa.com', 50)
, (5, 'John', 'john@aaa.com', 80);
WITH BaseData
AS
(SELECT ID
, Email
, Name
, ReverseOrder = ROW_NUMBER() OVER
(PARTITION BY Email
ORDER BY ID DESC)
FROM #PlayerPoints)
SELECT pp.Email
, LastRecordName = MAX(bd.Name)
, Points = SUM(pp.Points)
, Games_Played = COUNT(*)
, Average_Points = AVG(pp.Points)
FROM #PlayerPoints pp
JOIN BaseData bd
ON pp.Email = bd.Email
AND bd.ReverseOrder = 1
GROUP BY pp.Email
ORDER BY Points DESC;
2000
CREATE TABLE #PlayerPoints
( ID INT PRIMARY KEY
, Name VARCHAR(10) NOT NULL
, Email VARCHAR(20) NOT NULL
, Points INT NOT NULL);
INSERT INTO #PlayerPoints (ID, Name, Email, Points)
SELECT 1, 'John', 'john@aaa.com', 120
UNION ALL
SELECT 2, 'Test', 'bob@aaa.com', 100
UNION ALL
SELECT 3, 'John', 'john@bbb.com', 80
UNION ALL
SELECT 4, 'Bob', 'bob@aaa.com', 50
UNION ALL
SELECT 5, 'John', 'john@aaa.com', 80;
SELECT pp.Email
, LastRecordName = MAX(sppmi.Name)
, Points = SUM(pp.Points)
, Games_Played = COUNT(*)
, Average_Points = AVG(pp.Points)
FROM #PlayerPoints pp
JOIN
(SELECT spp.Email
, spp.Name
FROM #PlayerPoints spp
JOIN
(SELECT Email
, MaximumID = MAX(ID)
FROM #PlayerPoints
GROUP BY Email) mi
ON spp.ID = mi.MaximumID) sppmi
ON pp.Email = sppmi.Email
GROUP BY pp.Email
ORDER BY Points DESC;
I think this is what you need
select ROW_NUMBER() OVER (ORDER BY sum(r1.points) Desc) as Ranking,
r1.name as Name,
sum(r1.points) as Points,
r3.gplayed as 'Games Played',
r2.points 'Average Points'
from ranks r1
join (select avg(points) as points, email from ranks group by email) r2
on r1.email = r2.email
join (select email, count(*) as gplayed from ranks group by email) r3
on r1.email = r3.email
group by
r1.email,
r1.name,
r2.points,
r3.gplayed
Here is a SQL Fiddle.
Only the solution from @RegisteredUser seems to handle the constraint on the name. However, it requires SQL Server 2012, so here is a more general solution:
Select dense_rank() over (order by sum(points) desc) as ranking,
max(case when islastid = 1 then Name end) as Name, Email, Sum(Points) as Points,
Count(*) as Games_Played, AVG(Points) as Average_Points
From (select g.*,
row_number() over (partition by email order by id desc) as islastid
from games g
) t
Group by Email;
You don't have enough information in the question to choose between rank() and dense_rank().
Also, this version is simpler relative to other versions, because you can mix window functions and aggregation functions.
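To make the rank() versus dense_rank() choice concrete, here is a small standalone sketch. The four totals and the example.com addresses are made up for illustration and are not the question's data; two players are tied on 150 points.

```sql
SELECT Email, Points,
       RANK()       OVER (ORDER BY Points DESC) AS rank_with_gaps,
       DENSE_RANK() OVER (ORDER BY Points DESC) AS rank_without_gaps
FROM (VALUES ('a@example.com', 200),
             ('b@example.com', 150),
             ('c@example.com', 150),
             ('d@example.com',  80)) AS totals(Email, Points)
ORDER BY Points DESC, Email;
-- a: 1 / 1,  b: 2 / 2,  c: 2 / 2,  d: 4 / 3
```

After a tie, RANK() skips to 4 while DENSE_RANK() continues with 3, so the choice depends on whether the leaderboard should show gaps after tied players.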
