How to use DISTINCT keyword in SQL Server? - sql-server

How do I use the DISTINCT keyword in SQL Server? Specifically, can it be applied to a single given field?
select id, name, age
from dbo.XXX
The query returns multiple rows. I would like to find out how many distinct values of id, name, or age there are.
select **distinct** id, name, age from dbo.XXX or
select id, **distinct** name, age from dbo.XXX or
select id, name, **distinct** age from dbo.XXX
To sum up, I would like to use a single SQL statement to get the distinct count of each field, like select distinct id, distinct name, distinct age from dbo.XXX

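DENSE_RANK can be used to emulate a windowed distinct count: the ascending rank plus the descending rank minus 1 equals the number of distinct values of the ORDER BY column within each partition. As written, the query below returns, on every row, the number of distinct [Unique ID] values among the rows that share that row's col1 (or col2, col3) value: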
Select col1, col2, col3,
dense_rank() over (partition by [col1] order by [Unique ID]) + dense_rank() over (partition by [col1] order by [Unique ID] desc) - 1 as DistCountCol1,
dense_rank() over (partition by [col2] order by [Unique ID]) + dense_rank() over (partition by [col2] order by [Unique ID] desc) - 1 as DistCountCol2,
dense_rank() over (partition by [col3] order by [Unique ID]) + dense_rank() over (partition by [col3] order by [Unique ID] desc) - 1 as DistCountCol3
from [table]
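To see what the ascending-plus-descending DENSE_RANK trick actually counts, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions) rather than SQL Server, and the table t with columns uid and col1 is made up for illustration. It suggests that partitioning by col1 and ordering by the unique id gives the number of rows in each col1 group, whereas ordering by col1 with no partition gives the table-wide distinct count of col1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: uid plays the role of [Unique ID], col1 is the column of interest.
conn.execute("CREATE TABLE t (uid INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")],
)

rows = conn.execute(
    """
    SELECT uid,
           col1,
           -- distinct values of col1 across the whole table
           DENSE_RANK() OVER (ORDER BY col1)
             + DENSE_RANK() OVER (ORDER BY col1 DESC) - 1 AS distinct_col1,
           -- distinct uid values among the rows sharing this col1 value
           DENSE_RANK() OVER (PARTITION BY col1 ORDER BY uid)
             + DENSE_RANK() OVER (PARTITION BY col1 ORDER BY uid DESC) - 1 AS uids_in_group
    FROM t
    ORDER BY uid
    """
).fetchall()

for row in rows:
    print(row)
```

One thing to keep in mind: DENSE_RANK ranks a NULL like any other value, so a NULL in the ranked column is included in this count, whereas COUNT(DISTINCT ...) ignores NULLs.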

select distinct ID
from dbo.XXX
Select distinct name
from dbo.XXX
Select distinct age
from dbo.XXX
If you want to know how many rows you have for each distinct ID or Name or Age, you can use the following:
Select ID, count(id) as [ID_Recurrence]
from dbo.XXX
group by ID
Select Age, count(age) as [Age_Recurrence]
from dbo.XXX
group by Age
Select Name, count(name) as [Name_Recurrence]
from dbo.XXX
group by Name

The DISTINCT keyword returns unique rows. It applies to the whole select list, so the second query below returns unique (ID, SCORE) combinations:
SELECT DISTINCT ID FROM SomeTable
SELECT DISTINCT ID , SCORE FROM SomeTable
If you want a single row for each distinct value of one column, try the following code.
The code below is copied from here
select t.id, t.player_name, t.team
from tablename t
join (select team, min(id) as minid from tablename group by team) x
on x.team = t.team and x.minid = t.id

select COUNT(distinct id) uniqueIDCount
from dbo.XXX
This counts the distinct values of the id field. To count distinct combinations of fields you can concatenate them, with a separator so that two different pairs cannot produce the same string. Assuming id is an integer and name is nvarchar:
select COUNT(distinct CONVERT(nvarchar(20), id) + N'|' + name) uniqueIDCount
from dbo.XXX
Note that although this looks neat, it is probably not the most efficient approach, and rows with a NULL name are skipped because the concatenated value is NULL. Here is a more efficient, though slightly longer, method:
with c as (
select distinct id, name
from dbo.XXX
)
select COUNT(1)
from c

I'm not sure why this is complicated. You can write three separate queries and, if you want a single result set, combine them with UNION.
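For the original goal of getting the distinct count of each field in a single statement, COUNT(DISTINCT ...) can be repeated once per column. The sketch below is a hedged illustration rather than a tested SQL Server script: it runs the SQL through Python's built-in sqlite3 module, and the table contents are invented sample data standing in for dbo.XXX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data standing in for dbo.XXX (id, name, age).
conn.execute("CREATE TABLE XXX (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO XXX VALUES (?, ?, ?)",
    [(1, "Ann", 30), (2, "Bob", 30), (3, "Ann", 41), (3, "Ann", 41)],
)

# Each COUNT(DISTINCT col) is evaluated independently, so one SELECT
# returns the distinct count of every column.
distinct_counts = conn.execute(
    """
    SELECT COUNT(DISTINCT id)   AS id_count,
           COUNT(DISTINCT name) AS name_count,
           COUNT(DISTINCT age)  AS age_count
    FROM XXX
    """
).fetchone()

# Distinct (id, name) combinations, counted without string concatenation.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT id, name FROM XXX) AS pairs"
).fetchone()[0]

print(distinct_counts, pair_count)
```

The same two SELECT statements should carry over to SQL Server with dbo.XXX as the table name, since they use only standard aggregate syntax.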

Related

3rd highest salary in each department

I have the following tables: https://pastebin.com/Js0Sm69S (CREATE and INSERT statements).
I would like to find the third-highest salary in each department if there is such.
I was able to achieve this using the following query:
SELECT *,
DENSE_RANK() OVER
(PARTITION BY DepartmentId ORDER BY Salary DESC) AS DRank
FROM Employees
I am not sure whether DENSE_RANK() is the best ranking function to use here. Maybe not, because WHERE DRank = 3 may return more than one row (although we could use TOP (1)). What do you think? And how can I display the third-highest salary in each department, if there is one?
Try this
Select EmployeeID,FirstName,DepartmentID,Salary
From (
Select *
,RN = Row_Number() over (Partition By DepartmentID Order By Salary Desc)
,Cnt = sum(1) over (Partition By DepartmentID)
From Employees
) A
Where RN = case when Cnt<3 then Cnt else 3 end -- falls back to the lowest salary when a department has fewer than 3 employees
You're almost there, but you can achieve this with ROW_NUMBER instead of DENSE_RANK. I think the following query should help.
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DepartmentId ORDER BY Salary DESC) AS DRank
FROM Employees
)
SELECT *
FROM cte
WHERE DRank= 3
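The choice between DENSE_RANK and ROW_NUMBER matters when salaries tie. The sketch below is an illustration only: it uses Python's built-in sqlite3 module (SQLite 3.25 or later) instead of SQL Server, a cut-down Employees table with invented rows, and a hypothetical helper named third_by. With a tie at the top, DENSE_RANK = 3 picks the third-highest distinct salary, ROW_NUMBER = 3 picks whatever salary sits on the third row, and a department with fewer than three rows returns nothing in either case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, DepartmentID INTEGER, Salary INTEGER)"
)
# Hypothetical data: department 1 has a tie at the top, department 2 has only two rows.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 100), (3, 1, 90), (4, 1, 80), (5, 2, 70), (6, 2, 60)],
)


def third_by(rank_function):
    """Return (DepartmentID, Salary) pairs whose rank is 3 under the given function."""
    query = f"""
        SELECT DISTINCT DepartmentID, Salary
        FROM (
            SELECT DepartmentID, Salary,
                   {rank_function}() OVER (
                       PARTITION BY DepartmentID ORDER BY Salary DESC
                   ) AS rnk
            FROM Employees
        ) AS ranked
        WHERE rnk = 3
        ORDER BY DepartmentID
    """
    return conn.execute(query).fetchall()


third_distinct_salary = third_by("DENSE_RANK")  # third-highest distinct salary
third_row = third_by("ROW_NUMBER")              # salary found on the third row
print(third_distinct_salary, third_row)
```

The SELECT DISTINCT in the outer query is what keeps the DENSE_RANK version from returning one row per tied employee.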

Error in SQLServer: Subquery returned more than 1 value

I would like to insert data from two different tables (Surname and Name) into the Clients table. I would also like a third column (email) that is a concatenation of the first two. When I try the code below, I get the following error: "Subquery returned more than 1 value".
insert into CLIENTS (LastName,Firstname, EMAIL)
select (select top 150 Surname from Surname order by NEWID()),
(select top 150 Name from Name order by Newid()),
(select concat(concat(FirstName, LastName),'#novaims.com') from clients);
Could you please help me understand where the problem is?
The error message is clear: each of your subqueries returns more than one row, but a subquery used as a column expression must return a single value. Try this:
;WITH cte
AS (SELECT 1 AS val
UNION ALL
SELECT val + 1
FROM cte
WHERE val < 150)
SELECT LastName,
FirstName,
Concat(FirstName, LastName, '#novaims.com')
FROM cte
OUTER apply (SELECT TOP 1 Surname FROM Surname ORDER BY Newid()) s (LastName)
OUTER apply (SELECT TOP 1 NAME FROM NAME ORDER BY Newid()) n (FirstName)
Option (Maxrecursion 0)
You need to move the table references to the from clause. I think this does what you want:
insert into CLIENTS (LastName, Firstname, EMAIL)
select surname, name, concat(name, surname, '#novaims.com')
from (select Surname, row_number() over (order by newid()) as seqnum
from Surname
) s join
(select Name, row_number() over (order by newid()) as seqnum
from Name
) n
on n.seqnum = s.seqnum;
Another method uses apply:
insert into CLIENTS (LastName, Firstname, EMAIL)
select top 150 s.surname, n.name, concat(n.name, s.surname, '#novaims.com')
from surname s cross apply
(select top 1 n.*
from Name n
order by newid()
) n
order by newid();
This is more similar to your original idea. Do note, though, that the same name can appear more than once. And the performance should be better for the first version (because the sort is only happening once on each table).
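The row-number pairing idea can be tried out on a small scale. This is a sketch under stated assumptions, not SQL Server code: it uses Python's built-in sqlite3 module (SQLite 3.25 or later), so random() replaces NEWID() and the || operator replaces CONCAT, and the table contents and the wanted limit are invented. Each table is shuffled once, numbered, and the two numberings are joined, so every surname and every name is used at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Surname (Surname TEXT);
    CREATE TABLE Name (Name TEXT);
    CREATE TABLE Clients (LastName TEXT, FirstName TEXT, Email TEXT);
    INSERT INTO Surname VALUES ('Smith'), ('Jones'), ('Brown'), ('Lee');
    INSERT INTO Name VALUES ('Ana'), ('Ben'), ('Cy');
    """
)

wanted = 2  # stands in for the 150 rows of the original question

# Number each table in random order, then pair rows that share a sequence number.
conn.execute(
    """
    INSERT INTO Clients (LastName, FirstName, Email)
    SELECT s.Surname, n.Name, n.Name || s.Surname || '#novaims.com'
    FROM (SELECT Surname, ROW_NUMBER() OVER (ORDER BY random()) AS seqnum
          FROM Surname) AS s
    JOIN (SELECT Name, ROW_NUMBER() OVER (ORDER BY random()) AS seqnum
          FROM Name) AS n
      ON n.seqnum = s.seqnum
    WHERE s.seqnum <= ?
    """,
    (wanted,),
)

clients = conn.execute("SELECT LastName, FirstName, Email FROM Clients").fetchall()
print(clients)
```

Because the join is an inner join, the number of inserted rows can never exceed the size of the smaller table, whatever limit is requested.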

Display distinct Badge_ID

This is my query having the Current Results as displayed below.
SELECT
Distinct CONVERT(int, Employees_1.Emp_Badge_No) AS Emp_Badge_No,
Employees_1.Emp_LastName, Employees_1.Emp_FirstName, Employees_1.Email,
Employees_1.NT_Name, Employees_1.Dept_key,
Employees_1.Emp_LastName + ',' + Employees_1.Emp_FirstName AS FullName,
dbo.department_vw.DepartmentShortName AS deptname,
Employees_1.active_flag
FROM data_common.dbo.employees_union_vw AS Employees_1
INNER JOIN dbo.department_vw
ON Employees_1.Dept_key = dbo.department_vw.DepartmentKey
Sample data:
I need help to achieve the expected results. What should I change in my existing SQL query?
I want to keep all the records, even inactive ones, as long as the Emp_Badge_No is not repeated. I only want the duplicate Emp_Badge_No rows to be removed.
Thanks in advance.
You may want to use ROW_NUMBER for this. Modify the ORDER BY clause depending on which row from the duplicate entry you want to retrieve:
WITH Cte AS(
SELECT
e.Emp_Badge_No,
e.Emp_LastName,
e.Emp_FirstName,
e.Email,
e.NT_Name,
e.Dept_key,
e.Emp_LastName + ',' + e.Emp_FirstName AS FullName,
d.DepartmentShortName AS deptname,
e.active_flag,
rn = ROW_NUMBER() OVER(PARTITION BY e.Emp_Badge_No ORDER BY e.active_flag)
FROM data_common.dbo.employees_union_vw AS e
INNER JOIN dbo.department_vw d
ON e.Dept_key = d.DepartmentKey
)
SELECT
Emp_Badge_No,
Emp_LastName,
Emp_FirstName,
Email,
NT_Name,
Dept_key,
FullName,
deptname,
active_flag
FROM Cte
WHERE rn = 1
The above returns the inactive record when there are duplicates (assuming inactive sorts first, e.g. active_flag = 0). If you want the active record instead, define rn as:
ROW_NUMBER() OVER(PARTITION BY e.Emp_Badge_No ORDER BY e.active_flag DESC)
If you don't care whether it's active or inactive, define rn as:
ROW_NUMBER() OVER(PARTITION BY e.Emp_Badge_No ORDER BY (SELECT NULL))

How do I select the top 1 result if the same identifier exists? SQL Server

I have a data table in SQL Server 2008 from which I would like to select the top 1 row for each identifier:
The results should look like this, before and after:
Thus it should select only the first result when the same identifier appears more than once. Thanks a lot.
select distinct [Primary Identifier] from tbl
If you have entire records (other columns) instead of that single column, you can row number them and choose one.
select {list of columns}
from
(
select *, rn = row_number() over (partition by [Primary Identifier]
order by 1/0)
from tbl
) X
where rn = 1;
order by 1/0 is arbitrary. If you need to choose a specific row from the "duplicates", for example the one with the highest cost, order by cost descending, i.e.
(partition by [Primary Identifier]
order by [cost] desc)
Just distinct them:
select distinct [primary identifier] from tablename
Or by grouping:
select [primary identifier] from tablename group by [primary identifier]
If more columns exist you can rank rows with window function:
;with cte as(select *, row_number() over(partition by [primary identifier] order by (select null)) rn from tablename)
select * from cte where rn = 1
Change order by (select null) to appropriate ordering column.
I think this will be an appropriate solution for your need:
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Primary Identifier] ORDER BY [sort columns]) AS rowid
FROM [table]
)
SELECT *
FROM cte
WHERE rowid = 1
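As a quick check of the row-numbering approach with a meaningful sort column, the sketch below keeps the highest-cost row for each identifier. It is illustrative only: it runs through Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server 2008, and the table tbl, its identifier and cost columns, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: identifier repeats, cost decides which duplicate to keep.
conn.execute("CREATE TABLE tbl (identifier TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("A", 10), ("A", 25), ("B", 7), ("C", 3), ("C", 1)],
)

# Number the rows inside each identifier, highest cost first, and keep row 1.
top_per_identifier = conn.execute(
    """
    SELECT identifier, cost
    FROM (
        SELECT identifier, cost,
               ROW_NUMBER() OVER (
                   PARTITION BY identifier ORDER BY cost DESC
               ) AS rn
        FROM tbl
    ) AS numbered
    WHERE rn = 1
    ORDER BY identifier
    """
).fetchall()
print(top_per_identifier)
```

Swapping cost DESC for another column, or for a constant when any row will do, changes which duplicate survives without changing the shape of the query.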

Finding the most recent duplicate records in SQL Server 2012

I want to find the most recent duplicate records in SQL Server 2012. Here is the table structure I have.
I have a table called UserRegistration which contains duplicate UserID (GUID) values, and the same table also has a CreatedDate column (date). I want to find the most recent duplicate records in this table.
Here is the sample data.
id FirstName LastName CreatedDate UserID
109 FirstNameA LastNameA 28-04-2015 GUID1
110 FirstNameC LastNameD 19-05-2015 GUID2
111 FirstNameE LastNameF 22-05-2015 GUID1
As you can see in the table above, GUID1 has a duplicate. I want to find the most recent one, meaning the query should return only rows whose UserID is duplicated, and of those only the most recent. For the data above it should return 111, because that record was created more recently than 109.
Let me know if you have any questions. Thanks.
Harshal
Try the query below; it should work for your input data:
create table #UserRegistration (id int,FirstName varchar(20),LastName varchar(20),CreatedDate date,UserID varchar(20))
insert into #UserRegistration
select 109, 'FirstNameA', 'LastNameA', '2015-04-28', 'GUID1' union
select 110, 'FirstNameC', 'LastNameD', '2015-05-19', 'GUID2' union
select 111, 'FirstNameE', 'LastNameF', '2015-05-22', 'GUID1'
select id, FirstName, LastName, CreatedDate, UserID from
(SELECT ur.*,row_number() over(partition by UserID order by CreatedDate) rn
FROM #UserRegistration ur) A
where rn > 1
You could use a CTE. Partition your records by UserID and number the rows within each partition in CreatedDate order.
insert into tab(id, FirstName, LastName, CreatedDate, UserID)
values(109, 'FirstNameA', 'LastNameA', '2015-04-28', 'guid1'),
(110, 'FirstNameC', 'LastNameD', '2015-05-19', 'guid2'),
(111, 'FirstNameE', 'LastNameF', '2015-05-22', 'guid1');
with cte as
(
select id, ROW_NUMBER() over (partition by UserID order by CreatedDate asc) as [Rank],
FirstName, LastName, CreatedDate, UserID
from tab
)
select id, FirstName, LastName, CreatedDate, UserID from cte where Rank > 1
The Rank > 1 condition retrieves the duplicated items, that is, every row after the first one for each UserID.
sqlfiddle link:
http://sqlfiddle.com/#!6/4d1f2/6
Solved this by using tmp-tables:
SELECT a.UserID,
MAX(a.CreatedDate) As CreatedDate
INTO #latest
FROM <your table> a
GROUP BY a.UserID
HAVING COUNT(a.UserID) > 1
SELECT b.id
FROM #latest a
INNER JOIN <your table> b ON a.UserID = b.UserID AND a.CreatedDate = b.CreatedDate
try this,
SELECT * FROM TableName tt WHERE
exists(select MAX(createdDate)
from TableName
where tt.UserID = UserID
group by UserID
having MAX(createdDate)= tt.createdDate)
I think your CreatedDate field is not a date field; if so, try FORMAT.
WITH TempAns (id,UserID,duplicateRecordCount)
AS
(
SELECT id,
UserID,
ROW_NUMBER()OVER(partition by UserID ORDER BY id)
AS duplicateRecordCount
FROM #t
)
select * from #t where id in (
select max(id )
from TempAns
where duplicateRecordCount > 1
group by UserID )
You'd rank your records with ROW_NUMBER() to give all last records per userid #1. With COUNT() you make sure only to get the userids having more than one record.
select
id, firstname, lastname, createddate, userid
from
(
select
id, firstname, lastname, createddate, userid,
row_number() over (partition by userid order by createddate desc) as rn,
count(*) over (partition by userid) as cnt
from userregistration
) ranked
where rn = 1 -- only last one
and cnt > 1; -- but only if there is more than one record for the userid
This gets the latest record for every userid that has duplicates.
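The answers above differ in what they return once a UserID has more than two records. The sketch below compares two of them on the question's sample data plus one extra row; it is a hedged illustration that runs through Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server 2012, and row 112 is hypothetical, added only to create a third GUID1 record. Dates are stored as ISO strings, which sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserRegistration "
    "(id INTEGER, FirstName TEXT, LastName TEXT, CreatedDate TEXT, UserID TEXT)"
)
conn.executemany(
    "INSERT INTO UserRegistration VALUES (?, ?, ?, ?, ?)",
    [
        (109, "FirstNameA", "LastNameA", "2015-04-28", "GUID1"),
        (110, "FirstNameC", "LastNameD", "2015-05-19", "GUID2"),
        (111, "FirstNameE", "LastNameF", "2015-05-22", "GUID1"),
        # Hypothetical extra row, not in the question: a third GUID1 record.
        (112, "FirstNameG", "LastNameH", "2015-06-01", "GUID1"),
    ],
)

# Ascending ROW_NUMBER with rn > 1: every row after the first per UserID.
all_but_first = [r[0] for r in conn.execute(
    """
    SELECT id FROM (
        SELECT id, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreatedDate) AS rn
        FROM UserRegistration
    ) AS a
    WHERE rn > 1
    ORDER BY id
    """
)]

# Descending ROW_NUMBER with rn = 1 plus a windowed COUNT: only the latest duplicate.
latest_only = [r[0] for r in conn.execute(
    """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreatedDate DESC) AS rn,
               COUNT(*) OVER (PARTITION BY UserID) AS cnt
        FROM UserRegistration
    ) AS ranked
    WHERE rn = 1 AND cnt > 1
    ORDER BY id
    """
)]
print(all_but_first, latest_only)
```

With only the three rows from the question both queries return 111, so the difference shows up only when a UserID appears three or more times.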
