Getting Distinct Data from two Columns SQL Server - sql-server

I am trying to get the distinct data from two columns in the same table.
Table 1:
ID  Address      City
01  Test Street  Springdale
01  Main Street  Springdale
01  Pass Dr.     New Town
01  Main Street  New Town
I want the results to look like this:
Address      City
Test Street  Springdale
Main Street  New Town
Pass Dr.
Currently I have this:
SELECT DISTINCT Address
FROM Table1
WHERE ID = 01
UNION
SELECT DISTINCT City
FROM Table1
WHERE ID = 01
But what I get in return is:
Address
Test Street
Main Street
Pass Dr.
Springdale
New Town

Using chained CTEs as follows will produce the result set required in the OP:
;WITH CTE_Address AS
(
SELECT DISTINCT Address
FROM #T
), CTE_Address_rn AS
(
SELECT Address, ROW_NUMBER() OVER (ORDER BY Address) AS rn
FROM CTE_Address
), CTE_City AS
(
SELECT DISTINCT City
FROM #T
), CTE_City_rn AS
(
SELECT City, ROW_NUMBER() OVER (ORDER BY City) AS rn
FROM CTE_City
)
SELECT a.Address, c.City
FROM CTE_Address_rn AS a
LEFT JOIN CTE_City_rn AS c ON a.rn = c.rn
The basic idea is to produce two separate result sets containing distinct Addresses and Cities and join these by ROW_NUMBER.
SQL Fiddle Demo here
P.S. The above answer is based on the assumption that the OP just wants distinct Address and City values put into a single table, disassociated from each other.
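To make the pairing concrete, here is a minimal sketch that runs the same idea against the sample rows from the question. It assumes SQLite (3.25 or later, for window functions) via Python's sqlite3 module as a stand-in for SQL Server, and the table name T is illustrative. Note that the rows are paired purely by sort position, so the layout differs from the one sketched in the question, and the LEFT JOIN only keeps everything because there are at least as many distinct addresses as cities.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID TEXT, Address TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("01", "Test Street", "Springdale"),
        ("01", "Main Street", "Springdale"),
        ("01", "Pass Dr.", "New Town"),
        ("01", "Main Street", "New Town"),
    ],
)

# Number the distinct values of each column separately, then join on that number.
PAIR_BY_ROW_NUMBER = """
WITH a AS (
    SELECT Address, ROW_NUMBER() OVER (ORDER BY Address) AS rn
    FROM (SELECT DISTINCT Address FROM T WHERE ID = '01')
),
c AS (
    SELECT City, ROW_NUMBER() OVER (ORDER BY City) AS rn
    FROM (SELECT DISTINCT City FROM T WHERE ID = '01')
)
SELECT a.Address, c.City
FROM a
LEFT JOIN c ON a.rn = c.rn
ORDER BY a.rn
"""

paired_rows = conn.execute(PAIR_BY_ROW_NUMBER).fetchall()
for row in paired_rows:
    print(row)
```

If the City list could be longer than the Address list, a FULL OUTER JOIN (available in SQL Server) would be needed to keep the extra cities.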

It is because each SELECT only ever returns one column. The UNION just puts the two data sets together and de-duplicates them. So the first query reads the distinct addresses, the second reads the distinct cities, and the result comes back as one list.
You should really return these as two different data sets or use two different procs. You can do the former just by getting rid of your UNION.

;WITH CTE_Address AS
(
SELECT ID, Street_Address, DENSE_RANK () over (Order by Street_Address) as Denserank_Street
FROM The_Table
),
CTE_City AS
(
SELECT ID, City_Name, DENSE_RANK () over (Order by City_Name) as Denserank_City
FROM The_Table
)
SELECT A.Street_Address, C.City_Name
FROM CTE_Address AS A
INNER JOIN CTE_City AS C ON A.ID = C.ID
P.S. Without the ID column, the JOIN will match the wrong City to an Address.

Related

Want to see query result order as written in query

I have written a query and want the employees returned in the order in which they are written in the query. The query is as follows:
select * from employeemaster where employeename in
('Sachin','Gaurav','Vinay','Shiv','Sandeep','Vaibhav','Prashant')
I want the result to display Sachin first, then the others. The employees' IDs are not in sequence (for example, Sachin's ID can be 4 and Vinay's ID can be 1), but because I have written Sachin first, I want Sachin to appear first in the result.
You can use a CTE with IDs to do the sorting and the filtering with an inner join.
WITH cte as (
SELECT *
FROM (VALUES
(1,'Sachin')
,(2,'Gaurav')
,(3,'Vinay')
,(4,'Shiv')
,(5,'Sandeep')
,(6,'Vaibhav')
,(7,'Prashant')
) a (id, [name])
)
SELECT em.*
FROM employeemaster em
JOIN cte
ON em.employeename = cte.[name]
ORDER BY cte.id
select * from employeemaster
join (values
('Sachin',1)
,('Gaurav',2)
,('Vinay',3)
,('Shiv',4)
,('Sandeep',5)
,('Vaibhav',6)
,('Prashant',7)) a(employeename ,_order) on a.employeename = employeemaster.employeename
order by a.[_order]

TSQL - De-duplication report - grouping

So I'm trying to create a report that ranks a duplicate record, the idea behind this is that the customer wants to merge a whole lot of duplicate records that came about from a migration.
I need the ranking so that my report can show which record should be the "main" record, i.e. the record that will have missing data pulled into it.
The duplicate definition is pretty simple:
If the email addresses are the same then it is always a duplicate, if
the emails do not match, then the first name, surname, and mobile must
match.
The ranking will be based on a whole bunch of columns in the table, so:
email address isn't NULL = 50
phone number isn't NULL = 20
etc.. whichever gets the highest number in the duplicate group becomes the main record. This is where I am having issues, I can't seem to find a way to get an incremental number for each duplicate set. This is some of the code I have so far:
( I took out some of the rank columns in the temp table and CTE expression to shorten it )
DECLARE @tmp_Duplicates TABLE (
tmp_personID INT
, tmp_Firstname NVARCHAR(100)
, tmp_Surname NVARCHAR(100)
, tmp_HomeEmail NVARCHAR(300)
, tmp_MobileNumber NVARCHAR(100)
--- Ratings
, tmp_HomeEmail_Rating INT
--- Groupings
, tmp_GroupNumber INT
)
;WITH cteDupes AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personID DESC) AS RND,
ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personId) AS RNA,
p.personID, p.PersonFirstName, p.PersonSurname, p.PersonHomeEMail
, personMobileTelephone
FROM tblCandidate c INNER JOIN tblPerson p ON c.candidateID = p.personID
)
INSERT INTO @tmp_Duplicates
SELECT PersonID, PersonFirstName, PersonSurname, PersonHomeEMail, personMobileTelephone
, 10, RND
FROM cteDupes
WHERE RNA + RND > 2
ORDER BY personID, PersonFirstName, PersonSurname
SELECT * FROM @tmp_Duplicates
This gives me the results I want, but the group number isn't showing how I need it:
What I need is for each group to be an incremental value:
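One possible way to get an incremental number per duplicate set, offered only as a sketch and not taken from the thread, is DENSE_RANK() ordered by the column that defines the group. The example below covers only the "same email" rule and uses a hypothetical person table with made-up rows; it runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, where DENSE_RANK() and COUNT(*) OVER (...) work the same way.

```python
import sqlite3

# Assumption: a hypothetical, simplified table; only the email rule is modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (personID INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [
        (1, "ann@example.com"),
        (2, "bob@example.com"),
        (3, "ann@example.com"),
        (4, "cat@example.com"),  # unique email, so not a duplicate
        (5, "bob@example.com"),
        (6, "ann@example.com"),
    ],
)

# Keep only emails that occur more than once, then give every row of the same
# email the same number; DENSE_RANK leaves no gaps between group numbers.
GROUP_NUMBERS = """
WITH counted AS (
    SELECT personID, email,
           COUNT(*) OVER (PARTITION BY email) AS copies
    FROM person
)
SELECT personID, email,
       DENSE_RANK() OVER (ORDER BY email) AS group_number
FROM counted
WHERE copies > 1
ORDER BY group_number, personID
"""

grouped_rows = conn.execute(GROUP_NUMBERS).fetchall()
for row in grouped_rows:
    print(row)
```

Rows with a NULL email would all land in one group under this sketch, so they would need to be filtered out or handled by the name-and-mobile rule instead.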

Is there a way to combine these queries?

I have begun working some of the programming problems on HackerRank as a "productive distraction".
I was working on the first few in the SQL section and came across this problem (link):
Query the two cities in STATION with the shortest and
longest CITY names, as well as their respective lengths
(i.e.: number of characters in the name). If there is
more than one smallest or largest city, choose the one
that comes first when ordered alphabetically.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is
the western longitude.
Sample Input
Let's say that CITY only has four entries:
1. DEF
2. ABC
3. PQRS
4. WXY
Sample Output
ABC 3
PQRS 4
Explanation
When ordered alphabetically, the CITY names are listed
as ABC, DEF, PQRS, and WXY, with the respective lengths
3, 3, 4 and 3. The longest-named city is obviously PQRS,
but there are options for shortest-named city; we choose
ABC, because it comes first alphabetically.
I agree that this requirement could be written much more clearly, but the basic gist is pretty easy to get, especially with the clarifying example. The question I have, though, occurred to me because the instructions given in the comments for the question read as follows:
/*
Enter your query here.
Please append a semicolon ";" at the end of the query and
enter your query in a single line to avoid error.
*/
Now, writing a query on a single line doesn't necessarily imply a single query, though that seems to be the intended thrust of the statement. However, I was able to pass the test case using the following submission (submitted on 2 lines, with a carriage return in between):
SELECT TOP 1 CITY, LEN(CITY) FROM STATION ORDER BY LEN(CITY), CITY;
SELECT TOP 1 CITY, LEN(CITY) FROM STATION ORDER BY LEN(CITY) DESC, CITY;
Again, none of this is advanced SQL. But it got me thinking. Is there a non-trivial way to combine this output into a single results set? I have some ideas in mind where the WHERE clause basically adds some sub-queries in an OR statement to combine the two queries into one. Here is another submission I had that passed the test case:
SELECT
CITY,
LEN(CITY)
FROM
STATION
WHERE
ID IN (SELECT TOP 1 ID FROM STATION ORDER BY LEN(CITY), CITY)
OR
ID IN (SELECT TOP 1 ID FROM STATION ORDER BY LEN(CITY) DESC, CITY)
ORDER BY
LEN(CITY), CITY;
And, yes, I realize that the final , CITY in the final ORDER BY clause is superfluous, but it kind of makes the point that this query hasn't really saved that much effort, especially against returning the query results separately.
Note: This isn't a true MAX and MIN situation. Given the following input, you aren't actually taking the first and last rows:
Sample Input
1. ABC
2. ABCD
3. ZYXW
Based on the requirements as written, you'd take #1 and #2, not #1 and #3.
This makes me think that my solutions actually might be the most efficient way to accomplish this, but my set-based thinking could always use some strengthening, and I'm not sure if that might play in here or not.
Here's another alternative. I think it's pretty straightforward and easy to understand what's going on. Performance is good.
Still has a couple of sub-queries though.
select
min(City), len(City)
from Station
group by
len(City)
having
len(City) = (select min(len(City)) from Station)
or
len(City) = (select max(len(City)) from Station)
Untested as well, but I don't see a reason for it not to work:
SELECT *
FROM (
SELECT TOP (1) CITY, LEN(CITY) AS CITY_LEN
FROM STATION
ORDER BY CITY_LEN, CITY
) AS T
UNION ALL
SELECT *
FROM (
SELECT TOP (1) CITY, LEN(CITY) AS CITY_LEN
FROM STATION
ORDER BY CITY_LEN DESC, CITY
) AS T2;
You can't have UNION ALL with an ORDER BY on each SELECT statement, but you can work around it by using subqueries together with a TOP (1) clause and ORDER BY.
UNTESTED:
WITH CTE AS (
-- AlphaRN = 1 marks the alphabetically first city within each name length
Select City, len(City) as CityLen,
row_number() over (partition by len(City) order by City) as AlphaRN
from Station)
Select City, CityLen from CTE
Where AlphaRN = 1 and (CityLen = (select max(CityLen) from CTE) or
CityLen = (select min(CityLen) from CTE))
Here's the best I could come up with:
with Ordering as
(
select
City,
Forward = row_number() over (order by len(City), City),
Backward = row_number() over (order by len(City) desc, City)
from
Station
)
select City, len(City) from Ordering where 1 in (Forward, Backward);
There are definitely a lot of ways to approach this as evidenced by the variety of answers, but I don't think anything beats your original two-query solution in terms of cleanly and concisely expressing the intended behavior. Interesting question, though!
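The forward/backward numbering idea can be checked against both sample inputs from the question. The sketch below assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server, so LENGTH() replaces LEN(); the helper name shortest_and_longest and the two-column STATION table are illustrative only.

```python
import sqlite3


def shortest_and_longest(cities):
    """Return [(city, length), ...] for the shortest and longest names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (ID INTEGER PRIMARY KEY, CITY TEXT)")
    conn.executemany(
        "INSERT INTO STATION (CITY) VALUES (?)", [(c,) for c in cities]
    )
    # Forward = 1 is the shortest name (alphabetical tie-break);
    # Backward = 1 is the longest name (alphabetical tie-break).
    query = """
    WITH Ordering AS (
        SELECT CITY,
               LENGTH(CITY) AS CITY_LEN,
               ROW_NUMBER() OVER (ORDER BY LENGTH(CITY), CITY) AS Forward,
               ROW_NUMBER() OVER (ORDER BY LENGTH(CITY) DESC, CITY) AS Backward
        FROM STATION
    )
    SELECT CITY, CITY_LEN
    FROM Ordering
    WHERE Forward = 1 OR Backward = 1
    ORDER BY CITY_LEN, CITY
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


first_sample = shortest_and_longest(["DEF", "ABC", "PQRS", "WXY"])
second_sample = shortest_and_longest(["ABC", "ABCD", "ZYXW"])
print(first_sample)
print(second_sample)
```

The second sample is the case from the note in the question: the longest name is chosen alphabetically among the four-letter names, so ABCD is returned rather than ZYXW.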
This is what I came up with. I tried to use only one query, without CTEs or sub-queries.
;WITH STATION AS ( --Dummy table
SELECT *
FROM (VALUES
(1,'DEF','EU',1,9),
(2,'ABC','EU',1,6), -- This is shortest
(3,'PQRS','EU',1,5),
(4,'WXY','EU',1,4),
(5,'FGHA','EU',1,2),
(6,'ASDFHG','EU',1,3) --This is longest
) as t(ID, CITY, [STATE], LAT_N,LONG_W)
)
SELECT TOP 1 WITH TIES CITY,
LEN(CITY) as CITY_LEN
FROM STATION
ORDER BY ROW_NUMBER() OVER(PARTITION BY LEN(CITY) ORDER BY CITY ASC),
CASE WHEN MAX(LEN(CITY)) OVER (ORDER BY (SELECT NULL)) = LEN(CITY)
OR MIN(LEN(CITY)) OVER (ORDER BY (SELECT NULL))= LEN(CITY)
THEN 0 ELSE 1 END
Output:
CITY CITY_LEN
ABC 3
ASDFHG 6
select min(CITY), length(CITY)
from STATION
group by length(CITY)
having length(CITY) = (select min(length(CITY)) from STATION)
or length(CITY) = (select max(length(CITY)) from STATION);

SELECT from multiple queries

I have these tables:
tblDiving(
diving_number int primary key,
diving_club int,
date_of_diving date)
tblDivingClub(
number int primary key not null check (number>0),
name char(30),
country char(30))
tblWorks_for(
diver_number int,
club_number int,
end_working_date date)
tblCountry(
name char(30) not null primary key)
I need to write a query that returns the name of each country and the number of "super clubs" in it.
A super club is a club that has more than 25 working divers (tblWorks_for.end_working_date is null) or has had more than 100 dives (tblDiving) in the last year.
After I get the country and the number of super clubs, I need to show only the countries that contain more than 2 super clubs.
I wrote these 2 queries:
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as number_of_guids
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
select tblDivingClub.name, count(distinct tblDiving.diving_number) as number_of_divings
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
But I don't know how to continue.
Each query works separately, but how do I combine them and select from them?
It's a university assignment and I'm not allowed to use views or temporary tables.
It's my first program, so I'm not really sure what I'm doing :)
WITH CTE AS (
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as diving_number
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
UNION ALL
select tblDivingClub.name, count(distinct tblDiving.diving_number) as diving_number
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
)
SELECT * FROM CTE
You can combine the queries using a UNION ALL as long as each query returns the same number of columns with compatible data types. You can then roll them into a Common Table Expression (CTE) and do a select from that.
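Going one step further than the UNION ALL, here is a rough sketch of how the two counts could be turned into the requested per-country result without views or temporary tables. It is an assumption-laden illustration, not the assignment's reference solution: it runs on SQLite through Python's sqlite3 module, so date('now', '-1 year') stands in for DATEADD(year, -1, GETDATE()); "in the last year" is read as dates on or after that cutoff (the `<=` in the queries above selects dives older than one year); and the thresholds are parameters so that a tiny made-up data set can exercise the logic.

```python
import sqlite3
from datetime import date, timedelta

# One CTE per criterion, joined back to the clubs, then grouped by country.
SUPER_CLUBS_PER_COUNTRY = """
WITH active AS (
    SELECT club_number, COUNT(DISTINCT diver_number) AS divers
    FROM tblWorks_for
    WHERE end_working_date IS NULL
    GROUP BY club_number
),
recent AS (
    SELECT diving_club, COUNT(*) AS dives
    FROM tblDiving
    WHERE date_of_diving >= date('now', '-1 year')
    GROUP BY diving_club
)
SELECT c.country, COUNT(*) AS super_clubs
FROM tblDivingClub AS c
LEFT JOIN active AS a ON a.club_number = c.number
LEFT JOIN recent AS r ON r.diving_club = c.number
WHERE COALESCE(a.divers, 0) > :min_divers
   OR COALESCE(r.dives, 0) > :min_dives
GROUP BY c.country
HAVING COUNT(*) > :min_clubs
ORDER BY c.country
"""


def super_club_countries(conn, min_divers=25, min_dives=100, min_clubs=2):
    """Countries with more than min_clubs super clubs (defaults from the question)."""
    params = {"min_divers": min_divers, "min_dives": min_dives, "min_clubs": min_clubs}
    return conn.execute(SUPER_CLUBS_PER_COUNTRY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblDivingClub (number INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE tblWorks_for (diver_number INTEGER, club_number INTEGER, end_working_date TEXT);
CREATE TABLE tblDiving (diving_number INTEGER PRIMARY KEY, diving_club INTEGER, date_of_diving TEXT);
""")

# Made-up data: dates well inside and well outside the one-year window.
recent_day = (date.today() - timedelta(days=30)).isoformat()
old_day = (date.today() - timedelta(days=800)).isoformat()

conn.executemany("INSERT INTO tblDivingClub VALUES (?, ?, ?)", [
    (1, "Reef", "A"), (2, "Coral", "A"), (3, "Kelp", "A"),
    (4, "Tide", "B"), (5, "Wave", "B"), (6, "Surf", "B"),
])
conn.executemany("INSERT INTO tblWorks_for VALUES (?, ?, ?)", [
    (10, 1, None), (11, 1, None),     # club 1: two active divers
    (12, 3, None), (13, 3, None),     # club 3: two active divers
    (14, 4, None), (15, 4, None),     # club 4: two active divers
    (16, 5, None), (17, 5, old_day),  # club 5: only one active diver
])
conn.executemany(
    "INSERT INTO tblDiving (diving_club, date_of_diving) VALUES (?, ?)",
    [(2, recent_day)] * 3 + [(5, old_day)] * 3 + [(6, recent_day)] * 3,
)

# Small thresholds so the toy data can qualify: >1 diver or >2 recent dives.
result = super_club_countries(conn, min_divers=1, min_dives=2, min_clubs=2)
print(result)
```

Counting each criterion in its own CTE keeps the two aggregates from multiplying each other's rows, which is what would happen if the clubs, workers and dives were all joined in a single query before counting.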

T-SQL order by, based on other column value

I'm stuck with a query which should be pretty simple but, for reasons unknown, my brain is not playing ball here ...
Table:
id(int) | strategy (varchar) | value (whatever)
1 "ABC" whatevs
2 "ABC" yeah
3 "DEF" hello
4 "DEF" kitty
5 "QQQ" hurrr
The query should select ALL rows grouped on strategy but only one row per strategy: the one with the highest id.
In the case above, it should return rows with id 2, 4 and 5
SELECT id, strategy , value
FROM (
SELECT id, strategy , value
,ROW_NUMBER() OVER (PARTITION BY strategy ORDER BY ID DESC) rn
FROM Table_Name
) Sub
WHERE rn = 1
Working SQL FIDDLE
You can use a window function to get the solution you want. Fiddle here
with cte as
(
select
rank()over(partition by strategy order by id desc) as rnk,
id, strategy, value from myT
)
select id, strategy, value from
cte where rnk = 1;
Try this:
SELECT T2.id,T1.strategy,T1.value
FROM TableName T1
INNER JOIN
(SELECT MAX(id) as id,strategy
FROM TableName
GROUP BY strategy) T2
ON T1.id=T2.id
Result:
ID STRATEGY VALUE
2 ABC yeah
4 DEF kitty
5 QQQ hurrr
See result in SQL Fiddle.
SELECT id, strategy , value
FROM (
SELECT id, strategy , value
,MAX(id) OVER (PARTITION BY strategy) MaxId
FROM YourTable
) Sub
WHERE id=MaxId
You may try this one as well:
SELECT id, strategy, value FROM TableName WHERE id IN (
SELECT MAX(id) FROM TableName GROUP BY strategy
)
It depends a bit on your data: you might get results faster with this query because it does not sort, but on the other hand it uses IN, which can slow it down if there are many strategies.

Resources