select unique rows based on single distinct column [duplicate] - sql-server

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 7 months ago.
I want to select rows that have a distinct email, see the example table below:
+----+---------+-------------------+-------------+
| id | title | email | commentname |
+----+---------+-------------------+-------------+
| 3 | test | rob@hotmail.com | rob |
| 4 | i agree | rob@hotmail.com | rob |
| 5 | its ok | rob@hotmail.com | rob |
| 6 | hey | rob@hotmail.com | rob |
| 7 | nice! | simon@hotmail.com | simon |
| 8 | yeah | john@hotmail.com | john |
+----+---------+-------------------+-------------+
The desired result would be:
+----+-------+-------------------+-------------+
| id | title | email | commentname |
+----+-------+-------------------+-------------+
| 3 | test | rob@hotmail.com | rob |
| 7 | nice! | simon@hotmail.com | simon |
| 8 | yeah | john@hotmail.com | john |
+----+-------+-------------------+-------------+
Where I don't care which id column value is returned.
What would be the required SQL?

Quick one in TSQL
SELECT a.*
FROM emails a
INNER JOIN
(SELECT email,
MIN(id) as id
FROM emails
GROUP BY email
) AS b
ON a.email = b.email
AND a.id = b.id;

I'm assuming you mean that you don't care which row is used to obtain the title, id, and commentname values (you have "rob" for all of the rows, but I don't know if that is actually something that would be enforced or not in your data model). If so, then you can use windowing functions to return the first row for a given email address:
select
id,
title,
email,
commentname
from
(
select
*,
row_number() over (partition by email order by id) as RowNbr
from YourTable
) source
where RowNbr = 1
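As a quick check of the two approaches above, here is a minimal sketch that runs both against the sample rows. It assumes SQLite (via Python's built-in sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for SQL Server, and it assumes the table is called emails as in the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (id INTEGER PRIMARY KEY, title TEXT, email TEXT, commentname TEXT)"
)
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?, ?)",
    [
        (3, "test", "rob@hotmail.com", "rob"),
        (4, "i agree", "rob@hotmail.com", "rob"),
        (5, "its ok", "rob@hotmail.com", "rob"),
        (6, "hey", "rob@hotmail.com", "rob"),
        (7, "nice!", "simon@hotmail.com", "simon"),
        (8, "yeah", "john@hotmail.com", "john"),
    ],
)

# Approach 1: join each row back to the lowest id of its email.
min_id_rows = conn.execute("""
    SELECT a.id, a.title, a.email, a.commentname
    FROM emails a
    INNER JOIN (SELECT email, MIN(id) AS id FROM emails GROUP BY email) b
        ON a.email = b.email AND a.id = b.id
    ORDER BY a.id
""").fetchall()

# Approach 2: number the rows within each email and keep the first one.
row_number_rows = conn.execute("""
    SELECT id, title, email, commentname
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS RowNbr
        FROM emails
    ) AS source
    WHERE RowNbr = 1
    ORDER BY id
""").fetchall()

for row in row_number_rows:
    print(row)
```

Both queries keep the row with the lowest id per email here. Changing the ORDER BY inside OVER (or swapping MIN for MAX) picks a different row from each group.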

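If you are using MySQL 5.7 or later, according to these links (MySQL Official, SO QA), you can select one record per group without any aggregate functions, provided the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default from MySQL 5.7.5 onward). The values returned for the non-grouped columns are then arbitrary.
So the query can be simplified to this: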
select * from comments_table group by commentname;
Try out the query in action here

Since you don't care which id is returned, I use the MAX id for each email to keep the query simple. Give it a try:
;WITH ue(id)
AS
(
SELECT MAX(id)
FROM table
GROUP BY email
)
SELECT * FROM table t
INNER JOIN ue ON ue.id = t.id

SELECT * FROM emails GROUP BY email;

Related

Check if records exists at least once in LEFT JOINED Table

I have Images, Orders and OrderItems tables. For every image, I want to show whether it has already been bought by the user passed as a parameter, as true or false in an IsBought column.
Select Images.Id,
Images.Title,
Images.Description,
Images.Location,
Images.PriceIT,
Images.PostedAt,
CASE WHEN OrderItems.ImageId = Images.Id THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT) END
AS 'IsBought'
FROM Images
INNER JOIN Users as u on Images.UserId = u.Id
LEFT JOIN Orders on Orders.UserId = @userId
LEFT JOIN OrderItems on Orders.Id = OrderItems.OrderId and OrderItems.ImageId = Images.Id
Group By Images.Id,
Images.Title,
Images.Description,
Images.Location,
Images.PriceIT,
Images.PostedAt,
OrderItems.ImageId,
Orders.UserId
With this CASE WHEN, I get duplicate rows when an item has been bought: one row where IsBought is true and a duplicate where it is false.
When the item has never been bought, there are no duplicates and IsBought is simply false.
----------------------------------
| User | type |
----------------------------------
| Id | nvarchar(450) |
----------------------------------
| .......|
----------------------------------
----------------------------------
| Orders | type |
----------------------------------
| Id | nvarchar(255) |
----------------------------------
| UserId | nvarchar(450) |
----------------------------------
| ........................... |
----------------------------------
----------------------------------
| OrderItems | type |
----------------------------------
| Id | nvarchar(255) |
----------------------------------
| OrderId | nvarchar(255) |
----------------------------------
| ImageId | int |
----------------------------------
----------------------------------
| Images | type |
----------------------------------
| Id | int |
----------------------------------
| UserId | nvarchar(450) |
----------------------------------
| Title | nvarchar(MAX) |
----------------------------------
| Description| nvarchar(MAX) |
----------------------------------
| ......................... |
----------------------------------
Any ideas on how I could just have one row per Images with IsBought set to true or false but not duplicates?
I would like something like this:
----------------------------------------------------------------------------
| Id | Title | Description | Location | PriceIT | Location | IsBought |
----------------------------------------------------------------------------
| 1 | Eiffel Tower | .... | ...... | 20.0 | Paris | true |
----------------------------------------------------------------------------
| 2 | Tore di Pisa | .... | ...... | 20.0 | Italia | false |
---------------------------------------------------------------------------
| etc ......
---------------------------------------------------------------------------
Your query logic looks suspicious. It is unusual to see a join that consists only of a comparison of a column from the unpreserved table to a parameter. I suspect that you don't need a join to users at all since you seem to be focused on things "bought" by a person and not things "created" (which is implied by the name "author") by that same person. And a group by clause with no aggregate is often a cover-up for a logically flawed query.
So start over. You want to see all images apparently. For each, you simply want to know if that image is associated with any order of a given person.
select img.*, -- you would, of course, only select the columns needed
(select count(*) from Sales.SalesOrderDetail as orddet
where orddet.ProductID = img.ProductID) as [Order Count],
(select count(*) from Sales.SalesOrderDetail as orddet
inner join Sales.SalesOrderHeader as ord
on orddet.SalesOrderID = ord.SalesOrderID
where orddet.ProductID = img.ProductID
and ord.CustomerID = 29620
) as [User Order Count],
case when exists(select * from Sales.SalesOrderDetail as orddet
inner join Sales.SalesOrderHeader as ord
on orddet.SalesOrderID = ord.SalesOrderID
where orddet.ProductID = img.ProductID
and ord.CustomerID = 29620) then 1 else 0 end as [Has Ordered]
from Production.ProductProductPhoto as img
where img.ProductID between 770 and 779
order by <something useful>;
Notice the aliases - it is much easier to read a long query when you use aliases that are shorter but still understandable (i.e., not single letters). I've included 3 different subqueries to help you understand correlation and how you can build your logic to achieve your goal and help debug any issues you find.
This is based on AdventureWorks sample database - which you should install and use as a learning tool (and to help facilitate discussions with others using a common data source). Note that I simply picked a random customer ID value - you would use your parameter. I filtered the query to a range of images to simplify debugging. Those are very simple but effective methods to help write and debug sql.
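To connect the EXISTS idea back to the question's own tables, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) in place of SQL Server, a trimmed-down version of the Images, Orders and OrderItems tables, and made-up sample rows and user ids ('alice', 'bob'); the helper name images_with_is_bought is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Images (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Orders (Id TEXT PRIMARY KEY, UserId TEXT);
    CREATE TABLE OrderItems (Id TEXT PRIMARY KEY, OrderId TEXT, ImageId INTEGER);
    -- Hypothetical data: alice ordered image 1 twice, bob ordered image 2 once.
    INSERT INTO Images VALUES (1, 'Eiffel Tower'), (2, 'Tore di Pisa');
    INSERT INTO Orders VALUES ('o1', 'alice'), ('o2', 'alice'), ('o3', 'bob');
    INSERT INTO OrderItems VALUES ('i1', 'o1', 1), ('i2', 'o2', 1), ('i3', 'o3', 2);
""")


def images_with_is_bought(user_id):
    """Return one row per image with a 1/0 IsBought flag for the given user."""
    return conn.execute("""
        SELECT img.Id, img.Title,
               CASE WHEN EXISTS (
                   SELECT 1
                   FROM OrderItems AS item
                   INNER JOIN Orders AS ord ON ord.Id = item.OrderId
                   WHERE item.ImageId = img.Id
                     AND ord.UserId = ?
               ) THEN 1 ELSE 0 END AS IsBought
        FROM Images AS img
        ORDER BY img.Id
    """, (user_id,)).fetchall()


print(images_with_is_bought("alice"))
print(images_with_is_bought("bob"))
```

Because the order tables appear only inside the correlated subquery, they cannot multiply the rows of Images, so no GROUP BY is needed. In T-SQL the flag could be wrapped in CAST(... AS BIT) as in the original query.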

MS SQL SERVER pivot table aggregation function

I have a question about how the aggregate function used in PIVOT is applied.
The table OCCUPATIONS looks like this:
+-----------+------------+
| Name | Occupation |
+-----------+------------+
| Ashley | Professor |
| Samantha | Actor |
| Julia | Doctor |
| Britney | Professor |
| Maria | Professor |
| Meera | Professor |
| Priya | Doctor |
| Priyanka | Professor |
| Jennifer | Actor |
| Ketty | Actor |
| Belvet | Professor |
| Naomi | Professor |
| Jane | Singer |
| Jenny | Singer |
| Kristeen | Singer |
| Christeen | Singer |
| Eve | Actor |
| Aamina | Doctor |
+-----------+------------+
The first column is name and second is occupation.
Now I want to build a pivot table in which each column is one occupation, the names are sorted alphabetically, and NULL is printed when an occupation has no more names.
The output should look like this:
+--------+-----------+-----------+----------+
| Doctor | Professor | Singer | Actor |
+--------+-----------+-----------+----------+
| Aamina | Ashley | Christeen | Eve |
| Julia | Belvet | Jane | Jennifer |
| Priya | Britney | Jenny | Ketty |
| NULL | Maria | Kristeen | Samantha |
| NULL | Meera | NULL | NULL |
| NULL | Naomi | NULL | NULL |
| NULL | Priyanka | NULL | NULL |
+--------+-----------+-----------+----------+
Here the first column is Doctor, second is Professor, third is Singer and fourth is Actor. The code to generate result is
select [Doctor],[Professor],[Singer],[Actor] from (select o.Name,
o.Occupation, row_number() over(partition by o.Occupation order by
o.Name) id from OCCUPATIONS o) as src
pivot
(max(src.Name)
for src.Occupation in ([Doctor],[Professor],[Singer],[Actor])
) as m
But when I replace the derived table
(select o.Name, o.Occupation, row_number() over(partition by o.Occupation order by o.Name) id from OCCUPATIONS o) as src
with the plain table OCCUPATIONS, the result is:
Priya Priyanka Kristeen Samantha
I understand why this happens: MAX() is taken over each occupation group. However, the previous query also uses MAX(), which I expected to serve only for producing NULL when there are no more names; there it does not return a single maximum value, but returns every name.
Why does this happen?
Thank you!
Here could be the source of issue:
row_number() over(partition by o.Occupation order by
o.Name) id from OCCUPATIONS o
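The row_number() here is partitioned by o.Occupation, so id restarts at 1 within each occupation. PIVOT groups by every source column that is not used in the aggregate or in the FOR clause, so with the derived table it groups by id, and MAX(Name) is taken within each (id, Occupation) pair, which holds exactly one name. With the bare OCCUPATIONS table no column is left to group by, so MAX collapses each occupation into a single row. Removing the PARTITION BY would not help: the ids would then be unique across the whole table and every output row would hold a single name.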
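In T-SQL, PIVOT implicitly groups by every source column that is not named in the aggregate or the FOR clause, which is why the id column changes the result. The sketch below makes that grouping explicit. It assumes SQLite (Python's sqlite3 module, SQLite 3.25 or later), which has no PIVOT operator, so the pivot is emulated with MAX(CASE ...) and the grouping is written out by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OCCUPATIONS (Name TEXT, Occupation TEXT)")
conn.executemany(
    "INSERT INTO OCCUPATIONS VALUES (?, ?)",
    [
        ("Ashley", "Professor"), ("Samantha", "Actor"), ("Julia", "Doctor"),
        ("Britney", "Professor"), ("Maria", "Professor"), ("Meera", "Professor"),
        ("Priya", "Doctor"), ("Priyanka", "Professor"), ("Jennifer", "Actor"),
        ("Ketty", "Actor"), ("Belvet", "Professor"), ("Naomi", "Professor"),
        ("Jane", "Singer"), ("Jenny", "Singer"), ("Kristeen", "Singer"),
        ("Christeen", "Singer"), ("Eve", "Actor"), ("Aamina", "Doctor"),
    ],
)

# Hand-written equivalent of: max(Name) for Occupation in ([Doctor], ...)
PIVOT_COLUMNS = """
    MAX(CASE WHEN Occupation = 'Doctor' THEN Name END) AS Doctor,
    MAX(CASE WHEN Occupation = 'Professor' THEN Name END) AS Professor,
    MAX(CASE WHEN Occupation = 'Singer' THEN Name END) AS Singer,
    MAX(CASE WHEN Occupation = 'Actor' THEN Name END) AS Actor
"""

# Bare table: nothing left to group by, so MAX collapses everything to one row.
no_grouping = conn.execute(f"SELECT {PIVOT_COLUMNS} FROM OCCUPATIONS").fetchall()

# Derived table with id: one output row per id, one name per (id, occupation).
grouped_by_id = conn.execute(f"""
    SELECT {PIVOT_COLUMNS}
    FROM (
        SELECT Name, Occupation,
               ROW_NUMBER() OVER (PARTITION BY Occupation ORDER BY Name) AS id
        FROM OCCUPATIONS
    ) AS src
    GROUP BY id
    ORDER BY id
""").fetchall()

print(no_grouping)
for row in grouped_by_id:
    print(row)
```

Within each id group there is at most one name per occupation, so MAX simply returns that name, or NULL when the occupation has run out of names.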
Try this approach:
1. Find the occupation with the most people associated.
2. Generate a table with a sequence of numbers from 1 to the number of people found in the previous point.
3. Join the table generated in point 2 four times with the original table, each time filtering on a different Occupation.
This is the query:
declare @tmp table([Name] varchar(50),[Occupation] varchar(50))
insert into @tmp values
('Ashley','Professor') ,('Samantha','Actor') ,('Julia','Doctor') ,('Britney','Professor') ,('Maria','Professor') ,('Meera','Professor') ,('Priya','Doctor') ,('Priyanka','Professor') ,('Jennifer','Actor') ,('Ketty','Actor') ,('Belvet','Professor') ,('Naomi','Professor') ,('Jane','Singer') ,('Jenny','Singer') ,('Kristeen','Singer') ,('Christeen','Singer') ,('Eve','Actor') ,('Aamina','Doctor')
--this variable contains the occupation that has the most Names (rows) in the table
--its row count will be the total number of rows in the output table
declare @Occupation_with_max_rows varchar(50)
--populate @Occupation_with_max_rows variable
select top 1 @Occupation_with_max_rows=Occupation
from @tmp
group by Occupation
order by count(*) desc
--generate final results joining 4 times the original table with the sequence table
select D.Name as Doctor,P.Name as Professor,S.Name as Singer,A.Name as Actor
from
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord from @tmp where Occupation = @Occupation_with_max_rows) O
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Doctor') D on O.ord = D.ord
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Professor') P on O.ord = P.ord
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Singer') S on O.ord = S.ord
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Actor') A on O.ord = A.ord
Please find below code which works as expected :
select [Doctor],[Professor],[Singer],[Actor]
from
(
select row_number() over (partition by occupation order by name)[A],name,occupation
from occupations
)src
pivot
(
max(Name)
for occupation in ([Doctor],[Professor],[Singer],[Actor])
)piv;

Select query with only a single record in a mapping table

Please, can someone help me with what is possibly a simple query?
We have two tables with below structure.
Customer table:
+----+-----------+
| id | name |
+----+-----------+
| 1 | customer1 |
| 2 | customer2 |
| 3 | customer3 |
+----+-----------+
Customer role mapping table:
+-------------+-----------------+
| customer_id | customerRole_id |
+-------------+-----------------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
+-------------+-----------------+
I want to select customers that have role id 1 only, not customers that have both role id 1 and 2.
So, in this case, it would be customer ids 2, 3, 4 and 5, ignoring 1 because it has multiple roles.
Is there a simple query to do this?
Many thanks, for any help offered.
Hmmm, there are several ways to do this.
select c.*
from customers c
where exists (select 1 from mapping m where m.customerid = c.id and m.role = 1) and
not exists (select 1 from mapping m where m.customerid = c.id and m.role <> 1);
If you just want the customer id, a perhaps simpler version is:
select customerid
from mapping
group by customerid
having min(role) = 1 and max(role) = 1;
This solution assumes that role is never NULL.
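The HAVING version can be checked against the sample mapping rows with a minimal sketch. It assumes SQLite (Python's sqlite3 module) rather than SQL Server, and it uses the answer's shortened names (mapping, customerid, role) instead of the question's column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (customerid INTEGER, role INTEGER)")
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 1), (4, 1), (5, 1)],
)

# A customer qualifies only if every one of its roles equals 1,
# i.e. both the smallest and the largest role value are 1.
only_role_1 = [
    row[0]
    for row in conn.execute("""
        SELECT customerid
        FROM mapping
        GROUP BY customerid
        HAVING MIN(role) = 1 AND MAX(role) = 1
        ORDER BY customerid
    """)
]

print(only_role_1)
```

Customer 1 is dropped because its MAX(role) is 2, which is the whole trick: comparing both extremes to the same value is a compact way of saying "all values in the group are equal to it".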

SQL Server 2012 Count

I'm using SQL Server 2012. I have a table CustomerMaster. Here is some sample content:
+--------+---------------+-----------------+-------------+
| CustNo | NewMainCustNo | Longname | NoOfMembers |
+--------+---------------+-----------------+-------------+
| 3653 | 3653 | GroupId:003 | |
| 3654 | 3654 | GroupId:004 | |
| 11 | 3653 | Peter Evans | |
| 155 | 3653 | Harold Charley | |
| 156 | 3654 | David Arnold | |
| 160 | 3653 | Mickey Willson | |
| 2861 | 3653 | Jonathan Wickey | |
| 2871 | 3653 | William Jason | |
+--------+---------------+-----------------+-------------+
The NewMainCustNo of a customer record equals the CustNo of its group record. Basically, each customer belongs to a particular group.
My question is how to update the NoOfMembers column of each group record with the total number of customers that belong to that group.
Please share your ideas on how to do this.
Thank you...
This is the solution I came up with
update CustomerMaster
set NoOfMembers = (select count(*) from CustomerMaster m2 where m2.NewMainCustNo = CustomerMaster.CustNo and m2.CustNo <> CustomerMaster.CustNo)
where LongName like 'GroupId:%'
Check this SQL Fiddle to see the query in action.
However I disagree with your data structure. You should have a separate table for your groups. In your customer table you only need to reference the ID of the group in the group table. This makes everything (including the query above) much cleaner.
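Here is a minimal sketch of the correlated-subquery update run against the sample rows. It assumes SQLite (Python's sqlite3 module) as a stand-in for SQL Server 2012 and an INTEGER type for NoOfMembers, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerMaster ("
    "CustNo INTEGER PRIMARY KEY, NewMainCustNo INTEGER, "
    "Longname TEXT, NoOfMembers INTEGER)"
)
conn.executemany(
    "INSERT INTO CustomerMaster (CustNo, NewMainCustNo, Longname) VALUES (?, ?, ?)",
    [
        (3653, 3653, "GroupId:003"),
        (3654, 3654, "GroupId:004"),
        (11, 3653, "Peter Evans"),
        (155, 3653, "Harold Charley"),
        (156, 3654, "David Arnold"),
        (160, 3653, "Mickey Willson"),
        (2861, 3653, "Jonathan Wickey"),
        (2871, 3653, "William Jason"),
    ],
)

# For each group record, count the other rows that point at it.
# The <> condition keeps the group record from counting itself.
conn.execute("""
    UPDATE CustomerMaster
    SET NoOfMembers = (
        SELECT COUNT(*)
        FROM CustomerMaster AS m2
        WHERE m2.NewMainCustNo = CustomerMaster.CustNo
          AND m2.CustNo <> CustomerMaster.CustNo
    )
    WHERE Longname LIKE 'GroupId:%'
""")

members = dict(
    conn.execute(
        "SELECT CustNo, NoOfMembers FROM CustomerMaster WHERE NoOfMembers IS NOT NULL"
    )
)
print(members)
```

Only the two group records are touched; the customer rows keep NoOfMembers as NULL because of the WHERE clause on Longname.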
If I understand correctly, you can use a window function for the update. Here is an example with an updatable CTE:
with toupdate as (
-- grpcount counts every row sharing NewMainCustNo, including the group record itself
select cm.*, count(*) over (partition by NewMainCustNo) as grpcount
from customermaster cm
)
update toupdate
set NoOfMembers = grpcount;
You may not have the option to do so, but I would separate groups out into their own table.
create table Groups (
GroupID int primary key,
Name varchar(200)
)
Then rename NewMainCustNo to GroupID, purge your customer table of group records, and go from there. Getting a group count would then be:
select g.GroupID,
g.Name [Group Name],
COUNT(*) [NoOfMembers]
from Groups g
join Customers c on
c.GroupID = g.GroupID
group by g.GroupID,
g.Name

I would like to Pivot column data related to one entity into one row in SQL Server [duplicate]

This question already has an answer here:
Pivot without aggregate function in MSSQL 2008 R2
(1 answer)
Closed 8 years ago.
I am selecting data specific to certain clients out of multiple tables where data from one client spans multiple rows, however I would like duplicate entries to be combined onto one row. One basic example would be as follows
+------------+-------+-------------------------------+
| ClientCode | Name | Email |
+------------+-------+-------------------------------+
| CAL01 | Doug | itsjustdoug@internet.org |
| CAL01 | Doug | doug@email.com |
| MER03 | Jane | janehasemail@email.com |
| MER03 | Jane | janerocks@web.com |
| MER03 | Jane | janehatesspam@justforspam.net |
+------------+-------+-------------------------------+
The results I am looking for would be more like
+------------+-------+-------------------+-------------------+-----------------------+
| ClientCode | Name | Email1 | Email2 | Email3 |
+------------+-------+-------------------+-------------------+-----------------------+
| CAL01 | Doug | itsjustdoug@inte | doug@email.com | NULL |
| MER03 | Jane | janehasemail@ema | janerocks@web.com | janehatesspam@justfor |
+------------+-------+-------------------+-------------------+-----------------------+
Here is what I have tried.
Select * From
(Select
ClientCode
,Name
,Email
From dbo.Clients) T
PIVOT(Max (Email) for Email in (Email1, Email2, Email3)) T2
This does not seem to be the correct way to achieve what I want. Any suggestions would be appreciated. It is worth noting that the actual query is much more complicated and contains many joins and perhaps several different instances where I would use this sort of "pivoting?"
Thanks
Generate a Row_number per ClientCode in the pivot source query,
and concatenate the text 'Email' with the generated row number; this produces the values that match the pivot column list.
SELECT *
FROM (SELECT ClientCode,
NAME,
Email,
'Email'+ CONVERT(VARCHAR(50), Row_number() OVER(partition BY ClientCode ORDER BY email)) Emails
FROM dbo.Clients) T
PIVOT(Max (Email)
FOR Emails IN( [Email1],
[Email2],
[Email3])) T2
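The same numbering idea can be tried outside SQL Server with a minimal sketch. It assumes SQLite (Python's sqlite3 module, SQLite 3.25 or later), which has no PIVOT operator, so the pivot is emulated with MAX(CASE ...) over the row number; the alias rn is introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (ClientCode TEXT, Name TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?, ?)",
    [
        ("CAL01", "Doug", "itsjustdoug@internet.org"),
        ("CAL01", "Doug", "doug@email.com"),
        ("MER03", "Jane", "janehasemail@email.com"),
        ("MER03", "Jane", "janerocks@web.com"),
        ("MER03", "Jane", "janehatesspam@justforspam.net"),
    ],
)

# Number the emails within each client, then spread rows 1..3 into columns.
pivoted = conn.execute("""
    SELECT ClientCode, Name,
           MAX(CASE WHEN rn = 1 THEN Email END) AS Email1,
           MAX(CASE WHEN rn = 2 THEN Email END) AS Email2,
           MAX(CASE WHEN rn = 3 THEN Email END) AS Email3
    FROM (
        SELECT ClientCode, Name, Email,
               ROW_NUMBER() OVER (PARTITION BY ClientCode ORDER BY Email) AS rn
        FROM Clients
    ) AS t
    GROUP BY ClientCode, Name
    ORDER BY ClientCode
""").fetchall()

for row in pivoted:
    print(row)
```

The emails land in alphabetical order because of ORDER BY Email inside the window, so the column assignment differs from the order shown in the question's desired output. A client with more than three emails would silently lose the extras, just as with a fixed PIVOT column list.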

Resources