Select all columns, group by one column with other requirements

This is not the original table; I created a similar situation so that I can explain the problem better.
Let's say I have a table called [student], with 4 columns: [name], [gender], [age], [country].
How do I write a 'SELECT *' query that returns the rows that meet these requirements:
the student must be male
only one student from each country
if there is more than one student from a country, choose the oldest one
I tried using GROUP BY on [country] but keep getting the error "Column '...' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"

One possible approach:
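SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY age DESC) AS RN
    FROM student
    WHERE gender = 'male') T
WHERE RN = 1;
The subquery keeps only male students and numbers them within each country from oldest to youngest. The outer query then keeps the row numbered 1, which is the oldest male student per country. Note that SELECT * in the outer query also returns the extra RN column.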
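A minimal sketch of the approach above that can be run without a SQL Server instance. It assumes Python's built-in sqlite3 module as a stand-in (SQLite 3.25 or later supports ROW_NUMBER), and the sample rows are invented for illustration:

```python
import sqlite3

# SQLite stands in for SQL Server here; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, gender TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("Adam", "male", 21, "USA"),
        ("Bob", "male", 25, "USA"),
        ("Carla", "female", 30, "USA"),
        ("Dieter", "male", 19, "Germany"),
        ("Eva", "female", 22, "France"),
    ],
)

# Number the male students within each country, oldest first, then keep row 1.
oldest_male_per_country = conn.execute(
    """
    SELECT name, gender, age, country
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY age DESC) AS rn
        FROM student
        WHERE gender = 'male'
    ) AS t
    WHERE rn = 1
    ORDER BY country
    """
).fetchall()

print(oldest_male_per_country)
```

France does not appear in the result because the filter on gender runs before the rows are numbered. If two male students in a country share the highest age, ROW_NUMBER picks one of them arbitrarily.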

Related

User Defined Function with a sub query

Create a user defined function named XXRepeatCustomer (where the XX are your initials). The function is to have one input parameter. Use the INT datatype for the input parameter. When the function is executed it is to return a three column table (CustFirstName, CustLastName, and Phone) for customers that placed a number of orders greater than or equal to the number passed in via the input parameter.
In order to get the total number of orders placed, I have joined the Customer and CustOrder tables. The problem only wants me to show the first name, last name, and phone of each customer, not the order total. I'm struggling with using the @orders parameter and counting the total number of orders in the subquery.
CREATE FUNCTION dbo.JERepeatCustomer
(@orders INT)
RETURNS TABLE AS
RETURN (SELECT CustFirstName, CustLastName, Phone
FROM Customer C JOIN CustOrder CO
ON C.CustomerID = CO.CustomerID
WHERE @orders <= OrderID AND OrderID = (SELECT COUNT (DISTINCT OrderID) FROM CustOrder)
GROUP BY CustFirstName, CustLastName, Phone)
I expect the user to enter 7, or any other number, and the results to show only the customers who have placed that many orders or more.

The keyword you need is HAVING. HAVING is similar to WHERE: WHERE filters individual rows before they are grouped, while HAVING filters the groups afterwards, based on an aggregated value such as COUNT.
For example, suppose you have a customer table and an orders table that holds all the orders for each customer.
DECLARE @input INT = 7;
SELECT ct.customer, ct.phone, COUNT(ot.orderID) AS orders
FROM customertable ct
INNER JOIN ordertable ot
    ON ct.customerID = ot.customerID
GROUP BY ct.customer, ct.phone
HAVING COUNT(ot.orderID) >= @input;
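A rough sketch of the HAVING idea applied to the Customer and CustOrder tables from the question. It assumes Python's sqlite3 module in place of SQL Server, a hypothetical helper named repeat_customers in place of the table-valued function, and invented sample rows:

```python
import sqlite3

# SQLite stands in for SQL Server; the sample customers and orders are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        CustFirstName TEXT, CustLastName TEXT, Phone TEXT);
    CREATE TABLE CustOrder (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customer VALUES
        (1, 'Ann', 'Lee', '555-0100'),
        (2, 'Ben', 'Ortiz', '555-0101'),
        (3, 'Cy', 'Park', '555-0102');
    INSERT INTO CustOrder VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    """
)


def repeat_customers(conn, min_orders):
    """Hypothetical helper: customers with at least min_orders orders."""
    return conn.execute(
        """
        SELECT C.CustFirstName, C.CustLastName, C.Phone
        FROM Customer AS C
        JOIN CustOrder AS CO ON CO.CustomerID = C.CustomerID
        GROUP BY C.CustomerID, C.CustFirstName, C.CustLastName, C.Phone
        HAVING COUNT(CO.OrderID) >= ?
        ORDER BY C.CustomerID
        """,
        (min_orders,),
    ).fetchall()


print(repeat_customers(conn, 2))
print(repeat_customers(conn, 1))
```

The order count is used only in the HAVING clause, so it filters the groups without appearing in the three-column result. Grouping by CustomerID as well keeps two customers with identical names and phone numbers from being merged.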

SELECT CustFirstName, CustLastName, Phone
FROM Customer C
CROSS APPLY (
    SELECT COUNT(*) AS Orders
    FROM CustOrder CO
    WHERE C.CustomerID = CO.CustomerID) CustomerOrders
WHERE CustomerOrders.Orders >= @orders

TSQL - De-duplication report - grouping

I'm trying to create a report that ranks duplicate records. The customer wants to merge a large number of duplicate records that came about from a migration.
I need the ranking so that my report can show which record should be the "main" record, i.e. the record that will have missing data pulled into it.
The duplicate definition is pretty simple:
If the email addresses are the same then it is always a duplicate; if
the emails do not match, then the first name, surname, and mobile must
match.
The ranking will be based on a whole bunch of columns in the table, so:
email address isn't NULL = 50
phone number isn't NULL = 20
etc. Whichever record gets the highest number in the duplicate group becomes the main record. This is where I am having issues: I can't seem to find a way to get an incremental number for each duplicate set. This is some of the code I have so far:
(I took out some of the rank columns in the temp table and CTE expression to shorten it.)
DECLARE @tmp_Duplicates TABLE (
    tmp_personID INT
    , tmp_Firstname NVARCHAR(100)
    , tmp_Surname NVARCHAR(100)
    , tmp_HomeEmail NVARCHAR(300)
    , tmp_MobileNumber NVARCHAR(100)
    --- Ratings
    , tmp_HomeEmail_Rating INT
    --- Groupings
    , tmp_GroupNumber INT
)
;WITH cteDupes AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personID DESC) AS RND,
        ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personId) AS RNA,
        p.personID, p.PersonFirstName, p.PersonSurname, p.PersonHomeEMail
        , personMobileTelephone
    FROM tblCandidate c INNER JOIN tblPerson p ON c.candidateID = p.personID
)
INSERT INTO @tmp_Duplicates
SELECT PersonID, PersonFirstName, PersonSurname, PersonHomeEMail, personMobileTelephone
    , 10, RND
FROM cteDupes
WHERE RNA + RND > 2
ORDER BY personID, PersonFirstName, PersonSurname
SELECT * FROM @tmp_Duplicates
This gives me the results I want, but the group number isn't showing how I need it:
What I need is for each group to be an incremental value:

SELECT from multiple queries

I have these tables:
tblDiving(
diving_number int primary key
diving_club int
date_of_diving date)
tblDivingClub(
number int primary key not null check (number>0),
name char(30),
country char(30))
tblWorks_for(
diver_number int
club_number int
end_working_date date)
tblCountry(
name char(30) not null primary key)
I need to write a query that returns the name of each country and the number of "super clubs" in it.
A super club is a club that has more than 25 working divers (tblWorks_for.end_working_date is null) or has had more than 100 dives (tblDiving) in the last year.
After I get the countries and their numbers of super clubs, I need to show only the countries that contain more than 2 super clubs.
I wrote these 2 queries:
select tblDivingClub.name,count(distinct tblWorks_for.diver_number) as number_of_guids
from tblWorks_for
inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number,tblDiving
where tblWorks_for.end_working_date is null
group by tblDivingClub.name
select tblDivingClub.name, count(distinct tblDiving.diving_number) as number_of_divings
from tblDivingClub
inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
WHERE tblDiving.date_of_diving <= DATEADD(year,-1, GETDATE())
group by tblDivingClub.name
But I don't know how to continue.
Each query works on its own, but how do I combine them and select from them?
It's a university assignment and I'm not allowed to use views or temporary tables.
It's my first program so I'm not really sure what I'm doing :)

WITH CTE AS (
    select tblDivingClub.name, count(distinct tblWorks_for.diver_number) as diving_number
    from tblWorks_for
    inner join tblDivingClub on tblDivingClub.number = tblWorks_for.club_number
    where tblWorks_for.end_working_date is null
    group by tblDivingClub.name
    UNION ALL
    select tblDivingClub.name, count(distinct tblDiving.diving_number) as diving_number
    from tblDivingClub
    inner join tblDiving on tblDivingClub.number = tblDiving.diving_club
    where tblDiving.date_of_diving >= DATEADD(year, -1, GETDATE())
    group by tblDivingClub.name
)
SELECT * FROM CTE
You can combine the queries using UNION ALL as long as each query returns the same number of columns with compatible data types. You can then wrap them in a Common Table Expression (CTE) and select from that. Two things differ from the queries in the question: the stray ", tblDiving" in the first FROM clause is dropped because it cross joins every row with tblDiving, and the date filter uses >= so that it matches dives from the last year rather than older ones.
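One possible continuation, sketched under several assumptions: Python's sqlite3 module stands in for SQL Server, so date('now', '-1 year') replaces DATEADD; the CTE name super_clubs and the three threshold parameters are hypothetical; and the thresholds are lowered so that a handful of invented rows can exercise the query. The CTE uses UNION rather than UNION ALL so that a club qualifying on both criteria is counted once, and the dive filter uses >= to cover the last year.

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for SQL Server; all rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblDivingClub (number INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE tblWorks_for (diver_number INTEGER, club_number INTEGER, end_working_date TEXT);
    CREATE TABLE tblDiving (diving_number INTEGER PRIMARY KEY, diving_club INTEGER, date_of_diving TEXT);
    """
)
recent = (date.today() - timedelta(days=30)).isoformat()
old = (date.today() - timedelta(days=800)).isoformat()
conn.executemany(
    "INSERT INTO tblDivingClub VALUES (?, ?, ?)",
    [(1, "Reef", "Egypt"), (2, "Coral", "Egypt"), (3, "Wreck", "Malta")],
)
# Club 1 has two active divers; clubs 2 and 3 have one active diver each.
conn.executemany(
    "INSERT INTO tblWorks_for VALUES (?, ?, ?)",
    [(10, 1, None), (11, 1, None), (12, 2, None), (13, 2, old), (14, 3, None)],
)
# Club 2 has two dives in the last year; club 3 only has old dives.
conn.executemany(
    "INSERT INTO tblDiving VALUES (?, ?, ?)",
    [(100, 2, recent), (101, 2, recent), (102, 3, old), (103, 3, old)],
)

query = """
WITH super_clubs AS (
    SELECT club_number AS number
    FROM tblWorks_for
    WHERE end_working_date IS NULL
    GROUP BY club_number
    HAVING COUNT(DISTINCT diver_number) > :min_divers
    UNION
    SELECT diving_club
    FROM tblDiving
    WHERE date_of_diving >= date('now', '-1 year')
    GROUP BY diving_club
    HAVING COUNT(DISTINCT diving_number) > :min_dives
)
SELECT c.country, COUNT(*) AS super_club_count
FROM super_clubs AS s
JOIN tblDivingClub AS c ON c.number = s.number
GROUP BY c.country
HAVING COUNT(*) > :min_clubs
ORDER BY c.country
"""
# The assignment's thresholds would be 25, 100 and 2; they are lowered for the demo.
countries = conn.execute(
    query, {"min_divers": 1, "min_dives": 1, "min_clubs": 1}
).fetchall()
print(countries)
```

With the sample data, clubs 1 and 2 qualify (one by active divers, one by recent dives), so only Egypt passes the final HAVING filter.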

How can I output information from a different table? [duplicate]

I have a query that gets the employee with the MAX number of "stars":
<cfquery datasource="Intranet" name="getMaxstars">
SELECT TOP (1) WITH TIES employee, SUM(execoffice_status) AS 'total_max'
FROM CSEReduxResponses
GROUP BY employee
ORDER BY 'total_max' DESC
</cfquery>
I also have a different table, EMPLOYEE, which comes from a different datasource ("phonelist"). That table has the employees' first_name and last_name columns, and the two tables share the same column, emp_id.
How can I output the employee's first_name and last_name using the other table?
What I eventually want to output is:
max:
john doe - stars = 4

Use a subquery like below:
select employee_id, sum(stars) as num_stars
from table_a
group by employee_id
having sum(stars) = (select max(num_stars)
from (select employee_id, sum(stars) as num_stars
from table_a
group by employee_id) x)

SELECT TOP (1) WITH TIES employee_id, SUM(stars) AS 'total'
FROM Table_A
GROUP BY employee_id
ORDER BY 'total' DESC
This is an alternative method.

Problem with unique SQL query

I want to select all records, but have the query return only a single record per product name. My table looks similar to:
SellId  ProductName  Comment
1       Cake         dasd
2       Cake         dasdasd
3       Bread        dasdasdd
where ProductName is not unique. I want the query to return a single record per ProductName, with results like:
SellId  ProductName  Comment
1       Cake         dasd
3       Bread        dasdasdd
I have tried this query:
Select distinct ProductName,Comment ,SellId from TBL#Sells
but it returns multiple records with the same ProductName. My table is not really as simple as this; this is just a sample. What is the solution? Is it clear?

Select ProductName,
min(Comment) , min(SellId) from TBL#Sells
group by ProductName
If you only want one record per ProductName, you of course have to choose which value you want for the other fields.
If you aggregate (using GROUP BY) you can choose an aggregate function, that is, a function that takes a list of values and returns only one. Here I have chosen MIN, the smallest value for each field.
NOTE: Comment and SellId can come from different records, since MIN is taken separately for each column.
Other aggregates you might find useful:
FIRST : first record encountered
LAST : last record encountered
AVG : average
COUNT : number of records
FIRST/LAST have the advantage that all fields are from the same record, but they are not available in every database (MS Access has them; SQL Server has no FIRST or LAST aggregate).
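To make the note about MIN concrete, here is a small sketch, assuming Python's sqlite3 module as a stand-in database, a table named Sells, and invented comments chosen so that the smallest Comment and the smallest SellId for "Cake" sit in different rows:

```python
import sqlite3

# SQLite stands in for the real database; the comments are invented so that
# MIN(Comment) and MIN(SellId) for 'Cake' come from different rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (SellId INTEGER PRIMARY KEY, ProductName TEXT, Comment TEXT)")
source_rows = [(1, "Cake", "zzz"), (2, "Cake", "aaa"), (3, "Bread", "mmm")]
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", source_rows)

aggregated = conn.execute(
    """
    SELECT MIN(SellId), ProductName, MIN(Comment)
    FROM Sells
    GROUP BY ProductName
    ORDER BY ProductName
    """
).fetchall()
print(aggregated)

# Result rows that do not match any single row of the table.
mixed = [row for row in aggregated if row not in source_rows]
print(mixed)
```

The "Cake" result pairs SellId 1 with the comment from SellId 2, a combination that exists in no row of the table. This is why the per-column MIN approach is only safe when it does not matter which record each value comes from.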

SELECT S.ProductName, S.Comment, S.SellId
FROM
    Sells S
    JOIN (SELECT MAX(SellId) AS SellId
          FROM Sells
          GROUP BY ProductName) AS TopSell ON TopSell.SellId = S.SellId
This returns the most recent row for each product, and therefore its latest comment, assuming that SellId is an auto-incremented identity that only goes up. The aggregate in the subquery needs a column alias so that the join condition can refer to it.
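A runnable sketch of this join against the sample rows from the question, assuming Python's sqlite3 module in place of the real database and a table named Sells:

```python
import sqlite3

# SQLite stands in for the real database; rows are the sample from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (SellId INTEGER PRIMARY KEY, ProductName TEXT, Comment TEXT)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [(1, "Cake", "dasd"), (2, "Cake", "dasdasd"), (3, "Bread", "dasdasdd")],
)

# The derived table yields one SellId per product; joining back on it
# returns the complete row that owns that SellId.
latest_per_product = conn.execute(
    """
    SELECT S.ProductName, S.Comment, S.SellId
    FROM Sells AS S
    JOIN (SELECT MAX(SellId) AS SellId
          FROM Sells
          GROUP BY ProductName) AS TopSell
      ON TopSell.SellId = S.SellId
    ORDER BY S.SellId
    """
).fetchall()
print(latest_per_product)
```

Because the join is on a single key value, every column in a result row comes from the same record. Swapping MAX for MIN would return the earliest row per product, which matches the output requested in the question.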

I know you've got an answer already, but I'd like to offer the approach that performed best for me in a similar situation. I'm assuming that SellId is the primary key and an identity column. You'd want an index on ProductName for best performance.
select
    Sells.*
from
    (
        select
            distinct ProductName
        from
            Sells
    ) x
join
    Sells
on
    Sells.ProductName = x.ProductName
    and Sells.SellId =
    (
        select
            top 1 s2.SellId
        from
            Sells s2
        where
            x.ProductName = s2.ProductName
        Order By SellId
    )
A slower method (but still better than GROUP BY and MIN on a long char column) is this:
select
    *
from
    (
        select
            *, ROW_NUMBER() over (PARTITION BY ProductName order by SellId) OccurenceId
        from sells
    ) x
where
    OccurenceId = 1
An advantage of this one is that it's much easier to read.

create table Sale
(
    SaleId int not null
        constraint PK_Sale primary key,
    ProductName varchar(100) not null,
    Comment varchar(100) not null
)
insert Sale
values
    (1, 'Cake', 'dasd'),
    (2, 'Cake', 'dasdasd'),
    (3, 'Bread', 'dasdasdd')
-- Option #1 with over()
select *
from Sale
where SaleId in
(
    select SaleId
    from
    (
        select SaleId, row_number() over(partition by ProductName order by SaleId) RowNumber
        from Sale
    ) tt
    where RowNumber = 1
)
order by SaleId
-- Option #2
select *
from Sale
where SaleId in
(
    select min(SaleId)
    from Sale
    group by ProductName
)
order by SaleId
drop table Sale