How to combine several rows into one row in SQL - sql-server

I want to find the total number of associates at a company, as well as how many female and male engineers there are, all in one query.
I can get the count of all associates on one row when that is the only thing my query looks for, but as soon as I combine it with the query for the number of males and females, the associate job titles start to separate.
My current code looks like this:
SELECT
count(*) as [Number of Employees], Gender, Job
FROM
#table
WHERE
Job like '%Associate%'
GROUP BY grouping sets
((Job), (Gender))
The result set has a row for each type of associate job, and I am trying to figure out how to combine them into one row under the name 'Associate'.

Here is the simple way
SELECT SUM(numberofemployees) AS total,
Gender,
CASE WHEN JobTitle LIKE '%Engineer%' THEN 'Engineer' ELSE JobTitle END AS JobTitle
FROM #table
GROUP BY Gender,
CASE WHEN JobTitle LIKE '%Engineer%' THEN 'Engineer' ELSE JobTitle END
OUTPUT:
total Gender JobTitle
5 F NULL
3 M NULL
8 NULL Engineer

Check this script. It also returns the value "Engineer" in the JobTitle column for the F and M rows. If you need NULL there instead, the script needs some adjustment.
Note: the subquery is just for readability. The same result can also be achieved in a single query.
SELECT COUNT(*),
GENDER,
JobTitle
FROM
(
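SELECT
Gender,
-- Collapse every engineer title into one label; the raw JobTitle is not
-- selected as well, because a derived table cannot have two columns with the same name
CASE
WHEN JobTitle like '%Engineer%' THEN 'Engineer'
ELSE JobTitle
END AS JobTitle
FROM HumanResources.Employee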
)A
WHERE A.JobTitle = 'Engineer'
GROUP BY GENDER,JobTitle

Add this:
SELECT
count(*) as [Number of Employees],
Gender,
JobTitle
FROM
HumanResources.Employee
WHERE
JobTitle like '%Engineer%'
Group by grouping sets
((JobTitle), (Gender))
HAVING
COUNT(*)>=3

OK, for what you want, you can try this:
SELECT SUM(a.[Number of Employees]) AS Value1, a.Gender, a.JobGroup
FROM (
SELECT 'Engineer' AS JobGroup, COUNT(*) AS [Number of Employees], Gender, JobTitle
FROM HumanResources.Employee
WHERE JobTitle LIKE '%Engineer%'
GROUP BY grouping sets ((JobTitle), (Gender))
) a
GROUP BY a.Gender, a.JobGroup
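If the goal is literally a single output row, conditional aggregation is another option. The following is only a minimal sketch: the table variable @Staff and its sample rows are hypothetical stand-ins for #table, and the column names Job and Gender are taken from the question.

```sql
-- Hypothetical sample data standing in for #table
DECLARE @Staff TABLE (Job varchar(50), Gender char(1));

INSERT INTO @Staff (Job, Gender)
VALUES ('Sales Associate',  'F'),
       ('Senior Associate', 'M'),
       ('Design Engineer',  'F'),
       ('Test Engineer',    'M'),
       ('Test Engineer',    'M');

-- Each SUM counts only the rows that match its own CASE condition,
-- so all three figures come back on the same row.
SELECT
    SUM(CASE WHEN Job LIKE '%Associate%' THEN 1 ELSE 0 END) AS [Total Associates],
    SUM(CASE WHEN Job LIKE '%Engineer%' AND Gender = 'F' THEN 1 ELSE 0 END) AS [Female Engineers],
    SUM(CASE WHEN Job LIKE '%Engineer%' AND Gender = 'M' THEN 1 ELSE 0 END) AS [Male Engineers]
FROM @Staff;
```

With the sample rows above this returns one row containing 2, 1 and 2. There is no WHERE clause because each CASE already filters on its own job title pattern.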

Related

order by on set 1 union set 2 sql

I have this situation:
select name,
subject
from Table_1
where date > getdate()-1
group by name, subject
order by id desc
union
select name,
subject
from table_2
where name not like 'abc%'
Table_1 and table_2 have a similar structure.
I need an ORDER BY on set 1 UNION set 2.
SQL Server does not allow this and says "ORDER BY items must appear in the select list". I don't understand what the problem is. I am selecting an equal number of columns in both queries; I only want the two result sets together.
(on SQL Server 2017)
Anybody help!!
Thanks in advance.
Elaborating on my comment
select name,
subject
from Table_1
where date > getdate()-1
--group by name, subject --this isn't needed
union
select name,
subject
from table_2
where name not like 'abc%'
order by <yourCol> desc --notice change here
And for a conditional order by, there are a few posts on that.
Also, you don't need the group by since union removes duplicates.
But, the error is clear that the column you want to order by must be contained in the select list...
If you want to keep the first set ordered before the second set, just use a static column....
select name,
subject,
1 as Sort
from Table_1
where date > getdate()-1
--group by name, subject --this isn't needed
union
select name,
subject,
2 as Sort
from table_2
where name not like 'abc%'
order by Sort asc--notice change here
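If the extra Sort column should not show up in the output, one option is to wrap the union in a derived table and order the outer query by it. This is a rough sketch only: @Table_1 and @table_2 are hypothetical table variables with made-up rows in place of the real tables, and the where clauses are left out for brevity.

```sql
-- Hypothetical stand-ins for Table_1 and table_2
DECLARE @Table_1 TABLE (name varchar(50), subject varchar(50));
DECLARE @table_2 TABLE (name varchar(50), subject varchar(50));

INSERT INTO @Table_1 (name, subject) VALUES ('zoe', 'math'), ('amy', 'art');
INSERT INTO @table_2 (name, subject) VALUES ('bob', 'music');

-- The inner union carries the Sort key; the outer query orders by it
-- without returning it.
SELECT u.name, u.subject
FROM
(
    SELECT name, subject, 1 AS Sort FROM @Table_1
    UNION
    SELECT name, subject, 2 AS Sort FROM @table_2
) u
ORDER BY u.Sort, u.name;
```

Because Sort is part of the inner select list, UNION no longer removes a row that appears in both tables; it comes back once per set.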

Same query giving different results

I am still new to working in databases, so please have patience with me. I have read through a number of similar questions, but none of them seem to be talking about the same issue I am facing.
Just a bit of info on what I am doing, I have a table filled with contact information, and some of the contacts are duplicated, but most of the duplicated rows have a truncated phone number, which makes that data useless.
I wrote the following query to search for the duplicates:
WITH CTE (CID, Firstname, lastname, phone, email, length, dupcnt) AS
(
SELECT
CID, Firstname, lastname, phone, email, LEN(phone) AS length,
ROW_NUMBER() OVER (PARTITION BY Firstname, lastname, email
ORDER BY Firstname) AS dupcnt
FROM
[data.com_raw]
)
SELECT *
FROM CTE
WHERE dupcnt > 1
AND length <= 10
I assumed that this query would find all records that have duplicates based on the three columns that I have specified, and select any that have the dupcnt greater than 1, and a phone column with a length less than or equal to 10. But when I run the query more than once I get different result sets each execution. There must be some logic that I am missing here, but I am completely baffled by this. All of the columns are of varchar datatype, except for CID, which is int.
Instead of ROW_NUMBER() use COUNT(*), and remove the ORDER BY since that's not necessary with COUNT(*).
The way you have it now, you are chunking up records into groups/partitions of similar records by firstname/lastname/email. Then you are ordering each group/partition by firstname. Firstname is part of the partition, meaning every firstname in that group/partition is identical. You will get different results depending on how SQL Server fetches the results from storage (whichever record it finds first is 1, the next is 2, and so on). Every time it fetches records (every time you run this SQL) it may fetch them from disk or cache in a different order.
Count(*) will return ALL duplicate rows
So instead:
COUNT(*) OVER (PARTITION BY Firstname, lastname, email ) AS dupcnt
This returns the number of records that share the same firstname, lastname, and email. You then keep any record where dupcnt is greater than 1.
ORDER BY Firstname is non-deterministic here as they all have the same Firstname from the partition by
If CID is unique you could use that for the order by but I suspect you really want count.
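Both suggestions can be sketched side by side. This is an illustration under assumptions, not the original poster's data: the table variable @Contacts and its rows are hypothetical stand-ins for [data.com_raw], and CID is assumed to be unique.

```sql
-- Hypothetical sample rows standing in for [data.com_raw]
DECLARE @Contacts TABLE
(
    CID int PRIMARY KEY,
    Firstname varchar(50),
    lastname varchar(50),
    phone varchar(20),
    email varchar(100)
);

INSERT INTO @Contacts (CID, Firstname, lastname, phone, email)
VALUES (1, 'Ann', 'Lee', '555-010-1234', 'ann@example.com'),
       (2, 'Ann', 'Lee', '555-010',      'ann@example.com'),
       (3, 'Bob', 'Ray', '555-020-9876', 'bob@example.com');

WITH Marked AS
(
    SELECT CID, Firstname, lastname, phone, email,
           LEN(phone) AS phone_length,
           -- Same value on every row of a partition, so it does not depend on row order
           COUNT(*) OVER (PARTITION BY Firstname, lastname, email) AS dupcnt,
           -- Ordering by the unique CID gives the same numbering on every run
           ROW_NUMBER() OVER (PARTITION BY Firstname, lastname, email ORDER BY CID) AS rn
    FROM @Contacts
)
SELECT CID, Firstname, lastname, phone, email, phone_length, dupcnt, rn
FROM Marked
WHERE dupcnt > 1
  AND phone_length <= 10;
```

With these rows only CID 2 is returned: it belongs to a group of two and has the short phone number. The rn column is included only to show the deterministic numbering; the filter itself relies on dupcnt.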
I believe you are getting different results with every run because (a) unless it is explicitly specified in the query, you can assume nothing about the order in which SQL Server returns data, and (b) the only ordering criterion you provide is FirstName, which is far less precise than your grouping (Firstname, lastname, email).
As for the query itself, as written it assumes that the first item found in a given partition contains a valid phone number. Without specifying the order, you cannot know this will be true… and what if all items in a given grouping have invalid phone numbers? Below is my stab at pulling out the data you're looking for, in a hopefully useful format.
WITH CTE -- Sorry, I'm lazy and generally don't list the columns
AS
(
SELECT
Firstname
,lastname
,phone
,count(*) HowMany -- How many in group
,sum(case len(phone) when 10 then 1 else 0 end) BadLength -- How many in group have a phone of exactly 10 characters
from [data.com_raw]
group by
Firstname
,lastname
,phone
having count(*) <> sum(case len(phone) when 10 then 1 else 0 end)
and count(*) > 1 -- Remove this to find singletons with invalid phone numbers
)
select
cr.CID
,cr.Firstname
,cr.lastname
,case len(cr.phone) when 10 then '' else 'Bad' end IsBad
,cr.phone
,cr.email
from [data.com_raw] cr
inner join CTE
on CTE.Firstname = cr.Firstname
and CTE.lastname = cr.lastname
and CTE.phone = cr.phone
order by
cr.CID
,cr.Firstname
,cr.lastname
,case len(cr.phone) when 10 then '' else 'Bad' end
,cr.phone
(Yes, if there are no indexes to support this, you will end up with a table scan.)
SELECT Firstname, lastname,email, COUNT(*)
FROM [data.com_raw]
WHERE LEN(PHONE)<= 10
GROUP BY Firstname, lastname,email HAVING COUNT(*)>1

Display if picture was uploaded or not for a particular id using group by

Schema:
tablename : Picture_profile
personID, pictureNumber(count which increase upon adding a picture for an id there can be many pictures),pictureName, datePictureAdded
tablename : person
personID, phone, address, zip, city, email,country and many other columns
Query:
select person_profile.personID, email, country,
case when count(pictureNumber)>=1 then 'yes' else 'no' end as PictureIsUploaded
from person_profile
left join Picture_profile on Picture_profile.personID = person_profile.personID
group by person_profile.personID, email, country
The above query works fine and displays what I want, but my question is: if I need to display all the columns, am I supposed to list every selected column in the "group by"?
Is there a way to eliminate this group by?
This only works on SQL Server 2005 and later, but I'd use the APPLY operator:
SELECT p.*,
COALESCE(a.HasPicture, 'N') PictureIsUploaded
FROM person_profile p
OUTER APPLY
(
SELECT TOP 1 'Y' HasPicture
FROM Picture_profile pic
WHERE pic.personID = p.personID
) a
You could also do this as a correlated subquery:
SELECT p.*,
COALESCE(
(SELECT TOP 1 'Y'
FROM Picture_profile pic
WHERE pic.personID = p.personID)
, 'N') PictureIsUploaded
FROM person_profile p
The advantage of the APPLY operator over the correlated subquery is when you want more than one value from the chosen Picture_Profile record. This would allow you to include, for example, both the picturename and picturenumber fields in the results, but only incur the subquery cost once.
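To illustrate that last point, here is a sketch that pulls two columns from the chosen picture row in one APPLY. The table variables and sample rows are hypothetical, the column lists are trimmed to what the example needs, and pictureNumber is assumed to be non-NULL for every stored picture.

```sql
-- Hypothetical sample data; only the columns the sketch needs
DECLARE @person_profile TABLE (personID int PRIMARY KEY, email varchar(100), country varchar(50));
DECLARE @Picture_profile TABLE (personID int, pictureNumber int, pictureName varchar(100));

INSERT INTO @person_profile (personID, email, country)
VALUES (1, 'a@example.com', 'US'),
       (2, 'b@example.com', 'CA');

INSERT INTO @Picture_profile (personID, pictureNumber, pictureName)
VALUES (1, 1, 'first.jpg'),
       (1, 2, 'second.jpg');

SELECT p.*,
       -- No matching picture row means the applied columns are NULL
       CASE WHEN a.pictureNumber IS NULL THEN 'N' ELSE 'Y' END AS PictureIsUploaded,
       a.pictureNumber AS LatestPictureNumber,
       a.pictureName   AS LatestPictureName
FROM @person_profile p
OUTER APPLY
(
    -- ORDER BY makes TOP 1 pick a well-defined row: the highest pictureNumber
    SELECT TOP 1 pic.pictureNumber, pic.pictureName
    FROM @Picture_profile pic
    WHERE pic.personID = p.personID
    ORDER BY pic.pictureNumber DESC
) a;
```

Person 1 comes back with 'Y', 2 and 'second.jpg'; person 2 comes back with 'N' and NULLs. No GROUP BY is needed, so p.* can list every person column.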

Group by grouping sets without including all columns selected

I have 4 columns: AccountNumber, Company, Batch, and Amount. I want to sum(amount) by Company and Batch but not by AccountNumber as AccountNumber is always unique and this wouldn't really accomplish anything. Is there a way using grouping sets to sum(amount) by Company and Batch but not by AccountNumber while still displaying AccountNumbers in the results set?
Thank you
Yes, this kind of problem requires a sub-query and a join. It is easy to understand if we think of the problem as two steps:
This will sum by company and batch:
select company, batch, sum(amount) as sum_amount
from atable
group by company, batch
Now just join that to your account numbers
select atable.accountnumber, subq.company, subq.batch, subq.sum_amount
from (
select company, batch, sum(amount) as sum_amount
from atable
group by company, batch
) subq
join atable on subq.company = atable.company and subq.batch = atable.batch
select b.Company, b.Batch, b.AccountNumber, s.Total
from baseTable b
inner join
(
select Company, Batch, sum(Amount) Total
from basetable
group by Company, Batch
) s on s.Company = b.Company and s.Batch = b.Batch
This will do it, but does give the total on each line. Not really any way to prevent that, though.
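A windowed aggregate gives the same per-row total without the self-join. This is a minimal sketch, assuming a hypothetical table variable @atable with invented rows; SUM(...) OVER (PARTITION BY ...) is available from SQL Server 2005 onward.

```sql
-- Hypothetical sample data standing in for the real table
DECLARE @atable TABLE
(
    AccountNumber int PRIMARY KEY,
    Company varchar(20),
    Batch int,
    Amount decimal(10, 2)
);

INSERT INTO @atable (AccountNumber, Company, Batch, Amount)
VALUES (1001, 'Acme',   1, 10.00),
       (1002, 'Acme',   1, 15.00),
       (1003, 'Acme',   2,  7.50),
       (1004, 'Zenith', 1, 20.00);

-- The window sums Amount over each Company/Batch pair but keeps every account row
SELECT AccountNumber, Company, Batch, Amount,
       SUM(Amount) OVER (PARTITION BY Company, Batch) AS sum_amount
FROM @atable
ORDER BY Company, Batch, AccountNumber;
```

Accounts 1001 and 1002 both show a sum_amount of 25.00, while 1003 and 1004 show 7.50 and 20.00. As with the join versions, the total repeats on every row of its group.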

How to put a subquery within an aggregate subquery

I am using SQL-Server 2000 and am trying to find duplicates with certain conditions. Someone here helped me earlier with the duplicate part which was great, however, I can't figure out how to filter the duplicated cases further.
I need to move the "where" statement to the subquery so that I only get contractor duplicated names as opposed to all duplicated names which is what's happening with this code (the code is first finding all duplicates and then filtering out the contractors and I'd like it to do the opposite). The problem is that I'm mixing it into an aggregate statement and it's giving me an error. I tried to put in another subquery within the subquery but it still gave me an error.
Any help is appreciated. Here's a simpler (I'm learning) version of the code:
SELECT DISTINCT(c1.contactid) as 'ContactID', c1.lastname as 'Last Name', c1.firstname as 'First Name'
FROM contacts c1 INNER JOIN (SELECT lastname, firstname FROM contacts group by lastname, firstname
HAVING count(*)>1)
dups on c1.lastname=dups.lastname and c1.firstname=dups.firstname
WHERE (c1.contractor=1)
For the example code you have given, placing the where clause between the "from contacts" and "group by lastname, firstname" should do the trick.
I think this is what you want:
SELECT
contact.ContactId,
contact.FirstName,
contact.LastName
FROM contacts as contact
INNER JOIN
(
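-- ContactID cannot be selected here because it is not in the GROUP BY,
-- so the duplicates are matched back on the name columns instead
SELECT FirstName, LastName FROM contacts GROUP BY FirstName, LastName HAVING Count(*) > 1
) AS Dups
ON Dups.FirstName = contact.FirstName
AND Dups.LastName = contact.LastName
WHERE
contact.contractor = 1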
You are very close. This will provide you with a list of each individual record that has a matching record with the same first name and last name. Hope this helps.
Select C.contactid As ContactID
, C.LastName As [Last Name]
, C.FirstName As [First Name]
From contacts C
Inner Join (
Select C1.LastName, C1.FirstName
From contacts As C1
Where C1.contractor = 1
Group By C1.LastName, C1.FirstName
Having Count(*) > 1
) As dups
On C.LastName = dups.LastName
And C.FirstName = dups.FirstName
Where C.contractor = 1
You are going to need the filter on contractor = 1 in both the subquery and the outer query. Otherwise, you might return people that happen to have the same name as a contractor and are duplicated.
Also, you do not need the Distinct keyword if ContactId is the primary key of the Contacts table.
Move the where clause to the subquery. It will first filter out all non-contractors, and then group the remainder.
SELECT DISTINCT
c1.ContactId
,c1.lastname as 'Last Name'
,c1.firstname as 'First Name'
from contacts c1
inner join (select lastname, firstname
from contacts
where contractor = 1
group by lastname, firstname
having count(*) > 1) dups
on c1.lastname = dups.lastname
and c1.firstname = dups.firstname
Oh, and unless there's a really good reason, you really don't want to include embedded spaces in your column aliases ("Last Name", "First Name").

Resources