MS SQL SERVER pivot table aggregation function

I have a question about how the aggregate function used in PIVOT is applied.
The table OCCUPATIONS looks like this:
+-----------+------------+
| Name | Occupation |
+-----------+------------+
| Ashley | Professor |
| Samantha | Actor |
| Julia | Doctor |
| Britney | Professor |
| Maria | Professor |
| Meera | Professor |
| Priya | Doctor |
| Priyanka | Professor |
| Jennifer | Actor |
| Ketty | Actor |
| Belvet | Professor |
| Naomi | Professor |
| Jane | Singer |
| Jenny | Singer |
| Kristeen | Singer |
| Christeen | Singer |
| Eve | Actor |
| Aamina | Doctor |
+-----------+------------+
The first column is the name and the second is the occupation.
Now I want to build a pivot table in which each column is one occupation, the names in each column are sorted alphabetically, and NULL is printed when an occupation has no more names.
The output should look like this:
+--------+-----------+-----------+----------+
| Doctor | Professor | Singer | Actor |
+--------+-----------+-----------+----------+
| Aamina | Ashley | Christeen | Eve |
| Julia | Belvet | Jane | Jennifer |
| Priya | Britney | Jenny | Ketty |
| NULL | Maria | Kristeen | Samantha |
| NULL | Meera | NULL | NULL |
| NULL | Naomi | NULL | NULL |
| NULL | Priyanka | NULL | NULL |
+--------+-----------+-----------+----------+
Here the first column is Doctor, the second is Professor, the third is Singer and the fourth is Actor. The code that generates this result is:
select [Doctor],[Professor],[Singer],[Actor]
from (select o.Name, o.Occupation,
row_number() over(partition by o.Occupation order by o.Name) id
from OCCUPATIONS o) as src
pivot
(max(src.Name)
for src.Occupation in ([Doctor],[Professor],[Singer],[Actor])
) as m
But when I replace the derived table
(select o.Name, o.Occupation, row_number() over(partition by o.Occupation order by o.Name) id from OCCUPATIONS o) as src
with plain OCCUPATIONS, the result is a single row:
Priya Priyanka Kristeen Samantha
I understand why this happens: MAX() is taken within each group. However, the first query also uses MAX(), and there it yields NULL when an occupation has no more names; it does not return one maximum value per occupation as I expected, but every name.
My question is: why does this happen?
Thank you!

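This could be the source of the issue:
row_number() over(partition by o.Occupation order by
o.Name) id from OCCUPATIONS o
The row_number here uses PARTITION BY o.Occupation, so id restarts at 1 within each occupation and the same id values repeat across occupations. PIVOT groups implicitly by every source column that is not used in the aggregate or in the FOR clause. In your first query that column is id, so MAX(Name) is evaluated once per id and per occupation, which gives one output row per id. When you select from OCCUPATIONS directly there is no such extra column, so everything collapses into a single row. If you get rid of the PARTITION BY and keep only the ORDER BY, every id is unique, and each output row holds one name with NULL in the other three columns.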
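The effect of the extra id column can be checked outside SQL Server. The sketch below is an emulation under stated assumptions, not the PIVOT operator itself: it uses Python's built-in sqlite3 module (SQLite has no PIVOT, and window functions need SQLite 3.25 or later), and it spells the pivot out as MAX(CASE ...) with an explicit GROUP BY. Grouping by id gives the seven-row result; with no grouping column the query collapses to one row.

```python
import sqlite3

# Sample data copied from the question.
rows = [
    ("Ashley", "Professor"), ("Samantha", "Actor"), ("Julia", "Doctor"),
    ("Britney", "Professor"), ("Maria", "Professor"), ("Meera", "Professor"),
    ("Priya", "Doctor"), ("Priyanka", "Professor"), ("Jennifer", "Actor"),
    ("Ketty", "Actor"), ("Belvet", "Professor"), ("Naomi", "Professor"),
    ("Jane", "Singer"), ("Jenny", "Singer"), ("Kristeen", "Singer"),
    ("Christeen", "Singer"), ("Eve", "Actor"), ("Aamina", "Doctor"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OCCUPATIONS (Name TEXT, Occupation TEXT)")
conn.executemany("INSERT INTO OCCUPATIONS VALUES (?, ?)", rows)

# Hand-written equivalent of "max(Name) for Occupation in (...)".
PIVOT_COLUMNS = """
    MAX(CASE WHEN Occupation = 'Doctor'    THEN Name END) AS Doctor,
    MAX(CASE WHEN Occupation = 'Professor' THEN Name END) AS Professor,
    MAX(CASE WHEN Occupation = 'Singer'    THEN Name END) AS Singer,
    MAX(CASE WHEN Occupation = 'Actor'     THEN Name END) AS Actor
"""

# The source has a third column (id), so the aggregate runs once per id.
with_id = conn.execute(f"""
    SELECT {PIVOT_COLUMNS}
    FROM (SELECT Name, Occupation,
                 ROW_NUMBER() OVER (PARTITION BY Occupation ORDER BY Name) AS id
          FROM OCCUPATIONS) AS src
    GROUP BY id
    ORDER BY id
""").fetchall()

# No extra column: a single group, hence a single row of maxima.
without_id = conn.execute(f"SELECT {PIVOT_COLUMNS} FROM OCCUPATIONS").fetchall()

for row in with_id:
    print(row)
print(without_id)
```

Within one id group each occupation contributes at most one name, so MAX simply returns that name, or NULL when the occupation has run out of names.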

Try this approach:
1. Find the occupation with the most people associated with it.
2. Generate a sequence of numbers from 1 to the number of people found in step 1.
3. Join the sequence generated in step 2 four times with the original table, each time filtering on a different occupation.
This is the query:
declare @tmp table([Name] varchar(50),[Occupation] varchar(50))
insert into @tmp values
('Ashley','Professor') ,('Samantha','Actor') ,('Julia','Doctor') ,('Britney','Professor') ,('Maria','Professor') ,('Meera','Professor')
,('Priya','Doctor') ,('Priyanka','Professor') ,('Jennifer','Actor') ,('Ketty','Actor') ,('Belvet','Professor') ,('Naomi','Professor')
,('Jane','Singer') ,('Jenny','Singer') ,('Kristeen','Singer') ,('Christeen','Singer') ,('Eve','Actor') ,('Aamina','Doctor')
--this variable holds the occupation that has the most names (rows) in the table
--its row count is the total number of rows in the output
declare @Occupation_with_max_rows varchar(50)
--populate the @Occupation_with_max_rows variable
select top 1 @Occupation_with_max_rows=Occupation
from @tmp
group by Occupation
order by count(*) desc
--generate the final result by joining the original table four times with the sequence
select D.Name as Doctor,P.Name as Professor,S.Name as Singer,A.Name as Actor
from
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord from @tmp where Occupation = @Occupation_with_max_rows) O
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Doctor') D on O.ord = D.ord
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Professor') P on O.ord = P.ord
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Singer') S on O.ord = S.ord
left join
(select ROW_NUMBER() OVER (ORDER BY [Name]) as ord, [Name] from @tmp where Occupation='Actor') A on O.ord = A.ord

Please find below code which works as expected :
select [Doctor],[Professor],[Singer],[Actor]
from
(
select row_number() over (partition by occupation order by name)[A],name,occupation
from occupations
)src
pivot
(
max(Name)
for occupation in ([Doctor],[Professor],[Singer],[Actor])
)piv;

Related

How to use last_value with group by with count in SQL Server?

I have table like:
name | timeStamp | previousValue | newValue
--------+---------------+-------------------+------------
Mark | 13.12.2020 | 123 | 155
Mark | 12.12.2020 | 123 | 12
Tom | 14.12.2020 | 123 | 534
Mark | 12.12.2020 | 123 | 31
Tom | 11.12.2020 | 123 | 84
Mark | 19.12.2020 | 123 | 33
Mark | 17.12.2020 | 123 | 96
John | 22.12.2020 | 123 | 69
John | 19.12.2020 | 123 | 33
I'd like to mix last_value, count (*) and group to get this result:
name | count | lastValue
--------+-----------+-------------
Mark | 5 | 33
Tom | 2 | 534
John | 2 | 69
This part:
select name, count(*)
from table
group by name
returns table:
name | count
--------+---------
Mark | 5
Tom | 2
John | 2
but I have to add the last value for each name.
How to do it?
Best regards!

LAST_VALUE is a windowed function, so you'll need to get that value first, and then aggregate:
WITH CTE AS(
SELECT [name],
[timeStamp], --This is a poor choice for a column's name. timestamp is a (deprecated) synonym of rowversion, and a rowversion is not a date and time value
previousValue,
newValue,
LAST_VALUE(newValue) OVER (PARTITION BY [name] ORDER BY [timeStamp] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lastValue
FROM dbo.YourTable)
SELECT [Name],
COUNT(*) AS [count],
lastValue
FROM CTE
GROUP BY [Name],
lastValue;
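To see why the explicit frame matters, here is a small sketch that runs the same shape of query with and without it. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module rather than SQL Server (SQLite 3.25 or later is needed for window functions), the column is renamed to ts, and the dates are rewritten in ISO format so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; ts stands in for the question's timeStamp column.
conn.execute("CREATE TABLE YourTable (name TEXT, ts TEXT, newValue INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", [
    ("Mark", "2020-12-13", 155), ("Mark", "2020-12-12", 12),
    ("Tom", "2020-12-14", 534), ("Mark", "2020-12-12", 31),
    ("Tom", "2020-12-11", 84), ("Mark", "2020-12-19", 33),
    ("Mark", "2020-12-17", 96), ("John", "2020-12-22", 69),
    ("John", "2020-12-19", 33),
])

QUERY = """
    WITH CTE AS (
        SELECT name,
               LAST_VALUE(newValue) OVER (
                   PARTITION BY name ORDER BY ts {frame}) AS lastValue
        FROM YourTable)
    SELECT name, COUNT(*) AS cnt, lastValue
    FROM CTE
    GROUP BY name, lastValue
    ORDER BY name, lastValue
"""

# Frame spans the whole partition: every row sees the newest value.
full_frame = conn.execute(QUERY.format(
    frame="ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING")).fetchall()

# Default frame ends at the current row, so lastValue differs from row to row
# and the GROUP BY no longer collapses to one row per name.
default_frame = conn.execute(QUERY.format(frame="")).fetchall()

print(full_frame)
print(len(default_frame), "rows with the default frame")
```

With the full frame every row of a partition carries the same lastValue, which is what allows it to sit in the GROUP BY next to name.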

I got a solution that works, but here's another one:
SELECT
[name], COUNT([name]), [lastValue]
FROM (
SELECT
[name], FIRST_VALUE([newValue]) OVER (PARTITION BY [name] ORDER BY TimeStamp DESC ROWS UNBOUNDED PRECEDING) AS [lastValue]
FROM [table]
) xyz GROUP BY [name], [lastValue]
Keep well!

SQL Server: Creating transposed table and joining with existing table

I have a set of data from table [MSPWIP].[MSPWIP].[Event] that looks like this:
| Createdby | StationName | SerialNumber |
-------------------------------------------------------
| Jay | L1.A1 | 22191321572 |
| Allan | L1.A2 | 22191321572 |
| Nathan | L2.A1 | 22191321579 |
| Jane | L2.A2 | 22191321579 |
And I have other sets of data that I have already joined in another query which is not relevant to the problem
I want to create a table separating the operator (denoted by createdby) by stations where L1.A1 means Line 1 Station 1 for example. For me at the moment, Line is not relevant
My ideal data after I restructure it should look like this
| SerialNumber | Operator1 | Operator2 |
----------------------------------------
| 22191321572 | Jay | Allan |
| 22191321579 | Nathan | Jane |
I tried using this code to Join both tables:
Query#1
Declare @Operator1 Table(
SerialNumber Varchar(255),
Operator1 Varchar(255)
)
Insert Into @Operator1 (Serialnumber, Operator1)
Select
SerialNumber,
Createdby as Operator1
From [MSPWIP].[MSPWIP].[Event]
where StationName like '%01'
Declare @Operator2 Table(
SerialNumber Varchar(255),
Operator2 Varchar(255)
)
Insert Into @Operator2 (Serialnumber, Operator2)
Select
SerialNumber,
CreatedBy as Operator2
From [MSPWIP].[MSPWIP].[Event]
where StationName like '%02'
select
a.SerialNumber,
CreatedBy,
b.Operator2
From @Operator1 a
join @Operator2 b
On a.SerialNumber = b.SerialNumber
Where a.SerialNumber In ('22191321572', '22191321574')
Then I would like to join it with that other query using the code below:
Query#2
join @Operator1 i
on a.SerialNumber = i.SerialNumber
join @Operator2 j
on a.SerialNumber = j.SerialNumber
Note that a is a different table.
However, Query#1 returned only the column headings and no data, which in turn caused Query#2 to return headings and nothing else.
I was wondering whether something is wrong with Query#1 that prevents the data from being inserted into the table variables?
============================================
Update:
Using the answer below (with Modifications) I came up with a code like this
Query#3
SELECT Distinct*
FROM (
SELECT distinct
SerialNumber,
Case When t.StationName like '%A1' then CreatedBy End Operator1,
Case When t.StationName like '%A2' then CreatedBy End Operator2
--, Max(CASE WHEN CAST(RIGHT(t.StationName, 1) AS Varchar(255)) = 1 THEN t.CreatedBy END) Operator1
--, Max(CASE WHEN CAST(RIGHT(t.StationName, 1) AS Varchar(255)) = 2 THEN t.CreatedBy END) Operator2
FROM [MSPWIP].[MSPWIP].[Event] t
where t.CreatedDate > '2019-05-30'
Group BY SerialNumber, StationName, Createdby
) d
However my results now became staggered like so:
| SerialNumber | Operator1 | Operator2 |
----------------------------------------
| 22191321572 | Jay | NULL |
| 22191321572 | NULL | Allan |
| 22191321579 | Nathan | NULL |
| 22191321579 | NULL | Jane |
Did I do something wrong here?

You can save time by doing it in one pass, like this:
SELECT *
FROM (
SELECT
SerialNumber
, MAX(CASE WHEN RIGHT(t.StationName, 2) = 'A1' THEN t.CreatedBy END) Operator1
, MAX(CASE WHEN RIGHT(t.StationName, 2) = 'A2' THEN t.CreatedBy END) Operator2
FROM [MSPWIP].[MSPWIP].[Event] t
GROUP BY SerialNumber
) d
Then you just join it with the required tables.
P.S.: If the station part of StationName is not always two characters long, you can use SUBSTRING(t.StationName, CHARINDEX('.', t.StationName) + 1, LEN(t.StationName)) instead of RIGHT(t.StationName, 2) to get the station part (the text after the dot).
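The staggered rows from Query#3 and the collapsed result can be compared side by side. The following is a sketch under assumptions, not the production query: it uses Python's built-in sqlite3 module with a simplified Event table holding only the four sample rows, and LIKE '%A1' in place of RIGHT(), which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for [MSPWIP].[MSPWIP].[Event] with the sample rows.
conn.execute(
    "CREATE TABLE Event (Createdby TEXT, StationName TEXT, SerialNumber TEXT)")
conn.executemany("INSERT INTO Event VALUES (?, ?, ?)", [
    ("Jay", "L1.A1", "22191321572"), ("Allan", "L1.A2", "22191321572"),
    ("Nathan", "L2.A1", "22191321579"), ("Jane", "L2.A2", "22191321579"),
])

# Grouping by StationName and Createdby keeps one row per station,
# so each row fills only one of the two operator columns.
staggered = conn.execute("""
    SELECT SerialNumber,
           CASE WHEN StationName LIKE '%A1' THEN Createdby END AS Operator1,
           CASE WHEN StationName LIKE '%A2' THEN Createdby END AS Operator2
    FROM Event
    GROUP BY SerialNumber, StationName, Createdby
    ORDER BY SerialNumber, StationName
""").fetchall()

# Grouping by SerialNumber alone and wrapping each CASE in MAX
# folds the station rows into one row per serial number.
collapsed = conn.execute("""
    SELECT SerialNumber,
           MAX(CASE WHEN StationName LIKE '%A1' THEN Createdby END) AS Operator1,
           MAX(CASE WHEN StationName LIKE '%A2' THEN Createdby END) AS Operator2
    FROM Event
    GROUP BY SerialNumber
    ORDER BY SerialNumber
""").fetchall()

print(staggered)
print(collapsed)
```

MAX ignores NULL, so for each serial number it picks the one non-NULL operator name that the CASE expression produced.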

SQL query to get value having multiple in the same table in SQL Server

Let's say I have a table with many columns like col1, col2, col3, id, variantId, col4, col5 etc
However I am only interested in id, variantId which look like this:
+----------+-----------+
| id | variantId |
+----------+-----------+
| a | 11 |
| a | 12 |
| b | 31 |
| c | 41 |
| c | 54 |
| d | abc |
| e | xyz |
| e | xyz |
+----------+-----------+
I need the distinct ids that have more than one distinct variantId.
In this case I would only get a and c.

You can use group by and having:
select id
from t
group by id
having min(variantId) <> max(variantId);
You can also use:
having count(distinct variantId) > 1

Try with group by having clause
select id
from table
group by id
having count(distinct variantId) > 1

You can do it more efficiently with EXISTS:
select distinct t.id
from tablename t
where exists (
select 1 from tablename
where id = t.id and variantid <> t.variantid
)
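The three variants suggested in this thread can be checked against the sample data. This is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server; the table name t and the TEXT column types are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the two columns of interest; both stored as text.
conn.execute("CREATE TABLE t (id TEXT, variantId TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("a", "11"), ("a", "12"), ("b", "31"), ("c", "41"),
    ("c", "54"), ("d", "abc"), ("e", "xyz"), ("e", "xyz"),
])

def ids(sql):
    """Run a query that returns one id column and collect the ids in order."""
    return [row[0] for row in conn.execute(sql)]

by_count = ids("""
    SELECT id FROM t
    GROUP BY id
    HAVING COUNT(DISTINCT variantId) > 1
    ORDER BY id
""")

by_min_max = ids("""
    SELECT id FROM t
    GROUP BY id
    HAVING MIN(variantId) <> MAX(variantId)
    ORDER BY id
""")

by_exists = ids("""
    SELECT DISTINCT t.id FROM t
    WHERE EXISTS (SELECT 1 FROM t AS other
                  WHERE other.id = t.id AND other.variantId <> t.variantId)
    ORDER BY t.id
""")

print(by_count, by_min_max, by_exists)
```

The id e is excluded by all three because its two rows carry the same variantId.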

SUM On Column With Group By SQL

I have following data:
+----------------+--------------+-----+
| StgDescription | ID | Amt |
+----------------+--------------+-----+
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| A | OA17 | 11 |
| B | ZA47/ A | 12 |
| B | ZA47/ A | 12 |
| B | ZA47/ B | 10 |
| B | ZA47/ B | 10 |
| B | ZA48/ A | 14 |
| B | ZA48/ F | 10 |
| B | ZA48 /G | 13 |
| B | ZA48 /H | 10 |
| B | ZA48/ I | 15 |
| B | ZA48/ J | 10 |
| B | ZA48/ K | 16 |
| B | ZA48/ L | 10 |
| c | FA01LM100340 | 10 |
| c | PA53 AE | 10 |
+----------------+--------------+-----+
I want to generate a report in the following format. For each StgDescription, the amount should be the sum of Amt, counting each distinct ID only once.
+----------------+-----+
| StgDescription | Amt |
+----------------+-----+
| a | 11 |
| b | 120 |
| c | 20 |
+----------------+-----+
I've written following query to get this result:
WITH CTE AS(
SELECT
distinct
s.StgDescription
,p.ID
,Amt
FROM [DinDb].[dbo].[tblTvlTransaction] t
JOIN tblstgmaster s on t.StgId=s.StgId
JOIN tblProjDocSt p on t.TDocID=p.DocId
JOIN [PdasDb].[dbo].[tblIDmaster] f ON p.ID=f.ID
where OptAuthoDateTime between '2015-07-27 00:00:00' and '2015-09-01 00:00:00')
select StgDescription,sum(AMT) from cte group by StgDescription
Is there any other efficient alternative to do this?

First in cte remove duplicates, then GROUP BY like:
WITH cte AS (
SELECT DISTINCT StgDescription, ID, Amt
FROM your_tab
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;
OR:
WITH cte AS (
SELECT StgDescription, ID, Amt
FROM your_tab
GROUP BY StgDescription, ID, Amt
)
SELECT
StgDescription,
Amt = SUM(Amt)
FROM cte
GROUP BY StgDescription;

I hope that you get the data from a query, not from a table. It would not be good to store data this redundantly. And it would not be good to name a column ID when it is not the unique identifier for a row in the table.
Your problem with the data is that you have duplicates, which prevents you from getting the sum directly. So use DISTINCT to make your data unique first.
If this data is from a query then simply add DISTINCT after the SELECT keyword. If not, use a derived table (i.e. a subquery) where you select distinct records from the table.
select stgdescription, sum(amt)
from
(
select distinct stgdescription, id, amt
from mydata
) distinct_data
group by stgdescription;
You may want to replace stgdescription with lower(stgdescription), though, if stgdescription can be 'A' or 'a' and you want to treat them the same.
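The difference between summing directly and summing after removing duplicates can be verified on the sample rows. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, with the already-joined data loaded into a hypothetical table named mydata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# mydata stands in for the joined result shown in the question.
conn.execute("CREATE TABLE mydata (StgDescription TEXT, ID TEXT, Amt INTEGER)")
conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", [
    ("A", "OA17", 11), ("A", "OA17", 11), ("A", "OA17", 11), ("A", "OA17", 11),
    ("B", "ZA47/ A", 12), ("B", "ZA47/ A", 12),
    ("B", "ZA47/ B", 10), ("B", "ZA47/ B", 10),
    ("B", "ZA48/ A", 14), ("B", "ZA48/ F", 10), ("B", "ZA48 /G", 13),
    ("B", "ZA48 /H", 10), ("B", "ZA48/ I", 15), ("B", "ZA48/ J", 10),
    ("B", "ZA48/ K", 16), ("B", "ZA48/ L", 10),
    ("c", "FA01LM100340", 10), ("c", "PA53 AE", 10),
])

# Summing the raw rows counts every duplicate again.
naive = conn.execute("""
    SELECT StgDescription, SUM(Amt)
    FROM mydata
    GROUP BY StgDescription
    ORDER BY StgDescription
""").fetchall()

# Remove duplicate (StgDescription, ID, Amt) rows first, then sum.
deduplicated = conn.execute("""
    SELECT StgDescription, SUM(Amt)
    FROM (SELECT DISTINCT StgDescription, ID, Amt FROM mydata) AS distinct_data
    GROUP BY StgDescription
    ORDER BY StgDescription
""").fetchall()

print(naive)
print(deduplicated)
```

The deduplicated totals match the report requested in the question; the naive totals are inflated by the repeated rows.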

I'd keep it as simple as possible, like this:
select StgDescription, sum(Amt) from
(
select distinct StgDescription, ID, Amt from tablename
) a
group by StgDescription
Hope it helps!

I suspect your duplicates are coming from [tblTvlTransaction], therefore, I would remove this table as a JOIN and use EXISTS to just check a record is there. So essentially the only tables in the FROM clause are those you actually need data from:
SELECT s.StgDescription, p.ID, s.Amt
FROM tblstgmaster AS s
INNER JOIN tblProjDocSt p on
t.TDocID = p.DocId
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
);
The advantage of EXISTS is that it can use a semi-join, which essentially means rather than pulling back all the rows from the transaction table, it will stop the seek/scan as soon as it finds one matching record. This should leave you without duplicates so you can do the SUM directly:
SELECT s.StgDescription, Amount = SUM(s.Amt)
FROM tblstgmaster AS s
INNER JOIN tblProjDocSt p on
t.TDocID = p.DocId
INNER JOIN [PdasDb].[dbo].[tblIDmaster] AS f
ON p.ID = f.ID
WHERE EXISTS
( SELECT 1
FROM [DinDb].[dbo].[tblTvlTransaction] AS t
WHERE t.OptAuthoDateTime BETWEEN '2015-07-27 00:00:00' AND '2015-09-01 00:00:00'
AND t.StgId = s.StgId
)
GROUP BY s.StgDescription;

select unique rows based on single distinct column [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 7 months ago.
I want to select rows that have a distinct email, see the example table below:
+----+---------+-------------------+-------------+
| id | title | email | commentname |
+----+---------+-------------------+-------------+
| 3 | test | rob@hotmail.com | rob |
| 4 | i agree | rob@hotmail.com | rob |
| 5 | its ok | rob@hotmail.com | rob |
| 6 | hey | rob@hotmail.com | rob |
| 7 | nice! | simon@hotmail.com | simon |
| 8 | yeah | john@hotmail.com | john |
+----+---------+-------------------+-------------+
The desired result would be:
+----+-------+-------------------+-------------+
| id | title | email | commentname |
+----+-------+-------------------+-------------+
| 3 | test | rob@hotmail.com | rob |
| 7 | nice! | simon@hotmail.com | simon |
| 8 | yeah | john@hotmail.com | john |
+----+-------+-------------------+-------------+
Where I don't care which id column value is returned.
What would be the required SQL?

Quick one in TSQL
SELECT a.*
FROM emails a
INNER JOIN
(SELECT email,
MIN(id) as id
FROM emails
GROUP BY email
) AS b
ON a.email = b.email
AND a.id = b.id;

I'm assuming you mean that you don't care which row is used to obtain the title, id, and commentname values (you have "rob" for all of the rows, but I don't know if that is actually something that would be enforced or not in your data model). If so, then you can use windowing functions to return the first row for a given email address:
select
id,
title,
email,
commentname
from
(
select
*,
row_number() over (partition by email order by id) as RowNbr
from YourTable
) source
where RowNbr = 1
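Both approaches so far (the MIN(id) join and the row_number filter) can be tried on the sample rows. This is a sketch with Python's built-in sqlite3 module standing in for SQL Server (SQLite 3.25 or later is needed for row_number); the table name emails is taken from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (id INTEGER, title TEXT, email TEXT, commentname TEXT)")
conn.executemany("INSERT INTO emails VALUES (?, ?, ?, ?)", [
    (3, "test", "rob@hotmail.com", "rob"),
    (4, "i agree", "rob@hotmail.com", "rob"),
    (5, "its ok", "rob@hotmail.com", "rob"),
    (6, "hey", "rob@hotmail.com", "rob"),
    (7, "nice!", "simon@hotmail.com", "simon"),
    (8, "yeah", "john@hotmail.com", "john"),
])

# Number the rows within each email and keep the first one.
via_row_number = conn.execute("""
    SELECT id, title, email, commentname
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS RowNbr
          FROM emails) AS source
    WHERE RowNbr = 1
    ORDER BY id
""").fetchall()

# Join back to the smallest id per email.
via_min_id = conn.execute("""
    SELECT a.id, a.title, a.email, a.commentname
    FROM emails AS a
    INNER JOIN (SELECT email, MIN(id) AS id
                FROM emails
                GROUP BY email) AS b
            ON a.email = b.email AND a.id = b.id
    ORDER BY a.id
""").fetchall()

print(via_row_number)
print(via_min_id)
```

Changing ORDER BY id to ORDER BY id DESC in the window (or MIN to MAX in the join) keeps the last row per email instead of the first.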

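If you are using MySQL, a query like the one below can select one record per group without any aggregate functions. Note that this relies on a MySQL extension: from MySQL 5.7.5 onward the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so the query is rejected unless that mode is turned off or the other columns are functionally dependent on the grouping column. It is not valid in SQL Server.
select * from comments_table group by commentname;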

Since you don't care which id is returned, I would stick with the MAX id for each email to keep the query simple. Give it a try:
;WITH ue(id)
AS
(
SELECT MAX(id)
FROM table
GROUP BY email
)
SELECT * FROM table t
INNER JOIN ue ON ue.id = t.id

SELECT * FROM emails GROUP BY email;
