Multiple Column Pivot in T-SQL - sql-server

I am working with a table where multiple rows need to be pivoted into columns. PIVOT is the perfect solution for this and works well when all I need is one field, but I need to return several fields based on the pivot. Here is the pseudocode with specifics stripped out:
SELECT
field1,
[1], [2], [3], [4]
FROM
(
SELECT
field1,
field2,
(ROW_NUMBER() OVER(PARTITION BY field1 ORDER BY field2)) RowID
FROM tblname
) AS SourceTable
PIVOT
(
MAX(field2)
FOR RowID IN ([1], [2], [3], [4])
) AS PivotTable;
The above syntax works brilliantly, but what do I do when I need to get additional information found in field3, field4, and so on?

Rewrite using MAX(CASE...) and GROUP BY:
select
field1
, [1] = max(case when RowID = 1 then field2 end)
, [2] = max(case when RowID = 2 then field2 end)
, [3] = max(case when RowID = 3 then field2 end)
, [4] = max(case when RowID = 4 then field2 end)
from (
select
field1
, field2
, RowID = row_number() over (partition by field1 order by field2)
from tblname
) SourceTable
group by
field1
From there you can add in field3, field4, etc.
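As a minimal sketch of that last step: each extra field gets its own MAX(CASE ...) per RowID. The sample rows, the types of field3 and field4, and the output column names (field2_1, field3_1, ...) are assumptions made only for illustration; the question does not show them.

```sql
-- Sketch only: the sample data and the output aliases are hypothetical.
;with tblname (field1, field2, field3, field4) as (
    select *
    from (values
        ('A', 10, 'x1', 1.5),
        ('A', 20, 'x2', 2.5),
        ('B', 10, 'y1', 3.5)
    ) v (field1, field2, field3, field4)
)
select
    field1
    -- one group of columns per RowID, one column per field
  , field2_1 = max(case when RowID = 1 then field2 end)
  , field3_1 = max(case when RowID = 1 then field3 end)
  , field4_1 = max(case when RowID = 1 then field4 end)
  , field2_2 = max(case when RowID = 2 then field2 end)
  , field3_2 = max(case when RowID = 2 then field3 end)
  , field4_2 = max(case when RowID = 2 then field4 end)
from (
    select
        field1, field2, field3, field4
      , RowID = row_number() over (partition by field1 order by field2)
    from tblname
) SourceTable
group by
    field1;
```

Groups with fewer rows than the highest RowID simply get NULL in the unused columns, the same as with PIVOT.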

The trick to doing multiple pivots over a row_number is to modify that row number sequence to store both the sequence and the field number. Here's an example that does what you want with multiple PIVOT statements.
-- populate some test data
if object_id('tempdb..#tmp') is not null drop table #tmp
create table #tmp (
ID int identity(1,1) not null,
MainField varchar(100),
ThatField int,
ThatOtherField datetime
)
insert into #tmp (MainField, ThatField, ThatOtherField)
select 'A', 10, '1/1/2000' union all
select 'A', 20, '2/1/2000' union all
select 'A', 30, '3/1/2000' union all
select 'B', 10, '1/1/2001' union all
select 'B', 20, '2/1/2001' union all
select 'B', 30, '3/1/2001' union all
select 'B', 40, '4/1/2001' union all
select 'C', 10, '1/1/2002' union all
select 'D', 10, '1/1/2000' union all
select 'D', 20, '2/1/2000' --union all
-- pivot over multiple columns using the 1.1, 1.2, 2.1, 2.2 sequence trick
select
MainField,
max([1.1]) as ThatField1,
max([1.2]) as ThatOtherField1,
max([2.1]) as ThatField2,
max([2.2]) as ThatOtherField2,
max([3.1]) as ThatField3,
max([3.2]) as ThatOtherField3,
max([4.1]) as ThatField4,
max([4.2]) as ThatOtherField4
from
(
select x.*,
cast(row_number() over (partition by MainField order by ThatField) as varchar(2)) + '.1' as ThatFieldSequence,
cast(row_number() over (partition by MainField order by ThatField) as varchar(2)) + '.2' as ThatOtherFieldSequence
from #tmp x
) a
pivot (
max(ThatField) for ThatFieldSequence in ([1.1], [2.1], [3.1], [4.1])
) p1
pivot (
max(ThatOtherField) for ThatOtherFieldSequence in ([1.2], [2.2], [3.2], [4.2])
) p2
group by
MainField

I am unsure whether you are using MS SQL Server, but if you are, you may want to take a look at the engine's CROSS APPLY functionality. Basically it allows you to apply the results of a table-valued UDF to each row of a result set. This would require you to put your pivot query into a table-valued result set.
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx

Wrap your SQL statement in an outer query, something like:
select a.field1, sum(field2), sum(field3)
from (original select with case arguments) a
group by a.field1
It should collapse your results into one row per field1 value.

It is possible to pivot on multiple columns, but you need to be careful about reusing the pivot column across multiple pivots. Here is a good blog post on the subject:
http://pratchev.blogspot.com/2009/01/pivoting-on-multiple-columns.html


Snowflake : IN operator

I want something like the following in my query:
select * from table a
where a.id in(select id, max(date) from table a group by id)
I get an error here, because IN, like =, compares against a single value, while the subquery returns two columns.
How can I do this?
Example:
id  date
--------------
1   2022-31-01
1   2022-21-03
2   2022-01-01
2   2022-02-01
I need to get only one record per id, based on the max date. The table has more columns than just id and date,
so I need to do something like this in Snowflake:
select * from table a
where id in(select id,max(date) from table a group by id)
Update: all the solutions work if I select from the table directly,
but I have a CASE expression in a view, and it returns duplicate records.
Example:
create or replace view v_test
as
select * from
(
select id,lastdatetime,*,
case when start_date < timestamp and timestamp < end
and move_date = '9999-12-31' then 'Y'
else 'N' end as IND
from table a
) a
So if anyone selects from the view where IND = 'Y', more than one record comes back.
What I want is to select the latest record per ID where IND = 'Y', i.e. the one with max(lastdatetime).
How can I incorporate this logic in the view?
I think you are trying to get the latest record for each id?
select *
from table a
qualify row_number() over (partition by id order by date desc) = 1
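For the view case described in the question's update, one possible sketch is to filter on IND first and then apply the same QUALIFY. The CTE below only stands in for v_test, and its sample rows and timestamp values are made up for illustration; the real view has more columns.

```sql
-- Snowflake sketch: the CTE is a hypothetical stand-in for the view v_test.
with v_test (id, lastdatetime, ind) as (
    select column1, to_timestamp(column2), column3 from values
    (1, '2022-01-31 10:00:00', 'Y'),
    (1, '2022-03-21 09:00:00', 'Y'),
    (1, '2022-04-01 08:00:00', 'N'),
    (2, '2022-02-01 12:00:00', 'Y')
)
select *
from v_test
where ind = 'Y'
-- QUALIFY is evaluated after WHERE, so only IND = 'Y' rows compete for row 1
qualify row_number() over (partition by id order by lastdatetime desc) = 1;
```

The same SELECT could be wrapped in a second view on top of v_test if callers should not have to repeat the QUALIFY themselves.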
So if we look at your sub-select, using this "data" CTE for the examples:
with data (id, _date) as (
select column1, to_date(column2, 'yyyy-dd-mm') from values
(1, '2022-31-01'),
(1, '2022-21-03'),
(2, '2022-01-01'),
(2, '2022-02-01')
)
select id, max(_date)
from data
group by 1;
it gives:
ID  MAX(_DATE)
--------------
1   2022-03-21
2   2022-01-02
which makes it seem you want "the last date, per id",
which can classically (ANSI SQL) be written:
select d.*
from data as d
join (
select
id,
max(_date) as max_date
from data
group by 1
) as c
on d.id = c.id and d._date = c.max_date
;
ID  _DATE
--------------
1   2022-03-21
2   2022-01-02
which gives you all the row values. BUT if you have many rows with the same last date, you will get all of them in the output.
Another method is to use ROW_NUMBER to pick one and only one row per id, which is the style of answer Mike has given:
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify row_number() over (partition by id order by _date desc) =1 ;
gives:
ID  _DATE       EXTRA
--------------------------------
1   2022-03-21  extra_b_double_a
2   2022-01-02  extra_d
Now if you want "all rows of the last day", your method works, albeit QUALIFY/ROW_NUMBER is faster. You can also use RANK or DENSE_RANK:
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify dense_rank() over (partition by id order by _date desc) =1 ;
ID  _DATE       EXTRA
--------------------------------
1   2022-03-21  extra_b_double_a
1   2022-03-21  extra_b_double_b
2   2022-01-02  extra_d
Now the last thing it almost seems you are asking for is "how do I find the ID with the most recent data (here 1) and get all rows for that ID?":
with data (id, _date, extra) as (
select column1, to_date(column2, 'yyyy-dd-mm'), column3 from values
(1, '2022-31-01', 'extra_a'),
(1, '2022-21-03', 'extra_b_double_a'),
(1, '2022-21-03', 'extra_b_double_b'),
(2, '2022-01-01', 'extra_c'),
(2, '2022-02-01', 'extra_d')
)
select *
from data
qualify id = last_value(id) over (order by _date);
Here is an example of how to use the IN operator with a subquery:
select * from table1 t1 where t1.id in (select t2.id from table2 t2);
IN can also match on both columns at once:
select *
from tab AS a
where (a.id, a.date) in (select id, max(date) from tab group by id);
For sample data:
CREATE TABLE tab (id, date)
AS
SELECT column1, to_date(column2, 'yyyy-dd-mm')
FROM VALUES
(1, '2022-31-01'),
(1, '2022-21-03'),
(2, '2022-01-01'),
(2, '2022-02-01');
Output:
ID  DATE
--------------
1   2022-03-21
2   2022-01-02

TSQL: Group by one column, count all rows and keep value on second column based on row_number

I have a query that returns an Id, a Name and the Row_Number() based on some rules.
The query looks like this:
SELECT
tm.id AS Id,
pn.Name AS Name,
ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM
#tempTable AS tm
LEFT JOIN
names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
The output of the above query looks like the table below, populated here with dummy data:
CREATE TABLE people
(
id int,
name varchar(55),
row int
);
INSERT INTO people
VALUES (1, 'John', 1), (1, 'John', 2), (2, 'Mary', 1),
(3, 'Jeff', 1), (4, 'Bill', 1), (4, 'Bill', 2),
(4, 'Bill', 3), (4, 'Billy', 4), (5, 'Bobby', 1),
(5, 'Bob', 2), (5, 'Bob' , 3), (5, 'Bob' , 4);
What I am trying to do is group by the id field and count all rows, but for the name, use the one with row = 1.
My attempt looks like this, but I obviously get separate rows per name, since I include x.name in the GROUP BY.
SELECT
x.id,
x.name,
COUNT(*) AS Value
FROM
(SELECT
tm.id AS Id,
pn.Name AS Name,
ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM
#tempTable AS tm
LEFT JOIN
names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
) x
GROUP BY
x.id, x.name
ORDER BY
COUNT(*) DESC
The desired results from the dummy data are:
id name count
------------------
1 John 2
2 Mary 1
3 Jeff 1
4 Bill 4
5 Bobby 4
You can use FIRST_VALUE() window function to get the name of the row with row number = 1 and with the keyword DISTINCT there is no need to GROUP BY:
SELECT DISTINCT tm.id AS Id
, FIRST_VALUE(pn.Name) OVER (PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Name
, COUNT(*) OVER (PARTITION BY tm.id) AS counter
FROM #tempTable AS tm
LEFT JOIN names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
If you can't use FIRST_VALUE() then you can do it with conditional aggregation:
SELECT id,
MAX(CASE WHEN Row = 1 THEN Name END) AS NAME,
COUNT(*) AS Counter
FROM (
SELECT tm.id AS Id
, pn.Name AS Name
, ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM #tempTable AS tm
LEFT JOIN names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
) t
GROUP BY id
This could be one solution to your problem: group on both id and the target name (case when p.row = 1 then p.name end) for the counting. Adding WITH ROLLUP to the grouping "rolls up" the count aggregations. A second aggregation on just id can then be used to merge the row values from the intermediate data set.
with cte as
(
select p.id,
case when p.row = 1 then p.name end as name,
count(1) as cnt
from people p
group by p.id, case when p.row = 1 then p.name end with rollup
having grouping(p.id) = 0
)
select cte.id,
max(cte.name) as name,
max(cte.cnt) as [count]
from cte
group by cte.id;
This would be another solution: do a regular count query with grouping on id and fetch the required name afterwards with a cross apply.
with cte as
(
select p.id,
count(1) as cnt
from people p
group by p.id
)
select cte.id,
n.name,
cte.cnt as [count]
from cte
cross apply ( select p.name
from people p
where p.id = cte.id
and p.row = 1 ) n;

Query to Find Max row based on N fields

Given the following data:
ID, Name, Value, TimeStamp
1, 'A', 7.00, 21/12/2017
2, 'A', 5.00, 21/12/2017
3, 'A', 6.00, 20/12/2017
4, 'B', 1.00, 21/12/2017
Result I want is:
Name, Value, TimeStamp
'A', 5.00, 21/12/2017
'B', 1.00, 21/12/2017
I.e. group by Name and take value with latest TimeStamp, if 2 or more have the same TimeStamp take the one with latest ID
I found an answer to a similar question in another post:
SELECT MyTable.ID, MyTable.Name, MyTable.Value, MyTable.TimeStamp
FROM MyTable
JOIN ( SELECT Name, MAX(TimeStamp) AS TimeStamp
FROM MyTable
GROUP BY Name ) m
ON MyTable.Name = m.Name AND MyTable.TimeStamp = m.TimeStamp
This gives me the rows with the max timestamp, so to get the latest ID I can repeat the process, i.e. I can use:
WITH CTE AS (
...
)
SELECT CTE.Name, CTE.Value, CTE.TimeStamp
FROM CTE
JOIN ( SELECT Name, MAX(ID) AS ID
FROM CTE
GROUP BY Name ) a
ON CTE.Name = a.Name AND CTE.ID = a.ID
However, what happens if I now want to scale this up to 3 fields? Is there an easier way to do this? Without having experimented, I was thinking of a recursive CTE. I am trying to avoid dynamic SQL.
I think you may want to use the ROW_NUMBER function for this. Below is an example.
SQL Example
;WITH
test_data
AS
(
SELECT tbl.* FROM (VALUES
( 1, 'A', 7.00, '21-Dec-2017')
, ( 2, 'A', 5.00, '21-Dec-2017')
, ( 3, 'A', 6.00, '20-Dec-2017')
, ( 4, 'B', 1.00, '21-Dec-2017')
) tbl ([ID], [Name], [Value], [TimeStamp])
)
,
test_data_order
AS
(
SELECT
[ID]
, [Name]
, [Value]
, [TimeStamp]
, EX_ROW_NUMBER = ROW_NUMBER() OVER
(
PARTITION BY
[Name]
ORDER BY
[TimeStamp] DESC
, [ID] DESC
)
FROM
test_data
)
SELECT
*
FROM
test_data_order
WHERE
EX_ROW_NUMBER = 1
Try this:
SELECT Name, Value, TimeStamp
FROM (
SELECT ID, Name, Value, TimeStamp
, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeStamp DESC, ID DESC) AS RowNum
FROM MyTable
) b
WHERE RowNum = 1
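To scale this to three (or N) fields, as the question asks, it should be enough to append more keys to the ORDER BY of the window. A minimal sketch, assuming a hypothetical third tie-breaker column Revision that is not part of the original data:

```sql
-- Sketch only: Revision is a made-up column used to show a third sort key.
;with MyTable (ID, Name, Value, [TimeStamp], Revision) as (
    select *
    from (values
        (1, 'A', 7.00, cast('20171221' as date), 1),
        (2, 'A', 5.00, cast('20171221' as date), 2),
        (3, 'A', 6.00, cast('20171220' as date), 9),
        (4, 'B', 1.00, cast('20171221' as date), 1)
    ) v (ID, Name, Value, [TimeStamp], Revision)
)
select ID, Name, Value, [TimeStamp]
from (
    select ID, Name, Value, [TimeStamp]
         , row_number() over (
               partition by Name
               -- latest TimeStamp first, then highest Revision, then highest ID
               order by [TimeStamp] desc, Revision desc, ID desc
           ) as RowNum
    from MyTable
) ranked
where RowNum = 1;
```

Each additional field is just one more entry in the ORDER BY list, so no recursion or dynamic SQL is needed for a fixed set of tie-breakers.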

SQL Server or functionality in where condition

I have an SQL query where I want to get the rows with the value "all" or "female" in the [gender] column and the value "A" in the [group] column. If there are two rows for the same group, one with [group] = A and [gender] = all and the other with [group] = A and [gender] = female, I want to get only the row with [gender] = all. Currently I use:
where [group] = 'A' and (gender = 'all' or gender = 'female')
but I get both rows.
In the example table below I want to get only the row: A all
With the WHERE clause above I get both rows for group A.
group gender
A all
A female
B all
C female
C all
You can use something like row_number() to prioritize the various subsets of records you're looking at and then select only one record from each. From the wording of your question I assume there is some other field in the table on which you're "grouping" records together—in other words, a field whose every distinct value should produce at most one record in the result set whose group and gender values match your criteria. In the following example I've assumed that this field is called Category; if you share the actual schema of your table then I can improve the example, but this should suffice to illustrate the idea.
-- table variables are declared with @, not #
declare @SampleData table
(
Category bigint,
[Group] char(1),
Gender varchar(16)
);
insert @SampleData values
(1, 'A', 'Female'), -- include
(2, 'B', 'Female'), -- exclude; wrong group
(3, 'A', 'Female'), -- exclude; right group and gender but superseded by (3, 'A', 'All')
(3, 'A', 'All'), -- include
(4, 'A', 'All'), -- include
(5, 'A', 'Male'); -- exclude; wrong gender
with PrioritizedData as
(
select
D.*,
[Priority] = row_number() over (partition by D.Category order by case D.Gender when 'All' then 0 else 1 end)
from
@SampleData D
where
D.[Group] = 'A' and
D.Gender in ('Female', 'All')
)
select * from PrioritizedData P where P.[Priority] = 1;
You can use the RANK() window function, partitioned by group and ordered by gender. This works because "all" sorts alphabetically before "female" or "male". If your ordering gets more complex than that, you'll have to find another way to order them.
/* TEST DATA */
; WITH a AS (
SELECT 'A' AS thegroup, 'all' AS gender UNION ALL
SELECT 'A' AS thegroup, 'all' AS gender UNION ALL
SELECT 'A' AS thegroup, 'female' AS gender UNION ALL
SELECT 'B' AS thegroup, 'all' AS gender UNION ALL
SELECT 'C' AS thegroup, 'female' AS gender UNION ALL
SELECT 'C' AS thegroup, 'all' AS gender UNION ALL
SELECT 'D' AS thegroup, 'female' AS gender
)
/* THE QUERY */
SELECT b.*
FROM (
SELECT thegroup, gender, RANK() OVER (PARTITION BY thegroup ORDER BY gender) AS rn /* Sets the ranked groups of 'thegroup' */
FROM a
) b
WHERE b.rn = 1 /* Gets first group. */
AND thegroup = 'A'
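If alphabetical order stops matching the intended priority, one option is to order by an explicit CASE expression instead. This is a sketch under the assumption that the priority is "all", then "female", then anything else. It also swaps RANK() for ROW_NUMBER() so that duplicate rows collapse to a single row per group.

```sql
-- Sketch only: the priority mapping in the CASE expression is an assumption.
;with a (thegroup, gender) as (
    select *
    from (values
        ('A', 'all'), ('A', 'female'),
        ('B', 'all'),
        ('C', 'female'), ('C', 'all'),
        ('D', 'female')
    ) v (thegroup, gender)
)
select b.thegroup, b.gender
from (
    select thegroup, gender
         , row_number() over (
               partition by thegroup
               -- explicit priority instead of relying on alphabetical order
               order by case gender when 'all' then 0 when 'female' then 1 else 2 end
           ) as rn
    from a
    where gender in ('all', 'female')
) b
where b.rn = 1
  and b.thegroup = 'A';
```

Adding a new gender value then only means adding one WHEN branch, without depending on how the strings happen to sort.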
Data script:
-- table variables are declared with @, not #
declare @data table ([group] char(1), [gender] varchar(16));
insert into @data values ('A', 'all'), ('A', 'female') ,('B', 'all') ,('C', 'female') ,('C', 'all');
Query:
select
[group] = [d].[group]
,[gender] = [x].[gender]
from
@data as [d]
cross apply
(
select top 1 [gender] from @data where [group] = [d].[group] order by iif([gender] = 'all', 0, 1) asc
) as [x]
group by
[d].[group]
,[x].[gender];

Trying to pivot event dates in t-sql without using a cursor

I have the following table:
What I want is to get to this:
EventTypeId 1 and 3 are valid start events and EventTypeId of 2 is the only valid end event.
I have tried to do a pivot, but I don't believe a pivot will get me the multiple events for a person in the result set.
SELECT PersonId, [1],[3],[2]
FROM
(
SELECT PersonId, EventTypeId, EventDate
from #PersonEvent
) as SourceTable
PIVOT
(
count(EventDate) FOR EventTypeId
IN ([1],[3],[2])
) as PivotTable
I also tried conditional aggregation, but it returns only one row per person:
Select PersonID,
Min(Case WHEN EventTypeId IN (1,3) THEN EventDate END) as StartDate,
Min(Case WHEN EventTypeId IN (2) THEN EventDate END) as EndDate
FROM #PersonEvent
group by personid
I can do a cursor, but my original table is over 90,000 rows, and this is to be for a report, so I don't think I can use that option. Any other thoughts that I might be missing?
Assuming the table is called [dbo].[PersonEventRecords] this will work...
With StartEvents As
(
Select *
From [dbo].[PersonEventRecords]
Where EventTypeId In (1,3)
), EndEvents As
(
Select *
From [dbo].[PersonEventRecords]
Where EventTypeId In (2)
)
Select IsNull(se.PersonId,ee.PersonId) As PersonId,
se.EventTypeId As StartEventTypeId,
se.EventDate As StartEventDate,
ee.EventTypeId As EndEventTypeId,
ee.EventDate As EndEventDate
From StartEvents se
Full Outer Join EndEvents ee
On se.PersonId = ee.PersonId
And se.EventSequence = ee.EventSequence - 1
Order By IsNull(se.PersonId,ee.PersonId),
IsNull(se.EventDate,ee.EventDate);
/**** TEST DATA ****/
If Object_ID('[dbo].[PersonEventRecords]') Is Not Null
Drop Table [dbo].[PersonEventRecords];
Create Table [dbo].[PersonEventRecords]
(
PersonId Int,
EventTypeId Int,
EventDate Date,
EventSequence Int
);
Insert [dbo].[PersonEventRecords]
Select 1,1,'2012-10-13',1
Union All
Select 1,2,'2012-10-20',2
Union All
Select 1,1,'2012-11-01',3
Union All
Select 1,2,'2012-11-13',4
Union All
Select 2,1,'2012-05-07',1
Union All
Select 2,2,'2012-06-01',2
Union All
Select 2,3,'2012-07-01',3
Union All
Select 2,2,'2012-08-30',4
Union All
Select 3,2,'2012-04-05',1
Union All
Select 3,1,'2012-05-04',2
Union All
Select 3,2,'2012-05-24',3
Union All
Select 4,1,'2013-01-03',1
Union All
Select 4,1,'2013-02-20',2
Union All
Select 4,2,'2013-03-20',3;
Try this
SELECT E1.PersonId, E1.EventTypeId, E1.EventDate, E2.EventTypeId, E2.EventDate
FROM PersonEvent AS E1
OUTER APPLY(
SELECT TOP 1 PersonEvent.EventTypeId, PersonEvent.EventDate
FROM PersonEvent
WHERE PersonEvent.PersonId = E1.PersonId
AND PersonEvent.EventSequence = E1.EventSequence + 1
AND PersonEvent.EventTypeId = 2
) AS E2
WHERE E1.EventTypeId = 1 OR E1.EventTypeId = 3
UNION
SELECT E3.PersonId, NULL, NULL, E3.EventTypeId, E3.EventDate
FROM PersonEvent E3
WHERE E3.EventTypeId = 2
AND NOT EXISTS(
SELECT *
FROM PersonEvent
WHERE PersonEvent.PersonId = E3.PersonId
AND PersonEvent.EventSequence = E3.EventSequence - 1)
It is not completely clear how you want the result to be ordered; add an ORDER BY as needed.
