Query to Find Max row based on N fields - sql-server

Given the following data:
ID, Name, Value, TimeStamp
1, 'A', 7.00, 21/12/2017
2, 'A', 5.00, 21/12/2017
3, 'A', 6.00, 20/12/2017
4, 'B', 1.00, 21/12/2017
Result I want is:
Name, Value, TimeStamp
'A', 5.00, 21/12/2017
'B', 1.00, 21/12/2017
That is, group by Name and take the Value from the row with the latest TimeStamp; if two or more rows share that TimeStamp, take the one with the highest ID.
I found an approach in another post that does something similar:
SELECT t.ID, t.Name, t.Value, t.TimeStamp
FROM MyTable t
JOIN ( SELECT Name, MAX(TimeStamp) AS TimeStamp
FROM MyTable
GROUP BY Name ) m
ON t.Name = m.Name AND t.TimeStamp = m.TimeStamp
This gives me the rows with the max timestamp, so to break ties on ID I can repeat the process, i.e. I can use:
WITH CTE AS (
...
)
SELECT CTE.Name, CTE.Value, CTE.TimeStamp
FROM CTE
JOIN ( SELECT Name, MAX(ID) AS ID
FROM CTE
GROUP BY Name ) a
ON CTE.Name = a.Name AND CTE.ID = a.ID
However, what happens if I now want to scale this up to 3 fields? Is there an easier way to do this? Without experimenting, I was thinking of a recursive CTE. I am trying to avoid dynamic SQL.

I think you may want to use the ROW_NUMBER function for this. Below is an example.
SQL Example
;WITH
test_data
AS
(
SELECT tbl.* FROM (VALUES
( 1, 'A', 7.00, '21-Dec-2017')
, ( 2, 'A', 5.00, '21-Dec-2017')
, ( 3, 'A', 6.00, '20-Dec-2017')
, ( 4, 'B', 1.00, '21-Dec-2017')
) tbl ([ID], [Name], [Value], [TimeStamp])
)
,
test_data_order
AS
(
SELECT
[ID]
, [Name]
, [Value]
, [TimeStamp]
, EX_ROW_NUMBER = ROW_NUMBER() OVER
(
PARTITION BY
[Name]
ORDER BY
[TimeStamp] DESC
, [ID] DESC
)
FROM
test_data
)
SELECT
*
FROM
test_data_order
WHERE
EX_ROW_NUMBER = 1

Try this
SELECT Name, Value, TimeStamp
FROM (
SELECT ID, Name, Value, TimeStamp
, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeStamp DESC, ID DESC) AS RowNum
FROM MyTable
) b
WHERE RowNum = 1
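To check the ROW_NUMBER approach against the sample data, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite stands in for SQL Server here (an assumption made only so the example is runnable; it needs SQLite 3.25 or later for window functions), and the dates are rewritten in ISO format so they sort correctly as text. Scaling to a third field should only require appending another column to the ORDER BY list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ID INTEGER, Name TEXT, Value REAL, TimeStamp TEXT);
    INSERT INTO MyTable VALUES
        (1, 'A', 7.00, '2017-12-21'),
        (2, 'A', 5.00, '2017-12-21'),
        (3, 'A', 6.00, '2017-12-20'),
        (4, 'B', 1.00, '2017-12-21');
""")

QUERY = """
    SELECT Name, Value, TimeStamp
    FROM (
        SELECT ID, Name, Value, TimeStamp,
               ROW_NUMBER() OVER (
                   PARTITION BY Name
                   ORDER BY TimeStamp DESC, ID DESC  -- append further tie-breakers here
               ) AS RowNum
        FROM MyTable
    ) AS ranked
    WHERE RowNum = 1
    ORDER BY Name
"""

# One row per Name: latest TimeStamp first, highest ID as the tie-breaker.
rows = conn.execute(QUERY).fetchall()
print(rows)
```

The two rows printed match the result requested in the question ('A' with 5.00 and 'B' with 1.00).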

Related

TSQL: Group by one column, count all rows and keep value on second column based on row_number

I have a query that returns an Id, a Name and the Row_Number() based on some rules.
The query looks like this:
SELECT
tm.id AS Id,
pn.Name AS Name,
ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM
#tempTable AS tm
LEFT JOIN
names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
The output of the above query looks like the table below with the dummy data
CREATE TABLE people
(
id int,
name varchar(55),
row int
);
INSERT INTO people
VALUES (1, 'John', 1), (1, 'John', 2), (2, 'Mary', 1),
(3, 'Jeff', 1), (4, 'Bill', 1), (4, 'Bill', 2),
(4, 'Bill', 3), (4, 'Billy', 4), (5, 'Bobby', 1),
(5, 'Bob', 2), (5, 'Bob' , 3), (5, 'Bob' , 4);
What I am trying to do is group by the id field and count all rows, but for the name use the one with row = 1.
My attempt looks like this, but I get extra rows because x.name is included in the GROUP BY.
SELECT
x.id,
x.name,
COUNT(*) AS Value
FROM
(SELECT
tm.id AS Id,
pn.Name AS Name,
ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM
#tempTable AS tm
LEFT JOIN
names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
) x
GROUP BY
x.id, x.name
ORDER BY
COUNT(*) DESC
The desired results from the dummy data are:
id name count
------------------
1 John 2
2 Mary 1
3 Jeff 1
4 Bill 4
5 Bobby 4
You can use FIRST_VALUE() window function to get the name of the row with row number = 1 and with the keyword DISTINCT there is no need to GROUP BY:
SELECT DISTINCT tm.id AS Id
, FIRST_VALUE(pn.Name) OVER (PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Name
, COUNT(*) OVER (PARTITION BY tm.id) AS counter
FROM #tempTable AS tm
LEFT JOIN names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
If you can't use FIRST_VALUE() then you can do it with conditional aggregation:
SELECT id,
MAX(CASE WHEN Row = 1 THEN Name END) AS NAME,
COUNT(*) AS Counter
FROM (
SELECT tm.id AS Id
, pn.Name AS Name
, ROW_NUMBER() OVER(PARTITION BY tm.id ORDER BY tm.CreatedDate ASC) AS Row
FROM #tempTable AS tm
LEFT JOIN names pn WITH (NOLOCK) ON tm.nameId = pn.NameId
WHERE ....
) t
GROUP BY id
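Both variants can be tried on the dummy people table from the question. The sketch below is only an illustration: it uses Python's sqlite3 module in place of SQL Server (SQLite 3.25 or later), and the column called row in the question is renamed rn here to stay clear of keyword handling in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER, name TEXT, rn INTEGER);
    INSERT INTO people VALUES
        (1, 'John', 1), (1, 'John', 2), (2, 'Mary', 1),
        (3, 'Jeff', 1), (4, 'Bill', 1), (4, 'Bill', 2),
        (4, 'Bill', 3), (4, 'Billy', 4), (5, 'Bobby', 1),
        (5, 'Bob', 2), (5, 'Bob', 3), (5, 'Bob', 4);
""")

# Conditional aggregation: only the rn = 1 row contributes a non-NULL name.
cond_agg = conn.execute("""
    SELECT id,
           MAX(CASE WHEN rn = 1 THEN name END) AS name,
           COUNT(*) AS counter
    FROM people
    GROUP BY id
    ORDER BY id
""").fetchall()

# Window functions plus DISTINCT: every row of an id carries the same values.
first_val = conn.execute("""
    SELECT DISTINCT id,
           FIRST_VALUE(name) OVER (PARTITION BY id ORDER BY rn) AS name,
           COUNT(*) OVER (PARTITION BY id) AS counter
    FROM people
    ORDER BY id
""").fetchall()

print(cond_agg)
print(first_val)
```

On this data both queries return the same five rows as the desired result in the question.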
This could be one solution to your problem: group on both id and the target name (case when p.row = 1 then p.name end) for the counting. Adding with rollup to the grouping "rolls up" the count aggregations, producing a total row per id. A second aggregation on just id can then be used to merge the row values from the intermediate data set.
with cte as
(
select p.id,
case when p.row = 1 then p.name end as name,
count(1) as cnt
from people p
group by p.id, case when p.row = 1 then p.name end with rollup
having grouping(p.id) = 0
)
select cte.id,
max(cte.name) as name,
max(cte.cnt) as [count]
from cte
group by cte.id;
This would be another solution: do a regular count query with grouping on id and fetch the required name afterwards with a cross apply.
with cte as
(
select p.id,
count(1) as cnt
from people p
group by p.id
)
select cte.id,
n.name,
cte.cnt as [count]
from cte
cross apply ( select p.name
from people p
where p.id = cte.id
and p.row = 1 ) n;

Only show value of Max rows with partition by?

The title might be a bit off; however, I'm trying to blank out the values in a row without removing the actual row.
This is my table:
SELECT ID,CustomerID,Weight FROM Orders
What I am trying to accomplish is this:
Find the MAX() value of ID grouped by CustomerID, and show NULL in Weight for every row that is not that maximum row for its customer.
Is it possible to do this in one line, with a PARTITION BY?
Something like:
SELECT MAX(ID) over (partition by CustomerID,Weight).... I know this is wrong, but if it is possible to do this without a join or CTE, in a single line of the select statement, that would be great.
One possible approach is using ROW_NUMBER:
SELECT
ID,
CustomerID,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY ID DESC) = 1 THEN [Weight]
ELSE Null
END AS [Weight]
FROM #Orders
ORDER BY ID
Input:
CREATE TABLE #Orders (
ID int,
CustomerID int,
[Weight] int
)
INSERT INTO #Orders
(ID, CustomerID, [Weight])
VALUES
(1, 11, 100),
(2, 11, 17),
(3, 11, 35),
(4, 22, 26),
(5, 22, 78),
(6, 22, 10030)
Output:
ID CustomerID Weight
1 11 NULL
2 11 NULL
3 11 35
4 22 NULL
5 22 NULL
6 22 10030
Try this
;WITH CTE
AS
(
SELECT
MAX_ID = MAX(ID) OVER(PARTITION BY CustomerId),
ID,
CustomerId,
Weight
FROM Orders
)
SELECT
ID,
CustomerId,
Weight = CASE WHEN ID = MAX_ID THEN Weight ELSE NULL END
FROM CTE
You can try this.
SELECT ID,CustomerId,CASE WHEN ID= MAX(ID) OVER(PARTITION BY CustomerId) THEN Weight ELSE NULL END AS Weight FROM Orders
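The single-expression form can be checked against the sample rows given earlier. This is a sketch under one assumption: Python's sqlite3 module (SQLite 3.25 or later) is used in place of SQL Server, with an ordinary table named Orders instead of the temporary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER, CustomerID INTEGER, Weight INTEGER);
    INSERT INTO Orders VALUES
        (1, 11, 100), (2, 11, 17), (3, 11, 35),
        (4, 22, 26), (5, 22, 78), (6, 22, 10030);
""")

# Keep Weight only on the row whose ID is the maximum for its customer;
# CASE without ELSE yields NULL for every other row.
rows = conn.execute("""
    SELECT ID,
           CustomerID,
           CASE WHEN ID = MAX(ID) OVER (PARTITION BY CustomerID)
                THEN Weight
           END AS Weight
    FROM Orders
    ORDER BY ID
""").fetchall()

for row in rows:
    print(row)
```

Only IDs 3 and 6 keep their Weight, which matches the output table shown in the first answer.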

SQL Server or functionality in where condition

I have an SQL query where I want to get the rows with the value "all" or "female" in the [gender] column and the value "A" in the [group] column. If there are two rows for [group] = A, one with [gender] = all and the other with [gender] = female, I want to get only the row with [gender] = all. Now I use:
where [group] = 'A' and (gender = 'all' or gender = 'female')
But I get both rows.
In the example table below I want to get only the row: A all
But with the WHERE clause above I get both rows for group A:
group gender
A all
A female
B all
C female
C all
You can use something like row_number() to prioritize the various subsets of records you're looking at and then select only one record from each. From the wording of your question I assume there is some other field in the table on which you're "grouping" records together: in other words, a field whose every distinct value should produce at most one record in the result set whose group and gender values match your criteria. In the following example I've assumed that this field is called Category; if you share the actual schema of your table then I can improve the example, but this should suffice to illustrate the idea.
declare @SampleData table
(
Category bigint,
[Group] char(1),
Gender varchar(16)
);
insert @SampleData values
(1, 'A', 'Female'), -- include
(2, 'B', 'Female'), -- exclude; wrong group
(3, 'A', 'Female'), -- exclude; right group and gender but superseded by (3, 'A', 'All')
(3, 'A', 'All'), -- include
(4, 'A', 'All'), -- include
(5, 'A', 'Male'); -- exclude; wrong gender
with PrioritizedData as
(
select
D.*,
[Priority] = row_number() over (partition by D.Category order by case D.Gender when 'All' then 0 else 1 end)
from
@SampleData D
where
D.[Group] = 'A' and
D.Gender in ('Female', 'All')
)
select * from PrioritizedData P where P.[Priority] = 1;
You can use the RANK() window function with results partitioned by group and ordered by gender (this works because all is alphabetically before female or male). If your ordering gets more complex than that, you'll have to look at another way to order them.
/* TEST DATA */
; WITH a AS (
SELECT 'A' AS thegroup, 'all' AS gender UNION ALL
SELECT 'A' AS thegroup, 'all' AS gender UNION ALL
SELECT 'A' AS thegroup, 'female' AS gender UNION ALL
SELECT 'B' AS thegroup, 'all' AS gender UNION ALL
SELECT 'C' AS thegroup, 'female' AS gender UNION ALL
SELECT 'C' AS thegroup, 'all' AS gender UNION ALL
SELECT 'D' AS thegroup, 'female' AS gender
)
/* THE QUERY */
SELECT b.*
FROM (
SELECT thegroup, gender, RANK() OVER (PARTITION BY thegroup ORDER BY gender) AS rn /* Sets the ranked groups of 'thegroup' */
FROM a
) b
WHERE b.rn = 1 /* Gets first group. */
AND thegroup = 'A'
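One detail worth checking with the test data above is the duplicated ('A', 'all') row: RANK() gives tied rows the same rank, so both copies pass the rn = 1 filter, whereas ROW_NUMBER() always keeps exactly one. The sketch below compares the two. It assumes Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server, and it orders by an explicit CASE priority rather than relying on alphabetical order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (thegroup TEXT, gender TEXT);
    INSERT INTO a VALUES
        ('A', 'all'), ('A', 'all'), ('A', 'female'),
        ('B', 'all'), ('C', 'female'), ('C', 'all'), ('D', 'female');
""")

# {fn} is filled with a fixed function name below, never with user input.
TEMPLATE = """
    SELECT thegroup, gender
    FROM (
        SELECT thegroup, gender,
               {fn}() OVER (
                   PARTITION BY thegroup
                   ORDER BY CASE gender WHEN 'all' THEN 0 ELSE 1 END
               ) AS rn
        FROM a
        WHERE gender IN ('all', 'female')
    ) AS b
    WHERE rn = 1 AND thegroup = 'A'
"""

with_rank = conn.execute(TEMPLATE.format(fn="RANK")).fetchall()
with_row_number = conn.execute(TEMPLATE.format(fn="ROW_NUMBER")).fetchall()

print(with_rank)        # both tied 'all' rows survive
print(with_row_number)  # exactly one row survives
```

If duplicates are possible and only one row per group is wanted, ROW_NUMBER() is probably the safer choice.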
data script
declare @data table ([group] char(1), [gender] varchar(16));
insert into @data values ('A', 'all'), ('A', 'female') ,('B', 'all') ,('C', 'female') ,('C', 'all');
query
select
[group] = [d].[group]
,[gender] = [x].[gender]
from
@data as [d]
cross apply
(
select top 1 [gender] from @data where [group] = [d].[group] order by iif([gender] = 'all', 0, 1) asc
) as [x]
group by
[d].[group]
,[x].[gender];

Query to get date rows older than a start date (not a simple WHERE)

I have a feeling this is quite simple, but I can't put my finger on the query. I'm trying to find all of the activities of an employee that correspond to their start date in a specific location.
create table Locations (EmployeeID int, LocationID int, StartDate date);
create table Activities (EmployeeID int, ActivityID int, [Date] date);
insert into Locations values
(1, 10, '01-01-2010')
, (1, 11, '01-01-2012')
, (1, 11, '01-01-2013');
insert into Activities values
(1, 1, '02-01-2010')
, (1, 2, '04-01-2010')
, (1, 3, '06-06-2014');
Expected result:
EmployeeID LocationID StartDate EmployeeID ActivityID Date
1 10 '01-01-2010' 1 1 '02-01-2010'
1 10 '01-01-2010' 1 2 '04-01-2010'
1 11 '01-01-2013' 1 3 '06-06-2014'
So far I have this, but it's not quite giving me the result I was hoping for. I somehow have to reference only the most recent Location for each activity; the condition la.StartDate <= a.Date does not filter out older locations, so their rows are included as well.
select *
from Locations la
inner join Activities a on la.EmployeeID = a.EmployeeID
and la.StartDate <= a.Date
Give this one a try:
with Locations as (
select
EmployeeID, LocationID, cast(StartDate as date) as StartDate -- compare as dates, not strings
from (values
(1, 10, '01-01-2010')
, (1, 11, '01-01-2012')
, (1, 11, '01-01-2013')
) la (EmployeeID, LocationID, StartDate)
),
Activities as (
select
EmployeeID, ActivityID, cast([Date] as date) as [Date] -- compare as dates, not strings
from (
values
(1, 1, '02-01-2010')
, (1, 2, '04-01-2010')
, (1, 3, '06-06-2014')
) a (EmployeeID, ActivityID, [Date])
)
select
la.*,
a.*
from Activities a
cross apply (
select
*
from (
select
la.*,
ROW_NUMBER() OVER (
PARTITION BY
la.EmployeeID
ORDER BY
la.StartDate DESC
) seqnum
from Locations la
where
la.EmployeeID = a.EmployeeID and
la.StartDate <= a.Date
) la
where
la.seqnum = 1
) la
Thank you all, but I managed to find the answer:
select *
from LocationAssociations la
inner join Activities a on la.EmployeeID = a.EmployeeID
and la.StartDate = (select max(l2.StartDate) from LocationAssociations l2 where l2.EmployeeID = la.EmployeeID and l2.StartDate >= la.StartDate and l2.StartDate <= a.Date)
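The underlying idea (for each activity, keep only the location row with the greatest StartDate that is not after the activity date) can be checked with the sample data from the question. The sketch below makes several assumptions: Python's sqlite3 module stands in for SQL Server, the table is named Locations as in the question's schema, and the dates are rewritten in ISO format reading the originals as MM-DD-YYYY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Locations (EmployeeID INTEGER, LocationID INTEGER, StartDate TEXT);
    CREATE TABLE Activities (EmployeeID INTEGER, ActivityID INTEGER, Date TEXT);
    INSERT INTO Locations VALUES
        (1, 10, '2010-01-01'), (1, 11, '2012-01-01'), (1, 11, '2013-01-01');
    INSERT INTO Activities VALUES
        (1, 1, '2010-02-01'), (1, 2, '2010-04-01'), (1, 3, '2014-06-06');
""")

# For each activity, match only the employee's most recent location
# whose StartDate is on or before the activity date.
rows = conn.execute("""
    SELECT la.LocationID, la.StartDate, a.ActivityID, a.Date
    FROM Activities AS a
    JOIN Locations AS la
      ON la.EmployeeID = a.EmployeeID
    WHERE la.StartDate = (
        SELECT MAX(l2.StartDate)
        FROM Locations AS l2
        WHERE l2.EmployeeID = a.EmployeeID
          AND l2.StartDate <= a.Date
    )
    ORDER BY a.ActivityID
""").fetchall()

for row in rows:
    print(row)
```

Activities 1 and 2 pair with location 10, and activity 3 pairs with the 2013 row of location 11, as in the expected result.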

Multiple Column Pivot in T-SQL

I am working with a table where there are multiple rows that I need pivoted into columns. A pivot is the perfect solution for this, and it works well when all I need is one field. However, I need to return several fields based upon the pivot. Here is the pseudo code with specifics stripped out:
SELECT
field1,
[1], [2], [3], [4]
FROM
(
SELECT
field1,
field2,
(ROW_NUMBER() OVER(PARTITION BY field1 ORDER BY field2)) RowID
FROM tblname
) AS SourceTable
PIVOT
(
MAX(field2)
FOR RowID IN ([1], [2], [3], [4])
) AS PivotTable;
The above syntax works brilliantly, but what do I do when I need to get additional information found in field3, field4....?
Rewrite using MAX(CASE...) and GROUP BY:
select
field1
, [1] = max(case when RowID = 1 then field2 end)
, [2] = max(case when RowID = 2 then field2 end)
, [3] = max(case when RowID = 3 then field2 end)
, [4] = max(case when RowID = 4 then field2 end)
from (
select
field1
, field2
, RowID = row_number() over (partition by field1 order by field2)
from tblname
) SourceTable
group by
field1
From there you can add in field3, field4, etc.
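As an illustration of adding a second field to the MAX(CASE ...) rewrite, here is a small sketch. Everything in it is hypothetical: the sample rows, the output column names, and the use of Python's sqlite3 module (SQLite 3.25 or later) instead of SQL Server. RowNum plays the role of RowID from the answer, and only two slots are pivoted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblname (field1 TEXT, field2 INTEGER, field3 TEXT);
    INSERT INTO tblname VALUES
        ('A', 10, 'x'), ('A', 20, 'y'), ('B', 30, 'z');
""")

# One MAX(CASE ...) per (slot, field) pair; slots with no row stay NULL.
rows = conn.execute("""
    SELECT field1,
           MAX(CASE WHEN RowNum = 1 THEN field2 END) AS field2_1,
           MAX(CASE WHEN RowNum = 1 THEN field3 END) AS field3_1,
           MAX(CASE WHEN RowNum = 2 THEN field2 END) AS field2_2,
           MAX(CASE WHEN RowNum = 2 THEN field3 END) AS field3_2
    FROM (
        SELECT field1, field2, field3,
               ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field2) AS RowNum
        FROM tblname
    ) AS SourceTable
    GROUP BY field1
    ORDER BY field1
""").fetchall()

for row in rows:
    print(row)
```

Because each slot holds at most one source row per field1, MAX simply picks that row's value, so field2 and field3 from the same source row stay aligned.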
The trick to doing multiple pivots over a row_number is to modify that row number sequence to store both the sequence and the field number. Here's an example that does what you want with multiple PIVOT statements.
-- populate some test data
if object_id('tempdb..#tmp') is not null drop table #tmp
create table #tmp (
ID int identity(1,1) not null,
MainField varchar(100),
ThatField int,
ThatOtherField datetime
)
insert into #tmp (MainField, ThatField, ThatOtherField)
select 'A', 10, '1/1/2000' union all
select 'A', 20, '2/1/2000' union all
select 'A', 30, '3/1/2000' union all
select 'B', 10, '1/1/2001' union all
select 'B', 20, '2/1/2001' union all
select 'B', 30, '3/1/2001' union all
select 'B', 40, '4/1/2001' union all
select 'C', 10, '1/1/2002' union all
select 'D', 10, '1/1/2000' union all
select 'D', 20, '2/1/2000' --union all
-- pivot over multiple columns using the 1.1, 1.2, 2.1, 2.2 sequence trick
select
MainField,
max([1.1]) as ThatField1,
max([1.2]) as ThatOtherField1,
max([2.1]) as ThatField2,
max([2.2]) as ThatOtherField2,
max([3.1]) as ThatField3,
max([3.2]) as ThatOtherField3,
max([4.1]) as ThatField4,
max([4.2]) as ThatOtherField4
from
(
select x.*,
cast(row_number() over (partition by MainField order by ThatField) as varchar(2)) + '.1' as ThatFieldSequence,
cast(row_number() over (partition by MainField order by ThatField) as varchar(2)) + '.2' as ThatOtherFieldSequence
from #tmp x
) a
pivot (
max(ThatField) for ThatFieldSequence in ([1.1], [2.1], [3.1], [4.1])
) p1
pivot (
max(ThatOtherField) for ThatOtherFieldSequence in ([1.2], [2.2], [3.2], [4.2])
) p2
group by
MainField
I am unsure if you are using MS SQL Server, but if you are... You may want to take a look at the CROSS APPLY functionality of the engine. Basically it will allow you to apply the results of a table-valued UDF to a result set. This would require you to put your pivot query into a table-valued result set.
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Wrap your SQL statement with something like:
select a.segment, sum(field2), sum(field3)
from (original select with case arguments) a
group by a.segment
It should collapse your results into one row, grouped on field1.
It is possible to pivot on multiple columns, but you need to be careful about reusing the pivot column across multiple pivots. Here is a good blog post on the subject:
http://pratchev.blogspot.com/2009/01/pivoting-on-multiple-columns.html
