This is a possible duplicate of other Partition By + Rank questions but I found most of those questions/answers to be too specific to their particular business logic. What I'm looking for is a more general LINQ version of the following type of query:
SELECT id,
field1,
field2,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY field1 desc) ROWNUM
FROM someTable;
A very common thing we do with this is to wrap it in something like this:
SELECT id,
field1,
field2
FROM (SELECT id,
field1,
field2,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY field1 desc) ROWNUM
FROM someTable)
WHERE ROWNUM = 1;
This returns the row containing the highest value in field1 for each id. Changing the ORDER BY to asc would of course return the lowest value, and changing the rank to 2 would return the second highest/lowest value, and so on. Is there a way to write a LINQ query that can be executed server side and gives us the same sort of functionality? Ideally, one that is as performant as the above.
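For anyone who wants to experiment with the "n-th row per id" pattern outside Oracle, here is a minimal sketch that runs the wrapped query through Python's built-in sqlite3 module. The sample rows and the helper name nth_per_id are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions).

```python
import sqlite3

# In-memory stand-in for someTable; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (id INTEGER, field1 INTEGER, field2 TEXT)")
conn.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 30, "b"), (1, 20, "c"), (2, 5, "d"), (2, 7, "e")],
)

# Same shape as the wrapped query: number the rows per id, then filter on the rank.
QUERY = """
SELECT id, field1, field2
FROM (SELECT id, field1, field2,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY field1 DESC) AS rownum
      FROM someTable) AS ranked
WHERE rownum = ?
ORDER BY id
"""


def nth_per_id(rank):
    """Return the row holding the rank-th highest field1 for each id (hypothetical helper)."""
    return conn.execute(QUERY, (rank,)).fetchall()


highest = nth_per_id(1)
second = nth_per_id(2)
third = nth_per_id(3)  # ids with fewer than three rows simply drop out
print(highest, second, third)
```

Note that asking for rank 3 silently omits id 2, which only has two rows.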
Edit:
I've tried numerous different solutions after scouring the web and they all end up giving me the same problem that Reed's answer below does because the SQL generated includes an APPLY.
A couple examples I tried:
from p in db.someTable
group p by p.id into g
let mostRecent = g.OrderByDescending(o => o.field1).FirstOrDefault()
select new {
g.Key,
mostRecent
};
db.someTable
.GroupBy(g => g.id, (a, b) => b.OrderByDescending(o => o.field1).Take(1))
.SelectMany(m => m);
Both of these result in very similar, if not identical, SQL code which uses an OUTER APPLY that Oracle does not support.
You should be able to do something like:
var results = someTable
.GroupBy(row => row.id)
.Select(group => group.OrderByDescending(r => r.field1).First());
If you wanted the third highest value, you could do something like:
var results = someTable
.GroupBy(row => row.id)
.Select(group => group.OrderByDescending(r => r.field1).Skip(2).FirstOrDefault())
.Where(r => r != null); // Remove the groups that don't have 3 items
An alternative is to use a subquery that separately gets the maximum field1 for each ID:
SELECT a.*
FROM someTable a
INNER JOIN
(
SELECT id, max(field1) max_field
FROM someTable
GROUP BY id
) b ON a.id = b.ID AND
a.field1 = b.max_field
When converted to LINQ:
from a in someTable
join b in
(
from o in someTable
group o by new {o.ID} into g
select new
{
g.Key.ID,
max_field = g.Max(p => p.field1)
}
) on new {a.ID, a.field1} equals new {b.ID, field1 = b.max_field}
select a
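One thing worth checking with the MAX join is how it behaves on ties: if two rows share the maximum field1 for an id, the join returns both, whereas ROW_NUMBER() returns exactly one. A small sketch using Python's sqlite3 module (invented sample data, and field2 as an arbitrary tie-breaker that I am assuming for the example) shows the difference. It needs SQLite 3.25+ for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (id INTEGER, field1 INTEGER, field2 TEXT)")
conn.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?)",
    # id 1 has two rows tied on the maximum field1 (30).
    [(1, 30, "a"), (1, 30, "b"), (1, 10, "c"), (2, 7, "d")],
)

# Join against the per-id maximum: every row equal to the maximum is returned.
join_rows = conn.execute("""
    SELECT a.id, a.field1, a.field2
    FROM someTable a
    INNER JOIN (SELECT id, MAX(field1) AS max_field
                FROM someTable
                GROUP BY id) b
            ON a.id = b.id AND a.field1 = b.max_field
    ORDER BY a.id, a.field2
""").fetchall()

# ROW_NUMBER with an explicit tie-breaker: exactly one row per id.
rownum_rows = conn.execute("""
    SELECT id, field1, field2
    FROM (SELECT id, field1, field2,
                 ROW_NUMBER() OVER (PARTITION BY id
                                    ORDER BY field1 DESC, field2) AS rownum
          FROM someTable) AS ranked
    WHERE rownum = 1
    ORDER BY id
""").fetchall()

print(join_rows)
print(rownum_rows)
```

So the two approaches only agree when field1 is unique within each id.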
Let's suppose we have to retrieve from the DB one record from BlogPost (Id = 123) and its 5 most recent comments. We would probably write something like:
var test = _db.BlogPost.Select(x => new blogDTO()
{
Id = x.Id,
PostText = x.PostText,
Comments = x.Comments.OrderByDescending(o => o.Ts).Take(5).Select(c => new commentDTO() { Text = c.CommentText }).ToList()
}).Where(x => x.Id == 123).ToList();
This would be translated into very inefficient SQL:
SELECT [i].[Id], [i].[PostText], [t0].[CommentText], [t0].[Id]
FROM [BlogPost] AS [i]
LEFT JOIN (
SELECT [t].[CommentText], [t].[Id], [t].[BlogPostId], [t].[Ts]
FROM (
SELECT [i0].[CommentText] AS [CommentText], [i0].[Id], [i0].[BlogPostId], ROW_NUMBER() OVER(PARTITION BY [i0].[BlogPostId] ORDER BY [i0].[Ts] DESC) AS [row], [i0].[Ts]
FROM [Comments] AS [i0]
) AS [t]
WHERE [t].[row] <= 5
) AS [t0] ON [i].[Id] = [t0].[BlogPostId]
WHERE [i].[Id] = 813
ORDER BY [i].[Id], [t0].[BlogPostId], [t0].[Ts] DESC
The SQL should probably read something like this instead:
SELECT Id, CommentText, Ts from (
SELECT [i].[Id], ic.CommentText, ROW_NUMBER() OVER(PARTITION BY [ic].[BlogPostId] ORDER BY [ic].[Ts] DESC) AS [row], [ic].[Ts]
FROM [BlogPost] AS [i]
left join Comments as ic on ic.BlogPostId = i.Id
WHERE [i].[Id] = 813 ) as res
where res.row <= 5
Do you know how the LINQ to Entities code could be better written?
Or do you feel this is something the EF team should look into?
Id Mshp_Id Action
1 9029 Register
2 9029 Create CV
3 8476 Register
4 8476 Create CV
5 8476 JOB SEARCH
I want to return the two membership IDs and their latest action,
so what would be left is IDs 2 and 5 only.
If you are using SQL Server 2012+, you can use LAST_VALUE:
SELECT ID
,mshp_id
,action
FROM (
SELECT *,LAST_VALUE(id) OVER (PARTITION BY mshp_id
ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
) last_val
FROM YOUR_TABLE
) a
WHERE id = last_val
ORDER BY ID
The last action per member can be fetched in the following ways:
Solution 1:
select Id, Mshp_Id, Action from (
select *, row_number() over (partition by Mshp_Id order by id desc) r from user_action
) a
where a.r = 1
order by id
Solution 2:
select u.* from user_action u
join (select Mshp_Id, max(id) id from user_action
group by Mshp_Id ) a
on a.Mshp_Id = u.Mshp_Id and a.id = u.id
order by u.id
Good luck with your work !
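To check that the three approaches above (LAST_VALUE, ROW_NUMBER and the MAX join) agree on the sample data from the question, here is a rough sketch using Python's sqlite3 module. The table name user_action is taken from Solution 1. SQLite's dialect is not identical to SQL Server's, so treat this as a sanity check rather than a T-SQL test; it assumes SQLite 3.25+ for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE user_action (Id INTEGER, Mshp_Id INTEGER, "Action" TEXT)')
conn.executemany(
    "INSERT INTO user_action VALUES (?, ?, ?)",
    [(1, 9029, "Register"), (2, 9029, "Create CV"), (3, 8476, "Register"),
     (4, 8476, "Create CV"), (5, 8476, "JOB SEARCH")],
)

QUERIES = {
    # Highest Id in each membership's partition, compared against the row's own Id.
    "last_value": """
        SELECT Id, Mshp_Id, "Action"
        FROM (SELECT *, LAST_VALUE(Id) OVER (PARTITION BY Mshp_Id ORDER BY Id
                        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_val
              FROM user_action) a
        WHERE Id = last_val
        ORDER BY Id""",
    # Number rows newest-first per membership and keep the first.
    "row_number": """
        SELECT Id, Mshp_Id, "Action"
        FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Mshp_Id ORDER BY Id DESC) AS r
              FROM user_action) a
        WHERE a.r = 1
        ORDER BY Id""",
    # Join back to the per-membership MAX(Id).
    "max_join": """
        SELECT u.Id, u.Mshp_Id, u."Action"
        FROM user_action u
        JOIN (SELECT Mshp_Id, MAX(Id) AS Id FROM user_action GROUP BY Mshp_Id) a
          ON a.Mshp_Id = u.Mshp_Id AND a.Id = u.Id
        ORDER BY u.Id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return the rows with Id 2 and Id 5 for this data.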
How would I filter a table so that it only includes one value for a column (it does not matter which one)?
The SQL query used to create the result below looks like this:
SELECT DISTINCT
S.Id AS ReferenceID,
M.NewModuleID AS ModuleId,
SM.Compulsory
FROM
Struct S
INNER JOIN
StructModule SM
ON SM.StructId = S.Id
INNER JOIN
ModuleMap M
ON M.StructId = S.Id
AND SM.ModuleId = M.OldModuleId
However, this does not return the values in the way that I need. The returned table looks like this:
ReferenceID NewModuleID Compulsory
1 100 1
1 210 0
2 251 1
2 251 0
However, I would like the SQL query to return a unique value for the NewModuleID field, ideally taking the first occurrence of a value.
The relevant columns of the above tables are as follows:
Struct:
ID (INT)
StructModule:
ID (INT)
StructID (INT)
ModuleID (INT)
Compulsory (BIT)
ModuleMap:
ID (INT)
OldModuleId (INT)
StructID (INT)
NewModuleID (INT)
Your question is not very clear, but after reading the following statement:
However I would like the SQL query to return a unique value for the
NewModuleID field. Ideally taking the first occurrence of a value
I guess that you are looking for something like the following query:
SELECT * FROM
(
SELECT
S.Id AS ReferenceID,
M.NewModuleID AS ModuleId,
SM.Compulsory ,
ROW_NUMBER() OVER(PARTITION BY S.ID, M.NewModuleID ORDER BY M.NewModuleID) RN
FROM
Struct S
INNER JOIN
StructModule SM
ON SM.StructId = S.Id
INNER JOIN
ModuleMap M
ON M.StructId = S.Id
AND SM.ModuleId = M.OldModuleId
)T
WHERE RN=1
Note: You don't need DISTINCT if you are using the RN=1 condition.
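One caveat: ordering by a column that is also in the PARTITION BY leaves the choice of row within each partition up to the engine. The sketch below replays the question's four result rows through Python's sqlite3 module with a deterministic order instead. The flat table name joined (standing in for the three-way join) and the preference for Compulsory = 1 are assumptions made only for this example; SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "joined" stands in for the output of the Struct/StructModule/ModuleMap join.
conn.execute(
    "CREATE TABLE joined (ReferenceID INTEGER, ModuleId INTEGER, Compulsory INTEGER)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?)",
    [(1, 100, 1), (1, 210, 0), (2, 251, 1), (2, 251, 0)],
)

# One row per (ReferenceID, ModuleId); ORDER BY Compulsory DESC makes the pick
# deterministic by preferring the compulsory row when both exist.
deduped = conn.execute("""
    SELECT ReferenceID, ModuleId, Compulsory
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY ReferenceID, ModuleId
                                    ORDER BY Compulsory DESC) AS RN
          FROM joined) AS T
    WHERE RN = 1
    ORDER BY ReferenceID, ModuleId
""").fetchall()

print(deduped)
```

If it truly does not matter which row survives, the original ORDER BY is fine; the explicit order just makes the result repeatable.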
I'm trying to get some individual stats from a score keeping system. In essence, teams are scheduled into matches:
Match
---------
Matchid (uniqueidentifier)
SessionId (int)
WeekNum (int)
Those matches are broken into sets, where two particular players from a team play each other:
MatchSet
-----------
SetId (int)
Matchid (uniqueidentifier)
HomePlayer (int)
AwayPlayer (int)
WinningPlayer (int)
LosingPlayer (int)
WinningPoints (int)
LosingPoints (int)
MatchEndTime (datetime)
In order to allow for player absences, players are allowed to play twice per Match. The points from each set will count for their team totals, but for the individual awards, only the first time that a player plays should be counted.
I had been trying to make use of a CTE to number the rows:
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MatchId ORDER BY MatchEndTime) AS rn
FROM
(SELECT
SetId, MS.MatchId, WinningPlayer, LosingPlayer,
HomePlayer, AwayPlayer, WinningPoints, LosingPoints, MatchEndTime
FROM
MatchSet MS
INNER JOIN
[Match] M ON M.MatchId = MS.MatchId AND M.SessionId = @SessionId
)
but I'm struggling, as the player could be either the home player or the away player in a given set (and could also be either the winner or the loser).
Ideally, this result could then be joined on either WinningPlayer or LosingPlayer back to the players table, which would let me get a list of individual standings.
I think the first step is to write a couple CTEs that get the data into a structure where you can evaluate player points regardless of win/loss. Here's a possible start:
;with PlayersPoints as
(
select m.MatchId
,m.SessionId
,m.WeekNum
,ms.SetId
,ms.WinningPlayer as PlayerId
,ms.WinningPoints as Points
,'W' as Outcome
,ms.MatchEndTime
from MatchSet ms
join [Match] m on ms.MatchId = m.MatchId
and m.SessionId = @SessionId
union all
select m.MatchId
,m.SessionId
,m.WeekNum
,ms.SetId
,ms.LosingPlayer as PlayerId
,ms.LosingPoints as Points
,'L' as Outcome
,ms.MatchEndTime
from MatchSet ms
join [Match] m on ms.MatchId = m.MatchId
and m.SessionId = @SessionId
)
, PlayerMatch as
(
select SetId
,WeekNum
,MatchId
,PlayerId
,row_number() over (partition by PlayerId, WeekNum order by MatchEndTime) as PlayerMatchSequence
from PlayersPoints
)
....
The first CTE pulls out the points for each player, and the second CTE numbers each player's sets within a week in the order they were played. So for calculating individual points, you'd look for PlayerMatchSequence = 1.
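To see the two CTEs produce individual standings end to end, here is a rough sketch run through Python's sqlite3 module. The three sample sets are invented (players 10 and 21 each play twice in the same match), the final SUM/GROUP BY is my guess at how the standings would be totalled, and a named parameter replaces the T-SQL variable. SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Match" (MatchId TEXT, SessionId INTEGER, WeekNum INTEGER);
    CREATE TABLE MatchSet (
        SetId INTEGER, MatchId TEXT, HomePlayer INTEGER, AwayPlayer INTEGER,
        WinningPlayer INTEGER, LosingPlayer INTEGER,
        WinningPoints INTEGER, LosingPoints INTEGER, MatchEndTime TEXT);
    INSERT INTO "Match" VALUES ('m1', 1, 1);
    INSERT INTO MatchSet VALUES
        (1, 'm1', 10, 20, 10, 20, 7, 3, '2024-01-01 19:00'),
        (2, 'm1', 11, 21, 21, 11, 6, 4, '2024-01-01 19:30'),
        (3, 'm1', 10, 21, 21, 10, 8, 2, '2024-01-01 20:00');
""")

INDIVIDUAL_POINTS = """
WITH PlayersPoints AS (
    -- One row per player per set, whether they won or lost.
    SELECT m.WeekNum, ms.WinningPlayer AS PlayerId,
           ms.WinningPoints AS Points, ms.MatchEndTime
    FROM MatchSet ms
    JOIN "Match" m ON ms.MatchId = m.MatchId AND m.SessionId = :session_id
    UNION ALL
    SELECT m.WeekNum, ms.LosingPlayer, ms.LosingPoints, ms.MatchEndTime
    FROM MatchSet ms
    JOIN "Match" m ON ms.MatchId = m.MatchId AND m.SessionId = :session_id
),
PlayerMatch AS (
    -- Number each player's sets within a week by finishing time.
    SELECT PlayerId, Points,
           ROW_NUMBER() OVER (PARTITION BY PlayerId, WeekNum
                              ORDER BY MatchEndTime) AS PlayerMatchSequence
    FROM PlayersPoints
)
SELECT PlayerId, SUM(Points) AS IndividualPoints
FROM PlayerMatch
WHERE PlayerMatchSequence = 1   -- only the first set a player played counts
GROUP BY PlayerId
ORDER BY PlayerId
"""

standings = conn.execute(INDIVIDUAL_POINTS, {"session_id": 1}).fetchall()
print(standings)
```

Set 3 is ignored for both players 10 and 21 because each had already played earlier that week.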
Perhaps you could virtualize a normalized view of your data and key off of it instead of the MatchSet table.
;WITH TeamPlayerMatch AS
(
SELECT T.TeamID, PlayerID=MS.WinningPlayer, MS.MatchID, MS.SetId, MS.MatchEndTime, Points = MS.WinningPoints, IsWinner=1 FROM MatchSet MS INNER JOIN TeamPlayer T ON T.PlayerID=MS.WinningPlayer
UNION ALL
SELECT T.TeamID, PlayerID=MS.LosingPlayer, MS.MatchID, MS.SetId, MS.MatchEndTime, Points = MS.LosingPoints, IsWinner=0 FROM MatchSet MS INNER JOIN TeamPlayer T ON T.PlayerID=MS.LosingPlayer
)
,cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MatchId ORDER BY MatchEndTime) AS rn
FROM
(SELECT
SetId, MS.MatchId, PlayerID, TeamID, Points, MatchEndTime, IsWinner
FROM
TeamPlayerMatch MS
INNER JOIN
[Match] M ON M.MatchId = MS.MatchId AND M.SessionId = @SessionId
WHERE
IsWinner=1
)
I inherited an old SQL script that I want to optimize, but after several attempts I must admit that all of them only produce huge SQL with repetitive blocks. I would like to know if someone can propose better code for the following pattern (see code below). I don't want to use temporary tables (WITH). For simplicity, I only show 3 levels (tables TMP_C, TMP_D and TMP_E), but the original SQL has 8 levels.
WITH
TMP_A AS (
SELECT
ID,
Field_X
FROM A),
TMP_B AS(
SELECT DISTINCT
B.ID,
Field_Y,
CASE
WHEN Field_Z IN ('TEST_1','TEST_2') THEN 'CATEG_1'
WHEN Field_Z IN ('TEST_3','TEST_4') THEN 'CATEG_2'
WHEN Field_Z IN ('TEST_5','TEST_6') THEN 'CATEG_3'
ELSE 'CATEG_4'
END AS CATEG
FROM B
INNER JOIN TMP_A
ON TMP_A.ID=B.ID),
TMP_C AS (
SELECT DISTINCT
ID,
CATEG
FROM TMP_B
WHERE CATEG='CATEG_1'),
TMP_D AS (
SELECT DISTINCT
ID,
CATEG
FROM TMP_B
WHERE CATEG='CATEG_2' AND ID NOT IN (SELECT ID FROM TMP_C)),
TMP_E AS (
SELECT DISTINCT
ID,
CATEG
FROM TMP_B
WHERE CATEG='CATEG_3'
AND ID NOT IN (SELECT ID FROM TMP_C)
AND ID NOT IN (SELECT ID FROM TMP_D))
SELECT * FROM TMP_C
UNION
SELECT * FROM TMP_D
UNION
SELECT * FROM TMP_E
Many thanks in advance for your help.
First off, SELECT DISTINCT already prevents duplicates in the result set, so you are overworking the condition. Adding the WITH definitions and nesting their use makes the query more confusing to follow. The data ultimately all comes from the "B" table where there is also a key match in "A". Let's start with just that. And since you are not using anything from (B)Field_Y or (A)Field_X in your result set, don't add them to the mix.
SELECT DISTINCT
B.ID,
CASE WHEN B.Field_Z IN ('TEST_1','TEST_2') THEN 'CATEG_1'
WHEN B.Field_Z IN ('TEST_3','TEST_4') THEN 'CATEG_2'
WHEN B.Field_Z IN ('TEST_5','TEST_6') THEN 'CATEG_3'
ELSE 'CATEG_4'
END AS CATEG
FROM
B JOIN A ON B.ID = A.ID
WHERE
B.Field_Z IN ( 'TEST_1', 'TEST_2', 'TEST_3', 'TEST_4', 'TEST_5', 'TEST_6' )
The WHERE clause only lets through the Field_Z values that map to a category you want, and you still get the results for each category.
Now, if you actually needed other values from your "Field_Y" or "Field_X", then that would generate a different query. However, your Tmp_C, Tmp_D and Tmp_E are only asking for the ID and CATEG columns anyhow.
This may perform better:
SELECT DISTINCT B.ID, 'CATEG_1'
FROM
B JOIN A ON B.ID = A.ID
WHERE
B.Field_Z IN ( 'TEST_1', 'TEST_2')
UNION
SELECT DISTINCT B.ID, 'CATEG_2'
FROM
B JOIN A ON B.ID = A.ID
WHERE
B.Field_Z IN ( 'TEST_3', 'TEST_4')
...
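One difference from the question's script is worth checking before adopting either rewrite: TMP_D and TMP_E exclude IDs already matched by an earlier level, so each ID ends up with only its highest-priority category, while a plain DISTINCT (or UNION of per-category selects) returns every category an ID matches. If the priority behaviour is needed, one possible sketch is to take the MIN of the CASE expression per ID, which relies on the category names sorting in priority order (an assumption that holds for CATEG_1 to CATEG_8). The sample rows below are invented, and the queries are run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, Field_X TEXT);
    CREATE TABLE B (ID INTEGER, Field_Y TEXT, Field_Z TEXT);
    INSERT INTO A (ID) VALUES (1), (2), (3), (4);
    INSERT INTO B (ID, Field_Z) VALUES
        (1, 'TEST_1'), (1, 'TEST_3'),   -- matches CATEG_1 and CATEG_2
        (2, 'TEST_4'), (2, 'TEST_5'),   -- matches CATEG_2 and CATEG_3
        (3, 'TEST_6'),                  -- matches CATEG_3 only
        (4, 'OTHER'),                   -- no wanted category
        (5, 'TEST_1');                  -- not present in A
""")

CATEG_CASE = """
    CASE WHEN B.Field_Z IN ('TEST_1','TEST_2') THEN 'CATEG_1'
         WHEN B.Field_Z IN ('TEST_3','TEST_4') THEN 'CATEG_2'
         WHEN B.Field_Z IN ('TEST_5','TEST_6') THEN 'CATEG_3'
    END
"""
WANTED = "('TEST_1','TEST_2','TEST_3','TEST_4','TEST_5','TEST_6')"

# Plain DISTINCT: an ID appears once for every category it matches.
distinct_rows = conn.execute(f"""
    SELECT DISTINCT B.ID, {CATEG_CASE} AS CATEG
    FROM B JOIN A ON B.ID = A.ID
    WHERE B.Field_Z IN {WANTED}
    ORDER BY 1, 2
""").fetchall()

# MIN per ID: keeps only the highest-priority category, like the NOT IN chain.
priority_rows = conn.execute(f"""
    SELECT B.ID, MIN({CATEG_CASE}) AS CATEG
    FROM B JOIN A ON B.ID = A.ID
    WHERE B.Field_Z IN {WANTED}
    GROUP BY B.ID
    ORDER BY B.ID
""").fetchall()

print(distinct_rows)
print(priority_rows)
```

With this data the DISTINCT query returns five rows and the MIN query three, one per ID.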