How to optimize LINQ to Entities retrieving top x records from a navigation property - sql-server

Suppose we have to retrieve from the DB one record from BlogPost (Id = 123) and its 5 most recent comments. We would probably write something like:
var test = _db.BlogPost.Select(x => new blogDTO()
{
Id = x.Id,
PostText = x.PostText,
Comments = x.Comments.OrderByDescending(o => o.Ts).Take(5).Select(c => new commentDTO() { Text = c.CommentText }).ToList()
}).Where(x => x.Id == 123).ToList();
This gets translated into very inefficient SQL:
SELECT [i].[Id], [i].[PostText], [t0].[CommentText], [t0].[Id]
FROM [BlogPost] AS [i]
LEFT JOIN (
SELECT [t].[CommentText], [t].[Id], [t].[BlogPostId], [t].[Ts]
FROM (
SELECT [i0].[CommentText] AS [CommentText], [i0].[Id], [i0].[BlogPostId], ROW_NUMBER() OVER(PARTITION BY [i0].[BlogPostId] ORDER BY [i0].[Ts] DESC) AS [row], [i0].[Ts]
FROM [Comments] AS [i0]
) AS [t]
WHERE [t].[row] <= 5
) AS [t0] ON [i].[Id] = [t0].[BlogPostId]
WHERE [i].[Id] = 123
ORDER BY [i].[Id], [t0].[BlogPostId], [t0].[Ts] DESC
The SQL should probably read something like this instead:
SELECT Id, CommentText, Ts from (
SELECT [i].[Id], ic.CommentText, ROW_NUMBER() OVER(PARTITION BY [ic].[BlogPostId] ORDER BY [ic].[Ts] DESC) AS [row], [ic].[Ts]
FROM [BlogPost] AS [i]
left join Comments as ic on ic.BlogPostId = i.Id
WHERE [i].[Id] = 123 ) as res
where res.row <= 5
Do you know how the LINQ to Entities code could be better written?
Or do you feel this is something the EF team should look into?
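For what it's worth, one possible workaround (a sketch only, not from the original thread) is to split the work into two simple queries: load the post itself, then load its top 5 comments directly. This assumes the context exposes a Comments DbSet and that Comment has a BlogPostId foreign key; with the single-post filter applied up front, Take(5) should translate to a plain TOP(5) ... ORDER BY [Ts] DESC query instead of the ROW_NUMBER() partition:
var post = _db.BlogPost
    .Where(x => x.Id == 123)
    .Select(x => new blogDTO { Id = x.Id, PostText = x.PostText })
    .Single();

// Hypothetical DbSet name; the comments are fetched in a second, trivially
// indexable query instead of being joined through the navigation property.
post.Comments = _db.Comments
    .Where(c => c.BlogPostId == 123)
    .OrderByDescending(c => c.Ts)
    .Take(5)
    .Select(c => new commentDTO { Text = c.CommentText })
    .ToList();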

Related

VIEWS with SELECT inside conditions delaying the query

In one of my SQL views I am using an inline SELECT statement inside a WHERE clause.
The outline of my view looks like this:
ALTER VIEW [dbo].[vw_autumn]
AS
SELECT
BookNumber, Title, shopNo
FROM
(SELECT
BookNumber, Title, shopNO
FROM
(SELECT DISTINCT
(sum_vnr) AS BookNumber,
navn1 AS Title,
tik AS ShopNO,
ROW_NUMBER() OVER (PARTITION BY sum_vnr, tik ORDER BY sum_vnr DESC) AS rownumber
FROM
sum s
INNER JOIN
hod h ON s.tik = h.tik
WHERE
s.aar = (SELECT currentyear
FROM SemesterInfo
WHERE SemName = 'Autumn')
AND CAST(s.sum_vnr AS BIGINT) > 10000
AND (s.id LIKE 'h%' OR s.id LIKE 'H%' OR s.id LIKE 'j%'
OR s.id LIKE 'J%')) a
WHERE rownumber = 1
) b
LEFT JOIN (
------
) p ON b.ShopNO = p.tikk
AND b.ISBN = p.vnr
LEFT JOIN table_k k ON p.aar = k.aar
GO
And if I remove the WHERE clause of
WHERE
s.aar = (SELECT currentyear
FROM SemesterInfo
WHERE SemName = 'Autumn')
and shorten it to
WHERE s.aar =19
I get the result of the view very quickly. But I am trying to make this query dynamic by selecting that constant from a settings table.
Any thoughts on this? Why does the query take an indefinitely long time with the inline WHERE subquery?
Try with IN instead of =:
WHERE
s.aar in (SELECT currentyear
FROM SemesterInfo
WHERE SemName = 'Autumn')
Or rewrite the subquery as a join:
INNER JOIN SemesterInfo si
ON s.aar = si.currentYear
WHERE si.SemName = 'Autumn'
If that doesn't do it, consider keeping this syntax and creating an index on SemName.

How to do multiple selects using PostgreSQL

I have two tables, destination and weather_forecast, and I am getting the latest weather_forecast (ordered by reference_time) like this:
SELECT destination_id, reference_time FROM weather_forecast
WHERE destination_id = (SELECT id FROM destination WHERE slug = 'prague')
AND reference_time < now()
ORDER BY reference_time DESC
LIMIT 1;
This works for the slug prague (the city of Prague).
I need to do this query for a thousand cities, and it is definitely not optimal to call it in a loop:
const SLUG_LIST = ['prague', 'new-york', .... next 1000 items]
const weather = db.select...
Is there a better way to do this, perhaps a single select based on a list of items from an array?
Thank you!
You can use ROW_NUMBER() to rank weather forecasts by descending reference_time for each destination, and then filter on the most recent forecast:
SELECT *
FROM (
SELECT
d.slug,
w.destination_id,
w.reference_time,
ROW_NUMBER() OVER(PARTITION BY w.destination_id ORDER BY w.reference_time DESC) rn
FROM weather_forecast w
INNER JOIN destination d ON d.id = w.destination_id
WHERE w.reference_time < now()
) x
WHERE rn = 1

Entity Framework: parametrize List<long>

We are using EF 6.2 for .NET. We are trying the following query:
dbContext.memberScales
.Where(s => members.Contains(s.idMember))
.OrderByDescending(s => s.dateScale)
.Select(s => new MemberScaleBasicFields
{
id = s.idMember,
dateScale = s.dateScale,
weight = s.weight,
massMuscle = s.massMuscle,
massFat = s.massFat,
massBone = s.massBone,
imc = s.imc,
water = s.water,
dailyCalories = s.dailyCalories,
tmd = s.tmd,
physicalValuation = s.physicalValuation,
adiposity = s.adiposity,
assessment = s.assessment,
ageMetabolica = s.ageMetabolica
})
.ToList();
In the Where clause, we have to filter by a list of longs:
members.Contains(s.idMember)
When we look at the query text, we get a query with hardcoded values:
SELECT
[Extent1].[ValoracionFisica] AS [ValoracionFisica],
[Extent1].[IdSocio] AS [IdSocio],
[Extent1].[Fecha] AS [Fecha],
[Extent1].[Peso] AS [Peso],
[Extent1].[MasaMagra] AS [MasaMagra],
[Extent1].[MasaGrasa] AS [MasaGrasa],
[Extent1].[MasaOsea] AS [MasaOsea],
[Extent1].[IMC] AS [IMC],
[Extent1].[Agua] AS [Agua],
[Extent1].[CaloriasDiarias] AS [CaloriasDiarias],
[Extent1].[TMB] AS [TMB],
[Extent1].[Adiposidad] AS [Adiposidad],
[Extent1].[Valoracion] AS [Valoracion],
[Extent1].[EdadMetabolica] AS [EdadMetabolica]
FROM [dbo].[Socios_Bascula] AS [Extent1]
WHERE [Extent1].[IdSocio] IN (cast(1225789 as bigint), cast(1228549 as bigint), cast(1228557 as bigint), cast(1230732 as bigint)....
We want to know how to make this query parametrized, or whether there is an alternative that does not discard the cached plan. For example:
[Extent1].[IdSocio] IN (@value1, @value2...)
If we instead try
members.Any(x => x == s.idMember)
we get the following text:
SELECT
[Extent1].[ValoracionFisica] AS [ValoracionFisica],
[Extent1].[IdSocio] AS [IdSocio],
[Extent1].[Fecha] AS [Fecha],
[Extent1].[Peso] AS [Peso],
[Extent1].[MasaMagra] AS [MasaMagra],
[Extent1].[MasaGrasa] AS [MasaGrasa],
[Extent1].[MasaOsea] AS [MasaOsea],
[Extent1].[IMC] AS [IMC],
[Extent1].[Agua] AS [Agua],
[Extent1].[CaloriasDiarias] AS [CaloriasDiarias],
[Extent1].[TMB] AS [TMB],
[Extent1].[Adiposidad] AS [Adiposidad],
[Extent1].[Valoracion] AS [Valoracion],
[Extent1].[EdadMetabolica] AS [EdadMetabolica]
FROM [dbo].[Socios_Bascula] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM (SELECT
cast(1225789 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
cast(1228549 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]....
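Not part of the original post, but one common way to keep the SQL text stable with EF6 is to pass the ids as a table-valued parameter through a raw SQL query. Everything named below (the dbo.BigIntList table type, the ScaleRow result type, the column aliases) is hypothetical and only meant as a sketch:
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Entity;
using System.Data.SqlClient;
using System.Linq;

// Hypothetical minimal result type for this sketch; each column alias in the
// SQL below must match a property name so EF can materialize the rows.
public class ScaleRow
{
    public long id { get; set; }
    public DateTime dateScale { get; set; }
    public decimal weight { get; set; }
}

public static class ScaleQueries
{
    // Assumes this table type was created once in the database (name is made up):
    //   CREATE TYPE dbo.BigIntList AS TABLE (Value BIGINT NOT NULL);
    public static List<ScaleRow> GetScales(DbContext dbContext, IEnumerable<long> members)
    {
        // Copy the ids into a DataTable shaped like the table type.
        var idTable = new DataTable();
        idTable.Columns.Add("Value", typeof(long));
        foreach (var id in members)
            idTable.Rows.Add(id);

        var idsParam = new SqlParameter("@ids", SqlDbType.Structured)
        {
            TypeName = "dbo.BigIntList",
            Value = idTable
        };

        // The SQL text never changes, so SQL Server can reuse a single cached plan.
        return dbContext.Database.SqlQuery<ScaleRow>(
            @"SELECT s.IdSocio AS id, s.Fecha AS dateScale, s.Peso AS weight
              FROM dbo.Socios_Bascula AS s
              INNER JOIN @ids AS i ON i.Value = s.IdSocio
              ORDER BY s.Fecha DESC",
            idsParam).ToList();
    }
}
The trade-off is that the query is hand-written SQL rather than LINQ, but the query text and the cached plan stay constant no matter how many ids are passed.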

LINQ (to Oracle) - Row_Number() Over Partition By

This is a possible duplicate of other Partition By + Rank questions but I found most of those questions/answers to be too specific to their particular business logic. What I'm looking for is a more general LINQ version of the following type of query:
SELECT id,
field1,
field2,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY field1 desc) ROWNUM
FROM someTable;
A very common thing we do with this is to wrap it in something like this:
SELECT id,
field1,
field2
FROM (SELECT id,
field1,
field2,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY field1 desc) ROWNUM
FROM someTable)
WHERE ROWNUM = 1;
This returns the row containing the highest value in field1 for each id. Changing the order by to asc would of course return the lowest value, and changing the rank to 2 would get the second highest/lowest value, and so on. Is there a way to write a LINQ query that can be executed server-side and gives the same sort of functionality? Ideally, one that is as performant as the above.
Edit:
I've tried numerous different solutions after scouring the web and they all end up giving me the same problem that Reed's answer below does because the SQL generated includes an APPLY.
A couple examples I tried:
from p in db.someTable
group p by p.id into g
let mostRecent = g.OrderByDescending(o => o.field1).FirstOrDefault()
select new {
g.Key,
mostRecent
};
db.someTable
.GroupBy(g => g.id, (a, b) => b.OrderByDescending(o => o.field1).Take(1))
.SelectMany(m => m);
Both of these result in very similar, if not identical, SQL code which uses an OUTER APPLY that Oracle does not support.
You should be able to do something like:
var results = someTable
.GroupBy(row => row.id)
.Select(group => group.OrderByDescending(r => r.field1).First());
If you wanted the third highest value, you could do something like:
var results = someTable
.GroupBy(row => row.id)
.Select(group => group.OrderByDescending(r => r.field1).Skip(2).FirstOrDefault())
.Where(r => r != null); // Remove the groups that don't have 3 items
An alternative way is to use a subquery which separately gets the maximum field1 for each id:
SELECT a.*
FROM someTable a
INNER JOIN
(
SELECT id, max(field1) max_field
FROM sometable
GROUP BY id
) b ON a.id = b.ID AND
a.field1 = b.max_field
When converted to LINQ:
from a in someTable
join b in
(
from o in someTable
group o by new {o.ID} into g
select new
{
g.Key.ID,
max_field = g.Max(p => p.field1)
}
) on new {a.ID, a.field1} equals new {b.ID, field1 = b.max_field}
select a
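For completeness, here is the same join-to-max approach in method syntax (a sketch only; it assumes the same someTable with id and field1 columns as above, and depending on the provider it should translate to a GROUP BY plus a join rather than an APPLY):
var maxPerId = someTable
    .GroupBy(o => o.id)
    .Select(g => new { id = g.Key, max_field = g.Max(p => p.field1) });

// Join back to the table on (id, max field1) to pick the top row per id.
var results = someTable
    .Join(maxPerId,
          a => new { a.id, a.field1 },
          b => new { b.id, field1 = b.max_field },
          (a, b) => a);
Note that, as with the SQL version, ties on the maximum field1 value return more than one row per id.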

paging over SELECT UNION super slow and killing my server

I have an SP that returns paged data from a query that contains a UNION. This is killing my DB and sometimes takes 30 seconds to run. Am I missing something obvious here? What can I do to improve its performance?
Tables Involved: Products, Categories, CategoryProducts
Goal:
Take any Products that are not in a Category or have been deleted from a category, UNION them with all Products currently in a category, and page over the result for a web service.
I have Indexes on all columns that I am joining on and there are 427,996 Products, 6148 Categories and 409,691 CategoryProducts in the database.
Here is my query, which takes between 6 and 30 seconds to run:
SELECT * FROM (
SELECT ROW_NUMBER() OVER(ORDER BY Products.ItemID, Products.ManufacturerID) AS RowNum, *
FROM
(
SELECT Products.*,
CategoryID = NULL, CategoryName = NULL,
CategoryProductID = NULL,
ContainerMinimumQuantity =
CASE COALESCE(Products.ContainerMinQty, 0)
WHEN 0 THEN Products.OrderMinimumQuantity
ELSE Products.ContainerMinQty
END,
Products.IsDeleted,
SortOrder = NULL
FROM CategoryProducts RIGHT OUTER JOIN Products
ON CategoryProducts.ManufacturerID = Products.ManufacturerID
AND CategoryProducts.ItemID = Products.ItemID
WHERE (Products.ManufacturerID = @ManufacturerID)
AND (Products.ModifiedOn > @tStamp)
AND ((CategoryProducts.IsDeleted = 1) OR (CategoryProducts.IsDeleted IS NULL))
UNION
SELECT Products.*,
CategoryProducts.CategoryID , CategoryProducts.CategoryName,
CategoryProducts.CategoryProductID ,
ContainerMinimumQuantity =
CASE COALESCE(Products.ContainerMinQty, 0)
WHEN 0 THEN Products.OrderMinimumQuantity
ELSE Products.ContainerMinQty
END,
CategoryProducts.IsDeleted,
CategoryProducts.SortOrder
FROM Categories INNER JOIN
CategoryProducts ON Categories.CategoryID = CategoryProducts.CategoryID INNER JOIN
Products ON CategoryProducts.ManufacturerID = Products.ManufacturerID
AND CategoryProducts.ItemID = Products.ItemID
WHERE (Products.ManufacturerID = @ManufacturerID)
AND (Products.ModifiedOn > @tStamp OR CategoryProducts.ModifiedOn > @tStamp))
AS Products) AS C
WHERE RowNum >= @StartRow AND RowNum <= @EndRow
Any insight would be greatly appreciated.
If I read your situation correctly, the only reason for having two distinct queries is the treatment of missing/deleted CategoryProducts. I tried to address this with a LEFT JOIN on IsDeleted = 0, so that all deleted CategoryProducts come through as NULLs and don't have to be tested again. The ModifiedOn part gets an additional NULL test so that the missing/deleted CategoryProducts you wish to retrieve are still included.
select *
from (
SELECT
Products.*,
-- Following three columns will be null for deleted/missing categories
CategoryProducts.CategoryID,
CategoryProducts.CategoryName,
CategoryProducts.CategoryProductID ,
ContainerMinimumQuantity = COALESCE(nullif(Products.ContainerMinQty, 0),
Products.OrderMinimumQuantity),
CategoryProducts.IsDeleted,
CategoryProducts.SortOrder,
ROW_NUMBER() OVER(ORDER BY Products.ItemID,
Products.ManufacturerID) AS RowNum
FROM Products
LEFT JOIN CategoryProducts
ON CategoryProducts.ManufacturerID = Products.ManufacturerID
AND CategoryProducts.ItemID = Products.ItemID
-- Filter IsDeleted in join so we get nulls for deleted categories
-- And treat them the same as missing ones
AND CategoryProducts.IsDeleted = 0
LEFT JOIN Categories
ON Categories.CategoryID = CategoryProducts.CategoryID
WHERE Products.ManufacturerID = @ManufacturerID
AND (Products.ModifiedOn > @tStamp
-- Deleted/missing categories
OR CategoryProducts.ModifiedOn is null
OR CategoryProducts.ModifiedOn > @tStamp)
) C
WHERE RowNum >= @StartRow AND RowNum <= @EndRow
On a third look, I don't see that Categories is used at all except as a filter on CategoryProducts. If that is the case, the second LEFT JOIN should be changed to an INNER JOIN and this section should be enclosed in parentheses.
