When I include multiple navigation properties in EF, it generates UNION ALL SQL queries to combine them. If I have 3 properties to include, the query runs almost 3 times slower.
If I build the same query with 3 left joins, it performs much better...
Is it possible to have LINQ generate left joins instead of unions?
If that is not possible, what is the reason for using UNION ALL, which seems slow?
First of all, you can perform left joins in LINQ to Entities using DefaultIfEmpty.
Second of all, if your navigation properties are 1:N (rather than 1:1 or 1:0..1), then using 3 joins will severely increase the volume of data returned from the database, which will reduce performance. In that case you get better performance by running one query to retrieve the main entities, then 3 additional queries filtered by the IDs from the first query, batched into a single DB call using the .Future() extension.
E.g.
// Materialize the main entities first.
var entities = context.Entities.AsQueryable().Where(...).ToList();
var ids = entities.Select(e => e.Id).ToList();

// Define the three child queries; .Future() defers execution so they
// can be batched into a single DB call.
var subEntities1Query = context.SubEntities1.AsQueryable()
    .Where(se1 => ids.Contains(se1.ParentId)).Future();
var subEntities2Query = context.SubEntities2.AsQueryable()
    .Where(se2 => ids.Contains(se2.ParentId)).Future();
var subEntities3Query = context.SubEntities3.AsQueryable()
    .Where(se3 => ids.Contains(se3.ParentId)).Future();

// The first enumeration executes all three futures in one round trip.
var subEntities1 = subEntities1Query.ToList();
var subEntities2 = subEntities2Query.ToList();
var subEntities3 = subEntities3Query.ToList();

// Stitch the children onto their parents in memory.
foreach (var entity in entities)
{
    entity.SubEntities1 = subEntities1.Where(se1 => se1.ParentId == entity.Id).ToList();
    entity.SubEntities2 = subEntities2.Where(se2 => se2.ParentId == entity.Id).ToList();
    entity.SubEntities3 = subEntities3.Where(se3 => se3.ParentId == entity.Id).ToList();
}
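If the entity count is large, the repeated Where scans inside that loop are O(N×M). A small sketch (same names as above) that does the stitching with ToLookup instead, so each parent match is a hash lookup:

var subEntities1Lookup = subEntities1.ToLookup(se => se.ParentId);
var subEntities2Lookup = subEntities2.ToLookup(se => se.ParentId);
var subEntities3Lookup = subEntities3.ToLookup(se => se.ParentId);

foreach (var entity in entities)
{
    // An absent key yields an empty sequence, so no null checks are needed.
    entity.SubEntities1 = subEntities1Lookup[entity.Id].ToList();
    entity.SubEntities2 = subEntities2Lookup[entity.Id].ToList();
    entity.SubEntities3 = subEntities3Lookup[entity.Id].ToList();
}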
I'm trying to achieve a query similar to this:
SELECT r.*, (SELECT COUNT(UserID) FROM RoleUsers ru WHERE ru.RoleId = r.Id) AS Assignments
FROM Roles r
to retrieve the number of users per role.
The simplest and most straightforward attempt to get the desired output:
this.DbContext.Set<Role>().Include(x => x.RoleUser)
.Select(x => new { x, Assignments = x.RoleUsers.Count() });
retrieves all the roles and then issues N additional queries to retrieve the counts:
SELECT COUNT(*)
FROM [dbo].[RoleUsers] AS [r0]
WHERE @_outer_Id = [r0].[RoleId]
This is not an option at all. I also tried GroupJoin, but it loads the whole required data set in one query and performs the grouping in memory:
this.DbContext.Set<Role>().GroupJoin(this.DbContext.Set<RoleUser>(), role => role.Id,
roleUser => roleUser.RoleId, (role, roleUser) => new
{
Role = role,
Assignments = roleUser.Count()
});
Generated query:
SELECT [role].[Id], [role].[CustomerId], [role].[CreateDate], [role].[Description], [role].[Mask], [role].[ModifyDate], [role].[Name], [assignment].[UserId], [assignment].[CustomerId], [assignment].[RoleId]
FROM [dbo].[Roles] AS [role]
LEFT JOIN [dbo].[RoleUser] AS [assignment] ON [role].[Id] = [assignment].[RoleId]
ORDER BY [role].[Id]
I also looked into window functions, where I could partition the count by RoleId and select the distinct roles, but I have no idea how to wire up a window function in EF:
SELECT DISTINCT r.*, COUNT(ru.UserID) OVER(PARTITION BY ru.RoleId)
FROM RoleUsers ru
RIGHT JOIN Roles r ON r.Id = ru.RoleId
So, is there any way to avoid EntitySQL?
Currently there is a defect in EF Core's translation of query aggregates to SQL when the query projection contains a whole entity, like
.Select(role => new { Role = role, ... })
The only workaround I'm aware of is to project to a new entity (at least this is supported by EF Core), like:
var query = this.DbContext.Set<Role>()
.Select(role => new
{
Role = new Role { Id = role.Id, Name = role.Name, /* all other Role properties */ },
Assignments = role.RoleUsers.Count()
});
This translates to single SQL query. The drawback is that you have to manually project all entity properties.
this.DbContext.Set<Role>()
.Select(x => new { x, Assignments = x.RoleUsers.Count() });
You don't need to add an Include for RoleUser, since you are using a Select statement. Furthermore, I guess that you are using lazy loading, where this is the expected behavior. If you use eager loading, your LINQ will run as one query.
You can use context.Configuration.LazyLoadingEnabled = false; before your LINQ query to disable lazy loading specifically for this operation.
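A minimal sketch of that combination (assuming the EF6-style context.Configuration API referenced above):

// Disable lazy loading for this context instance only.
context.Configuration.LazyLoadingEnabled = false;

// Without Include: the Select projection translates to a single query.
var rolesWithCounts = context.Set<Role>()
    .Select(x => new { Role = x, Assignments = x.RoleUsers.Count() })
    .ToList();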
I have a program that downloads data from a server database to a client database. The server database has been growing recently.
In that program, there is an option to download all data OR download data for a specific time period (the user can select how many days back from today). If the user selects all, the program truncates the client database table and inserts all data using bulk copy. That part is OK.
The problem is when the user selects a specific time period (each record has a creation date-time): the program has to compare the two tables and divide the server records into two sets, records that already exist in the client database and records that do not.
The non-existing records are inserted directly into the client DB (I'm using bulk insert), and the existing records are inserted into a temporary table using bulk copy, which is then used to update the client's table. My actual problem occurs when dividing the server's table. This is how I did it:
updateTable = (From c In dt_from_server.AsEnumerable()
Join o In Dt_from_client.AsEnumerable()
On c.Field(Of String)("BARCODE").Trim() Equals o.Field(Of String)("BARCODE").Trim()
And c.Field(Of String)("ITEM_CODE").Trim() Equals o.Field(Of String)("ITEM_CODE").Trim()
Select c).CopyToDataTable()
insertTable = dt_server.AsEnumerable()
.Except(updateTable.AsEnumerable(), DataRowComparer.Default)
.CopyToDataTable()
(normally there are over 1M records in the server table)
When there are over 1 million records, the update part takes an acceptable time, around 10 minutes (yes, it takes 5GB of RAM, but in this case that's OK considering the performance).
But the insert part seems to take days, just to assign the insertTable (DataTable). This is the issue.
The AsEnumerable().Except() part takes a long time and I couldn't find a way to speed this process up. I'm not sure I explained this correctly. Could anyone give me some advice on this?
Since you have commented that dt_from_server and dt_server are actually the same DataTable, you don't need to compare all values of all DataRows with each other, which is what DataRowComparer.Default does. You can use Except without the second comparer parameter; then only references are compared, which is much faster.
You also don't need the two CopyToDataTable calls, which create two additional big DataTables in memory; process the rows one after the other instead.
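A minimal C# sketch of that idea (keeping the matched rows as references into dt_from_server instead of copying them with CopyToDataTable):

// Select the server rows themselves, so they are references, not copies.
var matchedRows = (from c in dt_from_server.AsEnumerable()
                   join o in Dt_from_client.AsEnumerable()
                       on new { BarCode = c.Field<string>("BARCODE").Trim(),
                                ItemCode = c.Field<string>("ITEM_CODE").Trim() }
                       equals new { BarCode = o.Field<string>("BARCODE").Trim(),
                                    ItemCode = o.Field<string>("ITEM_CODE").Trim() }
                   select c).ToList();

// With no comparer, Except compares DataRow references only, which works
// here because matchedRows point back into dt_from_server.
var insertRows = dt_from_server.AsEnumerable().Except(matchedRows).ToList();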
Here is a different approach using LINQ's left outer join, which is more efficient:
Dim query = From rServ In dt_from_server.AsEnumerable()
            Group Join rClient In Dt_from_client.AsEnumerable()
            On New With {
                Key .BarCode = rServ.Field(Of String)("BARCODE").Trim(),
                Key .ItemCode = rServ.Field(Of String)("ITEM_CODE").Trim()
            } Equals New With {
                Key .BarCode = rClient.Field(Of String)("BARCODE").Trim(),
                Key .ItemCode = rClient.Field(Of String)("ITEM_CODE").Trim()
            } Into Group
            From client In Group.DefaultIfEmpty()
            Select New With {.ServerRow = rServ, .InsertRow = client Is Nothing}

Dim insertOrUpdateRows = query.ToLookup(Function(x) x.InsertRow, Function(x) x.ServerRow)
Dim insertRows = insertOrUpdateRows(True).CopyToDataTable() ' CopyToDataTable redundant if you process rows immediately now
Dim updateRows = insertOrUpdateRows(False).CopyToDataTable() ' CopyToDataTable redundant if you process rows immediately now
But in general, the most scalable and efficient approach would be not to load everything into memory at once and then process it all, but to use database paging (or a stored procedure) to process only parts of it in memory at a time; otherwise it's likely that you will encounter an OutOfMemoryException sooner or later.
C# as requested:
var query = from rServ in dt_from_server.AsEnumerable()
join rClient in Dt_from_client.AsEnumerable()
on new { BarCode = rServ.Field<string>("BARCODE").Trim(), ItemCode = rServ.Field<string>("ITEM_CODE").Trim() }
equals new { BarCode = rClient.Field<string>("BARCODE").Trim(), ItemCode = rClient.Field<string>("ITEM_CODE").Trim() }
into clientGroup
from client in clientGroup.DefaultIfEmpty()
select new { ServerRow = rServ, InsertRow = client == null };
var insertOrUpdateRows = query.ToLookup(x => x.InsertRow, x => x.ServerRow);
var insertRows = insertOrUpdateRows[true].CopyToDataTable(); // CopyToDataTable redundant if you process rows immediately now
var updateRows = insertOrUpdateRows[false].CopyToDataTable(); // CopyToDataTable redundant if you process rows immediately now
Currently we have a page where we need pagination, so for that I need two pieces of information:
1. Get the total number of rows
2. Fetch 'N' of those rows
Currently I am doing it with 2 queries; for step 1, something like:
count = db.Transactions
    .AsNoTracking()
    .Where(whereClause)
    .Count();
And then
db.Transactions
    .AsNoTracking()
    .Where(whereClause)
    .Skip(skipRows)
    .Take(pagesize)
    .ToList();
Is there any way to optimize this?
You can try using Local Data:
// Load all Transactions matching the filter into the context.
// Entities must be tracked to appear in Local, so don't use AsNoTracking here.
db.Transactions.Where(whereClause).Load();
// Get count
var transactionsCount = db.Transactions.Local.Count;
// Paging
var pagedTransactions = db.Transactions.Local.Skip(skipRows).Take(pageSize).ToList();
This should result in only one query being fired against the database, in the initial Load() call.
This will return an IQueryable that you can run further queries against; each one will be executed against the db. This way you don't have to rewrite the base query each time you want to query something.
var query = (from t in db.Transactions
where whereClause
select t);
var count = query.Count();
var items = query
.Skip(skipRows)
.Take(pagesize)
.ToList();
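If you want to avoid the second round trip entirely, one pattern worth trying (a sketch only; how well it translates depends on your EF version, and the OrderBy key is an assumption) folds the count into the page projection so the provider can emit it as a subquery:

var page = query
    .OrderBy(t => t.Id) // Skip/Take needs a stable ordering; Id is an assumed key
    .Select(t => new { Item = t, Total = query.Count() })
    .Skip(skipRows)
    .Take(pagesize)
    .ToList();

var count = page.Count > 0 ? page[0].Total : query.Count(); // fall back if the page is empty
var items = page.Select(x => x.Item).ToList();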
I currently have an issue where thousands of plans are created on the DB server for one style of EF query.
The query itself is parameterized, but the parameter name keeps changing, hence the query text is different each time, and so each query hitting my DB server results in a fresh compile.
So the query looks like this.
(@p__linq__100687 int)SELECT [Extent1].[MyColumn] FROM MyTable [Extent1]
WHERE Column1 = @p__linq__100687
and the next one looks like this
(@p__linq__100688 int)SELECT [Extent1].[MyColumn] FROM MyTable [Extent1]
WHERE Column1 = @p__linq__100688
What I would like EF to do is
(@p__linq__1 int)SELECT [Extent1].[MyColumn] FROM MyTable [Extent1]
WHERE Column1 = @p__linq__1
and then have it keep reusing the query above, instead of incrementing the parameter name and being forced to create a new plan each time.
So when I trawl through the plan cache on the DB, I find a total of 7GB of plans that have each been used only once.
I am a DBA and need to figure out what to tell the vendor devs, since they are adamant that this is the correct way to implement EF as per MS.
I have searched Google and asked a couple of dev friends about how the code behind this might be constructed, but the answer still eludes me.
I just did some tests using VS 2017 and EF 6 and had no problem with this, even when trying some deliberately bad C# code.
I tried this:
AdventureWorks2016Entities ctx = new AdventureWorks2016Entities();
for (var i = 1; i <= 50; i++)
{
var query = (from x in ctx.bigProduct.AsNoTracking()
where x.ReorderPoint == i
select x);
var result = query.ToList();
}
and this :
for (var i = 1; i <= 50; i++)
{
AdventureWorks2016Entities ctx = new AdventureWorks2016Entities();
var query = (from x in ctx.bigProduct.AsNoTracking()
where x.ReorderPoint == i
select x);
var result = query.ToList();
}
The first code block re-creates the query in each iteration, and the second code block also re-creates the context in each iteration. Neither of these options produced the plan cache bloat you are suffering from.
What EF version is your application using?
However, I did find this problem with the decimal type in EF. I just posted a question about it; you can find it here: How to avoid plan cache bloat using queries in entity framework
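For illustration, a hedged sketch of that decimal behavior (the Weight column and the exact generated text are assumptions; the point is that EF 6 infers the parameter's precision/scale from each value, which changes the cached query text):

// Two logically identical queries can produce two plan cache entries,
// e.g. one parameter declared decimal(2,1) and one decimal(3,2).
decimal weight = 1.5m;
var r1 = ctx.bigProduct.Where(x => x.Weight == weight).ToList();

weight = 1.25m; // different scale => different inferred parameter type
var r2 = ctx.bigProduct.Where(x => x.Weight == weight).ToList();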
I need to convert this SQL query to a LINQ query, and I also need to expose the SQL Select properties:
SELECT Problem.ProblemID, ProblemFactory.ObjectiveID, Objective.Name, ProblemFactory.Time, ProblemType.ProblemTypeName, ProblemFactory.OperationID,
ProblemFactory.Range1ID, ProblemFactory.Range2ID, ProblemFactory.Range3ID, ProblemFactory.Range4ID,
ProblemFactory.MissingNumber
FROM Problem INNER JOIN ProblemFactory ON Problem.ProblemFactoryID = ProblemFactory.ProblemFactoryID
INNER JOIN ProblemType ON ProblemFactory.ProblemTypeID = ProblemType.ProblemTypeID
INNER JOIN Objective ON Objective.ObjectiveID = ProblemFactory.ObjectiveID
UPDATE 1:
This is what I have:
var query = from problem in dc.Problem2s
from factory
in dc.ProblemFactories
.Where(v => v.ProblemFactoryID == problem.ProblemFactoryID)
.DefaultIfEmpty()
from ...
And I'm using this example: What is the syntax for an inner join in LINQ to SQL?
Something like this?
var query =
from p in ctx.Problem
join pf in ctx.ProblemFactory on p.ProblemFactoryID equals pf.ProblemFactoryID
join pt in ctx.ProblemType on pf.ProblemTypeID equals pt.ProblemTypeID
join o in ctx.Objective on pf.ObjectiveID equals o.ObjectiveID
select new
{
p.ProblemID,
pf.ObjectiveID,
o.Name,
pf.Time,
pt.ProblemTypeName,
pf.OperationID,
pf.Range1ID,
pf.Range2ID,
pf.Range3ID,
pf.Range4ID,
pf.MissingNumber,
};
But what do you mean by the "SQL Select properties"?
One of the benefits of an ORM like Linq-to-SQL is that we don't have to flatten our data to retrieve it from the database. If you map your objects in the designer (i.e. if you have their relationships mapped), you should be able to retrieve just the Problems and then get their associated properties as required...
var problems = from problem in dc.Problem2s select problem;
foreach (var problem in problems)
{
// you can work with the problem, its objective, and its problem type.
problem.DoThings();
var objective = problem.Objective;
var type = problem.ProblemType;
}
Thus you retain a logical data structure in your data layer, rather than anonymous types that can't easily be passed around.
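That said, if the flattened shape does need to travel beyond the data layer, a named DTO (the ProblemSummary type below is hypothetical) gives you the same projection without the anonymous-type limitation:

// Hypothetical DTO for the flattened projection; the remaining columns
// from the original SELECT follow the same pattern.
public class ProblemSummary
{
    public int ProblemID { get; set; }
    public int ObjectiveID { get; set; }
    public string Name { get; set; }
    public string ProblemTypeName { get; set; }
}

var summaries =
    from p in ctx.Problem
    join pf in ctx.ProblemFactory on p.ProblemFactoryID equals pf.ProblemFactoryID
    join pt in ctx.ProblemType on pf.ProblemTypeID equals pt.ProblemTypeID
    join o in ctx.Objective on pf.ObjectiveID equals o.ObjectiveID
    select new ProblemSummary
    {
        ProblemID = p.ProblemID,
        ObjectiveID = pf.ObjectiveID,
        Name = o.Name,
        ProblemTypeName = pt.ProblemTypeName
    };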