I currently have an issue where thousands of plans are created on the DB for one style of EF query.
The query itself is parameterized, but the parameter name keeps changing, so the query text is different each time, which results in a fresh compile for every query that hits my DB server.
So the query looks like this.
(@p__linq__100687 int)SELECT [Extent1].[MyColumn] From MyTable [Extent1]
Where Column1 = @p__linq__100687
and the next one looks like this
(@p__linq__100688 int)SELECT [Extent1].[MyColumn] From MyTable [Extent1]
Where Column1 = @p__linq__100688
What I would like EF to do is
(@p__linq__1 int)SELECT [Extent1].[MyColumn] From MyTable [Extent1]
Where Column1 = @p__linq__1
And then keep reusing the query above instead of incrementing the parameter name and being forced to create a new plan.
So when I trawl through the plan cache on the DB, I find a total of 7 GB of plans which have each been used only once.
I am a DBA and need to figure out what to tell the vendor devs, since they are adamant that this is the correct way to implement EF as per Microsoft.
I have searched Google and asked a couple of dev friends about how the code is constructed in the background, but the answer still eludes me.
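For reference, the single-use bloat can be measured with something like the following (a minimal sketch against the SQL Server plan cache DMVs, driven from the client side; the connection string is a placeholder and grouping by objtype is just one way to slice it):

using System;
using System.Data.SqlClient;

class PlanCacheCheck
{
    static void Main()
    {
        const string sql = @"
            SELECT objtype,
                   COUNT(*) AS single_use_plans,
                   SUM(CAST(size_in_bytes AS bigint)) / 1048576 AS single_use_mb
            FROM sys.dm_exec_cached_plans
            WHERE usecounts = 1
            GROUP BY objtype
            ORDER BY single_use_mb DESC;";

        // Placeholder connection string -- point it at the affected server (needs VIEW SERVER STATE).
        using (var conn = new SqlConnection("Data Source=.;Initial Catalog=master;Integrated Security=true;"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}: {1} plans, {2} MB",
                        reader.GetString(0), reader.GetInt32(1), reader.GetInt64(2));
                }
            }
        }
    }
}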
I just did some tests using VS 2017 and EF 6 and had no problem with this, even when trying some bad C# code.
I tried this:
AdventureWorks2016Entities ctx = new AdventureWorks2016Entities();

for (var i = 1; i <= 50; i++)
{
    var query = (from x in ctx.bigProduct.AsNoTracking()
                 where x.ReorderPoint == i
                 select x);
    var result = query.ToList();
}
and this :
for (var i = 1; i <= 50; i++)
{
    AdventureWorks2016Entities ctx = new AdventureWorks2016Entities();
    var query = (from x in ctx.bigProduct.AsNoTracking()
                 where x.ReorderPoint == i
                 select x);
    var result = query.ToList();
}
The first code block re-creates the query in each iteration, and the second code block also re-creates the context in each iteration. Neither option produced the plan cache bloat you are suffering from.
Which EF version is your application using?
However, I did find this problem with the decimal type in EF. I just posted a question about it; you can find it here: How to avoid plan cache bloat using queries in entity framework
I have a program that downloads data from a server database to a client database. The server database has been growing recently.
In that program there is an option to download either all data or only data for a specific time period (the user can select how many days back from today). If the user selects all, the program truncates the client database table and inserts all data using bulk copy. That part is OK.
The problem is when the user selects a specific time period (each record has a creation date-time): the program has to compare the two tables and split the server's records into two sets, records that already exist on the client and records that do not. What I'm going to do is:
insert the non-existing records directly into the client DB (I'm using bulk insert), and bulk-copy the existing records into a temporary table, then update the client's table from that temporary table. My actual problem occurs when splitting the server's table. This is how I did it:
updateTable = (From c In dt_from_server.AsEnumerable()
               Join o In Dt_from_client.AsEnumerable()
                   On c.Field(Of String)("BARCODE").Trim() Equals o.Field(Of String)("BARCODE").Trim()
                   And c.Field(Of String)("ITEM_CODE").Trim() Equals o.Field(Of String)("ITEM_CODE").Trim()
               Select c).CopyToDataTable()

insertTable = dt_server.AsEnumerable()
                       .Except(updateTable.AsEnumerable(), DataRowComparer.Default)
                       .CopyToDataTable()
(Normally there are over 1M records in the server table.)
When there are over 1 million records, the update part takes an acceptable time, around 10 minutes (yes, it takes 5 GB of RAM, but in this case that is OK considering the performance).
But the insert part seems to take days, just to assign insertTable (a DataTable). That is the issue.
The AsEnumerable().Except() part takes a long time and I couldn't find a way to speed this process up. I'm not sure I explained this correctly. Could anyone give me some advice on this?
Since you have commented that dt_from_server and dt_server are actually the same DataTable, you don't need to compare all values of all DataRows with each other, which is what DataRowComparer.Default does. You can use Except without the second comparer parameter; then only references are compared, which is much faster.
You also don't need the two CopyToDataTable calls, which create two additional big DataTables in memory; process the rows one after the other.
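A minimal C# sketch of that first suggestion, reusing the names from the question (the join result is kept as plain DataRow references, without CopyToDataTable, so Except can match rows of dt_from_server by reference):

var updateRows = from c in dt_from_server.AsEnumerable()
                 join o in Dt_from_client.AsEnumerable()
                     on new { BarCode = c.Field<string>("BARCODE").Trim(), ItemCode = c.Field<string>("ITEM_CODE").Trim() }
                     equals new { BarCode = o.Field<string>("BARCODE").Trim(), ItemCode = o.Field<string>("ITEM_CODE").Trim() }
                 select c;

// Reference comparison is enough here because updateRows yields the original rows of dt_from_server.
var insertRows = dt_from_server.AsEnumerable().Except(updateRows);

foreach (DataRow row in insertRows)
{
    // stream each missing row into your bulk-copy staging step instead of building another DataTable
}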
Here is a different approach using LINQ's left outer join, which is more efficient:
Dim query = From rServ In dt_from_server.AsEnumerable()
            Group Join rClient In Dt_from_client.AsEnumerable()
                On New With {
                    Key .BarCode = rServ.Field(Of String)("BARCODE").Trim(),
                    Key .ItemCode = rServ.Field(Of String)("ITEM_CODE").Trim()
                } Equals New With {
                    Key .BarCode = rClient.Field(Of String)("BARCODE").Trim(),
                    Key .ItemCode = rClient.Field(Of String)("ITEM_CODE").Trim()
                } Into Group
            From client In Group.DefaultIfEmpty()
            Select New With {.ServerRow = rServ, .InsertRow = client Is Nothing}

Dim insertOrUpdateRows = query.ToLookup(Function(x) x.InsertRow, Function(x) x.ServerRow)
Dim insertRows = insertOrUpdateRows(True).CopyToDataTable()   ' CopyToDataTable is redundant if you process the rows immediately
Dim updateRows = insertOrUpdateRows(False).CopyToDataTable()  ' CopyToDataTable is redundant if you process the rows immediately
But in general the most scalable and efficient approach would be not to load everything into memory at once and then process it all, but to use database paging (or a stored procedure) so that only part of the data is in memory at a time; otherwise it's likely that you will encounter an OutOfMemoryException sooner or later. A rough sketch of that batching idea follows after the C# translation below.
C# as requested:
var query = from rServ in dt_from_server.AsEnumerable()
join rClient in Dt_from_client.AsEnumerable()
on new { BarCode = rServ.Field<string>("BARCODE").Trim(), ItemCode = rServ.Field<string>("ITEM_CODE").Trim() }
equals new { BarCode = rClient.Field<string>("BARCODE").Trim(), ItemCode = rClient.Field<string>("ITEM_CODE").Trim() }
into clientGroup
from client in clientGroup.DefaultIfEmpty()
select new { ServerRow = rServ, InsertRow = client == null };
var insertOrUpdateRows = query.ToLookup(x => x.InsertRow, x => x.ServerRow);
var insertRows = insertOrUpdateRows[true].CopyToDataTable(); // CopyToDataTable redundant if you process rows immediately now
var updateRows = insertOrUpdateRows[false].CopyToDataTable(); // CopyToDataTable redundant if you process rows immediately now
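To illustrate the paging idea mentioned above, here is a rough C# sketch that pulls the server rows in fixed-size, key-ordered batches. The table name, the ever-increasing ID column and the connection string are assumptions; the per-batch compare/insert/update step is whatever you already do for a full load:

using System;
using System.Data;
using System.Data.SqlClient;

static void SyncInBatches(string serverConnectionString)
{
    const int batchSize = 50000;
    long lastId = 0;   // assumes an ever-increasing numeric ID on the server table

    using (var conn = new SqlConnection(serverConnectionString))
    {
        conn.Open();
        while (true)
        {
            var batch = new DataTable();
            using (var cmd = new SqlCommand(
                @"SELECT TOP (@batchSize) ID, BARCODE, ITEM_CODE
                  FROM ServerTable
                  WHERE ID > @lastId
                  ORDER BY ID", conn))
            {
                cmd.Parameters.AddWithValue("@batchSize", batchSize);
                cmd.Parameters.AddWithValue("@lastId", lastId);
                using (var adapter = new SqlDataAdapter(cmd))
                    adapter.Fill(batch);
            }

            if (batch.Rows.Count == 0)
                break;   // no more server rows to pull

            // compare this batch against the client table, bulk-insert the missing rows
            // and update the existing ones, exactly as in the code above

            lastId = Convert.ToInt64(batch.Rows[batch.Rows.Count - 1]["ID"]);
        }
    }
}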
Using SQL Server Management Studio
Using MVC in VS 2013 for Web
Working inside a Controller
Here materialnumb is a LINQ query that always returns only one value.
It is the following:
var materialnumb = (from r in db.MaterialNumber
                    where r.MaterialNumber == 80254842
                    select r.MaterialNumber);
I have another LINQ query from a SQL view that involves several other tables with inner join statements and so on (which includes the previous table db.MaterialNumber) that goes like this:
var query = (from r in db.SQLViewFinalTable
             where r.MaterialNumber == Convert.ToInt32(materialnumb.MaterialNumber)
             select r);
I want to sort all the materials by the material number retrieved from the first query, but it throws the following error when I try to pass the query as a model to my View:
LINQ to Entities does not recognize the method 'Int32
ToInt32(System.String)' method, and this method cannot be translated
into a store expression.
I assume this is because the query is an object, even if it has just one value, so it can't be converted into a single Int32.
What's more, the query is not being executed at that point; it's just a query...
So, how can I achieve my goal?
Additional information: I tried to convert the query outside the "final" query. It still doesn't work.
Additional information: This is just an example; the real query actually has several other queries embedded, and those queries have further queries inside them, so I need a practical way.
Additional information: I have also tried converting the query into a string and then back into an int.
Try this:
var materialnumb = (from r in db.MaterialNumber
                    where r.MaterialNumber == 80254842
                    select r.MaterialNumber).FirstOrDefault();

var query = from r in db.SQLViewFinalTable
            where r.MaterialNumber == materialnumb
            select r;
But I cannot see why you are filtering by 80254842 and then selecting the same value. You can do it directly:
var query = from r in db.SQLViewFinalTable
            where r.MaterialNumber == 80254842
            select r;
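If you would rather not materialize the first query (for example because the filter can match more than one row), a composed query is another option; Contains over an IQueryable translates to a subquery in the generated SQL, and Any(m => m == r.MaterialNumber) is an equivalent form if your EF version complains about Contains. A sketch, assuming the same entity names:

var materialnumb = from r in db.MaterialNumber
                   where r.MaterialNumber == 80254842
                   select r.MaterialNumber;

var query = from r in db.SQLViewFinalTable
            where materialnumb.Contains(r.MaterialNumber)
            select r;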
I'm using dbContext and I am running a SQL query that is rather complex (I'm just showing a simple example below), so to avoid having to run the query twice to get a count, I am using COUNT(1) OVER() AS TotalRecords to return the total number of records on every row, as per other advice on this site.
But, I haven't been able to figure out how to access the resulting property:
using (var db = new DMSContext())
{
    string queryString = "select *, COUNT(1) OVER() AS TotalRecords FROM DMSMetas";
    var Metas = db.DMSMetas.SqlQuery(queryString).ToList();
    for (int i = 0; i <= Metas.Count - 1; i++)
    {
        var Item = Metas[i];
        if (i == 0)
        {
            // Want to do this, but TotalRecords is not part of the DMSMeta class. How to access the created column?
            Console.WriteLine("Total records found: " + Item.TotalRecords);
        }
    }
}
In the sample above, the SQL query generates the extra field TotalRecords. When I run the query in Management Studio, the results are as expected. But how do I access the TotalRecords field through dbContext?
I also tried including the TotalRecords field as part of the DMSMeta class, but then the SQL query fails with the error that the TotalRecords field is specified twice. I tried creating a partial class for DMSMeta containing the TotalRecords field, but then the value remains the default value and is not updated during the query.
I also tried the following:
db.Entry(Item).Property("TotalRecords").CurrentValue
But that generated an error too. Any help would be much appreciated - I am sure I am missing something obvious! All I want is to figure out a way to access the total number of records returned by the query.
You have to create a new class (not an entity class, but a pure DAO class) DMSMetaWithCount (self-explanatory?) and then:
context.Database.SqlQuery<DMSMetaWithCount>("select *, COUNT(1) OVER() AS TotalRecords FROM DMSMetas");
Please note that:
imho, select * is ALWAYS a bad practice.
you will have no change tracking on the new non-entity class.
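For illustration, the DAO class might look roughly like this. The DMSMeta columns shown are placeholders: Database.SqlQuery<T> maps result columns to properties by name, so the class needs a property for every column the query actually returns, plus TotalRecords.

// Plain DTO, not part of the EF model, so the extra computed column can be mapped.
public class DMSMetaWithCount
{
    // Placeholder columns -- replace with the real columns of DMSMetas.
    public int Id { get; set; }
    public string Name { get; set; }

    // Populated from "COUNT(1) OVER() AS TotalRecords".
    public int TotalRecords { get; set; }
}

// Usage, as in the answer above:
// var metas = db.Database.SqlQuery<DMSMetaWithCount>(
//     "select *, COUNT(1) OVER() AS TotalRecords FROM DMSMetas").ToList();
// int total = metas.Count > 0 ? metas[0].TotalRecords : 0;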
I've been researching a solution to a performance problem with a system I'm responsible for, and I think at least part of the problem is due to database query performance. We use stored procedures to query "pages" of data in a pretty standard way. However, this paging appears to be more costly when the datasets get large.
Given this simple table populated with sample data:
create table Data (
    Value uniqueidentifier not null,
    constraint PK_Data primary key clustered (Value)
)

insert into Data
-- SeedTable has ~2M rows
select newid() from SeedTable
And this stored procedure to return paged data: (this requires Sql2012 apparently, though the Sql2008 style of using ROW_NUMBER() behaves the same):
create proc GetDataPage
    @Offset int, @Count int
as
select Value
from Data
order by Value
offset @Offset rows
fetch next @Count rows only
I then test the performance of this sproc with this C# code:
const int PageSize = 50;
const int MaxCount = 50000;

using (var conn = new SqlConnection("Data Source=.;Initial Catalog=TestDB;Integrated Security=true;")) {
    conn.Open();
    int a = 0;
    for (int i = 0; ; i += PageSize) {
        using (var cmd = conn.CreateCommand()) {
            cmd.CommandType = System.Data.CommandType.StoredProcedure;
            cmd.CommandText = "GetDataPage";

            var offset = cmd.CreateParameter();
            offset.Value = i;
            offset.ParameterName = "Offset";
            cmd.Parameters.Add(offset);

            var count = cmd.CreateParameter();
            count.Value = PageSize;
            count.ParameterName = "Count";
            cmd.Parameters.Add(count);

            var sw = Stopwatch.StartNew();
            int c = 0;
            using (var reader = cmd.ExecuteReader()) {
                while (reader.Read()) {
                    c++;
                }
            }
            a += c;
            sw.Stop();
            Console.WriteLine(sw.ElapsedTicks + "\t" + a);
            if (c < PageSize || a >= MaxCount)
                break;
        }
    }
}
When I chart the output of this code I get the following (chart not reproduced here):
I would have expected that paging like this in SQL would have constant time performance, or perhaps logarithmic at worst, but it is pretty clear from the chart that performance is linear.
Are there any special tricks (hints) to make this work better?
Is there another approach to this that might be faster?
Do other databases behave the same way?
Changing the experimental code to use the "page from" technique that Kevin Suchlicki suggests results in the following (chart not reproduced here):
Very impressive. This performance looks much more like what I would expect/want. Now I just need to figure out whether I can apply this to my real problem. The potential issue is that it doesn't allow "random access" to the data, only forward-only, cursor-like access. I'm aware that it must look like what I'm doing violates every notion of good database design.
The most obvious possibility is in the app design itself. Offer your users filter criteria. Users usually have some idea what they are looking for and would rather not page through 1000 pages of returned results. How often do you go past page 10 of a Google search?
Having said that, you could try storing the id (clustered index value) of the last row returned on the previous page and using that in your SQL where clause. If you need to allow sorting on different keys (e.g. last name), then store both the clustered index id value and the last name of the final row on the previous page. Then write your SQL like this (you always need to order by your key field plus the clustered id value, so that records are ordered deterministically when there are duplicate key values):
select top (@count) Id, LastName, FirstName
from Data
where LastName > @previousLastName
   or (LastName = @previousLastName and Id > @previousId)
order by LastName, Id
You would also want to index all the fields that could be sort keys. I'm not sure exactly how the above would perform, but I would expect a seek on indexed fields to perform in O(log n).
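For what it's worth, here is a minimal sketch of driving that "page from the last key" pattern from the same test harness as the question, against the same Data table (a single uniqueidentifier key, so only the last Value needs to be carried forward; this is an outline, not a drop-in replacement):

const int PageSize = 50;
Guid lastValue = Guid.Empty;

using (var conn = new SqlConnection("Data Source=.;Initial Catalog=TestDB;Integrated Security=true;")) {
    conn.Open();
    while (true) {
        int rowsInPage = 0;
        using (var cmd = conn.CreateCommand()) {
            cmd.CommandText = @"select top (@count) Value
                                from Data
                                where Value > @lastValue
                                order by Value";
            cmd.Parameters.AddWithValue("@count", PageSize);
            cmd.Parameters.AddWithValue("@lastValue", lastValue);

            using (var reader = cmd.ExecuteReader()) {
                while (reader.Read()) {
                    lastValue = reader.GetGuid(0);   // remember the last key of this page
                    rowsInPage++;
                }
            }
        }
        if (rowsInPage < PageSize)
            break;   // past the last full page
    }
}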
Another option might be to persist the full list, in order, with a row number, every time the source data changes, behind the scenes, and have the app page from that persisted table.
Good question... Let us know how it turns out please!
I am trying to use Dapper support my data access for my server app.
My server app has another application that drops records into my database at a rate of 400 per minute.
My app pulls them out in batches, processes them, and then deletes them from the database.
Since data continues to flow into the database while I am processing, I don't have a good way to say delete from myTable where allProcessed = true.
However, I do know the PK value of the rows to delete, so I want to do a delete from myTable where Id in @listToDelete.
The problem is that if my server goes down for even 6 minutes, then I have over 2100 rows to delete.
Since Dapper takes my @listToDelete and turns each item into a parameter, my call to delete fails. (This causes my data purging to fall even further behind.)
What is the best way to deal with this in Dapper?
NOTES:
I have looked at Table-Valued Parameters, but from what I can see they are not very performant. This piece of my architecture is the bottleneck of my system and it needs to be very, very fast.
One option is to create a temp table on the server and then use the bulk load facility to upload all the IDs into that table at once. Then use a join, EXISTS or IN clause to delete only the records that you uploaded into your temp table.
Bulk loads are a well-optimized path in SQL Server and it should be very fast.
For example:
Execute the statement CREATE TABLE #RowsToDelete(ID INT PRIMARY KEY)
Use a bulk load to insert keys into #RowsToDelete
Execute DELETE FROM myTable where Id IN (SELECT ID FROM #RowsToDelete)
Execute DROP TABLE #RowsToDelete (the table will also be dropped automatically when you close the session)
(Assuming Dapper) code example:
conn.Open();

var columnName = "ID";

conn.Execute(string.Format("CREATE TABLE #{0}s({0} INT PRIMARY KEY)", columnName));

using (var bulkCopy = new SqlBulkCopy(conn))
{
    bulkCopy.BatchSize = ids.Count;
    bulkCopy.DestinationTableName = string.Format("#{0}s", columnName);
    var table = new DataTable();
    table.Columns.Add(columnName, typeof(int));
    bulkCopy.ColumnMappings.Add(columnName, columnName);
    foreach (var id in ids)
    {
        table.Rows.Add(id);
    }
    bulkCopy.WriteToServer(table);
}

// or do other things with your table instead of deleting here
conn.Execute(string.Format(@"DELETE FROM myTable where Id IN
                             (SELECT {0} FROM #{0}s)", columnName));

conn.Execute(string.Format("DROP TABLE #{0}s", columnName));
To get this code working, I went to the dark side.
Since Dapper turns my list into parameters, and SQL Server can't handle a lot of parameters (I have never needed even double-digit parameter counts before), I had to go with dynamic SQL.
So here was my solution:
string listOfIdsJoined = "("+String.Join(",", listOfIds.ToArray())+")";
connection.Execute("delete from myTable where Id in " + listOfIdsJoined);
Before everyone grabs their torches and pitchforks, let me explain.
This code runs on a server whose only input is a data feed from a Mainframe system.
The list I am dynamically creating is a list of longs/bigints.
The longs/bigints are from an Identity column.
I know constructing dynamic SQL is bad juju, but in this case, I just can't see how it leads to a security risk.
Dapper accepts a list of objects with the parameter as a property (it executes the statement once per element), so in the above case a list of objects with Id as a property will work.
connection.Execute("delete from myTable where Id in (@Id)", listOfIds.AsEnumerable().Select(i => new { Id = i }).ToList());
This will work.