Bulk inserts with LINQ to SQL allowing some failures - sql-server

So, we have a service that takes a list of updates via a WCF service, which should then be written to the DB. Some of these could potentially fail (1000 updates/inserts per query), and if they do, the requirement is for the rest of the inserts/updates to be written and the failures to be logged for processing later...
The question is how this should be done in LINQ to SQL. At the moment, I am doing something like the following:
TestDataContext dc = new TestDataContext();
Random r = new Random();
for (int i = 0; i < 300; i++)
{
    Table table = new Table { randomNumber = r.Next(150) };
    dc.Tables.InsertOnSubmit(table);
}
try
{
    dc.SubmitChanges();
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
In this example, Table.randomNumber is the primary key. There will be duplicates in the dataset before it gets written, so when I call dc.SubmitChanges() it throws an exception... Instead of calling dc.SubmitChanges() after each iteration, is there a better way of doing this?

I ended up submitting each row as it was written, and catching any errors. This way, I can catch the failures but insert the rest correctly. It's not as slow as I thought it would be, but not ideal... Mind you, since this is a process that is done "offline" (backend processing), it works for what we need.
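For reference, a minimal sketch of that per-row approach (same hypothetical TestDataContext as above; the logging is left as a placeholder):
TestDataContext dc = new TestDataContext();
Random r = new Random();
for (int i = 0; i < 300; i++)
{
    Table table = new Table { randomNumber = r.Next(150) };
    dc.Tables.InsertOnSubmit(table);
    try
    {
        // Submit one row at a time so a single duplicate key doesn't abort the batch.
        dc.SubmitChanges();
    }
    catch (Exception ex)
    {
        // The failed row is still pending in the change tracker; removing it
        // here should stop it being retried on the next SubmitChanges.
        dc.Tables.DeleteOnSubmit(table);
        // TODO: log the failure for later processing
        Console.WriteLine("Insert failed for {0}: {1}", table.randomNumber, ex.Message);
    }
}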


Suspending Azure function until Entity Framework makes changes

I have an Azure function (IoT Hub trigger) that:
selects the top 1 record, ordered by time in descending order
compares it with the new record that comes in
writes the incoming record only if it differs from the selected one (some fields are different)
The issue pops up when records come into the Azure function very rapidly - I end up with duplicates in the database. I guess this is because SQL Server hasn't committed the previous change by the time the next record arrives, so when the Azure function selects the latest record, it actually receives an outdated one.
I use EF Core.
I believe the issue is not with the function but with the transactional nature of the operation you described. To solve your issue trivially, you can try using a transaction with the highest isolation level:
using (var transaction = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions
    {
        // With this isolation level all data modifications are sequential
        IsolationLevel = IsolationLevel.Serializable
    }))
{
    using (var connection = new SqlConnection("YOUR CONNECTION"))
    {
        // Opening the connection inside the scope enlists it in the transaction
        connection.Open();
        try
        {
            // Run a raw ADO.NET command in the transaction
            var command = connection.CreateCommand();
            // Your reading query (just for example's sake)
            command.CommandText = "SELECT TOP (1) Id FROM dbo.Whatever ORDER BY Time DESC";
            var result = command.ExecuteScalar();
            // Run an EF Core command in the same transaction
            var options = new DbContextOptionsBuilder<TestContext>()
                .UseSqlServer(connection)
                .Options;
            using (var context = new TestContext(options))
            {
                // Compare 'result' with the incoming record and write it only if it differs
                context.Items.Add(new Item { /* populate from the incoming record */ });
                context.SaveChanges();
            }
            // Commit the transaction if all commands succeed; it will auto-rollback
            // on dispose if either command fails
            transaction.Complete();
        }
        catch (System.Exception)
        {
            // TODO: Handle failure
        }
    }
}
You will need to adjust the code to your needs, but this should give you the idea.
That said, I would rather avoid the problem entirely and not modify any records at all: just insert every record and select the latest afterwards. Transactions are tricky in application code; applied in the wrong place or in the wrong way, they can cause performance degradation and deadlocks.
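Following that route, a minimal sketch (reusing the hypothetical TestContext/Item names from the snippet above, and assuming Item has a Time column):
// Write path: always insert; there is no read-compare-write race to protect.
using (var context = new TestContext(options))
{
    context.Items.Add(incomingItem); // the record from the IoT Hub message
    context.SaveChanges();
}
// Read path: the current state is simply the most recent row.
using (var context = new TestContext(options))
{
    var latest = context.Items
        .OrderByDescending(i => i.Time)
        .FirstOrDefault();
}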

Entity Framework insert (of one object) slow when table has large number of records

I have a large ASP.NET MVC application that runs on a database that is rapidly growing in size. When the database is empty everything works quickly, but one of my tables now has 350K records in it and an insert now takes 15s. Here is a snippet:
foreach (var packageSheet in packageSheets)
{
    // Create OrderSheets
    var orderSheet = new OrderSheet { Sheet = packageSheet.Sheet };
    // Add Default Options
    orderSheet.AddDefaultOptions();
    orderSheet.OrderPrints.Add(
        new OrderPrint
        {
            OrderPose = CurrentOrderSubject.OrderPoses.Where(op => op.Id == orderPoseId).Single(),
            PrintId = packageSheet.Sheet.Prints.First().Id
        });
    // Create OrderPackageSheets and add it to the order package held in the session
    var orderPackageSheet =
        new OrderPackageSheet
        {
            OrderSheet = orderSheet,
            PackageSheet = packageSheet
        };
    _orderPackageRepository.SaveChanges();
    ...
}
When I SaveChanges at this point it takes 15s on the first loop; each iteration after that is fast. I have indexed the tables in question, so I believe the database is tuned properly. It's the OrderPackageSheets table that contains the 350K rows.
Can anyone tell me how I can optimize this to get rid of the delay?
Thank you!
EF can be slow if you are inserting a lot of rows at the same time.
context.Configuration.AutoDetectChangesEnabled = false; won't do much for you if this is really a web app.
You would need to share your table definition; for instance, using the Simple recovery model can improve insert performance.
Or, as mentioned, if you need to insert a lot of rows, use bulk inserts (see the sketch below).
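For instance, a minimal SqlBulkCopy sketch (the DataTable columns and table name here are illustrative, not taken from the question):
// Build a DataTable matching the destination table's schema.
var bulkTable = new DataTable();
bulkTable.Columns.Add("OrderSheetId", typeof(int));
bulkTable.Columns.Add("PackageSheetId", typeof(int));
// ... fill bulkTable.Rows from your in-memory objects ...
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.OrderPackageSheets";
        // One bulk round trip instead of one INSERT per row.
        bulkCopy.WriteToServer(bulkTable);
    }
}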
If the number of records is too high, you can use a stored procedure instead of EF.
If you need to use EF itself, disable automatic change detection on the context using
context.Configuration.AutoDetectChangesEnabled = false;
and save the context once, after the loop:
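A sketch of that pattern (OrdersContext and BuildOrderPackageSheet are hypothetical names standing in for the code in the question):
using (var context = new OrdersContext())
{
    // Skip the change-tracking scan EF normally runs on every Add.
    context.Configuration.AutoDetectChangesEnabled = false;
    foreach (var packageSheet in packageSheets)
    {
        context.OrderPackageSheets.Add(BuildOrderPackageSheet(packageSheet));
    }
    // A single SaveChanges after the loop instead of one per iteration.
    context.SaveChanges();
}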
Check these links
Efficient way to do bulk insert/update with Entity Framework
http://weblog.west-wind.com/posts/2013/Dec/22/Entity-Framework-and-slow-bulk-INSERTs

How to handle unique constraint exception to update row after failing to insert?

I am trying to handle near-simultaneous input to my Entity Framework application. Members (users) can rate things, so I have a table for their ratings, where one column is the member's ID, one is the ID of the thing they're rating, one is the rating, and another is the time they rated it. The most recent rating is supposed to override earlier ratings. When I receive input, I check to see if the member has already rated the thing; if they have, I update the rating in the existing row, and if they haven't, I add a new row. I noticed that when input comes in from the same user for the same item at nearly the same time, I end up with two ratings for that user for the same thing.
Earlier I asked this question: How can I avoid duplicate rows from near-simultaneous SQL adds? and I followed the suggestion to add a SQL constraint requiring unique combinations of MemberID and ThingID, which makes sense, but I am having trouble getting this technique to work, probably because I don't know the syntax for doing what I want to do when an exception occurs. The exception comes up saying the constraint was violated, and what I would like to do then is forget the attempted illegal addition of a row with the same MemberID and ThingID, and instead fetch the existing one and simply set the values to this slightly more recent data. However, I have not been able to come up with a syntax that will do that. I have tried a few things, and I always get an exception when I try to SaveChanges after the first exception - either the unique constraint violation comes up again, or I get a deadlock exception.
The latest version I tried was like this:
// Get the member's rating for the thing, or create it.
Member_Thing_Rating memPref = (from mip in _myEntities.Member_Thing_Rating
                               where mip.thingID == thingId
                               where mip.MemberID == memberId
                               select mip).FirstOrDefault();
bool RetryGet = false;
if (memPref == null)
{
    using (TransactionScope txScope = new TransactionScope())
    {
        try
        {
            memPref = new Member_Thing_Rating();
            memPref.MemberID = memberId;
            memPref.thingID = thingId;
            memPref.EffectiveDate = DateTime.Now;
            _myEntities.Member_Thing_Rating.AddObject(memPref);
            _myEntities.SaveChanges();
        }
        catch (Exception ex)
        {
            Thread.Sleep(750);
            RetryGet = true;
        }
    }
    if (RetryGet == true)
    {
        // Re-query into the existing variable (redeclaring memPref here would not compile).
        memPref = (from mip in _myEntities.Member_Thing_Rating
                   where mip.thingID == thingId
                   where mip.MemberID == memberId
                   select mip).FirstOrDefault();
    }
}
}
After writing the above, I also tried wrapping the logic in a function call, because it seems like Entity Framework cleans up database transactions when leaving the scope from which changes were submitted. So instead of using TransactionScope and managing the exception at the same level as above, I wrapped the whole thing inside a managing function, like this:
bool Succeeded = false;
while (Succeeded == false)
{
    Thread.Sleep(750);
    Exception Problem = AttemptToSaveMemberIngredientPreference(memberId, ingredientId, rating);
    if (Problem == null)
        Succeeded = true;
    else
    {
        Exception BaseEx = Problem.GetBaseException();
    }
}
But this only results in an unending string of exceptions on the unique constraint, being handled forever at the higher-level function. I have a 3/4 second delay between attempts, so I am surprised that there can be a reported conflict yet still there is nothing found when I query for a row. I suppose that indicates that all of the threads are failing because they are running at the same time and Entity Framework notices them all and fails them all before any succeed. So I suppose there should be a way to respond to the exception by looking at all the submissions and adjusting them? I don't know or see the syntax for that. So again, what is the way to handle this?
Update:
Paddy makes three good suggestions below. I expect his Stored Procedure technique would work around the problem, but I am still interested in the answer to the question. That is, surely one should be able to respond to this exception by manipulating the submission, but I haven't yet found the syntax to get it to insert one row and use the latest value.
To quote Eric Lippert, "if it hurts, stop doing it". If you are anticipating very high volumes and you want to do an 'insert or update', then you may want to consider handling this within a stored procedure instead of using the methods outlined above.
Your problem comes about because there is a small gap between your call to the DB to check for existence and your insert/update.
The sproc could use a MERGE to do the insert or update in a single pass on the table, guaranteeing that you will only see a single row for a rating and that it will be the most recent update you receive.
Note - you can include the sproc in your EF model and call it using similar EF syntax.
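For illustration, a sketch of that single-pass upsert, written here as raw SQL executed through the ObjectContext rather than a full sproc; the Rating column name is an assumption based on the question's description:
// Sketch: insert-or-update in one atomic statement. HOLDLOCK closes the gap
// between the existence check and the write; 'Rating' is an assumed column name.
_myEntities.ExecuteStoreCommand(
    @"MERGE dbo.Member_Thing_Rating WITH (HOLDLOCK) AS target
      USING (SELECT {0} AS MemberID, {1} AS thingID) AS source
          ON target.MemberID = source.MemberID AND target.thingID = source.thingID
      WHEN MATCHED THEN
          UPDATE SET Rating = {2}, EffectiveDate = {3}
      WHEN NOT MATCHED THEN
          INSERT (MemberID, thingID, Rating, EffectiveDate)
          VALUES ({0}, {1}, {2}, {3});",
    memberId, thingId, rating, DateTime.Now);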
Note 2 - Looking at your code, you don't roll back the transaction scope prior to sleeping your thread in the case of an exception. This means you are holding a transaction open for a relatively long time, particularly when you are expecting very high volumes. You may want to update your code to something like this:
try
{
    memPref = new Member_Thing_Rating();
    memPref.MemberID = memberId;
    memPref.thingID = thingId;
    memPref.EffectiveDate = DateTime.Now;
    _myEntities.Member_Thing_Rating.AddObject(memPref);
    _myEntities.SaveChanges();
    txScope.Complete();
}
catch (Exception ex)
{
    // Dispose the scope (rolling the transaction back) before sleeping,
    // so it isn't held open for the 750 ms delay.
    txScope.Dispose();
    Thread.Sleep(750);
    RetryGet = true;
}
This may be why you seem to be suffering from deadlocks when you retry, particularly if you are getting rapid concurrent requests.

Custom constraint in EF fails, async issue

I have a controller action like this (ASP.NET Web API):
public HttpResponseMessage<Component> Post(Component c)
{
    // Don't allow equal setup ids within the same installation when the component is of type 5
    if (db.Components.Any(d => d.InstallationId == c.InstallationId && d.SetupId == c.SetupId && d.ComponentTypeId == 5))
        return new HttpResponseMessage<Component>(c, HttpStatusCode.Conflict);
    db.Components.Add(c);
    db.SaveChanges();
    return new HttpResponseMessage<Component>(c, HttpStatusCode.OK);
}
I send a number of POST requests from JavaScript, with 2 of them being equal:
{SetupId : 7, InstallationId : 1, ComponentTypeId: 5}
I have verified this both by using Fiddler and by stepping through the code on the server.
However, sometimes the check I do above works as it should, and other times it does not. I guess that since Post is an async action, request #2 sometimes checks the database for duplicates BEFORE the first request has managed to save to the database.
How can I solve this? Is there a way to lock EF operations to the database from the beginning of the Post action until the end? Is that even a good idea?
I have thought of database constraints; however, since this only applies when the component type is 5, I'm not sure how to implement that, or if it's even possible.
This is quite difficult to achieve with EF. In plain SQL you would start a transaction and add a table hint to your constraint query to force locking of the records. The problem is that EF doesn't support table hints; you cannot force a LINQ or ESQL query to lock records.
Your options are:
Manual locking in your method. Using, for example, lock will dramatically reduce the throughput of your method, so you will most probably need some clever custom implementation that locks per those IDs.
Using custom SQL or a stored procedure instead of a LINQ query and forcing the locking yourself. I think the UPDLOCK and HOLDLOCK hints should work in this case (see the sketch after this list).
Alternatively, you can place a unique index on InstallationId, SetupId and ComponentTypeId and simply catch the exception if a concurrent request tries to insert a duplicate record. The problem is if that combination must be unique only in some cases but not in others.
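A sketch of the locking approach from the second option, using raw ADO.NET (the plumbing around the hints is illustrative):
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var tx = connection.BeginTransaction())
    {
        var check = connection.CreateCommand();
        check.Transaction = tx;
        // UPDLOCK + HOLDLOCK holds a range lock until commit, so a concurrent
        // request blocks here instead of slipping past the duplicate check.
        check.CommandText =
            @"SELECT COUNT(*) FROM Components WITH (UPDLOCK, HOLDLOCK)
              WHERE InstallationId = @inst AND SetupId = @setup AND ComponentTypeId = 5";
        check.Parameters.AddWithValue("@inst", c.InstallationId);
        check.Parameters.AddWithValue("@setup", c.SetupId);
        bool duplicateExists = (int)check.ExecuteScalar() > 0;
        if (!duplicateExists)
        {
            // ... INSERT the component on the same connection and transaction ...
        }
        tx.Commit();
    }
}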
I solved this in the database with the help of this answer: https://stackoverflow.com/a/5149263/94394
A conditional (filtered) unique index, allowed in SQL Server 2008:
create unique nonclustered index [funcix_Components_setupid_installationid_RecordStatus]
on [components]([setupid], [Installationid])
where [componenttypeid] = 5
Then I caught DbUpdateException and checked whether I got the unique-constraint violation error code:
try
{
    db.Components.Add(c);
    db.SaveChanges();
}
catch (DbUpdateException ex)
{
    Exception innermostException = ex;
    while (innermostException.InnerException != null) // Get innermost exception
    {
        innermostException = innermostException.InnerException;
    }
    if (((System.Data.SqlClient.SqlException)innermostException).Number == 2601) // Unique index violation
    {
        return new HttpResponseMessage<Component>(c, HttpStatusCode.Conflict);
    }
}
return new HttpResponseMessage<Component>(c, HttpStatusCode.OK);
return new HttpResponseMessage<Component>(c, HttpStatusCode.OK);

How to solve "Batch update returned unexpected row count from update; actual row count: 0; expected: 1" problem?

Getting this every time I attempt to CREATE a particular entity ... just want to know how I should go about figuring out the cause.
I'm using Fluent NHibernate auto-mapping, so perhaps I haven't set a convention appropriately and/or need to override some things in one or more mapping files. I've gone through a number of posts on the web regarding this problem and am having a hard time figuring out exactly why it is happening in my case.
The object I'm saving is pretty simple. It is a "Person" object that references a "Company" entity and has a collection of "Address" entities. UPDATES work fine on existing Person objects that are already in the database.
Any suggestions?
The error means that the SQL INSERT statement is being executed, but the ROWCOUNT being returned by SQL Server after it runs is 0, not 1 as expected.
There are several possible causes, from incorrect mappings to UPDATE/INSERT triggers that have rowcount turned off.
Your best bet is to profile the SQL statements and see what happens. To do that, either turn on NHibernate SQL logging or use SQL Profiler. Once you have the SQL you may know the cause; if not, try running the SQL manually and see what happens.
I also suggest you post your mapping, as it will help people spot any issues.
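For example, enabling NHibernate's SQL logging through the standard show_sql/format_sql properties (a minimal sketch; wire this into however you build your configuration):
// Sketch: make NHibernate echo the generated SQL so the failing INSERT is visible.
var cfg = new NHibernate.Cfg.Configuration();
cfg.SetProperty(NHibernate.Cfg.Environment.ShowSql, "true");
cfg.SetProperty(NHibernate.Cfg.Environment.FormatSql, "true");
// ... add mappings and build the session factory as usual ...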
This can happen when trigger(s) execute additional DML (data modification) queries which affect the row counts. My solution was to add the following at the top of my trigger:
SET NOCOUNT ON;
This may occur because of an auto-increment primary key. To solve this problem, do not insert a value for the auto-increment column; insert the data without the primary key.
When targeting a view with an INSTEAD OF trigger, it can be next to impossible to get the correct row count. After delving a bit into the source, I found out that you can make a custom persister which makes NHibernate ignore the count checks.
public class SingleTableNoResultCheckEntityPersister : SingleTableEntityPersister
{
    public SingleTableNoResultCheckEntityPersister(PersistentClass persistentClass, ICacheConcurrencyStrategy cache, ISessionFactoryImplementor factory, IMapping mapping)
        : base(persistentClass, cache, factory, mapping)
    {
        for (int i = 0; i < this.insertResultCheckStyles.Length; i++)
        {
            this.insertResultCheckStyles[i] = ExecuteUpdateResultCheckStyle.None;
        }
        for (int i = 0; i < this.updateResultCheckStyles.Length; i++)
        {
            this.updateResultCheckStyles[i] = ExecuteUpdateResultCheckStyle.None;
        }
        for (int i = 0; i < this.deleteResultCheckStyles.Length; i++)
        {
            this.deleteResultCheckStyles[i] = ExecuteUpdateResultCheckStyle.None;
        }
    }
}
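To wire the persister up, one option is to point a mapping at it; a sketch assuming Fluent NHibernate's Persister mapping method and a hypothetical Person map (with auto-mapping, the same call would go in an override):
public class PersonMap : ClassMap<Person>
{
    public PersonMap()
    {
        // Use the persister that skips NHibernate's row-count checks.
        Persister<SingleTableNoResultCheckEntityPersister>();
        Id(x => x.Id);
        Map(x => x.Name);
    }
}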
If you get a row count greater than 1, it's usually because you have duplicate rows in the table with the same IDs. Check your table for duplicated records; deleting the duplicates will fix it.
NHibernate.AdoNet.TooManyRowsAffectedException: 'Batch update returned
unexpected row count from update; actual row count: 2; expected: 1'
