We have a web service with the following algorithm (the input is an id and a name):
public bool SetCustomer(int externalId, string name)
{
    using (var db = new BusinessEntities())
    {
        // Find the existing customer, or create and attach a new one.
        var c = db.Customer.FirstOrDefault(x => x.externalId == externalId);
        if (c == null)
        {
            c = new Customer { externalId = externalId };
            db.Customer.Add(c);
        }
        c.name = name;
        db.SaveChanges();
        return true;
    }
}
The problem is that if someone calls this web service at the same moment with the same id, two customers are created.
We cannot add a unique constraint, because some other process in the app might create a customer with the same externalId.
We thought about a solution using a dictionary of objects keyed by id to serve as lock targets, but it doesn't seem right.
We don't want to lock the whole Customer table, because we have a lot of parallel calls, and that would cause timeouts.
You could use an intermediate table to log insertion jobs, then perform the jobs (creating the customers) with a single-threaded task, as sketched below.
This may require an asynchronous solution if your web method has to return a result.
Or even something more elaborate, like Asynchronous Triggers from SQL Service Broker.
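A minimal sketch of that idea (the CustomerJob table and its members are illustrative, not part of the original model): the web method only records the request, and a single-threaded worker performs the find-or-create, so two jobs for the same id can never run concurrently.

public void LogSetCustomer(int externalId, string name)
{
    using (var db = new BusinessEntities())
    {
        // Only record the request; the insert/update happens later.
        db.CustomerJobs.Add(new CustomerJob { ExternalId = externalId, Name = name });
        db.SaveChanges();
    }
}

// Run from a single-threaded task, so find-or-create is never concurrent.
public void ProcessPendingJobs()
{
    using (var db = new BusinessEntities())
    {
        foreach (var job in db.CustomerJobs.OrderBy(j => j.Id).ToList())
        {
            var c = db.Customer.FirstOrDefault(x => x.externalId == job.ExternalId);
            if (c == null)
            {
                c = new Customer { externalId = job.ExternalId };
                db.Customer.Add(c);
            }
            c.name = job.Name;
            db.CustomerJobs.Remove(job);
        }
        db.SaveChanges();
    }
}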
I found something:
http://michaeljswart.com/2011/09/mythbusting-concurrent-updateinsert-solutions/
It's not Entity Framework-specific, but I don't think EF provides access to this kind of feature anyway.
Thanks.
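For reference, the locking pattern from that article can still be issued from EF as raw SQL. A hedged sketch, assuming a DbContext (so Database.ExecuteSqlCommand is available) and the Customer table from the question:

// Hypothetical raw-SQL upsert: UPDLOCK + SERIALIZABLE take a key-range lock,
// so concurrent callers with the same externalId are serialized; both
// statements run in the same batch and transaction.
db.Database.ExecuteSqlCommand(@"
    BEGIN TRAN;
    UPDATE Customer WITH (UPDLOCK, SERIALIZABLE)
        SET name = @p1
        WHERE externalId = @p0;
    IF @@ROWCOUNT = 0
        INSERT INTO Customer (externalId, name) VALUES (@p0, @p1);
    COMMIT;",
    externalId, name);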
I am working with a SQL Server table that contains 80 million (80,000,000) rows. Data space = 198,000 MB. Not surprisingly, queries against this table often churn or time out. To add to the issues, the table rows get updated fairly frequently and new rows also get added on a regular basis. It thus continues to grow like a viral outbreak.
My issue is that I would like to write Entity Framework 5 LINQ to Entities queries to grab rows from this monster table. In my attempts, timeouts have become outright epidemic. A few more things: the table's primary key is indexed, and it has non-clustered indexes on 4 of its 19 columns.
So far, I am writing simple LINQ queries that use TransactionScope and the read-uncommitted isolation level. I have tried increasing both the command timeout and the connection timeout. I have written queries that return FirstOrDefault() or a collection, such as the following, which attempts to grab a single ID (an int) from seven days before the current date:
public int GetIDForSevenDaysAgo(DateTime sevenDaysAgo)
{
    using (var txn = new TransactionScope(TransactionScopeOption.Required,
        new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted }))
    {
        var GetId = from te in _repo.GetTEvents()
                    where te.cr_date > sevenDaysAgo
                    orderby te.cr_date
                    select te.id;
        return GetId.FirstOrDefault();
    }
}
and
public IEnumerable<int> GetIDForSevenDaysAgo(DateTime sevenDaysAgo)
{
    using (var txn = new TransactionScope(TransactionScopeOption.Required,
        new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted }))
    {
        var GetId = from te in _repo.GetTEvents()
                    where te.cr_date > sevenDaysAgo
                    orderby te.cr_date
                    select te.id;
        // Materialize here; Take(1) alone is deferred and would execute
        // only after the transaction scope has been disposed.
        return GetId.Take(1).ToList();
    }
}
Each query times out repeatedly regardless of the timeout settings. I'm using the repository pattern with Unity DI and fetching the table with IQueryable<> calls. I'm also limiting the repository call to eight days from the current date (hoping to only grab the needed subset of this mammoth table). I'm using Visual Studio 2013 with Update 5 targeting .NET v4.5 and SQL Server 2008 R2.
I captured the SQL statement that EF generates, and it didn't look much more complicated than the LINQ statements above. And my brain hurts.
So, have I reached some sort of tolerance limit for EF? Is the table simply too big? Should I revert to Stored Procedures/domain methods when querying this table? Are there other options I should explore? There was some discussion around removing some of the table's rows, but that probably won't happen anytime soon. I did read a little about paging, but I'm not sure if that would help or not. Any thoughts or ideas would be appreciated! Thank you!
As far as I can see, you are only selecting data and not changing it, so why do you need TransactionScope? You only need it when you have two or more SaveChanges() calls in your code and you want them to run in one transaction. So get rid of it.
Another thing I would do in your case is disable change tracking and automatic detection of changes on your context. But be careful: if you don't recreate your context on each request, it can serve stale data.
To do that, put these lines near your context initialization (these are the EF DbContext settings; ObjectTrackingEnabled and DeferredLoadingEnabled are their LINQ to SQL counterparts):
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.LazyLoadingEnabled = false;
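A per-query alternative, assuming the repository exposes an EF IQueryable (te.cr_date and te.id are taken from the question's query; requires using System.Data.Entity;):

// AsNoTracking skips change tracking for this query only, which is
// usually enough for read-only lookups like this one.
var id = _repo.GetTEvents()
    .AsNoTracking()
    .Where(te => te.cr_date > sevenDaysAgo)
    .OrderBy(te => te.cr_date)
    .Select(te => te.id)
    .FirstOrDefault();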
The other thing you should think about is pagination and caching. But as far as I can see, your example only tries to get one row, so I can't say anything in particular.
I also recommend reading this article for further optimization.
It's not easy to say whether you have to go with stored procedures or EF, since we're talking about a monster. :-)
The first thing I would do is run the query in SSMS displaying the Actual Execution Plan. Sometimes it provides information about missing indexes that might increase performance.
From your example, I'm pretty sure you need an index on that date column.
In other words, if you have access, be sure the table design is optimal for that amount of data.
My thought is: if a simple query hangs, what more can EF do?
I am using EclipseLink-2.6.1 with Amazon RDS database instance. The following code is used to insert new entities into database:
tx = em.getTransaction();
tx.begin();
for (T item : persistCollection) {
    em.merge(item);
}
tx.commit();
The object being persisted has a composite primary key (not a generated one). Locally, queries run super fast, but inserting into the remote DB is a really slow process (~20 times slower). I have tried to enable JDBC batch writing but had no success with it (eclipselink.jdbc.batch-writing and rewriteBatchedStatements=true). When logging the queries being executed, I only see lots of SELECTs and not one INSERT (the SELECTs are probably there because the objects are detached at first).
My question is how to proceed with this problem. (I would like to get batch writing working and then see how the performance changes, but any help is appreciated.)
Thank you!
Edit:
When using em.persist(item), the loop finishes almost instantly, but after tx.commit() there are lots of queries (I guess one for every persisted item) like:
[EL Fine]: sql: ServerSession(925803196) Connection(60187547) SELECT NAME FROM TICKER WHERE (NAME = ?), bind => [AA]
My model has a @ManyToOne relationship with ticker_name. Why are there again so many slow SELECT queries?
For a table that has an identity:
[AutoIncrement]
public int Id { get; set;}
When inserting a new row into the database, what is the best way to retrieve the Id of the object?
For example:
db.Insert(new User());
The value of the Id is 0 after the insert, but in the database this obviously is not the case. The only possibility I can see is the following:
Id = (int)db.GetLastInsertId();
However, I don't believe this would be a safe call to make. If there are hundreds of inserts happening at the same time, the Id of another insert may be returned. In EF, when you do an insert, the Id is set for you.
Does anyone know the best way to go about this?
In ServiceStack.OrmLite v4, which defaults to using parameterized queries, there are a couple of options. db.Save() automatically populates the AutoIncrement Id, e.g.:
db.Save(item);
item.Id //populated with the auto-incremented id
Otherwise you can select the last insert id using:
var itemId = db.Insert(item, selectIdentity:true);
Here are more examples showcasing OrmLite's new APIs.
For OrmLite v3
The correct call is db.GetLastInsertId(), which for SQL Server, for example, calls SELECT SCOPE_IDENTITY() under the hood, returning the last inserted id for that connection.
This is safe because any other concurrent inserts that might be happening are using different DB connections. In order for anyone else to use the same connection, it would first need to be disposed of and released back into the pool.
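A quick usage sketch of the v3 call, using the User type from the question:

db.Insert(new User());
// Runs SELECT SCOPE_IDENTITY() on this same connection, so concurrent
// inserts on other connections cannot interfere with the result.
var id = (int)db.GetLastInsertId();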
You should definitely use the Unit of Work pattern, particularly in this scenario: wrap the DB-related code in a transaction scope.
In OrmLite, you can implement this via IDbCommand and IDbTransaction (see the example here: http://code.google.com/p/servicestack/source/browse/trunk/Common/ServiceStack.OrmLite/ServiceStack.OrmLite.Tests/ShippersExample.cs).
Looking at the code, you'll notice it's going to be less magical and more manual coding, but it's one way.
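A rough sketch of that shape, assuming OrmLite's OpenTransaction() extension and the User type from the question (requires using System.Data;):

// Wrap the insert and the identity read in one transaction on one connection.
using (IDbTransaction trans = db.OpenTransaction())
{
    db.Insert(new User());
    var id = (int)db.GetLastInsertId(); // same connection, so this is safe
    trans.Commit();
}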
Update: As seen here, if you are using ServiceStack/OrmLite v4, you need to use a parameterized query to get the inserted ID. For example:
var UserId = db.Insert<User>(new User(), selectIdentity: true);
I have a multi-tenant database that returns vastly different numbers of rows depending on which tenant is being queried. Lately we are experiencing a parameter sniffing problem where Entity Framework (EF) queries executed against one tenant (TenantID = 1) take much longer than the same query against another tenant (TenantID = 2). I've done some research and determined that EF doesn't support query hints (see this question) that would allow me to force the query to recompile each time. Now I'm wondering if I can intercept the SQL query that is generated by EF and manually append "OPTION (OPTIMIZE FOR UNKNOWN)" before it is executed. Is this possible? Is EF pluggable such that I can modify the generated SQL before it is executed? Are there any examples of how to do this?
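It is possible on EF6 via command interception. A hedged sketch, assuming EF6's DbCommandInterceptor is available (the class name and the filtering rule are illustrative):

// requires: using System;
// requires: using System.Data.Common;
// requires: using System.Data.Entity.Infrastructure.Interception;

// Hypothetical interceptor that appends the hint to EF-generated SELECTs.
public class OptimizeForUnknownInterceptor : DbCommandInterceptor
{
    private const string Hint = " OPTION (OPTIMIZE FOR UNKNOWN)";

    public override void ReaderExecuting(
        DbCommand command,
        DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // Only touch SELECTs that don't already carry the hint.
        if (command.CommandText.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase)
            && !command.CommandText.EndsWith(Hint, StringComparison.OrdinalIgnoreCase))
        {
            command.CommandText += Hint;
        }
        base.ReaderExecuting(command, interceptionContext);
    }
}

// Registered once at application startup:
// DbInterception.Add(new OptimizeForUnknownInterceptor());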
Have you tried working around the problem? You may be able to force EF to not use a param, e.g.:
var q = Context.Tenants.Where(t => t.TenantID == tenantId);
...will use a param, but:
var r = Context.Tenants.Where(t => t.TenantID == 1);
...won't, and I'll bet that:
var s = Context.Tenants.Where("it.TenantID = 1");
...won't, either.
We use NHibernate for ORM, and at the initialization phase of our program we need to load many instances of some class T from the DB.
In our application, the following code, which extracts all these instances, takes forever:
public IList<T> GetAllWithoutTransaction()
{
    using (ISession session = GetSession())
    {
        IList<T> entities = session
            .CreateCriteria(typeof(T))
            .List<T>();
        return entities;
    }
}
Using the NHibernate log I found that the actual SQL queries the framework uses are:
{
    Load a bunch of rows from a few tables in the DB (one SELECT statement).
    for each instance of class T
    {
        Load all the data for this instance of class T from the abovementioned rows (3 SELECT statements).
    }
}
The 3 SELECT statements are coupled, i.e. the second depends on the first, and the third on the first two.
As you can see, the number of SELECT statements is in the millions, giving us a huge overhead that results directly from all those round-trips to the DB (each of which entails "open DB session", "close DB session", ...), even though we are using a single NHibernate session.
My question is this:
I would somehow like to consolidate all those SELECT statements into one big SELECT statement. We are not running multithreaded, and are not altering the DB in any way at the init phase.
One way of doing this would be to define my own object and map it using NHibernate, so everything loads quickly in one query, but it would require that we implement the join operations used in those statements ourselves and, worse, it breaks the ORM abstraction.
Is there any way of doing this by some configuration?
Thanks guys!
This is known as the SELECT N+1 problem. You need to decide where you're going to place your joins (FetchMode.Eager); a sketch follows.
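A minimal sketch, assuming a mapping where "Children" is the lazily loaded association causing the extra SELECTs (the property name is illustrative):

public IList<T> GetAllEager()
{
    using (ISession session = GetSession())
    {
        return session
            .CreateCriteria(typeof(T))
            // Fetch the association with a join instead of one SELECT per row.
            .SetFetchMode("Children", FetchMode.Eager)
            .List<T>();
    }
}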
If you can write the query as a single query in SQL, you can get NHibernate to execute it as a single query (usually without breaking the abstraction).
It appears likely that you have some relationships/classes set up to lazy load, when what you really want in this scenario is eager loading.
There is plenty of good information about this in the NHibernate docs. You may want to start here:
http://www.nhforge.org/doc/nh/en/index.html#performance-fetching