Nested transaction on google app engine datastore - google-app-engine

I want all deletes to execute all-or-nothing.

1. With the code as-is (nothing changed), will the group of deletes be atomic?
2. If I remove the outer transaction, will anything change?
3. If I remove only the inner transaction, will the group be atomic?
4. What if I replace the for-loop with a batch delete and leave only the outer transaction?
// inside event plan dao
public void delete(EventPlan eventPlan) {
    final Objectify ofy = Objectify.beginTransaction();
    try {
        final ActivityDAO activityDao = new ActivityDAO();
        for (final Activity activity : eventPlan.getActivities()) {
            activityDao.delete(activity);
        }
        ofy.getTxn().commit();
    } finally {
        if (ofy.getTxn().isActive()) {
            ofy.getTxn().rollback();
        }
    }
}

// inside activity dao
public void delete(Activity activity) {
    final Objectify ofy = Objectify.beginTransaction();
    try {
        // do some logic in here, delete activity and commit txn
    } finally {
        // check and rollback as normal
    }
}

If you use Objectify 3.1 then all transactions are XG transactions, which can operate on at most 5 different entity groups, i.e. if your Activities do not have a common parent (which would put them in the same entity group) then you can delete at most five of them in one transaction.

1. No. You are using parallel transactions (one outer, multiple inner).
2. No. The outer transaction has no operations executed in it, so it does nothing. There are multiple inner transactions (one per loop iteration), each doing its own delete.
3. Yes. You must perform all operations within one transaction for the operations to be atomic, so if you remove the inner transactions you are on the right path. However, the entity group transaction limit still applies: all entities touched within a transaction must belong to the same entity group, or (since XG is enabled by default) to at most five different entity groups (see above). Note that if you don't explicitly put an entity in an entity group (by setting its parent) then every entity gets its own entity group.
4. Yes. A batch delete is better than deleting in a loop (due to efficiency), but all the transaction rules in point 3 still apply.
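
Putting points 3 and 4 together, the delete could look something like the sketch below. It keeps the question's pseudo-code style rather than the exact Objectify API, and assumes all the Activities live in one entity group (or touch at most five groups with XG enabled):

// Minimal sketch (not the question's actual code): one transaction and
// one batch delete, so the group of deletes commits or rolls back together.
public void delete(EventPlan eventPlan) {
    final Objectify ofy = Objectify.beginTransaction();
    try {
        // Single batch delete inside the single (outer) transaction.
        ofy.delete(eventPlan.getActivities());
        ofy.getTxn().commit();
    } finally {
        if (ofy.getTxn().isActive()) {
            ofy.getTxn().rollback();
        }
    }
}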

Related

Why RepeatableRead isolation level not working with EF Core when trying to avoid inserting entities with the same key into the database?

I'm trying to avoid inserting multiple rows with the same Identifier field value into the database. Here is the code:
public async Task InsertTransaction(Transaction transaction)
{
    await using var dbTransaction = await _dbContext.Database.BeginTransactionAsync(IsolationLevel.RepeatableRead);

    var existing = await _dbContext.Set<Transaction>()
        .AsNoTracking()
        .FirstOrDefaultAsync(t => t.Identifier == transaction.Identifier);
    if (existing != null)
    {
        return;
    }

    _dbContext.Set<Transaction>().Add(transaction);
    await _dbContext.SaveChangesAsync();
    await dbTransaction.CommitAsync();
}
I'm assuming that RepeatableRead isolation level should be enough here, as it should lock reads for queries whose search criteria contain Identifier, so all requests after the first one entering the transaction will wait for it to finish.
However, when running concurrent tests I'm getting multiple rows inserted with the same Identifier, and things only work properly after changing the transaction isolation level to Serializable.
I'm assuming that RepeatableRead isolation level should be enough here
No. REPEATABLE READ doesn't use key-range locks for non-existent data; SERIALIZABLE is the only isolation level that does. Note, though, that SERIALIZABLE takes shared (S) range locks, so once multiple sessions hold the same S lock, the conflicting inserts produce a deadlock instead of blocking the second session at the SELECT query.
I'm getting multiple rows inserted with the same Identifier
In addition to an insert-if-not-exists transaction, you should also have a unique index on Identifier to prevent duplicates.
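For example, a minimal sketch of that safety net in EF Core model configuration (entity and property names taken from the question):

// Declare a unique index on Identifier so the database itself rejects
// duplicates, regardless of isolation level or timing.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Transaction>()
        .HasIndex(t => t.Identifier)
        .IsUnique();
}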
In SQL Server the lock hints to use are updlock, holdlock, which force restrictive update (U) locks and key-range locking. So something like:
public bool Exists<TEntity>(int Id) where TEntity : class
{
    var et = this.Model.FindEntityType(typeof(TEntity));
    return this.Set<TEntity>()
        .FromSqlRaw(
            $"select * from [{et.GetSchema() ?? "dbo"}].[{et.GetTableName()}] with (updlock, holdlock) where Id = @Id",
            new SqlParameter("@Id", Id))
        .Any();
}
You still need a transaction or the U locks will be released immediately, but the transaction can be at the default READ COMMITTED isolation level.
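
A hypothetical usage sketch, assuming Exists is defined on the DbContext as above (note it keys on Id, while the question's code keys on Identifier):

// The U locks taken by Exists are held until the transaction ends, so the
// check-then-insert pair is safe at the default READ COMMITTED level.
await using var tx = await _dbContext.Database.BeginTransactionAsync();
if (!_dbContext.Exists<Transaction>(transaction.Id))
{
    _dbContext.Set<Transaction>().Add(transaction);
    await _dbContext.SaveChangesAsync();
}
await tx.CommitAsync();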

What does SaveChanges() exactly do in EF6?

I'm trying to understand transactions in Entity Framework 6. I've searched a lot but I'm still confused.
Take a look at this:
Dim transaction = context.Database.BeginTransaction()
Using transaction
    ' ...
    context.Entry(entity1).State = System.Data.Entity.EntityState.Added
    context.SaveChanges()
    ' ...
    context.Entry(entity2).State = System.Data.Entity.EntityState.Added
    context.SaveChanges()
    ' ...
    context.Entry(entity3).State = System.Data.Entity.EntityState.Added
    context.SaveChanges()
    transaction.Commit()
    'Or transaction.Rollback()
End Using
Now what exactly does SaveChanges() do, and how does it differ from the commit?
Does it begin a new (maybe internal) transaction for each insert and then commit it?
I read https://msdn.microsoft.com/en-us/data/dn456843.aspx, and this is what I took away from it: "In all versions of Entity Framework, whenever you execute SaveChanges() to insert, update or delete on the database the framework will wrap that operation in a transaction. This transaction lasts only long enough to execute the operation and then completes. When you execute another such operation a new transaction is started."
My understanding is that SaveChanges takes all the changes made to the entities (especially where there are relationships with cascaded deletes, or where a deleted item is reinserted) and sorts the operations so they are carried out in the correct order.
For example, if you have a table with a unique constraint, and you have deleted one entity with a unique value on the constrained column and reinserted another entity with the same value, the operations are carried out in an order such that the underlying DBMS doesn't throw a unique-constraint exception. The same goes for non-auto-incremented primary keys and a variety of other things, but hopefully you get the gist of it.
The entities are stored in a graph with the relationships as edges, so it can sort the graph and perform the operations in the correct order.
This is carried out by the ChangeTracker. I know this from working with and building my own entity tracker using the source code from the awesome IQToolkit.
I also understand that this is carried out in a single transaction, if the underlying DBMS supports it.
Also, in your example, you only need to call SaveChanges() once, not after each time you change an entity.
You also don't need to create an explicit transaction and commit it, as SaveChanges does this internally, unless you need to roll back the transaction due to some external factor.
EDIT
To explicitly answer your questions in bold:
"Now what exactly does SaveChanges() Do? and how does it differ from the commit??"
It sorts the sql commands generated by each change made to the entities and executes them, in a single transaction, in an order that will not violate any relationship or field constraints setup within the database. As it uses its own transaction, you don't need to wrap the operation in a new transaction and commit it, unless you have a reason to roll the operations back due to some external factor.
It differs from Commit as Commit will commit any changes made during a transaction, while SaveChanges creates it's own transaction around the updates and commits the transaction. What you are doing is nesting the transaction created by SaveChanges in the outer transaction, so you can cancel it if required.
"Does it begin a new (maybe internal) transaction for each insert and then commit it?"
No, it wraps them all and commits in a single, internal transaction.
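
Putting that together, the question's example can be reduced to a single SaveChanges call with no explicit transaction (a sketch in the question's style; entity1 to entity3 are assumed from the question):

' Sketch: one SaveChanges call persists all three inserts inside a single
' internal transaction; no explicit BeginTransaction/Commit is needed.
context.Entry(entity1).State = System.Data.Entity.EntityState.Added
context.Entry(entity2).State = System.Data.Entity.EntityState.Added
context.Entry(entity3).State = System.Data.Entity.EntityState.Added
context.SaveChanges() ' all three are committed together, or none are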

GAE Datastore Contention Issue

Our GAE app makes a local copy of another website's relational database in the NDB. There are 4 entity types - User, Table, Row, Field. Each user has a bunch of tables, each table has a bunch of rows, each row has a bunch of fields.
SomeUser > SomeTable > ARow > AField
Thus, each User becomes one entity group. I need a feature where I can clear out all the tables (and their rows) for a certain user. What is the right way to delete all the tables and all their rows while avoiding the entity group's contention limit of ~5 operations/second?
The current code is getting TransactionFailedErrors because of contention on the Entity Group.
(A detail I overlooked: we only want to delete tables whose 'service' attribute is set to a certain value.)
def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        keys = []
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        keys.append(table)
        ndb.delete_multi(keys)
If all of the entities you're deleting are in one entity group, try deleting them all in one transaction. Without an explicit transaction, each delete is occurring in its own transaction, and all of the transactions have to line up (via contention and retries) to change the entity group.
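For instance, a sketch under the assumption that the whole cleanup fits within one transaction on the user's entity group (all the queries below are ancestor queries, which transactions require):

# Sketch: one transaction over the user's entity group, so the deletes
# commit or fail together instead of racing against each other.
@ndb.transactional
def delete_tables_for_service_txn(user, service):
    keys = []
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        keys.append(table)
    ndb.delete_multi(keys)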
Are you sure it's contention-based, or is it perhaps failing because the code above already runs within a transaction? A quick fix might be to increase the number of retries and turn on cross-group transactions for this method:
@ndb.transactional(retries=5, xg=True)
You can read more about that here: https://developers.google.com/appengine/docs/python/ndb/transactions. If that's not the culprit, maybe consider deferring or running the deletes asynchronously so they execute over time and in smaller batches. The trick with NDB is to do small bursts of work regularly, versus a large chunk of work infrequently. Here is one way to turn that code into an asynchronous unit of work:
def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        # Delete fields
        fields_keys = Fields.query(ancestor=table).fetch(keys_only=True)
        ndb.delete_multi_async(fields_keys)
        # Delete table rows
        table_rows_keys = TableRows.query(ancestor=table).fetch(keys_only=True)
        ndb.delete_multi_async(table_rows_keys)
        # Finally delete the table itself (it is already a key, from the keys-only fetch)
        ndb.delete_async(table)
If you want more control over the deletes, retries, and failures, you can either use Task Queues or simply use the deferred library (https://developers.google.com/appengine/articles/deferred):

1. Turn deferred on in your app.yaml (see the snippet after the code below).
2. Change the calls to ndb.delete_multi into deferred.defer calls:
def delete_tables_for_service(user, service):
    tables = Tables.query(Tables.service == service, ancestor=user.key).fetch(keys_only=True)
    for table in tables:
        keys = []
        keys += Fields.query(ancestor=table).fetch(keys_only=True)
        keys += TableRows.query(ancestor=table).fetch(keys_only=True)
        keys.append(table)
        deferred.defer(_deferred_delete_tables_for_keys, keys)

def _deferred_delete_tables_for_keys(keys):
    ndb.delete_multi(keys)
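
For reference, step 1 (enabling the deferred builtin) looks like this in a Python 2.7 app.yaml:

# app.yaml: enable the built-in deferred task handler
builtins:
- deferred: on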

do nested objectify transactions remain atomic - or do they work

I have a quick question about Objectify. This may be in the actual documentation, but I haven't found anything, so I'm asking here to be safe.
I have a backend using Objectify that I kind of rushed out. What I would like to do is the following: I have an event plan that is made up of activities. Currently, if I delete an event, I am actually writing all of the logic to delete the individual activities inside the event plan's delete method.
What I'm wondering is: if I call the Activity's delete method from the EventPlan's delete method (if it lets me do this), is it atomic?
Sample (this is just pseudo-code, not actual; case and method names may be wrong):
// inside event plan dao
public void delete(EventPlan eventPlan) {
    final Objectify ofy = Objectify.beginTransaction();
    try {
        final ActivityDAO activityDao = new ActivityDAO();
        for (final Activity activity : eventPlan.getActivities()) {
            activityDao.delete(activity);
        }
        ofy.getTxn().commit();
    } finally {
        if (ofy.getTxn().isActive()) {
            ofy.getTxn().rollback();
        }
    }
}

// inside activity dao
public void delete(Activity activity) {
    final Objectify ofy = Objectify.beginTransaction();
    try {
        // do some logic in here, delete activity and commit txn
    } finally {
        // check and rollback as normal
    }
}
Is this safe to do? As it is right now, the reason it's so mangled is that I didn't realize the entity group issue: there were certain things in the Activity that were not in the same entity group as the Activity itself. After fixing this I put all of the logic in the EventPlan delete, and the method is becoming unmanageable. Is it OK to break stuff down into smaller pieces, or will that break atomicity?
Thank you.
Nested transactions do not happen in a single atomic chunk. There is not really any such thing as a nested transaction: the transactions in your example all run in parallel, with different Objectify (DatastoreService) objects.
Your inner transactions will complete transactionally. Your outer transaction doesn't really do anything. Each inner delete is in its own transaction; it is still perfectly possible for the first Activity to be successfully deleted even though the second Activity is not.
If your goal is to delete a group of entities all-or-nothing style, look into using task queues. You can delete the first Activity and transactionally enqueue a task to delete the second, so you are guaranteed that either the Activity is deleted AND the task enqueued, or neither. Then, in the task, you can do the same with the second, and so on. Since tasks are retried if they fail, you can control the behavior to be somewhat like a transaction. The only thing to beware of is that other requests may see the partially-deleted series while the process runs.
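A sketch of that task-chain pattern, staying in the question's pseudo-code style (the /tasks/deleteNext handler and the key handling are assumptions; imports from com.google.appengine.api.taskqueue and com.google.appengine.api.datastore are elided):

// Delete the first entity and, in the SAME transaction, enqueue a task
// that will delete the rest. Either both happen or neither does, and the
// task is retried until it succeeds, so the chain eventually completes.
public void deleteFirstAndChain(List<Key> keys) { // raw datastore keys
    if (keys.isEmpty()) {
        return;
    }
    final Objectify ofy = Objectify.beginTransaction();
    try {
        ofy.delete(keys.get(0));
        if (keys.size() > 1) {
            TaskOptions task = TaskOptions.Builder.withUrl("/tasks/deleteNext");
            for (Key k : keys.subList(1, keys.size())) {
                task.param("key", KeyFactory.keyToString(k));
            }
            // Enqueued inside the datastore transaction, so the task is only
            // submitted if the delete commits.
            QueueFactory.getDefaultQueue().add(ofy.getTxn(), task);
        }
        ofy.getTxn().commit();
    } finally {
        if (ofy.getTxn().isActive()) {
            ofy.getTxn().rollback();
        }
    }
}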
If he removes the inner transaction, will the outer transaction still do nothing?

Are updates within an entity group always visible to reads within the group after commit returns?

I have a question about the examples in this article:
http://code.google.com/appengine/articles/transaction_isolation.html
Suppose I put Adam and Bob in the same entity group and modify the operation getTallPeople to only check the height of Adam and Bob (i.e. access only entities in the entity group). Now, if I execute the following statements:
begin transaction
updatePerson (update Adam's height to 74 inches)
commit transaction
begin transaction
getTallPeople
commit transaction
Can I be sure that getTallPeople will always return both Adam and Bob? I.e. if entity/index updates have not completed, will the second transaction wait until they have? Also, would the behavior be the same without using a transaction for getTallPeople?
Thanks for your help!
Yes. For getTallPeople to be called within a transaction, it must use an "ancestor" filter in its query to limit its results to members of the group. If it does so, both the index data it uses to determine the results and the entities it fetches based on those results will be strongly consistent with the committed results of the previous transaction. This is also true without the explicit transaction if the query uses an ancestor filter and you're using the HR datastore. (The HR datastore has been the default for a while, so you're probably using it.)
If getTallPeople performs a query without an ancestor filter and you're using the HR datastore, it will use the global index data, which is only guaranteed to be eventually consistent across the dataset. In this case, the query might see index data for the group prior to the previous transaction, even though the previous transaction has already committed.
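To make the distinction concrete, a sketch in NDB Python (the Person model and the key names are assumptions, not from the article):

# Ancestor query: limited to one entity group, strongly consistent, so it
# sees Adam's committed height update.
tall = Person.query(Person.height >= 72, ancestor=family_key).fetch()

# Global query: uses the eventually consistent global index and may still
# miss the just-committed update.
maybe_stale = Person.query(Person.height >= 72).fetch()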
