app engine datastore transaction exception - google-app-engine

In app engine transactions documentation I have found the following note:
Note: If your app receives an exception when submitting a transaction,
it does not always mean that the transaction failed. You can receive
Timeout, TransactionFailedError, or InternalError exceptions in cases
where transactions have been committed and eventually will be applied
successfully. Whenever possible, make your Datastore transactions
idempotent so that if you repeat a transaction, the end result will be
the same.
This is quite general information and I wasn't able to find more details. I have the following questions regarding this issue:
Does it affect NDB transactions? The NDB documentation doesn't
mention it, but I suppose that this behavior is inherited.
What can cause this type of situation?
How often can it happen?
Can I prevent it, or decrease probability?
Are transactional tasks enqueued in this situation?
Is this situation a bug, which will be fixed in the future, or a feature, which I should just get used to?

Yes, it affects ndb too.
Potential causes include network partitions where the datastore server commits successfully but cannot communicate the result to the app.
It is rare, but cannot be prevented, and will never be fixed. It is inherent to all distributed systems.
Task queue adds are committed with the transaction by the datastore server.
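As a rough illustration of why the note recommends idempotent transactions, here is a minimal sketch in plain Python rather than App Engine code. `FlakyStore`, `AmbiguousCommitError`, and `retry` are hypothetical names made up for this example: the toy store applies the first commit but then raises, mimicking a lost acknowledgement.

```python
class AmbiguousCommitError(Exception):
    """Hypothetical stand-in for Timeout / TransactionFailedError / InternalError."""


class FlakyStore:
    """Toy in-memory store whose first commit is applied but reported as failed."""

    def __init__(self):
        self.data = {}
        self.commits = 0

    def commit(self, updates):
        self.data.update(updates)  # the write is applied...
        self.commits += 1
        if self.commits == 1:
            # ...but the acknowledgement is "lost" on the first attempt.
            raise AmbiguousCommitError("commit outcome unknown")


def retry(operation, attempts=3):
    """Re-run `operation` until it returns without an ambiguous error."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except AmbiguousCommitError as exc:
            last_error = exc
    raise last_error


# Idempotent: setting an absolute value gives the same result when repeated.
store = FlakyStore()
retry(lambda: store.commit({"birthdate": "1990-01-01"}))

# Not idempotent: a read-then-increment is applied twice after one retry.
counter_store = FlakyStore()
retry(lambda: counter_store.commit(
    {"visits": counter_store.data.get("visits", 0) + 1}))
```

After both calls, `store.data` holds the birthdate exactly as intended, while `counter_store.data["visits"]` is 2 even though the caller meant to count one visit.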

Related

How does 2PC prevent commit failure?

I understand, in a fuzzy sort of way, how regular ACID transactions work. You perform some work on a database in such a way that the work is not confirmed until some kind of commit flag is set. The commit part is based on some underlying assumption (like a single disk block write is atomic). In the event of a catastrophic error, you can just clear out the uncommitted data in the recovery phase.
How do distributed transactions work? In some of the MS documentation I have read that you can somehow perform a transaction across databases and filesystems (among other things).
This technology could be (and probably is) used for installers, where you want the program to be fully installed or fully absent. You simply begin a transaction at the start of the installer. Next you could connect to the registry and filesystem, making the changes that define the installation. When the job is done, simply commit, or rollback if the installation fails for some reason. The registry and filesystem are automatically cleaned for you by this magical distributed transaction coordinator.
How is it possible that two disparate systems can be transacted upon in this fashion? It seems to me that it is always possible to leave the system in an inconsistent state, where the filesystem has committed its changes and the registry has not. I think in MSDTC it is even possible to perform a transaction across the network.
I have read http://blogs.msdn.com/florinlazar/archive/2004/03/04/84199.aspx, but it feels like only the beginning of the explanation, and that step 4 should be expanded considerably.
Edit: From what I gather on http://en.wikipedia.org/wiki/Distributed_transaction, it can be accomplished by a two-phase commit (http://en.wikipedia.org/wiki/Two-phase_commit). After reading this, I still don't understand the method 100%; it seems like there is a lot of room for error between the steps.
About "step 4":
The transaction manager coordinates
with the resource managers to ensure
that all succeed to do the requested
work or none of the work is done, thus
maintaining the ACID properties.
This of course requires all participants to provide the proper interfaces and (error-free) implementations. The interface looks vaguely like this:
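public interface ITransactionParticipant {
    // Phase 1: vote on whether this participant is able to commit its work.
    bool WouldCommitWork();
    // Phase 2: make the prepared work permanent.
    void Commit();
    // Phase 2 (abort path): undo the prepared work.
    void Rollback();
}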
The Transaction manager at commit-time queries all participants whether they are willing to commit the transaction. The participants may only assert this if they are able to commit this transaction under all allowable error conditions (validation, system errors, etc). After all participants have asserted the ability to commit the transaction, the manager sends the Commit() message to all participants. If any participant instead raises an error or times out, the whole transaction aborts and individual members are rolled back.
This protocol requires participants to have recorded their whole transaction content before asserting their ability to commit. Of course this has to be in a special local transaction log structure to be able to recover from various kinds of failures.
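The voting and decision phases described above can be sketched in a few lines of Python. This is a toy model only: `Participant` and `two_phase_commit` are hypothetical names, the `can_commit` flag stands in for whatever makes a real resource manager vote yes or no, and the durable transaction log that a real participant needs for recovery is omitted.

```python
class Participant:
    """Toy resource manager; `can_commit` simulates the outcome of its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # A real participant would durably log the transaction before voting yes.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1: collect votes; all() stops asking at the first "no".
    try:
        all_yes = all(p.prepare() for p in participants)
    except Exception:
        all_yes = False  # an error while voting counts as a "no"
    # Phase 2: apply the same decision to every participant.
    for p in participants:
        if all_yes:
            p.commit()
        else:
            p.rollback()
    return all_yes


ok = [Participant("registry"), Participant("filesystem")]
bad = [Participant("registry"), Participant("filesystem", can_commit=False)]
result_ok = two_phase_commit(ok)
result_bad = two_phase_commit(bad)
```

In the second run the registry had already voted yes, but it is still rolled back because the filesystem refused, which is the all-or-nothing behavior the question asks about.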

GAE: How to rollback a transaction?

I just read this great summary of GAE best practices: https://cloud.google.com/datastore/docs/best-practices
One of them is:
If a transaction fails, ensure you try to rollback the transaction.
The rollback minimizes retry latency for a different request
contending for the same resource(s) in a transaction. Note that a
rollback itself may fail, so the rollback should be a best-effort
attempt only.
I thought that transaction rollback was something that GAE did for you, but the above quote says that you should do it yourself.
The documentation here also says you should do a rollback but does not say how.
So, how do I rollback a transaction in GAE Python?
The best practices document is for using the Cloud Datastore directly through its API or client libraries.
This is only necessary in the App Engine flexible environment. Even in this case, the Cloud Datastore client library provides a context manager that handles rollbacks automatically. This example code is from the docs:
def transfer_funds(client, from_key, to_key, amount):
    # Leaving the with block normally commits; an exception rolls back.
    with client.transaction():
        from_account = client.get(from_key)
        to_account = client.get(to_key)
        from_account['balance'] -= amount
        to_account['balance'] += amount
        client.put_multi([from_account, to_account])
The docs state:
By default, the transaction is rolled back if the transaction block exits with an error
Be aware that the client library is still in Beta, so the behaviour could change in future.
In the standard App Engine environment, the ndb library provides automatic transaction rollback:
The NDB Client Library can group multiple operations in a single transaction. The transaction cannot succeed unless every operation in the transaction succeeds; if any of the operations fail, the transaction is automatically rolled back.
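For anyone who does need to roll back by hand, the "best-effort" advice from the best-practices quote can be sketched as a small context manager. This is an assumption-laden illustration, not the client library's implementation: `FakeTransaction` is a hypothetical object with `begin`, `commit`, and `rollback` methods, and `transaction` is a helper written for this example.

```python
from contextlib import contextmanager


class FakeTransaction:
    """Hypothetical transaction object that records which calls were made."""

    def __init__(self, rollback_fails=False):
        self.calls = []
        self.rollback_fails = rollback_fails

    def begin(self):
        self.calls.append("begin")

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")
        if self.rollback_fails:
            raise RuntimeError("rollback failed")


@contextmanager
def transaction(txn):
    """Commit on success; on error, attempt a rollback and re-raise."""
    txn.begin()
    try:
        yield txn
        txn.commit()
    except Exception:
        try:
            txn.rollback()
        except Exception:
            pass  # best effort: a failed rollback must not mask the original error
        raise


txn_ok = FakeTransaction()
with transaction(txn_ok):
    pass  # the transactional work would go here

txn_bad = FakeTransaction(rollback_fails=True)
surfaced = None
try:
    with transaction(txn_bad):
        raise ValueError("write failed")
except ValueError as exc:
    surfaced = str(exc)
```

Even when the rollback itself fails, the caller still sees the original `ValueError`, so it can decide whether to retry.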

Task Queues, Idempotence, and Objectify Transactions

The documentation for GAE's Task Queue API states:
You can enqueue a task as part of a datastore transaction, such that the task is only enqueued—and guaranteed to be enqueued—if the transaction is committed successfully.
However, the documentation for datastore transactions states twice that we should make them idempotent whenever possible, and submitting to a task queue is not idempotent. The documentation for objectify takes this a step further, explaining that work MUST be idempotent within its transactions.
So, is there a standard way to handle combining these recommendations/requirements, or should I roll my own technique (perhaps using something like this)?
There is also the concern that a task can execute twice (or more) - the queue provides "at least once" semantics not "exactly once" semantics. This is common.
Some operations are easy to make idempotent (eg, "set birthdate"). Some operations can be difficult to make idempotent (eg, "transfer $5 from account A to account B"). For the difficult ones, usually the trick involves creating a transaction id outside of the start of the transaction sequence and making sure that id follows the whole chain, even through tasks. If anything retries and sees the transaction id has already been completed, you can just return.
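A minimal sketch of the transaction-id trick, using in-memory dictionaries instead of the datastore. The names `accounts`, `completed_transfers`, and `transfer` are hypothetical; in a real application the "completed" marker would have to be written in the same datastore transaction as the balance changes.

```python
import uuid

accounts = {"A": 100, "B": 0}
completed_transfers = set()  # stand-in for a marker entity stored with the balances


def transfer(transfer_id, source, target, amount):
    """Move `amount` once per `transfer_id`; repeated calls are no-ops."""
    if transfer_id in completed_transfers:
        return False  # already applied by an earlier attempt
    accounts[source] -= amount
    accounts[target] += amount
    completed_transfers.add(transfer_id)  # must commit atomically with the balances
    return True


# The id is created once, before the first attempt, and reused on every retry.
transfer_id = str(uuid.uuid4())
first = transfer(transfer_id, "A", "B", 5)
second = transfer(transfer_id, "A", "B", 5)  # simulated retry or duplicate task run
```

The second call returns `False` and leaves the balances at 95 and 5, so a retried request or a task that runs twice does not move the money again.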
If the task was enqueued, then everything else in its associated transaction was committed as well. Yes, technically it is possible for a transaction to be committed and still return an error response (e.g. a timeout while waiting for the success response), though that is not common. In any case, your task should be idempotent as well (it could use the data committed in its own transaction to help with that), as the task could be executed more than once even if you submitted it once. See "Why Google App Engine Tasks can spuriously be executed more than once?".

Delivery of JMS message before the transaction is committed

I have a very simple scenario involving a database and a JMS in an application server (Glassfish). The scenario is dead simple:
1. an EJB inserts a row in the database and sends a message.
2. when the message is delivered with an MDB, the row is read and updated.
The problem is that sometimes the message is delivered before the insert has been committed in the database. This is actually understandable if we consider the 2 phase commit protocol:
1. prepare JMS
2. prepare database
3. commit JMS
4. (tiny little gap where the message can be delivered before the insert has been committed)
5. commit database
I've discussed this problem with others, but the answer was always: "Strange, it should work out of the box".
My questions are then:
How could it work out-of-the box?
My scenario sounds fairly simple, so why aren't there more people with similar troubles?
Am I doing something wrong? Is there a way to solve this issue correctly?
Here are a bit more details about my understanding of the problem:
This timing issue exists only if the participants are processed in this order. If the 2PC processes the participants in the reverse order (database first, then message broker), it should be fine. The problem occurred randomly but was completely reproducible.
I found no way to control the order of the participants in a distributed transaction in the JTA, JCA, or JPA specifications, nor in the Glassfish documentation. We could assume they are enlisted in the distributed transaction in the order in which they are used, but with an ORM such as JPA, it's difficult to know when the data is flushed and when the database connection is really used. Any ideas?
You are experiencing the classic XA 2-PC race condition. It does happen in production environments.
Three things come to mind:
1. Last agent optimization, where JDBC is the non-XA resource. (Loses recovery semantics.)
2. Set a JMS time-to-deliver. (Deliberately gives up real-time delivery.)
3. Build retries into the JDBC code. (Least effect on functionality.)
WebLogic has an LLR (Logging Last Resource) optimization that avoids this problem and still gives you all the XA guarantees.
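The third option, retrying on the consumer side, might look roughly like the sketch below. It is written in Python for brevity rather than as EJB/JDBC code, and `read_with_retry` and `fetch_row` are hypothetical names: the consumer looks the row up again a few times before concluding that it does not exist.

```python
import time


def read_with_retry(fetch_row, attempts=5, delay_seconds=0.01):
    """Call `fetch_row` until it returns a row, or give up after `attempts` tries."""
    for attempt in range(attempts):
        row = fetch_row()
        if row is not None:
            return row
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # give the database commit time to land
    return None


# Simulate a row that becomes visible only on the third lookup.
lookups = {"count": 0}


def fetch_row():
    lookups["count"] += 1
    if lookups["count"] >= 3:
        return {"id": 1, "status": "new"}
    return None


row = read_with_retry(fetch_row)
```

The attempt count and delay are arbitrary here; in practice they would be sized to the commit gap actually observed.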

Can I define a read-only transaction using GAE's JDO?

I'm using the latest versions of the GWT GAE w/ JDO stack.
I have a task queue updating persistent objects in the datastore.
I also have a GWT user interface displaying the same objects (without modification).
Given tightly defined transaction (start/commit) boundaries, is there a way for me to define a read-only transaction for the GUI that does not conflict with the task updating the objects?
I believe they are conflicting and throwing these exceptions (abridged):
javax.jdo.JDODataStoreException: Transaction rolled back due to failure during commit
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:402)
    at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:132)
    ....
NestedThrowablesStackTrace:
java.sql.SQLException: Concurrent Modification
    at org.datanucleus.store.appengine.DatastoreTransaction.commit(DatastoreTransaction.java:70)
The App Engine datastore actually uses optimistic concurrency, not locking. That means that a transaction that only reads will not interfere with, or cause contention for, other writes or transactions that include writes.
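To show the general idea of optimistic concurrency, here is a toy version-check scheme in Python. It illustrates the technique in general and makes no claim about how the datastore implements it; `VersionedStore` and `ConcurrentModification` are hypothetical names.

```python
class ConcurrentModification(Exception):
    """Raised when a commit is based on a stale version."""


class VersionedStore:
    """Toy store: each key maps to (value, version); commits check the version."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows.get(key, (None, 0))

    def commit(self, key, new_value, version_read):
        _, current = self.read(key)
        if current != version_read:
            raise ConcurrentModification(key)
        self.rows[key] = (new_value, current + 1)


store = VersionedStore()
store.commit("obj", "v1", 0)

# The GUI reader and two writers all read version 1.
gui_value, gui_version = store.read("obj")
_, writer_a_version = store.read("obj")
_, writer_b_version = store.read("obj")

store.commit("obj", "v2", writer_a_version)  # first writer wins
try:
    store.commit("obj", "v3", writer_b_version)  # stale version: rejected
    writer_b_failed = False
except ConcurrentModification:
    writer_b_failed = True
# The reader never commits a write, so it takes no lock and blocks no one.
```

Only the second writer fails, because its commit was based on an outdated version; the reader simply keeps the value it saw and never enters the conflict check.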

Resources