The question is fairly simple.
Do Google Datastore transactions use optimistic concurrency control or not?
One part of the documentation says that they do:
When a transaction starts, App Engine uses optimistic concurrency control by checking the last update time for the entity groups used in the transaction. Upon committing a transaction for the entity groups, App Engine again checks the last update time for the entity groups used in the transaction. If it has changed since our initial check, an exception is thrown. Source
Another part of the documentation indicates that they don't:
When a transaction is started, the datastore rejects any other attempts to write to that entity group before the transaction is complete. To illustrate this, say you have an entity group consisting of two entities, one at the root of the hierarchy and the other directly below it. If these entities belonged to separate entity groups, they could be updated in parallel. But because they are part of the same entity group, any request attempting to update one of the entities will necessarily prevent a simultaneous request from updating any other entity in the same group until the original request is finished. Source
As I understand it, the first quote tells me that it is fine to start a transaction, read an entity, and never close the transaction if I see no reason to update the entity.
The second quote tells me that if I start a transaction and read an entity, I should always remember to close it again; otherwise I cannot start a new one on the same entity.
Which part of the documentation is correct?
BTW, in case the correct quote is the second one: I am using Objectify to handle all my transactions. Will it remember to close all started transactions, even though no changes were made?
The commenter (Greg) is correct. Whether or not you explicitly close a transaction, all transactions are closed by the container at the end of a request. You can't "leak" transactions (although you could screw up transactions within a single request).
Furthermore, with Objectify's transaction API, transactions are automatically opened and closed for you when you execute a unit of Work. You don't manage transactions yourself.
To answer your root question: yes, all transactions in the GAE datastore are optimistic. There is no pessimistic locking in the datastore; you can start as many transactions as you want on a single entity group, but only the first commit will succeed. All subsequent attempts to commit will roll back with a ConcurrentModificationException.
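To make that concrete, here is roughly what a unit of Work looks like (the Account entity and its fields are invented for this sketch). Objectify opens the transaction when transact() is entered, commits when the Work returns, and re-runs the whole Work automatically if the commit loses the race with a ConcurrentModificationException:

    import static com.googlecode.objectify.ObjectifyService.ofy;
    import com.googlecode.objectify.Work;

    // The transaction is opened and closed by transact(); on a
    // ConcurrentModificationException the Work is re-run, which is
    // why the body should be idempotent.
    Account updated = ofy().transact(new Work<Account>() {
        public Account run() {
            Account acct = ofy().load().type(Account.class).id(123L).now();
            acct.setBalance(acct.getBalance() + 10);
            ofy().save().entity(acct).now();
            return acct;
        }
    });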
Related
In many databases, when an operation is performed without explicitly starting a transaction, the database creates a new transaction implicitly.
Does the datastore do this?
If it does not, is there any model for reasoning about how the data changes in the absence of transactions? How do puts, fetches, and reads work outside of transactions?
If it does, is there any characterization of when and how? Does it always do this? What is the scope of the transaction?
A mutation (put, delete) of a single entity will always be atomic (succeed entirely or fail entirely). You can think of the single mutation as transactional, even if you did not provide a transaction.
However, if you send multiple mutations in the same non-transactional request, that overall request is not atomic. Each mutation may succeed or fail independently -- one failure will not cause the other mutations to be reverted.
"Transactions are an optional feature of the Datastore; you're not required to use transactions to perform Datastore operations."
So no transaction is automatically opened for you across more than a single-entity datastore operation.
A single-entity write behaves like a transaction internally, so if you change more than one entity, or write the same entity more than once, it is as if a transaction were opened and closed for each individual write.
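To make the distinction concrete, a minimal sketch with the low-level Java Datastore API (kinds and key names are invented):

    import com.google.appengine.api.datastore.*;
    import java.util.Arrays;

    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

    // Non-transactional batch put: each entity write is atomic on its own,
    // but the batch is not; one put may fail while the other succeeds.
    ds.put(Arrays.asList(new Entity("Thing", "a"), new Entity("Thing", "b")));

    // For all-or-nothing semantics, use an explicit transaction. Both
    // entities share an ancestor here, so they live in one entity group.
    Key parent = KeyFactory.createKey("Group", "g1");
    Transaction txn = ds.beginTransaction();
    try {
        ds.put(txn, Arrays.asList(new Entity("Thing", "c", parent),
                                  new Entity("Thing", "d", parent)));
        txn.commit();  // both writes apply, or neither does
    } finally {
        if (txn.isActive()) {
            txn.rollback();
        }
    }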
If I use the multitenancy feature of the GAE Datastore, will a datastore transaction lock be applied per tenant as well? Or, if a tenant is using a datastore transaction, will all of the other tenants have to wait until that tenant's transaction is finished?
Two things to note:
The namespace is part of the entity's key, so a transaction will only affect the entities that are part of your transaction. Entities in other namespaces will not be affected, even if they have the same IDs.
Transactions on GAE do not lock; they use optimistic concurrency control instead. Transactions therefore never block: when two transactions operate on the same entities, the second one fails and the runtime will automatically retry it up to three times. This auto-retry is the reason why your transactions should be idempotent (running the code multiple times should produce the same end result).
The scope of transactions is limited to entity groups. Namespaces (multi-tenancy) don't define an entity group on their own; the entity group is determined by the key (plus its potential ancestor path).
The only clash will be between multiple requests writing to the same entity group. With namespaces in use, that cannot happen between tenants.
https://developers.google.com/appengine/docs/go/datastore/entities#Go_Ancestor_paths
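A small sketch of the first point with the Java API (kind and key names are invented): the namespace that is current when a key is created becomes part of that key, so identical-looking keys in different namespaces belong to different entity groups.

    import com.google.appengine.api.NamespaceManager;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;

    // The namespace in effect at key-creation time is baked into the key.
    NamespaceManager.set("tenant-a");
    Key a = KeyFactory.createKey("Counter", "hits");

    NamespaceManager.set("tenant-b");
    Key b = KeyFactory.createKey("Counter", "hits");

    // Same kind and name, but a.equals(b) is false: these are separate
    // entity groups, so one tenant's transactions never contend with
    // another tenant's.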
I have a game server running on Google App Engine. Some of my HTTP requests result in multiple puts to different models. So I have a user model and a game model, for example, and one request may write to both. I am using Python with the NDB database interface.
Is there a way to ensure both succeed, or, in the event that one succeeds and one fails, to have them both fail? Transactions sound like they might be the right thing, but I'm not clear after reading the docs, which talk about multiple requests and collisions.
I do see that a single put can take a list of entities, but I don't see any mention of all-or-nothing behavior if one of them fails.
Yes, transactions are what you need. You need to structure your data properly with Ancestors in order to write them all within a transaction though.
If any of the puts fail, none of the puts within the transaction get written. Says so right in the documentation.
https://developers.google.com/appengine/docs/python/datastore/transactions
You can use XG transactions to update up to 5 entity groups (up to 5 objects if you don't use a parent-child model).
Every entity group (or standalone object) has a limit of about 1 transaction per second.
ndb transaction: https://developers.google.com/appengine/docs/python/ndb/transactions
limitation of xg transactions: https://developers.google.com/appengine/docs/python/datastore/overview#Cross_Group_Transactions
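The question is written against Python NDB, where the equivalent is decorating the function with @ndb.transactional(xg=True). As a sketch of the same idea in Java with the low-level API (kinds, keys, and properties are invented):

    import com.google.appengine.api.datastore.*;

    // Hypothetical helper: atomically update a User and a Game that live in
    // separate entity groups, using a cross-group (XG) transaction. An XG
    // transaction may touch at most 5 entity groups.
    void recordWin(Key userKey, Key gameKey) throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction(TransactionOptions.Builder.withXG(true));
        try {
            Entity user = ds.get(txn, userKey);
            Entity game = ds.get(txn, gameKey);
            user.setProperty("wins", (Long) user.getProperty("wins") + 1);
            game.setProperty("finished", true);
            ds.put(txn, user);
            ds.put(txn, game);
            txn.commit();  // both writes apply, or neither does
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }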
Making my way through the GAE documents.
I have a question I can't find an obvious answer to. Given that transactions on an entity group are limited to 1/sec, how can you scale a request where, say, 10,000 users all want to access a particular user's page at the same time?
Wouldn't this give you 10,000 reads on that particular user's entity group within one second, thereby causing catastrophic system failure and unhappy users?
Or am I confused, and only writes are contentious?
App Engine uses optimistic concurrency control for transactions, meaning that transactions do not lock the data but throw an exception when they detect that the data is "dirty". So the first transaction to change the data succeeds, while the second gets the exception and must retry.
Given this, I assume that reads do not block if they are not part of a transaction, even if some other transaction is in progress.
Also, to make transactions less of a bottleneck, one should carefully organize entity groups: make them as small as possible and organize them so that there is as little contention (parallel requests on the same group) as possible. Meaning:
Have small entity groups: do not put a lot of entities under a common parent.
Try making the user entity a root parent, as sketched below. Users usually do not create parallel transactions (e.g. make multiple money transfers at the same time).
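A sketch of that layout (kinds and names are invented): each user is the root of their own small entity group, so the roughly 1 write/sec limit applies per user instead of across all users.

    import com.google.appengine.api.datastore.*;

    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

    // The user entity is the root of its own entity group...
    Key userKey = KeyFactory.createKey("User", "alice");

    // ...and the user's records hang directly under it, so a transaction can
    // cover one user's data while other users' groups stay independent.
    Entity transfer = new Entity("Transfer", userKey);
    transfer.setProperty("amount", 25L);
    ds.put(transfer);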
Right. I wasn't thinking. The answer is memcache, at least partially. That, and an efficient data model/schema.
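For example, a hot user page can be served from memcache, falling back to the datastore only on a miss (the key layout here is hypothetical). Reads do not contend anyway, but this keeps most of the traffic off the datastore entirely:

    import com.google.appengine.api.datastore.*;
    import com.google.appengine.api.memcache.*;

    // Hypothetical read path for a frequently viewed user entity.
    Entity loadUser(Key userKey) throws EntityNotFoundException {
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
        Entity user = (Entity) cache.get(userKey);
        if (user == null) {
            user = DatastoreServiceFactory.getDatastoreService().get(userKey);
            cache.put(userKey, user, Expiration.byDeltaSeconds(60));  // cache 1 min
        }
        return user;
    }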
I am using Hibernate in an Eclipse RAP application. I have database tables mapped to classes with Hibernate, and these classes have properties that are fetched lazily (if they weren't fetched lazily, I would probably end up loading the whole database into memory on my first query). I do not synchronize database access, so there are multiple Hibernate Sessions for the users, and I let the DBMS handle transaction isolation. This means different instances of fetched data will belong to different users. There are things that, if a user changes them, I would like to update across multiple users. I am currently thinking about using Hibernate's session.refresh(object) in these cases to refresh the data, but I'm unsure how this will impact performance when refreshing multiple objects, or whether it's the right way to go.
Hope my problem is clear. Is my approach to the problem OK, or is it fundamentally flawed, or am I missing something? Is there a general solution for this kind of problem?
I would appreciate any comments on this.
The general solution is
to have transactions as short as possible
to link the session lifecycle to the transaction lifecycle (this is the default: the session is closed when the transaction is committed or rolled back)
to use optimistic concurrency control (optimistic locking) to avoid two transactions updating the same object at the same time (see the sketch after this list).
If each transaction is very short and transaction A updates some object from O to O', then a concurrent transaction B will only see O until A commits or rolls back, and any other transaction started after A commits will see O', because a new session starts with the transaction.
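With Hibernate, the usual way to get the optimistic locking from the last point is a version column. A minimal sketch with JPA annotations (the entity and its fields are invented):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Document {
        @Id
        private Long id;

        // Hibernate increments this on every update and includes it in the
        // UPDATE's WHERE clause; if another transaction changed the row
        // first, the update matches zero rows and Hibernate throws a
        // StaleObjectStateException (OptimisticLockException under JPA)
        // instead of silently overwriting the other change.
        @Version
        private long version;

        private String title;

        // getters and setters omitted
    }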
We maintain an application that does exactly what you are trying to accomplish. Yes, every session.refresh() will hit the database, but since all sessions will refresh the same row at the same time, the DB server will answer all of these queries from memory.
The only thing that you still need to solve is how to propagate the information that something has changed and needs reloading to all the other sessions, possibly even to sessions on a different host.
For our application, we have about 30 users on RCP and 10-100 users on RAP instances that all connect to the very same DB backend (though through pgpool). We use a small network service that every runtime connects to; when a transaction commits, the application tells this change service that "row id X of table T" has changed and this is then propagated to all other "change subscribers", even across JVMs.
But: make sure that session.refresh() is called within the Thread that belongs to that session, possibly the RAP-Display thread. Do not call refresh() from Jobs or other unrelated threads.
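For instance, a change-service callback could hop onto the UI thread that owns the session before refreshing (the names here are hypothetical):

    import org.eclipse.swt.widgets.Display;
    import org.hibernate.Session;

    // Run the refresh in the thread that owns the Hibernate Session, not in
    // the notification thread that delivered the change event.
    void onRowChanged(final Display display, final Session session, final Object entity) {
        display.asyncExec(new Runnable() {
            public void run() {
                session.refresh(entity);  // re-reads the row's current state
                // ...then update viewers/widgets from here as well
            }
        });
    }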
As long as you don't have a large number of users updating large numbers of rows in a short time, I guess you won't have to worry about performance.