This might be a common question, but I am still not clear on a solution. I am using GAE's Datastore, interacting with it through Objectify. I have a question about concurrency.
Let's say we have an object; let's call it blarkar. User 'A' starts a transaction in which he will load blarkar from the datastore, modify it, and save the modified version back.
I understand that in a transaction either everything happens or nothing happens, but what I am unclear about is whether the blarkar object is marked as checked out, i.e. whether or not it can be overwritten while user 'A's transaction is in progress.
For instance, if another user, we'll call her 'B', tries to save a new version of blarkar after user 'A' has begun his transaction but before he has ended it, will her changes be overwritten when user 'A's transaction ends?
Again, I am using the GAE Datastore and interacting with it through Objectify. However, Objectify seems to simply expose the Datastore's transaction API.
Many thanks.
Read this: https://code.google.com/p/objectify-appengine/wiki/Concepts#Transactions
You can also google for Optimistic Concurrency.
The question is fairly simple.
Do Google Datastore transactions use Optimistic Concurrency Control or not?
One part of the documentation says that they do:
When a transaction starts, App Engine uses optimistic concurrency control by checking the last update time for the entity groups used in the transaction. Upon committing a transaction for the entity groups, App Engine again checks the last update time for the entity groups used in the transaction. If it has changed since our initial check, an exception is thrown. Source
Another part of the documentation indicates that they don't:
When a transaction is started, the datastore rejects any other attempts to write to that entity group before the transaction is complete. To illustrate this, say you have an entity group consisting of two entities, one at the root of the hierarchy and the other directly below it. If these entities belonged to separate entity groups, they could be updated in parallel. But because they are part of the same entity group, any request attempting to update one of the entities will necessarily prevent a simultaneous request from updating any other entity in the same group until the original request is finished. Source
As I understand it, the first quote tells me that it is fine to start a transaction, read an entity, and never close the transaction if I see no reason to update the entity.
The second quote tells me that if I start a transaction and read an entity, I should always remember to close it again; otherwise I cannot start a new one on the same entity.
Which part of the documentation is correct?
BTW: in case the correct quote is the second one, I am using Objectify to handle all my transactions. Will it remember to close all started transactions, even though no changes were made?
The commenter (Greg) is correct. Whether or not you explicitly close a transaction, all transactions are closed by the container at the end of a request. You can't "leak" transactions (although you could screw up transactions within a single request).
Furthermore, with Objectify's transaction API, transactions are automatically opened and closed for you when you execute a unit of Work. You don't manage transactions yourself.
To answer your root question: yes, all transactions in the GAE datastore are optimistic. There is no pessimistic locking in the datastore; you can start as many transactions as you want on a single entity group, but only the first commit will succeed. All subsequent attempts to commit will roll back with ConcurrentModificationException.
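For reference, this is roughly what the Work pattern looks like; the Counter entity and the increment use case here are made up for illustration:

```java
import com.googlecode.objectify.VoidWork;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import static com.googlecode.objectify.ObjectifyService.ofy;

// A stand-in entity; any registered @Entity works the same way.
@Entity
class Counter {
    @Id Long id;
    long count;
}

public class CounterService {
    public void increment(final long counterId) {
        // transact() opens the transaction, runs the work, and commits (or
        // rolls back) when vrun() returns; there is nothing to "close".
        // If the commit loses the optimistic-concurrency race, the datastore
        // throws ConcurrentModificationException and Objectify retries the Work.
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                Counter c = ofy().load().type(Counter.class).id(counterId).now();
                c.count++;
                ofy().save().entity(c);
            }
        });
    }
}
```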
I am trying to write a program in Google App Engine (Python) that continually runs a resident Backend working out what a series converges to. I want it to run in the Backend, write to the Datastore, and at any point in time you can tell which term the series is on and what its value is. The Backend only writes to one entity in the Datastore, so it does not overload storage or anything. The problem I run into, though, is that the Backend's writes are not visible to my frontend webpage until the Backend is shut down, which defeats the purpose of being able to continually check in on it. If there is some way to have the Backend write to the Datastore so the frontend page can check in on it, please tell me!
Datastore writes in a backend process should behave no differently than writes in your front end app, meaning that they should be available for read in your front end (nearly) instantly (within consistency constraints). Both backend and front end interact with the same datastore.
It sounds like you just need to implement a recurring write of the current status of your series (i.e. once every x cycles), instead of writing once at the end of the backend process.
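The question is Python, but the pattern is language-agnostic. A minimal sketch of the idea using the Java low-level datastore API; the SeriesStatus kind, the fixed key name, and the example series are all invented:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

public class SeriesBackend {
    private static final int WRITE_EVERY = 1000; // flush status every N terms

    public void run() {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        double sum = 0;
        for (long n = 1; !Thread.currentThread().isInterrupted(); n++) {
            sum += 1.0 / (n * n); // stand-in series
            if (n % WRITE_EVERY == 0) {
                // Reusing a fixed key name means the backend keeps updating
                // the same single entity instead of creating new ones.
                Entity status = new Entity("SeriesStatus", "current");
                status.setProperty("term", n);
                status.setProperty("value", sum);
                ds.put(status); // visible to the front end right away
            }
        }
    }
}
```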
Your post suggests two issues.
The first is "without shutting down". We don't guarantee that backends will run indefinitely. See the docs on Shutdown for some details.
The second issue, if I'm understanding you, is that you're not seeing values written by the backend until some time after they're written. You may be running into "eventual consistency", where "eventual" is usually pretty short, but can on rare occasions be surprisingly long. Understanding Isolation and Consistency can help here.
I am having the following problem. I am now using the low-level Google Datastore API rather than JDO, so that I should be in a better position to see exactly what is happening in my code. I am writing an entity to the datastore and shortly thereafter reading it back, using Jetty and Eclipse. Sometimes the written entity is not read back. This would be a real problem if it were to happen in production code. I am using the 2.0 RC2 API.
I have tried this several times; sometimes the entity is retrieved from the datastore and sometimes it is not. I am doing a simple query on the datastore just after committing a write transaction. (If I run the code through the debugger things run slowly enough that the entity has a chance of being read back on the second pass.)
Any help with this issue would be greatly appreciated.
Regards,
The development server has the same consistency guarantees as the High Replication datastore on the live server. A "global" query uses an index that is only guaranteed to be eventually consistent with writes. To perform a query with strongly consistent guarantees, the query must be limited to an entity group, using an "ancestor" key.
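A minimal sketch of the difference using the low-level API; the User and Task kinds are hypothetical:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Query;

public class AncestorQueryExample {
    public Iterable<Entity> writeAndReadBack(String userId) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        // Parent key defining the entity group.
        Key userKey = KeyFactory.createKey("User", userId);

        // Write a child entity into that group.
        Entity task = new Entity("Task", userKey);
        task.setProperty("done", false);
        ds.put(task);

        // new Query("Task") on its own would be a global query: only
        // eventually consistent, so it may miss the put above. Limiting
        // it to the ancestor makes it strongly consistent for the group.
        Query q = new Query("Task").setAncestor(userKey);
        return ds.prepare(q).asIterable();
    }
}
```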
A typical technique is to group data specific to a single user in a group, so the user can see changes to queries limited to the user's group with strong consistency guarantees. Another technique is to use fancier client logic to update the client's local view as soon as the change is submitted, so the user sees the change in the UI immediately while the update to the global index is in progress.
See the docs on queries and transactions.
I've noticed this behavior, which results in consecutive gets succeeding.
Has anybody else seen this?
I've found a way to remove one single entity from memcache, painful but it works.
Now, I use Java and Objectify, but I hope you'll find this useful whatever environment and language you use.
Go to the page https://console.cloud.google.com/appengine/memcache for your project.
Enter under Namespace the value "ObjectifyCache", or whatever namespace you use.
Under Key Type, select Java String
This is the tricky bit. Under Key you've got to enter the "URL-safe key" you'll find from the Datastore Edit page for your entity (https://console.cloud.google.com/datastore/entities/edit)
Click on Find and hopefully an entity will appear.
Check the box, and click on DELETE.
Now if you click on Find again, nothing will come up.
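If doing this by hand gets tedious, the same eviction can presumably be done in code. A sketch assuming, as the console steps above do, that Objectify stores cache entries under the entity's URL-safe key string in the "ObjectifyCache" namespace:

```java
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class CacheEviction {
    // Programmatic equivalent of the console steps above.
    public void evict(Key entityKey) {
        MemcacheService cache =
            MemcacheServiceFactory.getMemcacheService("ObjectifyCache");
        // KeyFactory.keyToString() produces the same URL-safe string shown
        // on the Datastore Edit page.
        cache.delete(KeyFactory.keyToString(entityKey));
    }
}
```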
If you're using the high-replication datastore, gets immediately after deletes may succeed and pull up the stale results. It takes up to a few seconds for the results of each operation to appear in the results of other operations.
Memcache operates independently of the datastore. Some libraries, like Objectify, connect them. If you're using Objectify to cache entities and you delete something outside of Objectify (e.g. in the data viewer), you'll have to update your cache yourself. This happens to me occasionally and I just wipe the whole memcache.
You have to find a way to work with this behavior. The simplest (expensive and really slow) method, for example, would just be to wait ten seconds after you do every datastore operation. Better methods might use a cache to return freshly stored or deleted entities.
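A rough sketch of that last idea: keep fresh writes and deletes in memcache for a short window so readers see them before the datastore indexes catch up. The 30-second window and the tombstone marker are arbitrary choices:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class ReadYourWrites {
    private static final String TOMBSTONE = "deleted";
    private final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

    public void save(Entity e) {
        ds.put(e);
        cache.put(e.getKey(), e, Expiration.byDeltaSeconds(30));
    }

    public void delete(Key key) {
        ds.delete(key);
        cache.put(key, TOMBSTONE, Expiration.byDeltaSeconds(30));
    }

    public Entity load(Key key) {
        Object cached = cache.get(key);
        if (TOMBSTONE.equals(cached)) return null;  // freshly deleted
        if (cached != null) return (Entity) cached; // freshly written
        try {
            return ds.get(key);
        } catch (EntityNotFoundException e) {
            return null;
        }
    }
}
```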
There's an app that starts a transaction on SQL Server 2008 and moves some data around. Then, while the transaction is still uncommitted, the app prints out some labels. It is very important that the transaction is not committed until printing succeeds; if a printing error occurs, everything is rolled back.
Now, the printing engine a) has grown quite huge and complex, and b) is needed from lots of places. It has therefore been decided to separate out the engine and make it a service.
Yes, it is possible to pass all the data required for printing from the client app to that server, so that the server only prints and is not concerned with databases. However, that would mean leaving piles of code and label templates in each application that requires printing; effectively, very little separation would then occur. On the contrary, it would be extremely efficient (and easier for me to write and maintain) to just pass the IDs of what is required to the service, which would then go to the database and get the data. All formats and layouts would be centralized, and apps would only ask for 5 delivery notes from print job 12345.
Now, this is not going to work, as the transaction is still uncommitted at the moment of printing. The service would not be able to read the data, and using READ UNCOMMITTED is not really an option.
I was going to use the good old sp_bindsession to join the two sessions, the app's and the service's, but it turns out it is deprecated and will be removed in a future release. The help suggests I use MARS or distributed transactions instead, but I can't see how they would help.
Any advice?
My gut feeling is that attempting to share a transaction between two processes in this way is not a good idea.
My approach would be either to pass all the data to the service, or to investigate alternatives to keeping the transaction open for the duration of the printing: would a simpler mechanism (such as an IsPrinted flag for each record) not suffice?
Failing that, the easiest way I can see of doing this would be to have the printing service pass all of its SQL requests back through to the originating process, so that they can be executed in the context of the original transaction.
Only sp_getbindtoken/sp_bindsession can do what you ask, and it is deprecated and will be removed.
In theory you should use short transactions, represent the 'printing' state as a committed state, and have compensating actions if the print fails. Also, if the printing engine is exposed as a service, it should be autonomous and receive as a message all the data it needs to print (like label templates). I understand this is easy for me to say, but it may be a major undertaking on the product.
For the moment I think your best bet is to use the session binding tokens. Although I have to call out that leaving transactions open for the duration of physical operations (printing) is a very bad practice.
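For completeness, the token handshake itself is small. A sketch of the two halves over JDBC; how the app transports the token to the service is up to you, and both procedures are deprecated as noted above:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public class SessionBinding {
    // In the originating app: get a token for the session that owns the
    // open transaction, then hand it to the print service (transport not shown).
    public static String getBindToken(Connection appConn) throws SQLException {
        try (CallableStatement cs = appConn.prepareCall("{call sp_getbindtoken(?)}")) {
            cs.registerOutParameter(1, Types.VARCHAR);
            cs.execute();
            return cs.getString(1);
        }
    }

    // In the print service: bind this session to the app's transaction so
    // that reads here see the app's uncommitted writes.
    public static void bind(Connection serviceConn, String token) throws SQLException {
        try (CallableStatement cs = serviceConn.prepareCall("{call sp_bindsession(?)}")) {
            cs.setString(1, token);
            cs.execute();
        }
    }
}
```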