writing then reading entity does not fetch entity from datastore - google-app-engine

I am having the following problem. I am now using the low-level
google datastore API rather than JDO, that way I should be in a
better position to see exactly what is happening in my code. I am
writing an entity to the datastore and shortly thereafter reading it
from the datastore using Jetty and eclipse. Sometimes the written
entity is not being read. This would be a real problem if it were to
happen in production code. I am using the 2.0 RC2 API.
I have tried this several times, sometimes the entity is retrieved
from the datastore and sometimes it is not. I am doing a simple
query on the datastore just after committing a write transaction.
(If I run the code through the debugger things run slow enough
that the entity has a chance of being read back on the second pass).
Any help with this issue would be greatly appreciated,
Regards,

The development server has the same consistency guarantees as the High Replication datastore on the live server. A "global" query uses an index that is only guaranteed to be eventually consistent with writes. To perform a query with strongly consistent guarantees, the query must be limited to an entity group, using an "ancestor" key.
A typical technique is to group data specific to a single user in a group, so the user can see changes to queries limited to the user's group with strong consistency guarantees. Another technique is to use fancier client logic to update the client's local view as soon as the change is submitted, so the user sees the change in the UI immediately while the update to the global index is in progress.
See the docs on queries and transactions.

Related

GAE Golang - queries not returning data right after it is saved

I am writing a Google App Engine Go application and I'm having problem in testing some functionality. Here is some sample code. The problem is as follows:
I create a new item and save it to datastore
I make a search query for that item immediately after (for example, getting all items in the namespace)
The item is not there
If I query for the item later (say, in subsequent pages), the item is found normally. I understand this behaviour might be intentional for apps deployed on GAE, but I would expect the local datastore to be instantly good for queries.
Is there any way I can force the datastore to consolidate and be good for such queries?
This is called eventual consistency, and it's a feature of App Engine's Datastore.
You can use a get method instead of a query to test if an entity has been saved.
In Java we can set the desired behavior of a local datastore by changing a run parameter:
By default, the local datastore is configured to simulate the
consistency model of the High Replication Datastore, with the
percentage of datastore writes that are not immediately visible in
global queries set to 10%.
To adjust this level of consistency, set the
datastore.default_high_rep_job_policy_unapplied_job_pct system
property with a value corresponding to the amount of eventual
consistency you want your application to see.
I could not find something similar for Go.

GAE transaction failure and idempotency

The Google App Engine documentation contains this paragraph:
Note: If your application receives an exception when committing a
transaction, it does not always mean that the transaction failed. You
can receive DatastoreTimeoutException,
ConcurrentModificationException, or DatastoreFailureException
exceptions in cases where transactions have been committed and
eventually will be applied successfully. Whenever possible, make your
Datastore transactions idempotent so that if you repeat a transaction,
the end result will be the same.
Wait, what? It seems like there's a very important class of transactions that just simply cannot be made idempotent because they depend on current datastore state. For example, a simple counter, as in a like button. The transaction needs to read the current count, increment it, and write out the count again. If the transaction appears to "fail" but doesn't REALLY fail, and there's no way for me to tell that on the client side, then I need to try again, which will result in one click generating two "likes." Surely there is some way to prevent this with GAE?
Edit:
it seems that this is problem inherent in distributed systems, as per non other than Guido van Rossum -- see this link:
app engine datastore transaction exception
So it looks like designing idempotent transactions is pretty much a must if you want a high degree of reliability.
I was wondering if it was possible to implement a global system across a whole app for ensuring idempotency. The key would be to maintain a transaction log in the datastore. The client would generated a GUID, and then include that GUID with the request (the same GUID would be re-sent on retries for the same request). On the server, at the start of each transaction, it would look in the datastore for a record in the Transactions entity group with that ID. If it found it, then this is a repeated transaction, so it would return without doing anything.
Of course this would require enabling cross-group transactions, or having a separate transaction log as a child of each entity group. Also there would be a performance hit if failed entity key lookups are slow, because almost every transaction would include a failed lookup, because most GUIDs would be new.
In terms of the additional $ cost in terms of additional datastore interactions, this would probably still be less than if I had to make every transaction idempotent, since that would require a lot of checking what's in the datastore in each level.
dan wilkerson, simon goldsmith, et al. designed a thorough global transaction system on top of app engine's local (per entity group) transactions. at a high level, it uses techniques similar to the GUID one you describe. dan dealt with "submarine writes," ie the transactions you describe that report failure but later surface as succeeded, as well as many other theoretical and practical details of the datastore. erick armbrust implemented dan's design in tapioca-orm.
i don't necessarily recommend that you implement his design or use tapioca-orm, but you'd definitely be interested in the research.
in response to your questions: plenty of people implement GAE apps that use the datastore without idempotency. it's only important when you need transactions with certain kinds of guarantees like the ones you describe. it's definitely important to understand when you do need them, but you often don't.
the datastore is implemented on top of megastore, which is described in depth in this paper. in short, it uses multi-version concurrency control within each entity group and Paxos for replication across datacenters, both of which can contribute to submarine writes. i don't know if there are public numbers on submarine write frequency in the datastore, but if there are, searches with these terms and on the datastore mailing lists should find them.
amazon's S3 isn't really a comparable system; it's more of a CDN than a distributed database. amazon's SimpleDB is comparable. it originally only provided eventual consistency, and eventually added a very limited kind of transactions they call conditional writes, but it doesn't have true transactions. other NoSQL databases (redis, mongo, couchdb, etc.) have different variations on transactions and consistency.
basically, there's always a tradeoff in distributed databases between scale, transaction breadth, and strength of consistency guarantees. this is best known by eric brewer's CAP theorem, which says the three axes of the tradeoff are consistency, availability, and partition tolerance.
The best way I came up with making counters idempotent is using a set instead of an integer in order to count. Thus, when a person "likes" something, instead of incrementing a counter I add the like to the thing like this:
class Thing {
Set<User> likes = ....
public void like (User u) {
likes.add(u);
}
public Integer getLikeCount() {
return likes.size();
}
}
this is in java, but i hope you get my point even if you are using python.
This method is idempotent and you can add a single user for how many times you like, it will only be counted once. Of course, it has the penalty of storing a huge set instead of a simple counter. But hey, don't you need to keep track of likes anyway? If you don't want to bloat the Thing object, create another object ThingLikes, and cache the like count on the Thing object.
another option worth looking into is app engine's built in cross-group transaction support, which lets you operate on up to five entity groups in a single datastore transaction.
if you prefer reading on stack overflow, this SO question has more details.

Multiple puts on Google App Engine - all fail or all succeed?

I have a game server running on Google App Engine. Some of my HTTP requests result in multiple puts to different Models. So I have a user model and a game model for example and the one request may write to both. I am using python with the NDB database interface.
Is there a way to ensure both succeed or in the even that one succeeds and one fails to have them both fail? Transactions sounded like it might be the right thing but I'm not clear after reading the docs and it's talking about multiple requests and collisions.
I do see that a single put can take a list of entities but I don't see any mention of if one fails then they all fail behavior.
Yes, transactions are what you need. You need to structure your data properly with Ancestors in order to write them all within a transaction though.
If any of the puts fail, none of the puts within the transaction get written. Says so right in the documentation.
https://developers.google.com/appengine/docs/python/datastore/transactions
You can use XG transactions for update up to 5 entity groups (up to 5 object if you don't use parent-child model).
Every entity group (or object) have limit about 1 transaction/second.
ndb transaction: https://developers.google.com/appengine/docs/python/ndb/transactions
limitation of xg transactions: https://developers.google.com/appengine/docs/python/datastore/overview#Cross_Group_Transactions

Distributed store with transactions

I currently develop an application hosted at google app engine. However, gae has many disadvantages: it's expensive and is very hard to debug since we can't attach to real instances.
I am considering changing the gae to an open source alternative. Unfortunately, none of the existing NOSQL solutions which satisfy me support transactions similar to gae's transactions (gae support transactions inside of entity groups).
What do you think about solving this problem? I am currently considering a store like Apache Cassandra + some locking service (hazelcast) for transactions. Did anyone has any experience in this area? What can you recommend
There are plans to support entity groups in cassandra in the future, see CASSANDRA-1684.
If your data can't be easily modelled without transactions, is it worth using a non transcational database? Do you need the scalability?
The standard way to do transaction like things in cassandra is described in this presentation, starting at slide 24. Basically you write something similar to a WAL log entry to 1 row, then perform the actual writes on multiple rows, then delete the WAL log row. On failure, simply read and perform actions in the WAL log. Since all cassandra writes have a user supplied time stamp, all writes can be made idempotent, just store the time stamp of your write with the WAL log entry.
This strategy gives you the Atomic and Durable in ACID, but you do not get Consistency and Isolation. If you are working at scale that requires something like cassandra, you probably need to give up full ACID transactions anyway.
You may want to try AppScale or TyphoonAE for hosting applications built for App Engine on your own hardware.
If you are developing under Python, you have very interesting debugging options with the Werkzeug debugger.

How does Google App Engine infrastructure is fault tolerant?

I am actually implementing a web application on Google App Engine. This has taken me for the moment a huge time in re-designing the database and the application through GAE requirements and best practices.
My problem is this: How can I be sure that GAE is fault tolerant, or at what degree is it fault tolerant? I didn't find any documents in GAE on this, and it is an issue that could have drawbacks for me: My app would have, for example, to read an entity from the datastore, compute it in the application, and then put it on the datastore. In this case how could we be sure that this would be correctly done and that we get the right data : if for example the machine on which the computing have be done crash ?
Thank you for your help!
If a server crashes during a request, that request is going to fail, but any new requests would be routed to a different server. So one user might see an error, but the rest would not. The data in the datastore would be fine. If you have data that needs to be kept consistent, you would do your updates in a transaction, so that either the whole set of updates was applied or none.
Transactions operating on the same entity group are executed serially, but transactions operating on different entity groups run in parallel. So, unless there is a single entity which everything in your app wants to read and write, scalability will not suffer from transactions.

Resources