GAE datastore contention avoidance? - google-app-engine

I'm making my way through the GAE documentation.
I have a question I can't find an obvious answer to. Given that transactions on an entity group are limited to roughly one per second, how do you scale a request where, say, 10,000 users all want to access a particular user's page at the same time?
Wouldn't this give you 10,000 reads on that user's entity group within one second, thereby causing catastrophic system failure and unhappy users?
Or am I confused, and only writes are subject to contention?

App Engine uses optimistic concurrency control for transactions: it does not lock the data, but throws an exception when it detects that the data is "dirty" (changed since the transaction read it). So the first transaction to change the data succeeds, while the second gets the exception and must retry.
Given this, I assume that reads do not block if they are not part of a transaction, even while some other transaction is in progress.
Also, to make transactions less of a bottleneck, organize entity groups carefully: make them as small as possible, and arrange them so that there is as little contention (parallel requests) as possible. Meaning:
Have small entity groups - do not put a lot of entities under a common parent.
Try having the user entity as the root parent. Users usually do not create parallel transactions (e.g. make multiple money transfers at the same time).
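The retry-on-conflict behaviour described above can be sketched without any App Engine dependency. This is a minimal in-memory illustration, not the Datastore API: `VersionedCell`, `commit` and `incrementWithRetry` are hypothetical names, and the competing writer is simulated with a `Runnable`.

```java
import java.util.ConcurrentModificationException;

final class OptimisticRetryDemo {
    /** In-memory stand-in for one entity; NOT the App Engine Datastore API. */
    static final class VersionedCell {
        private long version = 0;
        private int value = 0;

        synchronized long version() { return version; }
        synchronized int value() { return value; }

        /** Commits only if nobody has written since readVersion was observed. */
        synchronized void commit(long readVersion, int newValue) {
            if (readVersion != version) {
                throw new ConcurrentModificationException("data is dirty, retry");
            }
            value = newValue;
            version++;
        }
    }

    /** Read, compute, commit; on a conflict, re-read and try again. Returns attempts used. */
    static int incrementWithRetry(VersionedCell cell, int maxAttempts, Runnable rival) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long seen = cell.version();
            int current = cell.value();
            if (attempt == 1) {
                rival.run(); // simulate a competing writer sneaking in once
            }
            try {
                cell.commit(seen, current + 1);
                return attempt;
            } catch (ConcurrentModificationException e) {
                // Lost the race: loop around and retry against fresh data.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        Runnable rival = () -> cell.commit(cell.version(), cell.value() + 10);
        int attempts = incrementWithRetry(cell, 3, rival);
        System.out.println("attempts=" + attempts + " value=" + cell.value());
    }
}
```

Nothing is ever locked in this sketch: the losing writer only finds out at commit time, which is why small entity groups with few parallel writers keep retries rare.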

Right. I wasn't thinking. The answer is memcache, at least partially. That, and an efficient data model/schema.
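As a rough illustration of why a cache helps with the 10,000-readers scenario, here is a read-through cache sketch. It assumes a `ConcurrentHashMap` as a stand-in for memcache and a hypothetical loader function in place of the real datastore read; none of these names come from the App Engine API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class ReadThroughCacheDemo {
    /** Hypothetical read-through cache; a ConcurrentHashMap stands in for memcache. */
    static final class ReadThroughCache<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader; // the slow backing-store read

        ReadThroughCache(Function<K, V> loader) {
            this.loader = loader;
        }

        /** Serves from the cache; only a miss reaches the backing store. */
        V get(K key) {
            return cache.computeIfAbsent(key, loader);
        }

        /** Call after a write so the next read reloads fresh data. */
        void invalidate(K key) {
            cache.remove(key);
        }
    }

    public static void main(String[] args) {
        AtomicInteger datastoreReads = new AtomicInteger();
        ReadThroughCache<String, String> pages = new ReadThroughCache<>(userId -> {
            datastoreReads.incrementAndGet(); // count how often the store is hit
            return "profile page of " + userId;
        });
        for (int i = 0; i < 10_000; i++) {
            pages.get("alice"); // 10,000 views of the same user's page
        }
        System.out.println("datastore reads: " + datastoreReads.get());
    }
}
```

A real memcache can evict entries at any time, so the loader has to remain correct on its own; the cache only reduces how often it runs.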

Related

When is data consistency not an issue?

I am new to distributed systems. I have read about the CAP theorem, and I am interested in AP systems such as Cassandra.
My question is: in what cases can you actually sacrifice consistency? As I understand it, sacrificing consistency means serving inaccurate data. In what cases would you actually use an AP datastore like Cassandra? I can't think of any case where I wouldn't want my reads to be consistent.
By AP system, I assume you will at least aim for eventual consistency.
Imagine you're developing a social network where users have friends and their own news feeds. It doesn't matter if a particular user's feed occasionally lags by five minutes (the feed is eventually consistent). Missing the two or three most recent updates is okay in this scenario as long as they eventually appear. In fact, Cassandra itself was originally developed at Facebook, to power its inbox search.
Imagine a distributed key-value cache where updates are very rare. If there are almost no update operations, ensuring strong consistency is unnecessary, so you can focus on availability. An occasional cache miss (the key-value entry is not populated yet) that falls through to the database because of eventual consistency should be okay.
My question is in what cases can you actually sacrifice consistency?
One case would be when building a recommendation engine data set and serving it with Cassandra. These data sets are essentially the aggregation of many, many users to determine purchasing/viewing patterns.
For example: if I add a Rey Star Wars action figure to my shopping cart, the underlying recommendation engine runs a query for similar purchasing patterns among others who have also purchased an action figure of Rey. The query returns the top 5 product results and puts them at the bottom of the page.
Those 5 products returned are the result of analysis and aggregation of several thousand prior purchases. Let's assume that some of that data isn't consistent, causing a variance in the 5 products returned. Is that really a big deal?
tl;dr: the real question to ask is whether getting a somewhat-accurate list of 5 product recommendations in less than 10ms is better than getting a 100% accurate list of 5 product recommendations in 100ms.
Both result sets will help drive sales. But the one which is returned fast enough that it doesn't hinder the user experience is much more preferred.
The 'C' in CAP refers to linearizability, which is a very strong form of consistency that you don't need most of the time.
Linearizability is a recency guarantee which makes it appear that there is a single copy of data. As soon as you make a change in the data, all subsequent reads will return the changed data. Such a level of consistency is expensive and doesn't scale well. Yet in certain scenarios we need linearizability, viz.
Leader election
Allowing end users to create their unique user id
Distributed locking etc.
When you have these use cases, you'd use something like ZooKeeper or etcd. Cassandra also has lightweight transactions (LWT), which use an extension of the classic Paxos algorithm to implement linearizability. This feature can be used to address those rare use cases where you must have linearizability and serializability, but it is expensive. In the vast majority of cases you are just fine with slightly weaker consistency in exchange for better scalability and performance. You trade a little bit of consistency for scalability and performance.
Some eCommerce websites send apology letters to customers for not being able to fulfill their orders. That is because the last copy of the product was sold to more than one customer due to the lack of linearizability. They prefer to deal with that over not being able to scale with the customer base and not being able to respond to requests within stringent SLAs.
Cassandra is said to have tunable consistency. You may want to record user clicks or activities for analysis. You are okay if some data is lost, but you cannot compromise on performance. You'd probably use a write consistency level of ANY with hints enabled (sloppy quorum).
If you want a little more consistency, you'd use a QUORUM consistency level for both reads and writes, along with hints and read repair. In the vast majority of cases all nodes are updated almost instantaneously. Even if one or two nodes go down, a majority of nodes will still have the data, and the failed nodes are repaired when they come back using hints, read repair and anti-entropy repair.
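The reason QUORUM reads combined with QUORUM writes give stronger guarantees is simple arithmetic: with N replicas, a write acknowledged by W nodes and a read that contacts R nodes must share at least one replica whenever R + W > N. A small sketch of that check follows; the method names are illustrative, not Cassandra API.

```java
final class QuorumDemo {
    /** Quorum size for a given replication factor: a strict majority of replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must overlap every write set in at least one replica. */
    static boolean readsOverlapWrites(int n, int w, int r) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;         // replication factor
        int q = quorum(n); // 2 of 3
        System.out.println("quorum size: " + q);
        System.out.println("QUORUM/QUORUM overlap: " + readsOverlapWrites(n, q, q));
        System.out.println("ONE/ONE overlap: " + readsOverlapWrites(n, 1, 1));
    }
}
```

Overlap only means a read reaches at least one replica holding the latest acknowledged write; on its own it does not provide linearizability.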
Cassandra is particularly useful when you do not have many concurrent updates to the same data. The reason is that, unlike the original Dynamo architecture, it does not use vector clocks for conflict resolution between replicas. Instead it uses Last Write Wins (LWW) based on timestamps; if the timestamps are equal, it falls back to lexicographical order of the values. Since clocks on different nodes are never perfectly synchronized, even with ntpd running, there is a possibility of data loss, although Cassandra has taken some steps to reduce it, e.g. using client-side timestamps instead of server-side timestamps.
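To make the data-loss risk concrete, here is a minimal sketch of Last Write Wins reconciliation. It is not Cassandra code: the `Versioned` type and `reconcile` method are hypothetical, and the tie-break direction (the lexicographically greater value wins) is an assumption of this sketch.

```java
final class LastWriteWinsDemo {
    /** One replica's copy of a cell: the value plus the timestamp attached to the write. */
    static final class Versioned {
        final String value;
        final long timestampMicros;

        Versioned(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    /** Higher timestamp wins; on a tie, the lexicographically greater value wins (assumed). */
    static Versioned reconcile(Versioned a, Versioned b) {
        if (a.timestampMicros != b.timestampMicros) {
            return a.timestampMicros > b.timestampMicros ? a : b;
        }
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // The first write is stamped by a node whose clock runs ahead.
        Versioned first = new Versioned("old address", 1_000_005L);
        // The second write happens later in real time but carries a lower timestamp.
        Versioned second = new Versioned("new address", 1_000_002L);
        System.out.println(reconcile(first, second).value); // the later write is silently dropped

        Versioned x = new Versioned("apple", 42L);
        Versioned y = new Versioned("banana", 42L);
        System.out.println(reconcile(x, y).value); // equal timestamps: value order decides
    }
}
```

In the first case the update that really happened last loses purely because of clock skew, which is the kind of loss vector clocks are designed to detect.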
The CAP theorem says that, given partition tolerance, you must choose between availability and consistency in a distributed database (no one would want to give up partition tolerance in any case). So if you want maximum availability, you have to give up some consistency. How much, of course, depends on how critical the data is to the business.
You answered something on SO but the answer doesn't show up when you visit the page? Can be tolerated. SO being down? Can't be. Critical financial systems would rather have strong consistency than availability. Every once in a while, my bank's servers go offline when I try to make a payment.
Normally, you choose availability and eventual consistency. The answer you wrote into SO would eventually show up.
Apart from the above mentioned cases where inconsistent data is tolerable, there are also scenarios where we can defer to the user to solve the inconsistency.
For example, if we find two different versions of someone's address in the database, we can prompt the user to identify the correct address.

Simultaneous access to database; keeping data consistent across all connections

I made a database system right here:
(comments on the normalization are highly appreciated as well - I have a feeling you'll hate me on what I did with tblIsolateSensitivity; tblHAIFile only has a bunch of Boolean fields and foreign keys).
Let's say we have x number of terminals accessing the database. X1 edits Patient 01, X2 edits Patient 02, X3 deletes Patient 01 at the same time. How can I ensure that the data between the three terminals are all up-to-date and consistent?
At the moment, I query the data only when a query is needed (i.e. when the user searches for a record, or when the program needs to verify something against a database record), meaning the data is only as up to date as the most recent query the user made. This makes it difficult to ensure that the data is up to date on all terminals. Of course, for deleted entries I have error handling, but for the rest, well...
So, my question is: how do you guys typically handle this kind of situation? Is there a name for this concept so that I can look it up and read more?
From a database design perspective, you should read up on optimistic concurrency and pessimistic concurrency. These are two options for making sure that you either don't have two users modifying the same record at the same time, or at least if you do allow that, the conflict is detected so it can be resolved.
The basic idea behind optimistic concurrency is that you allow multiple users to view and modify the data at the same time, on the assumption that conflicts will be relatively rare. However, before any user writes changes to the data, a check is made to ensure that the underlying data hasn't changed since it was originally read. In some cases you do this manually with a read before update, checking each column value against a cached value. However, that is cumbersome. Some DBMSs have features that make this simpler. For example, SQL Server has the ROWVERSION (formerly known as TIMESTAMP) data type, which lets you check easily, using a single value, whether someone else has changed a record since the last time you read it.
The basic idea behind pessimistic concurrency is that you put a lock on a record in the expectation that you're going to change it. While you hold the lock, the DBMS will prevent anyone else from getting their own lock.
The advantage of optimistic concurrency is that it's pretty lightweight, doesn't interfere too much with your application, and lets you manually (or automatically) resolve any conflicts on those rare occasions when they happen. You also don't have to worry about someone reading a record, locking it and then going home for the weekend.
The advantage of pessimistic concurrency is that it prevents collisions, but it can stop one user from working while they wait for another to finish what they're doing.
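The version check behind optimistic concurrency can be sketched with an in-memory table. This is an illustration only: `PatientTable` and its methods are hypothetical, and `update` mimics what `UPDATE ... WHERE id = ? AND version = ?` would do in a real DBMS, where zero affected rows signals a conflict.

```java
import java.util.HashMap;
import java.util.Map;

final class RowVersionDemo {
    /** Immutable snapshot of a row as one terminal saw it. */
    static final class Row {
        final String data;
        final long version;

        Row(String data, long version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Hypothetical in-memory table keyed by patient id. */
    static final class PatientTable {
        private final Map<Integer, Row> rows = new HashMap<>();

        synchronized void insert(int id, String data) { rows.put(id, new Row(data, 1)); }
        synchronized Row read(int id) { return rows.get(id); }
        synchronized void delete(int id) { rows.remove(id); }

        /** Succeeds only if the row still exists at the version the caller read. */
        synchronized boolean update(int id, long expectedVersion, String newData) {
            Row current = rows.get(id);
            if (current == null || current.version != expectedVersion) {
                return false; // changed or deleted by someone else since it was read
            }
            rows.put(id, new Row(newData, current.version + 1));
            return true;
        }
    }

    public static void main(String[] args) {
        PatientTable table = new PatientTable();
        table.insert(1, "Patient 01");
        Row seenByX1 = table.read(1); // both terminals read version 1
        Row seenByX2 = table.read(1);
        System.out.println(table.update(1, seenByX1.version, "edited by X1")); // true
        System.out.println(table.update(1, seenByX2.version, "edited by X2")); // false: stale
    }
}
```

When `update` returns false, the application can reload the row and ask the user how to proceed, which also covers the case where another terminal deleted the record.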
From the perspective of notifying users when records change in the background (i.e. they're changed by another user) that isn't a database design feature. It may be a feature of your application logic or of your application's data access layer.

What exactly is the throughput restriction on an entity group in Google App Engine's datastore?

The documentation describes a limitation on the throughput to an entity group in the datastore, but is vague on what exactly the limitation is. My confusion is in two parts:
1. What is being restricted?
Specifically, is it:
The number of writes?
Number of transactions that write to the datastore?
Number of transactions regardless of whether it reads or writes to the datastore?
2. What is the type of the restriction?
Specifically, is it:
An artificially enforced one-per-second hard rule?
An empirically observed max throughput, that may in practice be better based on factors like network load, etc.?
There's no throughput restriction per se, but to guarantee atomicity in transactions, updates must be serialized and applied sequentially and in order, so if you make enough of them things will start to fail/timeout. This is called datastore contention:
Datastore contention occurs when a single entity or entity group is updated too rapidly. The datastore will queue concurrent requests to wait their turn. Requests waiting in the queue past the timeout period will throw a concurrency exception. If you're expecting to update a single entity or write to an entity group more than several times per second, it's best to re-work your design early-on to avoid possible contention once your application is deployed.
To directly answer your question in simple terms: it's specifically the number of writes per entity group (roughly 5 per second), and it's just a rule of thumb; your mileage may vary (greatly).
Some people have reported no contention at all, while others struggle to get more than 1 update per second. As you can imagine, this depends on the complexity of the operation and the load on all the machines involved in executing it.
Limits:
writes per second to an entity group
entity groups per cross-entity-group transaction (XG transaction)
There is a limit of 1 write per second per entity group. This is a documented limit that in practice appears to be a 'soft' limit: it is possible to exceed it, but exceeding it is not guaranteed to be allowed. Transactions 'block' if the entity has been written to in the last second; however, the API allows for transient exceptions to occur as well. Obviously you would be susceptible to timeouts too.
This does not affect the overall number of transactions for your app, just specifically related to that entity group. If you need to, you can design portions of your data model to get around this limitation.
There is a limit of 25 entity groups per XG transaction, meaning a transaction cannot involve more than 25 entity groups in its context (reads, writes, etc.). This used to be a limit of 5 but was recently increased.
So to answer your direct questions:
Writes for the entire entity group (as defined by the root key) within a second window (which is not strict)
artificially enforced one-per-second soft rule
If you are asking that question, then the Google DataStore is probably not for you.
The Google DataStore is an experimental database whose API can change at any time; it is also meant for retail apps and non-critical applications.
A clear indication of this is what you see when you sign up for the DataStore: something like no commitment to backwards compatibility, etc. Another indication is the lack of clear examples and of wrappers providing a simple API for accessing the DataStore; the examples on the net are a soup of complicated installations and procedures just to make a simple query.
My own conclusion so far, after days of research, is that Google DataStore is not ready for commercial use, but looks promising once it is finished and in a stable release.
When you search the net and look at the few Google examples, if there are any at all, what stands out is what is not mentioned rather than what is, which is next to nothing ..... ;-) If you look at the vendors "supporting" Google DataStore, they simply link to the Google DataStore site for further information, which says nothing, so you go round in a circle where nothing concrete is ever stated ....

GAE transaction failure and idempotency

The Google App Engine documentation contains this paragraph:
Note: If your application receives an exception when committing a
transaction, it does not always mean that the transaction failed. You
can receive DatastoreTimeoutException,
ConcurrentModificationException, or DatastoreFailureException
exceptions in cases where transactions have been committed and
eventually will be applied successfully. Whenever possible, make your
Datastore transactions idempotent so that if you repeat a transaction,
the end result will be the same.
Wait, what? It seems like there's a very important class of transactions that just simply cannot be made idempotent because they depend on current datastore state. For example, a simple counter, as in a like button. The transaction needs to read the current count, increment it, and write out the count again. If the transaction appears to "fail" but doesn't REALLY fail, and there's no way for me to tell that on the client side, then I need to try again, which will result in one click generating two "likes." Surely there is some way to prevent this with GAE?
Edit:
It seems that this is a problem inherent in distributed systems, according to none other than Guido van Rossum -- see this link:
app engine datastore transaction exception
So it looks like designing idempotent transactions is pretty much a must if you want a high degree of reliability.
I was wondering if it is possible to implement a global system across a whole app for ensuring idempotency. The key would be to maintain a transaction log in the datastore. The client would generate a GUID and include it with the request (the same GUID would be re-sent on retries of the same request). On the server, at the start of each transaction, it would look in the datastore for a record in the Transactions entity group with that ID. If it found one, this is a repeated transaction, so it would return without doing anything.
Of course this would require enabling cross-group transactions, or having a separate transaction log as a child of each entity group. There would also be a performance hit if failed entity key lookups are slow, since most GUIDs would be new and so almost every transaction would include a failed lookup.
In terms of the extra cost ($) of the additional datastore interactions, this would probably still be cheaper than making every transaction idempotent, since that would require a lot of checking what's in the datastore at each level.
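The GUID idea can be sketched in memory as follows. This is only an illustration under stated assumptions: `LikeService` is a hypothetical class, a `HashSet` plays the role of the transaction log, and the `synchronized` method stands in for a single datastore transaction that checks the log and applies the update atomically.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

final class IdempotentLikeDemo {
    /** Hypothetical service: one like per request ID, however often it is retried. */
    static final class LikeService {
        private final Set<String> appliedRequestIds = new HashSet<>(); // the "transaction log"
        private int likeCount = 0;

        /** Returns true if the like was applied, false if this request ID was seen before. */
        synchronized boolean like(String requestId) {
            if (!appliedRequestIds.add(requestId)) {
                return false; // replay of an already-applied request: do nothing
            }
            likeCount++;
            return true;
        }

        synchronized int likeCount() { return likeCount; }
    }

    public static void main(String[] args) {
        LikeService service = new LikeService();
        String requestId = UUID.randomUUID().toString(); // generated once by the client

        service.like(requestId); // commit succeeds, but the client sees an exception
        service.like(requestId); // client retries with the SAME request ID

        System.out.println("likes: " + service.likeCount());
    }
}
```

The log entry and the counter update have to commit together; if they lived in separate transactions, a failure between the two would reintroduce the double-count problem.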
dan wilkerson, simon goldsmith, et al. designed a thorough global transaction system on top of app engine's local (per entity group) transactions. at a high level, it uses techniques similar to the GUID one you describe. dan dealt with "submarine writes," ie the transactions you describe that report failure but later surface as succeeded, as well as many other theoretical and practical details of the datastore. erick armbrust implemented dan's design in tapioca-orm.
i don't necessarily recommend that you implement his design or use tapioca-orm, but you'd definitely be interested in the research.
in response to your questions: plenty of people implement GAE apps that use the datastore without idempotency. it's only important when you need transactions with certain kinds of guarantees like the ones you describe. it's definitely important to understand when you do need them, but you often don't.
the datastore is implemented on top of megastore, which is described in depth in this paper. in short, it uses multi-version concurrency control within each entity group and Paxos for replication across datacenters, both of which can contribute to submarine writes. i don't know if there are public numbers on submarine write frequency in the datastore, but if there are, searches with these terms and on the datastore mailing lists should find them.
amazon's S3 isn't really a comparable system; it's more of a CDN than a distributed database. amazon's SimpleDB is comparable. it originally only provided eventual consistency, and eventually added a very limited kind of transactions they call conditional writes, but it doesn't have true transactions. other NoSQL databases (redis, mongo, couchdb, etc.) have different variations on transactions and consistency.
basically, there's always a tradeoff in distributed databases between scale, transaction breadth, and strength of consistency guarantees. this is best known by eric brewer's CAP theorem, which says the three axes of the tradeoff are consistency, availability, and partition tolerance.
The best way I came up with making counters idempotent is using a set instead of an integer in order to count. Thus, when a person "likes" something, instead of incrementing a counter I add the like to the thing like this:
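    import java.util.HashSet;
    import java.util.Set;
    // User is assumed to implement equals() and hashCode() so duplicates collapse.
    class Thing {
        // One entry per user; HashSet is an assumed choice (the original elided the initializer).
        private final Set<User> likes = new HashSet<>();
        // Idempotent: adding the same user again leaves the set unchanged.
        public void like(User u) {
            likes.add(u);
        }
        // The like count is simply the number of distinct users in the set.
        public int getLikeCount() {
            return likes.size();
        }
    }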
this is in java, but i hope you get my point even if you are using python.
This method is idempotent: you can add the same user however many times you like, and they will only be counted once. Of course, it has the penalty of storing a huge set instead of a simple counter. But hey, don't you need to keep track of likes anyway? If you don't want to bloat the Thing object, create another object, ThingLikes, and cache the like count on the Thing object.
another option worth looking into is app engine's built in cross-group transaction support, which lets you operate on up to five entity groups in a single datastore transaction (a limit that was later raised to 25).
if you prefer reading on stack overflow, this SO question has more details.

Google App Engine HRD - what if I exceed the 1 write per second limit for writing to the entity group?

According to the Google App Engine documentation, writing to one entity group is limited to one write per second when using High Replication Datastore. So...
What happens if I exceed this limit? Some kind of exception? And what should I do?
How do I know that I'm close to exceeding this limit? I can design the application in a way that particular actions (adding an entity...) are not likely to happen often but naturally I can't guarantee that.
Based on the GAE documentation and from my limited experience:
Expect a rate of about 1 write per second and tune your app accordingly.
Sharding is a common pattern to handle datastore contention.
Always add defensive code to handle every possible exception (Application Error 5, The datastore operation timed out, Transaction collision for entity group, ...).
In case of error, retry the write by moving the task to a suitable task queue or, if you can, just ask the user to try again.
Retrying a write usually works.
When possible, use a write-behind cache: move the write operations that can lead to contention into Memcache and a task queue, which lowers the datastore hit rate.
A good way to avoid contention is to keep the entity groups small, but don't count on it too much.
You can have contention even on a single entity.
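The sharding pattern mentioned in the list above can be sketched as a sharded counter. This is an in-memory illustration, not datastore code: `ShardedCounter` is a hypothetical class, and each slot of the array stands in for a shard that would be stored in its own entity group, so that concurrent writes rarely hit the same group.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

final class ShardedCounterDemo {
    /** Hypothetical counter split across several independently written shards. */
    static final class ShardedCounter {
        private final AtomicLongArray shards;

        ShardedCounter(int numShards) {
            this.shards = new AtomicLongArray(numShards);
        }

        /** A write touches exactly one randomly chosen shard. */
        void increment() {
            int shard = ThreadLocalRandom.current().nextInt(shards.length());
            shards.incrementAndGet(shard);
        }

        /** A read sums every shard, so reads cost more than with a single counter. */
        long total() {
            long sum = 0;
            for (int i = 0; i < shards.length(); i++) {
                sum += shards.get(i);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        ShardedCounter counter = new ShardedCounter(10);
        for (int i = 0; i < 1_000; i++) {
            counter.increment();
        }
        System.out.println("total: " + counter.total());
    }
}
```

With N shards the sustainable write rate grows roughly N-fold, at the price of N reads to compute the total; caching the summed value is a common complement.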
one write per second is a little low. it actually supports more than that. i would even say between 5 and 10 writes per second but i can't guarantee that of course.
if you hit that limit you will get an exception yes. the exception message will be:
Too much contention on these datastore entities. please try again.
but i don't know/remember the exact exception type that gets raised.
you have the choice to retry, continue or whatever else you think is right to do at that point.
you can't tell if you are close to a 1 write per second limit. it happens and you deal with it.
