Transactional counter with 5+ writes per second in Google App Engine datastore

I'm developing a tournament version of a game where I expect 1000+ simultaneous players. When the tournament begins, players will be eliminated quite fast (possibly more than 5 per second), but the process will slow down as the tournament progresses. Depending on when a player is eliminated from the tournament, a certain number of points is awarded. For example, the player who drops out first gets nothing, the player who finishes 500th receives 1 point, and the first-place winner receives, say, 200 points. I'd like to award and display the points immediately after a player has been eliminated.
The problem is that when I push a new row into the datastore after a player is eliminated, the row entity has to go into a separate entity group so that I don't hit the GAE datastore limit of 1-5 writes per second per entity group. But I also need to be able to read and write a count of rows consistently so I can determine the prize correctly for every player who gets eliminated.
What would be the best way to implement the data model to support this?

Since there's a limited number of players, contention of more than a few writes a second is not likely to be sustained for very long, so you have two options:
Simply ignore the issue. Clusters of eliminations will occur, but as long as it's not a sustained situation, the retry mechanics for transactions will ensure they all get executed.
When someone goes out, record this independently, and update the tournament status, assigning ranks, asynchronously. This means you can't inform them of their rank immediately; instead you need to send them an asynchronous notification or have them poll for it.
I would suggest the former, frankly: even if half your 1000-person tournament went out in the first 5 minutes - a preposterously unlikely event - you're still looking at less than 2 eliminations per second. In reality, any spikes will be smaller and shorter-lived than that.
One thing to bear in mind is that due to how transaction retries work, transactions on the same entity group that occur close together will be resolved in semi-random order - that is, it's not a strict FIFO queue. If you require that, you'll have to enforce it yourself, though that's far from a trivial thing to do in a distributed system of any sort.
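For the first option, a minimal sketch (Python db API; the model and function names here are made up for illustration) keeps the elimination count on a single tournament entity and relies on automatic transaction retries:

from google.appengine.ext import db

class Tournament(db.Model):
    eliminated_count = db.IntegerProperty(default=0)

def record_elimination(tournament_key):
    # returns this player's elimination order (1 = first player out)
    def txn():
        t = db.get(tournament_key)
        t.eliminated_count += 1
        t.put()
        return t.eliminated_count
    # run_in_transaction retries automatically on contention
    return db.run_in_transaction(txn)

The points to award can then be derived from the returned elimination order, and written to the player's own entity group outside the transaction.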

The existing comments and answers address the specific question pretty well.
At a higher level, take a look at this post and open-source library from the Google Code Jam team. They had a similar problem and ended up developing a scalable, datastore-based scoreboard that handles both updates and requests for arbitrary pages efficiently.

Related

NDB: What happens when 1/s write is exceeded?

I am evaluating how to use GAE + NDB for a new project, and got concerned about the limit of 1 write per second per entity group for ancestor writes. I might be missing information, so I'm happy to ask for help.
Say several users work with orders. If all new "order" entities have the same unique ancestor, what would happen if, say, 5 users each create a new order and all 5 hit "save" at the same time?
Do you know what the consequences could be?
Thanks!
In your use case, nothing bad would happen - all of your writes will succeed. Some of them may be retried internally by App Engine, but you should not worry about that. You should only get concerned when you expect this rate to be exceeded for a substantial period of time; then retries would come on top of previous retries, and commits may start failing. Given your example, you would probably need a few million people working on those orders like crazy before it becomes an issue.
From the documentation (emphasis mine):
The first type of timeout occurs when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, and thus there's a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second; a good guideline is that you should consider rearchitecting if you expect an entity group to have to sustain more than one update per second for an extended period.
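As a concrete illustration of the scenario in the question, here's a minimal ndb sketch (the entity and key names are made up):

from google.appengine.ext import ndb

class Order(ndb.Model):
    total = ndb.FloatProperty()

# the single shared ancestor from the question
root = ndb.Key('OrderRoot', 'all-orders')

def save_order(total):
    # Writes to this entity group are serialized by the datastore;
    # five simultaneous saves will be retried internally and all succeed.
    Order(parent=root, total=total).put()

Five users calling save_order at once stay comfortably under the sustained-rate guideline quoted above.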

Appengine: help needed in designing my entities to be more scalable yet inexpensive

As an example, consider an online survey site.
Entities:
Survey (created with questions, answers)
Respondent (who takes surveys in parallel; huge in number), with:
survey id
List of (question id, answer id)
Problem:
Need a summary of responses, i.e. for any particular survey and question, the number of respondents who chose answer 1 vs. answer 2 vs. answer 3 (say)
The summary should be retrieved cheaply, i.e. with as few calls as possible
This pseudocode should help convey what I need:
sample_survey = ...
# get all respondents of the above survey
respondents = get_all_respondents(sample_survey)
# update the summary for each chosen (question, answer) pair
for respondent in respondents:
    for question_id, answer_id in respondent.get_chosen_answers():
        # increment the corresponding answer count by 1
        sample_survey.update_summary(question_id, answer_id)
# summary update done
# process the summary
summary = sample_survey.get_summary()
for question, answer in summary.question_answer_pairs():
    print 'No. of respondents who chose answer %s for question %s: %s' % (
        answer.text, question.text, answer.count)
My thoughts:
a. Create a summary entity for each survey, and update the counters inside it (Q1 -> A1 -> 4, Q1 -> A2 -> 222, ...) as each respondent takes the survey.
Pros: the summary can be read from just 1 entity; cheap.
Cons: since a huge number of respondents take the same survey in parallel, this means datastore contention. Is sharding a solution? The number of shards would have to grow dynamically with the number of respondents per survey.
b. Query the count against indexes. With my limited knowledge of App Engine indexing, I don't know how the index for the above Respondent entity would be formed or how large it would be. I'm also worried about the number of extra writes needed for those indexes - could index explosion happen?
The query would be something like:
select count(*) from Respondent where surveyId=xx and questId=yy and ansId=zz
Are there any better solutions? And what about the two above - which one would you recommend, and why? Thanks a lot for looking and for your suggestions. Ping me if something is unclear.
I think this depends on two main factors:
Do you know the queries you'll want to run ahead-of-time? (i.e. while the respondents are answering, as opposed to slicing up the data later.)
How many respondents do you expect? (Both # and rate.)
If you don't know the queries ahead-of-time, then I think the best you can do is fetch all the entities and compute the information you need. (And then cache that, perhaps, in another entity or memcache or both.) If you have a lot of respondents, you might have to do this computation on a backend or via a task queue to avoid hitting the request timeout/quotas. If you have a truly enormous amount of data, you might even consider Mapreduce, which is currently experimental and Python-only.
If you do know the queries ahead-of-time, then I think your approach of a single entity is on the right track. You can use a standard sharded counter technique to reduce contention if you expect more than about one write per second (a sketch follows below). If you don't expect more than that, you can just use a single entity group.
If you only expect around one write per second, with possible spikes above this, another option is to use a single entity but update its counter asynchronously via a task in a task queue; you can throttle the task queue rate to reduce contention, as long as you won't be creating tasks faster than the queue can complete them. This might be easier to write, especially if you have lots of statistics to compute, though I think the sharded counter technique is ultimately more scalable.
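Here is a minimal sketch of the sharded counter idea, using the Python db API; the model and helper names are made up for illustration:

import random
from google.appengine.ext import db

NUM_SHARDS = 20  # size this for the peak write rate you expect

class AnswerCountShard(db.Model):
    count = db.IntegerProperty(default=0)

def _key_name(survey_id, question_id, answer_id, index):
    return '%s|%s|%s|%d' % (survey_id, question_id, answer_id, index)

def increment(survey_id, question_id, answer_id):
    # pick a random shard so concurrent writers rarely collide
    name = _key_name(survey_id, question_id, answer_id,
                     random.randint(0, NUM_SHARDS - 1))
    def txn():
        shard = AnswerCountShard.get_by_key_name(name)
        if shard is None:
            shard = AnswerCountShard(key_name=name)
        shard.count += 1
        shard.put()
    db.run_in_transaction(txn)

def answer_count(survey_id, question_id, answer_id):
    # one batch get of all shards, summed in memory
    keys = [db.Key.from_path('AnswerCountShard',
                             _key_name(survey_id, question_id, answer_id, i))
            for i in range(NUM_SHARDS)]
    return sum(s.count for s in db.get(keys) if s)

Reads stay cheap: answer_count does a single batch get regardless of the shard count.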
Updating a summary with every write is not practical, since you'll quickly run into contention issues; counting the results dynamically will be extremely inefficient. In this case, you're better off computing aggregates using a batch process such as mapreduce - just write a task that scans over all the survey answers and accumulates the relevant statistics, and run this task periodically.

How to handle daily/weekly/monthly boards on AppEngine datastore?

I'm developing a high score web service for my game, and it's running on Google App Engine.
My game has 5 difficulties, so I originally had 5 boards with entries for each (player_login, score and time). If a player submitted a lower score than their previous one, it was dismissed, so only the highest score is kept for each player.
But to add more fun to this, I decided to include daily/weekly/monthly/yearly high score tables. So I created 5 boards for each difficulty, making 25 boards in total. When a score is submitted, it's saved into each board, and the boards are supposed to be cleared at the start of every day/week/month/year.
This happens via a cron job that deletes all entries from a specific board.
Here comes the problem: deleting entries from the datastore appears to be slow. From my test daily cleanups, deleting a single entry takes around 200 ms.
In the worst-case scenario, if the game were quite popular and had, say, 100 000 players, each with an entry in the yearly board, it would take 100 000 * 0.2 seconds = 20 000 seconds (over 5 hours!!) to clear that board. I think we are allowed to have jobs of up to 30 seconds in App Engine, so this wouldn't work.
I'm deleting with the following code (thanks to Nick Johnson):
q = Score.all(keys_only=True).filter('b =', boardToClear)
results = q.fetch(500)
while results:
    self.response.out.write("deleting one batch;")
    db.delete(results)
    q = Score.all(keys_only=True).filter('b =', boardToClear).with_cursor(q.cursor())
    results = q.fetch(500)
What do you recommend me to do with this problem?
One approach that comes to mind is to use a task queue and delete the expired scores from each board in smaller batches. This way I wouldn't hit the CPU limit for any one task, but the cleanup would not be (nearly) instantaneous; my 20 000 second cleanup would be split into 2 000 tasks, each roughly 10 seconds long.
But I think I'm doing something wrong; this kind of operation would be a lot faster in a relational database. Possibly something is wrong with my approach to the datastore and scoring because I'm locked into an RDBMS mindset.
First, a couple of small suggestions:
Does deletion really take 200 ms per item even when you delete items in a batch? The fastest way to delete should be to do a keys_only query and then call db.delete() on the entire list of keys at once.
The 30-second limit was recently relaxed to 10 minutes for background work (like the cron jobs or queue tasks that you're contemplating) as of 1.4.0.
These may not fundamentally address your problem, though. I think there's no way to get around the fact that deleting a large number of records (hundreds of thousands, say) will take some time. I'm not sure this is as big a problem for your use case as you think, though, as there are a couple of techniques that would help.
As you suggest, use a task queue to split a long-running task into several smaller tasks (a sketch follows this list). Your use case (deleting a huge number of items that match a particular query) is ideal for a map-reduce task. Nick Johnson's blog post on the Mapper API may be very helpful here (so that you don't have to write all of that task-management code on your own).
Do you need to delete all the out-of-date board entries immediately? If you had a field that recorded which week, month, or year a particular entry counted for, you could index on that field and then only display entries from the current period on the visible leaderboard. (Disk space is cheap, after all.) Then, if you wanted to remove the out-of-date data slowly (over hours, say, instead of milliseconds), you could do so in the background without ever showing incorrect data on your leaderboards.
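If you don't want to pull in a full mapper framework, here's a minimal sketch of the task-queue approach using the deferred library (this assumes the deferred builtin is enabled in app.yaml, and reuses the Score model from the question):

from google.appengine.ext import db, deferred

def delete_board(board_to_clear):
    keys = Score.all(keys_only=True).filter('b =', board_to_clear).fetch(500)
    if keys:
        db.delete(keys)  # one batch RPC; the entities are deleted in parallel
        # chain another task until the board is empty
        deferred.defer(delete_board, board_to_clear)

Each task handles one 500-entity batch and re-enqueues itself, so no single request comes anywhere near the time limit.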
Delete entities in batches. Although a single delete takes a noticeable amount of time (though 200ms seems very high), batch deletes take no longer, as they delete all the entities in parallel. Task Queue and cron jobs can now run for up to 10 minutes, so timeouts should not be an issue.

Database design - google app engine

I am working with Google App Engine, using the low-level Java API to access Bigtable. I'm building a SaaS application with 4 layers:
Client web browser
RESTful resources layer
Business layer
Data access layer
I'm building an application to help manage my mobile auto detailing company (and others like it). I have to represent these four separate concepts, but am unsure if my current plan is a good one:
Appointments
Line Items
Invoices
Payments
Appointment: An "Appointment" is a place and time where employees are expected to be in order to deliver a service.
Line Item: A "Line Item" is a service, fee or discount and its associated information. An example of line items that might go into an appointment:
Name                          Price  Commission  Time estimate
Full Detail, Regular Size       160          75      3.5 hours
$10 Off Full Detail Coupon      -10           0        0 hours
Premium Detail                  220         110      4.5 hours
Derived totals (not a line item): $370      $185      8.0 hours
Invoice: An "Invoice" is a record of one or more line items that a customer has committed to pay for.
Payment: A "Payment" is a record of what payments have come in.
In a previous implementation of this application, life was simpler and I treated all four of these concepts as one table in a SQL database: "Appointment." One "Appointment" could have multiple line items, multiple payments, and one invoice. The invoice was just an e-mail or print out that was produced from the line items and customer record.
9 out of 10 times, this worked fine. When one customer made one appointment for one or a few vehicles and paid for it themselves, all was grand. But this system didn't work under a lot of conditions. For example:
When one customer made one appointment, but the appointment got rained out halfway through so the detailer had to come back the next day, I needed two appointments, but only one line item, one invoice and one payment.
When a group of customers at an office all decided to have their cars done the same day in order to get a discount, I needed one appointment, but multiple invoices and multiple payments.
When one customer paid for two appointments with one check, I needed two appointments, but only one invoice and one payment.
I was able to handle all of these outliers by fudging things a little. For example, if a detailer had to come back the next day, I'd just make another appointment on the second day with a line item that said "Finish Up" at a cost of $0. Or if one customer paid for two appointments with one check, I'd put split payment records in each appointment. The problem with this is that it creates a huge opportunity for data incongruity. Data incongruity can be a serious problem, especially in cases involving financial information, such as the third example where the customer paid for two appointments with one check. Payments must be matched up directly with the goods and services rendered in order to properly keep track of accounts receivable.
Proposed structure:
Below is a normalized structure for organizing and storing this data. Perhaps because of my inexperience, I place a lot of emphasis on normalization, because it seems like a great way to avoid data incongruity errors. With this structure, changes to the data can be made in one operation without having to worry about updating other tables. Reads, however, can require multiple fetches coupled with in-memory organization of the data. I figure that later on, if there are performance issues, I can add some denormalized fields to "Appointment" for faster querying while keeping the "safe" normalized structure intact. Denormalization could potentially slow down writes, but I might be able to make asynchronous calls to other resources, or use the task queue, so that the client does not have to wait for the extra writes that update the denormalized portions of the data.
Tables:
Appointment
start_time
etc...
Invoice
due_date
etc...
Payment
invoice_Key_List
amount_paid
etc...
Line_Item
appointment_Key_List
invoice_Key
name
price
etc...
The following is the series of queries and operations required to tie all four entities (tables) together for a given list of appointments: which services were scheduled for each appointment, the total cost of each appointment, and whether or not payment has been received. This would be a common query when loading the calendar for appointment scheduling, or for a manager to get an overall view of operations.
QUERY for the list of "Appointments" whose "start_time" field lies within the given range.
Add each key from the returned appointments into a List.
QUERY for all "Line_Items" whose appointment_Key_List field includes any of the returned appointments.
Add each invoice_key from all of the line items into a Set collection.
QUERY for all "Invoices" in the invoice key set (this can be done in one asynchronous operation using App Engine).
Add each key from the returned invoices into a List.
QUERY for all "Payments" whose invoice_key_list field contains a key matching any of the returned invoices.
Reorganize in memory so that each appointment reflects the line_items that are scheduled for it, the total price, the total estimated time, and whether or not it has been paid for.
...As you can see, this operation requires 4 datastore queries as well as some in-memory organization (hopefully the in-memory part will be fast).
Can anyone comment on this design? This is the best I could come up with, but I suspect there might be better options or completely different designs that I'm not thinking of that might work better in general or specifically under GAE's (google app engine) strengths, weaknesses, and capabilities.
Thanks!
Usage clarification
Most applications are more read-intensive, some are more write intensive. Below, I describe a typical use-case and break down operations that the user would want to perform:
Manager gets a call from a customer:
Read - Manager loads the calendar and looks for a time that is available
Write - Manager queries the customer for their information; I pictured this as a succession of asynchronous writes as the manager enters each piece of information, such as phone number, name, e-mail, address, etc. Or, if necessary, perhaps one write at the end after the client application has gathered all of the information.
Write - Manager takes down customer's credit card info and adds it to their record as a separate operation
Write - Manager charges credit card and verifies that the payment went through
Manager makes an outgoing phone call:
Read Manager loads the calendar
Read Manager loads the appointment for the customer he wants to call
Write Manager clicks the "Call" button; a call is initiated and a new CallRecord entity is written
Read Call server responds to call request and reads CallRecord to find out how to handle the call
Write Call server writes updated information to the CallRecord
Write when call is closed, call server makes another request to the server to update the CallRecord resource (note: this request is not time-critical)
Accepted answer:
Both of the top two answers were very thoughtful and appreciated. I accepted the one with fewer votes in order to imperfectly equalize their exposure as much as possible.
You specified two specific "views" your website needs to provide:
Scheduling an appointment. Your current scheme should work just fine for this - you'll just need to do the first query you mentioned.
Overall view of operations. I'm not really sure what this entails, but if you need to do the string of four queries you mentioned above to get this, then your design could use some improvement. Details below.
Four datastore queries in and of itself isn't necessarily overboard. The problem in your case is that two of the queries are expensive and probably even impossible. I'll go through each query:
Getting a list of appointments - no problem. This query will be able to scan an index to efficiently retrieve the appointments in the date range you specify.
Get all line items for each of appointment from #1 - this is a problem. This query requires that you do an IN query. IN queries are transformed into N sub-queries behind the scenes - so you'll end up with one query per appointment key from #1! These will be executed in parallel so that isn't so bad. The main problem is that IN queries are limited to only a small list of values (up to just 30 values). If you have more than 30 appointment keys returned by #1 then this query will fail to execute!
Get all invoices referenced by line items - no problem. You are correct that this query is cheap because you can simply fetch all of the relevant invoices directly by key. (Note: this query is still synchronous - I don't think asynchronous was the word you were looking for).
Get all payments for all invoices returned by #3 - this is a problem. Like #2, this query will be an IN query and will fail if #3 returns even a moderate number of invoices which you need to fetch payments for.
If the number of items returned by #1 and #3 is small enough, then GAE will almost certainly be able to do this within the allowed limits. And that should be good enough for your personal needs - it sounds like you mostly need it to work, and don't need it to scale to huge numbers of users (it won't).
Suggestions for improvement:
Denormalization! Try storing the keys of the Line_Item, Invoice, and Payment entities relevant to a given appointment in lists on the appointment itself. Then you can eliminate your IN queries - a sketch follows below. Make sure these new list properties are not indexed, to avoid problems with exploding indices.
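A rough Python sketch of that suggestion (the question uses the Java low-level API, but the idea is the same; the property names and dates are illustrative):

import datetime
from google.appengine.ext import db

class Appointment(db.Model):
    start_time = db.DateTimeProperty()
    # denormalized key lists; indexed=False keeps them out of the indexes
    line_item_keys = db.ListProperty(db.Key, indexed=False)
    invoice_keys = db.ListProperty(db.Key, indexed=False)
    payment_keys = db.ListProperty(db.Key, indexed=False)

# one indexed query plus batch gets by key - no IN queries anywhere
appts = (Appointment.all()
         .filter('start_time >=', datetime.datetime(2011, 7, 1))
         .filter('start_time <', datetime.datetime(2011, 7, 5))
         .fetch(100))
line_items = db.get([k for a in appts for k in a.line_item_keys])
invoices = db.get([k for a in appts for k in a.invoice_keys])
payments = db.get([k for a in appts for k in a.payment_keys])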
Other less specific ideas for improvement:
Depending on what your "overall view of operations" is going to show, you might be able to split up the retrieval of all this information. For example, perhaps you start by showing a list of appointments, and then fetch the information relevant to a particular appointment only when the manager asks for it. You could even do this via AJAX if you want this interaction to take place on a single page.
Memcache is your friend - use it to cache the results of datastore queries (or even higher level results) so that you don't have to recompute it from scratch on every access.
As you've noticed, this design doesn't scale. It requires 4 (!!!) DB queries to render the page. That's 3 too many :)
The prevailing notion of working with the App Engine Datastore is that you want to do as much work as you possibly can when something is written, so that almost nothing needs to be done when something is retrieved and rendered. You presumably write the data very few times, compared to how many times it's rendered.
Normalization is similarly something that you seem to be striving for. The Datastore doesn't place any value in normalization -- it may mean less data incongruity, but it also means reading data is muuuuuch slower (4 reads?!!). Since your data is read much more often than it's written, optimize for reads, even if that means your data will occasionally be duplicated or out of sync for a short amount of time.
Instead of thinking about how the data looks when it's stored, think about how you want the data to look when it's displayed to the user. Store as close to that format as you can, even if that means literally storing pre-rendered HTML in the datastore. Reads will be lightning-fast, and that's a good thing.
So, since you should optimize for reads, your writes will often grow to gigantic proportions - so gigantic that you can't fit them in the 30-second time limit for requests. Well, that's what the task queue is for. Store what you consider the "bare necessities" of your model in the datastore, then fire off a task to pull it back out, generate the HTML to be rendered, and put that back in the datastore in the background. This might mean your model isn't immediately ready to display until the task has finished with it, so you'll need graceful degradation in that case, even if that means rendering it "the slow way" until the data is fully populated. Any further reads will be lightning-quick.
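A tiny sketch of that write-then-render flow (the handler URL here is made up):

from google.appengine.api import taskqueue

def save_appointment(appt):
    appt.put()  # store the bare necessities synchronously
    # build the expensive, pre-rendered view in the background
    taskqueue.add(url='/tasks/render_appointment',
                  params={'key': str(appt.key())})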
In summary, I don't have any specific advice directly related to your database -- that's dependent on what you want the data to look like when the user sees it.
What I can give you are some links to some super helpful videos about the datastore:
Brett Slatkin's 2008 and 2009 talks on building scalable, complex apps on App Engine, and a great one from this year about data pipelines (which isn't directly applicable I think, but really useful in general)
App Engine Under the Covers: How App Engine does what it does, behind the scenes
AppStats: a great way to see how many datastore reads you're performing, and some tips on reducing that number
Here are a few app-engine specific factors that I think you'll have to contend with:
When querying using an inequality, you can only use inequality filters on one property. For example, if you are filtering on an appointment date being between July 1st and July 4th, you couldn't also filter by price > 200 (see the example after this list).
Transactions on app engine are a bit tricky compared to the SQL database you are probably used to. You can only do transactions on entities that are in the same "entity group".
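For example, using the Appointment model sketched earlier (dates illustrative; the commented-out filter is the disallowed one):

import datetime

q = (Appointment.all()
     .filter('start_time >=', datetime.datetime(2011, 7, 1))
     .filter('start_time <=', datetime.datetime(2011, 7, 4)))
# q.filter('price >', 200)  # fails: only one property per query
#                           # may have inequality filters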

app engine data pipelines talk - for fan-in materialized view, why are work indexes necessary?

I'm trying to understand the data pipelines talk presented at google i/o:
http://www.youtube.com/watch?v=zSDC_TU7rtc
I don't see why fan-in work indexes are necessary if I'm just going to batch through input-sequence markers.
Can't the optimistically-enqueued task grab all unapplied markers, churn through as many of them as possible (repeatedly fetching a batch of say 10, then transactionally update the materialized view entity), and re-enqueue itself if the task times out before working through all markers?
Do the work indexes have something to do with the efficiency of querying for all unapplied markers? I.e., is it better to query for "markers with work_index = " than for "markers with applied = False"? If so, why is that?
For reference, the question+answer which led me to the data pipelines talk is here:
app engine datastore: model for progressively updated terrain height map
A few things:
My approach assumes multiple workers (see ShardedForkJoinQueue here: http://code.google.com/p/pubsubhubbub/source/browse/trunk/hub/fork_join_queue.py), where the inbound rate of tasks exceeds the amount of work a single thread can do. With that in mind, how would you use a simple "applied = False" query to split work across N threads? Probably by assigning another field on your model to a worker's shard_number at random; then your query would be on "shard_number=N AND applied=False" (requiring another composite index). Okay, that should work.
But then how do you know how many worker shards/threads you need? With the approach above, you need to configure them statically, so your shard_number parameter is between 1 and N. You can only have one thread querying for each shard_number at a time, or else you get contention. I want the system to figure out the shard/thread count at runtime. My approach batches work together into reasonably sized chunks (like the 10 items you mention) and then enqueues a continuation task to take care of the rest. Using query cursors, I know that each continuation will not overlap the previous thread's work, so there's no contention. This gives me a dynamic number of threads working in parallel on the same shard's work items.
Now say your queue backs up. How do you ensure the oldest work items are processed first? Put another way: How do you prevent starvation? You could assign another field on your model to the time of insertion-- call it add_time. Now your query would be "shard_number=N AND applied=False ORDER BY add_time DESC". This works fine for low throughput queues.
What if your work-item write rate goes up a ton? You're going to be writing many, many rows with roughly the same add_time. This gives the Bigtable rows for your entities a common prefix, something like "shard_number=1|applied=False|add_time=2010-06-24T9:15:22". That means every work-item insert is hitting the same Bigtable tablet server, the server that currently owns the lexical head of the descending index. So fundamentally you're limited to the throughput of a single machine for each work shard's Datastore writes.
With my approach, your only Bigtable index row is prefixed by the hash of the incrementing work sequence number. This work_index value is scattered across the lexical rowspace of Bigtable each time the sequence number is incremented. Thus, each sequential work item enqueue will likely go to a different tablet server (given enough data), spreading the load of my queue beyond a single machine. With this approach the write-rate should effectively be bound only by the number of physical Bigtable machines in a cluster.
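A rough sketch of that scatter idea (illustrative only, not the actual fork_join_queue code):

import hashlib

def work_index(sequence_number):
    # Prefixing the incrementing sequence number with a hash of itself
    # scatters consecutive index rows across Bigtable's lexically sorted
    # rowspace, and therefore across tablet servers.
    prefix = hashlib.md5(str(sequence_number)).hexdigest()[:8]
    return '%s|%010d' % (prefix, sequence_number)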
One disadvantage of this approach is that it requires an extra write: you have to flip the flag on the original marker entity when you've completed the update, which is something Brett's original approach doesn't require.
You still need some sort of work index, too, or you encounter the race conditions Brett talked about, where the task that should apply an update runs before the update transaction has committed. In your system, the update would still get applied - but it could be an arbitrary amount of time before the next update runs and applies it.
Still, I'm not the expert on this (yet ;). I've forwarded your question to Brett, and I'll let you know what he says - I'm curious as to his answer, too!
