Sharding and managing very busy variable - google-app-engine

I have one variable pool, shared by all clients, that stores all remaining enemies in a game. When a client starts a game, he gets some enemies from pool. When he finishes, enemies that he did not kill are put back into pool. The game also checks to see if all the enemies have been killed (i.e., pool is empty).
What is the best way to implement pool? I'm concerned about the 5 updates per second limit on a datastore entity.

How often do expect a game to be started/finished? If the answer is definitely less than 5 per second, then just use a single entity to represent pool and use transactions to atomically get and update it as games start and finish.
If you're really expecting so many clients to share a single pool that it will be updated at a sustained rate of more that 5 times per second, then consider sharding pool into multiple pieces. When the client starts a game, remove entities from just one of the shards. To test if the sharded pool is empty, just retrieve all the shards and see if they are all empty. (When modifying a shard, you'll still need to use a transaction.)

Related

How to efficiently wait for data becoming available in RDBS (PostgreSQL) table?

I'm building a web service which reserves unique items to users.
Service is required to handle high amounts of concurrent requests that should avoid blocking each other as much as possible. Each incoming request must reserve n-amount of unique items of the desired type, and then process them successfully or release them back to the reservables list so they can be reserved by an another request. A succesful processing contains multiple steps like communicating with integrated services and other time consuming steps, so keeping items reserved with a DB transaction until the end would not be an efficient solution.
Currently I've implemented a solution where reservable items are stored in a buffer DB table where items are being locked and deleted by incoming requests with SELECT FOR UPDATE SKIP LOCKED. As service must support multiple item types, this buffer table contains only n amount of items per type at a time as the table size would otherwise grow into too big as there is about ten thousand different types. When certain item types are all reserved (selected and removed) the request locks the item type and adds more reservable items into the buffer. This fill operation requires integration calls and may take some time. During the fill, all other operations needs to wait until the filling operation finishes and items become available. This is where the problem arises. When thousands of requests wait for the same item type to become available in the buffer, each needs to poll this information somehow.
What could be an efficient solution for this kind of polling?
I think the "real" answer is to start the refill process when the stock gets low, rather than when it is completely depleted. Then it would already be refilled by the time anyone needs to block on it. Or perhaps you could make the refill process work asynchronously, so that the new rows are generated near-instantly and then the integrations are called later. So you would enqueue the integrations, rather than the consumers.
But barring that, it seems like you want the waiters to lock the "item type" in a mode incompatible with the how the refiller locks it. Then it will naturally block, and be released once the refiller is done. The problem is that if you want to assemble an order of 50 things and the 47th is depleted, do you want to maintain the reservation on the previous 46 things while you wait?
Presumably your reservation is not blocking anyone else, unless the one you have reserved is the last one available. In which case you are not really blocking them, just forcing them to go through the refill process, which would have had to be done eventually anyway.

GAE Map Reduce - hitting entities more than once?

I'm using Map Reduce (http://code.google.com/p/appengine-mapreduce/) to do an operation over a set of entities. However, I am finding my operations are being duplicated.
Are map reduce maps sometimes called more than once for a specific entity? Is this the case even if they don't fail the initial time?
edit: here are some more details.
def reparent_request(entity):
#check if the entity has a parent
if not is_valid_to_reparent(entity):
return
#copy it
try:
copy = clone_entity(Request, entity, parent=entity.user)
copy.put() #we hard put here so we can use the reference later in this function.
except:
...
... update some references to the copied object ...
#delete the original
yield op.db.Delete(entity)
At the end, I am non-deterministically left with two entities, both with the new parent.
I've reparented a load of entities before - it was a nightmare because of the exact problem you're facing.
What I would do instead is:
Create a new queue. Ensure its paused and that you have a lot of storage space dedicated to queues. It's only temporary, but you'll need it.
Instead of editing your entities in your map reduce job, add them to the queue with a name that will be unique for each entity. The key works fine.
When adding to the queue, because it's paused you'll get an error if you try and add the same named queue twice - so catch the error and skip it, because you know that entity must already have been touched by the map reduce job.
When you're confident that every entity has a matching queue task and the map reduce job has finished, unpause your queue. The queue will do the reparenting.
A couple of notes:
* the task queue size can get pretty big. Can't remember numbers, but it was gigs. Also the size of the queue doesn't update in real time - so don't worry that it might still says gigs of tasks when the queue is nearly empty.
* the reliability of the queue storage is an unknown I believe. It didn't happen to us, but queue items could disappear I guess. Fortunately, you can rerun this process multiple times to ensure it worked, especially if you're deleting entities.
* you may want to ensure you queue has a concurrency limit on it. Without one, a delay in the execution of a couple of tasks can absolutely cripple your application. Learnt that the hard way! I think 30 concurrent tasks went quite well for us.
Hope that's useful, let me know if you come up with any improvements!
App Engine mapreduce runs on the task queue, and like anything else that uses the task queue, tasks have to be idempotent - that is, running them multiple times should have the same effect as running them once. Tasks will occasionally be run more than once; the mapreduce library may have its own reasons for rerunning mapper tasks, too.
In your situation, I'd suggest creating the new entity with a key whose ID is the same as the old entity; that way running it multiple times will just overwrite the same entity.

Google App Engine Go score counting and saving

I want to make a simple GAE app in Go that will let users vote and store their answers in two ways. First way will be raw data (Database store of "voted for X"), the second will be a running count of those votes ("12 votes for X, 10 votes for Y"). What is an effective way to store both of those values with the app being accessed by multiple people at the same time? If I retrieve the data from the Datastore, change it, and save it back for one instance, another might be wanting to do the same in parallel, and I`m not sure if the final result will be correct.
It seems like a good way to do that is to simply store all vote events as separate entities (the "voted for X" way) and use the Task Queue for the recalculation (the "12 votes for X, 10 votes for Y" way), so the recalculation is done offline and sequentially (without any races and other concurrency issues). Then you'd have to put the recalc task every once in a while to the queue so the results are updated.
The Task Queue doesn't allow adding another task with the same name as an existing one, but doesn't allow checking whether a specific task is already enqueued, so maybe simply trying adding a task with a same name to the queue will be enough to be sure that multiple recalc tasks are not there.
Another way would be to use a goroutine waiting for a poke from an input channel in order to recalculate the results. I haven't run such goroutines on App Engine so I'm not sure of the general behavior of this approach.

Transactional counter with 5+ writes per second in Google App Engine datastore

I'm developing a tournament version of a game where I expect 1000+ simultaneous players. When the tournament begins, players will be eliminated quite fast (possibly more than 5 per second), but the process will slow down as the tournament progresses. Depending when a player is eliminated from the tournament a certain amount of points is awarded. For example a player who drops first, gets nothing, while player who is 500th, receives 1 point and the first place winner receives say 200 points. Now I'd like to award and display the amount of points right away after a player has been eliminated.
The problem is that when I push a new row into a datastore after a player has been eliminated, the row entity has to be in a separate entity group so I would not hit the gae datastore limit of 1-5 writes per second for 1 entity group. Also I need to be able to read and write a count of rows consistently so I can determine the prize correctly for all the players that get eliminated.
What would be the best way to implement the datamodel to support this?
Since there's a limited number of players, contention issues over a few a second are not likely to be sustained for very long, so you have two options:
Simply ignore the issue. Clusters of eliminations will occur, but as long as it's not a sustained situation, the retry mechanics for transactions will ensure they all get executed.
When someone goes out, record this independently, and update the tournament status, assigning ranks, asynchronously. This means you can't inform them of their rank immediately, but rather need to make an asynchronous reply or have them poll for it.
I would suggest the former, frankly: Even if half your 1000 person tournament went out in the first 5 minutes - a preposterously unlikely event - you're still looking at less than 2 eliminations per second. In reality, any spikes will be smaller and shorter-lived than that.
One thing to bear in mind is that due to how transaction retries work, transactions on the same entity group that occur together will be resolved in semi-random order - that is, it's not a strict FIFO queue. If you require that, you'll have to enforce it yourself, though that's a far from trivial thing to do in a distributed system of any sort.
the existing comments and answers address the specific question pretty well.
at a higher level, take a look at this post and open source library from the google code jam team. they had a similar problem and ended up developing a scalable scoreboard based on the datastore that handles both updates and requests for arbitrary pages efficiently.

app engine data pipelines talk - for fan-in materialized view, why are work indexes necessary?

I'm trying to understand the data pipelines talk presented at google i/o:
http://www.youtube.com/watch?v=zSDC_TU7rtc
I don't see why fan-in work indexes are necessary if i'm just going to batch through input-sequence markers.
Can't the optimistically-enqueued task grab all unapplied markers, churn through as many of them as possible (repeatedly fetching a batch of say 10, then transactionally update the materialized view entity), and re-enqueue itself if the task times out before working through all markers?
Does the work indexes have something to do with the efficiency querying for all unapplied markers? i.e., it's better to query for "markers with work_index = " than for "markers with applied = False"? If so, why is that?
For reference, the question+answer which led me to the data pipelines talk is here:
app engine datastore: model for progressively updated terrain height map
A few things:
My approach assumes multiple workers (see ShardedForkJoinQueue here: http://code.google.com/p/pubsubhubbub/source/browse/trunk/hub/fork_join_queue.py), where the inbound rate of tasks exceeds the amount of work a single thread can do. With that in mind, how would you use a simple "applied = False" to split work across N threads? Probably assign another field on your model to a worker's shard_number at random; then your query would be on "shard_number=N AND applied=False" (requiring another composite index). Okay that should work.
But then how do you know how many worker shards/threads you need? With the approach above you need to statically configure them so your shard_number parameter is between 1 and N. You can only have one thread querying for each shard_number at a time or else you have contention. I want the system to figure out the shard/thread count at runtime. My approach batches work together into reasonably sized chunks (like the 10 items) and then enqueues a continuation task to take care of the rest. Using query cursors I know that each continuation will not overlap the last thread's, so there's no contention. This gives me a dynamic number of threads working in parallel on the same shard's work items.
Now say your queue backs up. How do you ensure the oldest work items are processed first? Put another way: How do you prevent starvation? You could assign another field on your model to the time of insertion-- call it add_time. Now your query would be "shard_number=N AND applied=False ORDER BY add_time DESC". This works fine for low throughput queues.
What if your work item write-rate goes up a ton? You're going to be writing many, many rows with roughly the same add_time. This requires a Bigtable row prefix for your entities as something like "shard_number=1|applied=False|add_time=2010-06-24T9:15:22". That means every work item insert is hitting the same Bigtable tablet server, the server that's currently owner of the lexical head of the descending index. So fundamentally you're limited to the throughput of a single machine for each work shard's Datastore writes.
With my approach, your only Bigtable index row is prefixed by the hash of the incrementing work sequence number. This work_index value is scattered across the lexical rowspace of Bigtable each time the sequence number is incremented. Thus, each sequential work item enqueue will likely go to a different tablet server (given enough data), spreading the load of my queue beyond a single machine. With this approach the write-rate should effectively be bound only by the number of physical Bigtable machines in a cluster.
One disadvantage of this approach is that it requires an extra write: you have to flip the flag on the original marker entity when you've completed the update, which is something Brett's original approach doesn't require.
You still need some sort of work index, too, or you encounter the race conditions Brett talked about, where the task that should apply an update runs before the update transaction has committed. In your system, the update would still get applied - but it could be an arbitrary amount of time before the next update runs and applies it.
Still, I'm not the expert on this (yet ;). I've forwarded your question to Brett, and I'll let you know what he says - I'm curious as to his answer, too!

Resources