Best implementation of turn-based access on App Engine?

I am trying to implement a 2-player turn-based game with a GAE backend. The first thing this game requires is a very simple match making system that operates like this:
User A asks the backend for a match. The backend tells him to come back later.
User B asks the backend for a match. He will be matched with A.
User C asks the backend for a match. The backend tells him to come back later.
User D asks the backend for a match. He will be matched with C.
and so on...
(edit: my assumption is that if I can figure this one out, most other operations in a turn-based game can use the same implementation)
This can be done quite easily in Apple Gamecenter and Xbox Live, however I would rather implement this on an open and platform independent backend like GAE. After some research, I have found the following options for a GAE implementation:
Use memcache. However, there is no guarantee that memcache is synchronized across different instances. I ran some tests and could actually see match requests disappearing due to memcache mis-synchronization.
Harden memcache with sharding counters. This does not always solve the multiple-instance problem and may result in high memcache quota usage.
Use memcache with compare-and-set. This does not solve the multiple-instance problem when used as a mutex.
Task queues. I have no idea how to use these, but someone mentioned them as a possible solution. However, I am afraid that queues will eat my GAE quota very quickly.
Push queues. Same as above.
Transactions. Same as above. Also probably very expensive.
Channels. Same as above. Also probably very expensive.
Given that matchmaking is a very basic operation in online games, I cannot be the first one encountering this. Hence my questions:
Do you know of any safe mechanism for match making?
If multiple solutions exist, which is the cheapest (in terms of GAE quota usage) solution?

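You could accomplish this using a cron task in a scheme like this:
Define a MatchRequest model:
from google.appengine.ext import db
class MatchRequest(db.Model):
    requestor = db.StringProperty()  # user who asked for a match
    opponent = db.StringProperty(default='')  # stays '' until matched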
User A asks for a match, a MatchRequest entity is created with A as the requestor and the opponent blank.
User A polls to see when the opponent field has been filled.
User B asks for a match, a MatchRequest entity is created with B as the requestor.
User B polls to see when the opponent field has been filled.
A cron job that runs every 20 seconds or so (the exact interval is a guess) does the following:
Grab all MatchRequest where opponent == ''
Make all appropriate matches
Put all the updated MatchRequests back in a transaction
Now when A and B poll next, they will see that they have an opponent.
According to the GAE docs on cron, free apps can have up to 20 free cron tasks. The computation required for these cron jobs should be small for a small number of users.
This would be a safe way but I'm not sure if it is the cheapest way. It's also pretty easy to implement.
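The matching step of the cron job can be sketched in plain Python, independent of the datastore. This is only an illustration under assumptions: `pair_requests` is a hypothetical helper, the dicts stand in for MatchRequest entities whose opponent is still '', and the list is assumed to arrive oldest first.

```python
def pair_requests(pending):
    """Pair unmatched requests two at a time, oldest first.

    `pending` is a list of dicts with 'requestor' and 'opponent' keys,
    standing in for MatchRequest entities where opponent == ''.
    Returns the list of requests that were updated.
    """
    updated = []
    # Walk the list in steps of two; an odd request at the end keeps waiting.
    for i in range(0, len(pending) - 1, 2):
        first, second = pending[i], pending[i + 1]
        first['opponent'] = second['requestor']
        second['opponent'] = first['requestor']
        updated.extend([first, second])
    return updated


requests = [{'requestor': name, 'opponent': ''} for name in ('A', 'B', 'C')]
matched = pair_requests(requests)
```

Here A and B are paired and C keeps waiting for the next run. In the real cron handler, the updated entities would then be written back together, as the steps above suggest.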

Related

DB consistency with microservices

What is the best way to achieve DB consistency in microservice-based systems?
At the GOTO in Berlin, Martin Fowler was talking about microservices and one "rule" he mentioned was to keep "per-service" databases, which means that services cannot directly connect to a DB "owned" by another service.
This is super-nice and elegant but in practice it becomes a bit tricky. Suppose that you have a few services:
a frontend
an order-management service
a loyalty-program service
Now, a customer makes a purchase on your frontend, which will call the order management service, which will save everything in the DB -- no problem. At this point, there will also be a call to the loyalty-program service so that it credits / debits points from your account.
Now, when everything is on the same DB / DB server it all becomes easy since you can run everything in one transaction: if the loyalty program service fails to write to the DB we can roll the whole thing back.
When we do DB operations throughout multiple services this isn't possible, as we don't rely on one connection / take advantage of running a single transaction.
What are the best patterns to keep things consistent and live a happy life?
I'm quite eager to hear your suggestions!..and thanks in advance!

"This is super-nice and elegant but in practice it becomes a bit tricky"
What it means "in practice" is that you need to design your microservices in such a way that the necessary business consistency is fulfilled when following the rule:
that services cannot directly connect to a DB "owned" by another service.
In other words - don't make any assumptions about their responsibilities and change the boundaries as needed until you can find a way to make that work.
Now, to your question:
"What are the best patterns to keep things consistent and live a happy life?"
For things that don't require immediate consistency, and updating loyalty points seems to fall in that category, you could use a reliable pub/sub pattern to dispatch events from one microservice to be processed by others. The reliable bit is that you'd want good retries, rollback, and idempotence (or transactionality) for the event processing stuff.
If you're running on .NET some examples of infrastructure that support this kind of reliability include NServiceBus and MassTransit. Full disclosure - I'm the founder of NServiceBus.
Update: Following comments regarding concerns about the loyalty points: "if balance updates are processed with delay, a customer may actually be able to order more items than they have points for".
Many people struggle with these kinds of requirements for strong consistency. The thing is that these scenarios can usually be dealt with by introducing additional rules: for example, if a user ends up with negative loyalty points, notify them. If a period T goes by without the loyalty points being sorted out, notify the user that they will be charged an amount M based on some conversion rate. This policy should be visible to customers when they use points to purchase stuff.
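To make the idempotence point concrete, here is a minimal in-memory sketch of an event handler that tolerates redelivery. Everything in it is an assumption for illustration: `LoyaltyService`, `handle_order_placed`, and the `event_id` field are hypothetical names, not part of NServiceBus, MassTransit, or any other library.

```python
class LoyaltyService:
    """Toy loyalty service that tolerates redelivered events."""

    def __init__(self):
        self.points = {}        # customer id -> balance
        self.processed = set()  # ids of events already applied

    def handle_order_placed(self, event):
        # A redelivered event is acknowledged without being applied twice.
        if event['event_id'] in self.processed:
            return False
        customer = event['customer_id']
        self.points[customer] = self.points.get(customer, 0) + event['points']
        self.processed.add(event['event_id'])
        return True


service = LoyaltyService()
event = {'event_id': 'evt-1', 'customer_id': 'alice', 'points': 10}
first = service.handle_order_placed(event)
second = service.handle_order_placed(event)  # simulated retry
```

The retry leaves the balance at 10. In a real service the balance update and the processed-id record would need to be committed together, which is the transactionality mentioned above.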

I don’t usually deal with microservices, and this might not be a good way of doing things, but here’s an idea:
To restate the problem, the system consists of three independent-but-communicating parts: the frontend, the order-management backend, and the loyalty-program backend. The frontend wants to make sure some state is saved in both the order-management backend and the loyalty-program backend.
One possible solution would be to implement some type of two-phase commit:
First, the frontend places a record in its own database with all the data. Call this the frontend record.
The frontend asks the order-management backend for a transaction ID, and passes it whatever data it would need to complete the action. The order-management backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The order-management transaction ID is stored as part of the frontend record.
The frontend asks the loyalty-program backend for a transaction ID, and passes it whatever data it would need to complete the action. The loyalty-program backend stores this data in a staging area, associating with it a fresh transaction ID and returning that to the frontend.
The loyalty-program transaction ID is stored as part of the frontend record.
The frontend tells the order-management backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend tells the loyalty-program backend to finalize the transaction associated with the transaction ID the frontend stored.
The frontend deletes its frontend record.
If this is implemented, the changes will not necessarily be atomic, but it will be eventually consistent. Let’s think of the places it could fail:
If it fails in the first step, no data will change.
If it fails in the second, third, fourth, or fifth, when the system comes back online it can scan through all frontend records, looking for records without an associated transaction ID (of either type). If it comes across any such record, it can replay beginning at step 2. (If there is a failure in step 3 or 5, there will be some abandoned records left in the backends, but it is never moved out of the staging area so it is OK.)
If it fails in the sixth, seventh, or eighth step, when the system comes back online it can look for all frontend records with both transaction IDs filled in. It can then query the backends to see the state of these transactions—committed or uncommitted. Depending on which have been committed, it can resume from the appropriate step.
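The recovery idea can be sketched with toy in-memory backends. This is a simplified illustration, not a production protocol: `Backend`, `recover`, and the record fields are hypothetical names, and unlike the description above it stages only with the backend whose transaction ID is missing rather than replaying from step 2.

```python
import itertools


class Backend:
    """Toy backend with a staging area and a committed store."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.staged = {}
        self.committed = {}

    def stage(self, data):
        tx_id = next(self._ids)
        self.staged[tx_id] = data
        return tx_id

    def finalize(self, tx_id):
        # Idempotent: finalizing an already committed transaction is a no-op.
        if tx_id in self.staged:
            self.committed[tx_id] = self.staged.pop(tx_id)

    def is_committed(self, tx_id):
        return tx_id in self.committed


def recover(record, orders, loyalty):
    """Resume one frontend record from wherever it stopped."""
    # Stage with any backend that has not handed out a transaction ID yet.
    if record.get('order_tx') is None:
        record['order_tx'] = orders.stage(record['data'])
    if record.get('loyalty_tx') is None:
        record['loyalty_tx'] = loyalty.stage(record['data'])
    # Both IDs are known: finalize whichever side is still uncommitted.
    if not orders.is_committed(record['order_tx']):
        orders.finalize(record['order_tx'])
    if not loyalty.is_committed(record['loyalty_tx']):
        loyalty.finalize(record['loyalty_tx'])
    record['done'] = True  # stands in for deleting the frontend record


orders, loyalty = Backend(), Backend()
# Simulate a crash after step 6: order finalized, loyalty only staged.
record = {'data': {'item': 'book', 'points': 5}}
record['order_tx'] = orders.stage(record['data'])
record['loyalty_tx'] = loyalty.stage(record['data'])
orders.finalize(record['order_tx'])
recover(record, orders, loyalty)
```

After `recover` runs, both backends hold a committed transaction and nothing is left in the loyalty staging area.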

I agree with what @Udi Dahan said. Just want to add to his answer.
I think you need to persist the request to the loyalty program so that if it fails it can be done at some other point. There are various ways to word/do this.
1) Make the loyalty program API failure recoverable. That is to say it can persist requests so that they do not get lost and can be recovered (re-executed) at some later point.
2) Execute the loyalty program requests asynchronously. That is to say, persist the request somewhere first then allow the service to read it from this persisted store. Only remove from the persisted store when successfully executed.
3) Do what Udi said, and place it on a good queue (pub/sub pattern to be exact). This usually requires that the subscriber do one of two things... either persist the request before removing from the queue (goto 1) --OR-- first borrow the request from the queue, then after successfully processing the request, have the request removed from the queue (this is my preference).
All three accomplish the same thing. They move the request to a persisted place where it can be worked on till successful completion. The request is never lost, and retried if necessary till a satisfactory state is reached.
I like to use the example of a relay race. Each service or piece of code must take hold and ownership of the request before allowing the previous piece of code to let go of it. Once it's handed off, the current owner must not lose the request till it gets processed or handed off to some other piece of code.
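The "borrow, process, then remove" handoff from option 3 might look like the following in-memory sketch. `LeaseQueue` and `drain` are hypothetical names made up for this example; a real queue service would add lease timeouts and durable storage.

```python
class LeaseQueue:
    """Toy queue where a message is removed only after an explicit ack."""

    def __init__(self, messages):
        self.pending = list(messages)
        self.leased = []

    def lease(self):
        # Borrow the next message; the queue keeps track of it until acked.
        if not self.pending:
            return None
        message = self.pending.pop(0)
        self.leased.append(message)
        return message

    def ack(self, message):
        self.leased.remove(message)

    def release(self, message):
        # Processing failed: hand the message back for a later retry.
        self.leased.remove(message)
        self.pending.append(message)


def drain(queue, handler, max_attempts=10):
    """Process messages until the queue is empty or attempts run out."""
    for _ in range(max_attempts):
        message = queue.lease()
        if message is None:
            break
        try:
            handler(message)
        except Exception:
            queue.release(message)
        else:
            queue.ack(message)


calls = []


def flaky_handler(message):
    # Fails on the first delivery only, to show the retry path.
    calls.append(message)
    if len(calls) == 1:
        raise RuntimeError('loyalty service unavailable')


queue = LeaseQueue(['credit 10 points to alice'])
drain(queue, flaky_handler)
```

The first attempt fails and the message goes back to the queue; the second attempt succeeds and only then is the message removed.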

Even with distributed transactions you can end up in a "transaction in doubt" state if one of the participants crashes in the middle of the transaction. If you design the services as idempotent operations, life becomes a bit easier. One can write programs that fulfill business conditions without XA. Pat Helland has written an excellent paper on this called "Life Beyond XA". Basically, the approach is to make as few assumptions about remote entities as possible. He also illustrated an approach called Open Nested Transactions (http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper142.pdf) to model business processes. In this specific case, the purchase transaction would be the top-level flow, and loyalty and order management would be next-level flows. The trick is to create granular services as idempotent services with compensation logic, so that if anything fails anywhere in the flow, individual services can compensate for it. For example, if the order fails for some reason, loyalty can deduct the points accrued for that purchase.
Another approach is to model the system using eventual consistency with CALM or CRDTs. I've written a blog post on using CALM in real life: http://shripad-agashe.github.io/2015/08/Art-Of-Disorderly-Programming Maybe it will help you.
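The compensation idea (loyalty deducting accrued points when the order fails) can be sketched as a list of action/compensation pairs. This is a rough illustration only: `run_with_compensation` and the step functions are hypothetical, and a real flow would also have to persist its progress so it can compensate after a crash.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order so later effects are removed first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


state = {'points': 0, 'orders': []}


def accrue_points():
    state['points'] += 10


def deduct_points():
    state['points'] -= 10


def place_order():
    raise RuntimeError('order service failed')


def cancel_order():
    state['orders'].clear()


ok = run_with_compensation([
    (accrue_points, deduct_points),
    (place_order, cancel_order),
])
```

Because the order step fails, the points accrued in the first step are deducted again and the flow reports failure.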

Keeping Consistent Count in Google App Engine

I am looking for suggestions on a very common problem on Google App Engine platform for keeping consistent counters.
I have a task that loads the groups of a domain and then creates a separate task for each group to load its members. As there are thousands of groups and members, there will be a great many tasks.
I will create one task to fetch one page of groups, and within that task I will create a task for each group to fetch its members. To know whether all groups have been loaded, I simply check the nextPageToken and then set the groups-loading flag to finished.
However, as there will be a separate task for each group to load members, I need to keep track of whether all group-member tasks have finished. The problem is that many tasks updating a single numGroupMembersFinished count will create concurrency issues, and at some point the count will get corrupted and return incorrect data.

My answer is general because your question doesn't include any code or a proposed solution, and you don't say where you plan to keep that counter.
Many articles on the web cover this. Search for "sharding counters" for a semi-scalable way to count datastore entities quickly in O(1) time.
More importantly, look at the memcache API. It has a function to atomically increment/decrement counters stored there. That operation is guaranteed never to have concurrency issues; however, you would still need some way to recover and/or double-check that the memcache entry wasn't evicted, maybe by also keeping the count in an entity that you set asynchronously and "get by key" to always read its latest value.
This still isn't 100% bulletproof, because the cache entry could be evicted at the same moment that many concurrent attempts are modifying it, so your backup datastore entity could miss a "set".
You need to calculate, based on your expected concurrent usage, whether the chances of missing an increment/decrement are greater than those of a comet hitting the earth. Hopefully you won't use it for air traffic control.
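For readers unfamiliar with the sharding idea, here is an in-memory model of it. It is not the datastore implementation: `ShardedCounter` is a hypothetical class, and in a real app each shard would be a separate entity updated in its own transaction.

```python
import random


class ShardedCounter:
    """In-memory model of a sharded counter."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Each writer picks a random shard, spreading out contention.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # Reading the count means summing every shard.
        return sum(self.shards)


counter = ShardedCounter(num_shards=5)
for _ in range(1000):
    counter.increment()
```

The total is correct no matter which shards were picked. The trade-off is that a read has to touch every shard, so the shard count is a balance between write throughput and read cost.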

You could use the MapReduce or Pipeline API:
https://github.com/GoogleCloudPlatform/appengine-mapreduce
https://github.com/GoogleCloudPlatform/appengine-pipelines
These let you split your problem into smaller, manageable parts, and the library handles all of the details of signaling/blocking between tasks, gathering the results, and handing them back to you when it's done.
Google I/O 2010 - Data pipelines with Google App Engine:
https://www.youtube.com/watch?v=zSDC_TU7rtc
Google I/O 2011: Large-scale Data Analysis Using the App Engine Pipeline API:
https://www.youtube.com/watch?v=Rsfy_TYA2ZY
Google I/O 2011: App Engine MapReduce:
https://www.youtube.com/watch?v=EIxelKcyCC0
Google I/O 2012 - Building Data Pipelines at Google Scale:
https://www.youtube.com/watch?v=lqQ6VFd3Tnw

Zig Mandel mentioned it, here's the link to Google's own recipe for implementing a counter:
https://cloud.google.com/appengine/articles/sharding_counters
I copy-pasted (renamed some variables, etc...) the configurable sharded counter into my app and it's working great!

I used this tutorial: https://cloud.google.com/appengine/articles/sharding_counters together with hashid library and created this golang library:
https://github.com/janekolszak/go-gae-uid
gen := gaeuid.NewGenerator("Kind", "HASH'S SALT", 11 /*id length*/)
c := appengine.NewContext(r)
id, err = gen.NewID(c)
The same approach should be easy for other languages.

Google App Engine and data replication with multiple instances

I'm using the Google App Engine as the backend for an iOS game that was just released.
Through the act of playing the game, players create levels, and those are then shared with their friends and the world at large. GAE is used to store and retrieve those levels. GAE also manages players' high scores, since they are more complex than Game Center can handle.
As a whole, GAE works great. I like how GAE spins up new instances as they are needed without me having to constantly monitor load. For this game, GAE is running around 10 instances and serving around 8 queries a second.
But there is a small problem.
I've noticed that sometimes players will get on the high score table twice. This should not be possible since I remove any old scores before putting up the new scores (this is done in one query to GAE).
After some testing and poking around, it seems that what is happening is that a player will get a high score and instance 1 handles the removing of the old score and the adding of the new one. The player then gets a new high score, but this time instance 4 is the one that handles the request and it doesn't know about the other score yet.
At their fastest, it might take a player 10 seconds to get a new high score. It was my understanding that the replication of data only took 2 or 3 seconds.
I never saw this problem during testing because load rarely caused 2 instances to be started.
Does this seem like a plausible explanation for what is happening and how data is stored for each instance?
Is there a way to guarantee that data added, deleted or altered in one instance will be available in another? High scores are not "mission critical", so I'm not too worried about it, but I would like to use GAE for some more complex situations where it is very important that data is consistent.
Is that possible with GAE, or should I be looking at other solutions?

It is possible to guarantee that data will be consistent across all data centers (strong consistency). You need to use ancestor queries to achieve it. However, doing so restricts how many writes per second you can achieve. Currently the limit is about 1 write per second per entity group.
If the write limit is too slow for you, one alternative is to add a cache layer. So you will still be using the eventual consistency model, but you will mix those results with the ones in memcache.
See the doc Structuring for Strong Consistency for further details.

Is a real-time multiplayer game using Google App Engine feasible?

I am currently developing a real-time multiplayer game, and have been evaluating various cloud-based hosting solutions. I am unsure whether App Engine fits my needs, and would be grateful for any feedback.
In essence, I want the system to work like this: Player A calculates round n, and generates a hash out of the game state at the end of that round. He then sends his commands for that round, and the hash, as an HTTP POST to the server. Player B does the same thing, in parallel.
The server, while handling the POST from a player, first writes the received hash code to memcache. If the hash from the other player is not yet in memcache, it waits and periodically checks memcache for the other player's hash. As soon as both hashes are in memcache, it compares them for equality. If they are equal, the server sends each player's commands to the other player as the HTTP response.
A round like that should last around half a second, meaning two requests per player per second.
Of course, this way of doing it will only work if there are at least two instances of the application running, as two requests must be dealt with in parallel. Also, the memory cache must be consistent over all instances, be fairly reliable, and update immediately.
I cannot use XMPP because I want my game to be able to run within restricted networks, so it has to be limited to http on port 80.
Is there a way to enforce that two instances of the app are always running? Are there glaringly obvious flaws in my design? Do you think an architecture like this might work on App Engine? If not, what cloud based solution would you suggest?
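The per-round rendezvous described above can be sketched as follows. This is a single-process illustration under assumptions: `RoundBuffer` is a hypothetical class, a plain dict stands in for memcache, and exactly two players per round are assumed.

```python
class RoundBuffer:
    """Holds each player's submission for a round until both have arrived."""

    def __init__(self):
        self.rounds = {}  # round number -> {player: (state_hash, commands)}

    def submit(self, round_no, player, state_hash, commands):
        self.rounds.setdefault(round_no, {})[player] = (state_hash, commands)

    def result_for(self, round_no, player):
        """Return the opponent's commands, or None while still waiting."""
        entries = self.rounds.get(round_no, {})
        if len(entries) < 2 or player not in entries:
            return None
        hashes = {state_hash for state_hash, _ in entries.values()}
        if len(hashes) != 1:
            raise ValueError('game states diverged in round %d' % round_no)
        opponent = next(p for p in entries if p != player)
        return entries[opponent][1]


buf = RoundBuffer()
buf.submit(1, 'A', 'abc123', ['move north'])
waiting = buf.result_for(1, 'A')  # B has not posted yet
buf.submit(1, 'B', 'abc123', ['fire'])
for_a = buf.result_for(1, 'A')
for_b = buf.result_for(1, 'B')
```

A request handler would call `submit` once and then poll `result_for` until it returns something, which is the waiting loop described above.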

I believe this could work. The key API for you to learn about / test would probably be the Channel API. That is what would allow back and forth communication between the client and server.
The next issue to worry about would be memcache. In general, it is reliable, but in the strictest sense we are supposed to assume that memcached data could disappear at any time.
If you decide that you can't risk losing the data like that, then you need to persist it in the datastore, which means you will have to experiment to make sure you can sustain 2 moves per turn. I think this is possible, but not trivially so. If you had said 1 move every 3 seconds I would say "no problem." But multiple updates to one entity per second start to bump up against the practical limit on writes per second, especially if they are transactional.
Having multiple instances running will not be a problem - you can pay to keep instances warm if necessary.

Google App Engine - How to implement the activity stream in a social network

I want some ideas on best practices for implementing an activity stream for a social network I'm building on App Engine (Python).
First, I want to keep a log of all activities of each user, so that we have a history: someone became a friend, added a picture, changed their address, etc. This way we have a user's history available should we need it. It also means we can remove friendship joins or change user data but still have a historical log.
I also want to stream a user's activity to their friends. For this, only the last X activities need to be kept, in the scenario where messages are sent to friends when an activity occurs.
It's pretty straightforward to design a history log, i.e. when, what, where. The complication is how we notify a user's friends of their activity.
In our app friendships are not mutual, i.e. they are based on the Twitter following model. Some accounts could have thousands of followers.
What is the best approach to model this?
Using a many-to-many join table and doing a costly query.
Using a feed class that fires a copy of the activity to all the subscribers, maybe into memcache? As there may be a need to fire thousands of messages, I would imagine a cron job would need to be used.
Any help, ideas, or thoughts on this are welcome.
Thx

There's a great talk by Brett Slatkin called Building Scalable, Complex Apps on App Engine from last year's Google I/O, in which the example is a Twitter-like application, where users' updates are pushed to their followers. Basically exactly what you're trying to do.
I highly recommend the video for anyone writing an App Engine app, it's really helpful.

Don't do joins. They're too expensive, you'll burn through your quota in no time.
You can use a task queue; it's a bit like a cron job (i.e. stuff happens outside of the original request), but you can start tasks at will. memcache would be good if you're OK with losing some activity when the cache is flushed...
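As a rough sketch of fan-out on write with a bounded per-follower feed, consider the following. All names are hypothetical, plain dicts stand in for the datastore and memcache, and each batch returned by `record_activity` represents what one task-queue task would handle.

```python
from collections import defaultdict, deque

MAX_FEED_ITEMS = 3  # the "last X activities"; value chosen for illustration

followers = defaultdict(set)                               # user -> followers
history = defaultdict(list)                                # full per-user log
feeds = defaultdict(lambda: deque(maxlen=MAX_FEED_ITEMS))  # follower streams


def record_activity(user, activity):
    """Append to the permanent log and return fan-out batches to enqueue."""
    history[user].append(activity)
    targets = sorted(followers[user])
    batch_size = 2
    # Each batch would become one task-queue task in a real app.
    return [targets[i:i + batch_size]
            for i in range(0, len(targets), batch_size)]


def deliver(batch, user, activity):
    """What one fan-out task does: push the activity to each follower."""
    for follower in batch:
        feeds[follower].append((user, activity))


followers['alice'] = {'bob', 'carol', 'dave'}
for n in range(5):
    activity = 'photo %d' % n
    for batch in record_activity('alice', activity):
        deliver(batch, 'alice', activity)
```

The full history keeps all five activities, while each follower's feed holds only the latest three because the deque drops the oldest entry once it is full.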
