I'm building a simple "play against a random opponent" back-end using Google App Engine. So far I'm adding each user who wants to play into a "table" in the Datastore. As soon as there is more than one player in the Datastore, I can start to match them.
The Schedule Tasks with Cron feature looked promising for this until I saw that the lowest resolution seems to be one minute. If plenty of players are signing up, I want them to be matched quickly, not left waiting a whole minute in the worst case.
I thought about having the servlet that receives the "play against random opponent" request POST to a Task Queue that would do the matchmaking, but I suspect this will lead to a lot of contention when reading from the Datastore and deleting the entities from the "random" table after they have been matched.
Basically I want a single worker that will do the matching, and I want to signal this worker from time to time that now is a good time to try to match opponents.
Any suggestions on what would be the right course of action here?
You can guarantee exclusive access via transactions:
Receive a request to play via REST. Check (within a transaction) whether there is any pending request in the database.
If there is, notify both users to start the play and delete the request (transactionally) from the database.
If there isn't, add it to the database and wait for the next request.
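The steps above can be sketched as follows. This is an illustration only: in a real app the critical section would be a Datastore transaction, and here a plain lock stands in for it; the `Matchmaker` name and in-memory list are made up for the example.

```python
import threading

class Matchmaker(object):
    """Illustrative check-and-match logic. A lock stands in for the
    Datastore transaction that would guarantee exclusive access."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = []  # pending "play against random opponent" requests

    def request_play(self, player):
        """Return a matched pair, or None if the player must wait."""
        with self._lock:  # stands in for a transaction
            if self._waiting:
                opponent = self._waiting.pop(0)  # delete the matched request
                return (opponent, player)        # notify both users
            self._waiting.append(player)         # nobody waiting: enqueue
            return None
```

The first caller gets `None` back and waits; the second caller is immediately paired with the first.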
Update:
Alternatively, you can achieve what you need via a pull queue. Same scenario as above, just instead of the Datastore you'd check whether there is a task in the pull queue, retrieve it if there is, or create a new one if there isn't.
Related
I have a GAE/P/Standard/FirstGen app that sends a lot of email with Sendgrid. Sendgrid sends my app a lot of notifications for when email is delivered, opened, etc.
This is how I process the Sendgrid notifications:
My handler processes the Sendgrid notification and adds a task to a pull queue
About once every minute I lease a batch of tasks from the pull queue to process them.
This works great except when I am sending more emails than usual. When I am adding tasks to the pull queue at a high rate, the pull queue refuses to lease tasks (it responds with TransientError), so the pull queue keeps filling up.
What is the best way to scale this procedure?
If I create a second pull queue and split the tasks between the two of them, will that double my capacity? Or is there something else I should consider?
====
This is how I add tasks:
q = taskqueue.Queue("pull-queue-name")
q.add(taskqueue.Task(payload=data, method="PULL", tag=tag_name))
I have found some information about it in the Google documentation here. According to it, the solution for TransientError is to:
catch these exceptions, back off from calling lease_tasks(), and then
try again later.
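That advice can be wrapped in a small retry helper. This is a generic sketch: `lease_fn` and `transient_exc` are stand-ins for your queue's `lease_tasks()` call and the exception it raises under load.

```python
import random
import time

def lease_with_backoff(lease_fn, transient_exc, max_attempts=5, base=0.5):
    """Call lease_fn(), backing off exponentially (with jitter) whenever
    it raises the given transient exception; re-raise after the last try."""
    for attempt in range(max_attempts):
        try:
            return lease_fn()
        except transient_exc:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))
```

In the real handler, `lease_fn` would be something like `lambda: q.lease_tasks(3600, 100)` and `transient_exc` would be `taskqueue.TransientError`.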
Actually I suppose this is the App Engine Task Queue, not Cloud Tasks, which is a different product.
As far as I understand, there is no option to scale this better. It seems the solution might be to migrate to Cloud Tasks and Pub/Sub, which is a better way to manage queues in GAE, as you may find here.
I hope it will help somehow... :)
I want to send a particular HTTP request (or otherwise communicate a message) to every (dynamic/autoscaled) instance which is currently running for a particular App Engine application.
My goal is to trigger each instance to discard some locally cached data (because I have just modified the underlying data and want them to reload it).
One possible solution is to store a value in Memcache, and have instances check this each time they handle a request to see if they should flush their cache. But this adds latency to every request.
Another possible solution would be to somehow stop all running instances. No fixed overhead, but some impact while instances are restarted.
An even less desirable solution would be to redeploy the application code in order to cause all instances to be stopped. This now adds additional delay on my end as a deployment takes some time.
You could use the management API to list the instances for a given version, but I'd suggest using something like the Pub/Sub API to create a subscription on each of your App Engine instances. Since each instance has its own subscription, any message published to the monitored topic will be received by all instances.
You can create the subscription at startup (the /_ah/start endpoint may be useful), and then delete it at shutdown (using the /_ah/stop endpoint).
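A minimal sketch of that lifecycle, with the Pub/Sub client abstracted away: `client` is assumed to expose `create_subscription`/`delete_subscription` calls (illustrative names, not a specific library's API), and the class and subscription naming are made up for the example.

```python
import uuid

class InstanceCacheInvalidator(object):
    """Gives each instance its own subscription on a shared topic, so a
    single published message (e.g. "flush your cache") reaches every
    instance. Create on /_ah/start, delete on /_ah/stop."""

    def __init__(self, client, topic):
        self.client = client
        self.topic = topic
        # Unique per instance, so instances don't steal each other's messages.
        self.subscription = "cache-flush-%s" % uuid.uuid4().hex

    def on_start(self):
        # Wire this to the /_ah/start handler.
        self.client.create_subscription(self.subscription, self.topic)

    def on_stop(self):
        # Wire this to the /_ah/stop handler.
        self.client.delete_subscription(self.subscription)
```

Each instance would then poll or stream from its own subscription and clear its local cache when a message arrives.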
It looks like I can create a push-queue that will start backends to process tasks and that I can limit the number of workers to 1. However, is there a way to do this with a pull-queue?
Can App Engine auto-start a named backend when a pull queue has tasks and then let it expire when idle and the queue is empty?
It looks like I just need some way to call an arbitrary URL to "notify" it that there are tasks to process but I'm unable to find any documentation on how this can be done.
Use a cron task or a push queue to periodically start the backend. The backend can loop through the tasks (if any) in the pull queue, and then expire.
There isn't a notification system for pull queues, only inspection through queue statistics and empty/non-empty lease results.
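The backend's loop can be sketched like this. It is generic on purpose: `lease_batch`, `process`, and `delete_batch` are stand-ins for the real `lease_tasks()`, your task handler, and `delete_tasks()`.

```python
def drain_queue(lease_batch, process, delete_batch, batch_size=100):
    """Lease a batch, process it, delete it, and repeat until a lease
    comes back empty; return how many tasks were handled. An empty lease
    lets the backend go idle and expire."""
    handled = 0
    while True:
        tasks = lease_batch(batch_size)
        if not tasks:
            return handled  # queue empty: let the backend expire
        for task in tasks:
            process(task)
        delete_batch(tasks)  # only delete after successful processing
        handled += len(tasks)
```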
First of all, you need to decide which scaling type you want to use for your module. I think you should take a look at Basic Scaling (https://developers.google.com/appengine/docs/java/modules/).
Next, to process tasks from the pull queue you can use Cron to check the queues every few minutes. It is important that the cron job target the frontend module rather than the basic scaling module, because cron requests will start instances. The problem is that you will still need to pay for at least one frontend instance, because your cron job will never let it shut down.
So the solution could be the following:
Start cron every 1 or 5 minutes and call the frontend
Check the queue in the frontend and issue a URLFetch request to the basic scaling module if there are tasks in the pull queue
Process the tasks in the queue using the basic scaling module
If you use F1 instances for the frontend and B2 or greater for the other modules, it could save you some money.
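The frontend's cron handler reduces to a small check-and-poke. In this sketch, `count_fn` and `trigger_fn` are stand-ins for a queue-statistics lookup (e.g. `fetch_statistics()`) and a URLFetch call to the basic scaling module's URL; neither name is a real API.

```python
def maybe_start_worker(count_fn, trigger_fn):
    """If the pull queue has tasks, poke the worker module; return
    whether the worker was triggered."""
    if count_fn() > 0:   # tasks waiting in the pull queue
        trigger_fn()     # URLFetch to the basic scaling module
        return True
    return False         # nothing to do; don't wake the worker
```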
Every minute or so my app creates some data and needs to send it out to more than 1000 remote servers via URL Fetch callbacks. The callback URL for each server is stored on separate entities. The time lag between creating the data and sending it to the remote servers should be roughly less than 5 seconds.
My initial thought is to use the Pipeline API to fan out URL Fetch requests to different task queues.
Unfortunately task queues are not guaranteed to execute in a timely fashion: the gap between enqueuing a task and its actual execution can range from minutes to hours. From previous experience this gap is regularly over a minute, so this is not necessarily appropriate.
Is there any way from within App Engine to achieve what I want? Maybe you know of an outside service that can do the fan out in a timely fashion?
Well, there's probably no good solution on GAE here.
You could keep a backend running, hammering the Datastore/memcache every second for new data to send out, and then spawning dozens of async URL fetches.
But that's really inefficient...
If you want a third-party service, pubnub.com is capable of doing fan-out; however, I don't know if it would fit into your setup.
How about using the async API? You could then do a large number of simultaneous URL calls, all from a single location.
If the performance is particularly sensitive, you could do them from a backend and use a B8 instance.
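The fan-out pattern looks roughly like this. On App Engine you would use the async urlfetch API (`urlfetch.make_fetch_call()` / `rpc.get_result()`) rather than threads; this self-contained sketch uses a thread pool instead, and `fetch` is a stub standing in for the actual HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(fetch, urls, max_workers=50):
    """Start every fetch concurrently, then collect the results in the
    same order as the input URLs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, u) for u in urls]  # all in flight at once
        return [f.result() for f in futures]             # gather in order
```

With 1000+ callback URLs, issuing all requests before waiting on any of them is what keeps the total time near the slowest single call rather than the sum of all calls.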
We're using sessions in our GAE/J application. Over the weekend, we had a large spike in our datastore writes that appears to have been caused by a large number of _ah_SESSION entities being created (about 100-200 every minute). Near as we can tell, there was a rogue task queue creating them because they stopped when we purged the queue. The task was part of a mapper process we run hourly.
We don't need sessions in that hourly mapper (or indeed in any of our task queues or cron jobs or many other requests). Is there a way to disable creating a session for selected URLs?
Unfortunately that cannot be done.
This is particularly nasty when you have non-browser clients (devices via REST or mapreduce jobs), where every request generates a new _ah_SESSION entity in the database.
The only way to avoid this is to write your own session handler, e.g. a servlet filter that sets/checks cookies, and configure it to ignore certain paths.
EDIT:
I just realized that there could be another way: make sure your client (mapreduce job) sets a dummy cookie with the proper name. GAE uses cookies named ACSID in production and dev_appserver_login on the dev server. Always use the same cookie value, so all requests will be treated as one user/session.
There will still be overhead of looking-up/saving session objects, but at least it will not create countless _ah_SESSION entities.
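On the client side the workaround is just a fixed header. A minimal sketch: the ACSID cookie name comes from the answer above, while the helper name and the cookie value are arbitrary placeholders.

```python
def session_pinning_headers(extra=None):
    """Headers for task/mapreduce requests, pinned to one shared session
    so the server does not create a new _ah_SESSION entity per request."""
    headers = {"Cookie": "ACSID=shared-task-session"}  # fixed value on purpose
    if extra:
        headers.update(extra)
    return headers
```

Every request carrying this header is treated as the same "user", so the session store holds one entity for all of them instead of one per request.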