My website on GAE-Python has a function to calculate some math using Evolutionary Optimization Algorithm, which will be called by an ajax request when the user click a button. Each request usually takes very long time to finish calculation.
I need some way (ajax or other methods) to tell the server to cancel the current request rather than using ajax's xhr.abort() function which does not stop the calculation on server side.
For an early attempt, I have found that GAE has the Request Timer in which the DeadlineExceededError will be raised by the runtime if the request takes too long to finish.
Based on this idea, I would like to ask if there is a way to send a signal to the server to cause the runtime to trigger an interrupt on the request?
You shouldn't be trying to do any long-running tasks synchronously in a handler. This is the perfect candidate for a task queue. The Ajax request should simply push the task onto the queue, and App Engine will process it offline. Tasks get a ten-minute timeout.
You can use memcache or the datastore to pass information to and from your Ajax code. For instance, the task handler could check memcache every few seconds for the existence of a 'stop_processing_FOO' key (where FOO is a unique ID generated by the Ajax when the task is first triggered), and the your 'cancel' button would call a handler to insert that key into memcache.
Similarly, the task could put a 'finished_processing_FOO' key with the associated values into memcache when it finishes, and your Ajax could periodically poll a handler that checks if that key is present, and return the value if so.
Related
I was building a service that runs on Cloud Run that is triggered by PubSub through EventArc.
'PubSub' guarantees delivery at least one time and it would retry for every acknowledgement deadline. This deadline is set in the queue subscription details.
We could send an acknowledgement back at two points when a service receives a pub-sub request (which is received as a POST request in the service).
At the beginning of the request as soon as the request was received. The service would then continue to process the request at its own pace. However, this article points out that
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So sending a response at the beginning may not be an option
After the request has been processed by the service. So this would mean that, depending on what the service would do, we cannot always predict how long it would take to process the request. Hence we cannot set the Acknowledgement deadline correctly, resulting in PubSub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
Since you are concerned that you might not be able to set the correct "Acknowledgement Deadline", you can use modify_ack_deadline() in your code where you can dynamically extend your deadline if the process is still running. You can refer to this document for sample code implementations.
Be wary that the maximum acknowledgement deadline is 600 seconds. Just make sure that your processing in cloud run does not exceed the said limit.
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions" where a process is continuously pulling the Cloud PubSub API.
To get events from PubSub into Cloud Run, you use "push subscriptions" where PubSub makes an HTTP request to Cloud Run, and waits for it to finish.
In this push scenario, PubSub already knows it made you a request (you received the event) so it does not need an acknowledgement about the receipt of the message. However, if your request sends a faulty response code (e.g. http 500) PubSub will make another request to retry (and this is configurable on the Push Subscription itself).
I'm building an app that essentially does the following:
Get the user to enter certain parameters.
Pass those params to the backend and start a task based on those
params.
When the task is complete redirect the user to another page showing
the results of the task.
The problem here is that the task is expected to take quite long. I was thus hoping to make the request asynchronous. Does appengine allow this ?
If not, what are my options ? I was looking at the documentation for task queues. While it satisfies part of what I'm trying to do, I'm not very clear on how the queue notifies the client when the task is complete, so that the redirect can be initiated.
Also, what if the results of the task have to be returned to the calling client itself ? Is that possible ?
You can't (shouldn't really) wait for completion, GAE is not designed for that. Just launch the task, get a task ID (unique, persisted it in the app) and send the ID back to the client in the response to the launch request.
The client can check, either by polling (at a reasonable rate) or simply on-demand, that status page (you can use the ID to find the right task). You can even add a progress/ETA info on that page, down the road if you so desire.
After the task completes the next status check request from the client can be redirected to the results page as you mentioned.
This Q&A might help as well, it's a very similar scenario, only using the deferred library: How do I return data from a deferred task in Google App Engine
Update:
The Task Queues are preferable to the deferred library, the deferred functionality is available using the optional countdown or eta arguments to taskqueue.add():
countdown -- Time in seconds into the future that this task should run or be leased. Defaults to zero. Do not specify this argument if
you specified an eta.
eta -- A datetime.datetime that specifies the absolute earliest time at which the task should run. You cannot specify this argument if
the countdown argument is specified. This argument can be time
zone-aware or time zone-naive, or set to a time in the past. If the
argument is set to None, the default value is now. For pull tasks, no
worker can lease the task before the time indicated by the eta
argument.
Consider implementing poker on Google App Engine. Suppose a player is allowed only 10 seconds to check/fold/raise.
That is, if 10 seconds pass with no response from the player then some timer should fire which executes code that writes to DataStore declaring that the player folded. What is the idiomatic way to implement this on Google App Engine.
The GAE has a feature called "Tasks". Sadly, they have no guaranteed resolution, so a task scheduled for now+10 seconds can execute in 10 seconds or any later time.
Solution: Write the current time-stamp along with the information about the current player into the database. If any of the players request updated information about the current game, you can check this time-stamp, compare it with the current one, and therefore determine if these 10 seconds have passed and update the database accordingly.
You can combine this solution with tasks to ensure, that even if nobody "watches" that game, its still updated sometime.
This needs to be done on a backend, as that's the only code that can persist outside of a request handler.
Player is dealt. Timer starts on backend. Timer expires. Player
status updated.
Backends are special App Engine instances that have no request deadlines, higher memory and CPU limits, and persistent state across requests. They are started automatically by App Engine and can run continously for long periods. Each backend instance has a unique URL to use for requests, and you can load-balance requests across multiple instances.
https://developers.google.com/appengine/docs/python/backends/
No need to act synchronously - i.e. do some action exactly 10 seconds after last user action.
Just record the time of last user action and act accordingly next time the user action happens: if <10s let user do next move, if >10s notify user he folded.
To keep things more responsive, e.g. to show user how much time he hes before folding, you should also track this on client.
I am using ndb to write a profiling model that logs some data per application request. Each request calls a ndb request by ndb.put_async to log the data, while the client do not care about the result. In essence, I do not want the application request to wait for saving statistics data for profiling.
However, I was confused about the explanation from the official documentation. If an application request has finished before the ndb request finishes, would the ndb request still be guaranteed to finish? The documentation indicates that
if the request handler exists too early, the put might never happen
Under what criteria would this happen? Does this mean that regardless of whether a user care about the result, future.get_result needs to be called anyway just to make sure the ndb request is performed?
The original documentation (https://developers.google.com/appengine/docs/python/ndb/async) says:
In this example, it's a little silly to call future.get_result: the
application never uses the result from NDB. That code is just in there
to make sure that the request handler doesn't exit before the NDB put
finishes; if the request handler exits too early, the put might never
happen. As a convenience, you can decorate the request handler with
#ndb.toplevel. This tells the handler not to exit until its
asynchronous requests have finished. This in turn lets you send off
the request and not worry about the result.
If an application request has finished before the ndb request finishes, would the ndb request still be guaranteed to finish?
No.
Does this mean that regardless of whether a user care about the result, future.get_result needs to be called anyway just to make sure the ndb request is performed?
Basically yes, but you can use ndb.toplevel decorator for the convenience so that you don't have to wait for the result explicitly. That said, I don't think this is what you want.
Probably taskqueue is what you want. Please check it out.
Thanks for the clarification. What about a general RPC (non-NDB) - e.g., incr_async() in memcache.Client()? Setting aside that this is a very, very fast RPC call, is it guaranteed that the RPC will complete?
I.e., which of the following is true:
(a) there is something in the infrastructure that will wait on all known RPCs before completing the request
(b) the request will complete and the async RPCs will also complete regardless of when the request completes
(c) the in-flight RPCs are formally cancelled
(d) something else?
I'm building a simple "play against a random opponent" back-end using Goole App Engine. So far I'm adding each user that wants to play into a "table" in the Datastore. As soon as there are more than 1 player in the Datastore, I can start to match them.
The Schedule Tasks with Cron looked promising for this work until I saw that the lowest resolution seems to be minutely. If there are plenty of players signing up I want them to be matched quickly and not have to wait a whole minute (worst case).
I thought about having the servlet that recives the "play against random opponent" request POST to a Task Queue that would do the match making, but I think this will lead to a lot of contention when reading from the Datastore and deleting the enteties from the "random" table after they have been matched?
Basically I want a single worker that will do the matching, and I want to signal this worker from time to time that now is a good time to try to match opponents.
Any suggestions on what would be the right course of action here?
You can guarantee exclusive access via transactions:
Receive a request to play via REST. Check (within a transaction) if there is any request in database.
If there is, notify both users to start the play and delete request (transactionaly) from database.
If there isn't, add it to the database and wait for the next request.
Update:
Alternativelly you can achieve what you need via pull queue. Same scenario as above, just instead of datastore you'd check if there is a task in the pull queue, retrieve if there is or create a new one if there isn't one.