I am using ndb to write a profiling model that logs some data per application request. Each request issues an ndb call via ndb.put_async to log the data, and the client does not care about the result. In essence, I do not want the application request to wait for the profiling statistics to be saved.
However, I was confused by the explanation in the official documentation. If an application request has finished before the ndb request finishes, would the ndb request still be guaranteed to finish? The documentation indicates that
if the request handler exits too early, the put might never happen
Under what circumstances would this happen? Does this mean that, regardless of whether the user cares about the result, future.get_result needs to be called anyway just to make sure the ndb request is performed?
The original documentation (https://developers.google.com/appengine/docs/python/ndb/async) says:
In this example, it's a little silly to call future.get_result: the
application never uses the result from NDB. That code is just in there
to make sure that the request handler doesn't exit before the NDB put
finishes; if the request handler exits too early, the put might never
happen. As a convenience, you can decorate the request handler with
@ndb.toplevel. This tells the handler not to exit until its
asynchronous requests have finished. This in turn lets you send off
the request and not worry about the result.
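For illustration, here is a minimal sketch of the two patterns the quoted passage describes, assuming a legacy webapp2 handler and the Python 2 NDB client library; LogRecord is a hypothetical profiling model invented for this example:

from google.appengine.ext import ndb
import webapp2

class LogRecord(ndb.Model):  # hypothetical profiling model
    payload = ndb.JsonProperty()

class ExplicitWaitHandler(webapp2.RequestHandler):
    def get(self):
        future = LogRecord(payload={'path': self.request.path}).put_async()
        self.response.write('ok')
        future.get_result()  # block so the put finishes before the handler exits

class ToplevelHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        # @ndb.toplevel waits for all pending async NDB work before the
        # handler exits, so no explicit get_result() is needed here.
        LogRecord(payload={'path': self.request.path}).put_async()
        self.response.write('ok')

app = webapp2.WSGIApplication([
    ('/explicit', ExplicitWaitHandler),
    ('/toplevel', ToplevelHandler),
])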
If an application request has finished before the ndb request finishes, would the ndb request still be guaranteed to finish?
No.
Does this mean that, regardless of whether the user cares about the result, future.get_result needs to be called anyway just to make sure the ndb request is performed?
Basically yes, but you can use the @ndb.toplevel decorator for convenience so that you don't have to wait for the result explicitly. That said, I don't think this is what you want.
Probably taskqueue is what you want. Please check it out.
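If you do go the task queue route, a rough sketch with the legacy Python 2 taskqueue API might look like the following; the /log-stats URL, the enqueue_profiling helper, and the LogRecord model are all made up for illustration:

from google.appengine.api import taskqueue
from google.appengine.ext import ndb
import webapp2

class LogRecord(ndb.Model):  # hypothetical profiling model
    payload = ndb.JsonProperty()

class LogWorker(webapp2.RequestHandler):
    def post(self):
        # Runs offline on a push queue and is retried automatically on failure,
        # so the user-facing request never waits for this write.
        LogRecord(payload={'path': self.request.get('path')}).put()

def enqueue_profiling(path):
    # Call this from the user-facing handler; it returns almost immediately.
    taskqueue.add(url='/log-stats', params={'path': path})

The '/log-stats' URL must be routed to LogWorker in your WSGI application for the worker to run.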
Thanks for the clarification. What about a general RPC (non-NDB) - e.g., incr_async() in memcache.Client()? Setting aside that this is a very, very fast RPC call, is it guaranteed that the RPC will complete?
I.e., which of the following is true:
(a) there is something in the infrastructure that will wait on all known RPCs before completing the request
(b) the request will complete and the async RPCs will also complete regardless of when the request completes
(c) the in-flight RPCs are formally cancelled
(d) something else?
The Google Cloud NDB documentation doesn't say much about async operations.
In the old days, I would do this
@ndb.toplevel
@flask.route('/', methods=['GET'])
def page():
    for x in xxx:
        ndb.put_multi_async([...])
    return 'Done', 200
and the toplevel decorator would make sure that my async puts were done.
How do I do this with the latest cloud ndb?
The cloud ndb docs for toplevel say
Use of this decorator is largely unnecessary, as you should be using
context() which also flushes pending work when exiting the context.
but it would be helpful to have more clarity. When would it still be necessary to use toplevel?
As stated in the documentation for NDB Asynchronous Operations:
As a convenience, you can decorate the request handler with @ndb.toplevel. This tells the handler not to exit until its asynchronous requests have finished.
...
Using a toplevel application is more convenient than [decorating] all its handler functions.
This was convenient when using the NDB Client Library with Python 2, as you've said:
the toplevel decorator would make sure that my async puts were done
Nowadays using the Cloud NDB library, as shown in this answer,
each NDB operation needs to be wrapped in a context manager:
with ndb_client.context():  # <- you need this line
    cls.get_or_insert('master')
That is why the documentation says that use of the toplevel decorator
is largely unnecessary, as you should be using context():
the context manager replaced it, and it flushes pending work,
such as async operations, when the context exits.
As mentioned in the Cloud NDB documentation:
The context is used to manage the connection to Google Cloud Datastore, an event loop for asynchronous API calls, runtime caching policy, and other essential runtime state.
Finally, as mentioned in the NDB Migration Notes:
The biggest difference is in establishing a runtime context for your NDB application. The Google App Engine Python 2.7 runtime had a strong assumption that all code executed inside a web framework request-response cycle, in a single thread per request. In order to decouple from that assumption, Cloud NDB implements explicit clients and contexts.
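For context, the usual way to get the old toplevel behaviour back with the Cloud NDB library is the WSGI middleware pattern from the migration guide, which enters a context for every request; a minimal sketch with Flask:

from flask import Flask
from google.cloud import ndb

client = ndb.Client()

def ndb_wsgi_middleware(wsgi_app):
    def middleware(environ, start_response):
        # Every request runs inside an NDB context; pending async work is
        # flushed when the context exits, much like @ndb.toplevel used to do.
        with client.context():
            return wsgi_app(environ, start_response)
    return middleware

app = Flask(__name__)
app.wsgi_app = ndb_wsgi_middleware(app.wsgi_app)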
Personally, I've always been in the habit of calling .get_result() on my async tasklets/operations, so this is something that I've never actually used.
The only use case I can think of for toplevel is if you want to force the flush to occur before you reach the end of your request handler (because at the end of your request handler, you should be exiting the context). In the example below, we want the puts in operation_1 to finish before operation_2 begins:
@ndb.toplevel
def operation_1():
    for x in xxx:
        ndb.put_multi_async([...])

@flask.route('/', methods=['GET'])
def page():
    operation_1()
    operation_2()
    return 'Done', 200
This could be useful for request handlers for Google Cloud Tasks which can run for up to 10 minutes, so you could be doing a bunch of things in there.
I understand that an Azure Standard stateful Logic App workflow runs asynchronously, but can I use a stateful Standard Logic App for the scenario below?
We want to receive JSON data from a third party in an HTTP POST request, then process it and store it in Azure Data Lake. The problem is that, since the Azure Standard stateful workflow runs asynchronously, it returns status 202 Accepted as soon as the HTTP trigger is hit. I want to send the caller the end status of the request. For example, I want to send 500 Internal Server Error when the request was valid but the workflow still failed due to an internal error, and 200 OK if the data was processed successfully. I don't want to always send HTTP status 202 Accepted to the caller; I want the caller to know what exactly happened to their HTTP request. Is this possible with a Standard Logic App? I don't want to use a Consumption Logic App for security reasons.
You can achieve this using the 'run after' configuration; with it enabled, an action runs even when an earlier part of the workflow has failed.
Go to your workflow and open the menu for the action you want to run regardless of whether the previous one fails, times out, or is skipped (a Condition in my case), then select 'Configure run after'.
For instance here is my logic app
Here is how my code view looks:
OUTPUT:
UPDATED ANSWER
In that case too, you can use the same 'run after' concept, with a condition checking that the status code is not equal to 200 as the true branch, and continue the flow from there.
Here is the logic app
Here is the output
I was building a service that runs on Cloud Run and is triggered by Pub/Sub through Eventarc.
Pub/Sub guarantees at-least-once delivery and retries a message each time the acknowledgement deadline expires. This deadline is set in the subscription details.
We could send an acknowledgement back at one of two points when the service receives a Pub/Sub request (which arrives as a POST request to the service):
1. At the beginning of the request, as soon as the request is received. The service would then continue to process the request at its own pace. However, this article points out that
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
so sending a response at the beginning may not be an option.
2. After the request has been processed by the service. Depending on what the service does, we cannot always predict how long processing will take, so we cannot set the acknowledgement deadline correctly, resulting in Pub/Sub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
Since you are concerned that you might not be able to set the correct "Acknowledgement Deadline", you can use modify_ack_deadline() in your code where you can dynamically extend your deadline if the process is still running. You can refer to this document for sample code implementations.
Be wary that the maximum acknowledgement deadline is 600 seconds. Just make sure that your processing in Cloud Run does not exceed that limit.
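For reference, a minimal sketch of extending the deadline with the google-cloud-pubsub client; this only applies if you are pulling messages yourself, and the project and subscription names are placeholders:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('my-project', 'my-subscription')

def extend_deadline(ack_id, seconds=60):
    # Push the acknowledgement deadline further out while work is still running.
    subscriber.modify_ack_deadline(
        request={
            'subscription': subscription_path,
            'ack_ids': [ack_id],
            'ack_deadline_seconds': seconds,
        }
    )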
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions" where a process is continuously pulling the Cloud PubSub API.
To get events from PubSub into Cloud Run, you use "push subscriptions" where PubSub makes an HTTP request to Cloud Run, and waits for it to finish.
In this push scenario, PubSub already knows it made you a request (you received the event) so it does not need an acknowledgement about the receipt of the message. However, if your request sends a faulty response code (e.g. http 500) PubSub will make another request to retry (and this is configurable on the Push Subscription itself).
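To illustrate the push flow, here is a rough sketch of a Cloud Run request handler, assuming Flask; process_event is a made-up stand-in for your own processing logic:

import base64
from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['POST'])
def handle_push():
    envelope = request.get_json()
    message = envelope.get('message', {}) if envelope else {}
    data = base64.b64decode(message.get('data', '')).decode('utf-8')
    try:
        process_event(data)  # hypothetical: your actual processing
    except Exception:
        # A non-2xx response makes Pub/Sub redeliver the message later.
        return 'Processing failed', 500
    # Any 2xx response tells Pub/Sub the message was handled successfully.
    return '', 204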
My website on GAE-Python has a function that performs a calculation using an evolutionary optimization algorithm, which is called by an Ajax request when the user clicks a button. Each request usually takes a very long time to finish the calculation.
I need some way (Ajax or another method) to tell the server to cancel the current request, rather than using Ajax's xhr.abort() function, which does not stop the calculation on the server side.
As an early attempt, I found that GAE has the Request Timer, whereby the runtime raises a DeadlineExceededError if the request takes too long to finish.
Based on this idea, is there a way to send a signal to the server that causes the runtime to interrupt the request?
You shouldn't be trying to do any long-running tasks synchronously in a handler. This is the perfect candidate for a task queue. The Ajax request should simply push the task onto the queue, and App Engine will process it offline. Tasks get a ten-minute timeout.
You can use memcache or the datastore to pass information to and from your Ajax code. For instance, the task handler could check memcache every few seconds for the existence of a 'stop_processing_FOO' key (where FOO is a unique ID generated by the Ajax code when the task is first triggered), and your 'cancel' button would call a handler that inserts that key into memcache.
Similarly, the task could put a 'finished_processing_FOO' key with the associated values into memcache when it finishes, and your Ajax could periodically poll a handler that checks if that key is present, and return the value if so.
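A rough sketch of that pattern on the legacy Python 2 runtime; the URLs, the stop_processing_FOO key scheme from above, and the evolve_one_generation function are all placeholders for your own code:

from google.appengine.api import memcache, taskqueue
import webapp2

class StartHandler(webapp2.RequestHandler):
    def post(self):
        foo = self.request.get('foo')  # unique ID generated by the Ajax code
        taskqueue.add(url='/run-optimizer', params={'foo': foo})

class CancelHandler(webapp2.RequestHandler):
    def post(self):
        memcache.set('stop_processing_%s' % self.request.get('foo'), True)

class OptimizerWorker(webapp2.RequestHandler):
    def post(self):
        foo = self.request.get('foo')
        result = None
        for generation in range(1000):
            if memcache.get('stop_processing_%s' % foo):
                return  # the user pressed cancel; abandon the task
            result = evolve_one_generation(result)  # hypothetical EA step
        memcache.set('finished_processing_%s' % foo, result)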
In the PayPal API, the flow allows an opportunity for logging before an API call, sometimes in the middle (as in the case of Express Checkout), and after a successful payment/transaction. I am concerned about the last step, where confirmation of the call's success is received but perhaps could not be logged. I've been looking through the API but have not found a way to check the result of a previous payment/transaction. Is there such a call? How do I ensure atomicity in this case?
The DoExpressCheckoutPayment API call is idempotent as of any version > 76.0; you can simply call it again a second time if you wish to verify the transaction really has completed.
Optionally, you can also employ PayPal Instant Payment Notification to get a server-to-server POST with data for each transaction happening on your account.
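As a rough illustration of the retry-on-uncertainty idea with the classic NVP API (the endpoint, credential placeholders, and field names are sketched from memory, so double-check them against PayPal's NVP reference):

import urllib.parse
import requests

# Sandbox NVP endpoint; the live endpoint drops the 'sandbox' part.
NVP_ENDPOINT = 'https://api-3t.sandbox.paypal.com/nvp'
API_USERNAME = 'your-api-username'   # classic API credentials (placeholders)
API_PASSWORD = 'your-api-password'
API_SIGNATURE = 'your-api-signature'

def do_express_checkout_payment(token, payer_id, amount):
    # With VERSION > 76.0 this call is idempotent, so repeating it after a
    # lost confirmation returns the outcome of the already-completed payment
    # instead of charging the buyer twice.
    params = {
        'METHOD': 'DoExpressCheckoutPayment',
        'VERSION': '93.0',
        'USER': API_USERNAME,
        'PWD': API_PASSWORD,
        'SIGNATURE': API_SIGNATURE,
        'TOKEN': token,
        'PAYERID': payer_id,
        'PAYMENTREQUEST_0_AMT': amount,
        'PAYMENTREQUEST_0_PAYMENTACTION': 'Sale',
    }
    response = requests.post(NVP_ENDPOINT, data=params)
    # The NVP response is URL-encoded, e.g. ACK=Success&PAYMENTINFO_0_TRANSACTIONID=...
    return dict(urllib.parse.parse_qsl(response.text))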