What is the name of the timeout exception in App Engine? - google-app-engine

For some reason, I was under the impression that it was just called Timeout, but it doesn't seem to be.
Thanks!

For datastore calls, the exception is google.appengine.ext.db.Timeout. For total (wall clock) duration exceeded, the exception is google.appengine.runtime.DeadlineExceededError. The DeadlineExceeded error is thrown 'soft' once, at which point you have a short time to return a response and exit; if you don't, it's thrown again, uncatchable, and your script is unceremoniously terminated.

The timeout exception handling is explained in the Request Timer section of the docs:
A request handler has a limited amount of time to generate and return a response to a request, typically around 30 seconds. Once the deadline has been reached, the request handler is interrupted.
The Python runtime environment interrupts the request handler by raising a DeadlineExceededError, from the package google.appengine.runtime. If the request handler does not catch this exception, as with all uncaught exceptions, the runtime environment will return an HTTP 500 server error to the client.
The request handler can catch this error to customize the response. The runtime environment gives the request handler a little bit more time (less than a second) after raising the exception to prepare a custom response.
from google.appengine.ext import webapp
from google.appengine.runtime import DeadlineExceededError

class MainPage(webapp.RequestHandler):
    def get(self):
        try:
            pass  # Do stuff...
        except DeadlineExceededError:
            self.response.clear()
            self.response.set_status(500)
            self.response.out.write("This operation could not be completed in time...")
If the handler hasn't returned a response or raised an exception by the second deadline, the handler is terminated and a default error response is returned.
While a request can take as long as 30 seconds to respond, App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn't will not scale well with App Engine's infrastructure.
The datastore has its own Timeout exception
The google.appengine.ext.db package provides the following exception classes:
[...]
exception Timeout()
Raised when the datastore operation exceeds the maximum amount of time allowed for datastore operations.
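For illustration, a minimal sketch of catching it (MyModel is a hypothetical db.Model subclass, not part of the original answer):

from google.appengine.ext import db

class MyModel(db.Model):
    pass  # placeholder model, for illustration only

try:
    results = MyModel.all().fetch(100)
except db.Timeout:
    # The datastore call exceeded its allowed time; retry or degrade gracefully.
    results = []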

Related

For Cloud Run triggered from PubSub, when is the right time to send ACK for the request message?

I was building a service that runs on Cloud Run and is triggered by Pub/Sub through Eventarc.
Pub/Sub guarantees at-least-once delivery and retries a message whenever its acknowledgement deadline expires. This deadline is set in the subscription details.
When a service receives a Pub/Sub request (delivered as a POST request to the service), there are two points at which we could send the acknowledgement back:
At the beginning of the request, as soon as it is received. The service would then continue to process the request at its own pace. However, this article points out that
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So sending a response at the beginning may not be an option.
After the request has been processed by the service. Depending on what the service does, we cannot always predict how long processing will take, so we cannot set the acknowledgement deadline correctly, which results in Pub/Sub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
Since you are concerned that you might not be able to set the correct acknowledgement deadline, you can use modify_ack_deadline() in your code to dynamically extend the deadline while the process is still running. You can refer to this document for sample code implementations.
Be aware that the maximum acknowledgement deadline is 600 seconds; just make sure that your processing in Cloud Run does not exceed that limit.
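For the pull-style case described above, a minimal sketch of extending the deadline with the google-cloud-pubsub client (the project and subscription names are placeholders):

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

response = subscriber.pull(request={"subscription": subscription_path, "max_messages": 1})
for message in response.received_messages:
    # Push the deadline out while long-running work is still in progress.
    subscriber.modify_ack_deadline(
        request={
            "subscription": subscription_path,
            "ack_ids": [message.ack_id],
            "ack_deadline_seconds": 600,  # the documented maximum
        }
    )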
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions", where a process continuously pulls from the Cloud Pub/Sub API.
To get events from Pub/Sub into Cloud Run, you use "push subscriptions", where Pub/Sub makes an HTTP request to Cloud Run and waits for it to finish.
In this push scenario, Pub/Sub already knows it made you a request (you received the event), so it does not need a separate acknowledgement of receipt. However, if your request returns a failure status code (e.g. HTTP 500), Pub/Sub will make another request to retry (and this is configurable on the push subscription itself).
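To make the push flow concrete, here is a minimal sketch of such an endpoint, assuming Flask; process_message stands in for the application's real work:

import base64
from flask import Flask, request

app = Flask(__name__)

def process_message(payload):
    pass  # hypothetical placeholder for the real processing logic

@app.route("/", methods=["POST"])
def handle_push():
    envelope = request.get_json()
    # Pub/Sub wraps the message; the payload arrives base64-encoded.
    payload = base64.b64decode(envelope["message"]["data"])
    try:
        process_message(payload)
    except Exception:
        # Any non-2xx response makes Pub/Sub redeliver the message later.
        return "processing failed", 500
    # Returning 2xx only after the work is done acts as the "ack".
    return "", 204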

Pub/Sub push return 503 for basic scaling

I am using a Pub/Sub push subscription with the ack deadline set to 10 minutes; the push endpoint is hosted within App Engine using basic scaling.
In my logs, I see that some of the Pub/Sub push requests (supposedly delivered to starting instances) fail with a 503 error status and the log message Request was aborted after waiting too long to attempt to service your request. The execution time for these requests varies from 10 seconds (for most of them) up to 30 seconds for some.
According to this article https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed#instance_scaling the deadline for HTTP requests is 24 hours, so a request should not be aborted after only 10 seconds.
Is there a way to avoid such exceptions?
These failed requests are most likely timing out in the Pending Request Queue, meaning that no instances are available to serve them. This usually happens during spikes of PubSub messages that are delivered in burst and App Engine can't scale up quickly enough to cope with them.
One option to mitigate this would be to switch to automatic scaling in your app.yaml file. You can tweak min_pending_latency and max_pending_latency to better fit your scenario. You can also specify min_idle_instances to keep idle instances ready to handle extra load (make sure to also enable and handle warmup requests), as sketched below.
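A sketch of how those elements fit together in app.yaml (the values are placeholders to tune for your traffic):

runtime: python27  # whichever runtime the app already uses
inbound_services:
- warmup
automatic_scaling:
  min_idle_instances: 1      # keep an instance warm for bursts
  min_pending_latency: 30ms  # spawn new instances sooner
  max_pending_latency: 1s    # don't let requests queue longer than this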
Take into account though that PubSub will automatically retry delivering failed messages. It will adjust the delivery rate according to your system's behavior, as documented here. So you may experience some errors during spikes of messages while new instances are being spawned, but your messages will eventually be processed (as long as you have set max_instances high enough to handle the load).

How to avoid DeadlineExceededError errors with RPC

I am getting the DeadlineExceededError with the Google App Engine.
Having read https://developers.google.com/appengine/articles/deadlineexceedederrors, I believe that it is due to:
google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds, but it is settable for some APIs using the 'deadline' option.
However, I can't find how to set the deadline option. How do I do it? I can only find examples of how to set the deadline for url_fetch, not the apiproxy_errors.
Adding more details:
I am using Python. An example of my usage is:
taskqueue.add(url='/hl', deadline=60)
So after I add to the taskqueue, the page at xyz.appspot.com/hl will be called, but it gave me the deadline exceeded error after 5 seconds.
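For reference, the deadline option the quoted article describes is set per API call. A sketch for URL Fetch, the one case the asker mentions (the URL is reused from the question as a placeholder):

from google.appengine.api import urlfetch

# Per-call deadline, in seconds, for a single fetch.
result = urlfetch.fetch('http://xyz.appspot.com/hl', deadline=60)

# Or raise the default deadline for all URL Fetch calls in this request.
urlfetch.set_default_fetch_deadline(60)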

How can Google App Engine be prevented from immediately rescheduling tasks after status code 500?

I have a Google App Engine servlet that is cron configured to run once a week. Since it will take more than 1 minute of execution time, it launches a task (i.e. another servlet, at /task/clear) on the application's default push task queue.
Now what I'm observing is this: if the task causes an exception (e.g. NullPointerException inside its second servlet), this gets translated into HTTP status 500 (i.e. HttpURLConnection.HTTP_INTERNAL_ERROR) and Google App Engine apparently reacts by immediately relaunching the same task again. It announces this by printing:
Web hook at http://127.0.0.1:8888/task/clear returned status code 500. Rescheduling..
I can see how this can sometimes be a feature, but in my scenario it's inappropriate. Can I request that Google App Engine should not do such automatic rescheduling, or am I expected to use other status codes to indicate error conditions that would not cause rescheduling by its rules? Or is this something that happens only on the dev. server?
BTW, I am currently also running other tasks (with different frequencies) on the same task queue, so throttling reschedules on the level of task queue configuration would be inconvenient (so I hope there is another/better option too.)
As per https://developers.google.com/appengine/docs/java/taskqueue/overview-push#Java_Task_execution - the task must return a response code between 200 and 299.
You can either return the correct value, set taskRetryLimit in RetryOptions, or check the X-AppEngine-TaskExecutionCount header when the task launches to see how many times it has already been launched and act accordingly.
I think I've found a solution: in the Java API, there is a method RetryOptions#taskRetryLimit, which serves my case.
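The thread is Java, but for comparison, a rough Python sketch of the same two ideas (taskqueue.TaskRetryOptions is the Python counterpart of RetryOptions; the handler and the retry threshold are illustrative):

from google.appengine.api import taskqueue
from google.appengine.ext import webapp

# When enqueueing: cap the number of automatic retries.
taskqueue.add(
    url='/task/clear',
    retry_options=taskqueue.TaskRetryOptions(task_retry_limit=1),
)

class ClearTaskHandler(webapp.RequestHandler):
    def post(self):
        # Header set by App Engine with the task's prior execution count.
        runs = int(self.request.headers.get('X-AppEngine-TaskExecutionCount', '0'))
        if runs >= 3:
            # Returning 2xx tells the queue the task is done, so it stops retrying.
            self.response.set_status(200)
            return
        # ...do the actual clearing work here...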

GAE behavior when relocating an application to another server

Two questions:
Does Google App Engine send any kind of message to an application just before relocating it to another server?
If so, what is that message?
No, it doesn't. It doesn't relocate either: old instances keep running (and eventually stop when idle for long enough) while new ones are spawned.
There are times when App Engine needs to move your instance to a different machine to improve load distribution.
When App Engine needs to turn down a manual scaling instance, it first notifies the instance. There are two ways to receive this notification. First, the is_shutting_down() method from google.appengine.api.runtime begins returning true. Second, if you have registered a shutdown hook, it will be called. It's a good idea to register a shutdown hook in your start request. After the notification is issued, existing requests are given 30 seconds to complete, and new requests immediately return 404.
If an instance is handling a request, App Engine pauses the request and runs the shutdown hook. If there is no active request, App Engine sends an /_ah/stop request, which runs the shutdown hook. The /_ah/stop request bypasses normal handling logic and cannot be handled by user code; its sole purpose is to invoke the shutdown hook. If you raise an exception in your shutdown hook while handling another request, it will bubble up into the request, where you can catch it.
The following code sample demonstrates a basic shutdown hook:
from google.appengine.api import apiproxy_stub_map
from google.appengine.api import runtime

def my_shutdown_hook():
    apiproxy_stub_map.apiproxy.CancelApiCalls()
    save_state()  # save_state() stands in for the app's own checkpointing code
    # May want to raise an exception

runtime.set_shutdown_hook(my_shutdown_hook)
Alternatively, the following sample demonstrates how to use the is_shutting_down() method:
while more_work_to_do and not runtime.is_shutting_down():
    do_some_work()
    save_state()
More details here: https://developers.google.com/appengine/docs/python/modules/#Python_Instance_states
