App Engine urlfetch DeadlineExceededError - google-app-engine

I have 2 services. One is hosted on Google App Engine and the other on Cloud Run.
I use urlfetch (Python 2), imported from google.appengine.api, in GAE to call APIs provided by the Cloud Run service.
Occasionally a few (fewer than 10 per week) DeadlineExceededError errors show up, like this:
Deadline exceeded while waiting for HTTP response from URL
But in the last few days this error has suddenly been occurring frequently (~40 per day). Not sure if it is due to Christmas peak traffic or something else.
I've checked the Load Balancer logs of Cloud Run, and it turned out the requests never reached the Load Balancer.
Has anyone encountered a similar issue before? Is anything wrong with GAE urlfetch?
I found a conversation which is similar, but the suggestion was to handle the error...
I wonder what I can do to mitigate the issue. Many thanks.
Update 1
Checked again and found that some requests from App Engine did show up in the Cloud Run Load Balancer logs, but the timing is weird:
e.g.
Logs from GAE project
10:36:24.706 send request
10:36:29.648 deadline exceeded
Logs from Cloud Run project
10:36:35.742 reached load balancer
10:36:49.289 finished processing
Not sure why it took so long for the request to reach the Load Balancer...
Update 2
I am using GAE Standard located in the US with the following settings:
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  max_pending_latency: 5s
inbound_services:
- warmup
- channel_presence
builtins:
- appstats: on
- remote_api: on
- deferred: on
...
The Cloud Run-hosted API gateway I am trying to call is located in Asia. In front of it there is a Google Load Balancer of type HTTP(S) (classic).
Update 3
I wrote a simple script to directly call the Cloud Run endpoint periodically using axios (with its timeout set to 5s). After a while, some requests timed out. I checked the logs in my Cloud Run project and found 2 different phenomena:
For request A, much like what I mentioned in Update 1, logs were found for both the Load Balancer and the Cloud Run revision.
Time of CR revision log - Time of LB log > 5s, so I think this timeout is expected.
But for request B, no logs were found at all.
So I guess the problem lies with neither urlfetch nor GAE?

Deadline exceeded while waiting for HTTP response from URL is actually a DeadlineExceededError. The URL was not fetched because the deadline was exceeded. This can occur with either the client-supplied deadline (which you would need to change) or the system default if the client does not supply a deadline parameter.
When you make an HTTP request, App Engine maps this request to URLFetch. URLFetch has its own deadline, which is configurable. See the URLFetch documentation.
You can set a deadline for each URLFetch request. By default, the deadline for a fetch is 5 seconds. For Java apps, you can change this default by including the appengine.api.urlfetch.defaultDeadline setting in your appengine-web.xml configuration file. Specify the timeout in seconds:
<system-properties>
  <property name="appengine.api.urlfetch.defaultDeadline" value="10"/>
</system-properties>
You can also adjust the default deadline by using the urlfetch.set_default_fetch_deadline() function. This function stores the new default deadline on a thread-local variable, so it must be set for each request, for example, in a custom middleware.
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(45)
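For example, a minimal WSGI middleware sketch that resets the deadline at the start of every request (the class name and 45-second value are illustrative, not from the original answer):

from google.appengine.api import urlfetch

class UrlfetchDeadlineMiddleware(object):
    # Reset the urlfetch default deadline on every request,
    # since the value is stored in a thread-local variable.
    def __init__(self, app, deadline=45):
        self.app = app
        self.deadline = deadline

    def __call__(self, environ, start_response):
        urlfetch.set_default_fetch_deadline(self.deadline)
        return self.app(environ, start_response)

# Usage: wrap your WSGI app, e.g. app = UrlfetchDeadlineMiddleware(app)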
If your Cloud Run service is processing long requests, you can increase the request timeout. If your service doesn't return a response within the time specified, the request ends and the service returns an HTTP 504 error.
Update the timeoutSeconds attribute in the YAML file:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    spec:
      containers:
      - image: IMAGE
      timeoutSeconds: VALUE
OR
You can update the request timeout for a given revision at any time by using the following command:
gcloud run services update [SERVICE] --timeout=[TIMEOUT]
If requests are terminating earlier with error code 503, you might need to update the request timeout setting for your language framework:
Node.js developers might need to update the server.timeout property via server.setTimeout (use server.setTimeout(0) to achieve an unlimited timeout) depending on the version you are using.
Python developers need to update Gunicorn's default timeout.
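For example, assuming the Cloud Run service is launched through a Procfile (as with buildpacks; the module path main:app is illustrative), Gunicorn's 30-second default worker timeout can be raised or disabled:

# Procfile: a timeout of 0 disables Gunicorn's worker timeout entirely
web: gunicorn --bind :$PORT --timeout 0 main:app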

Related

How to have app engine avoid cold starts?

Even when there are instances already running, I am still experiencing cold starts on some of the requests.
I thought that GAE would start some instances in the background and add them to the pool of active instances that serve requests only once the instances are started. Is that not the case? Is there a way to configure GAE to make it so?
Instead, it seems that some of the requests wait the full duration of a new instance start-up, which can take up to 10 seconds, when using the existing instances alone would have served all the benchmark traffic within a couple of seconds.
UPDATE:
This is my app.yaml config:
runtime: nodejs10
env: standard
instance_class: F1
handlers:
- url: '.*'
  script: auto
automatic_scaling:
  min_instances: 1
  max_instances: 3
What you're looking for are the Warmup requests:
Warmup requests are a specific type of loading request that load application code into an instance ahead of time, before any live requests are made. Manual or basic scaling instances do not receive an /_ah/warmup request.
And from Configuring warmup requests:
Loading your app's code to a new instance can result in loading requests. Loading requests can result in increased request latency for your users, but you can avoid this latency using warmup requests. Warmup requests load your app's code into a new instance before any live requests reach that instance.
Not 100% perfect - there are some limitations, but they're the next best thing.
Configuring warmup requests means:
Enabling warmup requests in your app.yaml file:
inbound_services:
- warmup
Creating your handler for the '/_ah/warmup' warmup request URL
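The question above is about nodejs10, but for a Python 2 standard app such as the one in the first question, a minimal warmup handler could look like this sketch (webapp2 assumed; the initialization body is illustrative):

import webapp2

class WarmupHandler(webapp2.RequestHandler):
    def get(self):
        # Perform expensive one-time initialization here
        # (open connections, prime caches, import heavy modules).
        self.response.set_status(200)

app = webapp2.WSGIApplication([
    ('/_ah/warmup', WarmupHandler),
])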

GCP CRON jobs failing with no logs

I am trying to set up a CRON job on the Google Cloud Platform. The job shows up in the GCP console, but it is failing, and there are no logs that reveal why. The schedule seems to be working fine, and I am able to run the job manually, though it also fails when initiated that way.
If I go to http://...../api/status/addit in the url bar, the job runs as expected.
There is a link to "View Logs" on the task queues page where my CRON job is shown, but those logs are completely empty.
Looking at the nginx request logs does not show any requests made to that url (or any requests at all). If I go to the url for the job manually, I can see those requests show up in the logs, and everything that is supposed to happen happens, so I know that endpoint is good.
Google App Engine Flexible environment, Python 3
Flask API
What other info can I provide? There are so many moving parts that I don't want to flood the question with irrelevant info.
cron.yaml:
cron:
- description: 'test cron job'
  url: /api/status/addit
  schedule: every 1 minutes
endpoint:
< some Flask Blueprint stuff initiates the "status" blueprint so that this url will resolve to /api/status/addit >
...
from flask import Response

@status.route('/addit')
def add_to_file():
    print('made it into the request')
    res = Response("{'foo':'bar'}", status=202, mimetype='application/json')
    return res
I experienced the same issue with the same kind of configuration (GAE flexible, Python 3): cron fails with no logging.
It turned out to be a firewall issue: my default action was set to DENY.
According to the docs:
Note: If you define a firewall in the flexible environment, you must set firewall rules for both the 10.0.0.1 and 0.1.0.1 IP addresses to allow your app to receive requests from the Cron service.
Whitelisting 10.0.0.1 and 0.1.0.1 solved my issue.
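For example, allow rules for the two addresses can be added with the gcloud CLI (the priority values 100 and 110 are illustrative):

gcloud app firewall-rules create 100 --action ALLOW --source-range 10.0.0.1
gcloud app firewall-rules create 110 --action ALLOW --source-range 0.1.0.1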
Your urls don't match. Try:
cron:
- description: 'test cron job'
  url: /addit
  schedule: every 1 minutes
I ran into a similar problem when I was using SSL and a script to redirect users from http to https URLs (in my case SSLify). App Engine cron seems to call the http version (at least for my flex app), so when my app was called by cron, it returned a 302 redirect to the https version, which was interpreted as an error.
"A cron job will invoke a URL, using an HTTP GET" https://cloud.google.com/appengine/docs/flexible/nodejs/scheduling-jobs-with-cron-yaml
Thanks to https://stackoverflow.com/a/53018498/4288232 and the comments that led me to the solution.

GAE App gets socket errors when communicating with BigQuery

Our GAE python application communicates with BigQuery using the Google Api Client for Python (currently we use version 1.3.1) with the GAE-specific authentication helpers. Very often we get a socket error while communicating with BigQuery.
More specifically, we build a python Google API client as follows
1. bq_scope = 'https://www.googleapis.com/auth/bigquery'
2. credentials = AppAssertionCredentials(scope=bq_scope)
3. http = credentials.authorize(httplib2.Http())
4. bq_service = build('bigquery', 'v2', http=http)
We then interact with the BQ service and get the following error
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/gae_override/httplib.py", line 536, in getresponse
'An error occured while connecting to the server: %s' % e)
error: An error occured while connecting to the server: Unable to fetch URL: [api url...]
The error raised is of type google.appengine.api.remote_socket._remote_socket_error.error, not an exception that wraps the error.
Initially we thought that it might be timeout-related, so we also tried setting a timeout by altering line 3 in the above snippet to
3. http = credentials.authorize(httplib2.Http(timeout=60))
However, according to the log output of the client library, the API call takes less than 1 second to crash, and explicitly setting the timeout did not change the system behavior.
Note that the error occurs in various API calls, not just a single one, and usually this happens on very light operations, for example we often see the error while polling BQ for a job status and rarely on data fetching. When we re-run the operation, the system works.
Any idea why this might happen and, perhaps, a best practice to handle it?
All HTTP(s) requests will be routed through the urlfetch service.
Beneath that, the Google Api Client for Python uses httplib2 to make HTTP(s) requests and under the covers this library uses socket.
Since the error is coming from socket you might try to set the timeout there.
import socket

# Set a 30-second default timeout for all newly created sockets
timeout = 30
socket.setdefaulttimeout(timeout)
If we continue up the stack, httplib2 will fall back to this socket-level timeout when no timeout parameter is passed explicitly.
http://httplib2.readthedocs.io/en/latest/libhttplib2.html
Moving further up the stack you can set the timeout and retries for BigQuery.
timeout = 30000  # BigQuery's timeoutMs is in milliseconds
num_retries = 5
query_request = bigquery_service.jobs()
query_data = {
    'query': query_var,
    'timeoutMs': timeout,
}
And finally you can set the timeout for urlfetch.
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(30)
If you believe it's timeout related you might want to test each library / level to make sure the timeout is being passed correctly. You can also use a basic timer to see the results.
import logging
import time

start_query = time.time()
query_response = query_request.query(
    projectId='<project_name>',
    body=query_data).execute(num_retries=num_retries)
end_query = time.time()
logging.info(end_query - start_query)
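Since re-running the failed operation succeeds, a thin retry wrapper with exponential backoff is another common mitigation; a minimal sketch (the helper name is illustrative, and the broad except clause should be narrowed to the socket error type you actually observe):

import time

def call_with_retries(fn, attempts=3, backoff=1.0):
    # Retry a flaky callable, doubling the wait after each failure.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))

# e.g. poll a job's status with retries around the raw API call:
# status = call_with_retries(
#     lambda: bq_service.jobs().get(projectId=p, jobId=j).execute())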
There are dozens of questions about timeout and deadline exceeded for GAE and BigQuery on this site so I wouldn't be surprised if you're hitting something weird.
Good luck!

A simple Cron job on frontend servers and no backends. Fails

I configured a simple cron job, available at a secured (admin-only) path /cron?method=sendMail, to send an email once daily. The servlet at the endpoint of the url is a jsp file which has the code to send the email.
This is tested through the frontend with the full url and it works.
I do not have any backend servers configured, only frontend instances with 1 resident instance and the rest dynamic.
The issue is that cron triggers successfully but the request fails with the following message.
0.1.0.1 - - [17/Nov/2013:01:07:44 -0800] "GET /cron?method=sendMail HTTP/1.1" 503 364 - "AppEngine-Google; (+http://code.google.com/appengine)"
This request is getting retried continuously and keeps failing.
The complete url however works. What is wrong with the setup? what am I missing?
Also, how do I stop the retries?
The cron entry I use goes like
<cron>
  <url>/cron?method=sendMail</url>
  <description>Send mail</description>
  <schedule>every mon,tue,wed,thu,fri,sat,sun 11:30</schedule>
  <timezone>Asia/Calcutta</timezone>
</cron>
The issue was resolved using the following tip from the documentation to secure the url:
Requests from the Cron Service will also contain an HTTP header: X-AppEngine-Cron: true
The X-AppEngine-Cron header is set internally by Google App Engine. If your request handler finds this header it can trust that the request is a cron request. If the header is present in an external user request to your app, it is stripped.
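This question is about Java, but the equivalent guard in a Python 2 handler is short; a sketch (send_mail_report is a hypothetical helper):

import webapp2

class CronMailHandler(webapp2.RequestHandler):
    def get(self):
        # App Engine sets this header internally and strips it from
        # external requests, so its presence can be trusted.
        if self.request.headers.get('X-AppEngine-Cron') != 'true':
            self.abort(403)
        send_mail_report()  # hypothetical mail-sending helper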

HTTPException: Deadline exceeded while waiting for HTTP response from URL: #deadline

We are using the developer's Python guide with the Python gdata 2.15 library, as per the example stated for App Engine.
createSite("test site one", description="test site one", source_site=("https://sites.google.com/feeds/site/googleappsforus.com/google-site-template"))
We get an unpredictable response every time we use it:
Exception: HTTPException: Deadline exceeded while waiting for HTTP response from URL: https://sites.google.com/feeds/site/googleappsforyou.com
Did someone experience the same issue? Is it AppEngine or Sites API related?
Regards,
Deadline exceeded while waiting for HTTP response from URL is actually a DeadlineExceededError. When you are making a HTTP request, App Engine maps this request to URLFetch. URLFetch has its own deadline that is configurable. See the URLFetch documentation.
It seems that your client library catches DeadlineExceededError and throws HTTPException. Your client library either passes a deadline to URLFetch (which you would need to change) or the default deadline is insufficient for your request.
Try setting the default URLFetch deadline like this:
from google.appengine.api import urlfetch
urlfetch.set_default_fetch_deadline(45)
Also make sure to check out Dealing with DeadlineExceededErrors in the official Docs.
Keep in mind that any end-user initiated request must complete in 60 seconds or will encounter a DeadlineExceededError. See App Engine Python Runtime Request Timer.
The accepted solution did not work for me with very recent versions of httplib2 and googleapiclient. The problem appears to be that httplib2.Http passes its timeout argument all the way through to urlfetch. Since it has a default value of None, urlfetch sets the limit for that request to 5s irrespective of whatever default you set with urlfetch.set_default_fetch_deadline. It appears that you have a few options.
First option -- you can explicitly pass an instance of httplib2.Http around:
import httplib2

http = httplib2.Http(timeout=30)
...
client.method().execute(http=http)
Second option, you can set the default value using sockets [1]:
import socket
socket.setdefaulttimeout(30)
Third option, you can tell App Engine to use sockets for requests [2]. To do that, modify app.yaml:
env_variables:
  GAE_USE_SOCKETS_HTTPLIB: 'anyvalue'
[1] This might only work for paid apps, since the socket api might not be present for unpaid apps...
[2] I'm almost certain this will only work for paid apps, since the socket api won't be functional for unpaid apps...
