How do urlfetch quotas work? - google-app-engine

The following quotas are given in the GAE docs:
request size: 1 megabyte
response size: 32 megabytes
If my GAE app receives an uploaded file, does the 1 megabyte quota apply?
If my GAE app sends (POST) the file to another server with urlfetch, is 1 megabyte still the limit?

Incoming bandwidth quota: Each incoming HTTP request can be no larger than 32MB.
So an HTTP request from a browser directly to your application (such as uploading a file) cannot exceed 32MB.
Urlfetch quota: request size 1 megabyte.
So you can't POST a request larger than 1MB using urlfetch.
If you need to have an outside service process large files from your app, upload the files into the blobstore, and then post a link to the external service so that it can fetch the file, as in the sketch below. If you do not control the external service, and its API does not have a method for fetching files via URL, you might have to rethink the flow and perhaps send the file to the external service first rather than to your App Engine app.
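A minimal sketch of the link-passing approach, assuming the external service exposes an ingest endpoint that accepts a URL; the endpoint, app hostname, and /serve route below are all hypothetical:
from google.appengine.api import urlfetch

def notify_external_service(blob_key):
    # Instead of POSTing the (possibly >1MB) file through urlfetch,
    # send the external service a URL it can fetch the file from.
    # /serve/<blob_key> is assumed to be your own blobstore download handler.
    download_url = 'https://my-app.appspot.com/serve/%s' % blob_key
    urlfetch.fetch(
        url='https://external-service.example.com/ingest',
        payload='file_url=' + download_url,
        method=urlfetch.POST,
        headers={'Content-Type': 'application/x-www-form-urlencoded'})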

Related

App Engine urlfetch DeadlineExceededError

I have 2 services. One is hosted in Google App Engine and one is hosted in Cloud Run.
I use urlfetch (Python 2), imported from google.appengine.api, in GAE to call APIs provided by the Cloud Run service.
Occasionally a few (fewer than 10 per week) DeadlineExceededError errors show up like this:
Deadline exceeded while waiting for HTTP response from URL
But in the past few days the error has suddenly been occurring frequently (around 40 per day). Not sure if it is due to Christmas peak traffic or something else.
I've checked the Load Balancer logs of Cloud Run, and it turned out the request never reached the Load Balancer.
Has anyone encountered a similar issue before? Is anything wrong with GAE urlfetch?
I found a conversation which is similar, but the suggestion was to handle the error...
Wonder what I can do to mitigate the issue. Many thanks.
Update 1
Checked again and found that some requests from App Engine did show up in the Cloud Run Load Balancer logs, but the timing is weird:
e.g.
Logs from GAE project
10:36:24.706 send request
10:36:29.648 deadline exceeded
Logs from Cloud Run project
10:36:35.742 reached load balancer
10:36:49.289 finished processing
Not sure why it took so long for the request to reach the Load Balancer...
Update 2
I am using GAE Standard located in US with the following settings:
runtime: python27
api_version: 1
threadsafe: true
automatic_scaling:
  max_pending_latency: 5s
inbound_services:
- warmup
- channel_presence
builtins:
- appstats: on
- remote_api: on
- deferred: on
...
The Cloud Run-hosted API gateway I was trying to call is located in Asia. In front of it is a Google Load Balancer of type HTTP(S) (classic).
Update 3
I wrote a simple script to directly call the Cloud Run endpoint using axios (with its timeout set to 5s) periodically. After a while some requests timed out. I checked the logs in my Cloud Run project and found 2 different phenomena:
For request A, much like what I mentioned in Update 1, logs were found for both the Load Balancer and the Cloud Run revision.
The time of the CR revision log minus the time of the LB log was greater than 5s, so I think this is an expected timeout.
But for request B, no logs were found at all.
So I guess the problem is not about urlfetch or GAE?
Deadline exceeded while waiting for HTTP response from URL is actually a DeadlineExceededError. The URL was not fetched because the deadline was exceeded. This can occur with either the client-supplied deadline (which you would need to change) or the system default if the client does not supply a deadline parameter.
When you make an HTTP request, App Engine maps this request to URLFetch. URLFetch has its own deadline that is configurable. See the URLFetch documentation.
You can set a deadline for each URLFetch request. By default, the deadline for a fetch is 5 seconds. You can change this default by:
For Java apps, including the following appengine.api.urlfetch.defaultDeadline setting in your appengine-web.xml configuration file (specify the timeout in seconds):
<system-properties>
  <property name="appengine.api.urlfetch.defaultDeadline" value="10"/>
</system-properties>
For Python apps, you can adjust the default deadline by using the urlfetch.set_default_fetch_deadline() function. This function stores the new default deadline on a thread-local variable, so it must be set for each request, for example in custom middleware.
from google.appengine.api import urlfetch
# Raise the default URLFetch deadline to 45 seconds for this thread.
urlfetch.set_default_fetch_deadline(45)
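You can also pass a deadline for a single fetch instead of changing the default; a minimal sketch (the URL is hypothetical):
from google.appengine.api import urlfetch

# Override the deadline, in seconds, for just this one request.
result = urlfetch.fetch('https://my-service.example.com/api', deadline=30)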
If your Cloud Run service is processing long requests, you can increase the request timeout. If your service doesn't return a response within the time specified, the request ends and the service returns an HTTP 504 error.
Update the timeoutSeconds attribute in the service's YAML file:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    spec:
      containers:
      - image: IMAGE
      timeoutSeconds: VALUE
Or you can update the request timeout for a given revision at any time by using the following command:
gcloud run services update [SERVICE] --timeout=[TIMEOUT]
If requests are terminating earlier with error code 503, you might need to update the request timeout setting for your language framework:
Node.js developers might need to update the server.timeout property via server.setTimeout (use server.setTimeout(0) to achieve an unlimited timeout), depending on the version you are using.
Python developers need to update Gunicorn's default timeout, as in the sketch below.
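A minimal sketch of that Gunicorn change; a Gunicorn config file is plain Python, and the file name and entry point below are hypothetical:
# gunicorn.conf.py -- loaded with: gunicorn -c gunicorn.conf.py main:app
# Cloud Run enforces its own request timeout, so disable Gunicorn's
# worker timeout (0 = unlimited) to stop it from killing long requests first.
timeout = 0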

Send large POST request from Google App Engine

I have to send a large POST request as part of a RESTful API call. The payload is around 80MB in size. When I try to send this in GAE Java, I get an exception saying the size is not permissible because it is too large. What are the most common ways people send such a large POST request? In my case, this request only happens very rarely, maybe once in 6 months or so. Nonetheless, I would need to have this feature.
From the docs - https://cloud.google.com/appengine/docs/quotas#Requests
The amount of data received by the application from requests. Each incoming HTTP request can be no larger than 32MB.
This includes:
data received by the application in secure requests and non-secure requests
uploads to the Blobstore
data received in response to HTTP requests by the URL fetch service
So write it to GCS, S3, Google Sheets, Docs, etc. (anywhere that allows you to store a larger payload), then process it via a task queue, as in the sketch below.
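A minimal Python sketch of that pattern (the question is about Java, but the flow is the same), assuming the GoogleAppEngineCloudStorageClient library, a hypothetical bucket, and a hypothetical /process-payload worker:
import cloudstorage as gcs
from google.appengine.api import taskqueue

def store_and_enqueue(payload):
    # Write the ~80MB payload to GCS instead of POSTing it directly.
    filename = '/my-bucket/large-payload.bin'  # hypothetical bucket/object
    f = gcs.open(filename, 'w', content_type='application/octet-stream')
    f.write(payload)
    f.close()
    # Hand off to a task queue worker that reads the file back from GCS.
    taskqueue.add(url='/process-payload', params={'gcs_path': filename})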

Why does my app hit the GAE `Incoming Bandwidth` quota when it only fetches tiny content?

My app uses urllib2 to fetch a remote HTTP file. But it does not fetch the whole file; it just reads 5 bytes from it, as the line `content = remoteFileFh.read(5)` below does. I do that purposely to save quota.
import urllib2

def httpGetFile(self, remoteFile):
    print 'downloading %s...' % remoteFile,
    remoteFileFh = urllib2.urlopen(remoteFile)
    # Read only the first 5 bytes of the response body.
    content = remoteFileFh.read(5)
    print 'content:%s' % content
    remoteFileFh.close()
    print 'done.'
But it seems it still consumes 'Incoming Bandwidth' as if it were fetching the whole file.
Why? How does Google's hosting service calculate that?
The URLfetch service doesn't support fetching partial content. On App Engine, urllib2 is just a wrapper over urlfetch, so the whole response is fetched and made available to your application whether you read it all or not.
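One possible mitigation, assuming the remote server honors HTTP Range requests, is to ask it to send only the bytes you need so urlfetch never downloads the full body; a sketch with a hypothetical URL:
from google.appengine.api import urlfetch

# Ask the server for only the first 5 bytes (inclusive range 0-4).
# This saves bandwidth only if the server supports Range and replies
# with a 206 Partial Content response.
result = urlfetch.fetch('http://example.com/remote-file',
                        headers={'Range': 'bytes=0-4'})
print result.content  # at most 5 bytes if Range was honored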

Redirect GET request and include an authentication token

GAE = Google App Engine
GCS = Google Cloud Storage
My GAE application receives GET requests for files that are actually stored in a GCS bucket. I would like to redirect those requests to their real location and include an auth token in the redirected request so that GCS will serve them.
To issue a redirection, GAE exposes webapp2.RequestHandler.redirect which does not let me add any header to the original request.
Is it possible to redirect the GET request and include an auth token in it?
An HTTP redirect is just a response with a 3XX status code. You can't make the client forward a header or body to the new location.
That said, you will want to implement some logic on the client. Your client has to issue one request to your GAE application, then process the response, and then issue one more request to GCS with all the headers or body that you want to supply (the auth token in your case).
I updated another thread with this as well, but just in case you didn't see it.
In the upcoming 1.6.4 release of App Engine we've added the ability to pass a Google Storage object name to blobstore.send_blob() so you can send Google Storage files of any size from your App Engine application. We create the correct token for your application to access the objects in the Google Storage bucket.
Here is the pre-release announcement for 1.6.4.
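A minimal Python sketch of that feature, assuming a hypothetical bucket name and that the handler is routed to the file requests:
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class GcsServingHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, object_name):
        # Wrap the GCS object path in a blobstore key; App Engine
        # creates the access token and streams the file for you.
        gs_key = blobstore.create_gs_key('/gs/my-bucket/' + object_name)
        self.send_blob(gs_key)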

How often is the Silverlight Access policy accessed?

As you may well know, you are required to host an access policy (clientaccesspolicy.xml) on your web server if you want Silverlight apps to perform HTTP requests, or you need to host an access server on port 943 for socket connections.
My app makes many short requests and latency is important. I want to know whether this access policy file is accessed once for every new HTTP request, or whether it is accessed for the first request and its result cached on the client. It would be quite costly for me to have two web requests (one for the policy, one for the HTTP GET) for each HTTP request I create.
One easy way to test this is to use Fiddler and watch for requests to the policy file. The documentation also specifies that the cross-domain policy file is requested only once per application session. This means the runtime will request it once and store the result in memory for the Silverlight session.
