Send large POST request from Google App Engine - google-app-engine

I have to send a large POST request as part of a RESTful API call. The payload is around 80MB in size. When I try to send it from GAE Java, I get an exception saying the request exceeds the permissible size. What are the most common ways people send such large POST requests? In my case this request happens very rarely, maybe once every six months, but I still need the feature.

From the docs - https://cloud.google.com/appengine/docs/quotas#Requests
The amount of data received by the application from requests. Each incoming HTTP request can be no larger than 32MB.
This includes:
data received by the application in secure requests and non-secure requests
uploads to the Blobstore
data received in response to HTTP requests by the URL fetch service
So write the payload to GCS, S3, Google Sheets, Docs, etc. (anywhere that allows you to store a larger object), then process it via a task queue.
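For illustration, a minimal sketch of that hand-off on GAE Java might look like the snippet below. It assumes the google-cloud-storage client library and the App Engine task queue API are available; the bucket name, object naming scheme, and worker URL are placeholders, not anything prescribed by the docs.

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class LargePayloadHandoff {
    public void handOff(byte[] payload) {
        // Stage the oversized payload in Cloud Storage instead of POSTing it directly.
        Storage storage = StorageOptions.getDefaultInstance().getService();
        BlobId blobId = BlobId.of("my-staging-bucket", "payload-" + System.currentTimeMillis());
        storage.create(BlobInfo.newBuilder(blobId)
                .setContentType("application/octet-stream").build(), payload);

        // Enqueue a task that carries only the object reference; the worker
        // streams the payload back out of GCS and does the heavy lifting.
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder.withUrl("/tasks/process-payload")
                .param("bucket", blobId.getBucket())
                .param("object", blobId.getName()));
    }
}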

Related

What is the idiomatic way for an API to return part of the data now, and part of the data later?

Suppose my back-end uses two (legacy) APIs whose responses I want to combine. One returns the majority of the data and is quite fast. The other is slow and only returns one feature for the dataset. Also, the response of the first API is required for the request to the slow API to be meaningful.
What I'd like to do is have the back-end return the result of the fast API instantly, then wait for the slow API, and then combine the two into a second response that also contains the missing feature. If I'm correct, this would be accomplished somehow with the asynchronous HTTP pattern. However, I'm having trouble conceiving a solution while still keeping my API stateless, like all REST APIs should be.
So far, the best concept I have been able to come up with is the following (a code sketch follows the questions below):
1. A client (browser) sends a request to my API.
2. My API retrieves the data from the fast API and sends a response like {dataFromFirstAPI: {...}, linkToPoll: "/api/whatever/xyxzzy"} with status 202.
3. My API sends a request to the slow API, waits for the response, processes it, and combines it with the result from the fast API. After that, it serves the combined data at linkToPoll.
4. Meanwhile, the client polls linkToPoll until it receives an appropriate response and body.
I have a few questions about the above approach:
Is it an antipattern to also return some other data in a 202 response (point 2. above)?
Is there any convention on what the polling URL should be?
One approach would be for the client to send two requests, but this would create state for my API.
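A minimal, hypothetical sketch of the flow described above, using plain Java servlets, might look like the following. The /api/poll path, the in-memory result map, and the background executor are illustrative stand-ins for whatever job store and async mechanism the real back-end would use.

import java.io.IOException;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CombinedDataServlet extends HttpServlet {
    // Hypothetical shared job store; a real service would use a database or cache.
    static final ConcurrentMap<String, String> RESULTS = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String fastData = callFastApi();                  // the quick legacy API
        String jobId = UUID.randomUUID().toString();

        // Combine with the slow API in the background; the result shows up at the poll URL.
        executor.submit(() -> RESULTS.put(jobId, combine(fastData, callSlowApi(fastData))));

        resp.setStatus(202);                              // Accepted: more data will follow
        resp.setContentType("application/json");
        resp.getWriter().write("{\"dataFromFirstAPI\": " + fastData
                + ", \"linkToPoll\": \"/api/poll/" + jobId + "\"}");
    }

    private String callFastApi() { return "{\"feature\": 1}"; }              // placeholder
    private String callSlowApi(String fastData) { return "{\"extra\": 2}"; } // placeholder
    private String combine(String fast, String slow) {
        return "{\"fast\": " + fast + ", \"slow\": " + slow + "}";
    }
}

// Poll endpoint (mapped to /api/poll/*): 200 with the combined body when ready,
// 202 again while the slow API is still being processed.
class PollServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String jobId = req.getPathInfo().substring(1);    // path info is "/{jobId}"
        String result = CombinedDataServlet.RESULTS.get(jobId);
        if (result == null) {
            resp.setStatus(202);                          // not ready yet, keep polling
        } else {
            resp.setContentType("application/json");
            resp.getWriter().write(result);
        }
    }
}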

Multiple requests to one endpoint block requests to other endpoints

I need to upload multiple PDFs to an endpoint.
Currently I am bombarding the API with one PDF per request (using the devextreme-react file uploader).
It works fine for 3-5 requests, but beyond that the browser queues them as pending.
The problem is that I still need the API to handle requests to other endpoints in the meantime.
Maybe something like telling the browser "use up to 3 parallel requests for this endpoint and leave the rest pending, but when I request other endpoints, handle them immediately".
How do I do that?
My stack is React with Next, axios for requests, and Django Rest Framework on the back end.

Google Cloud Endpoints not respecting etag cache headers

When I issue a GET request, I get back a 200 OK and the etag header:
etag → "tZIZl_M15FKLVh9sN6Nj0iz1dQE/fA5Fya8Zz6DLGJwPNnIWbruyt30"
In my subsequent request, I send the
If-Not-Modified → "tZIZl_M15FKLVh9sN6Nj0iz1dQE/fA5Fya8Zz6DLGJwPNnIWbruyt30"
header, but the endpoint still sends back 200 OK rather than 304.
How do I get my endpoint to respect the If-Not-Modified header? Documentation on caching with Cloud Endpoints is non-existent :/
Google Cloud Endpoints is a mechanism to call your back-end methods directly.
Hence, they don't follow the normal rules for other requests, like the cache one you're mentioning.
Think of them like AJAX code for App Engine that can be called from your Android/iOS/web code.
You have two options if caching is important to you:
To use the standard HTTP request/response model, i.e. not to use Cloud Endpoints.
To implement cache control yourself inside your own methods (see the sketch below).
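For the second option, a rough sketch of do-it-yourself etag handling is shown below, using a plain servlet for clarity (the standard conditional request header for etags is If-None-Match). The etag computation and the loadResource() helper are placeholders for your own logic, not anything provided by Cloud Endpoints.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CachedResourceServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String body = loadResource();   // placeholder for your own data access
        String etag = "\"" + Integer.toHexString(body.hashCode()) + "\"";  // naive etag

        // Conditional requests carry the previously returned etag in If-None-Match.
        String ifNoneMatch = req.getHeader("If-None-Match");
        if (etag.equals(ifNoneMatch)) {
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);  // 304, empty body
            return;
        }

        resp.setHeader("ETag", etag);
        resp.setContentType("application/json");
        resp.getWriter().write(body);
    }

    private String loadResource() {
        return "{\"example\": true}";   // stand-in for the real resource
    }
}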

Request to App Engine Backend Timing Out

I created an App Engine backend to serve HTTP requests for a long-running process. The backend process works as expected when the query references an input of small size, but times out when the input is large. The query parameter is the URL of an App Engine Blobstore blob, which is the input data for the backend process. I thought the whole point of using App Engine backends was to avoid the timeout restrictions that App Engine frontends have. How can I avoid the timeout?
I call the backend like this, setting the connection timeout length to infinite:
// Open the backend URL with an unlimited connect timeout and read the JSON response.
HttpURLConnection connection = (HttpURLConnection) new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept-Charset", charset);
connection.setRequestMethod("GET");
connection.setConnectTimeout(0); // 0 means no connect timeout
connection.connect();
InputStream in = connection.getInputStream();
String json = "";
int ch;
while ((ch = in.read()) != -1)
    json = json + String.valueOf((char) ch);
System.out.println("Response Message is: " + json);
connection.disconnect();
The traceback (edited for anonymity) is:
Uncaught exception from servlet
java.net.SocketTimeoutException: Timeout while fetching URL: http://my-backend.myapp.appspot.com/somemethod?someparameter=AMIfv97IBE43y1pFaLNSKO1hAH1U4cpB45dc756FzVAyifPner8_TCJbg1pPMwMulsGnObJTgiC2I6G6CdWpSrH8TrRBO9x8BG_No26AM9LmGSkcbQZiilhC_-KGLx17mrS6QOLsUm3JFY88h8TnFNer5N6-cl0iKA
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.convertApplicationException(URLFetchServiceImpl.java:142)
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:43)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.fetchResponse(URLFetchServiceStreamHandler.java:417)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getInputStream(URLFetchServiceStreamHandler.java:296)
at org.someorg.server.HUDXML3UploadService.doPost(SomeService.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
As you can see, I'm not getting the DeadlineExceededException, so I think something other than Google's limits is causing the timeout, which also makes this a different issue from similar Stack Overflow posts on the topic.
I humbly thank you for any insights.
Update 2/19/2012: I see what's going on, I think. I should be able to have the client wait indefinitely using a GWT (or any other client-side async framework) async handler for any client request to complete, so I don't think that is the problem. The problem is that the file upload calls the _ah/upload App Engine system endpoint, which then (once the blob is stored in the Blobstore) calls the upload service's doPost backend to process the blob. The client request to _ah/upload is what is timing out, because the backend doesn't return in a timely fashion. To make this timeout go away, I attempted to make the _ah/upload service itself a public backend accessible via http://backend_name.project_name.appspot.com/_ah/upload, but I don't think Google allows a system service (like _ah/upload) to be run as a backend. My next approach is to have _ah/upload return immediately after triggering the backend processing, and then call another service to get the original response I wanted once processing is finished.
The solution was to start the backend process as a task and add it to the task queue, then return a response to the client rather than waiting for the backend task to finish (which can take a long time). If I could have assigned _ah/upload to a backend, that would also have solved the problem, since the client's async handler could wait forever for the backend to finish, but I do not think Google permits assigning system servlets to backends. The client will now have to poll for the persisted backend process's response data, as Paul C mentioned, since tasks cannot respond like a normal servlet.
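A hedged sketch of that arrangement is shown below: the Blobstore upload-complete handler only enqueues a task that references the blob and returns immediately, so the _ah/upload request never has to wait for the long-running processing. The /tasks/process-blob URL and parameter names are illustrative, not part of the original setup.

import java.io.IOException;
import java.util.List;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class UploadCompleteServlet extends HttpServlet {
    private final BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // _ah/upload redirects here after storing the blob; grab its key.
        Map<String, List<BlobKey>> uploads = blobstore.getUploads(req);
        BlobKey key = uploads.values().iterator().next().get(0);

        // Enqueue the slow processing and return right away, so the upload
        // request itself never hits the request deadline.
        QueueFactory.getDefaultQueue().add(
                TaskOptions.Builder.withUrl("/tasks/process-blob")
                        .param("blobKey", key.getKeyString()));

        resp.setStatus(HttpServletResponse.SC_ACCEPTED);
    }
}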

How do urlfetch quotas work?

The following quotas are given at GAE docs:
request size: 1 megabyte
response size: 32 megabytes
If my GAE app receives a file upload, does the 1 megabyte quota apply?
If my GAE app sends (POSTs) the file to another server with urlfetch, is 1 megabyte still the limit?
Incoming bandwidth quota: Each incoming HTTP request can be no larger than 32MB.
So an HTTP request from a browser directly to your application (such as uploading a file) cannot exceed 32MB.
Urlfetch quota: request size 1 megabyte
So you can't POST a request larger than 1MB using urlfetch.
If you need an outside service to process large files from your app, upload the files to the Blobstore and then POST a link to the external service so that it can fetch the file itself. If you do not control the external service and its API has no way to fetch files via URL, you might have to rethink the flow and perhaps send the file to the external service first rather than to your App Engine app.
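A rough sketch of the "post a link, not the file" approach with urlfetch is below. The /serve handler URL and the external ingest endpoint are hypothetical, and the external service is assumed to accept a small JSON body pointing at the file.

import com.google.appengine.api.urlfetch.HTTPHeader;
import com.google.appengine.api.urlfetch.HTTPMethod;
import com.google.appengine.api.urlfetch.HTTPRequest;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class NotifyExternalService {
    public void sendLink(String blobKeyString) throws IOException {
        // The external service fetches the file from this URL itself, so the
        // urlfetch request we send stays far below the 1MB request limit.
        String link = "https://my-app.appspot.com/serve?blob-key=" + blobKeyString;

        HTTPRequest request = new HTTPRequest(
                new URL("https://external-service.example.com/ingest"), HTTPMethod.POST);
        request.addHeader(new HTTPHeader("Content-Type", "application/json"));
        request.setPayload(("{\"fileUrl\": \"" + link + "\"}").getBytes(StandardCharsets.UTF_8));

        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        fetcher.fetch(request);
    }
}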
