503 and 400 on uploading images in Google App Engine - google-app-engine

Recently I've run into two problems uploading files to my Java GAE app.
I'm using the technique described in the blobstore doc.
With regular files, the client occasionally (roughly 15% of the time) receives a "503 Service Unavailable".
With high-resolution images (for example 7000x10000) the client always receives a "400 Bad Request".
In both cases no error is logged on the server and the blobs are written correctly, but the successPath URL (the callback of createUploadUrl) is never called. It seems that the GAE endpoint handling the upload crashes for some reason.
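For reference, the server side follows the standard pattern from the blobstore doc; a simplified sketch (handler paths and class names here are illustrative, not my exact code):

import java.io.IOException;
import java.util.List;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;

// Hands the GWT client a one-shot upload URL
public class UploadFormServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res) throws IOException {
        BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();
        String uploadUrl = blobstore.createUploadUrl("/upload-success");
        res.getWriter().print(uploadUrl);
    }
}

// successPath handler mapped to /upload-success; in the failing cases it is never invoked
public class UploadSuccessServlet extends HttpServlet {
    public void doPost(HttpServletRequest req, HttpServletResponse res) throws IOException {
        Map<String, List<BlobKey>> blobs =
                BlobstoreServiceFactory.getBlobstoreService().getUploads(req);
        // persist the BlobKeys, then respond to the client
    }
}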
My client is a JavaScript XMLHttpRequest, wrapped in a GWT JSNI method:
public native void uploadWithXMLHttpRequest(UploadForm uploadForm) /*-{
  var fd = new FormData();
  // JSNI calls into Java use the @package.Class::method()() syntax
  var files = uploadForm.@mypackage.UploadForm::getFiles()();
  for (var i = 0; i < files.length; i++) {
    fd.append("uploadFile" + i, files[i]);
  }
  var xhr = new XMLHttpRequest();
  //xhr.upload.addEventListeners... omitted
  // the upload URL is the value returned by createUploadUrl() on the server
  xhr.open("POST", uploadForm.@mypackage.UploadForm::getUploadUrl()());
  xhr.send(fd);
}-*/;
Any ideas for possible causes and solutions/workarounds?
Thx.

This issue is being discussed in a GAE ticket opened by another user who has the same problem: https://code.google.com/p/googleappengine/issues/detail?id=7619 (btw, the bug tracker has a "star" feature, which allows you to vote for the ticket and receive notifications)

Possible reasons:
1. You are uploading a large file (> 1MB) and writing it in a single call. You should write it in portions: 1 write = 1MB (see the sketch below).
2. Your request takes longer than 30 sec - use a backend.
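A rough sketch of chunked writes using the appengine-gcs-client library (the bucket/object names are made up, and data stands in for whatever bytes you are persisting):

import java.io.IOException;
import java.nio.ByteBuffer;
import com.google.appengine.tools.cloudstorage.GcsFileOptions;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

public void writeInChunks(byte[] data) throws IOException {
    GcsService gcs = GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
    GcsFilename target = new GcsFilename("my-bucket", "uploads/big-image.jpg"); // made-up names
    GcsOutputChannel out = gcs.createOrReplace(target, GcsFileOptions.getDefaultInstance());
    final int CHUNK = 1024 * 1024; // keep each write call at or under 1MB
    for (int off = 0; off < data.length; off += CHUNK) {
        ByteBuffer buf = ByteBuffer.wrap(data, off, Math.min(CHUNK, data.length - off));
        while (buf.hasRemaining()) {
            out.write(buf);
        }
    }
    out.close();
}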

In this case the 503 is caused by errors when we write the upload info into your datastore. As you are using the M/S datastore, transient errors are expected from time to time. I suggest you convert your app to HRD to minimize the chances of there being errors related to writing the upload information to your datastore.
The 400 error was generated by your app & is in your application logs.

Try using Google Cloud Storage: the Blobstore service has a lot of problems, and Google is trying to migrate users from the Blobstore to GCS.
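If you switch, one low-friction option is to keep the same upload flow but point createUploadUrl at a GCS bucket, so the uploaded object lands in Cloud Storage instead of the Blobstore. A minimal sketch (bucket name and callback path are illustrative):

import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;
import com.google.appengine.api.blobstore.UploadOptions;

public String createGcsUploadUrl() {
    BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();
    // Same upload mechanism as before, but the file is written to the named GCS bucket
    return blobstore.createUploadUrl(
            "/upload-success",
            UploadOptions.Builder.withGoogleStorageBucketName("my-gcs-bucket"));
}
// In the success handler, blobstore.getFileInfos(req) exposes the GCS object names.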
I guess the image resolution can't exceed 8000 pixels in the App Engine Blobstore; that may be why the 400 is returned.

Related

Dealing with large zip uploads and extracting using google cloud

I am trying to create a site that e-learning courses (zips of html/css/js/media) can be uploaded to.
I am using go on google app engine with google cloud storage to store the zips and extracted courses.
I will explain the development dead ends I have encountered.
My first thought was to use the resumable upload functionality of cloud storage to send the zip file, then read it using go on app engine, unzip the files and write them back to cloud storage.
It took a while to read and understand the documentation, but it worked perfectly for my 2MB test zip. It failed when I tried it with a modest 67MB zip. I had encountered a hidden limitation when accessing cloud storage from app engine: no matter which client I used, there was a 10MB/32MB limit.
I tried both the old and new libraries as well as blobstore.
I also looked into creating a custom oauth2 supporting client library using sockets but hit too many dead ends.
Giving up on that approach, I thought that even though it would mean more uploading, extracting on the client (browser) side and then uploading each file with its own resumable upload would make the most sense. After exploring a few libraries I had in-browser extraction working, ready to upload.
I wrote my handler that created the datastore entry for the upload, selected a location for the upload and created all the upload urls.
When testing this I found that it would take a while to work through generating the long lists of files (anything over 100). I decided that, since I was using Go, it would make sense to make the requests concurrently. I spent a day or two getting that working. After dealing with some CORS issues that weirdly did not show up earlier, I had everything working.
Then I started getting errors when stress testing my approach with a large (500MB) zip/course. The uploads would fail, and I discovered that when trying to send 300+ files to generate upload urls I was getting the following error:
Post http://localhost:62394: dial tcp [::1]:62394: connectex: No connection could be made because the target machine actively refused it.
Now I have no idea how to diagnose this. I don't know if I am hitting a rate limit, and if I am, I don't know how to avoid it.
It seems like creating this should be simple, but it is anything but.
I have a few options I can pursue
Try to create the resumable uploads with a batch operation (https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch) - but batch operations to /upload are not supported.
Maybe request each upload url with a one-by-one API call.
Make requesting the url happen over a channel (https://cloud.google.com/appengine/docs/go/channel/reference)
Spend the next week or more adding layers of retries and fallback error handling.
Try another solution.
This should be simple. How should this be done?

Google Cloud Endpoint performance

What is the performance of Google Cloud Endpoints? In my case a large blob is being transferred. The size is anywhere from 1MB to 8MB. It seems to take about 5 to 10 minutes, with my broadband upload speed being about 1Mb/s.
Note this is being done from a Java client calling an endpoint. The object being transferred looks like this:
public class Item
{
String type;
byte[] data;
}
On the java client side, the code looks like this:
Item item = new Item( type, s );  // s holds the byte[] data
// MyItem is the generated endpoint client library class
MyItem.Builder builder = new MyItem.Builder( new NetHttpTransport(), new GsonFactory(), null );
MyItem service = builder.build();
PutItem putItem = service.putItem( item );
putItem.execute();
Why does it take so long to send one of these up to an endpoint? Is it the JSON parsing that is slowing it down? Any ideas on how to speed this up?
Endpoints is just a wrapper around HTTP requests made to Java servlets (you mention Java on the client, so I'll assume Java on the server).
This will introduce a very small, fixed, overhead, but the transfer speed of a large blob should be no different whether you are using endpoints or not.
As noted by Gilberto, you should probably consider using Google Cloud Storage (GCS api is slowly replacing the blobstore API). You can use it to upload directly to storage and remove the burden on your GAE app.
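A hedged sketch of what that could look like from the Java client side: assume the app hands back an upload URL (for example one generated server-side with createUploadUrl pointing at a GCS bucket), and the client POSTs the raw bytes as multipart/form-data instead of base64-encoding them into the endpoint's JSON payload. The method below is only an illustration of that flow:

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public int uploadDirectly(String uploadUrl, byte[] data) throws IOException {
    // uploadUrl would come from your app, e.g. createUploadUrl(...) targeting a GCS bucket
    String boundary = "----upload" + System.currentTimeMillis();
    HttpURLConnection conn = (HttpURLConnection) new URL(uploadUrl).openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
    try (OutputStream out = conn.getOutputStream()) {
        out.write(("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"blob.bin\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n").getBytes(StandardCharsets.UTF_8));
        out.write(data); // the raw blob, no base64 inflation
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
    }
    return conn.getResponseCode(); // the response comes from your successPath handler
}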

Google drive api 500 error in 80% of requests on app engine

I'm using Drive API on Google app engine.
The application serves 7k requests per day.
During the last day the number of errors from the API increased to nearly 60%.
{ "code" : 500, "message" : null }
I use this code to initialize drive:
HttpTransport httpTransport = new NetHttpTransport();
JsonFactory jsonFactory = new JacksonFactory();
AppIdentityCredential credential =
new AppIdentityCredential.Builder(Arrays.asList(DriveScopes.DRIVE)).build();
GoogleClientRequestInitializer keyInitializer =
new CommonGoogleClientRequestInitializer(Settings.API_KEY);
Drive service = new Drive.Builder(httpTransport, jsonFactory, null)
.setHttpRequestInitializer(credential)
.setApplicationName(APP_NAME)
.setGoogleClientRequestInitializer(keyInitializer)
.build();
return service;
Does anyone have the same situation?
Are there any solutions?
Thanks!
UPDATE:
It has started working again without any changes on my side.
There are many bugs in Drive that can cause hard 500 errors, and also many transient internal scenarios (esp. timeouts) that can cause them. It's important that you do as much research as possible so you can distinguish between the two, since some are permanent whereas others may succeed after a backoff and retry.
In your case, I suspect you are tripping over the infrastructure issues that Google have confirmed have been affecting Drive (and perhaps other services) over the last few days. See https://plus.google.com/106160348960403302854/posts/CwH9SEDTQ4C
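For the transient class of errors, a minimal retry-with-exponential-backoff sketch around a Drive call might look like this (the file-listing call is just an example; tune the attempt count and delays, and give up immediately on errors your research shows are permanent):

import java.io.IOException;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.drive.Drive;
import com.google.api.services.drive.model.FileList;

public FileList listWithBackoff(Drive service) throws IOException, InterruptedException {
    int maxAttempts = 5;
    long delayMillis = 500;
    for (int attempt = 1; ; attempt++) {
        try {
            return service.files().list().execute();
        } catch (GoogleJsonResponseException e) {
            // Retry only the (hopefully transient) 5xx responses; rethrow everything else
            if (e.getStatusCode() < 500 || attempt >= maxAttempts) {
                throw e;
            }
            Thread.sleep(delayMillis);
            delayMillis *= 2; // exponential backoff
        }
    }
}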

App Engine: Alternatives to urlfetch? Seems very unreliable

I'm using urlfetch in my app and while everything works perfectly fine in the development environment, I'm finding urlfetch to be VERY unreliable when it's actually deployed. Sometimes it works as it should (retrieving data), but then a few minutes later it might return nothing, then it'll be working fine again a few minutes after that. This is not acceptable. I've checked to make sure it's NOT the source URL that's the problem (YQL) and, again, everything works as it should in the development environment.
Are there any third-party libraries I could try?
Example code:
url = "http://query.yahooapis.com/v1/public/yql?q=%s&format=json" % urllib.quote_plus(query)
result = urlfetch.fetch(url, deadline=10)
if result.status_code == 200:
r = json.loads(result.content)
else:
return
a = r['query']['results']
# Do stuff with 'a'
Sometimes it'll work as it should, but other times - completely randomly with no code changes - I'll get this error:
a = r['query']['results']
TypeError: 'NoneType' object is unsubscriptable
Sometimes it'll work as it should, but other times completely randomly with no code changes
This is a common symptom that your application's requests have exceeded the Yahoo API calls rate limit.
Quoting the Yahoo developer documentation on rate limits:
IP Based Limits
Our service rate limits are imposed as a limit on the number of API calls made per IP address during a specific time window. If your IP address changes during that time period, you may find yourself with more "credit" available. However, if someone else had been using the address and hit the limit, you'll need to wait until the end of the time period to be allowed to make more API calls.
Google App Engine uses a pool of IP addresses for outgoing urlfetch requests and your application is sharing these IP addresses with other applications that are calling the same Yahoo endpoint; when the rate limit is exceeded, the endpoint replies with a limit exceeded error causing UrlFetch to fail.
Here is another case using the Twitter search API.
When you mix Google App Engine+Third party web APIs, you need to be sure that the API provides authenticated calls allowing your application to have its own quota (StackApps API for example).
import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()
This isn't an error in URLFetch - it's an issue with the JSON being returned. Either json.loads is returning None, or r['query'] is - I'm guessing it's probably the latter. Try logging result.content to see what the service is returning. You probably also want to check result.status_code.
One possibility is that your request is being denied or ratelimited by Yahoo in production, but not on your development machine.

Uploading file on Google App Engine using Datastore and 30 sec response time limitation

Will the response timer on google app engine start upon submitting the web page's form?
If I'm going to upload a file that is greater than 1MB, I could split the file into 1MB chunks to fit within the limitation of the Google App Engine datastore. Now, my concern is that if the client's internet connection is slow, it would eat up the 30-second timer, right? If this is the case, is it impossible to upload large files over a slow connection?
The 30 second response time limit only applies to code execution. So the uploading of the actual file as part of the request body is excluded from that. The timer will only start once the request is fully sent to the server by the client, and your code starts handling the submitted request. Hence it doesn't matter how slow your client's connection is.
As a side note, instead of splitting your file into multiple parts, try using the blobstore. I am using it for images and it raises the storage limit to 50MB. (Remember to enable billing to get access to the blobstore.)
