What is the performance of Google Cloud Endpoints? In my case a large blob is being transferred, anywhere from 1MB to 8MB in size. It seems to take about 5 to 10 minutes, with my broadband upload speed being about 1 Mbps.
Note this is being done from a Java client calling an endpoint. The object being transferred looks like this:
public class Item
{
    String type;
    byte[] data;
}
On the Java client side, the code looks like this:
Item item = new Item( type, s );
MyItem.Builder builder = new MyItem.Builder( new NetHttpTransport(), new GsonFactory(), null );
service = builder.build();
PutItem putItem = service.putItem( item );
putItem.execute();
Why does it take so long to send one of these up to an endpoint? Is it the JSON parsing that is slowing it down? Any ideas on how to speed this up?
Endpoints is just a wrapper around HTTP requests made to Java servlets (you mention Java on the client, so I'll assume Java on the server as well).
This introduces a very small, fixed overhead, but the transfer speed of a large blob should be no different whether you are using Endpoints or not.
As noted by Gilberto, you should probably consider using Google Cloud Storage (the GCS API is slowly replacing the Blobstore API). You can use it to upload directly to storage and remove the burden on your GAE app.
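For illustration, here is a minimal sketch of that approach on the Java server side, assuming a hypothetical servlet path and bucket name: the app hands the client a one-shot upload URL that targets a GCS bucket, and the client POSTs the blob there as multipart form data instead of pushing it through the Endpoints method as a JSON field.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;
import com.google.appengine.api.blobstore.UploadOptions;

// Hypothetical servlet: the client first asks for an upload URL, then POSTs
// the blob directly to that URL, so the bytes never pass through the endpoint.
public class UploadUrlServlet extends HttpServlet {
    private final BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // "my-item-bucket" and "/upload-complete" are placeholders.
        String uploadUrl = blobstore.createUploadUrl("/upload-complete",
                UploadOptions.Builder.withGoogleStorageBucketName("my-item-bucket"));
        resp.setContentType("text/plain");
        resp.getWriter().print(uploadUrl);
    }
}

The /upload-complete callback then records where the object landed; sending the raw bytes as form data also avoids encoding them into a JSON string.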
Related
Consider an image (avatar) uploader to Google Cloud Storage: it starts in the user's web browser, passes through a Go App Engine instance that handles standard compression/cropping etc., and then stores the resulting image as an object in Cloud Storage.
How can I ensure that the App Engine instance isn't overloaded by too much data, or by bad data? In other words, I think I'm asking two questions (or possibly not):
How can I limit the amount of data allowed to be sent to an App Engine instance in a single request, or is there already a default safe limit?
How can I validate the data to make sure it's a proper jpg/png/gif before attempting to process it with the standard Go image libraries?
All App Engine requests are limited to 32MB.
You can check the size of the file being uploaded before the upload starts.
You can verify the file's mime-type and only allow correct files to be uploaded.
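The question targets Go, but the checks themselves are language-neutral; here is a rough sketch of the idea (shown in Java for consistency with the rest of this page, with the size cap and helper names as assumptions): cap the declared size, then peek at the header bytes before handing anything to an image decoder.

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical validator: enforce a size cap and check magic bytes before decoding.
public final class ImageUploadValidator {
    private static final long MAX_UPLOAD_BYTES = 8L * 1024 * 1024; // assumed 8 MB cap

    public static boolean sizeOk(long contentLength) {
        // Reject unknown (-1) or oversized bodies before reading them.
        return contentLength > 0 && contentLength <= MAX_UPLOAD_BYTES;
    }

    public static boolean looksLikeImage(InputStream in) throws IOException {
        byte[] header = new byte[8];
        // Sketch only: a real implementation should read in a loop (or readFully)
        // because a single read() may return fewer bytes than requested.
        int read = in.read(header);
        if (read < 8) return false;
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        boolean isPng = Arrays.equals(header, png);
        boolean isJpeg = (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8;
        boolean isGif = header[0] == 'G' && header[1] == 'I' && header[2] == 'F' && header[3] == '8';
        return isPng || isJpeg || isGif;
    }
}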
I have been reading all over Stack Overflow about datastore vs blobstore for storing and retrieving image files. Everything points towards the blobstore except for one thing: privacy and security.
In the datastore, the photos of my users are private: I have full control over who gets a blob. In the blobstore, however, anyone who knows the URL can conceivably access my users' photos. Is that true?
Here is a quote that is supposed to give me peace of mind, but it's still not clear. So anyone with the blob key can still access the photos? (from Store Photos in Blobstore or as Blobs in Datastore - Which is better/more efficient/cheaper?)
the way you serve a value out of the Blobstore is to accept a request to the app, then respond with the X-AppEngine-BlobKey header with the key. App Engine intercepts the outgoing response and replaces the body with the Blobstore value streamed directly from the service. Because app logic sets the header in the first place, the app can implement any access control it wants. There is no default URL that serves values directly out of the Blobstore without app intervention.
All of this is to ask: which is more private and more secure for trafficking images, and why: datastore or blobstore? Or, hey, Google Cloud Storage (which I know nothing about at present)?
If you use google.appengine.api.images.get_serving_url then yes, the URL returned is public. However, the URL is not guessable from a blob's key, nor does it even exist before calling get_serving_url (or after calling delete_serving_url).
If you need access control on top of the data in the blobstore, you can write your own handlers and add the access control there.
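As a hedged sketch of such a handler in Java (with a placeholder isOwner() check standing in for whatever permission model the app uses), the handler only calls BlobstoreService.serve() after the app has decided the caller may see the blob:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreService;
import com.google.appengine.api.blobstore.BlobstoreServiceFactory;

// Hypothetical photo handler: the blob is served only after an app-level permission check.
public class PrivatePhotoServlet extends HttpServlet {
    private final BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String keyString = req.getParameter("key");           // blob key chosen by the app
        if (keyString == null || !isOwner(req, keyString)) {  // isOwner() is a placeholder
            resp.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        // Sets the X-AppEngine-BlobKey header; App Engine streams the blob body itself.
        blobstore.serve(new BlobKey(keyString), resp);
    }

    private boolean isOwner(HttpServletRequest req, String keyString) {
        // Placeholder: look up the logged-in user and compare against the photo's owner.
        return false;
    }
}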
BlobProperty is just as private and secure as the Blobstore; it all depends on your application, which serves the requests. Your application can implement any permission checking before sending the contents to the user, so I don't see any difference as long as you serve all the images yourself and don't intentionally create publicly available URLs.
Actually, I would not even think about storing photos in a BlobProperty, because that way the data ends up in the database instead of the Blobstore, and it costs significantly more to store data in the database. The Blobstore, on the other hand, is cheap and convenient.
I have a very simple app engine application serving 1.8Kb - 3.6Kb gzipped files stored in blobstore. A map from numeric file ID to blobkey is stored in the datastore and cached in memcache. The servlet implementation is trivial: a numeric file ID is received in the request; the blobkey is retrieved from memcache/datastore and the standard BlobstoreService.serve(blobKey, resp) is invoked to serve the response. As expected the app logs show that response sizes always match the blobstore file size that was served.
I've been doing some focused volume testing, and this has revealed that the outgoing bandwidth quota utilization is consistently reported to be roughly 2x what I expect given the requests received. I've been doing runs of 100k requests at a time, summing bytes received at the client and comparing this with the app logs, and everything balances except for the outgoing bandwidth quota utilization.
Any help in understanding how the outgoing bandwidth quota utilization is determined for the simple application I describe above? What am I missing or not accounting for? Why would it not tally with the totals shown for response sizes in the app logs?
[Update 2013.03.04: I abandoned the use of the blobstore and reverted to storing the blobs directly in the datastore. The outgoing bandwidth utilization is now exactly as expected. It appears that the 2x multiplier is somehow related to the use of the blobstore (but it remains inexplicable). I encountered several other problems with the blobstore service; most problematic were the additional datastore reads and writes (related to the blobinfo and blobindex metadata managed in the datastore, which is what I was originally trying to reduce by migrating my data to the blobstore). A particularly serious issue for me was this: https://code.google.com/p/googleappengine/issues/detail?id=6849. I consider this a blobstore service memory leak; once you create a blob you can never delete the blob metadata in the datastore. I will be paying for this in perpetuity, since I was foolish enough to run a 24-hour volume and performance test and am now unable to free the storage used during the test. It appears that the blobstore is currently only suitable for very specific scenarios (i.e. permanently static data). Objects with a great deal of churn, or data that is frequently refreshed or altered, should not be stored in the blobstore.]
The blobstore data can be deleted (I don't recommend it, since it can lead to unexpected behavior), but only if you know the kinds it is saved in: __BlobInfo__ and __BlobFileIndex__. These exist so your uploaded files don't end up with the same name and don't accidentally replace an old file.
For a full list of kinds stored in the datastore you can run SELECT * FROM __kind__.
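If you do go down that (unrecommended) road, here is a rough sketch of inspecting that metadata with the Java low-level datastore API; the kind name comes from the answer above, and the deletion is deliberately left commented out because it can leave the blobstore inconsistent:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;

// Rough sketch: enumerate the blobstore metadata entities mentioned above.
// Deleting them by key is possible but unsupported and risky.
public class BlobMetadataInspector {
    public static void listBlobInfo() {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        Query query = new Query("__BlobInfo__");  // metadata kind managed by the blobstore
        for (Entity info : datastore.prepare(query).asIterable()) {
            // "size" is the property the blobstore stores for each blob (assumption).
            System.out.println(info.getKey() + " size=" + info.getProperty("size"));
        }
        // datastore.delete(someKey);  // possible, but can leave the blobstore inconsistent
    }
}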
I am not sure why your App Engine app consumes 2x your outgoing bandwidth, but I will test it myself.
An alternative is to use Google Cloud Storage. If you use the default bucket for your App Engine app, you get 5GB of free storage.
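If you use the default bucket, its name can be resolved at runtime rather than hard-coded; a small sketch using the App Identity API:

import com.google.appengine.api.appidentity.AppIdentityService;
import com.google.appengine.api.appidentity.AppIdentityServiceFactory;

// Sketch: resolve the app's default GCS bucket instead of hard-coding a name.
public class DefaultBucket {
    public static String name() {
        AppIdentityService identity = AppIdentityServiceFactory.getAppIdentityService();
        return identity.getDefaultGcsBucketName();  // typically "<app-id>.appspot.com"
    }
}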
Objects with a great deal of churn or data that is frequently refreshed or altered should not be stored in the blobstore.
That's true; you can use either Cloud Storage or the datastore (Cloud Storage is an immutable object storage service). Blobstore was more for uploading files via <input type='file' /> forms (and recently, writing files from inside the app has been deprecated in favor of Cloud Storage).
I switched to NDB for a new app, which, as I understand it, includes memcache support 'for free'.
So I put an entity in the datastore:
class MyStorage(ndb.Model):
    pickled_data = ndb.BlobProperty()

obj = MyStorage(parent=ndb.Key('top_level_key', 'second_level_key'),
                pickled_data=pickle.dumps(my_attr))
obj.put()
In other requests I then retrieve using
obj = pickle.loads(MyStorage.query(ancestor = ndb.Key('top_level_key', 'second_level_key')).get().pickled_data)
But the delay when testing it deployed on App Engine tells me there's no caching going on (obviously none is expected on the first call, but subsequent calls should show a speed-up).
I check the Memcache Viewer and, sure enough, zeroes under every metric. So I'm obviously not getting something regarding free NDB caching. Can someone point out what it is?
NDB will only read from its caches when you use .get_by_id() (or .get() on a Key); the caches are not consulted when you use .query(). A keys-only query followed by a get on the resulting key will hit the cache.
Recently I experienced two problems uploading files to my Java GAE app.
I'm using the technique described in the blobstore docs.
With regular files, occasionally (let's say 15% of the time) the client receives a "503 Service Unavailable".
With high-resolution images (for example 7000x10000) the client always receives a "400 Bad Request".
In both cases no error messages are logged on the server and the blobs are written correctly, but the successPath URL (the callback of createUploadUrl) is never called. It seems that the GAE endpoint handling the upload crashes for some reason.
My client is a js XMLHttpRequest, wrapped in GWT:
public native void uploadWithXMLHttpRequest(UploadForm uploadForm) /*-{
    var fd = new FormData();
    var files = uploadForm.@mypackage.UploadForm::getFiles()();
    for (var i = 0; i < files.length; i++) {
        fd.append("uploadFile" + i, files[i]);
    }
    var xhr = new XMLHttpRequest();
    //xhr.upload.addEventListeners... omitted
    xhr.open("POST", uploadForm.@mypackage.UploadForm::getUploadUrl()());
    xhr.send(fd);
}-*/;
Any ideas for possible causes and solutions/workarounds?
Thx.
This issue is being discussed in a GAE ticket opened by another user having the same problem: https://code.google.com/p/googleappengine/issues/detail?id=7619 (by the way, the bug tracker has a "star" feature, which allows you to vote for the ticket and receive notifications).
Possible reasons:
1. You are uploading a large file (> 1MB) and writing it all at once. You should write it in portions: 1 write = 1MB (see the sketch after this list).
2. Your request takes longer than 30 sec - use a backend.
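On point 1, a hedged illustration of chunked writes; the old Files API is deprecated, so this sketch writes through the App Engine GCS client instead, with the bucket and object names as placeholders:

import java.io.IOException;
import java.nio.ByteBuffer;
import com.google.appengine.tools.cloudstorage.GcsFileOptions;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

// Sketch: write a large byte[] in 1 MB slices rather than as a single giant write.
public class ChunkedGcsWriter {
    private static final int CHUNK = 1024 * 1024; // 1 MB per write, per the suggestion above

    public static void write(byte[] data) throws IOException {
        GcsService gcs = GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
        GcsFilename file = new GcsFilename("my-bucket", "uploads/blob.bin"); // placeholders
        GcsOutputChannel out = gcs.createOrReplace(file, GcsFileOptions.getDefaultInstance());
        try {
            for (int offset = 0; offset < data.length; offset += CHUNK) {
                int len = Math.min(CHUNK, data.length - offset);
                out.write(ByteBuffer.wrap(data, offset, len));
            }
        } finally {
            out.close(); // finalizes the object
        }
    }
}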
In this case the 503 is caused by errors when we write the upload info into your datastore. As you are using the M/S datastore, transient errors are expected from time to time. I suggest you convert your app to HRD to minimize the chances of errors related to writing the upload information to your datastore.
The 400 error was generated by your app and is in your application logs.
Try Google Cloud Storage: the Blobstore service has a lot of problems, so Google is trying to migrate users from Blobstore to GCS.
I guess the image resolution can't exceed 8000 pixels in the App Engine Blobstore; that may be the cause.