Bulk file upload to App Engine - google-app-engine

I have around 1500 images which are dynamically generated on my local server. I want to upload these to App Engine's datastore. How can I do this? Any help or ideas?

BlobStore
You can use the Blobstore if you have a billing account.
So I assume you would use the Blobstore API to upload the images in chunks with a client tool over HTTP.
Before every upload, you can ask the blobstore to give you a unique upload URL to which you then post a MIME multipart request.
There is a size limit per request, I think.
Simple DataStore
If you store your images in the datastore, you can use the bulk loader tool (appcfg.py) to synchronize them from your local machine.
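As a rough sketch of the upload-URL flow, assuming the server exposes a small handler (hypothetically /get_upload_url) that returns blobstore.create_upload_url(...), a local client could loop over the images like this:

import os
import requests

SERVER = 'https://your-app.appspot.com'  # hypothetical app URL

for filename in os.listdir('images'):
    # Each blobstore upload needs a fresh one-time URL.
    upload_url = requests.get(SERVER + '/get_upload_url').text
    with open(os.path.join('images', filename), 'rb') as f:
        # requests encodes this as a MIME multipart POST.
        requests.post(upload_url, files={'file': (filename, f)})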

Related

Is there a size limit to upload file using webapp2?

I am using App Engine and writing my server code in Python using webapp2. I am trying to upload video files from the browser and save them to Google Cloud Storage. I use a form element in my HTML and a webapp2 handler on the server side to upload the file from the browser. It works for smaller files, but when I try to upload a video file greater than 100MB, the browser throws the error below
This webpage is not available
ERR_CONNECTION_RESET
I am unable to debug this on the server side as the request never hits the post method.
Is there a config parameter in webapp2 that can be modified to upload larger files?
Any input is greatly appreciated.
App Engine has a limit of 32MB on all requests. You should upload your files directly to Google Cloud Storage, not through your server. This will also save you a lot of instance time.
EDIT: As Alex mentioned, signed URLs is a great way to let users upload and download files directly from GCS.
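A minimal sketch of generating such a signed URL with the google-cloud-storage client library; the bucket and object names are placeholders:

from datetime import timedelta
from google.cloud import storage

client = storage.Client()
blob = client.bucket('my-bucket').blob('uploads/video.mp4')  # placeholder names
# Let the browser PUT this one object for the next 15 minutes.
url = blob.generate_signed_url(
    version='v4',
    expiration=timedelta(minutes=15),
    method='PUT',
    content_type='video/mp4',
)
# Hand `url` to the client; the upload then goes straight to GCS,
# so the 32MB App Engine request limit never applies.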

Upload data and blobs from local to GAE production?

I have a process that gathers and stores data on my local dev_appserver, in both the datastore and the blobstore. I do NOT want to move the process itself to production on Google App Engine, I just want to move the result of that process (structured data in the datastore and some blobs).
What would be the best approach, and how could I get it done in a fairly automated way?
I've had a look at the appcfg.py options for dumping data out of the datastore, but from what I've seen they do not work with blobs.
My data structure is something like:
name --> ndb.TextProperty
content --> ndb.TextProperty
image --> ndb.StructuredProperty (structured property containing image BlobKey and image Serving URL)
I believe I need to upload the blobs to my Cloud Storage on one side, upload data to the Cloud Data Store on the other side, and then make sure the BlobKey relationship between the Data and Blobs is not lost.
The Cloud Storage API documentation does not clearly explain how to do this from a local environment. It seems like I need to create a POST request to the Cloud Storage API, and the request should carry the authorization (an API key) and the blob data. Is there an App Engine API that does this, or do I need to build the request myself?
Has anyone done this before? Any suggestions?
Thanks!
You could use the Cloud Datastore API to write to your production App Engine datastore using authorized HTTP requests.
The Python API is lower-level than NDB, but the entities section of the documentation has a table describing each property type.
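A sketch of what that two-sided move could look like with the Cloud client libraries (google-cloud-storage and google-cloud-datastore); the project, bucket, and kind names are placeholders. If the app still needs a BlobKey, blobstore.create_gs_key can derive one from the Cloud Storage path.

from google.cloud import datastore, storage

ds = datastore.Client(project='my-project')    # placeholder project id
bucket = storage.Client().bucket('my-bucket')  # placeholder bucket

# 1. Upload the image bytes to Cloud Storage.
blob = bucket.blob('images/item-1.png')
blob.upload_from_filename('local/images/item-1.png')

# 2. Write the entity, keeping a reference to the uploaded object
#    so the data/blob relationship survives the move.
entity = datastore.Entity(key=ds.key('Item'))
entity.update({
    'name': 'item-1',
    'content': 'some text content',
    'image_gcs_path': '/my-bucket/images/item-1.png',
})
ds.put(entity)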

Is it possible for users to upload to google cloud storage?

I'd like to create an object, give my users an upload url, and let them upload data. The resulting object must be public-readable. Is this possible with google cloud storage? If so, is it possible through google app engine, and where can I find documentation and/or examples for doing it?
To have a user upload directly to Google Cloud Storage, you can use the Signed URLs feature. This allows you to grant a single user permission to issue a PUT request against an object.
If you're using Python, there is a Python example demonstrating signed URLs.
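The upload itself is then a plain HTTP PUT against the signed URL; a sketch of the client side using the requests library:

import requests

signed_url = '...'  # obtained from your server

with open('video.mp4', 'rb') as f:
    resp = requests.put(
        signed_url,
        data=f,
        headers={'Content-Type': 'video/mp4'},  # must match what was signed
    )
resp.raise_for_status()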
You can create an upload url using the blobstore service. See the create_upload_url function.
To make the object publicly readable, you may need to adjust the ACLs of the bucket or of the object itself.
See also the Cloud Storage Overview.
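For the public-readable part, a sketch with the google-cloud-storage library (bucket and object names are placeholders; this relies on per-object ACLs being enabled on the bucket):

from google.cloud import storage

blob = storage.Client().bucket('my-bucket').blob('uploads/photo.jpg')
blob.make_public()  # grants allUsers READ on this single object
print(blob.public_url)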
Another option to upload directly to Google Cloud Storage is Resumable URLs.
If your object is big, such as a video, you can upload it in chunks this way. If the upload fails (e.g. the client loses its internet connection), you can resume from where you left off instead of making the user start over. You also save some money by not restarting the upload from scratch.
However if your media is small, just use Signed URLs.
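A sketch of starting such a resumable session server-side with google-cloud-storage; the client then PUTs chunks to (and, after a failure, resumes against) the returned session URL:

from google.cloud import storage

blob = storage.Client().bucket('my-bucket').blob('videos/big.mp4')  # placeholders
# Returns a session URL the client uploads chunks to and can resume against.
session_url = blob.create_resumable_upload_session(
    content_type='video/mp4',
    origin='https://your-site.example',  # required for browser (CORS) uploads
)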

Store Photos in Blobstore or as Blobs in Datastore - Which is better/more efficient /cheaper?

I have an app where each DataStore Entity of a specific kind can have a number of photos associated with it. (Imagine a car sales website - one Car has multiple photos)
Originally, since all the data was being sourced from another site, I was limited to storing the photos as Datastore Blobs, but now that it's possible to write BlobStore items programmatically, I'm wondering if I should change my design and store the photos as BlobStore items.
So, the question is:
Is it 'better' to store the photos in the Blobstore, or as Blobs in the Datastore? Both are possible solutions, but which would be the better/cheaper/most efficient approach, and why?
Images served from BlobStore have several advantages over Datastore:
Images are served directly from BlobStore, so requests do not go through a GAE frontend instance. You save on frontend instance time, and hence cost.
BlobStore storage cost is roughly half of Datastore storage cost ($0.13 vs $0.24 per GB per month). With Datastore you would additionally pay for every get() or query().
BlobStore automatically uses Google's cache service, so the only cost is bandwidth ($0.12/GB). You can achieve similar caching on a frontend instance via cache-control headers, but with BlobStore it happens automatically.
Images in BlobStore can be served via the Image Service and transformed on the fly, e.g. into thumbnails; transformed images are also cached automatically (see the sketch after this list).
Binary blobs in the Datastore are limited to 1MB in size.
One downside of BlobStore is that it has no access controls: anybody with a URL to a blob can download it. If you need an ACL (Access Control List), take a look at Google Cloud Storage.
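A sketch of the Image Service point above, assuming the Python runtime and an already-uploaded blob:

from google.appengine.api import images
from google.appengine.ext import blobstore

blob_key = blobstore.BlobKey('...')  # placeholder: key of an uploaded image

# One call creates a permanent serving URL backed by Google's image cache.
serving_url = images.get_serving_url(blob_key)
# Size variants are requested by appending =sNN, e.g. a 200px thumbnail:
thumbnail_url = serving_url + '=s200'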
Update:
Cost-wise, the biggest saving will come from properly caching the images:
Every image should have a permanent URL.
Every image URL should be served with proper cache control HTTP headers:
// 32M seconds is a bit more than one year
Cache-Control: max-age=32000000, must-revalidate
You can do this in Java via:
httpResponse.setHeader("Cache-Control", "max-age=32000000, must-revalidate");
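The same header in Python (webapp2), as a sketch:

import webapp2

class ImageHandler(webapp2.RequestHandler):
    def get(self):
        # 32M seconds is a bit more than one year
        self.response.headers['Cache-Control'] = 'max-age=32000000, must-revalidate'
        # ... write the image bytes here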
Update 2:
As Dan correctly points out in the comments, BlobStore data is served via a frontend instance, so access controls can be implemented by user code.

Store Images in datastore from local drive in Google App Engine

I couldn't find any articles matching my requirements.
Basically, what I want is that:
User uploads picture to the application from their local drive.
Application stores the picture uploaded to datastore.
Application retrieves images from datastore.
Any suggestions? Urgent.
That's exactly what is discussed in the documentation for the BlobStore API.
You can do this in much the same way as you would in any other framework or platform: create an HTML form with a 'file' input and the enctype set to 'multipart/form-data'. On the server side, extract the file data from the form field (using self.request.POST['fieldname'].value in webapp) and store the contents in a datastore model, in a db.BlobProperty field.
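A minimal sketch of that pattern with webapp2 and the db API the answer mentions; the model and route names are illustrative, and the served content type is assumed to be PNG:

from google.appengine.ext import db
import webapp2

class Picture(db.Model):
    image = db.BlobProperty()

class UploadHandler(webapp2.RequestHandler):
    def get(self):
        self.response.write(
            '<form method="POST" enctype="multipart/form-data">'
            '<input type="file" name="img"><input type="submit"></form>')

    def post(self):
        # Raw bytes of the uploaded file from the multipart form field.
        pic = Picture(image=db.Blob(self.request.POST['img'].value))
        pic.put()
        self.response.write('Stored picture %s' % pic.key())

class ImageHandler(webapp2.RequestHandler):
    def get(self, pic_id):
        pic = Picture.get_by_id(int(pic_id))
        self.response.headers['Content-Type'] = 'image/png'  # assumes PNG
        self.response.write(pic.image)

app = webapp2.WSGIApplication([
    ('/upload', UploadHandler),
    ('/img/(\\d+)', ImageHandler),
])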
