I'm building an application on Google App Engine with Java (GAE/J) and all my data will be stored in the Google Datastore. Now, what if I want to save some binary files, say images (JPG, PNG, etc.), DOC, TXT, or video files? How do I deal with these? Or what if I want to stream video files (SWF)? Where and how should I store those files so that when I redeploy my app I don't lose any data?
It depends on whether you're talking about static files or dynamic ones. If they're static files created by you, you can upload them with your application, subject to a 10MB/3,000-file maximum, but Google doesn't offer a CDN or anything like that.
If they're dynamic, uploaded by your users or created by your application, the datastore supports BlobProperties: you can dump any kind of binary data you want in there, as long as it's less than 1MB per entity. If your files are larger, you can consider another service like S3 or Mosso's Cloud Files. That can be a better solution for serving files directly to users, because those services offer CDNs, though they aren't cheap. On the other hand, your latency back to GAE will be much higher than if you stored the data in Google's datastore, and you'll have to pay for transit on both sides, so take that into account if you're going to be processing the files on App Engine.
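To make the BlobProperty option concrete, here is a minimal sketch. It uses the Python db API for illustration, even though the question is about GAE/J (which has an equivalent com.google.appengine.api.datastore.Blob type); the model and field names are assumptions, not from the answer.

from google.appengine.ext import db

class UserFile(db.Model):
    filename = db.StringProperty()
    content_type = db.StringProperty()
    # Raw bytes; the whole entity must stay under the ~1MB limit.
    data = db.BlobProperty()

def save_upload(filename, content_type, raw_bytes):
    UserFile(filename=filename,
             content_type=content_type,
             data=db.Blob(raw_bytes)).put()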
Google App Engine Virtual File System (GaeVFS)
Related
I am currently developing a website that allows uploading images and other files. At this point I have a couple of questions.
The first is, where should those files be stored? I was thinking of storing them in the file system.
And the second: what is the most correct way to serve uploaded multimedia files? If you use the file system as the storage medium, should you use static routes configured on the server, or is there a better alternative?
Where should the files be stored?
If the number of files is limited and not very large, then you can store them in the filesystem. However, if the number is large, I would recommend storing them in a storage service like AWS S3. The process can go either of two ways: store the uploaded file temporarily in the filesystem and then upload it to S3, or upload directly to S3. It depends on the use case.
What is the most correct way to serve uploaded multimedia files?
If you are using a service like S3, you just have to set the content type and expiration, among other metadata, while uploading, and everything else will be taken care of by S3; a minimal sketch follows below. If you are storing the files on the filesystem, you can use nginx/Apache to serve the static asset files with the proper content type and other metadata.
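For the S3 route, here is a hedged sketch using the boto library (an assumption on my part; any S3 client works) that uploads a file with an explicit content type and expiry so S3 can serve it directly. The bucket name and credentials are placeholders.

import email.utils
import time

import boto

conn = boto.connect_s3('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')
bucket = conn.get_bucket('my-media-bucket')  # assumed bucket name

def upload_media(name, local_path, content_type):
    key = bucket.new_key(name)
    # Expire in 30 days; S3 echoes these headers back when serving the object.
    expires = email.utils.formatdate(time.time() + 30 * 24 * 3600, usegmt=True)
    key.set_contents_from_filename(
        local_path,
        headers={'Content-Type': content_type, 'Expires': expires},
        policy='public-read')  # make the object directly servable
    return key.generate_url(0, query_auth=False)  # plain public URL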
How to backup (and restore) image files uploaded and stored as blobs in GAE Blobstore (Python)
I have gone through the GAE help doc on this topic. I could not find any way to do this, but I am sure there must be a very simple and intuitive way, since this is a fundamental need for developing any big commercial web app.
Although a feature to download the backed-up data would be better, I am even OK with a Google Cloud Storage based approach, if some definitive guide exists for it.
I want to use the backup of my web app data in case of accidental data deletion or corruption. I plan to use the Datastore Admin to back up my NDB entities, which can easily be restored the same way. I was hoping for a similar solution (backup and also easy restore) for the image (picture) files stored as blobs in the Blobstore.
I have gone through this GAE Blobstore help page and it does not say anything about its deprecation (the Files API is deprecated, and I am not using that).
I would advise against using the App Engine Blobstore to store anything, given that it's set for deprecation (and has been for the last few months). So, in addition to backing up, I would strongly suggest migrating your app to store images directly in Google Cloud Storage ASAP.
The best way to back up images stored in the Blobstore is to create a migration via Task Queues. In this migration, grab each of the blobs and store them in a container, which can be AWS S3 or Google Cloud Storage (via the boto library). The reason you need a task queue is that this will likely take a LONG time if you have lots of images stored in the Blobstore. A rough sketch of this approach follows below.
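A hedged sketch of that migration on the Python 2.7 runtime, using the App Engine cloudstorage client library; the handler paths, bucket name, and chunk size are assumptions for illustration.

import cloudstorage  # GCS client library for App Engine
import webapp2
from google.appengine.api import taskqueue
from google.appengine.ext import blobstore

BUCKET = '/my-backup-bucket'  # assumed bucket
CHUNK = 1024 * 1024  # 1MB reads keep memory bounded

class KickoffHandler(webapp2.RequestHandler):
    def get(self):
        # Enqueue one task per blob so each task stays well under its deadline.
        # (For very large blob counts this loop itself should be paginated.)
        for info in blobstore.BlobInfo.all():
            taskqueue.add(url='/tasks/migrate_blob',
                          params={'blob_key': str(info.key())})

class MigrateBlobHandler(webapp2.RequestHandler):
    def post(self):
        blob_key = self.request.get('blob_key')
        info = blobstore.BlobInfo.get(blob_key)
        # Stream the blob into a GCS object chunk by chunk.
        with cloudstorage.open('%s/%s' % (BUCKET, blob_key), 'w',
                               content_type=info.content_type) as dest:
            reader = blobstore.BlobReader(blob_key, buffer_size=CHUNK)
            while True:
                chunk = reader.read(CHUNK)
                if not chunk:
                    break
                dest.write(chunk)

app = webapp2.WSGIApplication([
    ('/tasks/migrate_all', KickoffHandler),
    ('/tasks/migrate_blob', MigrateBlobHandler),
])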
Here's the SO question I asked and got a response about:
GAE Blobstore file-like API deprecation timeline (py 2.7 runtime)
I'm looking for a cloud-based service which will allow my customers to upload very high-resolution print PDFs (sometimes about 60MB), store them, and create low-resolution images from them very quickly.
I've started looking at Amazon S3, but I know it doesn't do any processing of the uploaded files, so I started looking at Google App Engine.
I did think about using the Dropbox Core API, but I think that is really for one-to-one use rather than hundreds of users daily.
Any suggestions for services would be great
Thanks
David
Have a look at Google Cloud Storage:
https://cloud.google.com/products/cloud-storage
There you can upload files of up to 5TB each, and store as many as you can pay for.
It works well with lots of users: you can use buckets or folders per user, it's up to you.
It's also possible to serve those files from your own domain, and APIs are available for many languages as well. A rough sketch follows below.
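As one illustration of the folder-per-user idea, here is a hedged sketch using the boto library (which also speaks Google Cloud Storage); the bucket name, credentials, and key layout are assumptions.

import boto

conn = boto.connect_gs('GS_ACCESS_KEY_ID', 'GS_SECRET_ACCESS_KEY')
bucket = conn.get_bucket('my-app-uploads')  # assumed bucket

def store_user_file(user_id, filename, local_path):
    # GCS has no real folders; the 'user_id/' key prefix acts as one.
    key = bucket.new_key('%s/%s' % (user_id, filename))
    key.set_contents_from_filename(local_path)
    return key.name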
I've got a bunch of large files in regular cloud storage that I'd like to programmatically move over to the Blobstore for further processing using the mapreduce library. (Since there is a BlobstoreLineInputReader but not a Datastore version.) I've tried making a URL for the GS file and having the Blobstore read it in itself, and I've also tried buffered reads, but for large files I still hit a memory error. (I avoid the deadline-exceeded error (more than 60 seconds) for Blobstore files by opening in append mode and finalizing only at the end.) It seems like there should be an efficient way to do this, since both the datastore and blobstore are part of the same application context, but I haven't found it.
I'm confused because you mention cloud storage and datastore almost interchangeably here.
If your data is in Google Cloud Storage, then you can create BlobKeys for the files and use them with any current Blobstore API.
i.e.
blobkey = blobstore.create_gs_key('/gs/my_bucket/my_object').
If your files are in the datastore then you'll need to use the files API to move them to Cloud Storage/Blobstore and then process them from there.
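A small sketch of the create_gs_key route (bucket and object names assumed): once the Cloud Storage object is wrapped in a BlobKey, the regular Blobstore API, including the file-like BlobReader that BlobstoreLineInputReader builds on, can read it without loading the whole file into memory.

from google.appengine.ext import blobstore

# '/gs/<bucket>/<object>' is the required path format.
blob_key = blobstore.create_gs_key('/gs/my_bucket/my_object')

# Any Blobstore-API consumer can now use blob_key, e.g. line-by-line reads:
reader = blobstore.BlobReader(blob_key)
for line in reader:
    handle_line(line)  # hypothetical per-line processing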
I have some big files stored on Google Storage. I would like users to be able to download them only when they are authenticated to my GAE application. The user would use a link to my GAE app, such as http://myapp.appspot.com/files/hugefile.bin
My first try works for files whose size is < 32MB. Using the experimental Google Storage API, I could read the file first and then serve it to the user. This required my GAE application to be a team member of the project in which Google Storage was enabled. Unfortunately this doesn't work for large files, and it hogs bandwidth by first downloading the file to GAE and then serving it to the user.
Does anyone have an idea of how to accomplish this?
You can store files up to 5GB in size using the Blobstore API: http://code.google.com/appengine/docs/python/blobstore/overview.html
Here's the Stackoverflow thread on this: Upload file bigger than 40MB to Google App Engine?
One thing to note is that reading from the Blobstore can only be done in 32MB increments, but the API provides ways to access portions of the file for reads: http://code.google.com/appengine/docs/python/blobstore/overview.html#Serving_a_Blob
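For partial reads, here is a hedged sketch using blobstore.fetch_data; note that in the Python API each individual fetch_data call is capped at blobstore.MAX_BLOB_FETCH_SIZE bytes, so the loop walks the blob in API-sized chunks. The callback is a placeholder.

from google.appengine.ext import blobstore

def read_blob_in_chunks(blob_key, handle_chunk):
    info = blobstore.BlobInfo.get(blob_key)
    step = blobstore.MAX_BLOB_FETCH_SIZE
    for start in xrange(0, info.size, step):
        # fetch_data's end index is inclusive.
        chunk = blobstore.fetch_data(blob_key, start, start + step - 1)
        handle_chunk(chunk)  # hypothetical per-chunk processing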
FYI, in the upcoming 1.6.4 release of App Engine we've added the ability to pass a Google Storage object name to blobstore.send_blob() to send Google Storage files of any size from your App Engine application.
Here is the pre-release announcement for 1.6.4.
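A minimal sketch of that 1.6.4 feature (the handler path, bucket, and object name are assumptions): the download handler streams the Google Storage object itself, so the app never buffers the file and the 32MB response limit doesn't apply.

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class ServeGsFileHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        # An authentication check for the requesting user would go here.
        gs_key = blobstore.create_gs_key('/gs/my_bucket/hugefile.bin')
        self.send_blob(gs_key, content_type='application/octet-stream')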