Allowing an authenticated user to download a big object stored on Google Storage - google-app-engine

I have some big files stored on Google Storage. I would like users to be able to download them only when they are authenticated to my GAE application. The user would use a link to my GAE application, such as http://myapp.appspot.com/files/hugefile.bin
My first try works for files whose size is < 32 MB. Using the experimental Google Storage API, I could read the file first and then serve it to the user. This required my GAE application to be a team member of the project for which Google Storage was enabled. Unfortunately this doesn't work for large files, and it hogs bandwidth by first downloading the file to GAE and then serving it to the user.
Does anyone have an idea of how to accomplish this?

You can store files up to 5GB in size using the Blobstore API: http://code.google.com/appengine/docs/python/blobstore/overview.html
Here's the Stackoverflow thread on this: Upload file bigger than 40MB to Google App Engine?
One thing to note is that Blobstore reads can only be done in 32 MB increments, but the API provides ways to access portions of the file for reads: http://code.google.com/appengine/docs/python/blobstore/overview.html#Serving_a_Blob
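For illustration, a byte-range read looks roughly like this (a minimal sketch; blob_key is assumed to be a valid BlobKey):

    from google.appengine.ext import blobstore

    # Fetch one chunk of the blob by byte range; each fetch_data call is
    # capped at blobstore.MAX_BLOB_FETCH_SIZE bytes.
    data = blobstore.fetch_data(blob_key, 0, blobstore.MAX_BLOB_FETCH_SIZE - 1)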

FYI, in the upcoming 1.6.4 release of App Engine we've added the ability to pass a Google Storage object name to blobstore.send_blob() to send Google Storage files of any size from your App Engine application.
Here is the pre-release announcement for 1.6.4.
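A minimal sketch of what serving a Google Storage object this way can look like once 1.6.4 is out (the bucket name, URL routing, and auth check are placeholder assumptions):

    from google.appengine.api import users
    from google.appengine.ext import blobstore
    from google.appengine.ext.webapp import blobstore_handlers

    class GsFileHandler(blobstore_handlers.BlobstoreDownloadHandler):
        def get(self, filename):
            # Placeholder auth check: only serve to signed-in users.
            if not users.get_current_user():
                self.redirect(users.create_login_url(self.request.uri))
                return
            # '/gs/my-bucket' stands in for your Google Storage bucket.
            blob_key = blobstore.create_gs_key('/gs/my-bucket/' + filename)
            self.send_blob(blob_key)

App Engine then streams the object itself, so the double transfer described in the question goes away.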

Related

Access control for media files on Google Cloud Storage

I have a social media app deployed on App Engine where users can upload and share photos/videos with a private group of people. For writes, I have a POST endpoint that accepts uploaded files and writes them to one GCS bucket that's not public. For reading, a GET endpoint checks with Cloud SQL if this user is authorized to access the media file - if yes, it returns the file stream. The file is stored only for 48 hours and average retrieval is 20 times per day. Users are authenticated using Firebase email link login.
The issue with the current approach is that my GET endpoint is an expensive middleman: it reads the GCS file stream and passes it on to the clients, adding to the cost every time the API is invoked.
There's no point caching the file on App Engine because cache hit ratio will be extremely low for my use case.
The GET API could return a GCS file URL instead of a file stream if I made the GCS bucket public. But that would mean anyone with the public URL can access the file, not just my app or a limited set of users. Plus, the entire bucket would then be exposed.
I could create an ACL for each GCS file object, but ACLs work only for users with Google accounts and my app uses email link authentication. There's also a limit on ACL entries per object in case the file needs to be shared with more than 100 people.
The last option I have is to create a signed link that works for a short duration, though that still permits limited unauthorized sharing while the link is valid.
Also tagging Google Photos. If the partner sharing program can help with this problem, I can migrate from GCS to Google Photos for storage.
This looks like a common use case for media-based apps. Are there any recommended design patterns to achieve the goal in a cost-effective way?
This is my first week learning GCP, so I may be wrong about some of the points shared above.
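For what it's worth, the short-lived signed link option can be generated server-side right after the Cloud SQL authorization check, so the bucket stays private. A minimal sketch with the google-cloud-storage Python client (the bucket/object names and the 10-minute lifetime are assumptions):

    from datetime import timedelta
    from google.cloud import storage

    def signed_download_url(bucket_name, object_name, minutes=10):
        """Return a V4 signed URL that lets its holder GET one object."""
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(object_name)
        return blob.generate_signed_url(version='v4',
                                        expiration=timedelta(minutes=minutes),
                                        method='GET')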

How can I enforce rate limit for users downloading from Google Cloud Storage bucket?

I am implementing a dictionary website using App Engine and Cloud Storage. App Engine controls the backend, like user authentication etc., and Cloud Storage is used to store a JSON file for each dictionary entry.
I would like to rate-limit how much a user can download in a given time period so they can't bulk-download the JSON files and run up a big charge for me. Ideally, the dictionary would display a captcha if a user downloads too much at once, and allow them to keep downloading if they pass the captcha. What is the best way to achieve this?
Is there a specific service for rate limiting based on IP address or authenticated user? Should I do this through App Engine and only access Cloud Storage through App Engine (perhaps slower since it's using some of my dynamic resources to serve static content)? Or is it possible to have the frontend access Cloud Storage and implement the rate limiting on Cloud Storage directly? Is a Cloud bucket the right service for storage, here? And how can I allow search engine indexing bots to bypass the rate limiting?
As explained by Doug Stevenson in this post:
"There is no configuration for limiting the volume of downloads for files stored in Cloud Storage."
and, explaining further:
"If you want to limit what end users can do, you will need to route them through some middleware component that you build that tracks how they're using your provided API to download files, and restrict what they can do based on their prior behavior. This is obviously nontrivial to implement, but it's possible."
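A minimal sketch of such a middleware check on App Engine, using memcache counters (the quota, window, and key scheme are assumptions, and memcache counters are approximate because entries can be evicted):

    from google.appengine.api import memcache

    QUOTA = 100          # max downloads per window (tuning assumption)
    WINDOW_SECS = 3600   # length of the rate-limit window (tuning assumption)

    def allow_download(user_id):
        """Return True while this user is under quota for the current window."""
        key = 'dl_quota:%s' % user_id
        # add() is a no-op if the key already exists, so the counter
        # expires WINDOW_SECS after it is first created.
        memcache.add(key, 0, time=WINDOW_SECS)
        count = memcache.incr(key)
        return count is not None and count <= QUOTA

A request handler would call allow_download() before streaming the JSON file, and serve the captcha challenge instead once it returns False.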

How to backup image files uploaded and stored as blobs in GAE Blobstore (Python)

How to backup (and restore) image files uploaded and stored as blobs in GAE Blobstore (Python)?
I have gone through the GAE help doc on this topic. I could not find any way, but I am sure there must be a very simple and intuitive way to do this, since this is a fundamental need for developing any big commercial web app.
Although a feature to download the backed-up data would be better, I am OK even with a Google Cloud Storage based approach if a definitive guide exists for it.
I want to use the backup of my web app data in case of accidental data deletion or corruption. I plan to use the Datastore Admin to back up my NDB entities, which can easily be restored the same way. I was hoping for a similar solution (backup and also easy restore) for the image (picture) files stored as blobs in the Blobstore.
I have gone through this GAE Blobstore help page and it does not say anything about its deprecation (Files API is deprecated and I am not using that)
I would advise against using the App Engine Blobstore to store anything, given that it's set for deprecation (and has been for the last few months). So, in addition to backing up, I would strongly suggest migrating your app to store images directly in Google Cloud Storage asap.
The best way to back up images stored in the Blobstore is to create a migration via Task Queues. In this migration, grab each of the blobs and store them in a container, which can be AWS S3 or Google Cloud Storage (via the boto library); a sketch of one such task follows the link below. The reason you need a task queue is that this will likely take a LONG time if you have lots of images stored in the Blobstore.
Here's the SO question I asked and got a response about:
GAE Blobstore file-like API deprecation timeline (py 2.7 runtime)
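A minimal sketch of one such migration task, using the GAE cloudstorage client library rather than boto (the bucket name and object-naming scheme are assumptions):

    import cloudstorage
    from google.appengine.ext import blobstore

    BUCKET = '/my-backup-bucket'   # placeholder bucket name

    def backup_blob(blob_key):
        """Copy a single Blobstore blob into GCS; run from a task queue task."""
        info = blobstore.BlobInfo.get(blob_key)
        reader = blobstore.BlobReader(blob_key)
        dest_name = '%s/%s' % (BUCKET, info.filename or str(blob_key))
        with cloudstorage.open(dest_name, 'w',
                               content_type=info.content_type) as dest:
            while True:
                chunk = reader.read(1024 * 1024)   # copy in 1 MB chunks
                if not chunk:
                    break
                dest.write(chunk)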

Is it possible for users to upload to google cloud storage?

I'd like to create an object, give my users an upload url, and let them upload data. The resulting object must be public-readable. Is this possible with google cloud storage? If so, is it possible through google app engine, and where can I find documentation and/or examples for doing it?
To have a user upload directly to Google Cloud Storage, you can use the Signed URLs feature. This lets you grant a single user permission to issue a PUT request to an object.
If you're using Python, there is a python example demonstrating signed URLs.
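A minimal sketch of generating such an upload URL with the google-cloud-storage Python client (names, lifetime, and content type are assumptions; the uploader must send the same Content-Type header with its PUT):

    from datetime import timedelta
    from google.cloud import storage

    def signed_upload_url(bucket_name, object_name):
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(object_name)
        return blob.generate_signed_url(version='v4',
                                        expiration=timedelta(minutes=15),
                                        method='PUT',
                                        content_type='application/octet-stream')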
You can create an upload url using the blobstore service. See the create_upload_url function.
To make the object publicly accessible you may need to adjust the ACLs of the bucket.
See also the Cloud Storage Overview.
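A minimal sketch of the Blobstore route (the form field name and handler paths are assumptions):

    from google.appengine.ext import blobstore
    from google.appengine.ext.webapp import blobstore_handlers

    class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
        def post(self):
            upload = self.get_uploads('file')[0]   # 'file' is the form field name
            self.redirect('/serve/%s' % upload.key())

    # When rendering the upload form, point its action at:
    #   blobstore.create_upload_url('/upload')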
Another option to upload directly to Google Cloud Storage is Resumable URLs.
If your object is big, such as a video, you can upload it in chunks this way. If the upload fails (e.g. the client loses its internet connection), you can resume from where you left off rather than making the user start over again. Plus you save some money by not having to restart that upload.
However if your media is small, just use Signed URLs.
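A sketch of starting a resumable session with the google-cloud-storage Python client (names and content type are assumptions):

    from google.cloud import storage

    def start_resumable_upload(bucket_name, object_name):
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(object_name)
        # Returns a session URI; the client PUTs chunks to it and can
        # resume from the last confirmed byte after a failure.
        return blob.create_resumable_upload_session(content_type='video/mp4')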

Store Binary files on GAE/J + Google DataStore

I'm building an application on Google App Engine with Java (GAE/J) and all my data will be stored in the Google Datastore. Now, what if I want to save some binary files, let's say images (JPG, PNG, etc.), DOC, TXT, or video files: how do I deal with these? Or what if I want to stream video files (SWF)? Where and how should I store those files so that when I redeploy my app I don't lose any data?
It depends on whether you're talking about static files or dynamic ones. If they're static files created by you, you can upload them, subject to a 10 MB/3000-file maximum, but Google doesn't offer a CDN or anything.
If they're dynamic, uploaded by your users or created by your application, the datastore supports BlobProperties: you can dump any kind of binary data you want in there as long as it's less than 1 MB per entity. If they're larger, you can consider another service like S3 or Mosso's cloud files. That can be a better solution for serving files directly to users because those providers can offer CDN service, but it's not cheap. On the other hand, your latency back to GAE will be much higher than storing the data in Google's Datastore, and you'll have to pay for transit on both sides, so it's something to take into account if you're going to be processing the files on App Engine.
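For illustration, a minimal datastore model holding binary data (shown in Python to match the other sketches on this page; in GAE/J the equivalent is a com.google.appengine.api.datastore.Blob field on your entity):

    from google.appengine.ext import ndb

    class BinaryFile(ndb.Model):
        filename = ndb.StringProperty()
        data = ndb.BlobProperty()   # raw bytes, under 1 MB per entity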
Google App Engine Virtual File System (GaeVFS)
