I have a social media app deployed on App Engine where users can upload and share photos/videos with a private group of people. For writes, I have a POST endpoint that accepts uploaded files and writes them to one GCS bucket that's not public. For reading, a GET endpoint checks with Cloud SQL if this user is authorized to access the media file - if yes, it returns the file stream. The file is stored only for 48 hours and average retrieval is 20 times per day. Users are authenticated using Firebase email link login.
The issue with the current approach is that my GET endpoint is an expensive middleman: it reads the GCS file stream and passes it on to clients, adding to the cost as many times as the API is invoked.
There's no point caching the file on App Engine because cache hit ratio will be extremely low for my use case.
The GET API could return a GCS file URL instead of a file stream if I made the GCS bucket public. But that would mean anyone with the public URL could access the file, not just my app or the intended users. Plus, the entire bucket would then be exposed.
I could create an ACL for each GCS file object, but ACLs only work for users with Google accounts, and my app uses email link authentication. There's also a limit of 100 ACL entries per object, which is a problem if a file needs to be shared with more people than that.
The last option I have is to create a signed link that works only for a short duration, so any unauthorized sharing is limited to that window.
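Roughly, the signed-link option would look something like this on my GET endpoint (a sketch using the google-cloud-storage Python client; the bucket name and expiry are placeholders):

```python
from datetime import timedelta

from google.cloud import storage


def get_media_url(object_name: str) -> str:
    """Return a short-lived signed URL instead of streaming the file.

    Called only after the Cloud SQL check confirms the user may read
    this object. Needs credentials that can sign (e.g. a service
    account key or the signBlob permission).
    """
    client = storage.Client()
    bucket = client.bucket("my-private-media-bucket")  # placeholder bucket
    blob = bucket.blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=10),  # short window limits re-sharing
        method="GET",
    )
```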
Also tagging Google Photos: if its partner sharing program can help with this problem, I could migrate from GCS to Google Photos for storage.
This looks like a common use-case for media based apps. Are there any recommended design patterns to achieve the goal in a cost effective way?
This is my first week learning GCP, so I may be wrong on some of the points shared above.
Related
I have a React web application in which I allow users to upload DICOM files to the Google Healthcare API. Currently the files first get uploaded to my back-end server, which then uploads them to the Healthcare API. I allow users to upload a full DICOM study (100 MB to 2+ GB), which could contain anywhere from 1 to 500+ DICOM files (each usually 50 KB to 50 MB). This approach has worked so far, but as we expand it seems like an inefficient use of my server.
My goal is to let users upload directly to a Google Cloud Storage bucket from the React app. I want to perform some validation logic before I export the files to the Google Healthcare API. I have looked into signed URLs, but since the files being uploaded are medical images I wasn't sure whether they would be secure enough. The users don't necessarily have a Google account.
What is the best way to let a user upload a directory directly to a GCS bucket without going through my server? Are there dangers with this approach if a user uploads a virus? Also, signed URLs are valid for a set amount of time; can I deactivate a signed URL as soon as the uploads are complete?
I am implementing a dictionary website using App Engine and Cloud Storage. App Engine controls the backend, like user authentication etc., and Cloud Storage is used to store a JSON file for each dictionary entry.
I would like to rate-limit how much a user can download in a given time period so they can't bulk-download the JSON files and run up a big charge for me. Ideally, the dictionary would display a captcha if a user downloads too much at once, and let them keep downloading if they pass the captcha. What is the best way to achieve this?
Is there a specific service for rate limiting based on IP address or authenticated user? Should I do this through App Engine and only access Cloud Storage through App Engine (perhaps slower since it's using some of my dynamic resources to serve static content)? Or is it possible to have the frontend access Cloud Storage and implement the rate limiting on Cloud Storage directly? Is a Cloud bucket the right service for storage, here? And how can I allow search engine indexing bots to bypass the rate limiting?
As explained by Doug Stevenson in this post:
"There is no configuration for limiting the volume of downloads for files stored in Cloud Storage."
and explaining further:
"If you want to limit what end users can do, you will need to route them through some middleware component that you build that tracks how they're using your provided API to download files, and restrict what they can do based on their prior behavior. This is obviously nontrivial to implement, but it's possible."
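For what it's worth, a minimal sketch of that middleware idea (Flask on App Engine, per-IP counters kept in process memory; the limits and the route are made up, and a multi-instance deployment would need a shared store such as Memorystore or Firestore instead):

```python
import time
from collections import defaultdict

from flask import Flask, abort, request

app = Flask(__name__)

WINDOW_SECONDS = 3600           # assumed rate-limit window
MAX_DOWNLOADS_PER_WINDOW = 200  # assumed per-IP allowance
_hits = defaultdict(list)       # ip -> timestamps of recent requests


@app.before_request
def rate_limit():
    ip = request.remote_addr or "unknown"
    now = time.time()
    _hits[ip] = [t for t in _hits[ip] if now - t < WINDOW_SECONDS]
    if len(_hits[ip]) >= MAX_DOWNLOADS_PER_WINDOW:
        # A real app could return a captcha challenge here instead of a 429.
        abort(429)
    _hits[ip].append(now)


@app.route("/entry/<entry_id>")
def get_entry(entry_id):
    # Placeholder: real code would read the entry's JSON from Cloud Storage.
    return {"id": entry_id}
```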
We currently use the Blobstore to handle user uploads (and will likely shift to GCS). Our solution allows users to upload files, but I've recently found that users could potentially upload a virus (knowingly or unknowingly). To mitigate this risk I'm considering limiting file types to images and/or PDFs (checked server-side). Would this prevent a virus from being uploaded, or should I also perform a virus scan on the files once they're uploaded?
If running a virus scan, is there a simple solution for doing this with GAE, or do I need a separate Compute Engine instance running its own virus scanner?
Thanks
Rob
Any time you delegate authority to upload an object to an untrusted client, there is a risk that the client, or malicious code posing as the client, can upload malicious content. As far as I am aware, neither Google App Engine's Blobstore service nor Google Cloud Storage provides virus scanning as a service, so you'd have to bring your own. Limiting file types doesn't actually prevent bad content from being uploaded, as some browsers will ignore the stated file type after sniffing the file content and render or execute the malicious object.
If you want to do this yourself for a Google Cloud Storage upload, the best practice would be to restrict the upload to have a private ACL, perform whatever sanitization you want, and when determined to be valid, change the ACL to allow broader permissions.
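A rough sketch of that flow with the google-cloud-storage Python client (the bucket name and the scan step are placeholders, not a specific scanner recommendation):

```python
from google.cloud import storage


def quarantine_then_publish(local_path: str, object_name: str) -> None:
    client = storage.Client()
    bucket = client.bucket("user-uploads-bucket")  # placeholder bucket
    blob = bucket.blob(object_name)

    # 1. Upload with a private ACL so nobody else can fetch it yet.
    blob.upload_from_filename(local_path, predefined_acl="private")

    # 2. Run whatever sanitization / virus scan you use (placeholder).
    if not looks_clean(local_path):
        blob.delete()
        return

    # 3. Only once the file is known-good, widen the permissions.
    blob.make_public()


def looks_clean(path: str) -> bool:
    # Placeholder for your scanner integration (e.g. a ClamAV call).
    return True
```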
Via Vinny P:
There are online virus-scanning tools you can use programmatically, or you can run an anti-virus engine on Compute Engine or in an App Engine Flexible Environment. Alternatively, if these are supposed to be user-owned files under 25 MB, you could upload the files to Google Drive which will provide virus scanning, and retrieve the files via the Drive API.
I'd like to create an object, give my users an upload url, and let them upload data. The resulting object must be public-readable. Is this possible with google cloud storage? If so, is it possible through google app engine, and where can I find documentation and/or examples for doing it?
To have a user upload directly to Google Cloud Storage, you can use the Signed URLs feature. This lets you grant a single user access to issue a PUT request against a specific object.
If you're using Python, there is a python example demonstrating signed URLs.
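With the newer google-cloud-storage client, generating such an upload URL might look roughly like this (bucket name, expiry, and content type are placeholders):

```python
from datetime import timedelta

from google.cloud import storage


def make_upload_url(object_name: str, content_type: str = "image/jpeg") -> str:
    """Return a URL the client can PUT the file to directly."""
    client = storage.Client()
    bucket = client.bucket("my-upload-bucket")  # placeholder bucket
    blob = bucket.blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),
        method="PUT",
        content_type=content_type,  # the client must send the same Content-Type
    )
```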
You can create an upload URL using the Blobstore service; see the create_upload_url function.
To make the object publicly accessible, you may need to adjust the ACLs on the bucket or the object.
See also the Cloud Storage Overview.
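A minimal sketch of the Blobstore upload-URL flow on the legacy Python runtime (webapp2; the handler paths are made up):

```python
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # The form must POST to a freshly generated upload URL.
        upload_url = blobstore.create_upload_url('/upload_done')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>'
            % upload_url)


class UploadDoneHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads('file')[0]
        self.redirect('/serve/%s' % blob_info.key())


app = webapp2.WSGIApplication([
    ('/upload_form', UploadFormHandler),
    ('/upload_done', UploadDoneHandler),
])
```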
Another option for uploading directly to Google Cloud Storage is a resumable upload.
If your object is big, such as a video, you can upload it in chunks this way. If the upload fails (e.g. the client loses its internet connection), you can resume from where you left off rather than making the user start over, and you save some money by not having to restart the upload.
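With the google-cloud-storage Python client, starting one of these resumable sessions server-side might look like this (the bucket name and origin are placeholders):

```python
from google.cloud import storage


def start_resumable_upload(object_name: str, content_type: str = "video/mp4") -> str:
    """Create a resumable upload session and hand its URL to the client.

    The client PUTs chunks to this URL and can resume after a dropped
    connection instead of starting over.
    """
    client = storage.Client()
    bucket = client.bucket("my-video-bucket")  # placeholder bucket
    blob = bucket.blob(object_name)
    return blob.create_resumable_upload_session(
        content_type=content_type,
        origin="https://myapp.example.com",  # lets the browser upload cross-origin
    )
```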
However, if your media is small, just use Signed URLs.
I have some big files stored on Google Storage. I would like users to be able to download them only when they are authenticated to my GAE application. The user would use a link of my GAE such as http://myapp.appspot.com/files/hugefile.bin
My first try works for files whose size is < 32 MB. Using the Google Storage experimental API, I could read the file first and then serve it to the user. It required my GAE application to be a team member of the project in which Google Storage was enabled. Unfortunately this doesn't work for large files, and it hogs bandwidth by first downloading the file to GAE and then serving it to the player.
Does anyone have an idea of how to carry this out?
You can store files up to 5GB in size using the Blobstore API: http://code.google.com/appengine/docs/python/blobstore/overview.html
Here's the Stackoverflow thread on this: Upload file bigger than 40MB to Google App Engine?
One thing to note is that reads from the Blobstore can only be done in 32 MB increments, but the API provides ways to access portions of the file: http://code.google.com/appengine/docs/python/blobstore/overview.html#Serving_a_Blob
FYI, in the upcoming 1.6.4 release of App Engine we've added the ability to pass a Google Storage object name to blobstore.send_blob() to send Google Storage files of any size from your App Engine application.
Here is the pre-release announcement for 1.6.4.
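For reference, the shape of that feature on the legacy runtime (a webapp2 sketch; the bucket name and the authorization check are placeholders):

```python
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class ServeGcsFileHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, filename):
        # Placeholder: verify here that the signed-in user may read this file.
        gs_key = blobstore.create_gs_key('/gs/my-bucket/' + filename)
        # send_blob streams the Cloud Storage object, whatever its size.
        self.send_blob(gs_key, save_as=filename)


app = webapp2.WSGIApplication([(r'/files/(.+)', ServeGcsFileHandler)])
```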