Transferring data from the GAE Datastore to a Google Docs spreadsheet

My GAE application collects large amounts of numerical data. Instead of having users download it directly from my app, is it possible to create a Google Docs spreadsheet and save the outgoing bandwidth?
The idea is to create a Google Docs spreadsheet containing the data, which the user can then access; if the user downloads the data to their own computer, it would not count as bandwidth used by my application.

To call external APIs over HTTP, you (or the library you use) would need to make URLFetch calls, which count towards the outgoing bandwidth quota.
So you would only save on outgoing bandwidth if all users downloaded the same data, i.e. no per-user generated data. Even then, the limits for Google Docs apply: a spreadsheet can be at most 20 MB in size and 400,000 cells.
Also, with one shared spreadsheet you would not be able to control access to it: anyone with the URL would be able to download it.
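For illustration, here is a minimal sketch of what such a call looks like on the (Python 2) App Engine runtime; the endpoint URL and payload are hypothetical placeholders, not a real Google Docs API call:

    import json

    from google.appengine.api import urlfetch

    def push_rows_to_external_service(rows):
        # Every byte sent and received here goes over URLFetch and is
        # therefore metered against the app's outgoing bandwidth quota.
        result = urlfetch.fetch(
            url='https://example.com/spreadsheet-endpoint',  # hypothetical
            payload=json.dumps({'rows': rows}),
            method=urlfetch.POST,
            headers={'Content-Type': 'application/json'})
        return result.status_code, result.content

Whether you stream the data to users yourself or push it to an external spreadsheet, the bytes leave your app either way.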

Related

Access control for media files on Google Cloud Storage

I have a social media app deployed on App Engine where users can upload and share photos/videos with a private group of people. For writes, I have a POST endpoint that accepts uploaded files and writes them to one GCS bucket that's not public. For reading, a GET endpoint checks with Cloud SQL if this user is authorized to access the media file - if yes, it returns the file stream. The file is stored only for 48 hours and average retrieval is 20 times per day. Users are authenticated using Firebase email link login.
The issue with the current approach is that my GET endpoint is an expensive middleman: it reads the GCS file stream and passes it on to clients, adding to the cost every time the API is invoked.
There's no point caching the file on App Engine because cache hit ratio will be extremely low for my use case.
The GET API could return a GCS file URL instead of a file stream if I made the GCS bucket public. But that would mean anyone could access the file with this public URL, not just my app or a limited set of users. Plus, the entire bucket would then be exposed.
I could create an ACL for each GCS file object, but ACLs work only for users with Google accounts and my app uses email link authentication. There's also a limit on ACL entries per object in case the file needs to be shared with more than 100 people.
The last option I have is to create a signed link that works for a short duration, which still allows limited unauthenticated sharing (see the sketch after this question).
Also tagging Google Photos. In case the partner sharing program can help with this problem, then I can migrate from GCS to Google Photos for storage.
This looks like a common use-case for media based apps. Are there any recommended design patterns to achieve the goal in a cost effective way?
This is my first week learning GCP, so I may be wrong on some of the points shared above.
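A common pattern for the signed-link option mentioned in the question: the GET endpoint performs the Cloud SQL authorization check and then returns a short-lived V4 signed URL, so the bytes flow directly from GCS to the client instead of through App Engine. A minimal sketch using the google-cloud-storage Python client (bucket and object names are placeholders; signing requires credentials with a private key, e.g. a service account):

    import datetime

    from google.cloud import storage

    def get_signed_media_url(bucket_name, blob_name, minutes=15):
        # Return a short-lived V4 signed URL for a private GCS object.
        # Assumes the caller has already verified (e.g. against Cloud SQL)
        # that the requesting user may access this object.
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(blob_name)
        return blob.generate_signed_url(
            version='v4',
            expiration=datetime.timedelta(minutes=minutes),
            method='GET')

The URL can still be shared until it expires, so keep the expiration short; that trade-off is exactly the "limited unauthenticated sharing" the question notes.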

How can I enforce rate limit for users downloading from Google Cloud Storage bucket?

I am implementing a dictionary website using App Engine and Cloud Storage. App Engine controls the backend, like user authentication etc., and Cloud Storage is used to store a JSON file for each dictionary entry.
I would like to rate limit how much a user can download in a given time period so they can't bulk download the JSON files and result in a big charge for me. Ideally, the dictionary would display a captcha if a user downloads too much at once, and allow them to keep downloading if they pass the captcha. What is the best way to achieve this?
Is there a specific service for rate limiting based on IP address or authenticated user? Should I do this through App Engine and only access Cloud Storage through App Engine (perhaps slower, since it uses some of my dynamic resources to serve static content)? Or is it possible to have the frontend access Cloud Storage directly and implement the rate limiting on Cloud Storage itself? Is a Cloud Storage bucket the right service for storage here? And how can I allow search engine indexing bots to bypass the rate limiting?
As explained by Doug Stevenson in this post:
"There is no configuration for limiting the volume of downloads for files stored in Cloud Storage."
and, explaining further:
"If you want to limit what end users can do, you will need to route them through some middleware component that you build that tracks how they're using your provided API to download files, and restrict what they can do based on their prior behavior. This is obviously nontrivial to implement, but it's possible."
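A minimal sketch of such a middleware check, assuming a classic Python App Engine app that proxies the downloads: it counts requests per user (or IP) in memcache over a fixed window and rejects once a threshold is crossed. The window, limit, and key scheme are placeholders, and the captcha step itself is not shown:

    from google.appengine.api import memcache

    WINDOW_SECONDS = 3600   # placeholder: 1-hour window
    MAX_DOWNLOADS = 200     # placeholder: per-client limit per window

    def allow_download(client_id):
        # Return True if client_id (user id or IP) is under its quota.
        # Counters live in memcache, so they are approximate and may be
        # evicted early; that is usually acceptable for abuse throttling.
        key = 'dl_count:' + client_id
        count = memcache.incr(key, initial_value=0)
        if count == 1:
            # First hit in this window: start the expiry clock.
            memcache.set(key, 1, time=WINDOW_SECONDS)
        return count is not None and count <= MAX_DOWNLOADS

When allow_download returns False, the handler would serve the captcha challenge instead of the file, and reset the counter once the captcha is passed.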

What is a suitable DB for bulk writes?

My application currently runs on App Engine. It writes records (for logging and reporting) continuously.
Scenario: counting views on the website. When someone opens the website, it hits the server to add a record with the time and type of view. These counts are then shown in the user's dashboard.
These requests have become heavy, currently about 40/sec. Google App Engine Datastore writes are expensive and the cost keeps climbing.
Is there any way to reduce this, or another DB better suited to logging the views?
Google App Engine's Datastore is NOT suitable for such a requirement, where you have to continuously write to the Datastore and read much less often.
You need to offload this task to a third-party service (either one you write yourself or an existing one).
A better option for user tracking and analytics is Google Analytics (although you won't be able to show the hit counters on your website directly using Analytics).
If you want to show your user page hit count use a page hit counter: https://www.google.com/search?q=hit+counter
In this case you should avoid Datastore.
For this kind of analytics it's best to do the following:
1. Dump the data to the GAE log (yes, this sounds counter-intuitive, but it's actually advice from Google engineers). The GAE log is persistent and is guaranteed not to lose data you write to it.
2. Periodically parse the log for your data and export it to BigQuery.
BigQuery has quite a powerful query language, so it's capable of producing complex analytics reports.
Luckily this has already been done before: see the Mache framework. Also see the related video.
Note: there is now a new BigQuery feature called streaming inserts, which could potentially replace the cumbersome middle step (files on Cloud Storage) used in Mache.
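For the streaming-inserts route, a minimal sketch using the google-cloud-bigquery Python client; the table name and row schema below are placeholders. Each view record goes straight to BigQuery, with no intermediate files on Cloud Storage:

    import datetime

    from google.cloud import bigquery

    client = bigquery.Client()
    TABLE_ID = 'my-project.analytics.page_views'  # placeholder table

    def log_view(page, view_type):
        # Streaming insert: the row is queryable in BigQuery within
        # seconds. Failures come back per-row rather than as exceptions.
        rows = [{
            'page': page,
            'view_type': view_type,
            'ts': datetime.datetime.utcnow().isoformat(),
        }]
        errors = client.insert_rows_json(TABLE_ID, rows)
        if errors:
            raise RuntimeError('BigQuery insert failed: %s' % errors)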

Exporting/Extracting data from datastore

I'm new to GAE development and have some questions regarding extracting data.
I have an app that collects data from end users; the data is stored in the high-availability Datastore, and there is a need to send a subset of the collected data to business partners on a regular basis.
Here are my questions:
1. How can I back up the data in the Datastore on a regular basis, say daily incremental backups and weekly full backups?
2. What are the best practices for generating daily data dump files that can be downloaded or sent to my partners securely? I expect data files of a few hundred MB every day, eventually growing into the few-GB range.
3. Can my business partners be authenticated through basic HTTP auth, or do they have to use OAuth?
Google is in effect backing up your data by storing it in multiple data centres.
You can however use the bulk loader if desired and back it up manually:
Uploading and Downloading Data
You can authenticate users however you choose; it's totally up to you. The Users service is integrated directly into App Engine, however, so if everybody has (or could have) Google accounts, that's even easier for you to use.
The users service
Due to the size of your files, unless you want to piece them together from the Datastore, you'll have to use something else, as the Datastore has a 1 MB limit per entity. Piecing files together is perfectly possible, however.
But you should probably look at the Google Cloud Storage API instead, as it has no comparable file size limits.
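For the daily dumps, a minimal sketch using the App Engine GCS client library (appengine-gcs-client); the bucket and file naming are placeholders. Objects in GCS are not subject to the Datastore's 1 MB entity limit, so files of a few hundred MB are fine here:

    import datetime
    import json

    import cloudstorage  # appengine-gcs-client library

    BUCKET = '/my-app-dumps'  # placeholder bucket

    def write_daily_dump(records):
        # One object per day, named by date.
        filename = '%s/dump-%s.json' % (
            BUCKET, datetime.date.today().isoformat())
        with cloudstorage.open(filename, 'w',
                               content_type='application/json') as f:
            for record in records:
                # Stream one JSON record per line to keep memory flat.
                f.write(json.dumps(record) + '\n')
        return filename

Your partners could then fetch the dump through an authenticated handler, or via signed URLs if you don't want to proxy the bytes yourself.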

Bandwidth usage in Google App Engine

How can I find out how my bandwidth is used in Google App Engine? I want to extract the top bandwidth hogs so I can cut down on my outgoing bandwidth usage.
App Engine logs all requests. This includes information about the request (path, query string, wall/CPU/API time, and approximate data transfer out in KB) and the requester (IP address and, if the user is logged in, Google account name). You should be able to compute a reasonable estimate based on this information.
You can periodically download your app's logs with appcfg. How often you need to do this will depend on how much traffic your site handles.
It may also be helpful to review the usage from the Admin Console (both in aggregate and via the logs); the Admin Console is at https://appengine.google.com/
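A minimal sketch of crunching those downloaded logs, assuming the Apache-style combined format that appcfg request_logs produces, where the quoted request line is followed by the status code and the response size in bytes (field positions are assumptions; adjust to the exact format you see):

    import collections

    def top_bandwidth_paths(log_path, n=10):
        # Sum approximate response bytes per request path. Each line is
        # assumed to look roughly like:
        #   host - user [date] "GET /path HTTP/1.1" 200 12345 "ref" "agent"
        totals = collections.Counter()
        with open(log_path) as f:
            for line in f:
                try:
                    request = line.split('"')[1]   # GET /path HTTP/1.1
                    path = request.split()[1]      # /path
                    size = int(line.split('"')[2].split()[1])
                except (IndexError, ValueError):
                    continue  # skip malformed lines or '-' byte counts
                totals[path] += size
        return totals.most_common(n)

The top few entries of the result are your bandwidth hogs, by path.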
