Google App Engine Data Protection - google-app-engine

I would like to know a few things about how google handles the data withing google app engine:
Where is the data located - is it possible to find out?
How long will the data be stored? And what happens the data gets deleted?
Can you backup lost data or broken data?
What kind of security is added to protect the data?

You can find most of the answers here. More on ACID
Where? Google data centers across the globe
For how long? Forever is too much to say, so let's say for as long as you need the data to be there.
If the data gets deleted?
If you delete it yourself it'll be propagated within a sensible timeframe infrastructurally speaking
If there are accidental errors on Google's side, the data is replicated and backed.
To protect against accidental errors on your side you can use DataStore Admin Console to back up your data
Data is secured on Google's side. If you need to transfer sensitive information, App Engine enabled SSL for custom domains recently. If you don't want to go through the hassle of buying and registering certificates you can always use SSL through custom App Engine domain for free. More here

Related

Design: using a backend server to circumvent great firewall of china

I have a front-end angular app using firebase to store user data.
I currently do not have a backend set up, such as a node.js server.
I would like to use the Google Docs API to upload files from my app.
Since the Great Firewall of China does not (or makes unstable) the use of Google services, is it possible to place those services on the backend server and still use them reliably?
Perhaps after they have uploaded the document to firebase, a backend script retrieves it, uploads it to google docs, and then removes the record from firebase? Just trying to see if Google or similar services are even feasible for this use case.
I suppose the crux of my question is whether or not the calling of the Google API would be taking place on the user's computer, in which case would it become unstable?
** Updates for clarity:
I am deciding whether my firebase-backed app needs a more traditional backend like a node server to do things like: upload images and documents, send mail via Mandrill, etc... It would be helpful to me if I knew whether, after putting in the time to create a server, some of the services I am after (aka APIs) are any more resilient to the GFW than they would be if they ran on the client side. So if any one has had success in such a task, I would like to know.
** Technical update:
So, for example, if I run the Google Maps API on the client side, if the user is in China and is not running a VPN, accessing the API calls will either lag or time out or (rarely) success in returning the scripts. If I was somehow able to able to process the map query "off-site" aka on the server, could I then return with a static image of the map to a Chinese user without fail?
If I was somehow able to able to process the map query "off-site" aka
on the server, could I then return with a static image of the map to a
Chinese user without fail?
Yes, of course. What you are going to miss this way is all the front-end interactive functionality Google Maps offers. But if that's ok in your use case, sure.
I have never tried it with the GCF, but what I would do is this:
Google Maps <-> Your Reverse proxy <-> User
So, instead of the user visitng the real google maps site, it will be visiting your maps.mydomain.com site, that will be sitting in between, proxying everything.
Nginx is an excellent choice for a reverse proxy. If you need more control, there are good node.js reverse proxying packages that you an use to rewrite the content extensively before serving it (perhaps to obfuscate it in case the GCF blacklists content based on pattern matching, or to change the script names/links again to avoid pattern matching).
You are misunderstanding about the great firewall of China. I consulted for a couple of Chinese companies after the dot com crash so I can say this from personal experience, not hearsay.
It is mostly high-end Cisco hardware behind gateways behind their government telecom infrastructure. Nowadays they knock off what hardware they can, every chance they can, and spend money on specialized hardware to monitor cell phones systems.
There was a brief mention of the street-level surveillance hardware on 20/20 before the crash if you are interested in looking it up.
Not to discourage you, but I say set up whatever open servers you want with whatever frontends or backends you want, but the reality is the traffic is not going to be there.
That is why they call it an oppressive regime, they do not get to decide for themselves, remember?

Avoid DoS attacks on App Engine

I have a small question with possibly a complex answer. I have tried to research around, but I think I may not know the keywords.
I want to build a web service that will send a JSON response, which would be used for another application. My goal is having the App Engine server crawl a set of webpages and store the relevant values so the second application (client) would not need to query everything. It will only go to my server with the already condensed information.
I know, it's pretty common, but how can I defend from attackers who wish to exhaust my App Engine resources/quota?
I have been thinking on limiting the amount of requests by IP (say.. 200 requests / 5 minutes), but is that feasible? Or is there a better, and more clever way of doing it?
First, you need to cache the JSON. don't hit the datastore for every request. use memcache or possibly, depending on your requirements, you can cache the JSON in a static file in Cloud Storage. This simple is the best defender against DDOS, since every request adds minimal overhead.
Also, take a look in the DDOS protection service offered by app engine:
https://developers.google.com/appengine/docs/java/config/dos
You could require users to log-in then generate and send an auth key to the client app that must accompany any requests to the app engine service.

What is the suitable db for bulk writes

My application is currently on app engine server. My application writes the records(for logging and reporting) continuously.
Scenario: Views count in the website. When we open the website it hits the server to add the record with time and type of view. Showing these counts in the users dashboard.
Seems these requests are huge now. For now 40/sec. Google App Engine writes are going heavy and cost is increasing like anything.
Is there any way to reduce this or any other db to log the views?
Google App Engine's Datastore is NOT suitable for such a requirement where you have to continuously write to datastore and read less often.
You need to offload this task to a third party service (either you write one or use existing one)
Better option for user tracking and analytics is Google Analytics (Although you wont be directly able to show the hit counters on website using analytics).
If you want to show your user page hit count use a page hit counter: https://www.google.com/search?q=hit+counter
In this case you should avoid Datastore.
For this kind of analytics it's best to do the following:
Dump data to GAE log (yes, this sounds counter-intuitive, but it's actually advice from google engineers). GAE log is persistent and is guaranteed to not loose data you write to it.
Periodically parse the log for your data and then export it to BigQuery.
BigQuery has a quite powerful query language so it's capable of doing complex analytics reports.
Luckily this was already done before: see the Mache framework. Also see related video.
Note: there is now a new BigQuery feature called streaming inserts, which could potentially replace the cumbersome middle step (files on Cloud Storage) used in Mache.

Exporting/Extracting data from datastore

New in GAE development and have some question regarding extracting data.
I have an app that collects data from end users and data is stored in high availability datastore, and there is a need to send subset of data the app collected to business partners on a regular basis.
Here are my questions,
1. How can I backup data in the datastore on a regular basis, say daily incremental backup and weekly full backup?
2. what are the best practices to generate daily data dump files that can be downloaded or send to my partners in a secured approach. I expect few hundred MB data files everyday and eventually will be in few GB range.
3. Can my business partners be authenticated though basic HTTP auth, or have to use OAuth?
Google is in effect backing up your data by storing it in multiple data centres.
You can however use the bulk loader if desired and back it up manually:
Uploading and Downloading Data
You can authenticate users however you choose, it's totally up to you. The "users" service is integrated into app engine directly however so if everybody has or could have google accounts that's even easier for you to use.
The users service
Due to the size of your files unless you want to piece them together from the datastore you'll have to use something else, as the datastore has a 1MB limit per model. It's perfectly possible to do that however.
But you should probably look at The Google Cloud Storage API instead as there are no file size limits.

Google App Engine Datastore Data Privacy

I'm playing around with the Google App Engine and the Datastore.
Really amazing stuff going on over there.
But I couldn't help and wonder what Google is allowed to do with the data my application is storing in there Datastore.
Can someone explain it in simple worlds?
Thanks
Google App Engine is governed by Google's general Privacy Policy. They promise not to share information outside of Google except in certain circumstances (court order, etc.), and they restrict access to only employees who need it. However, because they can use your data to "provide, maintain, protect and improve our services," Google may be using your data for their own purposes (probably not, but I see nothing that prevents them).
Disclaimer: I am not a lawyer. Also, this policy doesn't provide GAE-specific information.
Worth noting that Google Cloud has separate Terms of Service that includes this line:
5.2 Use of Customer Data. Google will not access or use Customer Data, except as necessary to provide the Services to Customer.

Resources