I have a small question with possibly a complex answer. I have tried to research around, but I think I may not know the keywords.
I want to build a web service that will send a JSON response to be consumed by another application. My goal is to have the App Engine server crawl a set of webpages and store the relevant values, so that the second application (the client) does not need to query everything itself; it only goes to my server for the already condensed information.
I know, it's pretty common, but how can I defend against attackers who wish to exhaust my App Engine resources/quota?
I have been thinking of limiting the number of requests per IP (say, 200 requests / 5 minutes), but is that feasible? Or is there a better, more clever way of doing it?
First, you need to cache the JSON. Don't hit the datastore for every request; use memcache or, depending on your requirements, cache the JSON in a static file in Cloud Storage. This alone is the best defense against DoS, since every request then adds minimal overhead.
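A minimal sketch of that caching layer, assuming the Python standard runtime with webapp2 (the route, handler name, cache key, and build_payload() are made up for illustration):

    import json
    import webapp2
    from google.appengine.api import memcache

    CACHE_KEY = 'condensed-json'   # hypothetical cache key
    CACHE_SECONDS = 300            # pick whatever refresh interval fits your crawl

    def build_payload():
        # Placeholder for the expensive part: read the crawled values
        # from the datastore and condense them into a small dict.
        return {'status': 'ok', 'items': []}

    class CondensedJsonHandler(webapp2.RequestHandler):
        def get(self):
            payload = memcache.get(CACHE_KEY)
            if payload is None:
                payload = json.dumps(build_payload())
                memcache.set(CACHE_KEY, payload, time=CACHE_SECONDS)
            self.response.headers['Content-Type'] = 'application/json'
            self.response.write(payload)

    app = webapp2.WSGIApplication([('/data', CondensedJsonHandler)])

With this in place, most requests never touch the datastore at all.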
Also, take a look at the DoS protection service offered by App Engine:
https://developers.google.com/appengine/docs/java/config/dos
You could require users to log in, then generate and send an auth key to the client app that must accompany any requests to the App Engine service.
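If you also want the per-IP throttling mentioned in the question, here is a rough sketch using memcache counters (the limits and key prefix are arbitrary, and since memcache entries can be evicted at any time this is only a soft limit, not a hard guarantee):

    from google.appengine.api import memcache

    WINDOW_SECONDS = 300   # 5 minutes, as in the question
    MAX_REQUESTS = 200     # per IP per window (arbitrary)

    def over_limit(ip):
        # Return True if this IP has exceeded the soft limit.
        key = 'ratelimit:%s' % ip
        count = memcache.incr(key, initial_value=0)
        if count == 1:
            # (Re)start the window; the small race here is acceptable
            # for a best-effort throttle.
            memcache.set(key, 1, time=WINDOW_SECONDS)
        return count is not None and count > MAX_REQUESTS

In a webapp2 handler you would call over_limit(self.request.remote_addr) before doing any real work and return a 429 when it is True.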
I am trying to figure out how to access Google App Engine Memcache service from outside Google App Engine. Any help on how this can be done would be greatly appreciated.
Thanks in advance!
I don't think this is currently possible. I don't know if there is a technical reason for this or if the decision was made simply for billing purposes, but memcache seems to be intended as an integral part of App Engine. The only relevant discussion I could find is this feature request, which asks for the ability to access the memcache data of one App Engine project from another App Engine project; it seems Google didn't consider such functionality beneficial. You could try filing your own feature request to make memcache a standalone service. In case you do not succeed (and I am afraid you won't), here is a simple workaround.
A simple workaround:
Create a simple App Engine project which would serve as a facade over the memcache service. This dummy App Engine project would simply translate your HTTP requests into memcache API calls and return the obtained data in the body of an HTTP response. For example, to retrieve a memcache record you could send a GET request such as:
https://<your-project-id>.appspot.com/get?key=<some-particular-key>
This call would get "translated" into:
memcache.get(<some-particular-key>);
And the obtained data appended to the HTTP response.
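A minimal sketch of such a facade handler, assuming the Python standard runtime with webapp2 (the route and handler name are made up, and it assumes the cached values are JSON-serializable):

    import json
    import webapp2
    from google.appengine.api import memcache

    class GetHandler(webapp2.RequestHandler):
        def get(self):
            key = self.request.get('key')
            if not key:
                self.abort(400)
            value = memcache.get(key)
            self.response.headers['Content-Type'] = 'application/json'
            # Returns null if the key is missing or has expired.
            self.response.write(json.dumps(value))

    app = webapp2.WSGIApplication([('/get', GetHandler)])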
Since accessing memcache is free, you would only have to pay for instance time. I don't know what throughput you are expecting, but I can imagine scenarios where you could even fit into the free daily quota (currently 28 instance hours/day). All in all, the intermediate App Engine project should not add significant cost in either performance or price.
Before using this workaround:
The above snippet of code is intended for illustration purposes only. There still remain some issues to be dealt with before using this approach in production. For example, as pointed out by Suken, anyone would be able to access your memcache if they knew what requests to send. Here are four additional things I would personally do:
Address the security issues by sending some authentication token with each request. An obvious necessity would be to make the calls over HTTPS to prevent man-in-the-middle attackers from obtaining this token. Note that App Engine's appspot.com subdomains are accessible via HTTPS by default.
Prefer batch API calls such as getAll() over their single record alternatives such as get(). Retrieving multiple records in one batch call is much faster than making multiple separate API calls.
Use POST requests (instead of GET) to access the facade application, so you won't have to worry about your batch requests being too large. I only used a GET request in the example above because it was easier to write; see the sketch after this list for a batched POST variant.
Check if such usage of App Engine doesn't violate the Terms of Service. Personally, I don't believe it does. And I don't see why Google should mind. After all, you will be paying for instance hours.
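A sketch combining points 1-3 above: a batched POST endpoint protected by a shared-secret token (the header name, route, and token handling are assumptions rather than a full auth scheme; get_multi is the Python counterpart of the getAll() mentioned above):

    import json
    import webapp2
    from google.appengine.api import memcache

    # Hypothetical shared secret; store and rotate it properly in practice.
    API_TOKEN = 'replace-me'

    class BatchGetHandler(webapp2.RequestHandler):
        def post(self):
            if self.request.headers.get('X-Auth-Token') != API_TOKEN:
                self.abort(403)
            # The body is a JSON list of keys, e.g. ["user:1", "user:2"].
            keys = json.loads(self.request.body)
            values = memcache.get_multi(keys)   # one batched RPC
            self.response.headers['Content-Type'] = 'application/json'
            self.response.write(json.dumps(values))

    app = webapp2.WSGIApplication([('/batch_get', BatchGetHandler)])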
EDIT: After giving this some more thought, I believe that the suggested workaround is actually what Google expects you to do. Given that Google's objective is to earn money, it would be unreasonable to provide a free service unless it was part of a paid one. Of course, other billing schemes could be created, for example allowing direct access only for developers who are willing to pay for dedicated memcache. The question is whether your use case is broad enough to convince Google to take some action.
No, AFAIK the Memcache service is not available outside GAE. To be more specific, it is only available inside the GAE standard environment; it is not available in the GAE flexible environment either.
But some of the alternative solutions suggested for GAE flexible users might be usable for you as well. From Memcache:
The Memcache service is currently not available for the App Engine flexible environment. An alpha version of the memcache service will be available shortly. If you would like to be notified when the service is available, fill out this early access form.
If you need access to a memcache service immediately, you can use the third party memcache service from Redis Labs. To access this service, see Caching Application Data Using Redis Labs Memcache.
You can also use Redis Labs Redis Cloud, a third party fully-managed service. To access this service, see Caching Application Data Using Redis Labs Redis.
As stated by other users, Memcache is not offered as a service outside GAE (Google App Engine). I would like to point out that implementing a GAE facade over the Memcache service has security ramifications. Please note that the facade GAE Memcache app will be exposed on the public internet like any other GAE service; I am assuming that you want to use Memcache for internal use only. Another aspect to think about is writing into memcache: if you intend to write to memcache from outside GAE, then definitely avoid the facade implementation. If compromised, anyone will be able to use your facade implementation as their own cache without paying for it ;)
My suggestion is to spin up a stack using GCP Cloud Launcher. There are various stack templates available for both Redis and Memcache stacks. Further you can configure the template to use preemptible burstable instances to reduce the cost of your Memcache.
I have a front-end angular app using firebase to store user data.
I currently do not have a backend set up, such as a node.js server.
I would like to use the Google Docs API to upload files from my app.
Since the Great Firewall of China blocks (or at least makes unstable) the use of Google services, is it possible to place those services on a backend server and still use them reliably?
Perhaps after they have uploaded the document to firebase, a backend script retrieves it, uploads it to google docs, and then removes the record from firebase? Just trying to see if Google or similar services are even feasible for this use case.
I suppose the crux of my question is whether or not the calling of the Google API would be taking place on the user's computer, in which case would it become unstable?
** Updates for clarity:
I am deciding whether my firebase-backed app needs a more traditional backend like a node server to do things like: upload images and documents, send mail via Mandrill, etc... It would be helpful to me if I knew whether, after putting in the time to create a server, some of the services I am after (aka APIs) are any more resilient to the GFW than they would be if they ran on the client side. So if any one has had success in such a task, I would like to know.
** Technical update:
So, for example, if I run the Google Maps API on the client side and the user is in China and not running a VPN, the API calls will either lag, time out, or (rarely) succeed in returning the scripts. If I was somehow able to process the map query "off-site", aka on the server, could I then return a static image of the map to a Chinese user without fail?
If I was somehow able to process the map query "off-site", aka on the server, could I then return a static image of the map to a Chinese user without fail?
Yes, of course. What you are going to miss this way is all the front-end interactive functionality Google Maps offers. But if that's ok in your use case, sure.
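As an illustration, a minimal sketch of doing that on the server, assuming a Python GAE handler relaying the Google Static Maps API (the route, parameters, and YOUR_API_KEY placeholder are assumptions):

    import webapp2
    from google.appengine.api import urlfetch

    STATIC_MAPS_URL = ('https://maps.googleapis.com/maps/api/staticmap'
                       '?center=%s&zoom=%s&size=600x400&key=YOUR_API_KEY')

    class MapImageHandler(webapp2.RequestHandler):
        def get(self):
            center = self.request.get('center', 'Shanghai')
            zoom = self.request.get('zoom', '12')
            # The fetch happens from Google's own network, not from the
            # user's browser, so the user never talks to maps.googleapis.com.
            result = urlfetch.fetch(STATIC_MAPS_URL % (center, zoom))
            if result.status_code != 200:
                self.abort(502)
            self.response.headers['Content-Type'] = 'image/png'
            self.response.write(result.content)

    app = webapp2.WSGIApplication([('/map.png', MapImageHandler)])

Whether the host you serve this from is itself reachable from China is a separate question.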
I have never tried it with the GFW, but what I would do is this:
Google Maps <-> Your Reverse proxy <-> User
So, instead of the user visiting the real Google Maps site, it will be visiting your maps.mydomain.com site, which will sit in between, proxying everything.
Nginx is an excellent choice for a reverse proxy. If you need more control, there are good Node.js reverse-proxying packages that you can use to rewrite the content extensively before serving it (perhaps to obfuscate it in case the GFW blacklists content based on pattern matching, or to change the script names/links, again to avoid pattern matching).
You are misunderstanding the Great Firewall of China. I consulted for a couple of Chinese companies after the dot-com crash, so I can say this from personal experience, not hearsay.
It is mostly high-end Cisco hardware behind gateways behind their government telecom infrastructure. Nowadays they knock off what hardware they can, every chance they get, and spend the money on specialized hardware to monitor cell phone systems.
There was a brief mention of the street-level surveillance hardware on 20/20 before the crash if you are interested in looking it up.
Not to discourage you, but I say set up whatever open servers you want with whatever frontends or backends you want, but the reality is the traffic is not going to be there.
That is why they call it an oppressive regime, they do not get to decide for themselves, remember?
I put an app on Google App Engine.
My app has one cron job which parses data from the Internet and stores it in my db.
When a user uses my app, it extracts data from the db and shows it to the user.
I found that this is too time consuming and makes too many requests to the db.
I want to regenerate each page when the cron job runs daily.
Then users can see the page without querying my database.
How can I do that in GAE?
Thank you for your reply.
Not nearly enough info in the question to help you. For example, what does "too many requests to the db" mean? Is it because you have a lot of traffic? Or are you querying the db too much?
Possible solutions are:
Edge cache your page: https://groups.google.com/forum/#!topic/google-appengine/6xAV2Q5x8AU/discussion
Store your page in memcache (see the sketch after this list).
Optimize your database accesses. Most likely you're doing something very inefficiently.
Use a cronjob to generate your page, store it in the blobstore, and redirect your fetches to the blobstore. You can do this, but this is a pretty dumb way to go about it, given that there are better options.
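A rough sketch of the memcache option, assuming the Python standard runtime: a cron handler renders the page once a day and the public handler serves the cached copy (the handler names, the /tasks/render route, and render_page() are made up):

    import webapp2
    from google.appengine.api import memcache

    PAGE_KEY = 'rendered-page'

    def render_page():
        # Placeholder for the expensive part: query the datastore and
        # build the full HTML of the page.
        return '<html><body>...</body></html>'

    class RenderCronHandler(webapp2.RequestHandler):
        # Wired to the daily cron job in cron.yaml, e.g. url: /tasks/render.
        def get(self):
            memcache.set(PAGE_KEY, render_page())   # no expiry; replaced daily

    class PageHandler(webapp2.RequestHandler):
        def get(self):
            html = memcache.get(PAGE_KEY)
            if html is None:
                # Memcache entries can be evicted, so keep a fallback.
                html = render_page()
                memcache.set(PAGE_KEY, html)
            self.response.write(html)

    app = webapp2.WSGIApplication([
        ('/tasks/render', RenderCronHandler),
        ('/', PageHandler),
    ])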
I'm afraid it's a limitation of GAE, according to this post, no matter how many caching tricks and workarounds you find; they just run up against Google's policies.
You can't get what you need unless you write the output directly into the HTML file and store it on the server every time, which is far more resource consuming and, in my opinion, just pointless. Since GAE is a free service intended for testing, you should make do with what you have, or pay.
I have been writing a Google Chrome extension for Stack Exchange. It's a simple extension that allows you to keep track of your reputation and get notified of comments on Stack Exchange sites.
Currently I've encountered some issues that I can't handle myself.
My extension uses Google App Engine as its back-end to make external requests to the Stack Exchange API. Each single client request from the extension for new comments on a single site can cause plenty of requests to the API endpoint to prepare the response, even for a non-skeetish user. The average user has accounts on at least 3 sites from the Stack Exchange network; some have > 10!
Stack Exchange API has request limits:
A single IP address can only make a certain number of API requests per day (10,000).
The API will cut my requests off if I make more than 30 requests over 5 seconds from a single IP address.
It's clear that all requests should be throttled to 30 per 5 seconds, and currently I've implemented request-throttling logic based on a distributed lock with memcache. I'm using memcache as a simple lock manager to coordinate the activity of GAE instances and throttle UrlFetch requests.
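For reference, roughly what that memcache-coordinated throttle might look like, assuming the Python runtime (a counter per 5-second bucket, shared by all instances; memcache is best-effort, so this is a soft guard rather than a strict lock):

    import time
    from google.appengine.api import memcache

    WINDOW_SECONDS = 5
    MAX_REQUESTS_PER_WINDOW = 30   # the Stack Exchange API limit

    def try_acquire_slot():
        # Return True if we are still under the global limit for the
        # current 5-second window.
        bucket = int(time.time()) // WINDOW_SECONDS
        key = 'se-api-window:%d' % bucket
        count = memcache.incr(key, initial_value=0)
        # count is None only if memcache itself is unavailable; fail closed.
        return count is not None and count <= MAX_REQUESTS_PER_WINDOW

Before each UrlFetch to the API I call something like try_acquire_slot() and back off (or re-queue the work) when it returns False.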
But I think it's a big failure to limit such powerful infrastructure to no more than 30 requests per 5 seconds. Such an API request rate does not allow me to continue developing new interesting and useful features, and one day the extension will stop working properly altogether.
Now my app has 90 users and growing, and I need to come up with a solution for how to maximize the request rate.
As is known, App Engine makes external UrlFetch requests via a shared pool of different IPs.
My goal is to write request-throttling functionality that ensures compliance with the API terms of usage while utilizing GAE's distributed capabilities.
So my question is how to provide maximum practical API throughput while complying with the API terms of usage and utilizing GAE's distributed capabilities.
Advice to use another platform/host/proxy is just useless in my mind.
If you are searching for a way to programmatically manage Google App Engine's shared pool of IPs, I firmly believe that you are out of luck.
Anyway, quoting this advice from the FAQ, I think you have more than a chance to keep on running your awesome app:
What should I do if I need more requests per day?
Certain types of applications - services and websites to name two - can legitimately have much higher per-day request requirements than typical applications. If you can demonstrate a need for a higher request quota, contact us.
EDIT:
I was wrong, actually you don't have any chance.
Google App Engine [app]s are doomed.
First off: I'm using your extension and it rocks!
Have you considered using memcache and caching the results?
Instead of taking the results from the API directly, first try to find them in the cache; if they are there, use them, and if they are not, retrieve them, cache them, and let them expire after X minutes.
Second, try to batch up user requests: instead of asking for the reputation of a single user, ask for the reputation of several users together.
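A rough sketch of both ideas combined, assuming the Python runtime on GAE and the Stack Exchange API's batched /users/{ids} route (the cache lifetime and site parameter are arbitrary; the API may also return gzip-compressed bodies, in which case you would need to decompress before parsing):

    import json
    from google.appengine.api import memcache, urlfetch

    CACHE_MINUTES = 5
    API_URL = 'https://api.stackexchange.com/2.2/users/%s?site=%s'

    def get_reputations(user_ids, site):
        # Return {user_id: reputation}, hitting the SE API at most once
        # per cache window for a given set of ids.
        cache_key = 'rep:%s:%s' % (site, ','.join(sorted(map(str, user_ids))))
        cached = memcache.get(cache_key)
        if cached is not None:
            return cached

        # One batched call for up to 100 ids, separated by semicolons.
        ids = ';'.join(str(uid) for uid in user_ids)
        result = urlfetch.fetch(API_URL % (ids, site))
        items = json.loads(result.content).get('items', [])
        reputations = dict((item['user_id'], item['reputation']) for item in items)

        memcache.set(cache_key, reputations, time=CACHE_MINUTES * 60)
        return reputations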
I would like to write a client application for Android that uses the Google App Engine as a database backend. My Android client would connect to the App Engine to save information, then it would connect later for reports. Is it possible to use the App Engine as a backend like this?
If you're looking for something like the remote api that the App Engine has in python, then you'll be disappointed to find it missing in Java.
That said, absolutely nothing stops you from hitting your app and posting data through POST / JSON / XML / any other format you can think of. The same thing goes for getting your reports back.
If security is a concern, the OAuth protocol allows you to authenticate to app engine from your android device.
This is an aside, but as far as reporting is concerned, you might not find the app engine a very suitable platform for reporting type apps. Just make sure you understand its limitations - the lack of joins, 1000 object limit, no sum / average, necessary indexes, etc. It's certainly not impossible, but do think carefully about how you're going to model your data.
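To make the "hit your app and post data" idea concrete, a minimal sketch of a server-side handler, assuming the Python standard runtime (the entity kind, fields, and /records route are made up; the Android client would simply POST JSON to it):

    import json
    import webapp2
    from google.appengine.ext import ndb

    class Record(ndb.Model):
        # Hypothetical fields for whatever the client wants to save.
        name = ndb.StringProperty()
        value = ndb.FloatProperty()

    class RecordHandler(webapp2.RequestHandler):
        def post(self):
            data = json.loads(self.request.body)
            key = Record(name=data['name'], value=data['value']).put()
            self.response.headers['Content-Type'] = 'application/json'
            self.response.write(json.dumps({'id': key.id()}))

        def get(self):
            # A very small "report": return up to 20 records as JSON.
            records = Record.query().fetch(20)
            self.response.headers['Content-Type'] = 'application/json'
            self.response.write(json.dumps(
                [{'name': r.name, 'value': r.value} for r in records]))

    app = webapp2.WSGIApplication([('/records', RecordHandler)])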
Yes, it is possible.
Without more details in your question, any more details in the answer would be speculation.
Yes, it's very much possible. It's something I am also currently working on.
My code uses HTTP GET and HTTP POST and I am using a RESTful service on the GAE.
I'm sorry I can't provide any code because I am still learning; however, the library I'm using is called RESTLET. They have libraries for both GAE and Android, but I'm only using RESTLET on the GAE, and I'm just using the HTTP library in the Android SDK for the client.
http://www.restlet.org/
The version you require is 2.0 M6 and not the stable release.
No.
In your response to Laurence, you said you want a direct DB connection. A client cannot connect directly to the GAE datastore. You must write web handlers to interface between the client and your data. It doesn't have to be much, but it must be something.
Yes, it is very possible. You would not connect directly to the GAE database though. A better architecture would be to make your app hit a URL that writes to the DB. For example, you could set up a Struts 2 action that takes the values of your query parameters and then mutates and validates them as necessary before persisting them.