We run a Google App Engine service, so our applications share an address range with other applications. We post to the MailChimp API on behalf of our customers using their API key. We recently started having occasional posts to MailChimp rejected with a 403 and the message:
You don't have permission to access https://mailchimp...
We have confirmed with MailChimp support that they have blocked the specific IP the post came from because of prior bad behavior, but we have no control over which IP App Engine uses to post messages, and posts can come from a large range. Does anyone have suggestions for how to work around this? Obviously, migrating the service is one possibility.
Thanks
I agree that this isn't necessarily a programming problem, but there are potential programming solutions. One is to institute limited retries for 403 errors: maybe retry those subscribes again in 5 minutes (hoping for a new IP). Another would be to proxy those requests through a small, cheap VPS.
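That retry idea might look something like this (a minimal sketch; `post_fn` and the delay are illustrative assumptions, not part of the MailChimp client API):

```python
import time

def post_with_retry(post_fn, payload, retries=3, delay_seconds=300):
    """Retry a POST when the remote side returns 403 (possibly an IP block).

    post_fn is assumed to return an object with a status_code attribute.
    Waiting before retrying gives App Engine a chance to route the
    request from a different outbound IP.
    """
    for attempt in range(retries):
        response = post_fn(payload)
        if response.status_code != 403:
            return response
        if attempt < retries - 1:
            time.sleep(delay_seconds)  # hope for a new outbound IP
    return response
```

On App Engine specifically, the wait is better done with a delayed task on a push queue than with an in-request sleep, to avoid burning instance time.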
Unfortunately, cloud IPs are attractive to bad actors precisely because they're tough to block without causing a lot of collateral damage.
Related
I'm new to web development and was curious about something. When posting to an endpoint to then receive a value from a server-side function, is it problematic if multiple users are writing to the same endpoint? Can this corrupt the value returned?
For instance, I'm using Stripe in a project and you're supposed to post to an endpoint to generate a user-specific ephemeral key. There's a 1-2 second delay in the response at times, so would there be a problem if two users posted to the same endpoint within a few milliseconds?
Capable web server software is designed with concurrency in mind, meaning a server can handle multiple user requests at the same time.
If you're curious about the specific techniques of how this is done, or about web server architecture in general, this article is pretty interesting and offers some sample applications:
http://www.linuxjournal.com/content/three-ways-web-server-concurrency
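As a toy illustration of the thread-per-request model the article covers, here's a minimal stdlib-only server where two requests arriving milliseconds apart are handled by independent threads, so neither can corrupt the other's response:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Each request runs in its own thread with its own local state,
        # so concurrent users get independent responses.
        body = f"handled by {threading.current_thread().name}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# ThreadingHTTPServer dispatches each connection to a new thread.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
```

Shared *server-side* state (a database row, a global counter) is where corruption can happen; per-request values like a Stripe ephemeral key generated inside the handler are safe.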
I host a web service and have recently been getting many HTTP requests (up to several thousand per second) from IP addresses that start with 10, according to the attached log. From my limited networking knowledge, this prefix means the IP is a local one, not a WAN IP. Why would App Engine report traffic from Google's own LAN IP range? Furthermore, because of this I seem unable to blacklist that IP range, which has been costing me quite a lot in quota fees! Any ideas why I'd be seeing local IP addresses in the logs for these requests, and how I can block them before they reach my application?
Sigh. Embarrassingly, there is an obvious reason a Google App Engine application would see an IP with prefix 10: it's Google's crawler. The issue was that I was generating too many unique URLs and the crawler was trying to visit them all, leading to the obscene traffic volumes I was seeing. So I was, in a way, DoS-attacking myself by letting the crawler know about too many unique URLs. A simple robots.txt fix seems to do the job for this traffic, although one bot with "User-agent: Feedfetcher" is still hitting the site. Obvious in hindsight, but maybe it will help someone else.
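For reference, a robots.txt along the lines of that fix (the Disallow path is a placeholder; adjust it to whatever URL pattern is generating the unique URLs):

```
# Block crawlers from the dynamically generated URL space.
User-agent: *
Disallow: /generated/
```

Note that Feedfetcher retrieves feeds on behalf of users rather than as a crawler, and has historically ignored robots.txt, which would explain why it keeps hitting the site; that traffic has to be handled at the application level instead.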
I have a small question with possibly a complex answer. I have tried to research around, but I think I may not know the keywords.
I want to build a web service that will send a JSON response to be used by another application. My goal is to have the App Engine server crawl a set of webpages and store the relevant values, so the second application (the client) doesn't need to query everything itself; it only goes to my server for the already condensed information.
I know it's a pretty common setup, but how can I defend against attackers who want to exhaust my App Engine resources/quota?
I have been thinking of limiting the number of requests per IP (say, 200 requests per 5 minutes), but is that feasible? Or is there a better, more clever way of doing it?
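The kind of per-IP limit I have in mind, as a sketch (in-memory here for simplicity; on App Engine the counts would presumably have to live in memcache, since instances come and go):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60   # 5-minute window
MAX_REQUESTS = 200        # per IP per window

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip, now=None):
    """Return True if this IP is still under its quota for the window."""
    now = time.time() if now is None else now
    window = _hits[ip]
    # Drop timestamps that have aged out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False
    window.append(now)
    return True
```
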
First, you need to cache the JSON: don't hit the datastore for every request. Use memcache or, depending on your requirements, cache the JSON in a static file in Cloud Storage. This alone is the best defense against DoS, since every request then adds minimal overhead.
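The shape of that caching logic, sketched with a dict-backed stand-in for App Engine's memcache API (get/set with expiry); `build_json` stands for whatever expensive crawl or datastore work produces the payload:

```python
import time

class FakeMemcache:
    """Dict-backed stand-in for App Engine's memcache (get/set + expiry)."""
    def __init__(self):
        self._store = {}

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[1] < now:
            return None  # missing or expired
        return entry[0]

    def set(self, key, value, expire_seconds, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + expire_seconds)

cache = FakeMemcache()

def get_json(build_json, key="crawl-results", ttl=300):
    """Serve the condensed JSON from cache; rebuild only on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    payload = build_json()  # the expensive crawl/datastore work
    cache.set(key, payload, ttl)
    return payload
```

With this in place, an attacker hammering the endpoint mostly just hits the cache instead of your datastore quota.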
Also, take a look at the DoS protection service offered by App Engine:
https://developers.google.com/appengine/docs/java/config/dos
You could require users to log in, then generate and send an auth key to the client app that must accompany any requests to the App Engine service.
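One simple shape for such a key is an HMAC over the user id with a server-side secret (all names here are illustrative, not any particular framework's API):

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # illustrative; keep out of source control

def issue_key(user_id):
    """Give the client an auth key after a successful login."""
    return hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()

def verify_key(user_id, key):
    """Check the key on every request to the JSON endpoint."""
    expected = issue_key(user_id)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, key)
```

A rejected key costs one cheap hash computation instead of a datastore hit, so bad requests stay inexpensive.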
I had been developing a GAE project that makes quite a number of logins and API calls across many Google Apps services (Drive, Spreadsheets, Plus, Groups, Sites, etc.). I was developing on a Google Apps for Business domain with just 2 accounts and getting random errors very often. They were mostly 403s, but also things like "file not found" when using the Drive API. Of course, other times the exact same calls worked properly, so my guess is this was related to an API call quota limit.
On occasions I kept getting that generic error saying 'something went wrong, that's all we know' for several minutes (up to 15-20 minutes).
I recently deployed the app to a Google Apps domain with over 100 accounts, and all those errors seem to have vanished, which kind of confirms my guess that they were indeed related to API call quota limits, as the quota limit is said to be directly related to the number of accounts in the domain.
Is there any way this quota and its current usage can be checked? I can check many quotas under the Google Cloud Console, but I can't find anything related to API usage.
What you observe might not be related to quota; the error is pretty explicit in that case, something like "QUOTA LIMIT EXCEEDED". I've been working with Google APIs for a long time now, and it's pretty common to get random issues like this. However, when you get a 404 from Drive, it means the user you're making the API call as doesn't have access to the file. A 403 would mean you are trying to perform a "write" operation (update, patch) on a file as a user who has only reader access.
Anyway, to answer your question, you can now check the quota from the Developer Console under the APIs section of your project:
I have been writing a Google Chrome extension for Stack Exchange. It's a simple extension that allows you to keep track of your reputation and get notified of comments on Stack Exchange sites.
Currently I've encountered some issues that I can't handle myself.
My extension uses Google App Engine as its back-end to make external requests to the Stack Exchange API. A single client request from the extension for new comments on one site can cause plenty of requests to the API endpoint to prepare the response, even for a user far less active than Jon Skeet. The average user has accounts on at least 3 Stack Exchange sites; some have more than 10!
Stack Exchange API has request limits:
A single IP address can only make a certain number of API requests per day (10,000).
The API will cut my requests off if I make more than 30 requests over 5 seconds from a single IP address.
It's clear that all requests should be throttled to 30 per 5 seconds. Currently, I've implemented request-throttling logic based on a distributed lock with memcache: I'm using memcache as a simple lock manager to coordinate the activity of GAE instances and throttle UrlFetch requests.
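The shape of that throttle, using an in-process lock and counter as a stand-in for memcache (on real GAE, an atomic `memcache.incr` on a key per 5-second window plays the same role across instances):

```python
import threading
import time

MAX_PER_WINDOW = 30   # the API's limit
WINDOW_SECONDS = 5

_lock = threading.Lock()  # stand-in for memcache's atomic increment
_counters = {}            # window id -> request count

def try_acquire_slot(now=None):
    """Claim one of the 30 slots in the current 5-second window.

    Returns True if the UrlFetch may proceed, False if the caller
    should back off until the next window.
    """
    now = time.time() if now is None else now
    window = int(now // WINDOW_SECONDS)
    with _lock:
        count = _counters.get(window, 0)
        if count >= MAX_PER_WINDOW:
            return False
        _counters[window] = count + 1
        return True
```
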
But I think it's a big waste to limit such powerful infrastructure to no more than 30 requests per 5 seconds. Such an API request rate does not allow me to keep developing new, interesting, and useful features, and one day the app will stop working properly altogether.
My app now has 90 users and growing, and I need to come up with a solution for maximizing the request rate.
As is known, App Engine makes external UrlFetch requests via a shared pool of different IPs.
So my question is: how can I write request-throttling functionality that achieves the maximum practical API throughput while complying with the API terms of usage and utilizing GAE's distributed capabilities?
Advice to use another platform/host/proxy is, in my mind, just not useful.
If you are searching for a way to programmatically manage Google App Engine's shared pool of IPs, I firmly believe you are out of luck.
Anyway, quoting this advice from the FAQ, I think you have more than a chance to keep your awesome app running:
What should I do if I need more requests per day?

Certain types of applications - services and websites to name two - can legitimately have much higher per-day request requirements than typical applications. If you can demonstrate a need for a higher request quota, contact us.
EDIT:
I was wrong; you don't actually have any chance.
Google App Engine [app]s are doomed.
First off: I'm using your extension and it rocks!
Have you considered using memcache and caching the results?
Instead of taking the results from the API directly, first try to find them in the cache. If they're there, use them; if not, retrieve them, cache them, and let them expire after X minutes.
Second, try to batch up user requests: instead of asking for the reputation of a single user, ask for the reputation of several users together.
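The Stack Exchange API accepts vectorized requests (semicolon-separated ids on routes like `/users/{ids}`, up to 100 per call, if I remember the limit correctly), so the batching could be sketched like this, with `fetch_users` as a placeholder for the actual UrlFetch call:

```python
def batch_user_requests(user_ids, fetch_users, batch_size=100):
    """Collapse many single-user lookups into vectorized API calls.

    fetch_users takes a semicolon-joined id string, the format the
    /users/{ids} route expects, and returns a list of user objects.
    """
    results = []
    for i in range(0, len(user_ids), batch_size):
        batch = user_ids[i:i + batch_size]
        results.extend(fetch_users(";".join(str(u) for u in batch)))
    return results
```

Checking 100 users then costs one request against the 30-per-5-seconds budget instead of 100.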