Requests to GAE app fail with Connection Reset - google-app-engine

Our GAE Python app exposes an API that is called by an external client system (Java-based, if that matters). The large majority of requests (tens of thousands per day) work fine, but a few (fewer than 10 per day) fail, with the client side reporting a 'Connection Reset by Peer' error. When that happens, the client system has also fired multiple API calls that finished successfully, so we rule out connectivity issues on the client side.
The GAE logs show only app-related failures; other kinds of failures (e.g. connection errors) don't appear there, so we can't tell why these API calls are failing.
Is there any way to better identify such issues other than the logs?
The GAE module that accepts the API calls has the following scaling properties:
instance_class: B2
basic_scaling:
  max_instances: 5
  idle_timeout: 1m
and at the time of failure, only 2 (out of a maximum of 5) instances were running, so the GAE module is below its scaling limits. The API calls are served on average in less than 500ms, and we have never seen a log error for exceeding the 60-second request limit. Overall, the module doesn't seem overloaded. Could it be something else?

It seems to me that the best way to deal with your issue is to add an exponential backoff algorithm to the client code for when the connection gets reset, so that it "fails gracefully" and retries the call.
I would also suggest moving to automatic scaling, where tuning max pending latency and min pending latency might help with this kind of issue. I don't have specifics on what would fix your issue, but I guess that adjusting these could provide some results.
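The retry idea above can be sketched as follows. This is a minimal illustration, not the client's actual code (the client in the question is Java-based): `call_with_backoff`, `request_fn`, and all parameter defaults are hypothetical names and values chosen for the example. It assumes the API call is safe to repeat, since a reset can arrive after the server has already processed the request.

```python
import random
import time


def call_with_backoff(request_fn, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Call request_fn(), retrying on ConnectionResetError.

    request_fn is a hypothetical zero-argument callable that performs one
    API call. The delay cap doubles after each failed attempt, and the
    actual sleep is a random value up to that cap ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ConnectionResetError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Cap grows as base_delay * 2^attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.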

Related

Concurrent requests handling on Google App Engine

I was experimenting with concurrent request handling on a few platforms.
The aim of the experiment was to have a broad measure of the capacity bounds of some selected technologies.
I set up a Linux VM on my machine with a basic Go http server (the vanilla http.HandleFunc of the http default package).
The server would then compute a modified version of the fasta algorithm that restricted threads and processes to 1, and return the result. N was set to 100000.
The algorithm runs in roughly 2 seconds.
I used the same algorithm and logic on a Google App Engine project.
The algorithm is written using the same code, just the handler set up is done on init() instead of main() as per GAE requirements.
On the other end an Android client is spawning 500 threads each one issuing in parallel a GET request to the fasta computing server, with a request timeout of 5000 ms.
I was expecting the GAE application to scale and answer back to each request and the local Go server to fail on some of the 500 requests but results were the opposite:
the local server correctly replied to each request within the timeout bounds while the GAE application was able to handle just 160 requests out of 500. The remaining requests timed out.
I checked on the Cloud Console and I verified that 18 GAE instances were spawned, but still the vast majority of requests failed.
I thought that most of them failed because of the start-up time of each GAE instance, so I repeated the experiment right after but I had the same results: most of the requests timed out.
I was expecting GAE to scale to accommodate ALL the requests, believing that if a single local VM could successfully reply to 500 concurrent requests, GAE would do the same, but this is not what happened.
The GAE console doesn't show any error and correctly reports the number of incoming requests.
What could be the cause of this?
Also, if a single instance could handle all the incoming requests on my machine by virtue of only goroutines, how come that GAE needed to scale so much at all?
To optimize for minimal cost, you need to configure a few things in app.yaml:
Enable threadsafe: true - this is actually a Python setting and not applicable to Go, but I would set it just in case.
Adjust the scaling section:
- max_concurrent_requests - set to the maximum (80)
- max_idle_instances - set to the minimum (0)
- max_pending_latency - set it to automatic or greater than min_pending_latency
- min_idle_instances - set it to 0
- min_pending_latency - set to a higher number. If you are OK with 1 second of latency and your handlers take 100ms on average to process, set it to 900ms.
Then you should be able to process a lot of requests on a single instance.
If you are OK with burning cash for the sake of responsiveness and scalability, increase min_idle_instances and max_idle_instances.
Also, do you use similar instance types for the VM and GAE? The GAE F1 instance is not very fast and is better suited to async tasks such as I/O (datastore, HTTP, etc.). You can configure a more powerful instance class to scale better for computation-intensive tasks.
Also, are you testing on a paid account? Free accounts have quotas, and App Engine will refuse a percentage of requests if it believes the load would exceed the daily quota were it to continue with the same pattern.
Extending on Alexander's answer.
The GAE scaling logic is based on incoming traffic trend analysis.
The key to handling your case - sudden spikes in traffic (which can't be taken into account in the trend analysis because they change too quickly) - is to have enough resident (idle) instances configured for your application to handle such traffic until GAE spins up additional dynamic instances. It can handle peaks as high as you want (if your pockets are deep enough).
See Scaling dynamic instances for more details.
Thanks everyone for their help.
Many interesting points and insights have been made by the answers I had on this topic.
The fact that the Cloud Console was reporting no errors led me to believe that the bottleneck was happening after the real request processing.
I found the reason why the results were not as expected: bandwidth.
Each response had a payload of roughly 1MB and thus responding to 500 simultaneous connections from the same client would clog the lines, resulting in timeouts.
This was obviously not happening when sending requests to the VM, where the bandwidth is much larger.
Now GAE scaling is in line with what I expected: it successfully scales to accommodate each incoming request.
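As a rough check of the bandwidth explanation, the figures from the experiment (500 responses of roughly 1MB each, 5000 ms request timeout) give a lower bound on the throughput the client link would need. The helper name below is hypothetical, and the calculation ignores the roughly 2 seconds of compute time and any protocol overhead, so the real requirement would be higher.

```python
def required_bandwidth_mbit_per_s(num_responses, payload_mb, timeout_s):
    """Minimum sustained throughput (Mbit/s) needed to deliver every
    response before the timeout, assuming all transfers share one link."""
    total_megabits = num_responses * payload_mb * 8  # 1 MB = 8 Mbit
    return total_megabits / timeout_s


# 500 responses of ~1 MB each within a 5 second timeout.
print(required_bandwidth_mbit_per_s(500, 1.0, 5.0))  # 800.0
```

A sustained 800 Mbit/s is far more than a typical mobile client connection can be expected to deliver, which is consistent with most requests timing out regardless of how many instances GAE started.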

GAE Occasionally stops serving for 1-5 minutes

Starting about 1 week ago, my app will occasionally and randomly completely stop serving for 1-5 minutes. Requests during this time hang for the full timeout and then return a 500.
The System Status dashboard reads OK, I have no cron jobs or anything special that might cause this disruption (that I know of).
Has anyone experienced this, and is there a solution?
If you have 'threadsafe: false' in your app.yaml configuration, App Engine will not send concurrent requests to your app. If you have a request that's blocking for a really long time, all other requests coming in will line up (and possibly time out) before being serviced. If this is the cause of your problem, either make your app thread-safe or have a look in your logs to find requests that take a long time and fix them.
Alternatively, if your app gets very little traffic, your instances might be getting shut down after they've been idle for a while. If your app takes a long time to start up, that would explain the behavior you're seeing. In app.yaml, you can set 'min_idle_instances' to some value greater than zero to avoid this startup penalty.
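To illustrate the first point, here is a small model of a single instance that serves requests strictly one at a time, as with 'threadsafe: false'. It is a simplification: the function name, the assumption that all requests arrive at the same moment, and the 60-second timeout used in the example are all hypothetical values chosen for illustration.

```python
def serial_wait_times(service_times_s):
    """Return how long each request waits before service starts when one
    instance handles requests one at a time, in arrival order.

    Assumes every request arrives at time zero (a simplification).
    """
    waits = []
    elapsed = 0.0
    for service_time in service_times_s:
        waits.append(elapsed)    # time spent queued behind earlier requests
        elapsed += service_time  # instance is busy until this one finishes
    return waits


# One 90 s blocking request followed by three fast 0.1 s requests.
waits = serial_wait_times([90.0, 0.1, 0.1, 0.1])
# Requests that would still be queued after a hypothetical 60 s timeout.
timed_out = [w for w in waits if w > 60.0]
```

In this model the three fast requests all exceed the timeout purely because of the one slow request ahead of them, which matches the symptom of the whole app appearing to stop serving.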

GAE: Why do I experience loading requests even though I have fixed the number of instances to exactly one?

I have a low-load application which experienced latency spikes (requests taking up to 10s to return) due to loading requests, as seen in the logs:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.
Here I assume that "new process" means "new instance".
In order to avoid this, I fixed the number of idle instances to exactly one (max=1 and min=1), so there is always one instance running ("resident instance") and GAE shouldn't start new ones. Billing is enabled.
However, I still experience loading requests. Why? Can anything be done about this?
Idle instances are "reserve" instances - they are meant to handle spikes when traffic increases, not the "normal" traffic. Idle instances are used only during the spin-up of the dynamic instances.
So, when you have one idle instance and no dynamic instances running and you get a request, then the idle instance should handle the request, but a new dynamic instance will still be spun up.
I too experienced the same problem with my low-traffic app, and here is the practical setup that almost always keeps my users from hitting a cold start:
- 1 resident F4 instance
- pending latency set to 15 sec
- I worked to make my warmup requests as fast as possible (under 10 sec); they are still quite long because I use the Play framework (Java)
- and when I really don't want any problems, I create fake traffic by pinging my app.
With this config, the resident instance usually serves around 50 requests; during that time, a dynamic instance receives a warmup request and then starts serving.

App Engine loading request even when idle instance available

I have a simple app running on App Engine but I'm having odd problems with latency. It's a Python 2.7 app and a loading request takes between 1.5 and 10 secs (I guess depending on how GAE is feeling). This is a low-traffic site right now, so previously GAE was sitting with no idle instances and most requests were loading requests, resulting in a long wait time on the first page view.
I've tried configuring the minimum number of idle instances to "1" so that these infrequent page views can immediately hit a warm instance.
However, I've seen several cases now where even with one instance sitting unused, GAE will route an incoming request to a loading instance, leaving the warm instance untouched:
[Image: GAE dashboard showing odd scheduling]
How can I prevent this from happening? I feel I must be understanding something wrong, because I certainly don't expect this behavior.
Update: Also, what makes this even less comprehensible is that the app has threadsafe enabled, so I really don't understand why GAE would get flustered and spin up an instance for a single, lone request.
Actually, I believe this is normal behavior. Idle instances are supposed to guarantee a minimum number of instances always available (for spiky load).
So, when some requests start coming in, they are initially served by idle instances, but at the same time AE scheduler will start launching new instances to always guarantee the same amount of idle instances even during suddenly increased load. That is, to "cover" for those idle instances that became busy serving requests.
It is described in detail on the Adjusting Application Performance page.
Arrrgh! Suffer from this myself. This topic-area has come up in several threads (GAE groups & SO). If someone can dial-in the settings for a low-traffic site (billing on/off), that would be a real benefit. IIRC, someone with what I think is deep GAE experience noted in one thread that the Scheduler does not do well with very low volume apps. I have also seen wildly different startup times within a relatively short period of time. Painful to see a spinup take 700ms then 7000ms just a few minutes later. Overall the issue is not so much the cost to me, but more so the waste of infrastructure resources. In testing I've had two instances running despite having pinged the app with an RPC once every few minutes. If 50k other developers are similarly testing, that could accumulate into a significant waste.

Google App Engine - Request was aborted after waiting too long to attempt to service your request

I get this error sometimes.
Request was aborted after waiting too long to attempt to service your request. Most likely, this indicates that you have reached your simultaneous dynamic request limit. This is almost always due to excessively high latency in your app. Please see http://code.google.com/appengine/docs/quotas.html for more details.
The request that causes it has 10 seconds of latency and 0ms of CPU time. It is a simple JSP page that doesn't do anything that takes long at all. Also, my app is very low traffic, and every time it has happened, it was the only request being processed.
What causes this?
If your application is low-traffic, it's possibly the startup time. There seems to be an ongoing issue where it takes so long to start an instance up that the request breaches the time limit.
Some people have "worked around" this with a cron/scheduled request that runs every few minutes and does nothing (though personally I think this is counter-productive, as it somewhat undermines the reason Google spins your app down!).
There was an issue in their bugtracker about this:
http://code.google.com/p/googleappengine/issues/detail?id=2456
It's now marked as fixed for version 1.4, and there's a little info on it here:
http://googleappengine.blogspot.com/2010/12/happy-holidays-from-app-engine-team-140.html
Always On - For high-priority applications with low or variable traffic, you can now reserve instances via App Engine's Always On feature. Always On is a premium feature costing $9 per month which reserves three instances of your application, never turning them off, even if the application has no traffic. This mitigates the impact of loading requests on applications that have small or variable amounts of traffic.

Resources