We just migrated to Google Cloud Endpoints v2 / Java 8 and found that latency has gone up. We often see this kind of request in traces:
https://servicecontrol.googleapis.com/v1/services/<myapi>.endpoints.<myappid>.cloud.goog:check
That call takes around 14 ms each time. Memory usage has also gone up somehow, and our B2 frontends suddenly start blocking, often with delays of around 10 s. That could be a connection-pooling problem on our side, but it was somehow not present with Endpoints v1 and Java 7.
At the same time, we see 0 errors reported per instance (which is not true; requests are being aborted after around 10-30 s all the time), and we can no longer get stack traces showing where a request was aborted, as we could before.
Killing / restarting an instance makes the 10 s delays go away for a while, but that is naturally not a solution.
Are there any steps that have to be done to get to the promised performance improvements of v2?
TL;DR - GCE 2.0 alone is faster and more reliable than GCE 1.0, but don't use API Management or you'll give back all those gains and then some.
I too was seeing major slowness issues when testing out GCE 2.0, and I couldn't possibly justify subjecting my users to such a terrible latency hit, so I set out to determine what was going on.
Here was my methodology:
I set up a minimum viable App Engine app consisting of just one simple API call that returns a server timestamp using Endpoints 1.0, Endpoints 2.0, and Endpoints 2.0 with API Management. You can see all the code for these here: https://github.com/ubragg/cloud-endpoints-testing
I deployed each of these to a separate App Engine app and tested the API using the API Explorer at these links (so you can try for yourself):
GCE 1.0
GCE 2.0
GCE 2.0+AM
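For reference, the test API boils down to a single method that returns the server time. Here is a minimal sketch of that kind of endpoint using Endpoints Frameworks for Python; the API name, class, and path are illustrative, not the exact code from the repo above:

```python
# Minimal "server timestamp" API, Endpoints Frameworks for Python.
# The names (timetest, TimeApi, /now) are illustrative only.
import datetime

import endpoints
from protorpc import message_types, messages, remote


class TimestampResponse(messages.Message):
    """Carries the server's current UTC time as an ISO 8601 string."""
    timestamp = messages.StringField(1)


@endpoints.api(name='timetest', version='v1')
class TimeApi(remote.Service):

    @endpoints.method(message_types.VoidMessage, TimestampResponse,
                      path='now', http_method='GET', name='now')
    def now(self, request):
        # The client-side numbers below are round-trip times to this call.
        return TimestampResponse(
            timestamp=datetime.datetime.utcnow().isoformat())


# WSGI app that the /_ah/api handler in app.yaml points at.
api = endpoints.api_server([TimeApi])
```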
The results?
Here are the results of a bunch of requests in rapid succession on each of the APIs:
          GCE 1.0    GCE 2.0    GCE 2.0+AM
average   434 ms     80 ms      482 ms
median    90 ms      81 ms      527 ms
high      2503 ms    85 ms      723 ms
low       75 ms      73 ms      150 ms
As you can see, GCE 2.0 without AM was both fast and consistent. Even GCE 1.0 usually was pretty fast, but would occasionally have some troublesome outliers. GCE 2.0 with AM was pretty much always unacceptably slow, only dipping into the "maybe acceptable" range on rare occasions.
Note that all of these times are from the client perspective reported by the API Explorer. Here are the server reported averages for the same requests from the App Engine dashboard over the same time period:
          GCE 1.0    GCE 2.0    GCE 2.0+AM
average   24 ms      14 ms      395 ms
So bottom line is, if you care about latency, API Management isn't really an option. If you're curious about how to run GCE 2.0 without API Management, simply be sure NOT to follow any of the instructions here: https://cloud.google.com/endpoints/docs/frameworks/python/adding-api-management.
Using the base API framework without the management library (the 14 ms :check calls you mentioned come from that library), you should see improved latency. There is some increased memory usage in the v2 frameworks, as they now incorporate code that was previously a separate service. If you are not using API management, I would suggest removing the library and seeing if that helps. It should eliminate the 14 ms of latency and reduce memory use a fair amount, as you won't be loading as much code or data.
Related
I am currently experiencing very high latency in my .NET Core 2.2 applications.
The setup consists of .NET APIs running on App Engine (2 GB memory, 1 CPU, 2 resident instances) that talk to Spanner tables with indexes. Whenever our system comes under load we tend to get a spike where our instance count jumps and the latency rises considerably.
On average our request time for an API request is 30 ms, but this then jumps to 208 s, even on instances that do not change. The Spanner requests are quite short, averaging around 0.072502. The trace just shows a blue bar spanning the whole of the request time. I checked for row locks, but these are simply GET requests and nothing shows up.
Is there anything else I can look at?
I was experimenting with concurrent request handling on a few platforms.
The aim of the experiment was to have a broad measure of the capacity bounds of some selected technologies.
I set up a Linux VM on my machine with a basic Go http server (the vanilla http.HandleFunc of the http default package).
The server would then compute a modified version of the fasta algorithm that restricted threads and processes to 1, and return the result. N was set to 100000.
The algorithm runs in roughly 2 seconds.
I used the same algorithm and logic on a Google App Engine project.
The algorithm is written using the same code; only the handler setup is done in init() instead of main(), as per GAE requirements.
On the other end, an Android client spawns 500 threads, each issuing a GET request in parallel to the fasta-computing server, with a request timeout of 5000 ms.
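(For anyone wanting to reproduce this without an Android device, a roughly equivalent load-test client is easy to sketch in Python using only the standard library; the URL below is a placeholder.)

```python
# Rough stand-in for the Android client: 500 parallel GETs with a 5 s timeout.
# The URL is a placeholder for the deployed fasta endpoint.
import concurrent.futures
import urllib.request

URL = "https://<your-project>.appspot.com/fasta"  # placeholder
TIMEOUT_S = 5
N_REQUESTS = 500


def fetch(_):
    try:
        with urllib.request.urlopen(URL, timeout=TIMEOUT_S) as resp:
            resp.read()
        return "ok"
    except Exception as exc:  # timeouts show up as URLError / socket.timeout
        return type(exc).__name__


with concurrent.futures.ThreadPoolExecutor(max_workers=N_REQUESTS) as pool:
    results = list(pool.map(fetch, range(N_REQUESTS)))

print(results.count("ok"), "of", N_REQUESTS, "requests succeeded")
```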
I was expecting the GAE application to scale and answer back to each request and the local Go server to fail on some of the 500 requests but results were the opposite:
the local server correctly replied to each request within the timeout bounds while the GAE application was able to handle just 160 requests out of 500. The remaining requests timed out.
I checked on the Cloud Console and I verified that 18 GAE instances were spawned, but still the vast majority of requests failed.
I thought that most of them failed because of the start-up time of each GAE instance, so I repeated the experiment right after but I had the same results: most of the requests timed out.
I was expecting GAE to scale to accommodate ALL the requests, believing that if a single local VM could successfully reply to 500 concurrent requests, GAE would do the same, but this is not what happened.
The GAE console doesn't show any error and correctly reports the number of incoming requests.
What could be the cause of this?
Also, if a single instance could handle all the incoming requests on my machine using only goroutines, how come GAE needed to scale so much at all?
To make optimal use of App Engine in terms of minimizing costs, you need to configure a few things in app.yaml:
Enable threadsafe: true - it's actually a Python config setting and not applicable to Go, but I would set it just in case.
Adjust the automatic_scaling section:
max_concurrent_requests - set to maximum 80
max_idle_instances - set to minimum 0
max_pending_latency - set it to automatic or greater than min_pending_latency
min_idle_instances - set it to 0
min_pending_latency - set it to a higher number. If you are OK with 1 second of latency and your handlers take around 100 ms on average, set it to 900ms.
Then you should be able to process a lot of requests on a single instance (a minimal app.yaml putting these settings together is sketched below).
If you are OK with burning cash for the sake of responsiveness and scalability, increase min_idle_instances and max_idle_instances.
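Putting the settings above together, a minimal app.yaml for the old Go standard runtime might look like this (the values are illustrative; tune them to your handlers):

```yaml
runtime: go
api_version: go1
# threadsafe: true   # Python-era flag; the advice above is to set it "just in case"

automatic_scaling:
  max_concurrent_requests: 80    # the upper limit on App Engine standard
  min_idle_instances: 0
  max_idle_instances: automatic  # or lower it, per the advice above
  min_pending_latency: 900ms     # if ~1 s latency is OK and handlers take ~100 ms
  max_pending_latency: automatic

handlers:
- url: /.*
  script: _go_app
```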
Also, do you use similar instance types for the VM and GAE? The GAE F1 instance class is not very fast and is better suited to async, IO-bound work (Datastore, HTTP, etc.). You can configure a more powerful instance class to scale better for computation-intensive tasks.
Also, are you testing on a paid account? Free accounts have quotas, and App Engine will refuse a percentage of requests if it believes the load, continued at the same rate, would exceed the daily quota.
Extending on Alexander's answer.
The GAE scaling logic is based on incoming traffic trend analysis.
The key to handling your case - sudden spikes in traffic (which can't be taken into account in the trend analysis because they change too fast) - is to have sufficient resident (idle) instances configured for your application to absorb such traffic until GAE spins up additional dynamic instances. It can handle peaks as high as you want (if your pockets are deep enough).
See Scaling dynamic instances for more details.
Thanks to everyone for their help.
Many interesting points and insights were raised by the answers to this question.
The fact that the Cloud Console was reporting no errors led me to believe that the bottleneck was happening after the actual request processing.
I found the reason why the results were not as expected: bandwidth.
Each response had a payload of roughly 1 MB, so responding to 500 simultaneous connections from the same client clogged the line, resulting in timeouts.
This was obviously not happening when hitting the local VM, where the available bandwidth is much larger.
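A quick back-of-the-envelope check makes the timeouts plausible (the downlink figure is an assumption, not a measurement):

```python
# Back-of-the-envelope check of the bandwidth explanation.
payload_mb = 1               # ~1 MB per response
n_requests = 500
downlink_mbit_s = 100        # assumed client bandwidth, not measured

total_mbit = payload_mb * n_requests * 8
seconds_to_drain = total_mbit / downlink_mbit_s
print(seconds_to_drain)      # 40.0 -> far beyond the 5 s request timeout
```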
Now GAE scaling is in line with what I expected: it successfully scales to accommodate every incoming request.
I'm running into a performance issue with the Google Cloud Bigtable Python client. I'm working on a Flask API that writes to and reads from a GCP Bigtable instance. The API uses the Python client to communicate with Bigtable and is deployed to the GCP App Engine flexible environment.
Under low traffic, the API works fine. However, during a load test, the endpoints that read and write to Bigtable suffer a huge performance decrease compared to a similar endpoint that doesn't communicate with Bigtable. Also, a large percentage of requests sent to those endpoints receive a 502 Bad Gateway, even with health checks turned off in App Engine.
I'm aware that the client is currently in alpha. I wonder whether this performance issue is known, or whether anyone else has run into the same issue.
Update
I found documentation from Google stating:
There are issues with the network connection. Network issues can reduce throughput and cause reads and writes to take longer than usual. In particular, you'll see issues if your clients are not running in the same zone as your Cloud Bigtable cluster.
In my case, my client was in a different region; moving it to the same region gave a huge increase in performance. However, the performance issue still exists, and the documentation's recommendation is to put the client in the same zone as the Bigtable cluster.
I also considered using Container Engine or Compute Engine, where it is easier to specify the zone, but I want to stay with App Engine for its autoscaling and managed services.
The Bigtable client takes somewhere between 3 ms and 20 ms to complete each request, and because Python is single-threaded, it just waits during that time until the response comes back. The best solution we found was, for any writes, to publish the request to Pub/Sub and then use Dataflow to write to Bigtable. This is significantly faster because publishing a message from Python takes well under 1 ms, and because Dataflow can run in exactly the same region as Bigtable and is easy to parallelize, it can write much faster.
This doesn't cover the scenario where you need frequent reads or where writes need to be instantaneous, though.
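A minimal sketch of that write path, assuming a Pub/Sub topic that a Dataflow pipeline drains into Bigtable (the project and topic names are placeholders):

```python
# Publish each "write" as a Pub/Sub message instead of writing to Bigtable
# directly from the request handler; a Dataflow job consumes the topic and
# performs the actual Bigtable mutations. Project and topic are placeholders.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "bigtable-writes")


def enqueue_write(row_key, values):
    """Fire-and-forget publish; returns a future you can check if needed."""
    payload = json.dumps({"row_key": row_key, "values": values}).encode("utf-8")
    return publisher.publish(topic_path, data=payload)


# Example usage inside a Flask handler:
# enqueue_write("user#123", {"cf:visits": 1})
```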
I did an experiment and installed the same application on Google Cloud Platform, with the same database and the same buckets (for images), in 2 different locations: us-central and europe-west. However, the loading times are hugely different. I am in Spain, and surprisingly the us-central one is much faster.
Application info:
region: us-central VS europe-west
PHP 5.5
SQL (both the same):
MySQL First Generation master
MySQL 5.6
tier: D1
activation policy: on demand
Preferred location: follow app
Storage (Google Buckets):
Default storage class: Multi-Regional
Location: EU (for europe-west) and US (for us-central)
Loading times (after some refreshes for caching purposes):
us-central: 2.26s https://practia-delta.appspot.com/
europe-west: 9.96s http://gamma.practia.org/
The one in europe-west is so slow it is not practical. Why this difference? Or what did I configure wrong here? Is there anything else that I should look out for in the configuration to make europe-west run as fast as us-central?
OK, so talking to Google Support, I found out that there indeed is a loading difference, but it is not due to the servers being in any way slower.
The difference came from a call to the CloudStorageTools API, namely CloudStorageTools.getImageServingUrl(). Accessing the API from Europe was ~100ms to ~200ms slower than from US servers for each call. As I was making more or less 15 calls on average, this led to a noticeable difference in loading times per page.
The solution in my case was to cache the call to CloudStorageTools.getImageServingUrl() on image creation and save the result in the database. Then on displaying the images, just load the Url from the database and avoid the call to the API each time.
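The original app is PHP (CloudStorageTools::getImageServingUrl()), but the caching idea is the same in any runtime. Here is a sketch in Python using the equivalent Images API, with made-up model and field names:

```python
# Same idea sketched with the Python App Engine SDK (the original app is PHP
# and calls CloudStorageTools::getImageServingUrl()). Model and field names
# are made up for illustration.
from google.appengine.api import images
from google.appengine.ext import ndb


class StoredImage(ndb.Model):
    gcs_path = ndb.StringProperty()      # e.g. "/gs/my-bucket/photo.jpg"
    serving_url = ndb.StringProperty()   # cached result of the Images API call


def create_image(gcs_path):
    # Call the (slow, cross-region) Images API exactly once, at upload time...
    url = images.get_serving_url(None, filename=gcs_path, secure_url=True)
    img = StoredImage(gcs_path=gcs_path, serving_url=url)
    img.put()
    return img


def render_image_urls(image_keys):
    # ...and only read the cached URL from the datastore when rendering pages.
    return [key.get().serving_url for key in image_keys]
```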
Google Support confirmed the difference in the CloudStorageTools API access times between Europe and US was expected behaviour, although undocumented.
Lately, I have seen GAE taking much, much longer to process requests than it did just a week ago. Nothing has changed in my code, but GAE is now taking 4000-12000 ms to respond to requests. What makes it worse is that I have plenty of instances available with 0 requests on them.
Has anyone else seen this happen?
What can I do to fix it? I have gone as far as to spin up 15 extra instances (and paid through the nose for them), but nothing seems to send requests to the other idle instances reliably.
My bill has gone from 70-90c/day to $5-8/day without any code change or increase in traffic. In fact, I am losing traffic because of the huge latency.
QPS*    Latency*     Requests  Errors  Age       Memory        Availability
0.000   0.0 ms       1378      0       10:10:09  57.9 MBytes   Dynamic
0.000   0.0 ms       1681      0       15:39:57  57.2 MBytes   Dynamic
0.017   9687.0 ms    886       0       10:19:10  56.7 MBytes   Dynamic
I recommend installing AppStats to get a picture of what's taking so long in each request. I'd guess that you're having some contention issues or large numbers of reads/writes caused by some new data configuration.
The idle instances won't help decrease latency - it looks like every request takes a long time, and with less than one request per minute (in this sample anyway), 10s requests could run serially on the same instance.
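For reference, wiring up AppStats on the Python runtime is just a small appengine_config.py hook (the Java runtime uses a servlet filter instead):

```python
# appengine_config.py -- enables AppStats recording for every request
# (Python runtime; Java uses the appstats servlet filter instead).

def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recording
    # Wrap the WSGI app so AppStats records RPC timings for each request.
    # Also enable the "appstats" builtin in app.yaml to browse the results
    # at /_ah/stats.
    return recording.appstats_wsgi_middleware(app)
```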
We have a similar problem in our app. In our case, we are under the impression that GAE's scheduler did a poor job in balancing requests to existing instances.
In some cases, the scheduler decided to spin up new instances instead of reusing existing ones. Since spinning up a new instance took anywhere from 5 to more than 45 seconds, I suspect this might be what happened to you.
Try to investigate the following and see if it helps you:
Make sure your app has thread-safe enabled so that it can process concurrent requests. You configure this in your app.yaml if you are using Python, or in your appengine-web.xml if you use Java (see the snippet after this list). Of course, you also need to make sure that the code in your app actually is thread-safe.
In your application settings, if it is still set to automatic, change the minimum pending latency to a non-automatic setting. I'd suggest around 10 seconds for now; you can experiment later to find the value that suits you best. This forces the scheduler to wait a certain amount of time to see whether any instance becomes available before spinning up a new one.
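For completeness, the thread-safe setting mentioned above is a one-liner in app.yaml for Python (the Java equivalent is <threadsafe>true</threadsafe> in appengine-web.xml):

```yaml
# app.yaml (Python runtime): allow one instance to serve concurrent requests
threadsafe: true
```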
Now, to answer your original question regarding sending all requests to the same instance: as far as I know, there is no way to address a specific frontend instance in order to direct requests to it.
What you could do is migrate your app to use backend instances instead of regular frontend instances. Backends provide a way to target any particular instance directly. You could deploy your app on a single backend to have more control over the number of instances you spawn. And since using a backend bypasses the scheduler, you would not encounter latencies caused by new instances spinning up.
The major drawback of this approach is that you lose the auto-scaling benefit of frontend instances. But judging from your low daily bill, I don't think scalability is a major concern for your app yet.