What can cause high variability of Untraced Time in App Engine requests? - google-app-engine

I just ran a load test against my app. I noticed some very large variability in latency for two identical requests: 3 s vs. 30 s. When I dug into the traces I found the following:
| | Traced (ms) | Untraced (ms) |
|----------------------+-------------+---------------|
| High-latency Request | 193 | 29948 |
| Low-latency Request | 305 | 2934 |
Here are screen shots for the traces:
Low overall latency
High overall latency
I cannot make sense of a 10-to-1 difference in runtime performance.
I only see these high-latency requests under load. Could something in my code account for this variability (assuming the same path through the code was followed for both requests)?

In my experience, those extraordinarily high latencies are caused by a fresh instance starting up. There was a discussion in the AppEngine Google Group which is about similar issues: https://groups.google.com/d/msg/google-appengine/MBveo1KSTyY/mkYdyCmfAgAJ
Two things you can do:
reduce time it takes an instance to start by reducing overall size of your application (code, libraries, assets) and heavy use of lazy initialization. Using a higher instance class might help too (stronger instances boot up faster)
tweak your scaling options: always have one (or two) idle instances hanging around so there is no need to wait for an instance to boot during traffic peaks
Hope that helps. In my case, using a higher instance class helped a lot already!

I had this problem too, google supports suggested to reduce max_concurrent_requests. The hypothesis at that time was the CPU was throttled. After reducing it, the latency gotten better in general. I still wasn't sure what's going on because I didn't expect the application to be using that much CPU. Unfortunately no tool to profile the CPU on GAE to pin point where exactly the CPU intensive part

Related

Use Google Cloud Functions to speed up GAE app

I have a GAE standard Python app that does some fairly computational processing. I need to complete the processing within the 60 second request time limit, and ideally I'd like to do it faster for a better user experience.
Splitting the work to multiple threads don't seem to be a good solution because the threads would likely run on the same CPU and thus wouldn't give a speed up.
I was wondering if Google Cloud Functions (GCF) could be used in a similar manner as threads. For example, if I create a GCF to do the processing, split my work into 10 chunks, and make 10 GCF calls in parallel, can I expect to get results 10x faster? (aside from latency and GCF startup costs)
Each function invocation runs in its own server instance, and a function will scale up to 1000 instances to handle concurrent requests in parallel. So yes, you can do this, if you are willing to potentially pay the cold start cost of each server instance as it's allocated for its first request.
If you're able to split the workload in smaller chunks that you'd be launching in parallel via separate (external) requests I'd suspect you'd get a better performance (and cost) by using GAE itself (maybe in a separate service) instead of CFs:
GAE standard environment instances can have higher CPU speeds - a B8 instance has 4.8 GHz, the max CF CPU speed is 2.4 GHz
you have better control over the GAE scaling configuration and starting time penalties
I suspect networking delays would be at least the same if not better on GAE - not going to another product infra (unsure though)
GAE costs would likely be smaller since you pay per instance hours (regardless of how many requests the instance handles) not per request/invocations

Optimizing Solr 4 on EC2 debian instance(s)

My Solr 4 instance is slow and I don't know why.
I am attempting to modify the configurations of JVM, Tomcat6 and Solr 4 in order
to optimize performance, with queries per second as the key metric.
Currently I am running on an EC2 small tier with Debian squeeze, but ready to switch to Ubuntu if needed.
There is nothing special about my use case. The index is small. Queries do include a moderate number of unions (e.g. 10), plus faceting, but I don't think that's unusual.
My understanding is that these areas could need tweaking:
Configuring the JVM Garbage collection schedule and memory allocation ("GC tuning is a precise art form", ref)
Other JVM settings
Solr's Query Result cache, Filter cache, Document cache settings
Solr's Auto-warming settings
There are a number of ways to monitor the performance of Solr:
SolrMeter
Sematext SPM
New Relic
But none of these methods indicate which settings need to be adjusted, and there's no guide that I know of that steps through an exhaustive list of settings that could possibly improve performance. I've reviewed the following pages (one, two, three, four), and gone through some rounds of trial and error so far without improvement.
Questions:
How to tell JVM to use all the 2 GB memory on the small EC2 instance?
How to debug and optimize JVM Garbage Collection?
How do I know when I/O throttling, such as the new EBS IOPS pricing, is the issue?
Using figures like the NewRelic examples below, how to detect what is problematic behavior, and how to approach solutions.
Answers:
I'm looking for link to good documentation for setting up and optimizing Solr 4, from a DevOps or server admin perspective (not index or application design).
I'm looking for the top trouble spots in catalina.sh, solrconfig.xml, solr.xml (other?) that are most likely causes of problems.
Or any tips you think address the questions.
First, you should not focus on switching your linux distribution. A different distribution might bring some changes but considering the information you gave, nothing prove that these changes may be significant.
You are mentionning lots of possibilities for your optimisations, this can be overwhelming. You should consider an tweaking area only once you have proven that the problem lies in that particular part of your stack.
JVM Heap Sizing
You can use the parameter -mx1700m to give a maximum of 1.7GB of RAM to the JVM. Hotspot might not need it, so don't be surprised if your heap capacity does not reach that number.
You should set the minimum heap size to a low value, so that Hotspot can optimise its memory usage. For instance, to set a minimal heap size at 128MB, use -mx128m.
Garbage Collector
From what you say, you have limited hardware (1-core at 1.2GHz max, see this page)
M1 Small Instance
1.7 GiB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
...
One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2
GHz 2007 Opteron or 2007 Xeon processor
Therefore, using that low-latency GC (CMS) won't do any good. It won't be able to run concurrently with your application since you have only one core. You should switch to the Throughput GC using -XX:+UseParallelGC -XX:+UseParallelOldGC.
Is the GC really a problem ?
To answer that question, you need to turn on GC logging. It is the only way to see whether GC pauses are responsible for your application response time. You should turn these on with -Xloggc:gc.log -XX:+PrintGCDetails.
But I don't think the problem lies here.
Is it a hardware problem ?
To answer this question, you need to monitor resource utilization (disk I/O, network I/O, memory usage, CPU usage). You have a lot of tools to do that, including top, free, vmstat, iostat, mpstat, ifstat, ...
If you find that some of these resources are saturating, then you need a bigger EC2 instance.
Is it a software problem ?
In your stats, the document cache hit rate and the filter cache hit rate are healthy. However, I think the query result cache hit rate is pretty low. This implies a lot of queries operations.
You should monitor the query execution time. Depending on that value you may want to increase the cache size or tune the queries so that they take less time.
More links
JVM options reference : http://jvm-options.tech.xebia.fr/
A feedback that I did on some application performance audit : http://www.pingtimeout.fr/2013/03/petclinic-performance-tuning-about.html
Hope that helps !

App Engine High CPU Warning - Is it really an issue?

I recently finished my app and have since began testing. One of my
main resources makes 14 RPCs, I have made every effort to decrease CPU
time, i.e., marking properties as unindexed, however I really cannot
reduce the number of GETs, PUTs and queries that occur in the request.
The average CPU times associated with the request today is:
ms=381 cpu_ms=1192 api_cpu_ms=1122 cpm_usd=0.033597
This does fluctuate and can be less, however it rarely falls low
enough to have the high CPU warnings removed.
I've noticed my app still scales fine, I have about 15 instances
running whilst testing.
So my question is, should I be concerned at the high CPU warnings or
is it something most people experience?
As long as your wallclock time (the first time in those timings) averages less than 1000ms, your app will continue to autoscale just fine, regardless of how many CPU ms your app uses. As systempuntoout explains, the high CPU warnings are a holdover from when we limited high CPU requests on a per-url basis, and no longer apply except as a guide for optimisation.
If your application can't be optimized but the CPU usage is fine for your daily budget, I think you could ignore this warning without any concern.
Years ago, the number of High CPU Requests in a specific time was limited and the warning was useful to monitor and correct the application to stay above that specific quota; that limit was then removed and the High CPU warning is now useful just to identify part of the program that should be checked for optimization (where possible).
In practice, I've never had an issue with not enough instances being spun up to handle user requests even if I am running many longer-running jobs in the background [e.g., an expensive mapreduce job]. Of course, your budget needs to cover your CPU expenses or your app won't be able to operate.
However, I remember reading in the docs that if the majority of your requests take a long time, then app engine might not spin up additional instances to handle requests. However, if the majority of your requests are served quickly, then a minority of long-running jobs shouldn't cause any problems. (Unfortunately, I can't find the article/docs with the specific guidance on longer running requests at the moment.)

Is app engine more expensive when it's slower?

There have been quite a few occasions recently when app engine appears to run slower. To some degree that's understandable with the architecture of their cloud platform. I'm not talking about new server instances - just requests to warm servers. I'm also just referring to CPU, not datastore API, but I do wonder about that as well.
It seems that during these slow periods I get a lot more yellow warnings on my requests - saying I am using a lot of CPU. Certainly they take longer to complete during this period. What concerns me is that during these slow periods, my billable CPU seems to go up.
So to be clear - when app engine is fast, a request might complete in 100ms. In a slow period, it might take more than 1s for the same request. Same URI, same caching, same processing path, same datastore, same indexes - much more CPU. The yellow warnings, as I understand it, are referring to billable CPU usage, and there's many more of them when app engine is slower.
This seems to set up a bizarre situation where my app costs more to run when app engine performance is worse. This means google makes more money the more poorly the platform performs (up to the point where it fails or customers leave). Maybe I've got the situation all wrong, and it doesn't work like that - but if it does work like that, then as a customer the pressures and balances there are all wrong. That's not intimating any wrong-doing on google's part - just that the relationships between those two things don't seem right.
It almost seems like google's algorithm goes something like - 'If I give a processing job to a CPU and start my watch, then stop it when the job returns I get the billable CPU figure.' i.e. it doesn't measure CPU work at all. Surely that time should be divided by the number of processing jobs being concurrently executed plus some extra to cover the additional context switching. I'm sure that stuff is hard to measure - perhaps that's the reason.
I guess you could argue it is fair that you pay more when app engine is in high demand, but that makes budgeting close to impossible - you can't generate stats like '100 users costs me $1 a day', because that could change for a whole host of reasons - including app engine onboarding more customers than the infrastructure can realistically handle. If google over-subscribes app engine then all customers pay more - it's another relationship that doesn't sound right. Surely google's costs should go down as they onboard more customers, and those customers use more resources - based on economies of scale.
Should I expect two identical requests in my app to cost me roughly the same amount each time they run - regardless of how much wall-time app engine takes to actually complete them? Have I misunderstood how this works? If I haven't, is there a reason why I shouldn't be worried about it in the long term? Is there some documentation which makes this situation clearer? Cheers,
Colin
It would be more complicated, but they could change the billing algorithm to be a function of load. Or perhaps they could normalize the CPU measurements based on the performance of similar calls in the past.
I agree that this presents problems for the developers.
Yes this is true. It is a bummer. It also takes them over a second to start up my Java application (which I was billed for) every time they decided my site was in low demand, and didn't need the resources.
I ended up using a cron to auto ping my site every minute to keep it warm.. doing all the wasted work made my bill cheaper, as it didn't have the startup time, instead it just had lots of 2ms pings...
This question appears old and I think the pricing scheme must have changed...
The Google App Engine charges for "instance hours" and the instances currently spawned are viewable in the GAE console. And Google provides adjustments so you can decide cost vs latency for your app.
https://developers.google.com/appengine/docs/adminconsole/performancesettings
I did noticed that if the front-end is bogged down hitting a common backend resource that GAE will spawn a bunch of instances to get latency down. And you will pay for those instance hours even though latency/throughput doesn't improve. The adjustments I mentioned seem to help with that.

Database Network Latency

I am currently working on an n-tier system and battling some database performance issues.
One area we have been investigating is the latency between the database server and the application server. In our test environment the
average ping times between the two boxes is in the region of 0.2ms however on the clients site its more in the region of 8.2 ms. Is that
somthing we should be worried about?
For your average system what do you guys consider a resonable latency and how would you go about testing/measuring the latency?
Yes, network latency (measured by ping) can make a huge difference.
If your database response is .001ms then you will see a huge impact from going from a 0.2ms to 8ms ping. I've heard that database protocols are chatty, which if true means that they would be affected more by slow network latency versus HTTP.
And more than likely, if you are running 1 query, then adding 8ms to get the reply from the db is not going to matter. But if you are doing 10,000 queries which happens generally with bad code or non-optimized use of an ORM, then you will have wait an extra 80seconds for an 8ms ping, where for a 0.2ms ping, you would only wait 4 seconds.
As a matter of policy for myself, I never let client applications contact the database directly. I require that client applications always go through an application server (e.g. a REST web service). That way, if I accidentally have an "1+N" ORM issue, then it is not nearly as impactful. I would still try to fix the underlying problem...
In short : no !
What you should monitor is the global performance of your queries (ie transport to the DB + execution + transport back to your server)
What you could do is use a performance counter to monitor the time your queries usually take to execute.
You'll probably see your results are over the millisecond area.
There's no such thing as "Reasonable latency". You should rather consider the "Reasonable latency for your project", which would vary a lot depending on what you're working on.
People don't have the same expectation for a real-time trading platform and for a read only amateur website.
On a linux based server you can test the effect of latency yourself by using the tc command.
For example this command will add 10ms delay to all packets going via eth0
tc qdisc add dev eth0 root netem delay 10ms
use this command to remove the delay
tc qdisc del dev eth0 root
More details available here:
http://devresources.linux-foundation.org/shemminger/netem/example.html
All applications will differ, but I have definitely seen situations where 10ms latency has had a significant impact on the performance of the system.
One of the head honchos at answers.com said according to their studies, 400 ms wait time for a web page load is about the time when they first start getting people canceling the page load and going elsewhere. My advice is to look at the whole process, from original client request to fulfillment and if you're doing well there, there's no need to optimize further. 8.2 ms vs 0.2 ms is exponentially larger in a mathematical sense, but from a human sense, no one can really perceive an 8.0 ms difference. It's why they have photo finishes in races ;)

Resources