App Engine High CPU Warning - Is it really an issue? - google-app-engine

I recently finished my app and have since began testing. One of my
main resources makes 14 RPCs, I have made every effort to decrease CPU
time, i.e., marking properties as unindexed, however I really cannot
reduce the number of GETs, PUTs and queries that occur in the request.
The average CPU times associated with the request today is:
ms=381 cpu_ms=1192 api_cpu_ms=1122 cpm_usd=0.033597
This does fluctuate and can be less, however it rarely falls low
enough to have the high CPU warnings removed.
I've noticed my app still scales fine, I have about 15 instances
running whilst testing.
So my question is, should I be concerned at the high CPU warnings or
is it something most people experience?

As long as your wallclock time (the first time in those timings) averages less than 1000ms, your app will continue to autoscale just fine, regardless of how many CPU ms your app uses. As systempuntoout explains, the high CPU warnings are a holdover from when we limited high CPU requests on a per-url basis, and no longer apply except as a guide for optimisation.

If your application can't be optimized but the CPU usage is fine for your daily budget, I think you could ignore this warning without any concern.
Years ago, the number of High CPU Requests in a specific time was limited and the warning was useful to monitor and correct the application to stay above that specific quota; that limit was then removed and the High CPU warning is now useful just to identify part of the program that should be checked for optimization (where possible).

In practice, I've never had an issue with not enough instances being spun up to handle user requests even if I am running many longer-running jobs in the background [e.g., an expensive mapreduce job]. Of course, your budget needs to cover your CPU expenses or your app won't be able to operate.
However, I remember reading in the docs that if the majority of your requests take a long time, then app engine might not spin up additional instances to handle requests. However, if the majority of your requests are served quickly, then a minority of long-running jobs shouldn't cause any problems. (Unfortunately, I can't find the article/docs with the specific guidance on longer running requests at the moment.)

Related

Use Google Cloud Functions to speed up GAE app

I have a GAE standard Python app that does some fairly computational processing. I need to complete the processing within the 60 second request time limit, and ideally I'd like to do it faster for a better user experience.
Splitting the work to multiple threads don't seem to be a good solution because the threads would likely run on the same CPU and thus wouldn't give a speed up.
I was wondering if Google Cloud Functions (GCF) could be used in a similar manner as threads. For example, if I create a GCF to do the processing, split my work into 10 chunks, and make 10 GCF calls in parallel, can I expect to get results 10x faster? (aside from latency and GCF startup costs)
Each function invocation runs in its own server instance, and a function will scale up to 1000 instances to handle concurrent requests in parallel. So yes, you can do this, if you are willing to potentially pay the cold start cost of each server instance as it's allocated for its first request.
If you're able to split the workload in smaller chunks that you'd be launching in parallel via separate (external) requests I'd suspect you'd get a better performance (and cost) by using GAE itself (maybe in a separate service) instead of CFs:
GAE standard environment instances can have higher CPU speeds - a B8 instance has 4.8 GHz, the max CF CPU speed is 2.4 GHz
you have better control over the GAE scaling configuration and starting time penalties
I suspect networking delays would be at least the same if not better on GAE - not going to another product infra (unsure though)
GAE costs would likely be smaller since you pay per instance hours (regardless of how many requests the instance handles) not per request/invocations

How to Gain Visibility and Optimize Quota Usage in Google App Engine?

How do I go about optimizing my Google App Engine app to reduce instance hours I am currently using/paying for?
I have been using app engine for a while and the cost has been creeping upwards. I now spend enough on GAE to invest time into reducing the expense. More than half of my GAE bill is due to frontend instance hours, so it's the obvious place to start. But before I can start optimizing, I have to figure out what's using the instance hours.
However, I am having difficulty trying to determine what is currently using so many of my frontend instance hours. My app serves many ajax requests, dynamic HTML pages, cron jobs, and deferred tasks. For all I know there could be some runaway process that is causing my instance usage to be so high.
What methods or techniques are available to allow me to gain visibility into my app to see where I am using instance hours?
Besides code changes (all suggestions in the other answer are good) you need to look into the instances over time graph.
If you have spikes and constant use, the instances created during the spikes wont go to sleep because appengine will keep using them. In appspot application settings, change the "idle instances" max to a low number like 1 (or your actual daily average).
Also, change min latency to a higher number so less instances will be created on spikes.
All these suggestions can make an immediate effect on lowering your bill, but its just a complement to the code optimizations suggested in the other answer.
This is a very broad question, but I will offer a few pointers.
First, examine App Engine's console Dashboard and logs. See if there are any errors. Errors are expensive both in terms of lost business and in extra instance hours. For example, tasks are retried several times, and these reties may easily prolong the life of an instance beyond what is necessary.
Second, the Dashboard shows you the summary of your requests over 24 hours period. Look for requests with high latency. See if you can improve them. This will both improve the user experience and may reduce the number of instance hours as more requests can be handled by each instance.
Also look for data points that surprise you as a developer of your app. If you see a request that is called many more times that you think is normal, zero in on it and see what it is happening.
Third, look at queues execution rates. When you add multiple tasks to a queue, do you really need all of them to be executed within seconds? If not, reduce the execution rate so that the queue never needs more than one instance.
Fourth, examine your cron jobs. If you can reduce their frequency, you can save a bunch of instance hours. If your cron jobs must run frequently and do a lot of computing, consider moving them to a Compute Engine instance. Compute Engine instances are many times cheaper, so having such an instance run for 24 hours may be a better option than hitting an App Engine instance every 15 minutes (or even every hour).
Fifth, make sure your app is thread-safe, and your App Engine configuration states so.
Finally, do the things that all web developers do (or should do) to improve their apps/websites. Cache what can be cached. Minify what needs to be minified. Put images in sprites. Split you code if it can be split. Use Memcache. Etc. All of these steps reduce latency and/or client-server roundtrips, which helps to reduce the number of instances for the same number of users.
Ok, my other answer was about optimizing at the settings level.
To trace the performance at a granular level use the new cloud trace relased today at google i/o 2014.
http://googledevelopers.blogspot.com/2014/06/cloud-platform-at-google-io-enabling.html

App Engine loading request even when idle instance available

I have a simple app running on App Engine but I'm having odd problems with latency. It's a Python 2.7 app and a loading request takes between 1.5 and 10 secs (I guess depending on how GAE is feeling). This is a low traffic site right now, so previously GAE was sitting with no idle instances and most request were loading requests, resulting in a long wait time on the first page view.
I've tried configuring the minimum number of idle instances to "1" so that these infrequent page views can immediately hit a warm instance.
However, I've seen several cases now where even with one instance sitting unused, GAE will route an incoming request to a loading instance, leaving the warm instance untouched:
gae dashboard showing odd scheduling
How can I prevent this from happening? I feel I must be understanding something wrong, because I certainly don't expect this behavior.
Update: Also, what makes this even less comprehensible is that the app has threadsafe enabled, so I really don't understand why GAE would get flustered and spin up an instance for a single, lone request.
Actually, I believe this is normal behavior. Idle instances are supposed to guarantee a minimum number of instances always available (for spiky load).
So, when some requests start coming in, they are initially served by idle instances, but at the same time AE scheduler will start launching new instances to always guarantee the same amount of idle instances even during suddenly increased load. That is, to "cover" for those idle instances that became busy serving requests.
It is described in details on Adjusting Application Performance page.
Arrrgh! Suffer from this myself. This topic-area has come up in several threads (GAE groups & SO). If someone can dial-in the settings for a low-traffic site (billing on/off), that would be a real benefit. IIRC, someone with what I think is deep GAE experience noted in one thread that the Scheduler does not do well with very low volume apps. I have also seen wildly different startup times within a relatively short period of time. Painful to see a spinup take 700ms then 7000ms just a few minutes later. Overall the issue is not so much the cost to me, but more so the waste of infrastructure resources. In testing I've had two instances running despite having pinged the app with an RPC once every few minutes. If 50k other developers are similarly testing, that could accumulate into a significant waste.

Is app engine more expensive when it's slower?

There have been quite a few occasions recently when app engine appears to run slower. To some degree that's understandable with the architecture of their cloud platform. I'm not talking about new server instances - just requests to warm servers. I'm also just referring to CPU, not datastore API, but I do wonder about that as well.
It seems that during these slow periods I get a lot more yellow warnings on my requests - saying I am using a lot of CPU. Certainly they take longer to complete during this period. What concerns me is that during these slow periods, my billable CPU seems to go up.
So to be clear - when app engine is fast, a request might complete in 100ms. In a slow period, it might take more than 1s for the same request. Same URI, same caching, same processing path, same datastore, same indexes - much more CPU. The yellow warnings, as I understand it, are referring to billable CPU usage, and there's many more of them when app engine is slower.
This seems to set up a bizarre situation where my app costs more to run when app engine performance is worse. This means google makes more money the more poorly the platform performs (up to the point where it fails or customers leave). Maybe I've got the situation all wrong, and it doesn't work like that - but if it does work like that, then as a customer the pressures and balances there are all wrong. That's not intimating any wrong-doing on google's part - just that the relationships between those two things don't seem right.
It almost seems like google's algorithm goes something like - 'If I give a processing job to a CPU and start my watch, then stop it when the job returns I get the billable CPU figure.' i.e. it doesn't measure CPU work at all. Surely that time should be divided by the number of processing jobs being concurrently executed plus some extra to cover the additional context switching. I'm sure that stuff is hard to measure - perhaps that's the reason.
I guess you could argue it is fair that you pay more when app engine is in high demand, but that makes budgeting close to impossible - you can't generate stats like '100 users costs me $1 a day', because that could change for a whole host of reasons - including app engine onboarding more customers than the infrastructure can realistically handle. If google over-subscribes app engine then all customers pay more - it's another relationship that doesn't sound right. Surely google's costs should go down as they onboard more customers, and those customers use more resources - based on economies of scale.
Should I expect two identical requests in my app to cost me roughly the same amount each time they run - regardless of how much wall-time app engine takes to actually complete them? Have I misunderstood how this works? If I haven't, is there a reason why I shouldn't be worried about it in the long term? Is there some documentation which makes this situation clearer? Cheers,
Colin
It would be more complicated, but they could change the billing algorithm to be a function of load. Or perhaps they could normalize the CPU measurements based on the performance of similar calls in the past.
I agree that this presents problems for the developers.
Yes this is true. It is a bummer. It also takes them over a second to start up my Java application (which I was billed for) every time they decided my site was in low demand, and didn't need the resources.
I ended up using a cron to auto ping my site every minute to keep it warm.. doing all the wasted work made my bill cheaper, as it didn't have the startup time, instead it just had lots of 2ms pings...
This question appears old and I think the pricing scheme must have changed...
The Google App Engine charges for "instance hours" and the instances currently spawned are viewable in the GAE console. And Google provides adjustments so you can decide cost vs latency for your app.
https://developers.google.com/appengine/docs/adminconsole/performancesettings
I did noticed that if the front-end is bogged down hitting a common backend resource that GAE will spawn a bunch of instances to get latency down. And you will pay for those instance hours even though latency/throughput doesn't improve. The adjustments I mentioned seem to help with that.

What advice can you give me for writing a meaningful benchmark?

I have developed a framework that is used by several teams in our organisation. Those "modules", developed on top of this framework, can behave quite differently but they are all pretty resources consuming even though some are more than others. They all receive data in input, analyse and/or transform it, and send it further.
We planned to buy new hardware and my boss asked me to define and implement a benchmark based on the modules in order to compare the different offers we have got.
My idea is to simply start sequentially each module with a well chosen bunch of data as input.
Do you have any advice? Any remarks on this simple procedure?
Your question is pretty broad, so unfortunately my answer will not be very specific either.
First, benchmarking is hard. Do not underestimate the effort necessary to produce meaningful, repeatable, high-confidence results.
Second, what is your performance goal? Is it throughput (transaction or operations per second)? Is it latency (time it takes to execute a transaction)? Do you care about average performance? Do I care about worst case performance? Do you care about the absolute worst case or I care that 90%, 95% or some other percentile get adequate performance?
Depending on which goal you have, then you should design your benchmark to measure against that goal. So, if you are interested in throughput, you probably want to send messages / transactions / input into your system at a prescribed rate and see if the system is keeping up.
If you are interested in latency, you would send messages / transactions / input and measure how long it takes to process each one.
If you are interested in worst case performance you will add load to the system until up to whatever you consider "realistic" (or whatever the system design says it should support.)
Second, you do not say if these modules are going to be CPU bound, I/O bound, if they can take advantage of multiple CPUs/cores, etc. As you are trying to evaluate different hardware solutions you may find that your application benefits more from a great I/O subsystem vs. a huge number of CPUs.
Third, the best benchmark (and the hardest) is to put realistic load into the system. Meaning, you record data from a production environment, and put the new hardware solution through this data. Getting this done is harder than it sounds, often, this means adding all kinds of measure points in the system to see how it behaves (if you do not have them already,) modifying the existing system to add record/playback capabilities, modifying the playback to run at different rates, and getting a realistic (i.e., similar to production) environment for testing.
The most meaningful benchmark is to measure how your code performs under everyday usage. That will obviously provide you with the most realistic numbers.
Choose several real-life data sets and put them through the same processes your org uses every day. For extra credit, talk with the people that use your framework and ask them to provide some "best-case", "normal", and "worst-case" data. Anonymize the data if there are privacy concerns, but try not to change anything that could affect performance.
Remember that you are benchmarking and comparing two sets of hardware, not your framework. Treat all of the software as a black box and simply measure the hardware performance.
Lastly, consider saving the data sets and using them to similarly evaluate any later changes you make to the software.
If you're system is supposed to be able to handle multiple clients all calling at the same time, then your benchmark should reflect this. Note that some calls will not play well together. For example, having 25 threads post the same bit of information at the same time could lead to locks on the server end, thus skewing your results.
From a nuts-and-bolts point of view, I've used Perl and its Benchmark module to gather the information I care about.
If you're comparing differing hardware, then measuring the cost per transaction will give you a good comparison of the trade offs of hardware for performance. One configuration may give you the best performance, but costs too much. A less expensive configuration may give you adequate performance.
It's important to emulate the "worst case" or "peak hour" of load. It's also important to test with "typical" volumes. It's a balancing act to get good server utilization, that doesn't cost too much, that gives the required performance.
Testing across hardware configurations quickly becomes expensive. Another viable option is to first measure on the configuration you have, then simulate that behavior across virtual systems using a model.
If you can, try to record some operations users (or processes) are doing with your framework, ideally using a clone of the real system. That gives you the most realistic data. Things to consider:
Which functions are most often used?
How much data is transferred?
Do not assume anything. If you think "that is going to be fast/slow", don't bet on it. In 9 out of 10 cases, you're wrong.
Create a top ten for 1+2 and work from that.
That said: If you replace old hardware with new hardware, you can expect roughly 10% faster execution for each year that has passed since you bought the first set (if the systems are otherwise pretty equal).
If you have a specialized system, the numbers may be completely different but usually, new hardware doesn't change much. For example, adding an useful index to a database can reduce the runtime of a query from two hours to two seconds. Hardware will never give you that.
As I see it, there are two kinds of benchmarks when it comes to benchmarking software. First, microbenchmarks, when you try to evaluate a piece of code in isolation or how a system deals with narrowly defined workload. Compare two sorting algorithms written in Java. Compare two web browsers how fast can each perform some DOM manipulation operation. Second, there are system benchmarks (I just made the name up), when you try to evaluate a software system under a realistic workload. Compare my Python based backend running on Google Compute Engine and on Amazon AWS.
When dealing with Java and such like, keep in mind that the VM needs to warm up before it can give you realistic performance. If you measure time with the time command, the JVM startup time will be included. You almost always want to either ignore start-up time or keep track of it separately.
Microbenchmarking
During the first run, CPU caches are getting filled with the necessary data. The same goes for disk caches. During few subsequent runs the VM continues to warm up, meaning JIT compiles what it deems helpful to compile. You want to ignore these runs and start measuring afterwards.
Make a lot of measurements and compute some statistics. Mean, median, standard deviation, plot a chart. Look at it and see how much it changes. Things that can influence the result include GC pauses in the VM, frequency scaling on the CPU, some other process may start some background task (like virus scan), OS may decide move the process on a different CPU core, if you have NUMA architecture, the results would be even more marked.
In case of microbenchmarks, all of this is a problem. Kill what processes you can before you begin. Use a benchmarking library that can do some of it for you. Like https://github.com/google/caliper and such like.
System benchmarking
In case of benchmarking a system under a realistic workload, these details do not really interest you and your problem is "only" to know what a realistic workload is, how to generate it and what data to collect. It is always best if you can instrument a production system and collect data there. You can usually do that, because you are measuring end-user characteristics (how long did a web page render) and these are I/O bound so the code gathering data does not slow down the system. (The page needs to be shipped to the user over the network, it does not matter if we also log a few numbers in the process).
Be mindful of the difference between profiling and benchmarking. Benchmarking can give you absolute time spent doing something, profiling gives you relative time spent doing something compared to everything else that needed doing. This is because profilers run heavily instrumented programs (common technique is to stop-the-world every few hundred ms and save a stack trace) and the instrumentation slows everything down significantly.

Resources