Take these stats from a post from the App Engine blog as an example:
real = 107ms
cpu = 141ms
api = 388ms
overhead = 1ms
RPC Total: 63ms (388ms api)
Grand Total: 107ms (530ms cpu + api)
I think I understand overhead: it gives the amount of time taken to write the logs, excluding the time it took to store the logs in memcache.
I am confused by the other numbers:
What exactly do real, cpu and api mean?
How is api different from RPC total?
What is the "Grand Total"?
This is my understanding:
real is the time as measured by a clock. This is time elapsed.
api usage is the time spent on RPCs, such as accessing the datastore. This is not truly a time, but some amount of computing resources measured in time.
cpu usage is the time spent executing code. Again, it's not really a time but resource usage as measured in time.
api differs from RPC Total only in that the RPC Total line also shows the amount of clock time that elapsed during those API calls. It's possible to do 388ms of API work in 63ms of clock time because of parallelism. So RPC Total shows both the clock time spent and the resource usage.
Grand Total is the total wall time (the same as real), shown alongside the sum of cpu, api, and overhead. In this case, 530ms of quota were used in 107ms of wall time.
overhead is, of course, time "wasted" rather than spent on the "real" work of the request. This mostly consists of the resources taken by Appstats itself.
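As a quick sanity check on that reading, the numbers above add up exactly that way. A small illustrative sketch (these variable names are not part of any App Engine API):

```python
# Reproducing the arithmetic behind the Appstats summary above (illustrative only).
real_ms = 107      # wall-clock time for the request
cpu_ms = 141       # CPU resource usage, expressed in milliseconds
api_ms = 388       # API/RPC resource usage, expressed in milliseconds
overhead_ms = 1    # time spent by Appstats itself

grand_total_ms = cpu_ms + api_ms + overhead_ms
print(grand_total_ms)  # 530 -> "Grand Total: 107ms (530ms cpu + api)"
print(real_ms)         # 107 -> resources worth 530ms were consumed in 107ms of wall time
```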
See the document Appstats: RPC Instrumentation for Google App Engine by Guido van Rossum for details.
Guido van Rossum gave a talk at Google I/O 2010 called Appstats - Instrumentation for App Engine where he discusses this briefly. It's a great talk to learn about App Engine, and optimization and instrumentation in general. It's about an hour long.
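For reference, enabling Appstats in a Python app is just a few lines of WSGI middleware configuration, roughly as shown in the Appstats documentation:

```python
# appengine_config.py -- the hook App Engine's Python runtime loads on startup.
# Wrapping the WSGI app with the Appstats recorder is what produces the
# real/cpu/api/overhead numbers discussed above.
def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
```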
Related
I created a site on App Engine and chose the smallest instance class, F1, which according to the docs has a CPU limit of 600 MHz.
As a test I limited the app to one instance only, let it run for several days, and then checked the CPU utilization on the dashboard. Here's part of the chart:
As you can see, the utilization, which is given in Megacycles/sec (which I assumed was equivalent to MHz), is mostly between roughly 700 and 1500.
The app uses only one F1 instance, runs without problems, and there are no quota errors, so what does the 600 MHz CPU limit mean if the utilization is usually above it?
Megacycles/sec is not MHz in this graph. As explained in Interface QuotaService:
Measures the duration that the current request has spent so far processing the request within the App Engine sandbox. Note that time spent in API calls will not be added to this value. The unit the duration is measured is Megacycles. If all instructions were to be executed sequentially on a standard 1.2 GHz 64-bit x86 CPU, 1200 megacycles would equate to one second physical time elapsed.
In App Engine Flex you get an entire CPU core from the machine you are renting, but in App Engine Standard the dashboard shows megacycles, because your code runs in a sandbox.
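To make the quoted conversion concrete, here is a small sketch (the function name is just for illustration):

```python
# The quoted rule: 1200 megacycles equate to one second of physical time on a
# reference 1.2 GHz CPU. Dashboard megacycle figures measure work done, not a
# clock frequency, so they can exceed the instance's nominal MHz.
REFERENCE_MEGACYCLES_PER_SECOND = 1200.0

def megacycles_to_reference_seconds(megacycles):
    """Convert a megacycle charge into seconds on the 1.2 GHz reference CPU."""
    return megacycles / REFERENCE_MEGACYCLES_PER_SECOND

# e.g. a request charged 600 megacycles corresponds to ~0.5 s of reference
# CPU time, however long it actually took in wall-clock terms.
print(megacycles_to_reference_seconds(600))  # 0.5
```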
Note that there is a feature request in the issue tracker to add a CPU% metric under gae_app for App Engine standard, and I have relayed your concern about it to the App Engine product team. However, there is no guarantee of implementation or an ETA at this time. I recommend starring the ticket so that you will receive updates about it.
I have attached two Google Stackdriver Profiler captures (wall time and CPU time) for our backend server.
The backend API simply tries to read from Memcache first; if the data doesn't exist there or the call times out, it retrieves the data from Bigtable.
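In Python pseudocode, that read path is a read-through cache along these lines (a sketch only: the real backend is Java, fetch_from_bigtable is a hypothetical placeholder, and App Engine's Python memcache API is assumed purely for illustration):

```python
# Minimal sketch of the read path described above (illustrative only).
from google.appengine.api import memcache  # assumed here; the real service is Java

CACHE_TTL_SECONDS = 300  # assumed TTL, not taken from the original post

def fetch_from_bigtable(key):
    """Hypothetical stand-in for the Bigtable read; not real client code."""
    ...

def get_record(key):
    """Try Memcache first; fall back to Bigtable on a miss or an error/timeout."""
    try:
        cached = memcache.get(key)
        if cached is not None:
            return cached
    except Exception:
        # Treat a Memcache timeout or error the same as a cache miss.
        pass

    value = fetch_from_bigtable(key)
    memcache.set(key, value, time=CACHE_TTL_SECONDS)
    return value
```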
Based on the wall time and CPU time profiles, I'd like to know
Why does the libjvm_so process consume so much (45.8%) of the CPU time? Is it because the server allocates a lot of memory, causing garbage collection to use a lot of CPU?
The wall time profile shows 97% of thread time is spent waiting for a resource. I guess it's waiting for the Memcache/Bigtable servers to return data; is that right? Does this mean the server doesn't need that much CPU? (Currently 16 CPUs, with an average load of 35%.)
Any other insights? What could be improved, etc.?
I have an App Engine application with some services based on the webapp2 framework and some based on the endpoints-v2 framework.
The issue I am facing is that sometimes the OPTIONS request sent from the front end takes a huge amount of time to get a response back, varying from 10 to 15 seconds, which adds latency to my entire application. On digging deeper into the issue, I found that it is the instance startup time that is costing us this much latency.
So my questions are:
Does starting up an instance really take this much time?
If not, how can I reduce the startup time for my instances?
How do instances start, so that I can optimise those situations in my code?
Java instances take a long time to spin up. You can hide the latency by configuring warmup requests and min-idle-instances (see here) in your appengine-web.xml.
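A minimal appengine-web.xml sketch of those two settings might look like this (values are illustrative; check the current reference for your Java runtime):

```xml
<!-- Illustrative sketch only: enable warmup requests and keep an idle
     instance resident so user-facing requests rarely hit a cold start. -->
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <runtime>java8</runtime>
  <threadsafe>true</threadsafe>

  <!-- Warmup requests let App Engine initialize a new instance
       (load classes, open connections) before routing user traffic to it. -->
  <inbound-services>
    <service>warmup</service>
  </inbound-services>

  <!-- One resident idle instance absorbs traffic while new instances spin up. -->
  <automatic-scaling>
    <min-idle-instances>1</min-idle-instances>
  </automatic-scaling>
</appengine-web-app>
```

The warmup request hits /_ah/warmup, so any custom initialization you want done before user traffic arrives can go in a handler mapped to that path.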
The GAE dashboard shows stats for the different URIs of your app, including Req/Min, Requests, Runtime MCycles, and Avg Latency.
The help provided seems to be outdated; here is what it states:
The current load table provides two data points for CPU usage, "Avg CPU (API)" and "% CPU". The "Avg CPU (API)" displays the average amount of CPU a request to that URI has consumed over the past hour, measured in megacycles. The "% CPU" column shows the percentage of CPU that URI has consumed since midnight PST with respect to the other URIs in your application.
So I assume Runtime MCycles is what the help calls Avg CPU (API)?
How do I map this number to the request stats in the logs?
For example one of the requests has this kind of logs: ms=583 cpu_ms=519 api_cpu_ms=402.
Do I understand correctly that ms includes cpu_ms and cpu_ms includes api_cpu_ms?
So then cpu_ms is the Runtime MCycles which is shown as the average for the given URI on the dashboard?
I have an F1 instance with 600 MHz and concurrency enabled for my app. Does that mean this instance's throughput is 600 megacycles per second? So if an average request takes 100 megacycles, it should handle 5-6 requests per second on average?
I am digging into this to try to predict the costs for my app under load.
This blog post (by Nick Johnson) is a useful summary of what the request log fields mean: http://blog.notdot.net/2011/06/Demystifying-the-App-Engine-request-logs
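As a rough sanity check of the arithmetic in the question above, here is an illustrative sketch under the question's own assumptions (not an official cost model, and it ignores API wait time and concurrency effects):

```python
# Back-of-the-envelope throughput estimate for an F1 instance, assuming the
# 600 MHz limit can be read as ~600 megacycles of work per second.
F1_MEGACYCLES_PER_SECOND = 600.0

def requests_per_second(avg_request_megacycles):
    """How many such requests one F1 instance could process per second."""
    return F1_MEGACYCLES_PER_SECOND / avg_request_megacycles

print(requests_per_second(100))  # 6.0 -> roughly the 5-6 req/s the question estimates
```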
How much of a compute-intensive gain can one expect on GAE MapReduce? The scenario of interest to me is compute intensive, so for example: multiplying a trillion random floats in a single threaded single core application. Then imagine 1000 MapReduce workers multiplying a billion random numbers each and announcing "finished" when all workers have finished. Assume billing is enabled if that matters. (It might not).
Edit: A commenter asked for clarification. The title has been revised. If the task takes 50,000 seconds single-threaded, and in an alternative implementation 1000 MapReduce workers are employed and they finish after 500 seconds, then the performance gain is 100 times. 1000 workers for a 100-times gain is only slightly disappointing, but so be it for this example. How can I get finished sooner? Can I ask for 10,000 workers? This question may have to do with limits and quotas. Assume an adequate budget. Does MapReduce's compute-intensive performance gain head toward an asymptote, and if so, what is the performance gain at that asymptote?
There was also information in the comment about MapReduce being suitable for large amounts of data generated by a user-facing URL; however, my question is not about a Datastore-intensive application's performance versus the same application rewritten for MapReduce. Datastore activity will be minimal in this compute-intensive scenario. I realize there will always be some Datastore activity in any MapReduce application, but since this is a compute-intensive scenario, the Datastore activity and the size of the Datastore entities will not be a big influence on the performance gain calculated. The task will use the Datastore for less than 1% of the elapsed time. Nor does the scenario involve a large amount of communication bandwidth (other than the minimum necessary to hit the task-queue URLs that MapReduce uses).
The question is about comparing a compute-intensive single-threaded non-MapReduce task's elapsed time to the same task's elapsed time on MapReduce, which is inherently multi-threaded given there are multiple workers. I use the word "task" generically; in other words, "task" means "work". The gain might (but not necessarily) be a function of the number of workers, hence the 1000 workers in the example.
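In code, the gain figure in the example is just the usual speedup/efficiency arithmetic (shown only to pin down the terminology):

```python
# The speedup/efficiency arithmetic behind the example above.
def speedup(serial_seconds, parallel_seconds):
    return serial_seconds / parallel_seconds

def efficiency(gain, workers):
    return gain / workers

gain = speedup(50000, 500)     # 100.0 -> the "100 times gain" in the example
eff = efficiency(gain, 1000)   # 0.1   -> each of the 1000 workers is ~10% effective
print(gain, eff)
```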
It's not clear exactly what you're asking here. Are you asking how efficient it is? How cheap it is? How fast it is?
In general, App Engine is designed for serving user-facing sites, and the App Engine mapreduce API exists to assist with that - processing large amounts of data generated by the user-facing site. If you have a large amount of data that's hosted outside App Engine, and you want to do some sort of large-scale data processing on it, App Engine is probably not the tool for you.
Regarding performance, you can expect each worker to execute tasks as fast as they would be if you were executing them serially, so your items-per-second is roughly the number of workers multiplied by the regular rate - there's relatively little overhead. There can be some delay at the end, though, when different workers finish at different times, and how much this is depends on how good a job mapreduce does of sharding your data. With datastore input, this used to be fairly poor, but it's a lot better now.
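A rough way to model that statement, as an assumption-laden sketch rather than a measured App Engine benchmark:

```python
# Rough model of the claim above: each worker runs at roughly the serial rate,
# plus some tail time while the slowest shards finish.
def estimated_parallel_seconds(total_items, serial_items_per_second,
                               workers, straggler_tail_seconds=0.0):
    return total_items / (serial_items_per_second * workers) + straggler_tail_seconds

# The question's numbers: 10**12 multiplications in 50,000 s single-threaded.
serial_rate = 10**12 / 50000.0
print(estimated_parallel_seconds(10**12, serial_rate, 1000))   # 50.0 s ideal
print(estimated_parallel_seconds(10**12, serial_rate, 10000))  # 5.0 s ideal
```

Under this model, the 500 s observed in the question's example corresponds to a large overhead/straggler term, which is exactly where better sharding helps.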
As to how many mappers you can have, that depends on a number of things: Whether or not your app has billing enabled, how much other traffic your app gets, and how long your mapper tasks take per element. The only real way to determine this is to experiment a bit.