Is it possible to track only slow requests with appstats? - google-app-engine

Our application processes several dozen requests per second, and a small portion of them take significantly more time to process than the others. We would like to profile those slow requests, but Appstats seems to keep only a small window of processed requests, so the ones we are interested in disappear very quickly. Is it possible to configure Appstats to keep a log of only the requests that take longer than a specified threshold?

A detailed list of Appstats configuration options is given in the Sample Appstats Configuration Example in the GAE SDK. Based on that file, it seems that it is currently not possible to capture requests based on their execution time.
Hope this helps.

Unfortunately, there's no built-in mechanism for this.
You could add it yourself by monkeypatching the end_recording method, or the Recorder.save method.
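A minimal sketch of the monkeypatching idea, not a tested Appstats integration. It assumes the recorder class exposes a `save()` method (as the answer says) and that each instance carries a `start_timestamp` attribute set when the request began; that attribute name is an assumption to check against your SDK version. `save_only_if_slow`, `StandInRecorder` and the threshold value are hypothetical names used only for illustration.

```python
import functools
import time


def save_only_if_slow(recorder_cls, threshold_seconds, clock=time.time):
    """Wrap recorder_cls.save so it only runs for requests slower than the threshold.

    Assumes instances carry a `start_timestamp` attribute (seconds since epoch).
    """
    original_save = recorder_cls.save

    @functools.wraps(original_save)
    def patched_save(self, *args, **kwargs):
        elapsed = clock() - self.start_timestamp
        if elapsed < threshold_seconds:
            return None  # fast request: skip persisting the record
        return original_save(self, *args, **kwargs)

    recorder_cls.save = patched_save
    return original_save  # returned so the patch can be undone


class StandInRecorder(object):
    """Hypothetical stand-in for the Appstats recorder, for local testing only."""

    def __init__(self, start_timestamp):
        self.start_timestamp = start_timestamp
        self.saved = False

    def save(self):
        self.saved = True
```

In a real app the patch would be applied once at import time (for example from `appengine_config.py`) to the actual recorder class instead of the stand-in.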

Related

How Can You Determine When a Request Started on GAE Managed VM?

On Google App Engine, there are multiple ways a request can start: a web request, a cron job, a taskqueue, and probably others as well.
How could you (especially on Managed VM) determine the time when your current request began?
One solution is to instrument all of your entry points, and save the start time somewhere, but it would be nice if there was an environment variable or something that told when the request started. The reason this is important is because many GAE requests have deadlines (either 60 seconds or 10 minutes in various scenarios), and it's helpful to determine how much time you have left in a request when you are doing some additional work.
We don't specifically expose anything that lets you know how much time is left on the current request. You should be able to do this by recording the time at the entrypoint of a request, and storing it in a thread local static.
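A rough sketch of that approach in Python, assuming one thread per request. The function names and the injected `clock` parameter are hypothetical; the deadline value is whatever applies to your request type.

```python
import threading
import time

# Per-thread storage, so concurrent requests do not overwrite each other.
_request_state = threading.local()


def mark_request_start(clock=time.time):
    """Call at every entry point (web handler, cron, task) before doing work."""
    _request_state.started_at = clock()


def seconds_remaining(deadline_seconds, clock=time.time):
    """Return the time left before the deadline, or None if the start was never marked."""
    started_at = getattr(_request_state, "started_at", None)
    if started_at is None:
        return None
    return max(0.0, deadline_seconds - (clock() - started_at))
```

A handler would call `mark_request_start()` first, then check `seconds_remaining(60)` before starting each additional piece of work.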
The need for this sounds... questionable. Why are you doing this? It may be a better idea to use a worker / queue pattern with polling for something that could take a long time.
You can see all this information in the logs in your Developer console. You can also add more data to the logs in your code, as necessary.
See Writing Application Logs.

Appengine responses becoming slower?

My AJAX calls to App Engine, which do some very basic logic (all the actual processing happens in the background, isolated from the frontend), are at least 200% slower than they used to be. For the past week or so they have suddenly been taking 3 seconds instead of one.
I am wondering if you have had a similar experience, or if something has changed in the meantime that I am not aware of, quota-wise maybe. I am using the free quota.
Thanks
Zac
To my knowledge there is no particular change going on, but we can't be sure. However, slow response times can have multiple root causes.
If you have no traffic on your application, then you might have zero instances running, so your request has to wait for an instance to start up.
If you have a lot of traffic, then depending on your configuration the request can take more time. You need to fine-tune whether a request waits to be handled by an "overloaded" instance or whether another instance should start.
If you use an API, maybe there is something wrong with it.
I would suggest you enable Appstats in your app. It will show you what takes time in your request, so you will see whether the problem is on your side or not.

Identify why Google app engine is slow

I developed an application for a client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes it is crazy slow. It can take around 30 seconds to load a simple page, but at other times it runs faster, with no code change whatsoever.
Is there any way to identify why it's running slow? I tried to contact support but I couldn't find any telephone number or email. There has also been no response on the official Google group.
How would you approach this problem? Currently my customer is very angry because of the slow loading times, but switching to another provider is a last resort at the moment.
Use GAE Appstats to profile your remote procedure calls. All RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the number of RPCs or cache their results, do so, and your application will be much faster. Appstats will show you which parts are slow and need attention.
For example, I created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. RPCs are a bottleneck on GAE.
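To illustrate the caching idea only, here is a small sketch that swaps in a per-instance, in-memory cache with a time-to-live; it is not the Google Cloud Storage cache the answer describes. `TTLCache`, `get_or_fetch` and the `fetch` callback are hypothetical names; `fetch` stands for whatever slow RPC you want to avoid repeating.

```python
import time


class TTLCache(object):
    """Tiny in-memory cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no RPC issued
        value = fetch(key)  # cache miss: pay for the slow call once
        self._entries[key] = (now + self._ttl, value)
        return value
```

Because the dictionary lives in one instance's memory, each instance warms its own copy; a shared cache would avoid that at the cost of another (faster) RPC.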
Google does not usually provide a support contact for many of its services. The slowness described here is probably caused by cold starts. App Engine front-end instances go to sleep after about 15 minutes. You could write a cron job that pings the app every 14 minutes to keep an instance up.
Combining some answers and adding a few things to check:
Debug using Appstats. Look for "staircase" situations and RPC calls. Maybe something in your app is triggering RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, probably for around the time frame (30 seconds or more) you describe. It will seem random. It's not just how many instances you have, but what combination of slider settings you are using (you can actually hurt yourself with too few or too many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating and freeing memory can be slow and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally, tweak the JVM -Xmx and -Xms settings, and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can, such as sending emails, to background tasks. The intervals may seem random, but they can be a result of things happening depending on your job settings. 9 am every day may not mean what you think, depending on the cron/task options. A corollary - move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
By making at least one instance permanent, you get a great improvement on first use. It takes about 15 seconds to load the application into an instance, which is why you experience long request times when nobody has been using the application for a while.

State of Map-Reduce on Appengine?

There is appengine-mapreduce, which seems to be the official way to do things on App Engine. But there seems to be no documentation besides some hacked-together wiki pages and lengthy videos. There are statements that the library only supports the map step, but the source indicates that there are also implementations for shuffle.
A version of this appengine-mapreduce library also seems to be included in the SDK, but it is not blessed for public use. So you are basically expected to load the library twice into your runtime.
Then there is appengine-pipeline. "A primary use-case of the API is connecting together various App Engine MapReduces into a computational pipeline." But there also seems to be pipeline-related code in the appengine-mapreduce library.
So where do I start to find out how this all fits together? Which is the library to call from my project? Is there any decent documentation on appengine-mapreduce besides parsing change logs?
Which is the library to call from my project.
They serve different purposes, and you've provided no details about what you're attempting to do.
The most fundamental layer here is the task queue, which lets you schedule background work that can be highly parallelized. This is fan-out. Let's say you had a list of 1000 websites, and you wanted to check the response time for each one and send an email for any site that takes more than 5 seconds to load. By running these as concurrent tasks, you can complete the work much faster than if you checked all 1000 sites in sequence.
Now let's say you don't want to send an email for every slow site, you just want to check all 1000 sites and send one summary email that says how many took more than 5 seconds and how many took fewer. This is fan-in. It's trickier with the task queue, because you need to know when all tasks have completed, and you need to collect and summarize their results.
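The fan-out and fan-in steps can be sketched locally in Python. This uses a thread pool as a stand-in for the task queue, so it shows the shape of the workflow rather than App Engine's API; `summarize_slow_sites` and the `measure_seconds` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_slow_sites(sites, measure_seconds, threshold_seconds=5.0, workers=8):
    """Fan out one measurement per site, then fan in to a single summary."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fan-out: each site is measured independently and concurrently.
        timings = list(pool.map(measure_seconds, sites))
    # Fan-in: runs only after every measurement has completed.
    slow = sum(1 for t in timings if t > threshold_seconds)
    return {"slow": slow, "fast": len(timings) - slow}
```

The thread pool makes the hard part of fan-in look trivial, because `pool.map` blocks until every result is in. With independent task queue tasks there is no such built-in barrier, which is the gap described above.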
Enter the Pipeline API. The Pipeline API abstracts the task queue to make fan-in easier. You write what looks like synchronous, procedural code, but it uses Python futures and is executed (as much as possible) in parallel. The Pipeline API keeps track of task dependencies and collects results to facilitate building distributed workflows.
The MapReduce API wraps the Pipeline API to facilitate a specific type of distributed workflow: mapping the results of a piece of work into a set of key/value pairs, and reducing multiple sets of results to one by combining their values.
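As a single-process illustration of the map, shuffle and reduce steps (not the appengine-mapreduce API), the pattern might look like the sketch below. All names here are hypothetical, and the site-timing data reuses the example from earlier in this answer.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over records, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map: emit (key, value) pairs
            groups[key].append(value)      # shuffle: collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}


def bucket_by_speed(timing):
    """Hypothetical mapper: label each load time as slow or fast."""
    site, seconds = timing
    yield ("slow" if seconds > 5.0 else "fast", 1)


def count(key, values):
    """Hypothetical reducer: add up the 1s emitted for a key."""
    return sum(values)
```

Calling `map_reduce(timings, bucket_by_speed, count)` on a list of `(site, seconds)` pairs returns one count per bucket. The distributed version runs the same three phases, but spreads the mapper and reducer calls across many tasks.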
So they provide increasing layers of abstraction and convenience around a common system of distributed task execution. The right solution depends on what you're trying to accomplish.
There is official documentation here: https://developers.google.com/appengine/docs/java/dataprocessing/

working with new channel creation limits

Google app engine seems to have recently made a huge decrease in free quotas for channel creation from 8640 to 100 per day. I would appreciate some suggestions for optimizing channel creation, for a hobby project where I am unwilling to use the paid plans.
It is specifically mentioned in the docs that there can be only one client per channel ID. It would help if there were a way around this, even if it were only for multiple clients on one computer (such as multiple tabs).
It occurred to me I might be able to simulate channel functionality by repeatedly sending XHR requests to the server to check for new messages, therefore bypassing limits. However, I fear this method might be too slow. Are there any existing libraries that work on this principle?
One Client per Channel
There's not an easy way around the one client per channel ID limitation, unfortunately. We actually allow two, but this is to handle the case where a user refreshes his page, not for actual fan-out.
That said, you could certainly implement your own workaround for this. One trick I've seen is to use cookies to communicate between browser tabs. Then you can elect one tab the "owner" of the channel and fan out data via cookies. See this question for info on how to implement the inter-tab communication: Javascript communication between browser tabs/windows
Polling vs. Channel
You could poll instead of using the Channel API if you're willing to accept some performance trade-offs. Channel API delivery speed is on the order of 100-200ms; if you could accept a 500ms average, then you could poll every second. Depending on the type of data you're sending, and how much you can fit in memcache, this might be a workable solution. My guess is your biggest problem is going to be instance-hours.
For example, if you have, say, 100 clients each polling once per second, you'll be looking at 100 qps. You should experiment and see if you can serve 100 requests in a second for the data you need to serve without spinning up a second instance. If not, keep increasing your latency (i.e., decreasing your polling frequency) until you get to 1 instance able to serve your requests.
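The arithmetic behind that trade-off can be written down as a small sketch. It assumes messages arrive at uniformly random times (so the average wait is half the polling interval) and ignores the request round-trip itself; the function names and the per-instance capacity figure are hypothetical and would come from your own measurements.

```python
def polling_load(clients, interval_seconds):
    """Return (requests per second, average delivery delay in seconds)."""
    qps = clients / float(interval_seconds)
    average_delay = interval_seconds / 2.0  # a message waits half an interval on average
    return qps, average_delay


def min_interval_for_one_instance(clients, max_qps_per_instance):
    """Shortest polling interval that keeps the load within one instance's capacity."""
    return clients / float(max_qps_per_instance)
```

With 100 clients polling every second, `polling_load(100, 1)` gives 100 qps and a 0.5 s average delay, matching the figures above. If one instance can only sustain 40 qps, the interval would need to grow to 2.5 s.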
Hope that helps.
