I've been reviewing our App Engine costs using Appstats and I've seen quite a few situations where TaskQueue adds are taking 100ms (or more) to complete.
The documentation says they should take around 5ms.
I realise 100ms isn't that long, but when the whole request takes 600ms, 100ms is quite a big chunk of it.
Some of these tasks are using task_names. Is that what's slowing it down?
I ran an experiment with my Apache Flink streaming application, trying out different sizes for the tumbling time window. I fed the application the same data each time and measured how long it took to emit output after doing some calculations. As expected, a larger time window took longer to emit output than a smaller one, but only up to a point. When the window became smaller than, say, 14ms, the overhead costs (I guess) of the computation made the time to emit output longer than with, say, a 16ms window.
How would you explain this, specifically in an Apache Flink streaming application? What are the specifics of these overhead costs? The application is integrated with Kinesis Data Analytics.
The network buffer timeout defaults to 100ms: a buffer holds your records for at most 100ms before being flushed, or earlier if it fills up.
To me, measuring the performance of any window size below this value has little meaning. I guess the overhead you mention becomes dominant, since your system is also waiting for the 100ms to elapse (I'm assuming you're not filling the buffers, which default to 32KiB; that would be about 2MiB/s if they were filled every 15ms).
Try setting execution.buffer-timeout to 5 (ms) to optimize for latency, or to -1 to optimize for throughput, and re-execute your workload.
Since this is Kinesis Data Analytics, you might have to do it programmatically:
env.setBufferTimeout(5); // sets the timeout for the whole job
env.generateSequence(1,10).map(new MyMapper()).setBufferTimeout(5); // or per individual operator
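For context, here is a minimal, self-contained sketch of where that call sits in a Flink job (the class and job names are illustrative, and the real Kinesis source, windowing and sink are left out):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LowLatencyJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Flush network buffers after at most 5 ms instead of the 100 ms default,
        // trading some throughput for lower end-to-end latency.
        env.setBufferTimeout(5);

        // Placeholder pipeline; in the real job this would be the Kinesis source,
        // the tumbling window and the sink.
        env.generateSequence(1, 10).print();

        env.execute("low-latency-windowing");
    }
}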
I have an App Engine application with some services based on the webapp2 framework and some based on the endpoints-v2 framework.
The issue I am facing is that sometimes the OPTIONS request sent from the front end takes a huge amount of time to get a response back, varying from 10 to 15 seconds, which adds latency to my entire application. On digging deeper into the issue, I found that it is due to instance startup time costing us this much latency.
So my questions are:
Does starting up an instance take this much time?
If not, how can I reduce the startup time for my instances?
How do instances start, so that I can optimise those situations in my code?
Java instances take a long time to spin up. You can hide the latency by configuring a warmup request and min-idle-instances (see here) in your appengine-web.xml.
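For reference, a rough appengine-web.xml sketch of those two settings (the values are illustrative and the rest of the file is omitted):

<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- Let App Engine send /_ah/warmup requests so a new instance can load
       its code before it receives real traffic. -->
  <inbound-services>
    <service>warmup</service>
  </inbound-services>
  <!-- Keep at least one idle instance warm to absorb spikes without a cold
       start (note that idle instances are billed). -->
  <automatic-scaling>
    <min-idle-instances>1</min-idle-instances>
  </automatic-scaling>
</appengine-web-app>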
I am currently developing an App Engine application in Python, and while trying to optimize my app for costs and performance I am surprised by the instance hours I am being charged.
I am testing one specific task queue right now (nothing else is running during the test; before I start, no instance is up).
The queue is configured with a rate of 100/s and 100 buckets.
There is no configured limit for max_concurrent_requests.
900 tasks get pushed into this queue.
10-11 instances spin up at that moment to deal with it.
Everything takes far less than 30 seconds and every task is executed.
I check my instance hours quota before and after, and I consume about 0.25-0.40 instance hours.
Why is that?
Shouldn't it be much less? Is there an initial cost, or a minimum amount that is charged whenever an instance starts?
When an instance is started, it will cost you at least 15 minutes. Your 10-11 instances should therefore cost you a total of around 2.5 instance hours.
If you don't need such fast processing, you should limit the amount of parallel processing of the queue using max_concurrent_requests.
I am pretty sure the Scheduler will increase the instance count when there is a backlog of tasks on a high-rate queue. 100/100 is a very high rate; you are telling the Scheduler to process these very quickly, which means it fires up instances to do so.
Unless you need to process these tasks very quickly, use a much lower rate. This will result in fewer instances and a longer queue of tasks. Depending on your processing requirements, you might be able to use a pull queue, which lets you lease and process hundreds of tasks at a time and take advantage of batch put()s, etc. It really depends on what you are doing.
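For illustration, a queue.yaml along these lines throttles how aggressively tasks are dispatched (the queue name and numbers are made up):

queue:
- name: my-queue
  rate: 10/s                    # dispatch at most 10 tasks per second
  bucket_size: 10               # allow short bursts of up to 10 tasks
  max_concurrent_requests: 4    # never run more than 4 tasks at the same time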
Today AppEngine went down for a while:
http://code.google.com/status/appengine/detail/serving/2012/10/26#ae-trust-detail-helloworld-get-latency
The result was that all requests were kept pending, some for as long as 24 minutes. Here is an excerpt from my server log; these requests are normally handled in less than 200 ms.
https://www.evernote.com/shard/s8/sh/ad3b58bf-9338-4cf7-aa35-a255d96aebbc/4b90815ba1c8cd2080b157a54d714ae0
My quota ($8 per day) was blown through in a matter of minutes, when it previously sat at around $2 per day.
How can I prevent pending_ms from eating all my quota even though my actual requests still respond very fast? I had the pending latency set to a range from 300 ms to Automatic. Would limiting the maximum to 10 seconds prevent that type of outbreak?
blackjack75,
You're right, raising the pending latency to something like 10 seconds will help reduce the number of instances started.
It looks like the long-running requests tied up your instances. When this happens, App Engine spins up new instances to handle the new requests, and of course instances cost money.
Lowering your min and max idle instances to smaller numbers should also help.
On your dashboard, you can look at your instance graph to see how long the burst of instances was left idle after the request load finished.
You can look at your typical usage to help estimate a safe max.
Lowering them can cause slowness when legitimate traffic needs to spin up a new instance, especially with bursty traffic, so you would want to adjust this to match your budget. For comparison, on a non-production appspot, having the min and max set to 1 works fine.
Besides that, general techniques for reducing App Engine resource usage will help. It sounds like you've gone through that already, since your typical request time is low. Enabling concurrent requests could help here, if your code handles threads correctly (no globals, etc.) and your instances have enough free memory to handle multiple requests.
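As a rough sketch of those knobs for a Python 2.7 app (at the time of this question the idle-instance and pending-latency settings were sliders in the Admin Console; in a service's app.yaml they look roughly like this, and the values and handler are illustrative):

runtime: python27
api_version: 1
threadsafe: true                # let one instance serve several requests concurrently

automatic_scaling:
  min_idle_instances: 1         # instances kept warm; these are billed
  max_idle_instances: 2
  min_pending_latency: 300ms    # requests wait at least this long before a new instance starts
  max_pending_latency: 10s      # and at most this long

handlers:
- url: /.*
  script: main.app              # illustrative WSGI app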
How much of a compute-intensive gain can one expect on GAE MapReduce? The scenario of interest to me is compute-intensive, for example: multiplying a trillion random floats in a single-threaded, single-core application. Then imagine 1000 MapReduce workers multiplying a billion random numbers each and announcing "finished" when all workers have finished. Assume billing is enabled if that matters. (It might not.)
Edit: A commenter asked for clarification, and the title has been revised. If the task takes 50000 seconds single-threaded, and in an alternative implementation 1000 MapReduce workers are employed and finish after 500 seconds, then the performance gain is 100 times. 1000 workers for a 100-times gain is only slightly disappointing, but so be it for this example. How can I get finished sooner? Can I ask for 10,000 workers? This question may have to do with limits and quotas; assume an adequate budget. Does MapReduce's compute-intensive performance gain head towards an asymptote, and if so, what is the gain at that asymptote?

There was also information in the comment about MapReduce being suitable for large amounts of data generated by a user-facing URL; however, my question is not about a Datastore-intensive application's performance versus the same application rewritten for MapReduce. Datastore activity will be minimal in this compute-intensive scenario. I realize there will always be some Datastore activity in any MapReduce application, but since this is a compute-intensive scenario, the Datastore activity and the size of the Datastore entities will not have a big influence on the performance gain calculated: the task will use the Datastore for less than 1% of the elapsed time. Nor does the scenario involve a large amount of communication bandwidth (other than the minimum necessary to hit the task-queued URLs that MapReduce uses).

The question is about comparing a compute-intensive single-threaded non-MapReduce task's elapsed time to the same task's elapsed time on MapReduce, which is inherently multi-threaded given there are multiple workers. I use the word "task" generically; in other words, "task" means work. The gain might (but not necessarily) be a function of the number of workers, hence the 1000 workers in the example.
It's not clear exactly what you're asking here. Are you asking how efficient it is? How cheap it is? How fast it is?
In general, App Engine is designed for serving user-facing sites, and the App Engine mapreduce API exists to assist with that - processing large amounts of data generated by the user-facing site. If you have a large amount of data that's hosted outside App Engine, and you want to do some sort of large-scale data processing on it, App Engine is probably not the tool for you.
Regarding performance, you can expect each worker to execute tasks as fast as it would if you were executing them serially, so your items-per-second is roughly the number of workers multiplied by the regular rate; there's relatively little overhead. There can be some delay at the end, though, when different workers finish at different times, and how much depends on how good a job mapreduce does of sharding your data. With datastore input this used to be fairly poor, but it's a lot better now.
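One way to see why the gain eventually flattens out (an Amdahl's-law-style sketch, not something specific to the App Engine mapreduce library): if T_serial is the part of the job that cannot be parallelised (starting shards, the final bookkeeping, waiting for the slowest worker) and T_parallel is the part that can, then N workers take roughly T_serial + T_parallel / N, so the speedup (T_serial + T_parallel) / (T_serial + T_parallel / N) approaches (T_serial + T_parallel) / T_serial as N grows, no matter how many more workers you add.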
As to how many mappers you can have, that depends on a number of things: whether or not your app has billing enabled, how much other traffic your app gets, and how long your mapper tasks take per element. The only real way to determine this is to experiment a bit.