Configuring Hystrix thread pool for a system with high RPS and 99th percentile response time ~500ms

As per the Hystrix docs, the thread pool should be sized using the formula below:
ThreadPoolSize = requests per second at peak when healthy × 99th percentile latency in seconds + some breathing room
We have an external service whose 99th percentile response time is ~500ms and whose expected RPS is 200, so with this formula the thread pool size comes out to 100. I don't think it is a good idea to configure a thread pool with such a high thread count.
Please suggest what can be done in this scenario. What would be ideal values for the thread pool size and queue?
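For reference, a minimal sketch of how that sizing would be expressed with Hystrix's thread pool properties is shown below (the command class, group key and exact numbers are illustrative assumptions, not a recommendation):

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixThreadPoolProperties;

public class ExternalServiceCommand extends HystrixCommand<String> {

    public ExternalServiceCommand() {
        super(Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("ExternalService"))
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
                        // 200 RPS x 0.5 s (99th percentile latency) = 100 threads, per the formula
                        .withCoreSize(100)
                        // keep the queue small so bursts are absorbed without unbounded queueing
                        .withMaxQueueSize(20)
                        .withQueueSizeRejectionThreshold(20)));
    }

    @Override
    protected String run() throws Exception {
        // call the external service here (hypothetical helper)
        return callExternalService();
    }

    private String callExternalService() {
        return "response";
    }
}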

Related

Increasing Requests Per Seconds in Gatling

I'm trying to increase requests per second in Gatling with fewer users (each user created sends API requests in a loop). I achieved 300 RPS with 35 users. However, even if I increase the users to 70 or 150, I cannot get a higher RPS than 300. With an increased user count, the RPS is more than 300 for the first few seconds but later just hovers around 300.
I tried both atOnceUsers and rampUsers, but still couldn't achieve a higher RPS.
Is there any way to increase RPS with fewer users?
You need to find out where your constraint is: look at the results coming back from the simulation and examine how the response time changes with the number of requests. You may be being throttled by your application under test.
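For context: in a closed model like this (each user loops), throughput is roughly active users divided by response time, so 35 users at ~300 RPS implies somewhere around 100-120 ms per iteration; adding users only raises RPS until the application under test (or the injector) becomes the bottleneck, which matches what you are seeing. If the goal is to impose a target arrival rate regardless of how many users are busy, an open injection model can be used instead of looping users. A minimal sketch with the Gatling Java DSL (available in recent Gatling versions; the URL, endpoint and rate below are placeholders):

import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

public class OpenModelSimulation extends Simulation {

    // placeholder base URL and endpoint
    HttpProtocolBuilder httpProtocol = http.baseUrl("http://example.invalid");

    ScenarioBuilder scn = scenario("open-model")
            .exec(http("request").get("/api/resource"));

    {
        setUp(
                // open model: new virtual users arrive at a fixed rate,
                // independent of how many are still waiting on responses
                scn.injectOpen(constantUsersPerSec(400).during(60))
        ).protocols(httpProtocol);
    }
}

Note that an open model only reaches the target rate if the system under test can keep up; if it cannot, requests pile up rather than the RPS silently capping out.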

GCP Documentation - Task Queue bucket_size and rate

I have read a lot of articles and answers here about Google Cloud Tasks; my doubt is about the behavior of rate and bucket_size.
I read this documentation:
https://cloud.google.com/appengine/docs/standard/java/configyaml/queue
The snippet is:
Configuring the maximum number of concurrent requests
If the default max_concurrent_requests setting is not
sufficient, you can change it,
as shown in the following example:
If your application queue has a rate of 20/s and a bucket size of 40,
tasks in that queue execute at a rate of 20/s and can burst up to 40/s
briefly. These settings work fine if task latency is relatively low;
however, if latency increases significantly, you'll end up processing
significantly more concurrent tasks. This extra processing load can
consume extra instances and slow down your application.
For example, let's assume that your normal task latency is 0.3
seconds. At this latency, you'll process at most around 40 tasks
simultaneously. But if your task latency increases to 5 seconds, you
could easily have over 100 tasks processing at once. This increase
forces your application to consume more instances to process the extra
tasks, potentially slowing down the entire application and interfering
with user requests.
You can avoid this possibility by setting max_concurrent_requests to a
lower value. For example, if you set max_concurrent_requests to 10,
our example queue maintains about 20 tasks/second when latency is 0.3
seconds. However, when the latency increases over 0.5 seconds, this
setting throttles the processing rate to ensure that no more than 10
tasks run simultaneously.
queue:
# Set the max number of concurrent requests to 50
- name: optimize-queue
  rate: 20/s
  bucket_size: 40
  max_concurrent_requests: 10
I understood the queue to work like this:
The bucket determines how many tasks can start executing (a burst allowance).
The rate is how fast the bucket is refilled per period.
max_concurrent_requests is the maximum number of tasks that can execute simultaneously.
This snippet seems strange to me:
But if your task latency increases to 5 seconds, you could easily have
over 100 tasks processing at once. This increase forces your
application to consume more instances to process the extra tasks,
potentially slowing down the entire application and interfering with
user requests.
Imagine that max_concurrent_requests is not set.
To me, it seems impossible to execute more than 100 tasks at once, because the bucket_size is 40; I would expect slow tasks only to increase the time other tasks wait for a free slot in the bucket.
Why does the documentation say there can be over 100 tasks processing?
If the bucket is 40, can more than 40 run simultaneously?
Edit
Is the bucket refilled only after all running tasks have finished, or is a freed slot refilled on the next rate tick?
Example:
40 tasks are executing.
1 task finishes.
Imagine that each task takes more than 0.5 seconds, and some take more than 1 second.
When one slot is freed, is it refilled in the next second, or does the bucket wait for all tasks to finish before it fills up again?
Bucket size is defined more precisely in the doc you link, but one way to think of it is as a kind of initial burst limit.
Here's how I understand it would work, based on the parameters you provided in your question:
bucket_size: 40
rate: 20/s
max_concurrent_requests: 10
In the first second (t1) 40 tasks will start processing. At the same time 20 tokens (based on the rate) will be added to the bucket. Thus, at t2, 20 tasks will be primed for processing and another 20 tokens will be added to the bucket.
If there is no max_concurrent_requests setting, those 20 tasks would start processing. If max_concurrent_requests is 10, nothing will happen, because more than 10 processes are already in use.
App Engine will continue to add tokens to the bucket at a rate of 20/s, but only if there is room in the bucket (bucket_size). Once there are 40 tokens in the bucket, it will stop until some of the running processes finish and there is more room.
After the initial burst of 40 tasks is finished, there should never be more than 10 tasks executing at a time.
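To make the interaction of the three settings concrete, here is a toy, second-by-second simulation (plain Java, not App Engine code): a task consumes one token when it starts, tokens are refilled at rate up to bucket_size, and each started task keeps running for an assumed 5-second latency.

import java.util.ArrayDeque;
import java.util.Deque;

public class TokenBucketSketch {
    public static void main(String[] args) {
        final double bucketSize = 40;   // bucket_size
        final double rate = 20;         // rate: 20/s (tokens added per second)
        final int latencySeconds = 5;   // assumed task latency
        // max_concurrent_requests left "unset"; change to 10 to see the throttling effect
        final int maxConcurrent = Integer.MAX_VALUE;

        double tokens = bucketSize;                      // the bucket starts full
        Deque<Integer> finishTimes = new ArrayDeque<>(); // one entry per running task

        for (int t = 0; t < 12; t++) {
            // retire tasks that have finished by time t
            while (!finishTimes.isEmpty() && finishTimes.peekFirst() <= t) {
                finishTimes.pollFirst();
            }
            // start a task for every available token, up to the concurrency cap
            int started = 0;
            while (tokens >= 1 && finishTimes.size() < maxConcurrent) {
                tokens -= 1;
                finishTimes.addLast(t + latencySeconds);
                started++;
            }
            System.out.printf("t=%2ds started=%2d running=%3d tokens=%2.0f%n",
                    t, started, finishTimes.size(), tokens);
            // refill at `rate`, never exceeding bucket_size
            tokens = Math.min(bucketSize, tokens + rate);
        }
    }
}

With the 5-second latency this reaches 100 running tasks at t=3 and 120 at t=4 even though the bucket never holds more than 40 tokens: bucket_size limits how fast tasks can start, not how many can be running at once. Lowering maxConcurrent caps the running count, which is the throttling effect the documentation describes.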

Getting JMeter to work with Throughput Shaping timer and Concurrency Thread Group

I am trying to shape a JMeter test involving a Concurrency Thread Group and a Throughput Shaping Timer, as documented here and here. The timer is configured to run ten ramps and stages with RPS from 1 to 333.
I want to set up the Concurrency Thread Group to use the schedule feedback function and have added the formula in the Target Concurrency field (I have updated the example from tst-name to the actual timer name). Ramp-Up Time and Ramp-Up Steps Count I have set to 1, as I assume those properties are not that important if the throughput is managed by the timer; the Hold Target Rate time is 8000, which is longer than the total of the stages added in the timer (6200).
When I run the test, it ends without any exceptions within 3 seconds or so. The log file shows a few rows about starting and ending threads but nothing alarming.
The only thing I find suspicious is the log entry "VirtualUserController: Test limit reached, thread is done", plus the thread name.
I am not getting enough clues from the documentation linked here to figure this out myself. Do you have any hints?
According to the documentation, the ramp-up time and steps should be left blank:
"When using this approach, leave Concurrency Thread Group Ramp Up Time and Ramp-Up Steps Count fields blank."
So your assumption that setting them to 1 is OK seems to be false...
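For reference, the schedule feedback formula that goes into the Target Concurrency field looks something like this (taken from the plugin's documented example; the timer name and numbers are illustrative):

${__tstFeedback(tst-name,1,100,10)}

where tst-name must match the Throughput Shaping Timer's name, the next two values are the minimum and maximum concurrency the function may request, and the last value controls how many spare threads to keep.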

App Engine: Can I set a req/min rate instead of req/sec in queue.yaml?

Google has a 15000/min limit on the number of reads and writes. To stay under this limit, I calculated 15000/min == 250/sec, so my queue config is:
name: mapreduce-queue
rate: 200/s
max_concurrent_requests: 200
Can I directly set a rate of 15000/min in queue.yaml? I used 200/s because 15000/min == 250/sec adjusted for bursts. Also, I feel like I should not need the max_concurrent_requests limit at all?
Yes, you can.
However, use 15000/m instead of 15000/min.
From the docs
rate (push queues only)
How often tasks are processed on this queue. The value is a number
followed by a slash and a unit of time, where the unit is s for
seconds, m for minutes, h for hours, or d for days. For example, the
value 5/m says tasks will be processed at a rate of 5 times per
minute.
If the number is 0 (such as 0/s), the queue is considered "paused,"
and no tasks are processed.
and
max_concurrent_requests (push queues only)
Sets the maximum number of tasks that can be executed at any given
time in the specified queue. The value is an integer. By default, this
directive is unset and there is no limit on the maximum number of
concurrent tasks. One use of this directive is to prevent too many
tasks from running at once or to prevent datastore contention.
Restricting the maximum number of concurrent tasks gives you more
control over your queue's rate of execution. For example, you can
constrain the number of instances that are running the queue's tasks.
Limiting the number of concurrent requests in a given queue allows you
to make resources available for other queues or online processing.
It seems to me that, for your situation, max_concurrent_requests is something you don't want to leave out.
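Putting that together, a queue.yaml sketch using the per-minute rate (and keeping the concurrency cap, as suggested above) could look like:

queue:
- name: mapreduce-queue
  rate: 15000/m
  max_concurrent_requests: 200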

Persistent Connection on a web server HTTP1.1

I'm trying to write a web server in C under Linux using HTTP/1.1.
I've used select() to handle multiple requests, and I'd like to implement persistent connections, but it hasn't worked so far because I can't set a timeout properly. How can I do it? I thought about the setsockopt() function:
setsockopt(connsd, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof(tv))
where tv is a struct timeval. This isn't working either.
Any suggestions?
SO_RCVTIMEO will only work when you are actually reading data; select() won't honor it. select() takes a timeout parameter in its last argument. If you have a timer data structure to organize which connections should time out in what order, then you can pass the time of the soonest timeout to select(). If the return value is 0, then a timeout has occurred, and you should expire all timed-out connections. After processing live connections (and resetting their idle timeout in your timer data structure), you should again check whether any connections should be timed out before calling select() again.
There are various data structures you can use, but popular ones include the timing wheel and timer heap.
A timing wheel is basically an array organized as a circular buffer, where each buffer position represents a time unit. If the wheel's unit is seconds, you could construct a 300-element array to represent 5 minutes of time. There is a sticky index which represents the last time any timers were expired, and the current position would be the current time modulo the size of the array. To add a timeout, calculate the absolute time it needs to be timed out, take that modulo the size of the array, and add it to the list at that array position. All buckets between the last index and the current position whose timeout has been reached need to be expired. After expiring the entries, the last index is updated to the current position. To calculate the time until the next expiration, the buckets are scanned starting from the current position to find a bucket with an entry that will expire.
A timer heap is basically a priority queue, where entries that expire sooner have higher priority than entries that expire later. The top of a non-empty heap determines the time to next expiration.
If your application is inserting lots and lots of timers all the time, and then cancelling them all the time, a wheel may be more appropriate, as inserting into and removing from a wheel is more efficient than inserting into and removing from a priority queue.
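The wheel idea above is language-agnostic; purely as an illustration (in Java, with one-second buckets, and assuming every timeout is shorter than the wheel's horizon, so anything in a passed bucket is due), the add/expire logic could be sketched like this:

import java.util.ArrayList;
import java.util.List;

public class TimingWheel<T> {
    private final List<List<T>> buckets;   // one bucket per second of the horizon
    private long lastExpired;               // the "sticky index": last second we expired

    public TimingWheel(int horizonSeconds, long nowSeconds) {
        buckets = new ArrayList<>(horizonSeconds);
        for (int i = 0; i < horizonSeconds; i++) {
            buckets.add(new ArrayList<>());
        }
        lastExpired = nowSeconds;
    }

    // Schedule an item to expire at the given absolute time in seconds.
    public void add(long expiresAtSeconds, T item) {
        buckets.get((int) (expiresAtSeconds % buckets.size())).add(item);
    }

    // Collect and clear every bucket between the last expiry time and now;
    // the caller would close the connections returned here.
    public List<T> expire(long nowSeconds) {
        List<T> expired = new ArrayList<>();
        for (long s = lastExpired + 1; s <= nowSeconds; s++) {
            List<T> bucket = buckets.get((int) (s % buckets.size()));
            expired.addAll(bucket);
            bucket.clear();
        }
        if (nowSeconds > lastExpired) {
            lastExpired = nowSeconds;
        }
        return expired;
    }
}

The time until the next non-empty bucket, scanning forward from the current position, is what you would pass as the timeout to select().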
The simplest solution is probably to keep a last-request-received timestamp for each connection, then regularly check that time and, if it is too long ago, close the connection.
