Increasing Requests Per Second in Gatling

I'm trying to increase requests per second (RPS) in Gatling with fewer users (each user sends API requests in a loop). I achieved 300 RPS with 35 users. However, even if I increase the user count to 70 or 150, I cannot get more than 300 RPS. With more users, the RPS exceeds 300 for the first few seconds but then hovers around 300.
I tried both atOnceUsers and rampUsers, but still couldn't achieve a higher RPS.
Is there any way to increase RPS with fewer users?

You need to find out where your constraint is: look at the results coming back from the simulation and examine how the response time changes with the number of requests. You may be getting throttled by the application under test.
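One way to check this is a back-of-the-envelope calculation. The sketch below is only an illustration in Python, not Gatling code: it assumes a closed model in which each user waits for a response before sending the next request, so throughput is roughly users divided by the time per loop iteration. The function names are made up for this example.

```python
def implied_iteration_time(users: int, rps: float) -> float:
    """Average seconds one user spends per request (response time plus any pause)."""
    return users / rps


def max_rps(users: int, iteration_time_s: float) -> float:
    """Throughput a closed loop of `users` could reach at a given iteration time."""
    return users / iteration_time_s


# Figures from the question: 300 RPS with 35 users, still 300 RPS with 70 or 150.
baseline = implied_iteration_time(35, 300)

for users in (35, 70, 150):
    observed = implied_iteration_time(users, 300)
    print(
        f"{users:>3} users: {observed * 1000:.0f} ms per request at 300 RPS, "
        f"{max_rps(users, baseline):.0f} RPS if response time had stayed at baseline"
    )
```

If the measured response time grows in step with the user count while RPS stays flat, as these numbers suggest, the ceiling is more likely in the system under test (or the network or load generator) than in the injection profile.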

Related

In Gatling: How can I inject two scenarios concurrently while changing injection properties throughout the test?

My simulation runs two concurrent scenarios: one ramps up to 1000 users and iterates the scenario flow throughout the whole test duration, while the second one injects 500 users every hour for 10 minutes. I am doing this to simulate a peak every hour.
The simulation looks more or less as the following:
setUp(
  steadyStateScenario
    .inject(rampConcurrentUsers(1) to (numberOfTestUsers) during (Config.rampUpDuration minutes))
    .protocols(httpconf),
  peakScenario
    .inject(nothingFor(60 minutes), rampUsers(numberOfPeakTestUsers) during (Config.PeakRampUpDuration minutes))
    .protocols(httpconf)
)
  .maxDuration(Config.scenarioDuration hours)
  .assertions(global.successfulRequests.percent.is(100))
My issue with the current simulation is that I am unable to randomise the peak duration and the number of peak users.
To overcome this, I would like to execute peakScenario.inject every hour and change the values of numberOfPeakTestUsers and Config.PeakRampUpDuration each time it is executed. This should happen while steadyStateScenario keeps running in the background without interruption. Is that possible?
Cheers.

Batch Apex callout limits

Does the callout limit depend on the number of times the execute method is invoked in a batch class?
I have read that the limit applies per invocation of the execute method, so we should use a batch size of 1 if we need to use the maximum of 100 callouts. But if we have 25,000 records and the batch size is 1, will we reach the maximum limit for callouts?
In a batch job, every start, execute, and finish invocation gets a fresh set of governor limits because they're separate transactions. So you get 100 callouts per execute. You still must complete them in under 120 seconds, but that's a different limit.
I'm not aware of any limit on how many callouts you can make within 24 hours, so there probably isn't one. There is this limit, though:
The maximum number of asynchronous Apex method executions (batch Apex,
future methods, Queueable Apex, and scheduled Apex) per a 24-hour
period: 250,000 or the number of user licenses in your org multiplied
by 200, whichever is greater
You will have to balance it all out. If the callouts take 1 second each (so you can do all 100 within the time limit), feel free to set the batch size to 1 and really use 250K executes. I can't imagine what functionality would require 100 web service calls to process one record, but in theory you can.
If you need to process more than 250K records daily, increase the batch size, but then your grand total of possible callouts goes down.
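The trade-off can be put into numbers. The following is a rough Python sketch, not Apex: it uses the 100 callouts per transaction and the 250,000 daily async executions mentioned above, and it assumes that only execute invocations count (it ignores start, finish, other async jobs in the org, and the per-license variant of the limit). `batch_plan` is a name invented for this example.

```python
import math

MAX_CALLOUTS_PER_TRANSACTION = 100  # per execute() invocation, as stated above
DAILY_ASYNC_EXECUTIONS = 250_000    # lower bound of the quoted 24-hour limit


def batch_plan(records: int, batch_size: int) -> dict:
    """Estimate how a batch size spends the callout and async-execution budgets."""
    executes = math.ceil(records / batch_size)
    return {
        "executes": executes,
        "callouts_per_record": MAX_CALLOUTS_PER_TRANSACTION // batch_size,
        "max_total_callouts": executes * MAX_CALLOUTS_PER_TRANSACTION,
        "fits_daily_limit": executes <= DAILY_ASYNC_EXECUTIONS,
    }


print(batch_plan(25_000, 1))       # the case from the question
print(batch_plan(1_000_000, 1))    # too many executes for one day
print(batch_plan(1_000_000, 10))   # fits, but fewer callouts per record
```

With 25,000 records and a batch size of 1, the job uses 25,000 executes, well within the daily allowance, and each record can use up to 100 callouts.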

Gatling: difference between Response Time Percentiles and Latency Percentiles over time

In my Gatling reports, I noticed that the "Response Time Percentiles" and "Latency Percentiles over time" charts are nearly identical. How do they differ?
I saw this post, which makes me even more unsure:
Latency Percentiles over Time (OK) – same as Response Time Percentiles
over Time (OK), but showing the time needed for the server to process
the request, although it is incorrectly called latency. By definition
Latency + Process Time = Response time. So this graphic is supposed to
give the time needed for a request to reach the server. Checking
real-life graphics I think this graphic shows not the Latency, but the
real Process Time. You can get an idea of the real Latency by taking
one and the same second from Response Time Percentiles over Time (OK)
and subtract values from current graphs for the same second.
Thanks in advance for your help.
Latency basically tells how long it takes to receive the first packet for each page request throughout the duration of your load test. If you look at this chart in the Gatling documentation, the first spike is just before 21:30:20 on the x axis and tells you that 100% of the pages requested took longer than 1000 milliseconds to get the first packet from source to destination, but that number fell significantly after 21:30:20.

GCP Documentation - Task Queue bucket_size and rate

I have read a lot of articles and answers here about Google task queues; my question is about the behavior of "rate" and "bucket_size".
I read this documentation:
https://cloud.google.com/appengine/docs/standard/java/configyaml/queue
The snippet is:
Configuring the maximum number of concurrent requests
If using the default max_concurrent_requests settings are not
sufficient, you can change the settings for max_concurrent_requests,
as shown in the following example:
If your application queue has a rate of 20/s and a bucket size of 40,
tasks in that queue execute at a rate of 20/s and can burst up to 40/s
briefly. These settings work fine if task latency is relatively low;
however, if latency increases significantly, you'll end up processing
significantly more concurrent tasks. This extra processing load can
consume extra instances and slow down your application.
For example, let's assume that your normal task latency is 0.3
seconds. At this latency, you'll process at most around 40 tasks
simultaneously. But if your task latency increases to 5 seconds, you
could easily have over 100 tasks processing at once. This increase
forces your application to consume more instances to process the extra
tasks, potentially slowing down the entire application and interfering
with user requests.
You can avoid this possibility by setting max_concurrent_requests to a
lower value. For example, if you set max_concurrent_requests to 10,
our example queue maintains about 20 tasks/second when latency is 0.3
seconds. However, when the latency increases over 0.5 seconds, this
setting throttles the processing rate to ensure that no more than 10
tasks run simultaneously.
queue:
# Set the max number of concurrent requests to 50
- name: optimize-queue
  rate: 20/s
  bucket_size: 40
  max_concurrent_requests: 10
My understanding is that the queue works like this:
The bucket is the unit that determines the number of tasks that are executed.
The rate is how many bucket slots are filled for execution per period.
max_concurrent_requests is the maximum number of tasks that can execute simultaneously.
This part of the snippet seems strange to me:
But if your task latency increases to 5 seconds, you could easily have
over 100 tasks processing at once. This increase forces your
application to consume more instances to process the extra tasks,
potentially slowing down the entire application and interfering with
user requests.
Imagine that max_concurrent_requests is not set.
To me, it seems impossible to execute more than 100 tasks because the bucket_size is 40. I would expect slow tasks to increase the time that other tasks wait for an empty bucket.
Why does the documentation say that over 100 tasks can be processing at once?
If the bucket size is 40, can more than 40 tasks run simultaneously?
Edit
Is the bucket refilled only after all tasks have finished, or is a freed slot refilled at the next rate interval?
Example:
40 tasks are executing.
1 task finishes.
Imagine that each task takes more than 0.5 seconds and some take more than 1 second.
When 1 slot is free, is it refilled in the next second, or does the bucket wait for all tasks to finish before filling up again?
Bucket size is defined more precisely in the doc you link, but one way to think of it is as a kind of initial burst limit.
Here's how I understand it would work, based on the parameters you provided in your question:
bucket_size: 40
rate: 20/s
max_concurrent_requests: 10
In the first second (t1) 40 tasks will start processing. At the same time 20 tokens (based on the rate) will be added to the bucket. Thus, at t2, 20 tasks will be primed for processing and another 20 tokens will be added to the bucket.
If max_concurrent_requests is not set, those 20 tasks would start processing. If max_concurrent_requests is 10, nothing will happen because more than 10 tasks are already running.
App Engine will continue to add tokens to the bucket at a rate of 20/s, but only if there is room in the bucket (bucket_size). Once there are 40 tokens in the bucket, it will stop until some of the running processes finish and there is more room.
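The refill behaviour can be illustrated with a generic token bucket. This is a simplified Python sketch, not App Engine's actual scheduler: it assumes the bucket starts full, steps in whole seconds, and ignores max_concurrent_requests. `dispatch_per_second` is a name made up for this example.

```python
def dispatch_per_second(bucket_size: int, rate: int, backlog: int, seconds: int) -> list:
    """Return how many tasks start in each one-second step of a token bucket."""
    tokens = bucket_size  # assumption: the bucket is full when the backlog arrives
    started = []
    for _ in range(seconds):
        n = min(tokens, backlog)  # each task start consumes one token
        tokens -= n
        backlog -= n
        started.append(n)
        tokens = min(bucket_size, tokens + rate)  # refill, capped at bucket_size
    return started


# bucket_size 40, rate 20/s, 200 tasks waiting
print(dispatch_per_second(40, 20, 200, 4))  # [40, 20, 20, 20]
```

In this model a token is consumed when a task starts, and refilling depends only on the rate and the cap, so the bucket bounds how many tasks can start in a burst rather than how many are running.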
After the initial burst of 40 tasks is finished, there should never be more than 10 tasks executing at a time.
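As for why the quoted documentation mentions over 100 tasks: the number of tasks in flight depends on how fast tasks start and how long each one runs, not on bucket_size alone. A steady-state estimate (Little's law: concurrency is roughly start rate times latency) reproduces the documentation's figures. The Python sketch below is an approximation under that assumption, and both function names are hypothetical.

```python
from typing import Optional


def concurrent_tasks(rate_per_s: float, latency_s: float) -> float:
    """Steady-state number of tasks in flight (Little's law)."""
    return rate_per_s * latency_s


def throttled_rate(rate_per_s: float, latency_s: float,
                   max_concurrent: Optional[int] = None) -> float:
    """Start rate the queue can sustain once a concurrency cap applies."""
    if max_concurrent is None:
        return rate_per_s
    return min(rate_per_s, max_concurrent / latency_s)


print(concurrent_tasks(20, 0.3))    # about 6 in flight at 0.3 s latency
print(concurrent_tasks(20, 5))      # 100 in flight at 5 s latency
print(throttled_rate(20, 0.5, 10))  # 20.0: the cap of 10 is just enough
print(throttled_rate(20, 5, 10))    # 2.0: the cap now limits the rate
```

At 20 starts per second and 5 seconds per task, about 100 tasks run at once even though only 40 can start in a single burst. With a cap of 10, the rate of 20/s holds until latency exceeds 0.5 seconds, which matches the threshold in the quoted text.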

App Engine: Can I set a req/min rate instead of req/sec in queue.yaml?

Google has a 15000/min limit on the number of reads and writes. To stay under this limit, I calculated 15000/min == 250/sec, so my queue config is:
name: mapreduce-queue
rate: 200/s
max_concurrent_requests: 200
Can I directly set a rate of 15000/min in queue.yaml? I used 200/s because 15000/min == 250/sec adjusted for bursts. Also, I feel like I should not need the max_concurrent_requests limit at all?
Yes, you can.
However, use 15000/m instead of 15000/min.
From the docs:
rate (push queues only)
How often tasks are processed on this queue. The value is a number
followed by a slash and a unit of time, where the unit is s for
seconds, m for minutes, h for hours, or d for days. For example, the
value 5/m says tasks will be processed at a rate of 5 times per
minute.
If the number is 0 (such as 0/s), the queue is considered "paused,"
and no tasks are processed.
and
max_concurrent_requests (push queues only)
Sets the maximum number of tasks that can be executed at any given
time in the specified queue. The value is an integer. By default, this
directive is unset and there is no limit on the maximum number of
concurrent tasks. One use of this directive is to prevent too many
tasks from running at once or to prevent datastore contention.
Restricting the maximum number of concurrent tasks gives you more
control over your queue's rate of execution. For example, you can
constrain the number of instances that are running the queue's tasks.
Limiting the number of concurrent requests in a given queue allows you
to make resources available for other queues or online processing.
It seems to me that for your situation, max_concurrent_requests is something you don't want to leave out.
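To compare rates written with different units, the format described in the quoted docs (a number, a slash, and one of s, m, h, d) can be converted to tasks per second. This is a small Python sketch for checking the arithmetic, not part of App Engine; `rate_per_second` is a name invented here.

```python
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def rate_per_second(rate: str) -> float:
    """Convert a queue.yaml-style rate such as '15000/m' to tasks per second."""
    count, unit = rate.split("/")
    return float(count) / SECONDS_PER_UNIT[unit]  # KeyError for units like 'min'


print(rate_per_second("15000/m"))  # 250.0
print(rate_per_second("200/s"))    # 200.0
```

So 15000/m corresponds to 250/s on average, while the 200/s in the question leaves some headroom below that limit.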

Resources