GCP Documentation - Task Queue bucket_size and rate - google-app-engine

I have read many articles and answers here about Google Task Queues. My question is about how "rate" and "bucket_size" behave.
I read this documentation:
https://cloud.google.com/appengine/docs/standard/java/configyaml/queue
The relevant snippet is:
Configuring the maximum number of concurrent requests
If the default max_concurrent_requests settings are not
sufficient, you can change the settings for max_concurrent_requests,
as shown in the following example:
If your application queue has a rate of 20/s and a bucket size of 40,
tasks in that queue execute at a rate of 20/s and can burst up to 40/s
briefly. These settings work fine if task latency is relatively low;
however, if latency increases significantly, you'll end up processing
significantly more concurrent tasks. This extra processing load can
consume extra instances and slow down your application.
For example, let's assume that your normal task latency is 0.3
seconds. At this latency, you'll process at most around 40 tasks
simultaneously. But if your task latency increases to 5 seconds, you
could easily have over 100 tasks processing at once. This increase
forces your application to consume more instances to process the extra
tasks, potentially slowing down the entire application and interfering
with user requests.
You can avoid this possibility by setting max_concurrent_requests to a
lower value. For example, if you set max_concurrent_requests to 10,
our example queue maintains about 20 tasks/second when latency is 0.3
seconds. However, when the latency increases over 0.5 seconds, this
setting throttles the processing rate to ensure that no more than 10
tasks run simultaneously.
queue:
# Set the max number of concurrent requests to 10
- name: optimize-queue
  rate: 20/s
  bucket_size: 40
  max_concurrent_requests: 10
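The arithmetic in the quoted passage follows from Little's law: the average number of tasks in flight is roughly the dispatch rate multiplied by how long each task runs. Here is a minimal Python sketch, assuming a steady backlog and a constant task latency, and ignoring the short bursts that bucket_size permits. The function names are hypothetical and not part of any App Engine API.

```python
from typing import Optional


def in_flight(rate_per_s: float, latency_s: float, max_concurrent: Optional[int] = None) -> float:
    """Steady-state number of tasks running at once (Little's law: L = rate x latency)."""
    uncapped = rate_per_s * latency_s
    if max_concurrent is None:
        return uncapped
    # The concurrency cap limits in-flight tasks no matter how slow they are.
    return min(uncapped, max_concurrent)


def throughput(rate_per_s: float, latency_s: float, max_concurrent: Optional[int] = None) -> float:
    """Tasks completed per second: the configured rate, unless the cap throttles it."""
    if max_concurrent is None:
        return rate_per_s
    # Each of the max_concurrent slots can finish at most 1 / latency_s tasks per second.
    return min(rate_per_s, max_concurrent / latency_s)


if __name__ == "__main__":
    # rate 20/s as in the example queue; compare latencies of 0.3 s, 0.5 s and 5 s.
    for latency in (0.3, 0.5, 5.0):
        print(latency, in_flight(20, latency), throughput(20, latency, max_concurrent=10))
```

Under these assumptions, 20 tasks/s at 5 s latency gives about 100 tasks in flight even though bucket_size is 40, because bucket_size limits how many tasks can be dispatched in a burst, not how many dispatched tasks are still running. With max_concurrent_requests: 10, the cap starts to throttle at 10 / 20 = 0.5 s of latency, which is the threshold the documentation mentions.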
My understanding is that the queue works like this:
The bucket is the unit that determines how many tasks are executed.
The rate is how many bucket slots are refilled for execution per period.
max_concurrent_requests is the maximum number of tasks that can be executed simultaneously.
This part of the snippet seems strange to me:
But if your task latency increases to 5 seconds, you could easily have
over 100 tasks processing at once. This increase forces your
application to consume more instances to process the extra tasks,
potentially slowing down the entire application and interfering with
user requests.
Imagine that max_concurrent_requests is not set.
As I see it, it is impossible to execute more than 100 tasks because bucket_size is 40. I would expect slow tasks to affect only how long tasks wait for an empty bucket.
Why does the documentation say that there can be over 100 tasks?
If the bucket size is 40, can more than 40 tasks run simultaneously?
Edit
Is the bucket refilled only after all tasks have been executed, or, if some bucket is free, is it refilled at the next rate interval?
Example:
40 buckets are executing.
1 bucket finishes.
Imagine that each bucket takes more than 0.5 seconds and some take more than 1 s.
When 1 bucket is free, will it be refilled in the next second, or does the bucket wait for all tasks to finish before it fills up again?

Bucket size is defined more precisely in the doc you linked, but one way to think of it is as a kind of initial burst limit.
Here's how I understand it would work, based on the parameters you provided in your question:
bucket_size: 40
rate: 20/s
max_concurrent_requests: 10
In the first second (t1) 40 tasks will start processing. At the same time 20 tokens (based on the rate) will be added to the bucket. Thus, at t2, 20 tasks will be primed for processing and another 20 tokens will be added to the bucket.
If max_concurrent_requests is not set, those 20 tasks would start processing. If max_concurrent_requests is 10, nothing will happen because more than 10 processes are already in use.
App Engine will continue to add tokens to the bucket at a rate of 20/s, but only while there is room in the bucket (bucket_size). Once there are 40 tokens in the bucket, it stops adding tokens until new tasks are dispatched and use some of them up (with max_concurrent_requests set, that can only happen once running tasks finish and free a slot).
After the initial burst of 40 tasks is finished, there should never be more than 10 tasks executing at a time.
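To explore the follow-up question (is a freed slot refilled, or does the bucket wait for every task to finish?), here is a simplified discrete-time simulation in Python. It assumes the token-bucket behavior described on the linked queue.yaml page: the bucket refills continuously at rate up to bucket_size, each dispatched task consumes one token, and a finished task does not hand its token back. It also assumes that the bucket starts full and that max_concurrent_requests caps running tasks at every instant. The function simulate and its parameters are hypothetical; App Engine's real scheduler may differ in its details.

```python
from typing import List, Optional, Tuple


def simulate(rate: float, bucket_size: int, latency_s: float, backlog: int,
             max_concurrent: Optional[int] = None, dt: float = 0.1,
             duration_s: float = 10.0) -> Tuple[int, int]:
    """Return (peak tasks running at once, tasks dispatched) for a simplified token-bucket queue."""
    tokens = float(bucket_size)   # assumption: the bucket starts full
    running: List[float] = []     # finish times of tasks currently executing
    dispatched = 0
    peak = 0
    for step in range(int(round(duration_s / dt))):
        now = step * dt
        # Finished tasks free a concurrency slot but do not give their token back.
        running = [end for end in running if end > now]
        # Dispatch while there is work, at least one whole token, and a free slot (if capped).
        while (dispatched < backlog and tokens >= 1
               and (max_concurrent is None or len(running) < max_concurrent)):
            tokens -= 1
            dispatched += 1
            running.append(now + latency_s)
        peak = max(peak, len(running))
        # Refill continuously at `rate`, never holding more than bucket_size tokens.
        tokens = min(float(bucket_size), tokens + rate * dt)
    return peak, dispatched


if __name__ == "__main__":
    # Parameters from the question: rate 20/s, bucket_size 40, 5-second tasks, a large backlog.
    print(simulate(20, 40, 5.0, backlog=1000))
    print(simulate(20, 40, 5.0, backlog=1000, max_concurrent=10))
```

With these assumptions, the uncapped run peaks well above 100 running tasks, while the capped run never exceeds 10. In both runs tokens come back at 20 per second whether or not earlier tasks have finished.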

Related

In Gatling: How can I inject two scenarios concurrently while changing injection properties throughout the test?

My simulation runs two concurrent scenarios: the first ramps up to 1000 users and iterates the scenario flow for the whole test duration,
while the second injects 500 users every hour for 10 minutes. I am doing this to simulate a peak every hour.
The simulation looks more or less like the following:
setUp(
  steadyStateScenario.inject(rampConcurrentUsers(1) to (numberOfTestUsers) during (Config.rampUpDuration minutes))
    .protocols(httpconf),
  peakScenario.inject(nothingFor(60 minutes), rampUsers(numberOfPeakTestUsers) during (Config.PeakRampUpDuration minutes))
    .protocols(httpconf))
  .maxDuration(Config.scenarioDuration hours)
  .assertions(global.successfulRequests.percent.is(100))
My issue with the current simulation is that I am unable to randomise peak duration and number of peak users.
To overcome this, I would like to execute peakScenario.inject every hour and change the values of numberOfPeakTestUsers and Config.PeakRampUpDuration each time it is executed. This should happen while steadyStateScenario keeps running in the background without interruption. Is that possible?
Cheers.

Batch Apex callout limits

Does the callout limit depend on the number of times the execute method is invoked in a batch class?
I have read that the limit applies per execute invocation, so we should use a batch size of 1 if we need to use the full 100 callouts. But if we have 25,000 records and the batch size is 1, will we reach the maximum limit for callouts?
In a batch job, every start, execute, and finish invocation gets a fresh set of governor limits because each runs in a separate transaction, so each one gets 100 callouts. You still must complete them within 120 seconds, but that's a different limit.
I'm not aware of any limit on how many callouts you can make within 24 hours, so there probably isn't one. There is this limit, though:
The maximum number of asynchronous Apex method executions (batch Apex,
future methods, Queueable Apex, and scheduled Apex) per a 24-hour
period: 250,000 or the number of user licenses in your org multiplied
by 200, whichever is greater
You will have to balance it all out. If each callout takes about 1 second (so all 100 fit within the 120-second limit), feel free to set the batch size to 1 and really use all 250K executes. I can't imagine what functionality would require 100 web service calls to process one record, but in theory you can.
If you need to process more than 250K records daily, you will have to increase the batch size, but then your grand total of possible callouts goes down.
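The trade-off above is simple arithmetic. The sketch below is written in Python only to show the numbers (it is not Apex). It assumes the figures quoted in this answer (100 callouts per transaction and a daily async limit of 250,000 or licenses × 200, whichever is greater) and, as a simplification, counts only execute invocations against the daily limit, ignoring start and finish. The function and field names are hypothetical.

```python
import math
from typing import NamedTuple

CALLOUTS_PER_TRANSACTION = 100  # per-transaction callout limit, as quoted in the answer


class BatchPlan(NamedTuple):
    execute_calls: int
    fits_daily_limit: bool
    max_callouts_per_record: float
    max_total_callouts: int


def daily_async_limit(user_licenses: int) -> int:
    """250,000 or licenses x 200, whichever is greater (figure quoted above)."""
    return max(250_000, user_licenses * 200)


def batch_plan(records: int, batch_size: int, user_licenses: int) -> BatchPlan:
    """Rough callout budget for one batch job; only execute() calls are counted."""
    executes = math.ceil(records / batch_size)
    return BatchPlan(
        execute_calls=executes,
        fits_daily_limit=executes <= daily_async_limit(user_licenses),
        # The 100 callouts of one execute() are shared by the records in that batch.
        max_callouts_per_record=CALLOUTS_PER_TRANSACTION / batch_size,
        max_total_callouts=executes * CALLOUTS_PER_TRANSACTION,
    )


if __name__ == "__main__":
    print(batch_plan(25_000, batch_size=1, user_licenses=10))
```

For the 25,000-record case in the question, a batch size of 1 needs 25,000 execute calls, well under 250,000, and leaves up to 100 callouts per record.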

Increasing Requests Per Second in Gatling

I'm trying to increase requests per second (RPS) in Gatling with fewer users (each user created sends API requests in a loop). I achieved 300 RPS with 35 users. However, even if I increase the number of users to 70 or 150, I cannot get more than 300 RPS. With more users, the RPS exceeds 300 for the first few seconds but then hovers around 300.
I tried both atOnceUsers and rampUsers, but still couldn't achieve a higher RPS.
Is there any way to increase RPS with fewer users?
You need to find out where your constraint is: look at the results coming back from the simulation and examine how the response time changes with the number of requests. You may be getting throttled by your application under test.
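One way to read the results the answer suggests looking at: in a closed workload where each user loops without pauses, throughput is roughly users / (response time + pause), another form of Little's law. This Python sketch only does the arithmetic (it is not Gatling code); the function names and the zero-pause default are assumptions.

```python
def expected_rps(users: int, response_time_s: float, pause_s: float = 0.0) -> float:
    """Closed-model throughput: each user completes one request every (response + pause) seconds."""
    return users / (response_time_s + pause_s)


def implied_response_time(users: int, observed_rps: float, pause_s: float = 0.0) -> float:
    """Mean response time implied by an observed RPS for a fixed number of looping users."""
    return users / observed_rps - pause_s


if __name__ == "__main__":
    # Figures from the question: RPS stays near 300 whether 35 or 150 users are looping.
    for users in (35, 70, 150):
        print(users, round(implied_response_time(users, 300.0), 3))
```

Under this model, if RPS stays flat at 300 while users go from 35 to 150, the mean response time has grown from about 0.12 s to about 0.5 s. Adding users then only adds queueing, which points to a constraint somewhere in the system rather than a shortage of users.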

How to compute the percentage of failed tasks in Google AppEngine?

I am using a push task queue on GAE (Python). There are times when, after X minutes, Y% of the tasks have failed.
In that situation, I want to purge the task queue (there is no need to execute the remaining tasks; many of them will eventually fail).
I can configure a task to stop executing after it has been retried 2 times, but if 100 tasks have failed (300 runs = 100 + 200 retries), how can I stop the remaining tasks from executing?
queue.yaml:
queue:
- name: my-queue
  mode: push
  rate: 1/s
  bucket_size: 10
  max_concurrent_requests: 10
  retry_parameters:
    task_retry_limit: 2
I would store some values in memcache, such as the number of tasks in the queue and the timestamps of failed tasks.
Each task would then need to do the following:
on start, calculate the rolling percentage (within the last X minutes) and exit if the percentage is too high; otherwise, increment the number of tasks counter and proceed.
on failure, decrement the number of tasks counter and add a timestamp to the failed tasks list.
on success, decrement the number of tasks counter.
To calculate the rolling percentage, take the entire list of failed tasks and filter out those timestamps that are too old (over X minutes ago). Put the new list back into memcache. Then, take the number of tasks counter and calculate 100.0 * (number of failed tasks) / (number of tasks) to get your percentage. If it exceeds the Y% threshold, exit your task immediately.
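The steps above can be sketched in Python as follows. To keep it self-contained, a plain in-memory object stands in for memcache. This is an assumption: on App Engine the values would live in memcache, and concurrent tasks would need atomic updates (such as memcache's incr/decr or compare-and-set), which this sketch does not show. The class and method names are hypothetical, the caller passes in the current time, and the zero-task case returns 0% because the answer does not say how to handle it.

```python
from typing import List


class FailureTracker:
    """In-memory stand-in for the two memcache values described above (hypothetical)."""

    def __init__(self, window_s: float, threshold_pct: float) -> None:
        self.window_s = window_s            # X minutes, expressed in seconds
        self.threshold_pct = threshold_pct  # Y%
        self.task_count = 0                 # the "number of tasks" counter
        self.failed_timestamps: List[float] = []

    def failure_pct(self, now: float) -> float:
        # Drop failures older than the rolling window and keep the pruned list.
        self.failed_timestamps = [t for t in self.failed_timestamps if now - t <= self.window_s]
        if self.task_count == 0:
            return 0.0  # assumption: nothing counted, so nothing to compare against
        return 100.0 * len(self.failed_timestamps) / self.task_count

    def on_start(self, now: float) -> bool:
        """Return False if the task should exit immediately instead of doing its work."""
        if self.failure_pct(now) > self.threshold_pct:
            return False
        self.task_count += 1
        return True

    def on_failure(self, now: float) -> None:
        self.task_count -= 1
        self.failed_timestamps.append(now)

    def on_success(self) -> None:
        self.task_count -= 1
```

A task handler would call on_start first and return early when it gets False, so the remaining tasks drain quickly instead of doing work that is likely to fail.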

App Engine: Can I set a req/min rate instead of req/sec in queue.yaml?

Google has a 15000/min limit on the number of reads and writes. To stay under this limit, I calculated 15000/min == 250/sec, so my queue config is:
name: mapreduce-queue
rate: 200/s
max_concurrent_requests: 200
Can I directly set a rate of 15000/min in queue.yaml? I used 200/s because 15000/min == 250/sec adjusted for bursts. Also, I feel like I should not need the max_concurrent_requests limit at all?
Yes, you can.
However, use 15000/m instead of 15000/min.
From the docs:
rate (push queues only)
How often tasks are processed on this queue. The value is a number
followed by a slash and a unit of time, where the unit is s for
seconds, m for minutes, h for hours, or d for days. For example, the
value 5/m says tasks will be processed at a rate of 5 times per
minute.
If the number is 0 (such as 0/s), the queue is considered "paused,"
and no tasks are processed.
and
max_concurrent_requests (push queues only)
Sets the maximum number of tasks that can be executed at any given
time in the specified queue. The value is an integer. By default, this
directive is unset and there is no limit on the maximum number of
concurrent tasks. One use of this directive is to prevent too many
tasks from running at once or to prevent datastore contention.
Restricting the maximum number of concurrent tasks gives you more
control over your queue's rate of execution. For example, you can
constrain the number of instances that are running the queue's tasks.
Limiting the number of concurrent requests in a given queue allows you
to make resources available for other queues or online processing.
It seems to me that, in your situation, max_concurrent_requests is something you don't want to leave out.
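The rate syntax quoted above is easy to check by hand. Here is a small Python sketch (a hypothetical helper, not part of the App Engine SDK) that converts a queue.yaml-style rate string to tasks per second and rejects any unit other than the four the documentation lists (s, m, h, d).

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def rate_per_second(rate: str) -> float:
    """Convert a queue.yaml-style rate such as '15000/m' into tasks per second."""
    number, _, unit = rate.partition("/")
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit {unit!r} in {rate!r}; expected s, m, h or d")
    return float(number) / UNIT_SECONDS[unit]


if __name__ == "__main__":
    print(rate_per_second("15000/m"))  # same per-second rate as 250/s
    print(rate_per_second("200/s"))   # the rate used in the question
```

This confirms that 15000/m is the same rate as 250/s, and that the 200/s in the question leaves some headroom below it.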
