Gatling: Keep fixed number of users/requests at any instant


How can we keep a fixed number of active concurrent users/requests at any time for a scenario?
I have a unique testing problem: I need to performance-test services with a fixed number of requests in flight at any given moment, for a given period such as 10 minutes, 30 minutes, or 1 hour.
I am not looking for a per-second rate. What I want is to start with N requests and, as soon as any one of them completes, add one more, so that at any given moment there are exactly N concurrent requests.
What I tried: rampUsers(100) over 10 seconds, but sometimes there are more than 50 users at a given instant.
constantUsersPerSec(20) during (1 minute) also took the number of requests to 50+ for some time.
atOnceUsers(20) seems related, but I don't see any way to keep it running for a given number of seconds and to add more requests as previous ones complete.
Thank you in advance, community; I'd appreciate some direction.

There is a throttling mechanism (https://gatling.io/docs/3.0/general/simulation_setup/#throttling) that lets you cap the number of requests per second, but remember that users are injected into the simulation independently of it. You must inject enough users to reach that cap; otherwise you will end up with a lower req/s. Also, users that are injected but cannot send a request because of throttling wait in a queue for their turn. This may result in a huge load just after the throttle ends, or may extend your simulation, so it is always better to make the throttle time longer than the injection time and to add the maxDuration() option to the simulation setup.
Also keep in mind that a throttled simulation is far from how real users behave. They never wait for another user to finish before opening a page or taking an action, so in real life you will always end up with a variable number of requests per second.

Use the closed workload model injection supported by Gatling 3.0. In your case, to simulate and maintain 20 active users/requests for a minute, you can use an injection like this:
Script.<Controller>.<Scenario>.inject(constantConcurrentUsers(20) during (60 seconds))
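To make the idea of a closed model concrete outside Gatling, here is a minimal plain-Java sketch. It is not Gatling code: the class name ClosedModelSketch, the method run, and the Runnable standing in for a request are all hypothetical. It only illustrates the behaviour asked for in the question: N workers, each starting a new request as soon as its previous one completes, until a deadline.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a closed workload model (not Gatling's implementation).
final class ClosedModelSketch {

    /**
     * Runs `users` looping workers for roughly `durationMillis`.
     * Returns {completedRequests, peakRequestsInFlight}.
     */
    static long[] run(int users, long durationMillis, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicLong completed = new AtomicLong();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);

        for (int i = 0; i < users; i++) {
            pool.submit(() -> {
                // Each worker is one "user": it never has more than one request open.
                while (System.nanoTime() < deadline) {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        request.run();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                    completed.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(durationMillis + 5_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { completed.get(), peak.get() };
    }
}
```

Because the pool has exactly `users` threads and each thread runs one request at a time, the number of requests in flight can never exceed `users`, however slow the service becomes. A slow service lowers the completed count rather than raising concurrency.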

Related

Configuring a task queue and instance for non urgent work

I am using an F4 instance (because of memory needs) with automatic scheduling to do some background processing. It is run from a task queue. It takes 40s to 60s to complete each invocation. Because of the high memory needs, each instance should only handle one request at a time.
The action that needs to be done is not urgent. If it doesn't get scheduled for 30 minutes, that isn't a problem. Even 60 minutes is acceptable, and I'd rather make use of that time than spin up more instances. However, if the service gets popular and it is getting more than 60 requests an hour, I want to spin up more instances to make sure there isn't more than a 60 minute wait.
I am having trouble figuring out how to configure the instance and queue parameters to keep my costs down but be able to scale in that way. My initial thought was something like this:
<queue>
<name>non-urgent-queue</name>
<target>slow-service</target>
<rate>1/m</rate>
<bucket-size>1</bucket-size>
<max-concurrent-requests>1</max-concurrent-requests>
</queue>
<automatic-scaling>
<min-idle-instances>0</min-idle-instances>
<max-idle-instances>0</max-idle-instances>
<min-pending-latency>20m</min-pending-latency>
<max-pending-latency>1h</max-pending-latency>
<max-concurrent-requests>1</max-concurrent-requests>
</automatic-scaling>
First of all those latency settings are invalid, but I can't find documentation on the valid range or units. Can anyone direct me to that info?
Secondly, if I understand the queue settings correctly, this configuration would limit it to 60 invocations an hour getting to the service, even if the task queue had 60+ jobs waiting.
Thanks for your help!

Indeed, throttling at the queue level basically defeats the ability to scale when needed. So you can't use the <rate> in the queue configuration at its current value; you need to use the value matching the maximum rate you're willing to accept (with your maximum number of instances running simultaneously):
the max rate of requests that can go through the queue being limited to 1/min means you can't scale above 60/h
the <bucket-size> set at 1 means no peaks above the rate can be handled (as soon as one task starts, the token bucket empties).
the <max-concurrent-requests> set at 1 will basically prevent multiple instances from dealing simultaneously with the queued workload. They may be started by the autoscaler because of the request latencies, but they won't be able to help since only one queue task can be handled at a time.
In the <automatic-scaling> section the <max-concurrent-requests> set to 1 is good - this ensures no instance handles more than 1 request at a time - which is what you want.
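The two queue-level limits above can be turned into a rough upper bound on hourly throughput. The sketch below is a simplified model, not App Engine's actual scheduler: it ignores bucket bursts and instance start-up time, and the class and method names are made up for illustration.

```java
// Hypothetical back-of-the-envelope model of a push queue's throughput ceiling.
final class QueueThroughputBound {

    /**
     * Upper bound on tasks completed per hour, taking the smaller of:
     *  - the dispatch rate limit (<rate>, expressed per minute), and
     *  - the concurrency limit (<max-concurrent-requests>) divided by task duration.
     */
    static double maxTasksPerHour(double ratePerMinute, int maxConcurrent, double taskSeconds) {
        double rateCap = ratePerMinute * 60.0;
        double concurrencyCap = maxConcurrent * 3600.0 / taskSeconds;
        return Math.min(rateCap, concurrencyCap);
    }
}
```

With the question's settings (1/m, one concurrent request, tasks of about 60 s), both caps come out at 60 tasks per hour. Raising only the rate to 10/m still leaves the bound at 60 because of the concurrency cap, which is why both values have to be raised to allow scaling.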
The bad news is that the max values for the latencies appear to be 15s. At least when using the app.yaml config for python (but I think it's unlikely for that to differ across language sandboxes):
Error 400: --- begin server output ---
automatic_scaling.min_pending_latency (30s), must be in the range [0.010000s,15.000000s].
--- end server output ---
and
Error 400: --- begin server output ---
automatic_scaling.max_pending_latency (60s), must be in the range [0.010000s,15.000000s].
--- end server output ---
Which probably also explains why your 20m and 1h values aren't accepted - I used 30s and 60s and got the above errors.
This means you won't be able to use the autoscaling parameters to tune such slow-moving processing the way you want.
The only alternative I can think of is to have 2 queues:
a fast one feeding just trigger tasks for the slow-service jobs, which your service intercepts and saves in the datastore. These may be handled by some faster service (you don't want them stuck behind a slow-service job execution, as that can cause unnecessary instance launching). Depending on the rest of your implementation, you may be able to replace this queue completely by just storing the job info in the datastore instead of enqueuing tasks in the fast queue.
a slow one for the actual slow-service job execution tasks
You'd also have a cron job executing once a minute, checking how many triggers are pending in the datastore, deciding how much to scale, and enqueuing the corresponding number of slow-service job tasks in the slow queue. The autoscaler would simply bring up the corresponding number of instances (if needed). Low-latency autoscaling configs would be desirable in this case, since you have already decided how you want your app to scale.
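As a rough illustration of the "decide how much to scale" step, the cron handler could size the worker count from the backlog and the longest wait you are willing to accept. This is only a sketch under assumptions: the class and method names are hypothetical, and the job duration and wait limit are taken from the question (about 60 s per job, 60 minutes maximum wait).

```java
// Hypothetical sizing rule for the cron job; not an App Engine API.
final class SlowJobScaler {

    /**
     * Number of single-request instances needed so that `pendingJobs`,
     * each taking about `jobSeconds`, all finish within `maxWaitSeconds`.
     */
    static int instancesNeeded(int pendingJobs, int jobSeconds, int maxWaitSeconds) {
        if (pendingJobs <= 0) {
            return 0; // nothing pending: let instances shut down
        }
        long totalWorkSeconds = (long) pendingJobs * jobSeconds;
        // Integer ceiling division.
        return (int) ((totalWorkSeconds + maxWaitSeconds - 1) / maxWaitSeconds);
    }
}
```

For example, 60 pending one-minute jobs fit on a single instance within the hour, while 61 need a second one. The cron job would then enqueue that many tasks into the slow queue at a time.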

This is how I ended up doing it. I use a slow queue and a fast queue configured like this:
<queue>
<name>slow-queue</name>
<target>pdf-service</target>
<rate>2/m</rate>
<bucket-size>1</bucket-size>
<max-concurrent-requests>1</max-concurrent-requests>
</queue>
<queue>
<name>fast-queue</name>
<target>pdf-service</target>
<rate>10/m</rate>
<bucket-size>1</bucket-size>
<max-concurrent-requests>5</max-concurrent-requests>
</queue>
The max-concurrent-requests in the slow queue ensures only one task will run at a time, so there will only be one instance active.
Before I post to the slow queue I check to see how many items are already on the queue. The result may not be totally reliable, but for my purposes it is sufficient. In java:
QueueStatistics queueStats = queue.fetchStatistics();
if (queueStats.getNumTasks() < 30) {
    // post to slow queue
} else {
    // post to fast queue
}
So when my slow queue gets too full, I post to the fast queue which allows concurrent requests.
The instance is configured like this:
<automatic-scaling>
<min-idle-instances>0</min-idle-instances>
<max-idle-instances>automatic</max-idle-instances>
<min-pending-latency>15s</min-pending-latency>
<max-pending-latency>15s</max-pending-latency>
<max-concurrent-requests>1</max-concurrent-requests>
</automatic-scaling>
So it will create new instances as slowly as possible (15s is the max latency) and make sure only one process runs on an instance at a time.
With this configuration I'll have a max of 6 instances at a time but that should do about 500/hr. I could increase the rate and concurrent requests to do more.
The negative of this solution is an element of unfairness. Under heavy load, some tasks will be stuck in the slow queue while others will get processed more quickly in the fast queue.
Because of that, I have decreased the max items on the slow queue to 13 so the unfairness won't be so extreme, maybe a 10 minute wait for jobs that go to the slow queue when it is full.

Does task queue truly run tasks in parallel?

We have an application that takes some input from a user and makes ~50 RPC calls. Each call takes around 4-5 minutes.
In the backend we are using a push queue and enqueuing each of these 50 calls as tasks. This is our queue spec:
queue:
- name: some-name
  rate: 500/s
  bucket_size: 100
  max_concurrent_requests: 500
My understanding is that all 50 requests should be run in parallel, and thus all of them should be complete in 4-5 minutes. But what's actually happening is that only around ~15 of these requests are returning results, while the rest cross the 10 min limit and time out. Another thing to note is that this seems to work fine if we bring down the number of requests to < 10.
There's always the possibility that the requests that timed out did so because the RPC response actually took that long. But what I wanted to confirm is:
1. My understanding of the tasks running in parallel is correct.
2. Our queue config and the number of tasks we're enqueuing have nothing to do with these requests timing out.
Are these correct?

(1) Parallel execution
Yes, tasks can be executed in parallel (up to 500 in your case), but with push queues your app has no control over the order in which tasks are executed and no direct control over how many tasks are executed at once. (Your app can control the sequence in which tasks are added to a queue though; see the pattern in (2) below.)
App Engine uses several factors to decide how fast and which tasks are executed, especially the queue configuration and the scaling configuration (e.g. in app.yaml). Since you pay for the first 15 minutes of every instance, it could get very expensive to really launch 50 instances and then have them idle for 15 minutes before shutting down (until the next request). In this regard, the mechanism that spawns new instances is a little smarter, whether the load comes from HTTP requests by users or from task queues.
(2) Request time outs
Yes, it is very unlikely that the enqueuing has anything to do with these request timeouts, unless the timeouts are an unintentional side effect of wrongly assuming that a particular task had already been executed.
To avoid request timeouts in general, it makes sense to split a task into multiple tasks. For example, if you have a task do_foo whose executions frequently exceed the timeouts (or memory limits), you could instead have do_foo offload work to other tasks that do the actual jobs.
For some migration tasks I use this pattern in a linear / sequential way. E.g. the classmethod do_foo just queries entities of a certain kind (ordered by creation timestamp, for example), maybe filtered, page by page (e.g. 50 in transactions with ancestor). It does some writes to the entities first, and only at the very end, after a successful commit, does it create a new transactional do_foo task with a cursor parameter pointing to the next page, possibly with a countdown of 1 sec to avoid transaction errors. The next execution of do_foo will continue with the next page (of course only after the task with the previous page has completed).
Depending on the nature of the tasks, you could alternatively have each task fan out into multiple tasks per execution, e.g. do_foo triggers do_bar, do_something and do_more. Also note that up to five tasks can be created transactionally inside a transaction.
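The sequential cursor pattern described above can be sketched without any App Engine dependency. In the following plain-Java illustration an in-memory deque stands in for the push queue and an integer offset stands in for the datastore cursor; all names are hypothetical and no transactions are modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the chained do_foo task; not App Engine code.
final class CursorChainSketch {

    /**
     * Processes `items` one page per task execution. Each execution handles its page
     * and only then enqueues a successor task carrying the next cursor.
     * Returns the number of task executions.
     */
    static int run(List<String> items, int pageSize, List<String> processed) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        Deque<Integer> queue = new ArrayDeque<>(); // each entry is one task's cursor
        queue.add(0);
        int executions = 0;
        while (!queue.isEmpty()) {
            int cursor = queue.poll();
            executions++;
            int end = Math.min(cursor + pageSize, items.size());
            processed.addAll(items.subList(cursor, end)); // the page's actual work
            if (end < items.size()) {
                queue.add(end); // chain the next page only after this one is done
            }
        }
        return executions;
    }
}
```

Each execution stays short because it touches only one page, so no single task comes near the request deadline even when the full data set is large.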

Limit open connections at any one time, maintain rate of X users per second?

I'm writing a Gatling load test which simply bombards a given endpoint over HTTP for a given period of time. I have it gradually ramp up connections per second, and then hold it there for the duration of the test. My setup looks like this:
setUp(
  scn.inject(
    rampUsersPerSec(10) to 70 during (1 minute),
    constantUsersPerSec(70) during (9 minutes)
  ).protocols(httpConf).throttle(jumpToRps(70), holdFor(10 minutes))
)
This works, but the problem is that our requests take a long time, sometimes much longer than a second.
What ends up happening is that the server slows down and requests start taking longer and longer, and instead of maintaining 70 connections to the server at a time, this quickly grows linearly and I'll have something like 1000 open connections at any given time.
Is there a way to "limit the pool" of Gatling users to maintain X open connections at a given time? I've so far been unsuccessful in trying to throttle it.

What you want is a closed injection model.
To do that with Gatling, you have to wrap your scenario content in a loop, and possibly flush the HTTP caches and cookie jars. See the documentation for details.
Note that this model is nowhere near realistic, unless your system really does limit the number of users it lets in, with an upfront queue. A typical use case is a call center.
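The growth in open connections described in the question follows from Little's law (requests in flight = arrival rate × mean response time). The small Java sketch below just does that arithmetic; the class and method names are made up, and the 15 s response time is an assumed figure for illustration, not a measurement from the question.

```java
// Hypothetical helper showing Little's law for open vs. closed injection.
final class LittlesLawSketch {

    /** Open model: users arrive at a fixed rate regardless of how slow the server is. */
    static double openModelInFlight(double arrivalsPerSecond, double meanResponseSeconds) {
        return arrivalsPerSecond * meanResponseSeconds;
    }

    /** Closed model: concurrency is fixed, so throughput drops as responses slow down. */
    static double closedModelThroughput(int concurrentUsers, double meanResponseSeconds) {
        return concurrentUsers / meanResponseSeconds;
    }
}
```

At 70 new users per second, a 1 s response time gives about 70 open connections, but an assumed 15 s response time gives about 1050, which is in the same range as the roughly 1000 connections reported. With a closed pool of 70 users and the same 15 s responses, the server would instead see fewer than 5 requests per second.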

How strictly enforced is the Parse.com API call limit?

Suppose my parse.com api call limit is 30 api calls per second (the free tier). Suppose also that when opening an app I've created, I issue five api calls (1 call to the cloud code, three queries, and one save object).
Suppose 60 users happen to open the app at the same time. Would Parse begin rejecting some API calls?
The typical use case for my app would be 1 or maybe 2 api calls per second with 1000 active users. However, it is possible in some rare situations that I may issue 45 api calls per second. Is there a way around this without having to pay for a large number of API calls per second? It feels like I'm paying for cable TV (24 hours of 200 channels while I only see 2-3 channels 1 hour a day).

One of the Parse guys mentioned recently that they count calls per minute. So the limit is actually 30*60/min or 1,800/min.
This allows for short bursts of activity to not cause problems.
After the 1,800th call in a minute, all further calls will be rejected.
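To see why per-minute counting tolerates short bursts, here is a minimal fixed-window counter in Java. It is only a sketch: how Parse actually windows its counts is not stated above, so the fixed one-minute window, the class name and the method name are all assumptions.

```java
// Hypothetical fixed-window limiter; not Parse's actual implementation.
final class PerMinuteLimiter {
    private final int limitPerMinute;
    private long currentMinute = -1;
    private int used = 0;

    PerMinuteLimiter(int limitPerMinute) {
        this.limitPerMinute = limitPerMinute;
    }

    /** Returns true if a call at `nowMillis` is accepted, false if it is rejected. */
    synchronized boolean tryCall(long nowMillis) {
        long minute = nowMillis / 60_000L;
        if (minute != currentMinute) { // a new minute starts a fresh budget
            currentMinute = minute;
            used = 0;
        }
        if (used >= limitPerMinute) {
            return false;
        }
        used++;
        return true;
    }
}
```

Under this model, 60 users each issuing 5 calls in the same second (300 calls) are all accepted with a budget of 1,800 per minute. Rejections only start once the whole minute's budget is used up.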

Is there a max number of requests an ASP.NET MVC 3 application can handle, and what happens when it's exceeded?

I have an application where you can select entries in a table to be updated. You can select 50 and hit 'send it' and it will send all 50 as 50 individual ajax calls, each of which calls controller X that updates the database table Y. The user said they selected 25 but I see 12 in the logs and no trace of the other 13. I was wondering if maybe I hit some limit with the number of requests or the length of the queue. Any Ideas?
(Running this locally it had no problem with 50-100 being sent at once it just took like 2 seconds for all of them to finally callback).

This question cannot really be answered as there's no "hard limit" that would cause your ajax calls to be lost. However, overloading the server (if you only have one) can cause requests to be queued up and/or take a long time which in turn may cause requests to timeout (depending on your settings). Chances are if you are making 50 ajax calls instead of just one then there is some room for optimization there and I'm not sure that classifies as premature.
In the end, I think the best advice you will get is to profile and load test your code and hardware to be sure; otherwise we're all just guessing.
