JMeter Constant Throughput Timer per thread? - timer

I am creating a JMeter test with 4 threads. Each thread has multiple Transaction controllers and between 45 and 98 HTTP requests.
The problem I have is that I want to add a Constant Throughput Timer that makes sure that x threads complete per minute. At the moment I need to specify the exact number of HTTP requests and then it completes the thread in 1 minute. However, I want it to start thread 1, run it as fast as possible, then start it again at minute 2, and so on.
Can this be done with the Constant Throughput Timer or do I need to use an alternative?
Thanks in advance.
-Blacksoil

The Constant Throughput Timer will be enough for your scenario. Despite the word "Constant" in its name, the throughput doesn't have to be constant: you can put a JMeter Property in the "Throughput" input field and change it on the fly. See the How to use JMeter's Throughput Constant Timer guide for more details.
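For illustration, the "Throughput" field can read such a property via the __P() function, and you can set or override it when launching the test from the command line (or change it mid-run, for example via the Beanshell server). The property name throughput below is just an example:
Throughput field: ${__P(throughput,60.0)}
Command line: jmeter -n -t test-plan.jmx -Jthroughput=240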
By the way, there is an extended version of the CTT, the Throughput Shaping Timer, which provides more flexible throughput control options and even a visual representation of the throughput on the test scenario timeline.

Related

JMeter: how different types of timers can affect each other

I need to create a load test for a certain number of requests in a given time. I could successfully set up the Precise Throughput Timer and I believe I understand how it works. What I don't understand is how other timers, specifically the Gaussian Random Timer, would affect it.
I have run my test plan with and without the Gaussian Random Timer, but I don't see much of a difference in the results. I'm wondering whether adding the Gaussian Random Timer would help me better simulate my users' behavior?
I would say that these timers are mutually exclusive:
Precise Throughput Timer - allows you to reach and maintain the desired throughput (number of requests per given amount of time)
Gaussian Random Timer - allows you to simulate "think time"
If your goal is to mimic real users' behavior as closely as possible - go for the Gaussian Random Timer, because real users don't hammer the application under test non-stop; they need some time to "think" between operations, i.e. locate the button and move the mouse pointer there, read something, type something, etc. So if your test assumes simulating real users using real browsers - go for the Gaussian Random Timer and put realistic think times between operations. If you need your test to produce a certain number of hits per second - just increase the number of threads (virtual users) accordingly. Check out What is the Relationship Between Users and Hits Per Second? for a comprehensive explanation if needed.
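As a rough, purely illustrative calculation: with an average response time of 0.3 seconds plus a 5-second think time, each virtual user produces about 1 / (0.3 + 5) ≈ 0.19 requests per second, so reaching 100 hits per second would need on the order of 100 / 0.19 ≈ 530 threads.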
On the other hand, the Precise Throughput Timer is handy when there are no "real users", for example you're testing an API or a database or a message queue and need to send a specific number of requests per second.

Gatling: Keep fixed number of users/requests at any instant

How can we keep a fixed number of active concurrent users/requests at any time for a scenario?
I have a unique testing problem where I am required to do performance testing of services with a fixed number of requests at any given moment, for a given time period like 10 minutes, 30 minutes or 1 hour.
I am not looking for a per-second thing; what I am looking for is that we start with N requests and, as any of the N requests completes, we add one more, so that at any given moment we have exactly N concurrent requests.
Things which I tried are rampUsers(100) over 10 seconds, but what I see is that sometimes there are more than 50 users at a given instant.
constantUsersPerSec(20) during (1 minute) also took the number of requests to 50+ for some time.
atOnceUsers(20) seems related, but I don't see any way to keep it running for a given number of seconds and add more requests as previous ones complete.
Thank you in advance; expecting some direction from your side.
There is a throttling mechanism (https://gatling.io/docs/3.0/general/simulation_setup/#throttling) which allows you to set a max number of requests per second, but you must remember that users are injected into the simulation independently of that, and you must inject enough users to produce that max number of requests; without that you will end up with a lower req/s. Also, users that are injected but can't send a request because of throttling will wait in a queue for their turn. This may result in a huge load just after the throttle ends, or may extend your simulation, so it is always better to have the throttle time longer than the injection time and to add the maxDuration() option to the simulation setup.
You should also keep in mind that a throttled simulation is far from the natural way users behave. They never wait for another user to finish before opening a page or taking any action, so in real life you will always end up with a variable number of requests per second.
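For illustration, a minimal sketch of that shape, assuming a scenario scn, a protocol httpConf and the usual Gatling imports (the rates and durations are arbitrary): inject for 4 minutes, throttle, and cap the run at 5 minutes.
setUp(
  scn.inject(constantUsersPerSec(30) during (4 minutes)).protocols(httpConf)
).throttle(
  reachRps(20) in (30 seconds),
  holdFor(5 minutes)
).maxDuration(5 minutes)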
Use the closed workload model injection supported by Gatling 3.0. In your case, to simulate and maintain 20 active users/requests for a minute, you can use an injection like:
Script.<Controller>.<Scenario>.inject(constantConcurrentUsers(20) during (60 seconds))
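A more complete sketch of the same idea in the plain Gatling 3 Scala DSL (the base URL, scenario name and the 10-minute duration are placeholders, not taken from the question):
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class FixedConcurrencySimulation extends Simulation {
  val httpConf = http.baseUrl("http://example.com") // placeholder endpoint
  val scn = scenario("fixed-concurrency")
    .exec(http("request").get("/"))

  setUp(
    // keep exactly 20 users active; as soon as one finishes, another one is started
    scn.inject(constantConcurrentUsers(20) during (10 minutes))
  ).protocols(httpConf)
}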

Configuring a task queue and instance for non urgent work

I am using an F4 instance (because of memory needs) with automatic scheduling to do some background processing. It is run from a task queue. It takes 40s to 60s to complete each invocation. Because of the high memory needs, each instance should only handle one request at a time.
The action that needs to be done is not urgent. If it doesn't get scheduled for 30 minutes, that isn't a problem. Even 60 minutes is acceptable, and I'd rather make use of that time than spin up more instances. However, if the service gets popular and it is getting more than 60 requests an hour, I want to spin up more instances to make sure there isn't more than a 60-minute wait.
I am having trouble figuring out how to configure the instance and queue parameters to keep my costs down but be able to scale in that way. My initial thought was something like this:
<queue>
  <name>non-urgent-queue</name>
  <target>slow-service</target>
  <rate>1/m</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>1</max-concurrent-requests>
</queue>
<automatic-scaling>
  <min-idle-instances>0</min-idle-instances>
  <max-idle-instances>0</max-idle-instances>
  <min-pending-latency>20m</min-pending-latency>
  <max-pending-latency>1h</max-pending-latency>
  <max-concurrent-requests>1</max-concurrent-requests>
</automatic-scaling>
First of all those latency settings are invalid, but I can't find documentation on the valid range or units. Can anyone direct me to that info?
Secondly, if I understand the queue settings correctly, this configuration would limit it to 60 invocations an hour getting to the service, even if the task queue had 60+ jobs waiting.
Thanks for your help!
Indeed, throttling at the queue level basically defeats the ability to scale when needed. So you can't use the <rate> in the queue configuration at the values you have right now; you need to use the value matching the maximum rate you're willing to accept (with your max number of instances running simultaneously); a sketch follows the list below:
the max rate of requests that can go through the queue being limited to 1/min means you can't scale above 60/h
the <bucket-size> set to 1 means no peaks above the rate can be handled (as soon as one task starts the token bucket empties)
the <max-concurrent-requests> set to 1 will basically prevent multiple instances dealing simultaneously with the queued workload. They may be started by the autoscaler because of the request latencies, but they won't be able to help since only one queue task can be handled at a time.
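Purely as an illustration (the figure of 5 simultaneous instances is an arbitrary assumption, pick whatever maximum you're willing to pay for), a queue definition that actually lets the autoscaler fan out could look more like:
<queue>
  <name>non-urgent-queue</name>
  <target>slow-service</target>
  <rate>5/m</rate>
  <bucket-size>5</bucket-size>
  <max-concurrent-requests>5</max-concurrent-requests>
</queue>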
In the <automatic-scaling> section the <max-concurrent-requests> set to 1 is good - this ensures no instance handles more than 1 request at a time - which is what you want.
The bad news is that the max values for the latencies appear to be 15s, at least when using the app.yaml config for Python (but I think it's unlikely for that to differ across language sandboxes):
Error 400: --- begin server output ---
automatic_scaling.min_pending_latency (30s), must be in the range [0.010000s,15.000000s].
--- end server output ---
and
Error 400: --- begin server output ---
automatic_scaling.max_pending_latency (60s), must be in the range [0.010000s,15.000000s].
--- end server output ---
Which probably also explains why your 20m and 1h values aren't accepted - I used 30s and 60s and got the above errors.
This means you won't be able to use the autoscaling parameters to tune such slow-moving processing the way you'd like.
The only alternative I can think of is to have 2 queues:
a fast one feeding just trigger tasks for the slow-service jobs, which your service intercepts and saves in the datastore, maybe handled by some faster service (you don't want these stuck behind a slow-service job execution, as that can cause unnecessary instance launching). Maybe, depending on the rest of your implementation, you can replace this queue completely with just storing the job info in the datastore instead of enqueuing tasks in the fast queue.
a slow one for the actual slow-service job execution tasks
You'd also have a cron job executing once a minute, checking how many triggers are pending in the datastore, deciding how much to scale, and enqueuing the corresponding number of slow-service job tasks in the slow queue. The autoscaler would simply bring up the corresponding number of instances (if needed). Low-latency autoscaling configs would be desirable in this case - you already decided how you want your app to scale.
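A minimal sketch of the cron entry for that dispatcher (the URL and description are hypothetical, not part of the original setup):
<cronentries>
  <cron>
    <url>/tasks/dispatch-slow-jobs</url>
    <description>check pending triggers and enqueue slow-service tasks</description>
    <schedule>every 1 minutes</schedule>
  </cron>
</cronentries>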
This is how I ended up doing it. I use a slow queue and a fast queue configured like this:
<queue>
  <name>slow-queue</name>
  <target>pdf-service</target>
  <rate>2/m</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>1</max-concurrent-requests>
</queue>
<queue>
  <name>fast-queue</name>
  <target>pdf-service</target>
  <rate>10/m</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>5</max-concurrent-requests>
</queue>
The max-concurrent-requests in the slow queue ensures only one task will run at a time, so there will only be one instance active.
Before I post to the slow queue, I check to see how many items are already on the queue. The result may not be totally reliable, but for my purposes it is sufficient. In Java:
QueueStatistics queueStats = queue.fetchStatistics();
if (queueStats.getNumTasks() < 30) {
    // post to slow queue
} else {
    // post to fast queue
}
So when my slow queue gets too full, I post to the fast queue which allows concurrent requests.
The instance is configured like this:
<automatic-scaling>
  <min-idle-instances>0</min-idle-instances>
  <max-idle-instances>automatic</max-idle-instances>
  <min-pending-latency>15s</min-pending-latency>
  <max-pending-latency>15s</max-pending-latency>
  <max-concurrent-requests>1</max-concurrent-requests>
</automatic-scaling>
So it will create new instances as slowly as possible (15s is the max latency) and make sure only one process runs on an instance at a time.
With this configuration I'll have a max of 6 instances at a time but that should do about 500/hr. I could increase the rate and concurrent requests to do more.
The negative of this solution is an element of unfairness. Under heavy load, some tasks will be stuck in the slow queue while others will get processed more quickly in the fast queue.
Because of that, I have decreased the max items on the slow queue to 13 so the unfairness won't be so extreme, maybe a 10 minute wait for jobs that go to the slow queue when it is full.

Library design methodology

I want to make the "TRAP AGENT" library. The trap agent library keeps the tracks of the various parameter of the client system. If the parameter of the client system changes above threshold then trap agent library at client side notifies to the server about that parameter. For example, if CPU usage exceeds beyond threshold then it will notify the server that CPU usage is exceeded. I have to measure 50-100 parameters (like memory usage, network usage etc.) at client side.
Now I have the basic idea about the design, but I am stuck with the entire library design.
I have thought of below solutions:
I can create a thread for each parameter (i.e. each thread will monitor a single parameter).
I can create a process for each parameter (i.e. each process will monitor a single parameter).
I can classify the various parameters into groups - for example, the data usage parameter will fall into the network group, and the CPU and memory usage parameters will fall into the system group - and then create a thread for each group.
Now the 1st solution looks good compared to the 2nd. But if I adopt the 1st solution, it may fail when I want to upgrade my library from 100 to 1000 parameters, because I would have to create 1000 threads at that time, which is not a good design (I think so; if I am wrong, correct me).
The 3rd solution is good, but response time will be high since many parameters will be monitored in a single thread.
Is there any better approach?
In general, it's a bad idea to spawn threads 1-to-1 for any logical mapping in your code. You can quickly exhaust the available threads of the system.
In .NET this is very elegantly handled using thread pools:
Thread vs ThreadPool
Here is a C++ discussion, but the concept is the same:
Thread pooling in C++11
Processes are also high overhead on Windows. Both designs sound like they would ironically be quite taxing on the very resources you are trying to monitor.
Threads (and processes) give you parallelism where you need it. For example, letting the GUI be responsive while some background task is running. But if you are just monitoring in the background and reporting to a server, why require so much parallelism?
You could just run each check, one after the other, in a tight event loop in one single thread. If you are worried about not sampling the values as often, I'd say that's actually a benefit. It doesn't help to consume 50% CPU to monitor your CPU. If you are spot-checking values once every few seconds, that is probably fine resolution.
In fact, high resolution is of no help if you are reporting to a server. You don't want to denial-of-service-attack your server by making an HTTP call to it multiple times a second once some value triggers.
NOTE: this doesn't mean you can't have a pluggable architecture. You could create some base class that represents checking a resource and then create subclasses for each specific type. Your event loop could iterate over an array or list of objects, calling each one successively and aggregating the results. At the end of the loop you report back to the server if any are out of range.
You may want to add logic to stop checking (or at least stop reporting back to the server) for some "cool down period" once a trap hits. You don't want to tax your server or spam your logs.
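To make the event-loop-plus-subclasses idea concrete, here is a minimal sketch (written in Scala only for brevity - the idea is language-agnostic; the check names, thresholds and the 5-second interval are illustrative assumptions):
// each parameter check is a small pluggable object
trait ParameterCheck {
  def name: String
  def threshold: Double
  def read(): Double                // sample the current value
}

class CpuCheck extends ParameterCheck {
  val name = "cpu-usage"
  val threshold = 90.0
  def read(): Double = 0.0          // placeholder: query the OS here
}

object TrapAgent extends App {
  val checks: Seq[ParameterCheck] = Seq(new CpuCheck /* , memory, network, ... */)
  while (true) {
    // run every check sequentially in this one thread and aggregate the breaches
    val breaches = checks.filter(c => c.read() > c.threshold)
    if (breaches.nonEmpty)
      println(s"notify server: ${breaches.map(_.name).mkString(", ")}") // report once per loop
    Thread.sleep(5000)              // spot-check every few seconds
  }
}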
You can follow the methodology below:
1. You can have two threads: one thread dedicated to measuring the emergency parameters and a second thread monitoring the non-emergency parameters. Hence the response time for emergency parameters will be lower.
2. You can define 3 threads: the first thread monitors the high-priority (emergency) parameters, the second thread monitors the intermediate-priority parameters, and the last thread monitors the lowest-priority parameters. So overall response time will be improved compared to the first solution.
3. If response time is not a concern, then you can monitor all the parameters in a single thread. But in this case response time becomes much worse when you upgrade your library to monitor 100 to 1000 parameters.
So in the 1st case there will be a longer response time for non-emergency parameters, while in the 3rd case there will definitely be a very high response time.
So solution 2 is better.

Limit open connections at any one time, maintain rate of X users per second?

I'm writing a Gatling load test which simply bombards a given endpoint over HTTP for a given period of time. I have it gradually ramp up connections per second, and then hold it there for the duration of the test. My setup looks like this:
setUp(
  scn.inject(
    rampUsersPerSec(10) to 70 during (1 minute),
    constantUsersPerSec(70) during (9 minutes)
  ).protocols(httpConf).throttle(jumpToRps(70), holdFor(10 minutes))
)
This works, but the problem is that our requests take a long time, sometimes much longer than a second.
What ends up happening is that the server slows down and requests start taking longer and longer, and instead of maintaining 70 connections to the server at a time, this quickly grows linearly and I'll have something like 1000 open connections at any given time.
Is there a way to "limit the pool" of Gatling users to maintain X open connections at a given time? I've so far been unsuccessful in trying to throttle it.
What you want is a closed injection model.
In order to do that with Gatling, you have to wrap your scenario content with a loop, and possibly flush the HTTP caches and cookie jars. Search the doc.
Note that this model is nowhere near realistic, except if your system indeed limits the number of users it lets in, with an upfront queue. A typical use case is a call center.
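A minimal sketch of that loop-based closed model, reusing the httpConf from the question and assuming the usual Gatling imports (the endpoint path is a placeholder; 70 matches the target in the question):
val scn = scenario("closed model")
  .forever(
    exec(flushHttpCache)            // optional: make each iteration look like a fresh user
      .exec(flushCookieJar)
      .exec(http("request").get("/endpoint"))
  )

setUp(
  scn.inject(atOnceUsers(70))       // 70 looping users => at most 70 in-flight requests
).protocols(httpConf)
  .maxDuration(10 minutes)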
