How to send out messages at a defined rate in ZeroMQ? - c

I want to test how many subscribers I can connect to a publisher that sends out messages quickly, but not at maximum speed, e.g. one message every microsecond.
The reason is that if I send out messages at maximum speed, I lose messages at the receiver ( high-water mark ).
I thought I could use nanosleep(), and it works nicely at 20 messages a second ( sleep: 50000000 [ns] ). But with shorter sleeping times it gets worse: 195 messages (5000000), 1700 (500000), 16000 (50000). With even shorter sleeping times I don't really get any more messages; it seems the sleep function itself needs some time, which I can see if I print out timestamps.
So I think this is the wrong way to run a function at a specific rate, but I haven't found another way to do it.
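For reference, my pacing loop looks roughly like the sketch below ( trimmed down; the endpoint and the payload are just placeholders ):

#include <zmq.h>
#include <string.h>
#include <time.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    void *pub = zmq_socket(ctx, ZMQ_PUB);
    zmq_bind(pub, "tcp://*:5556");              /* placeholder endpoint */

    struct timespec pause = { 0, 50000000 };    /* 50 ms -> ~20 msg/s   */
    const char payload[] = "tick";

    for (;;) {
        zmq_send(pub, payload, strlen(payload), 0);
        nanosleep(&pause, NULL);                /* relative sleep: its own
                                                   overhead dominates once the
                                                   requested pause gets short */
    }

    /* never reached in this sketch */
    zmq_close(pub);
    zmq_ctx_destroy(ctx);
    return 0;
}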
Is there a possibility to send out roughly 1000000 messages a second?

Q: How to send out messages at a defined rate?
Given the API is v4.2.3+, one can use the { pgm:// | epgm:// } transport class for this very purpose and set up an adequately tuned .setsockopt( ZMQ_RATE, <kbps> ), plus apply some additional performance-related tweaking of buffer sizing ( ZMQ_SNDBUF, ZMQ_IMMEDIATE, ZMQ_AFFINITY, ZMQ_TOS and ZMQ_MULTICAST_MAXTPDU ) and some priority mapping etc., so as to safely get as close to the hardware limits as needed.
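A minimal sketch of what such option tuning could look like on a PUB socket over an epgm:// endpoint ( the address, rate and buffer values below are illustrative only, not recommendations ):

#include <zmq.h>
#include <stdint.h>

int main(void)
{
    void    *ctx = zmq_ctx_new();
    void    *pub = zmq_socket(ctx, ZMQ_PUB);

    int      rate_kbps = 100000;     /* ZMQ_RATE is in kilobits per second   */
    int      sndbuf    = 1 << 20;    /* kernel send-buffer size in bytes     */
    int      immediate = 1;          /* queue only to completed connections  */
    int      tos       = 0x10;       /* IP_TOS value set on outgoing packets */
    int      max_tpdu  = 1500;       /* PGM maximum transport data unit      */
    uint64_t affinity  = 1;          /* bitmask: use only Context I/O-thread 0 */

    zmq_setsockopt(pub, ZMQ_RATE,              &rate_kbps, sizeof rate_kbps);
    zmq_setsockopt(pub, ZMQ_SNDBUF,            &sndbuf,    sizeof sndbuf);
    zmq_setsockopt(pub, ZMQ_IMMEDIATE,         &immediate, sizeof immediate);
    zmq_setsockopt(pub, ZMQ_TOS,               &tos,       sizeof tos);
    zmq_setsockopt(pub, ZMQ_MULTICAST_MAXTPDU, &max_tpdu,  sizeof max_tpdu);
    zmq_setsockopt(pub, ZMQ_AFFINITY,          &affinity,  sizeof affinity);

    zmq_connect(pub, "epgm://eth0;239.192.1.1:5555");   /* placeholder address */

    /* ... publish loop ... */

    zmq_close(pub);
    zmq_ctx_destroy(ctx);
    return 0;
}

Note that these options only take effect for subsequent bind / connect calls, hence they are set before zmq_connect() here.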
Q: Is there a possibility to send out roughly 1,000,000 messages a second?
Well, given that not more than about 1000 [ns] is available per message-to-wire dispatch, careful engineering is due to take place.
The best candidate for such a rate would be the inproc:// transport class, as it does not rely on the performance of ZeroMQ's Context-instance IO-thread(s) or on bottlenecks inside an external O/S scheduler ( and will definitely work faster than any other available transport class ). Still, whether it can meet the required sub-1000 [ns] latency depends on your application design and message sizes ( Zero-Copy being our friend here for better meeting the latency deadline ).
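A minimal sketch of such an inproc:// setup, with the publisher and a subscriber thread sharing one Context instance ( the endpoint name, message size and message count are arbitrary ):

#include <zmq.h>
#include <pthread.h>
#include <string.h>

static void *subscriber(void *ctx)
{
    void *sub = zmq_socket(ctx, ZMQ_SUB);
    zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "", 0);   /* receive everything        */
    zmq_connect(sub, "inproc://fast-feed");      /* placeholder endpoint      */
    char buf[64];
    while (zmq_recv(sub, buf, sizeof buf, 0) >= 0)
        ;                                        /* count / process messages  */
    zmq_close(sub);                              /* recv fails with ETERM once
                                                    the context terminates    */
    return NULL;
}

int main(void)
{
    void *ctx = zmq_ctx_new();
    void *pub = zmq_socket(ctx, ZMQ_PUB);
    zmq_bind(pub, "inproc://fast-feed");         /* bind before any connect   */

    pthread_t t;
    pthread_create(&t, NULL, subscriber, ctx);   /* same ctx, no I/O-thread hop */

    const char payload[] = "tick";               /* PUB/SUB slow-joiner: the very
                                                    first messages may be dropped
                                                    before the SUB has joined  */
    for (long i = 0; i < 1000000; ++i)
        zmq_send(pub, payload, strlen(payload), 0);

    zmq_close(pub);
    zmq_ctx_destroy(ctx);                        /* unblocks the subscriber   */
    pthread_join(t, NULL);
    return 0;
}

Whether this meets a ~1 [us] per-message budget still has to be measured on the target hardware, with the real message sizes.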

Related

Flink Dashboard Throughput doesn't add up

I have two operators, a source and a map. The incoming throughput of the map is stuck at just above 6K messages/s, whereas the message count reaches the size of the whole stream (~350K) in under 20 s (see duration). 350000/20 means that I have a throughput of at least 17500 and not 6000 as Flink suggests! What's going on here?
as shown in the picture:
start time = 13:10:29
all messages are already read by = 13:10:46 (less than 20s)
I checked the Flink library code, and it seems that the numRecordsOutPerSecond statistic (as well as the other similar ones) operates on a window. This means it displays the average throughput of the last X seconds, not the average throughput of the whole execution.

Gatling: difference between Response Time Percentiles and Latency Percentiles over time

On my Gatling reports, I noticed that the "Response Time Percentiles" and "Latency Percentiles over time" charts are nearly identical. In what way are they different?
I saw this post, which makes me even more unsure:
Latency Percentiles over Time (OK) – same as Response Time Percentiles over Time (OK), but showing the time needed for the server to process the request, although it is incorrectly called latency. By definition Latency + Process Time = Response time. So this graphic is supposed to give the time needed for a request to reach the server. Checking real-life graphics I think this graphic shows not the Latency, but the real Process Time. You can get an idea of the real Latency by taking one and the same second from Response Time Percentiles over Time (OK) and subtract values from current graphs for the same second.
Thanks in advance for your help.
Latency basically tells you how long it takes to receive the first packet for each page request throughout the duration of your load test. If you look at this chart in the Gatling documentation, the first spike is just before 21:30:20 on the x-axis and tells you that 100% of the pages requested took longer than 1000 milliseconds to get the first packet from source to destination, but that number fell significantly after 21:30:20.

Getting JMeter to work with Throughput Shaping timer and Concurrency Thread Group

I am trying to shape a JMeter test involving a Concurrency Thread Group and a Throughput Shaping Timer, as documented here and here. The timer is configured to run ten ramps and stages with RPS from 1 to 333.
I want to set up the Concurrency Thread Group to use the schedule feedback function, and I have added the formula in the Target Concurrency field (updating the example from tst-name to the actual timer name). Ramp-up time and steps I have set to 1, as I assume these properties are not that important when the throughput is managed by the timer; the Hold Target Rate time is 8000, which is longer than the steps added in the timer (6200).
When I run the test, it ends without any exceptions within 3 seconds or so. The log file shows a few rows about starting and ending threads but nothing alarming.
The only thing I find suspicious is the log entry "VirtualUserController: Test limit reached, thread is done" plus the thread name.
I am not getting enough clues from the documentation linked here to figure this out myself. Do you have any hints?
According to the documentation, the Ramp-Up Time and Ramp-Up Steps Count fields should be blank:
"When using this approach, leave Concurrency Thread Group Ramp Up Time and Ramp-Up Steps Count fields blank."
So your assumption that setting them to 1 is OK seems false...

Gatling simulation with fixed number of users for specific period of time

I have my Gatling scenario set up, and now I want to configure a simulation with a fixed number of users for a specific period of time - the number of users should initially be increased gradually to a specific value and then kept there, by adding new users as running ones finish.
I specifically don't want to use constantUsersPerSec (which injects users at a constant rate) but something like .throttle(reachUsers(100) in rampUpTime, holdFor(10 minute)) which should inject users when required.
If it's still relevant: Gatling supports a throttle method pretty much as you outlined it. You can use the following building blocks (taken from the docs):
reachRps(target) in (duration): target a throughput with a ramp over a given duration.
jumpToRps(target): jump immediately to a given targeted throughput.
holdFor(duration): hold the current throughput for a given duration.
So a modified example for your use case could look something like this:
setUp(scn.inject(constantUsersPerSec(100) during (10 minutes))).throttle(
  reachRps(100) in (1 minute),
  holdFor(9 minutes)
)

Which chunk size will yield the best performance using master-worker with MPI?

I'm using MPI to parallelize a program that tries to solve the Metric TSP problem. I have P processors and N cities to pass.
Each thread asks the master for work and receives a chunk - a range of permutations it should check - and calculates the minimum among them. I am optimizing this by pruning bad routes in advance.
There are (N-1)! routes in total to calculate. Each worker gets a chunk with a number that represents the first route it has to check and also the last one. In addition, the master sends it the most recent best result known, so it can easily prune bad routes in advance using a lower bound on their remainder.
Each time a worker finds a result that is better than the global one, it asynchronously sends it to all the other workers and to the master.
I'm not looking for a better solution - I'm just trying to determine which chunk size is best.
The best chunk size I've found so far is (n!)/(n/2)!, but it doesn't yield very good results.
Please help me understand which chunk size is best here. I'm trying to balance the amount of computation against the amount of communication.
Thanks
This depends heavily on factors beyond your control: MPI implementation, total load on the machine, etc. However, I'd hazard a guess that it also heavily depends on how many worker processes there are. On that note, understand that MPI spawns processes, not threads.
Ultimately, as is often the case with most optimization questions, the answer is simply "test a lot of different settings and see which one is best". You may want to do this manually, or write a tester app that implements some sort of heuristic (e.g. a genetic algorithm).
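As a starting point for such testing, a master loop whose chunk size is a runtime parameter could look roughly like the sketch below ( this is not the asker's code; the tags, the message layout and the numbers are assumptions ):

#include <mpi.h>
#include <stdlib.h>
#include <limits.h>

#define TAG_REQUEST 1   /* worker asks for work               */
#define TAG_CHUNK   2   /* master replies with a route range  */

/* reply layout: { first_route, last_route, best_cost_so_far } */
static void master(long total_routes, long chunk, long best_cost, int workers)
{
    long next   = 0;
    int  active = workers;
    while (active > 0) {
        MPI_Status st;
        long dummy;
        MPI_Recv(&dummy, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_REQUEST,
                 MPI_COMM_WORLD, &st);
        long msg[3];
        if (next < total_routes) {
            msg[0] = next;
            msg[1] = (next + chunk < total_routes) ? next + chunk : total_routes;
            msg[2] = best_cost;   /* updating best_cost from worker reports is omitted */
            next   = msg[1];
        } else {
            msg[0] = msg[1] = -1; /* no work left: tell this worker to stop   */
            msg[2] = best_cost;
            --active;
        }
        MPI_Send(msg, 3, MPI_LONG, st.MPI_SOURCE, TAG_CHUNK, MPI_COMM_WORLD);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long chunk = (argc > 1) ? atol(argv[1]) : 100000;  /* chunk size under test */
    if (rank == 0)
        master(3628800L /* e.g. (11-1)! routes */, chunk, LONG_MAX, size - 1);
    /* the worker side ( request -> compute -> report loop ) is omitted here */

    MPI_Finalize();
    return 0;
}

Running this with a range of chunk values ( for example doubling the size each run ) and timing the whole execution is the simplest version of the tester suggested above.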
