Akka stream timeout in long pipeline - akka-stream

If I have a very long akka-stream pipeline, is there a way to handle timeouts such that the timeout doesn't start until the first element reaches a given spot in the pipeline?
For example, let's say I have a pipeline in which it takes the first element 2+ minutes to reach the final sink, but after that, elements should come in every second or so. Is this something Akka has taken into account, or do I have to set timeouts on my graph shapes individually in this case?
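For concreteness, here is a minimal sketch of what per-stage timeouts would look like using the scaladsl operators initialTimeout and idleTimeout (these operators exist in Akka Streams; slowFirstStage and the durations are made-up placeholders, not my actual pipeline):

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._

object TimeoutSketch extends App {
  // Akka 2.6+: the materializer is derived from the implicit system.
  implicit val system: ActorSystem = ActorSystem("timeouts")

  // Placeholder for the slow head of the pipeline (first element takes minutes to pass through).
  val slowFirstStage = Flow[Int]

  Source(1 to 100)
    .via(slowFirstStage)
    .initialTimeout(3.minutes) // fails the stream if the FIRST element has not arrived in time
    .idleTimeout(5.seconds)    // after that, fails if the gap BETWEEN elements exceeds 5 s
    .runWith(Sink.foreach(println))
}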

Related

Does Flink's windowing operation process elements at the end of window or does it do a rolling processing?

I am having some trouble understanding how windowing is implemented internally in Flink and could not find any article that explains this in depth. In my mind, there are two ways this can be done. Consider the simple windowed wordcount code below:
env.socketTextStream("localhost", 9999)
.flatMap(new Splitter())
.groupBy(0)
.window(Time.of(500, TimeUnit.SECONDS)).sum(1)
Method 1: Store all events for 500 seconds and, at the end of the window, process all of them by applying the sum operation to the stored events.
Method 2: Use a counter to store a rolling sum for every window. As each event in a window arrives, we do not store the individual event but keep adding 1 to the previously stored counter and output the result at the end of the window.
Could someone kindly help me understand which of the above methods (or maybe a different approach) Flink actually uses? The reason is that there are pros and cons to both approaches, and it is important to understand this in order to configure the resources for the cluster correctly.
E.g., Method 1 seems very close to batch processing and might have issues such as a processing spike at every 500-second interval while sitting idle otherwise, whereas Method 2 would need to maintain a common counter across all task managers.
sum is a reducing function, as mentioned here (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#reducefunction). Internally, Flink applies the reduce function to each input element as it arrives and simply saves the reduced result in a ReducingState.
For other window functions, like window(...).apply(WindowFunction), there is no incremental aggregation, so all input elements are saved in a ListState.
This document (https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#window-functions) about window functions describes how elements are handled internally in Flink.
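To make the difference concrete, here is a hedged sketch using Flink's Scala API (the socket word-count shape mirrors the question; the exact operator choices are illustrative, not the asker's code):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object WindowStateSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val words: DataStream[(String, Int)] =
      env.socketTextStream("localhost", 9999)
        .flatMap(_.split("\\s+"))
        .map(w => (w, 1))

    // Incremental aggregation: only the running (word, count) per key is kept in window state.
    words
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(500)))
      .reduce((a, b) => (a._1, a._2 + b._2))
      .print()

    // Full buffering: every element is kept in ListState until the window fires,
    // then the WindowFunction sees the whole Iterable at once.
    words
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(500)))
      .apply(new WindowFunction[(String, Int), (String, Int), String, TimeWindow] {
        override def apply(key: String, window: TimeWindow,
                           input: Iterable[(String, Int)],
                           out: Collector[(String, Int)]): Unit =
          out.collect((key, input.map(_._2).sum))
      })
      .print()

    env.execute("window state sketch")
  }
}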

Flink Dashboard Throughput doesn't add up

I have two operators, a source and a map. The incoming throughput of the map is stuck at just above 6K messages/s, whereas the message count reaches the size of the whole stream (~350K) in under 20 s (see duration). 350000/20 means that I have a throughput of at least 17500 and not 6000 as Flink suggests! What's going on here?
As shown in the picture:
start time = 13:10:29
all messages are already read by = 13:10:46 (less than 20s)
I checked the Flink library code, and it seems that the numRecordsOutPerSecond statistic (as well as the other similar ones) operates on a window. This means that it displays the average throughput of the last X seconds, not the average throughput of the whole execution.
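As a hedged back-of-the-envelope check (the 60-second averaging window below is an assumption for illustration, not something confirmed here):

val total = 350000.0
val burstRate    = total / 20.0 // ~17500 records/s while data was actually flowing
val windowedRate = total / 60.0 // ~5833 records/s, in the ballpark of the ~6K the dashboard shows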

How to calculate end-to-end delay in a multi-hop transmission in UnetStack

I have developed an energy-aware routing protocol. Now, for performance evaluation, I want to calculate the end-to-end packet transmission delay when packets travel through a multi-hop link. I am unable to decide which timing information to use: the simulation time available in the log file (log-0.txt) or the modem's transmission time (txtime and rxtime). Please let me know the method to calculate end-to-end delay in UnetStack.
The simulation time (first column in the log files below, in milliseconds) is synchronized across all simulated nodes, so you can use it to compute end-to-end delays if you log a START time at your source node and an END time at your destination node.
Example log file:
5673|INFO|org.arl.unet.sim.SimulationAgent/4#570:call|TxFrameNtf:INFORM[type:DATA txTime:2066947222]
6511|INFO|org.arl.unet.sim.SimulationAgent/3#567:call|TxFrameNtf:INFORM[type:DATA txTime:1157370743]
10919|INFO|org.arl.unet.sim.SimulationAgent/4#570:call|TxFrameNtf:INFORM[type:DATA txTime:2072193222]
In this example, node 4 (SimulationAgent/4) transmits at time 5673. Node 3 (SimulationAgent/3) then transmits at time 6511. And so on...
The txTime and rxTime are in microseconds, but are local to each node. So they can be used to get time differences for events in the same node, but cannot directly be compared across nodes.
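A minimal sketch of that computation (the START/END lines and the packet-id convention are hypothetical; only the first, synchronized millisecond field of each log line is used):

object EndToEndDelay extends App {
  // The leading field of each simulation log line is the synchronized time in milliseconds.
  def simTimeMs(logLine: String): Long = logLine.split('|').head.trim.toLong

  // Hypothetical START (sent at source) and END (delivered at destination) log lines.
  val startLine = "5673|INFO|...|START pkt=42 at source"
  val endLine   = "10919|INFO|...|END pkt=42 at destination"

  println(s"end-to-end delay = ${simTimeMs(endLine) - simTimeMs(startLine)} ms") // 5246 ms
}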

Getting JMeter to work with Throughput Shaping timer and Concurrency Thread Group

I am trying to shape a JMeter test involving a Concurrency Thread Group and a Throughput Shaping Timer, as documented here and here. The timer is configured to run ten ramps and stages with RPS from 1 to 333.
I want to set up the Concurrency Thread Group to use the schedule feedback function, so I added the formula in the Target Concurrency field (I have updated the example from tst-name to the actual timer name). I have set Ramp-Up Time and Ramp-Up Steps Count to 1, as I assume those properties are not that important if the throughput is managed by the timer; the Hold Target Rate Time is 8000, which is longer than the steps added in the timer (6200).
When I run the test, it ends without any exceptions within 3 seconds or so. The log file shows a few rows about starting and ending threads, but nothing alarming.
The only thing I find suspicious is the log entry "VirtualUserController: Test limit reached, thread is done" plus the thread name.
I am not getting enough clues from the documentation linked here to figure this out myself; do you have any hints?
According to the documentation, Ramp-Up Time and Ramp-Up Steps Count should be left blank:
"When using this approach, leave Concurrency Thread Group Ramp Up Time and Ramp-Up Steps Count fields blank"
So your assumption that setting them to 1 is OK seems to be false...
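Putting it together, a hedged sketch of the intended Concurrency Thread Group settings (the feedback function name and argument order follow the Throughput Shaping Timer plugin documentation; the numeric values are just the documented example values, not a recommendation):

Target Concurrency:    ${__tstFeedback(tst-name,1,100,10)}
Ramp Up Time:          (leave blank)
Ramp-Up Steps Count:   (leave blank)
Hold Target Rate Time: at least the length of the whole timer schedule (e.g. 8000)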

Persistent Connection on a web server HTTP1.1

I'm trying to write a web server in C under Linux using the HTTP/1.1 protocol.
I've used select() to handle multiple requests, and I'd like to implement persistent connections, but it hasn't worked so far because I can't set a timeout properly. How can I do it? I thought about the setsockopt() function:
setsockopt(connsd, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof(tv))
where tv is a struct timeval. This isn't working either.
Any suggestions?
SO_RCVTIMEO will only work when you are actually reading data; select() won't honor it. select() takes a timeout parameter as its last argument. If you have a timer data structure to organize which connections should time out in what order, then you can pass the time of the soonest timeout to select(). If the return value is 0, then a timeout has occurred, and you should expire all timed-out connections. After processing live connections (and resetting their idle timeouts in your timer data structure), you should again check whether any connections should be timed out before calling select() again.
There are various data structures you can use, but popular ones include the timing wheel and timer heap.
A timing wheel is basically an array organized as a circular buffer, where each buffer position represents a time unit. If the wheel's unit is seconds, you could construct a 300-element array to represent 5 minutes of time. There is a sticky index which represents the last time any timers were expired, and the current position is the current time modulo the size of the array. To add a timeout, calculate the absolute time it needs to be timed out, take that modulo the size of the array, and add the entry to the list at that array position. All buckets between the last index and the current position whose timeout has been reached need to be expired. After expiring the entries, the last index is updated to the current position. To calculate the time until the next expiration, the buckets are scanned starting from the current position to find a bucket with an entry that will expire.
A timer heap is basically a priority queue, where entries that expire sooner have higher priority than entries that expire later. The top of a non-empty heap determines the time to next expiration.
If your application is inserting lots and lots of timers all the time, and then cancelling them all the time, a wheel may be more appropriate, as inserting into and removing from the wheel is more efficient than inserting into and removing from a priority queue.
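A minimal sketch of the timing-wheel bookkeeping described above (written in Scala only to keep the illustration compact; the server in the question is C, and all names here are made up):

import scala.collection.mutable

// e.g. new TimingWheel[Int](slots = 300, startSec = nowSec) gives 5 minutes of 1-second buckets
class TimingWheel[A](slots: Int, startSec: Long) {
  private val wheel = Array.fill(slots)(mutable.Set.empty[A])
  private var lastExpired = startSec // last second at which buckets were expired

  // Schedule `item` to time out `timeoutSec` seconds after `nowSec` (requires timeoutSec < slots).
  def add(item: A, nowSec: Long, timeoutSec: Int): Unit =
    wheel(((nowSec + timeoutSec) % slots).toInt) += item

  // Cancel an entry early, e.g. when its connection shows activity and the timeout is reset.
  def remove(item: A, expiresAtSec: Long): Unit =
    wheel((expiresAtSec % slots).toInt) -= item

  // Expire every bucket between the last expiry and `nowSec`, returning the expired items.
  def expire(nowSec: Long): Seq[A] = {
    val expired = mutable.Buffer.empty[A]
    var t = math.max(lastExpired + 1, nowSec - slots + 1) // never rescan more than one revolution
    while (t <= nowSec) {
      val bucket = wheel((t % slots).toInt)
      expired ++= bucket
      bucket.clear()
      t += 1
    }
    lastExpired = nowSec
    expired.toSeq
  }
}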
The simplest solution is probably to keep a last-time-request-received timestamp for each connection, then regularly check that time and, if it is too long ago, close the connection.

Resources