Akka Streams buffer on SubFlows based on parent Flow

Akka Streams buffer on SubFlows based on parent Flow - akka-stream

I am using akka-streams and I hit an exception because of maxing out the Http Pool on akka-http.
There is a Source of list-elements, which get split and thus transformed to SubFlows.
The SubFlows issue http requests. Although I put a buffer on the SubFlow, it seems the buffer takes effect per SubFlow.
Is there a way to have a buffer based on the Source that takes effect on the SubFlows?

My mistake was that I was merging the substreams without taking into consideration the parallelism by using
def mergeSubstreams(): Flow[In, Out, Mat]
From the documentation
This is identical in effect to mergeSubstreamsWithParallelism(Integer.MAX_VALUE).
Thus my workaround was to use
def mergeSubstreamsWithParallelism(parallelism: Int): Flow[In, Out, Mat]

Related

Flink: An abstraction that implements CheckpointListener w/o element processing

I'm new to Flink and am looking for a way to run some code once a checkpoint completes (supposedly by implementing CheckpointListener) without processing events (void processElement(StreamRecord<IN> element)). Currently, I have an operator MyOperator that runs my code within notifyCheckpointComplete function. However, I see a lot of traffic sent to that operator. The operator chain looks as follows:
input = KafkaStream
input -> IcebergSink
input -> MyOperator
I can't find how to register CheckpointListener in Flink execution environment. Is it possible?
Also, I have the following ideas:
map input stream elements to Void, Unit before sending to MyOperator
use Side Output without emitting data to side output. I'm wondering if notifyCheckpointComplete will be still called.

How do I request a scan on a Wi-Fi device using libnm?

The documentation suggests I use nm-device-wifi-request-scan-async, but I don't understand how to understand when it's has finished scanning. What's the correct second parameter I should pass, how is it constructed and what do I pass to nm-device-wifi-request-scan-finish?
I've tried using nm-device-wifi-get-last-scan and determining whether the scan has just happened or the last scan was a long time ago, but it doesn't seem to update the time of the scan - i.e., after requesting the scan and printing out the time between nm-utils-get-timestamp-msec and the last scan, it only increases and and decreases only if I restart the whole program, for some reason...
All I really need is to request a scan every x seconds and understand whether it has happened or not. The deprecated synchronous functions seem to have allowed this with callback functions, but I don't understand async :(

nm_device_wifi_request_scan_async() starts the scan, it will take a while until the result is ready.
You will know when the result is ready when the nm_device_wifi_get_last_scan() timestamp gets bumped.
it only increases and and decreases only if I restart the whole program, for some reason...
last-scan should be in clock_gettime(CLOCK_BOOTTIME) scale. That is supposed to never decrease (unless reboot). See man clock_gettime.
but I don't understand async
The sync method does not differ from the async method in this regard. They both only kick off a new scan, and when they compete, NetworkManager will be about to scan (or start shortly). The sync method is deprecated for other reasons (https://smcv.pseudorandom.co.uk/2008/11/nonblocking/).

Akka HTTP streaming API with cycles never completes

I'm building an application where I take a request from a user, call a REST API to get back some data, then based on that response, make another HTTP call and so on. Basically, I'm processing a tree of data where each node in the tree requires me to recursively call this API, like this:
A
/ \
B C
/ \ \
D E F
I'm using Akka HTTP with Akka Streams to build the application, so I'm using it's streaming API, like this:
val httpFlow = Http().cachedConnection(host = "localhost")
val flow = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val merge = b.add(Merge[Data](2))
val bcast = b.add(Broadcast[ResponseData](2))
takeUserData ~> merge ~> createRequest ~> httpFlow ~> processResponse ~> bcast
merge <~ extractSubtree <~ bcast
FlowShape(takeUserData.in, bcast.out(1))
}
I understand that the best way to handle recursion in an Akka Streams application is to handle recursion outside of the stream, but since I'm recursively calling the HTTP flow to get each subtree of data, I wanted to make sure that the flow was properly backpressured in case the API becomes overloaded.
The problem is that this stream never completes. If I hook it up to a simple source like this:
val source = Source.single(data)
val sink = Sink.seq[ResponseData]
source.via(flow).runWith(sink)
It prints out that it's processing all the data in the tree and then stops printing anything, just idling forever.
I read the documentation about cycles and the suggestion was to put a MergePreferred in there, but that didn't seem to help. This question helped, but I don't understand why MergePreferred wouldn't stop the deadlock, since unlike their example, the elements are removed from the stream at each level of the tree.
Why doesn't MergePreferred avoid the deadlock, and is there another way of doing this?

MergePreferred (in the absence of eagerComplete being true) will complete when all the inputs have completed, which tends to generally be true of stages in Akka Streams (completion flows down from the start).
So that implies that the merge can't propagate completion until both the input and extractSubtree signal completion. extractSubtree won't signal completion (most likely, without knowing the stages in that flow) until bcast signals completion which (again most likely) won't happen until processResponse signals completion which* won't happen until httpFlow signals completion which* won't happen until createRequest signals completion, which* won't happen until merge signals completion. Because detecting this cycle in general is impossible (consider that there are stages for which completion is entirely dynamic), Akka Streams effectively takes the position that if you want to create a cycle like this, it's on you to determine how to break the cycle.
As you've noticed, eagerComplete being true changes this behavior, but since it will complete as soon as any input completes (which in this case will always be the input, thanks to the cycle) merge completes and cancels demand on extractSubtree (which by itself could (depending on whether the Broadcast has eagerCancel set) cause the downstream to cancel), which will likely result in at least some elements emitted by extractSubtree not getting processed.
If you're absolutely sure that the input completing means that the cycle will eventually dry up, you can use eagerComplete = false if you have some means to complete extractSubtree once the cycle is dry and the input has completed. A broad outline (without knowing what, specifically, is in extractSubtree) for going about this:
map everything coming into extractSubtree from bcast into a Some of the input
prematerialize a Source.actorRef to which you can send a None, save the ActorRef (which will be the materialized value of this source)
merge the input with that prematerialized source
when extracting the subtree, use a statefulMapConcat stage to track whether a) a None has been seen and b) how many subtrees are pending (initial value 1, add the number of (first generation) children of this node minus 1, i.e. no children subtracts 1); if a None has been seen and no subtrees are pending emit a List(None), otherwise emit a List of each subtree wrapped in a Some
have a takeWhile(_.isDefined), which will complete once it sees a None
if you have more complex things (e.g. side effects) in extractSubtrees, you'll have to figure out where to put them
before merging the outside input, pass it through a watchTermination stage, and in the future callback (on success) send a None to the ActorRef you got when prematerializing the Source.actorRef for extractSubtrees. Thus, when the input completes, watchTermination will fire successfully and effectively send a message to extractSubtrees to watch for when it's completed the inflight tree.

Camel aggregator does not aggregate all

I am splitting some java objects and then aggregating. I am kind of confused how this completion strategy works with Camel (2.15.2). I am using completion size and completion timeout. If I understand correctly, completion timeout does not have much effect in it. Because, there is not much waiting going on here.
Altogether, I have 3000+ objects. But, it seems only a part of it is aggregated. But, if I vary the completion size value, situation changes. If the size is 100, it aggregates around 800, and if it is 200, it aggregates up to around 1600. But, I don't know the size of objects in advance, and so, cannot rely on a assumed number.
Can anyone please explain to me what I am doing wrong here?
If I use eagerCheckCompletion, it aggregates the whole thing in a one go, which I don't want.
Below is my route:
from("direct:specializeddatavalidator")
.to("bean:headerFooterValidator").split(body())
.process(rFSStatusUpdater)
.process(dataValidator).choice()
.when(header("discrepencyList").isNotNull()).to("seda:errorlogger")
.otherwise().to("seda:liveupdater").end();
from("seda:liveupdater?concurrentConsumers=4&timeout=5000")
.aggregate(simple("${in.header.contentType}"),
batchAggregationStrategy())
.completionSize(MAX_RECORDS)
.completionTimeout(BATCH_TIME_OUT).to("bean:liveDataUpdater");
from("seda:errorlogger?concurrentConsumers=4")
.aggregate(simple("${in.header.contentType}"),
batchAggregationStrategy("discrepencyList"))
.completionSize(MAX_RECORDS_FOR_ERRORS)
.completionTimeout(BATCH_TIME_OUT)
.process(errorProcessor).to("bean:liveDataUpdater");

Weird, but if you want to aggregate all the splitted messages you can simply use
.split(body(), batchAggregationStrategy())
And depending on how you want it to work you can use
.shareUnitOfWork().stopOnException()
See http://camel.apache.org/splitter.html for more info

Only one write() call sends data over socket connection

First stackoverflow question! I've searched...I promise. I haven't found any answers to my predicament. I have...a severely aggravating problem to say the least. To make a very long story short, I am developing the infrastructure for a game where mobile applications (an Android app and an iOS app) communicate with a server using sockets to send data to a database. The back end server script (which I call BES, or Back End Server), is several thousand lines of code long. Essentially, it has a main method that accepts incoming connections to a socket and forks them off, and a method that reads the input from the socket and determines what to do with it. Most of the code lies in the methods that send and receive data from the database and sends it back to the mobile apps. All of them work fine, except for the newest method I have added. This method grabs a large amount of data from the database, encodes it as a JSON object, and sends it back to the mobile app, which also decodes it from the JSON object and does what it needs to do. My problem is that this data is very large, and most of the time does not make it across the socket in one data write. Thus, I added one additional data write into the socket that informs the app of the size of the JSON object it is about to receive. However, after this write happens, the next write sends empty data to the mobile app.
The odd thing is, when I remove this first write that sends the size of the JSON object, the actual sending of the JSON object works fine. It's just very unreliable and I have to hope that it sends it all in one read. To add more oddity to the situation, when I make the size of the data that the second write sends a huge number, the iOS app will read it properly, but it will have the data in the middle of an otherwise empty array.
What in the world is going on? Any insight is greatly appreciated! Below is just a basic snippet of my two write commands on the server side.
Keep in mind that EVERYWHERE else in this script the read's and write's work fine, but this is the only place where I do 2 write operations back to back.
The server script is on a Ubuntu server in native C using Berkeley sockets, and the iOS is using a wrapper class called AsyncSocket.
int n;
//outputMessage contains a string that tells the mobile app how long the next message
//(returnData) will be
n = write(sock, outputMessage, sizeof(outputMessage));
if(n < 0)
//error handling is here
//returnData is a JSON encoded string (well, char[] to be exact, this is native-C)
n = write(sock, returnData, sizeof(returnData));
if(n < 0)
//error handling is here
The mobile app makes two read calls, and gets outputMessage just fine, but returnData is always just a bunch of empty data, unless I overwrite sizeof(returnData) to some hugely large number, in which case, the iOS will receive the data in the middle of an otherwise empty data object (NSData object, to be exact). It may also be important to note that the method I use on the iOS side in my AsyncSocket class reads data up to the length that it receives from the first write call. So if I tell it to read, say 10000 bytes, it will create an NSData object of that size and use it as the buffer when reading from the socket.
Any help is greatly, GREATLY appreciated. Thanks in advance everyone!

It's just very unreliable and I have to hope that it sends it all in one read.
The key to successful programming with TCP is that there is no concept of a TCP "packet" or "block" of data at the application level. The application only sees a stream of bytes, with no boundaries. When you call write() on the sending end with some data, the TCP layer may choose to slice and dice your data in any way it sees fit, including coalescing multiple blocks together.
You might write 10 bytes two times and read 5 then 15 bytes. Or maybe your receiver will see 20 bytes all at once. What you cannot do is just "hope" that some chunks of bytes you send will arrive at the other end in the same chunks.
What might be happening in your specific situation is that the two back-to-back writes are being coalesced into one, and your reading logic simply can't handle that.

Thanks for all of the feedback! I incorporated everyone's answers into the solution. I created a method that writes to the socket an iovec struct using writev instead of write. The wrapper class I'm using on the iOS side, AsyncSocket (which is fantastic, by the way...check it out here -->AsyncSocket Google Code Repo ) handles receiving an iovec just fine, and behind the scenes apparently, as it does not require any additional effort on my part for it to read all of the data correctly. The AsyncSocket class does not call my delegate method didReadData now until it receives all of the data specified in the iovec struct.
Again, thank you all! This helped greatly. Literally overnight I got responses for an issue I've been up against for a week now. I look forward to becoming more involved in the stackoverflow community!
Sample code for solution:
//returnData is the JSON encoded string I am returning
//sock is my predefined socket descriptor
struct iovec iov[1];
int iovcnt = 0;
iov[0].iov_base = returnData;
iov[0].iov_len = strlen(returnData);
iovcnt = sizeof(iov) / sizeof(struct iovec);
n = writev(sock, iov, iovcnt)
if(n < 0)
//error handling here
while(n < iovcnt)
//rebuild iovec struct with remaining data from returnData (from position n to the end of the string)

You should really define a function write_complete that completely writes a buffer to a socket. Check the return value of write, it might also be a positive number, but smaller than the size of the buffer. In that case you need to write the remaining part of the buffer again.
Oh, and using sizeof is error-prone, too. In the above write_complete function you should therefore print the given size and compare it to what you expect.

Ideally on the server you want to write the header (the size) and the data atomically, I'd do that using the scatter/gather calls writev() also if there is any chance multiple threads can write to the same socket concurrently you may want to use a mutex in the write call.
writev() will also write all the data before returning (if you are using blocking I/O).
On the client you may have to have a state machine that reads the length of the buffer then sits in a loop reading until all the data has been received, as large buffers will be fragmented and come in in various sized blocks.