Akka Streams - Creating a Flow which emits from a source when an input is received

I have a Source that provides elements of type A.
And I have a Flow that receives elements of type B.
What I would like is for the flow, whenever it receives an input, to emit the next element from the source as its output.
The way I do this currently is to connect the source to a Sink.queue. Then for each element in the flow, I map over it, discarding the input and pulling the next value from the queue. Once the queue is empty, I complete the flow.
I feel like there ought to be a simpler way that I'm missing, and that there is probably some built-in mechanism to let an input trigger an element from a source.
For example:
val source = ... // some akka streams source
val queue = source.grouped(limit.toInt).runWith(Sink.queue[Seq[DataFrame]])

Flow[Message]
  .prepend(Source.single(TextMessage.Strict("start")))
  .collect {
    case TextMessage.Strict(text)         => Future.successful(text)
    case TextMessage.Streamed(textStream) => textStream.runFold("")(_ + _)
  }
  // swallow the future of the incoming message's text
  .mapAsync(1)(identity)
  // take the next batch
  .mapAsync(1)(_ => queue.pull())
  // swallow the option monad, and add in an end or page-end message
  .collect {
    case Some(batch) if batch.size == limit => batch.toList :+ pageend
    case Some(batch)                        => batch.toList :+ end
    case None                               => List(end)
  }
  // flatten out the frames
  .mapConcat(identity)
end and pageend are just special frames that the UI uses. The key part of the question is this use of a queue.
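One possibly simpler alternative is to zip the incoming messages with the batched source: zip only pulls one element from each side per output, so every input pulls exactly one batch, and the flow completes when the source is exhausted. A sketch, reusing the hypothetical source, limit, and DataFrame names from above and ignoring the end/pageend framing:

// Sketch: each incoming Message is paired with the next batch of DataFrames.
// zip backpressures the source until an input arrives, and completes the
// flow once the source runs out.
val batches = source.grouped(limit.toInt)

val triggered: Flow[Message, Seq[DataFrame], NotUsed] =
  Flow[Message]
    .prepend(Source.single(TextMessage.Strict("start")))
    .zip(batches)
    .map { case (_, batch) => batch } // discard the trigger, keep the batch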

Related

For bounded data, how do I get Flink to "trigger" once flatmap has finished outputting all its data

I've explicitly set "batch mode" in Flink's StreamExecutionEnvironment settings, as I'm working with bounded data.
The bounded data passes through a flatMap, and the flatMap is windowed using GlobalWindows. Since the data is bounded, there is a FINITE (though initially unknown) number of elements that will be output by the Collector.collect() operations in the FlatMap. I'd like to trigger a Reduce() function. However, I can't figure out how to tell Flink the following: once the FlatMap has finished outputting all its elements, proceed with the remainder of the code, e.g. do the reduce. (From the documentation, GlobalWindows always uses the NeverTrigger, so I presume I need to set a trigger explicitly.) (Note: I believe the CountTrigger won't work, since I don't know a priori the number of elements the flatMap will output.)
Bonus: Technically, the reduce operation can start as soon as the flatMap starts producing output. So, I'm not sure exactly how Flink works internally, but ideally the reduce starts right away and only "completes" after the window closes. (And in the case of bounded data, the window should close once the flatMap stops outputting data.)
===
Edit #1:
Per @kkrugler, here's the skeleton code:
sosCleavedFeaturesEtc
    .flatMap((Tuple4<Float2FloatAVLTreeMap, List<ImmutableFeatureV2>, List<ImmutableFeatureV2>, Integer> tuple4,
              Collector<Tuple4<Float2FloatAVLTreeMap, List<ImmutableFeatureV2>, Integer, Integer>> out) -> {
        ...
        IntStream.range(0, numBlocksForClustering + 1)
            .forEach(blockIdx -> out.collect(Tuple4.of(rtMapper, unmodifiableLstCleavedFeatures, diaWindowNum, blockIdx)));
    })
    .flatMap((Tuple4<Float2FloatAVLTreeMap, List<ImmutableFeatureV2>, Integer, Integer> tuple4,
              Collector<Tuple2<Float2FloatAVLTreeMap, Cluster>> out) -> {
        ...
        setClusters
            .stream()
            .filter(cluster -> cluster.getClusterSize() >= minFeaturesInCluster)
            .forEach(e -> out.collect(Tuple2.of(rtMapper, e)));
    })
    .map(tuple -> {
        ...
    })
    .filter(repFeature -> {
        ...
    })
    .windowAll(GlobalWindows.create())
    ...trigger??...
    .aggregate(...);
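One way to express "fire once the bounded input is exhausted" is a custom trigger that registers an event-time timer at the window's maxTimestamp(), which for GlobalWindow is Long.MaxValue; when a bounded input finishes, Flink emits a final watermark of Long.MaxValue, which fires that timer. A sketch (written in Scala against Flink's Java windowing API to match the rest of this page; the class name is made up and this is untested against any particular Flink version):

import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow

class EndOfInputTrigger[T] extends Trigger[T, GlobalWindow] {
  override def onElement(element: T, timestamp: Long, window: GlobalWindow,
                         ctx: Trigger.TriggerContext): TriggerResult = {
    // GlobalWindow.maxTimestamp() is Long.MaxValue, so this timer can only
    // fire when the final end-of-input watermark arrives.
    ctx.registerEventTimeTimer(window.maxTimestamp())
    TriggerResult.CONTINUE
  }

  override def onEventTime(time: Long, window: GlobalWindow,
                           ctx: Trigger.TriggerContext): TriggerResult =
    if (time == window.maxTimestamp()) TriggerResult.FIRE_AND_PURGE
    else TriggerResult.CONTINUE

  override def onProcessingTime(time: Long, window: GlobalWindow,
                                ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.CONTINUE

  override def clear(window: GlobalWindow, ctx: Trigger.TriggerContext): Unit =
    ctx.deleteEventTimeTimer(window.maxTimestamp())
}

With something like this, the gap in the skeleton would become .windowAll(GlobalWindows.create()).trigger(new EndOfInputTrigger()).aggregate(...).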

How to detect a gap of certain duration in an Akka Stream?

I don't think it currently (Akka Streams 2.5.21) can be done, and I am interested in simple work-arounds, or in whether people see this becoming part of the Akka Streams library itself.
My current work-around is:
/*
 * Implement a flow that performs the 'action' when there is a gap of at least
 * 'duration' in the stream.
 */
def onGap[T](duration: FiniteDuration, action: => Unit): Flow[T, T, NotUsed] =
  Flow[T]
    .map(Some(_))
    .keepAlive(duration, () => { action; None })
    .collect { case Some(x) => x }
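For reference, a hypothetical usage of that work-around (the kinesisRecords source and the 5-second threshold are made up; it needs import scala.concurrent.duration._ and a materializer in scope):

// Flag that no records have arrived for 5 seconds, as a guess that the
// consumer has caught up. Note keepAlive re-fires once per elapsed
// 'duration' while the gap continues.
kinesisRecords
  .via(onGap(5.seconds, println("gap detected - probably caught up")))
  .runWith(Sink.ignore)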
What I'd like to see is something akin to .keepAlive, but firing only once per gap, and not injecting an entry into the stream.
Other approaches I considered:
.idleTimeout(duration) with a Supervision.Decider, but this would need a separate ActorSystem to be created.
GraphStage, but those always feel complex...
My use case for this is to analyse (guess) that consumption of a Kinesis stream has reached "current state" (historic values have been seen).
Would there be a better way?

Connecting SourceShape/PortOpts from a UniformFanOutShape to sources in Akka Streams

Most examples of fan-out shapes in Akka Streams either merge the fanned-out flows back into a single stream, or immediately connect them to a sink. I want to instead return a sequence of Sources that I can apply transformations to and connect to sinks later on. An example would look something like this:
def transformedSources(src: Source[Int, NotUsed]): Seq[Source[Int, NotUsed]] = {
  import GraphDSL.Implicits._
  val builder = new GraphDSL.Builder // pseudocode: builders normally come from GraphDSL.create
  val bcast = builder.add(Broadcast[Int](2))
  builder.add(src).out ~> bcast.in
  val sourceShapes = (0 until 2).map(i => SourceShape(bcast.out(i)))
  ??? // How to convert sourceShapes into Sources?
}
Is there a way to achieve this?
P.S.
My real use case is an AmorphousShape that takes X sources and produces X outputs. Ideally I would like to apply the AmorphousShape stage and then continue operating as if nothing changed.
So if my code now is
sources.map{s => s.via(someStage).runWith(Sink.seq)}
I would like to transform it into
transformSource(sources).map{s => s.via(someStage).runWith(Sink.seq)}
def transformSource(sources: Seq[Source[Int, NotUsed]]): Seq[Source[Int, NotUsed]] = { /* do some magic with an AmorphousShape graph */ }
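For the two-source Broadcast case specifically, one direction that avoids the graph builder entirely is BroadcastHub: running the upstream into BroadcastHub.sink materializes a Source that can be handed out and consumed multiple times, with every consumer seeing every element. A sketch (the bufferSize of 16 is arbitrary, and an implicit Materializer is assumed to be in scope):

// Run the upstream into a hub, then hand out the hub's source twice.
// Each returned Source can be transformed and run independently later;
// elements flow once all attached consumers signal demand.
def transformedSources(src: Source[Int, NotUsed]): Seq[Source[Int, NotUsed]] = {
  val hubSource = src.runWith(BroadcastHub.sink[Int](bufferSize = 16))
  Seq(hubSource, hubSource)
}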

Merging streams when _any_ of the substreams has a value ready

From the Akka Streams documentation, it looks like all the stream merging options (merge, mergeSorted, mergePreferred, zipN, zipWithN) work by waiting until all merged streams have a new element ready, then applying the merge strategy (combining elements into a tuple, applying a zip function, etc.).
This works well for offline processing (e.g. reading data from files or HTTP and combining it), but it introduces latency in online processing. I need to merge streams of data produced by e.g. multiple WebSocket connections, and deliver updates in the merged stream as soon as any of the source streams produces a value. Example: if there are source streams A and B, here's what should be in the merged stream:
Output stream starts with some initial value, e.g. (None, None).
(A:1) (B:<not ready>) -> (Some(1), None)
(A:2) (B:<not ready>) -> (Some(2), None)
(A:3) (B:1) -> (Some(3), Some(1))
(A:3) (B:2) -> (Some(3), Some(2))
etc. Again, a new value appears in the output stream immediately whenever any of the source streams produces a value.
Is there any combinator to achieve that?
As stated in the comments, the Merge and MergePreferred stages do emit elements downstream even if not all upstreams have an element available.
From your example, though, it looks like you are after zipping sources. And yes, Zip emits the zipped tuple downstream only when it has elements to zip from all of its upstreams. To overcome this you can 'lift' your sources to produce Options, making them emit None whenever there is nothing else to emit. The source wrapper can look like this:
def asOption[In, Mat](source: Source[In, Mat]): Source[Option[In], Mat] =
  Source.fromGraph(GraphDSL.create(source.map(Option(_))) {
    implicit builder: GraphDSL.Builder[Mat] => src =>
      import GraphDSL.Implicits._

      val noneSource = Source.repeat(None)
      val merge = builder.add(MergePreferred[Option[In]](1))

      src ~> merge.preferred
      noneSource ~> merge.in(0)

      SourceShape(merge.out)
  })
At this point you can zip your sources as you would normally.
val src1: Source[Int, NotUsed] = ???
val src2: Source[Int, NotUsed] = ???
val zipped = asOption(src1) zip asOption(src2)
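An alternative sketch that reproduces the example output exactly (latest-value semantics, starting from (None, None)) is to tag each source, merge them, and scan to retain the most recent value seen from each side; merge emits as soon as either side has an element, so updates appear immediately:

// Tag the two sources so scan can tell them apart after merging.
val lefts: Source[Either[Int, Int], NotUsed] = src1.map(Left(_))
val rights: Source[Either[Int, Int], NotUsed] = src2.map(Right(_))

// scan emits its zero element first, giving the initial (None, None).
val latest: Source[(Option[Int], Option[Int]), NotUsed] =
  lefts.merge(rights).scan((Option.empty[Int], Option.empty[Int])) {
    case ((_, b), Left(a))  => (Some(a), b)
    case ((a, _), Right(b)) => (a, Some(b))
  }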

How do I create a Flow with a different input and output types for use inside of a graph?

I am making a custom sink by building a graph on the inside. Here is a broad simplification of my code to demonstrate my question:
def mySink: Sink[Int, Unit] = Sink() { implicit builder =>
  val entrance = builder.add(Flow[Int].buffer(500, OverflowStrategy.backpressure))
  val toString = builder.add(Flow[Int, String, Unit].map(_.toString)) // does not compile
  val printSink = builder.add(Sink.foreach(elem => println(elem)))
  builder.addEdge(entrance.out, toString.in)
  builder.addEdge(toString.out, printSink.in)
  entrance.in
}
The problem I am having is that, while it is valid to create a Flow with the same input and output types using only a single type argument and no value argument, like Flow[Int] (which appears all over the documentation), it is not valid to supply only two type parameters and zero value parameters.
According to the reference documentation for the Flow object the apply method I am looking for is defined as
def apply[I, O]()(block: (Builder[Unit]) ⇒ (Inlet[I], Outlet[O])): Flow[I, O, Unit]
and says
Creates a Flow by passing a FlowGraph.Builder to the given create function.
The create function is expected to return a pair of Inlet and Outlet which correspond to the created Flow's input and output ports.
It seems like I need to deal with another level of graph builders when I am trying to make what I think is a very simple flow. Is there an easier and more concise way to create a Flow that changes the type of its input and output, one that doesn't require messing with its inner ports? If this is the right way to approach the problem, what would a solution look like?
BONUS: Why is it easy to make a Flow whose input and output types are the same?
If you want to specify both the input and the output type of a flow, you indeed need to use the apply method you found in the documentation. Using it, though, works pretty much exactly the same way as what you already did.
Flow[String, Message]() { implicit b =>
  import FlowGraph.Implicits._

  val reverseString = b.add(Flow[String].map[String] { msg => msg.reverse })
  val mapStringToMsg = b.add(Flow[String].map[Message](x => TextMessage.Strict(x)))

  // connect the graph
  reverseString ~> mapStringToMsg

  // expose ports
  (reverseString.inlet, mapStringToMsg.outlet)
}
Instead of just returning the inlet, you return a tuple with the inlet and the outlet. This flow can now be used (for instance inside another builder, or directly with runWith) with a specific Source or Sink.
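On the bonus question: Flow[Int] is just the identity Flow[Int, Int, Unit], and map infers a new output type, so Flow[Int].map(_.toString) already yields a Flow[Int, String, Unit] without any builder. For completeness, a hypothetical usage of the flow above, assuming it is bound to a val reverseAndWrap and a materializer is in scope:

// Drive the String => Message flow with a concrete source and sink.
Source(List("hello", "world"))
  .via(reverseAndWrap)
  .runWith(Sink.foreach(println))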
