Apache Flink Watermark behaviour for TwoInputStreamOperator operator - apache-flink

There are 2 data streams with timestamp assigned and watermark generator defined as followed.
val streamA: DataStream[A] = kafkaStreamASourceOutput.assignTimestampsAndWatermarks(
WatermarkStrategy
.forBoundedOutOfOrderness[A](Duration.ofSeconds(0))
.withTimestampAssigner(new SerializableTimestampAssigner[A] {
override def extractTimestamp(element: A, recordTimestamp: Long): Long = {
element.lastUpdatedMs
}
})
)
val streamA: DataStream[B] = kafkaStreamBSourceOutput.assignTimestampsAndWatermarks(
WatermarkStrategy
.forBoundedOutOfOrderness[B](Duration.ofSeconds(0))
.withTimestampAssigner(new SerializableTimestampAssigner[B] {
override def extractTimestamp(element: B, recordTimestamp: Long): Long = {
element.lastUpdatedMs
}
})
)
When these 2 streams are connected in an operator then the minimum watermark from streamA or streamB acts as the watermark of connecting operator.
class CombineAB extends CoProcessFunction[A, B, C] {
override def processElement1(elem: A, ctx:Context, out: Collector[C]) {
out.collect(C(elem.x, elem.y, time.Now()))
}
override def processElement2(elem: B, ctx:Context, out: Collector[C]) {
out.collect(C(elem.x, elem.y, time.Now()))
}
}
val streamC: DataStream[C] = streamA.connect(streamB)
.process(new CombineAB)
The watermark of CombineAB operator is the minimum of A or B. Based on that the elements of type C are marked late or not.
But since we have not attached any timestamp assigned to C, does that mean none of the elements from CombineAB operator are marked late? Hence windowing on C will not have any late records being dropped?
Let's say we attach a timestamp assigned and watermark generator to C as follows, then does it mean the watermarks from A and B are completely ignored and watermark of CombineAB depends only on timestamp field of C and the lateness defined with C.
streamC.assignTimestampsAndWatermarks(
WatermarkStrategy
.forBoundedOutOfOrderness[C](Duration.ofSeconds(0))
.withTimestampAssigner(new SerializableTimestampAssigner[C] {
override def extractTimestamp(element: C, recordTimestamp: Long): Long = {
element.updatedTime
}
})
)
Isn't there a way that I can attach the timestamp assigner to C and the watermark for CombineAB is still the minimum of A and B and elements of C are marked late based on C's assigned timestamp and wartermark of CombineAB
Update: Refined implementation of CombineAB

A few points:
forBoundedOutOfOrderness[A](Duration.ofSeconds(0)) is unusual. Any out-of-order events will be late. Why not use forMonotonousTimestamps()?
The records produced by CombineAB will have timestamps; there's no need to apply assignTimestampsAndWatermarks to this stream. The timestamp of any records produced by the Collector is the timestamp of the incoming record.
If you do call assignTimestampsAndWatermarks on stream C, the incoming watermarks will be filtered out, and you'll need to generate new watermarks.

Related

Flink Watermark forBoundedOutOfOrderness includes data beyond boundaries

I'm assigning timestamps and watermarks like this
def myProcess(dataStream: DataStream[Foo]) {
dataStream
.assignTimestampsAndWatermarks(
WatermarkStrategy
.forBoundedOutOfOrderness[Foo](Duration.ofSeconds(5))
.withTimestampAssigner(new SerializableTimestampAssigner[Foo]() {
// Long is milliseconds since the Epoch
override def extractTimestamp(element: Foo, recordTimestamp: Long): Long = element.eventTimestamp
})
)
.keyBy({ k => k.id
})
.window(TumblingEventTimeWindows.of(Time.hours(1)))
.reduce(new MyReducerFn(), new MyWindowFunction())
}
I have a unit test
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val dataTime = 1641488400000L
val lateTime = dataTime + Duration.ofHours(5).toMillis
val dataSource = env
.addSource(new SourceFunction[Foo]() {
def run(ctx: SourceFunction.SourceContext[Foo]) {
ctx.collect(Foo(id=1, value=1, eventTimestamp=dataTime))
ctx.collect(Foo(id=1, value=1, eventTimestamp=lateTime))
// should be dropped due to past max lateness
ctx.collect(Foo(id=1, value=2, eventTimestamp=dataTime))
}
override def cancel(): Unit = {}
})
myProcess(dataSource).collect(new MyTestSink)
env.execute("watermark & lateness test")
I expect that the second element would advance the watermark to (latetime - Duration.ofSeconds(5)) (i.e. the bounded out of orderness) and therefore the third element would not be assigned to the 1 hour tumbling window since the watermark had advanced considerably past it. However I see both the first and third element reach my reduce function.
I must misunderstand watermarks here? Or what forBoundedOutOfOrderness does? Can I please get clarity?
Thanks!
I must have typo'd something I have it working.

Flink side output for late data missing

This is my application code
object StreamingJob {
def main(args: Array[String]) {
// set up the streaming execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
// define EventTime and Watermark
var sensorData: DataStream[SensorReading] = env.addSource(new SensorSource).assignTimestampsAndWatermarks(
WatermarkStrategy
.forBoundedOutOfOrderness[SensorReading](Duration.ofSeconds(0))
.withTimestampAssigner(new SerializableTimestampAssigner[SensorReading] {
override def extractTimestamp(t: SensorReading, l: Long): Long = t.timestamp
})
)
val outputTag = OutputTag[SensorReading]("late-event")
val minTemp: DataStream[String] = sensorData
.map(r => {
val celsius = (r.temperature - 32) * (5.0 / 9.0)
SensorReading(r.id, r.timestamp, celsius)
})
.keyBy(_.id)
.window(TumblingEventTimeWindows.of(Time.seconds(5)))
.allowedLateness(Time.seconds(10))
.sideOutputLateData(outputTag)
// compute min temperature
.process(new TemperatureMiner)
val lateStream: DataStream[SensorReading] = minTemp.getSideOutput(outputTag)
lateStream.map(r => s"late event: ${r.id}, ${r.timestamp}, ${r.temperature}").print()
minTemp.print()
// execute program
env.execute("Flink Streaming Scala API Skeleton")
}
}
I am pretty sure that late data are captured because I can see log printing that TemperatureMiner is invoke multiple times for one window, hence late firing.
But the problem is that there is nothing printing by lateStream from side output for late data. Any idea why?
A side output for late data in a window is only sent data that is so late that it falls outside the allowed lateness. Perhaps none of your late data is late enough.

Why does Flink emit duplicate records on a DataStream join + Global window?

I'm learning/experimenting with Flink, and I'm observing some unexpected behavior with the DataStream join, and would like to understand what is happening...
Let's say I have two streams with 10 records each, which I want to join on a id field. Let's assume that for each record in one stream had a matching one in the other, and the IDs are unique in each stream. Let's also say I have to use a global window (requirement).
Join using DataStream API (my simplified code in Scala):
val stream1 = ... // from a Kafka topic on my local machine (I tried with and without .keyBy)
val stream2 = ...
stream1
.join(stream2)
.where(_.id).equalTo(_.id)
.window(GlobalWindows.create()) // assume this is a requirement
.trigger(CountTrigger.of(1))
.apply {
(row1, row2) => // ...
}
.print()
Result:
Everything is printed as expected, each record from the first stream joined with a record from the second one.
However:
If I re-send one of the records (say, with an updated field) from one of the stream to that stream, two duplicate join events get emitted 😞
If I repeat that operation (with or without updated field), I will get 3 emitted events, then 4, 5, etc... 😞
Could someone in the Flink community explain why this is happening? I would have expected only 1 event emitted each time. Is it possible to achieve this with a global window?
In comparison, the Flink Table API behaves as expected in that same scenario, but for my project I'm more interested in the DataStream API.
Example with Table API, which worked as expected:
tableEnv
.sqlQuery(
"""
|SELECT *
| FROM stream1
| JOIN stream2
| ON stream1.id = stream2.id
""".stripMargin)
.toRetractStream[Row]
.filter(_._1) // just keep the inserts
.map(...)
.print() // works as expected, after re-sending updated records
Thank you,
Nicolas
The issue is that records are never removed from your global window. So you trigger the join operation on the global window, whenever a new record has arrived, but the old records are still present.
Thus, to get it running in your case, you'd need to implement a custom evictor. I expanded your example in a minimal working example and added the evictor, which I will explain after the snippet.
val data1 = List(
(1L, "myId-1"),
(2L, "myId-2"),
(5L, "myId-1"),
(9L, "myId-1"))
val data2 = List(
(3L, "myId-1", "myValue-A"))
val stream1 = env.fromCollection(data1)
val stream2 = env.fromCollection(data2)
stream1.join(stream2)
.where(_._2).equalTo(_._2)
.window(GlobalWindows.create()) // assume this is a requirement
.trigger(CountTrigger.of(1))
.evictor(new Evictor[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)], GlobalWindow](){
override def evictBefore(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {}
override def evictAfter(elements: lang.Iterable[TimestampedValue[CoGroupedStreams.TaggedUnion[(Long, String), (Long, String, String)]]], size: Int, window: GlobalWindow, evictorContext: Evictor.EvictorContext): Unit = {
import scala.collection.JavaConverters._
val lastInputTwoIndex = elements.asScala.zipWithIndex.filter(e => e._1.getValue.isTwo).lastOption.map(_._2).getOrElse(-1)
if (lastInputTwoIndex == -1) {
println("Waiting for the lookup value before evicting")
return
}
val iterator = elements.iterator()
for (index <- 0 until size) {
val cur = iterator.next()
if (index != lastInputTwoIndex) {
println(s"evicting ${cur.getValue.getOne}/${cur.getValue.getTwo}")
iterator.remove()
}
}
}
})
.apply((r, l) => (r, l))
.print()
The evictor will be applied after the window function (join in this case) has been applied. It's not entirely clear how your use case exactly should work in case you have multiple entries in the second input, but for now, the evictor only works with single entries.
Whenever a new element comes into the window, the window function is immediately triggered (count = 1). Then the join is evaluated with all elements having the same key. Afterwards, to avoid duplicate outputs, we remove all entries from the first input in the current window. Since, the second input may arrive after the first inputs, no eviction is performed, when the second input is empty. Note that my scala is quite rusty; you will be able to write it in a much nicer way. The output of a run is:
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
Waiting for the lookup value before evicting
4> ((1,myId-1),(3,myId-1,myValue-A))
4> ((5,myId-1),(3,myId-1,myValue-A))
4> ((9,myId-1),(3,myId-1,myValue-A))
evicting (1,myId-1)/null
evicting (5,myId-1)/null
evicting (9,myId-1)/null
A final remark: if the table API offers already a concise way of doing what you want, I'd stick to it and then convert it to a DataStream when needed.

Flink - behaviour of timesOrMore

I want to find pattern of events that follow
Inner pattern is:
Have the same value for key "sensorArea".
Have different value for key "customerId".
Are within 5 seconds from each other.
And this pattern needs to
Emit "alert" only if previous happens 3 or more times.
I wrote something but I know for sure it is not complete.
Two Questions
I need to access the previous event fields when I'm in the "next" pattern, how can I do that without using the ctx command because it is heavy..
My code brings weird result - this is my input
and my output is
3> {first=[Customer[timestamp=50,customerId=111,toAdd=2,sensorData=33]], second=[Customer[timestamp=100,customerId=222,toAdd=2,sensorData=33], Customer[timestamp=600,customerId=333,toAdd=2,sensorData=33]]}
even though my desired output should be all first six events (users 111/222 and sensor are 33 and then 44 and then 55
Pattern<Customer, ?> sameUserDifferentSensor = Pattern.<Customer>begin("first", skipStrategy)
.followedBy("second").where(new IterativeCondition<Customer>() {
#Override
public boolean filter(Customer currCustomerEvent, Context<Customer> ctx) throws Exception {
List<Customer> firstPatternEvents = Lists.newArrayList(ctx.getEventsForPattern("first"));
int i = firstPatternEvents.size();
int currSensorData = currCustomerEvent.getSensorData();
int prevSensorData = firstPatternEvents.get(i-1).getSensorData();
int currCustomerId = currCustomerEvent.getCustomerId();
int prevCustomerId = firstPatternEvents.get(i-1).getCustomerId();
return currSensorData==prevSensorData && currCustomerId!=prevCustomerId;
}
})
.within(Time.seconds(5))
.timesOrMore(3);
PatternStream<Customer> sameUserDifferentSensorPatternStream = CEP.pattern(customerStream, sameUserDifferentSensor);
DataStream<String> alerts1 = sameUserDifferentSensorPatternStream.select((PatternSelectFunction<Customer, String>) Object::toString);
You will have an easier time if you first key the stream by the sensorArea. They you will be pattern matching on streams where all of the events are for a single sensorArea, which will make the pattern easier to express, and the matching more efficient.
You can't avoid using an iterative condition and the ctx, but it should be less expensive after keying the stream.
Also, your code example doesn't match the text description. The text says "within 5 seconds" and "3 or more times", while the code has within(Time.seconds(2)) and timesOrMore(2).

Apache Flink Custom Window Aggregation

I would like to aggregate a stream of trades into windows of the same trade volume, which is the sum of the trade size of all the trades in the interval.
I was able to write a custom Trigger that partitions the data into windows. Here is the code:
case class Trade(key: Int, millis: Long, time: LocalDateTime, price: Double, size: Int)
class VolumeTrigger(triggerVolume: Int, config: ExecutionConfig) extends Trigger[Trade, Window] {
val LOG: Logger = LoggerFactory.getLogger(classOf[VolumeTrigger])
val stateDesc = new ValueStateDescriptor[Double]("volume", createTypeInformation[Double].createSerializer(config))
override def onElement(event: Trade, timestamp: Long, window: Window, ctx: TriggerContext): TriggerResult = {
val volume = ctx.getPartitionedState(stateDesc)
if (volume.value == null) {
volume.update(event.size)
return TriggerResult.CONTINUE
}
volume.update(volume.value + event.size)
if (volume.value < triggerVolume) {
TriggerResult.CONTINUE
}
else {
volume.update(volume.value - triggerVolume)
TriggerResult.FIRE_AND_PURGE
}
}
override def onEventTime(time: Long, window: Window, ctx: TriggerContext): TriggerResult = {
TriggerResult.FIRE_AND_PURGE
}
override def onProcessingTime(time: Long, window:Window, ctx: TriggerContext): TriggerResult = {
throw new UnsupportedOperationException("Not a processing time trigger")
}
override def clear(window: Window, ctx: TriggerContext): Unit = {
val volume = ctx.getPartitionedState(stateDesc)
ctx.getPartitionedState(stateDesc).clear()
}
}
def main(args: Array[String]) : Unit = {
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val trades = env
.readTextFile("/tmp/trades.csv")
.map {line =>
val cells = line.split(",")
val time = LocalDateTime.parse(cells(0), DateTimeFormatter.ofPattern("yyyyMMdd HH:mm:ss.SSSSSSSSS"))
val millis = time.toInstant(ZoneOffset.UTC).toEpochMilli
Trade(0, millis, time, cells(1).toDouble, cells(2).toInt)
}
val aggregated = trades
.assignAscendingTimestamps(_.millis)
.keyBy("key")
.window(GlobalWindows.create)
.trigger(new VolumeTrigger(500, env.getConfig))
.sum(4)
aggregated.writeAsText("/tmp/trades_agg.csv")
env.execute("volume agg")
}
The data looks for example as follows:
0180102 04:00:29.715706404,169.10,100
20180102 04:00:29.715715627,169.10,100
20180102 05:08:29.025299624,169.12,100
20180102 05:08:29.025906589,169.10,214
20180102 05:08:29.327113252,169.10,200
20180102 05:09:08.350939314,169.00,100
20180102 05:09:11.532817015,169.00,474
20180102 06:06:55.373584329,169.34,200
20180102 06:07:06.993081961,169.34,100
20180102 06:07:08.153291898,169.34,100
20180102 06:07:20.081524768,169.34,364
20180102 06:07:22.838656715,169.34,200
20180102 06:07:24.561360031,169.34,100
20180102 06:07:37.774385969,169.34,100
20180102 06:07:39.305219107,169.34,200
I have a time stamp, a price and a size.
The above code can partition it into windows of roughly the same size:
Trade(0,1514865629715,2018-01-02T04:00:29.715706404,169.1,514)
Trade(0,1514869709327,2018-01-02T05:08:29.327113252,169.1,774)
Trade(0,1514873215373,2018-01-02T06:06:55.373584329,169.34,300)
Trade(0,1514873228153,2018-01-02T06:07:08.153291898,169.34,464)
Trade(0,1514873242838,2018-01-02T06:07:22.838656715,169.34,600)
Trade(0,1514873294898,2018-01-02T06:08:14.898397117,169.34,500)
Trade(0,1514873299492,2018-01-02T06:08:19.492589659,169.34,400)
Trade(0,1514873332251,2018-01-02T06:08:52.251339070,169.34,500)
Trade(0,1514873337928,2018-01-02T06:08:57.928680090,169.34,1000)
Trade(0,1514873338078,2018-01-02T06:08:58.078221995,169.34,1000)
Now I like to partition the data so that the volume is exactly matching the trigger value. For this I would need to change the data slightly by splitting a trade at the end of an interval into two parts, one that belongs to the actual window being fired, and the remaining volume that is above the trigger value, has to be assigned to the next window.
Can that be handled with some custom aggregation function? It would need to know the results from the previous window(s) though and I was not able to find out how to do that.
Any ideas from Apache Flink experts how to handle that case?
Adding an evictor does not work as it only purges some elements at the beginning.
I hope the change from Spark Structured Streaming to Flink was a good choice, as I have later even more complicated situations to handle.
Since your key is the same for all records, you may not require a window in this case. Please refer to this page in Flink's documentation https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html#using-managed-keyed-state.
It has a CountWindowAverage class where in the aggregation of a value from each record in a stream is done using a State Variable. You can implement this and send the output whenever the state variable reaches your trigger volume and reset the value of the state variable with the remaining volume.
A simple approach (though not super-efficient) would be to put a FlatMapFunction ahead of your windowing flow. If it's keyed the same way, then you can use ValueState to track total volume, and emit two records (the split) when it hits your limit.

Resources