I want to start a timer when an event is received in Flink, send the timer value and stop the timer when another event is received. Let me explain. An event consists of an event name, a source id and other fields. So I have something like this:
E1("A",1,...) -> E2("B",1,...) -> E3("C",1,...).
When I receive event "A" I want to start a timer (keyed by the source id) and update a sink with the timer value periodically. When I receive event "C" I want to stop the timer and update the sink with the final timer value. Is there a way to accomplish that in Apache Flink?
You'd do a .keyBy(r -> r.getSourceId()), and follow that with a custom KeyedProcessFunction. This gives you access to the Flink TimerService, where you can create timers. When a timer fires, the onTimer() method in your custom function will be called, which is where you can use the last value you saved in state to update the "sink" (actually a remote service of some sort). You can start a new timer in the onTimer() method, so that the update is periodic.
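The pattern in the answer can be modelled without any Flink dependency, which makes the timer lifecycle easy to follow. This is a toy sketch, not Flink code: a HashMap stands in for keyed ValueState, a PriorityQueue stands in for the TimerService, and PERIOD_MS and the sink strings are hypothetical. In a real KeyedProcessFunction you would store the start time in ValueState, call ctx.timerService().registerProcessingTimeTimer(...), and either delete the pending timer on "C" or simply ignore it in onTimer() when the state is empty, as the model does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Toy model: "A" starts a per-source timer, periodic updates go to a sink, "C" stops it. */
class ElapsedTimerModel {
    static final long PERIOD_MS = 1_000;                                // hypothetical update interval

    private final Map<Integer, Long> startTime = new HashMap<>();       // stand-in for keyed ValueState
    private final PriorityQueue<long[]> timers =                        // entries are {fireAt, sourceId}
            new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
    final List<String> sink = new ArrayList<>();                        // stand-in for the external sink

    /** Plays the role of processElement(). */
    void processElement(String name, int sourceId, long now) {
        advanceTo(now);                                                  // fire timers that are due first
        if (name.equals("A")) {
            startTime.put(sourceId, now);
            timers.add(new long[] {now + PERIOD_MS, sourceId});
        } else if (name.equals("C") && startTime.containsKey(sourceId)) {
            long elapsed = now - startTime.remove(sourceId);            // clearing state stops re-arming
            sink.add(sourceId + " final " + elapsed);
        }
    }

    /** Plays the role of the timer service advancing time and calling onTimer(). */
    void advanceTo(long now) {
        while (!timers.isEmpty() && timers.peek()[0] <= now) {
            long[] t = timers.poll();
            onTimer(t[0], (int) t[1]);
        }
    }

    private void onTimer(long timestamp, int sourceId) {
        Long start = startTime.get(sourceId);
        if (start == null) {
            return;                                                      // timer was "stopped" by "C"
        }
        sink.add(sourceId + " running " + (timestamp - start));
        timers.add(new long[] {timestamp + PERIOD_MS, sourceId});       // re-register: periodic update
    }

    static List<String> demo() {
        ElapsedTimerModel m = new ElapsedTimerModel();
        m.processElement("A", 1, 0);
        m.processElement("B", 1, 1_500);
        m.processElement("C", 1, 2_500);
        m.advanceTo(10_000);                                             // nothing more after "C"
        return m.sink;
    }
}
```

Calling ElapsedTimerModel.demo() yields two periodic updates followed by the final value for source 1; the stale timer left over after "C" fires but is ignored because the state is gone.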
Related
In a Flink job I have a KeyedProcessFunction.
I have implemented a watermark strategy:
val wmStrategy: WatermarkStrategy<MyInput> =
    WatermarkStrategy.forMonotonousTimestamps<MyInput>()
        .withTimestampAssigner { event: MyInput, _: Long -> event.getTimestampEvent() }
And then I apply it to my source data:
mysource.assignTimestampsAndWatermarks(wmStrategy)
When processElement is called, a timer may be registered with ctx.timerService().registerEventTimeTimer(timerWakeUpInstant.toEpochMilli()), and after that the ValueState is updated. The update succeeds.
The next time processElement is called, valueState.value() returns null instead of the last updated value.
No clear() is called explicitly on the value state.
The timer is never triggered.
At the moment, I'm testing in a 'clean' environment: reading from a text file containing data for only one key, with parallelism = 1, running in my IDE.
Can you help me? Why is the state nullified? And why is the timer not triggered?
I have figured it out myself: onTimer is not called until the function that registered the timer receives a watermark that advances past the timer's timestamp.
With event-time timers, the onTimer(...) method is called when the current watermark is advanced up to or beyond the timestamp of the timer.
The "current" watermark actually refers to the operator, not the job. This was misleading for me, as I thought it was centralized.
Looking at a code sample in the documentation, we can find a useful comment that gives a hint:
//trigger event time timers by advancing the event time of the operator with a watermark
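To make the per-operator behaviour concrete, here is a toy model in plain Java (no Flink classes; the method names only echo Flink's). It assumes a single input channel and a watermark emitted after every event as maxTimestampSeen - 1, which is what forMonotonousTimestamps() produces; real Flink emits that watermark periodically (every 200 ms by default) rather than per event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of one operator's event-time timers driven by its own watermark. */
class EventTimeTimerModel {
    private final TreeSet<Long> timers = new TreeSet<>();   // one timer per timestamp (single key)
    private long maxTimestamp = Long.MIN_VALUE;
    private long currentWatermark = Long.MIN_VALUE;
    final List<Long> fired = new ArrayList<>();

    void registerEventTimeTimer(long ts) {
        timers.add(ts);
    }

    /** An element arriving does not fire timers by itself... */
    void processElement(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
        // ...only the watermark that follows it can (monotonous strategy: max - 1).
        advanceWatermark(maxTimestamp - 1);
    }

    void advanceWatermark(long wm) {
        if (wm <= currentWatermark) {
            return;                                          // watermarks never move backwards
        }
        currentWatermark = wm;
        while (!timers.isEmpty() && timers.first() <= wm) {
            fired.add(timers.pollFirst());                   // this is where onTimer() would run
        }
    }
}
```

In this model a timer at 100 is still pending after an event at 100 (watermark 99) and fires only once an event at 101 or later arrives, or when a bounded input ends with a final Long.MAX_VALUE watermark.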
Do timers in Flink get fired if they are set to a timestamp in the past? Since the current time is already past the timer's timestamp, would the timer fire immediately or never fire?
Also, we are trying to sort/order input events by event time by collecting/buffering them in a processing-time tumbling window, just so we don't have to drop late events. Are there any better solutions to address this?
Timers set to a timestamp in the past get triggered ASAP.
For sorting, see How to sort an out-of-order event time stream using Flink and How to sort a stream by event time using Flink SQL.
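The watermark-driven alternative to a processing-time buffer can be sketched as a toy model (plain Java, no Flink classes): buffer events and release them in timestamp order once the watermark guarantees nothing earlier can still arrive. In Flink this would typically be a KeyedProcessFunction that keeps the buffer in keyed state and registers an event-time timer at each event's timestamp; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of sorting an out-of-order stream by event time using watermarks. */
class WatermarkSorterModel {
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();
    private long watermark = Long.MIN_VALUE;
    final List<Long> emitted = new ArrayList<>();
    final List<Long> late = new ArrayList<>();

    void onEvent(long timestamp) {
        if (timestamp <= watermark) {
            late.add(timestamp);          // already behind the watermark: cannot be sorted in
        } else {
            buffer.add(timestamp);        // corresponds to registering a timer at `timestamp`
        }
    }

    void onWatermark(long wm) {
        watermark = Math.max(watermark, wm);
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            emitted.add(buffer.poll());   // corresponds to onTimer() emitting buffered events
        }
    }
}
```

Compared with a processing-time window, output latency is governed by how far the watermark lags behind (the out-of-orderness bound), and late events are identified explicitly instead of being silently mixed into the wrong batch.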
I've implemented a Flink processor that aggregates events into sessions and then writes them to a sink. Now I'd like to extend it so that I can get the number of concurrent sessions every five minutes.
The events coming into my system are of the form:
{
"SessionId": "UniqueUUID",
"Customer": "CustomerA",
"EventType": "EventTypeA",
[ ... ]
}
And a single session usually contains several events of different EventTypes. I then aggregate the events into sessions by doing the following in Flink.
DataStream<Session> sessions = events
    .keyBy((KeySelector<HashMap, String>) event -> (String) event.get(Field.SESSION_ID))
    .window(ProcessingTimeSessionWindows.withGap(org.apache.flink.streaming.api.windowing.time.Time.minutes(5)))
    .trigger(SessionProcessingTimeTrigger.create())
    .aggregate(new SessionAggregator());
Each session is then emitted (by the SessionProcessingTimeTrigger) when an event with a specific EventType is processed ("EventType":"Session.Ended"). Finally, the stream is sent to a sink and written to Kafka.
Now I want to write a similar Flink processor, but instead of only emitting a session once it is finished, I want to emit all sessions every 5 minutes in order to keep track of how many concurrent sessions we have at each 5-minute mark.
So I guess what I want is a SessionWindow that also emits its contents at regular intervals without purging them.
I'm stumped on how to accomplish this in Flink and am therefore looking for some help.
Whenever you want a Flink window to emit results at non-default times, you can do this by implementing a custom Trigger. Your trigger just needs to return FIRE each time a 5-minute timer fires, in addition to its original logic. You'll want to register this timer when the first event is assigned to a window, and again every time the timer fires.
In the case of session windows this can be more complex because of the manner in which session windows are merged. But I believe that in the case of processing time session windows what I've outlined above will work.
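The trigger logic described above can be sketched as a toy model in plain Java, assuming the original trigger's only rule is "emit on Session.Ended" and ignoring gap-based expiry. The Result enum mirrors the names of Flink's TriggerResult values but is not that class. In real code these would be the onElement/onProcessingTime methods of a Trigger subclass, registering timers via ctx.registerProcessingTimeTimer(...), keeping the "next timer" value in trigger state, deleting it in clear(), and handling canMerge()/onMerge() because session windows merge.

```java
/** Toy model of a session-window trigger that also fires every 5 minutes without purging. */
class PeriodicSessionTriggerModel {
    enum Result { CONTINUE, FIRE, FIRE_AND_PURGE }

    static final long INTERVAL_MS = 5 * 60 * 1000;
    private long nextPeriodicTimer = Long.MAX_VALUE;       // MAX_VALUE means "no timer registered"

    /** Like Trigger.onElement(): arm the periodic timer on the first element. */
    Result onElement(String eventType, long now) {
        if (nextPeriodicTimer == Long.MAX_VALUE) {
            nextPeriodicTimer = now + INTERVAL_MS;
        }
        if (eventType.equals("Session.Ended")) {
            nextPeriodicTimer = Long.MAX_VALUE;            // like clear(): drop the pending timer
            return Result.FIRE_AND_PURGE;                  // original behaviour: final emission
        }
        return Result.CONTINUE;
    }

    /** Like Trigger.onProcessingTime(): FIRE (contents are kept) and re-arm. */
    Result onProcessingTime(long time) {
        if (time != nextPeriodicTimer) {
            return Result.CONTINUE;
        }
        nextPeriodicTimer = time + INTERVAL_MS;            // schedule the next periodic emission
        return Result.FIRE;
    }
}
```

The key design choice is returning FIRE rather than FIRE_AND_PURGE on the periodic timer, so each emission reflects the whole session so far and the window keeps accumulating.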
I want to know exactly:
When will the watermark value be set to Long.MaxValue? (On cancelling a SourceFunction? On cancelling a job through the CLI or the web panel? ...)
What does it mean for an application? (End of the job? Job failure with no restart?)
And how should it be handled? (Clearing all the state? What about timers? As I saw, registering a new timer in this state makes the application run forever! If I want to persist the state at the last watermark so that I can recover from it in a later run, how should I persist the timer state?)
The last watermark is emitted when your SourceFunction exits the run method and it means you have consumed all input.
Given this, you should not need to clear state, as the job will be marked as finished once the watermark reaches all sinks.
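The "runs forever" observation in the question can be reproduced in a toy model (plain Java, not Flink code). The assumption, based on how the event-time timer service loops over due timers, is that a timer registered inside onTimer() while the watermark is Long.MAX_VALUE is itself already due, so it fires immediately and registers another. The guard shown, skipping re-registration once the watermark has reached Long.MAX_VALUE, is one possible fix rather than an official recipe; in Flink the check would read ctx.timerService().currentWatermark().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model: periodic event-time timers meeting the final Long.MAX_VALUE watermark. */
class FinalWatermarkModel {
    static final long PERIOD = 60_000;
    private final boolean stopAtFinalWatermark;
    private final TreeSet<Long> timers = new TreeSet<>();
    private long watermark = Long.MIN_VALUE;
    final List<Long> fired = new ArrayList<>();

    FinalWatermarkModel(boolean stopAtFinalWatermark) {
        this.stopAtFinalWatermark = stopAtFinalWatermark;
    }

    void registerEventTimeTimer(long ts) {
        timers.add(ts);
    }

    /** Fires due timers; maxFirings exists only so the unguarded model terminates. */
    void advanceWatermark(long wm, int maxFirings) {
        watermark = Math.max(watermark, wm);
        // Timers registered from inside onTimer() are picked up by this same loop.
        while (!timers.isEmpty() && timers.first() <= watermark && fired.size() < maxFirings) {
            onTimer(timers.pollFirst());
        }
    }

    private void onTimer(long ts) {
        fired.add(ts);
        if (stopAtFinalWatermark && watermark == Long.MAX_VALUE) {
            return;                                   // input is exhausted: don't re-arm
        }
        registerEventTimeTimer(ts + PERIOD);          // periodic re-registration
    }
}
```

With the guard, the final watermark fires each pending timer once and the loop ends; without it, the model only stops because of the artificial maxFirings cap.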
I have a stream of elements in an event-time stream that flows into a TumblingWindow, then an AggregationFunction, then a FlatMap that maintains state over the aggregations and generates alarms.
The elements are coming over a network and occasionally there is a delay in events so that the TumblingWindow does not close for some time. In these cases there can be elements buffered in the window that when combined with what had been collected earlier would generate an alarm. Is there a way to close a TumblingWindow early in such a scenario?
Is there a recommended pattern to follow for something like this?
I tried the following without success. Should it have worked?
I combined a ProcessFunction with a custom trigger and added a boolean flushSignal to the Element in the stream. I inserted the ProcessFunction upstream of the window and started a processing-time timer in it from the processElement method:
long timeout = ctx.timerService().currentProcessingTime() + 10000;
ctx.timerService().registerProcessingTimeTimer(timeout);
and saved the element that was passed in for processing in keyed state.
When the timer expired, in the onTimer method I retrieved the element from keyed state, set flushSignal to true, and passed it to the onTimer collector.
In the trigger, the intent was to return TriggerResult.FIRE from the onElement method.
However, the Element instance never made it to the custom trigger: in the TumblingWindow.assignWindows method, the timestamp is invalid.
Does the collector in ProcessFunction.onTimer need to be correlated with the collector in the ProcessFunction.processElement?
Note that my element stream uses the event-time characteristic, while the timer I am trying to use is a processing-time timer. Is that a problem?
Any suggestions would be much appreciated.
Thanks
Jim
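The thread contains no answer here, but one likely explanation (an assumption worth verifying against your Flink version) is that records emitted from a processing-time onTimer() carry no event timestamp, and an event-time tumbling window cannot place a record that has none. The toy model below (plain Java; the class name is hypothetical) mirrors the arithmetic used for the window start and treats Long.MIN_VALUE as "no timestamp", which is why such a record fails at assignWindows. If this is the cause, one option is to carry the original event time inside the element and re-assign timestamps downstream of the ProcessFunction.

```java
/** Toy model of event-time tumbling-window assignment for a single record. */
class TumblingAssignModel {
    static long windowStart(long timestamp, long size, long offset) {
        if (timestamp == Long.MIN_VALUE) {
            // Long.MIN_VALUE is how a missing timestamp is represented; no window fits it.
            throw new IllegalStateException("record has no timestamp");
        }
        return timestamp - (timestamp - offset + size) % size;
    }
}
```

For example, with 10-second windows and no offset, a record stamped 12,345 ms lands in the window starting at 10,000 ms, while a record without a timestamp cannot be assigned at all.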