In the documentation of FlinkCEP, I found that I can enforce that a particular event doesn't occur between two other events using notFollowedBy or notNext.
However, I was wondering If I could detect the absence of a certain event after a time X.
For example, if an event A is not followed by another event A within 10 seconds, fire an alert or do something.
Could be possible to define a FlinkCEP pattern to capture that situation?
Thanks in advance,
Humberto
Although Flink CEP does not support notFollowedBy at the end of a Pattern, there is a way to implement this by exploiting the timeout feature.
The Flink training includes an exercise where the objective is to identify taxi rides with a START event that is not followed by an END event within two hours. You'll find a solution to this exercise that uses CEP
here.
The main idea would be to define a Pattern of A followed by A within 10 seconds, and then capture the case where this times out.
Related
I use flink to process dynamoDB stream data.
Watermark strategy: periodic, extract approximate time stamp from stream events and use it under withTimeStampAssigner.
Idleness: 10s(may not be useful at all as we only use parallism of 1.)
The data work flow looks like this:
InputStream.assignTimeStampsAndWatermarks().keyby().window(TumblingEventTimeWindow.of(1min).sideOutputLateData().reduce().map()
Then I getSideOutput(), and process the late events using exactly similar above workflow with small change such as no need to assign time stamp and watermark, no need for late output.
My logs show that all things work perfectly if ddb stream data has right timstamp, the corresponding window can close without issue and I can see the output after window is closed.
However, after I introduced late events, the late records processing logic is never triggered. I am sure that the late record’s timestamp corresponding window has closed. I put a log after I call getSideOutPut(), it never triggered. I used debugger and I am sure the getSideOutput() code is not triggered as well.
Can someone help to check this issue? Thank you.
I tried to use a different watermark strategy for late records logic. This doesn’t work as well. I want to understand why the late records are not collected to the late stream.
Without seeing more details from your implementation is difficult to give an accurate diagnosis, but based on your description, I wouldn't expect this to work:
Then I getSideOutput(), and process the late events using exactly similar above workflow with small change such as no need to assign time stamp and watermark, no need for late output.
If you are trying to apply event time windowing to the stream of late events, that's not going to work unless you adjust the allowed lateness for those windows enough to accommodate them.
As a starting point, have you tried printing the stream of late events?
I am having quite hard time to understand flink windowing principals and would be very pleased if you could point me in the right direction.
My purpose is to count the number of recurring events for a time interval and generate alert events if the number of recurring events is greater than a threshold.
As I understand, windowing is a perfect match for this scenario.
Additional requirement is to generate an early alert if recurring events count in a window is 2 (i.e. alert should be generated without waiting window end).
I thought that an alert event generating process window function can be used to aggregate windowed events and a custom trigger can be used to emit early results from the window based on the recurring events count (before the watermark reaches the window’s end timestamp).
I am using event-time semantics and having problems/questions for the custom trigger .
You can find the actual implementation in the gist: https://gist.github.com/simpleusr/7c56d4384f6fc9f0a61860a680bb5f36
I am using keyed state to keep track of element count in the window encounteredElementsCountState
Upon receiving first element I register EventTimeTimer to the window end. This is supposed to trigger FIRE_AND_PURGE for window closing and working as expected.
If the count exceeds threshold , I try to trigger early fire. This also seems to be successful, processwindow function is called immediately after this firing.
The problem is, I had to insert below check to the code without understanding the reason. Because the previously collected elements were again supplied to onElement method:
if (ctx.getCurrentWatermark() < 0) {
logger.debug(String.format("onElement processing skipped for eventId : %s for watermark: %s ", element.getEventId(), ctx.getCurrentWatermark()));
return TriggerResult.CONTINUE;
}
I could not figure out the reason. What I see is that when this happens the watermark value is (ctx.getCurrentWatermark()) Long.MIN_VALUE (that leaded to the above check). How can this happen?
This check seems to avoid duplicate early event generation, but I do not know why this happens and is this workaround is appropriate.
Could you please advice why the same elements are processed twice in the window?
Another question is about the keyed state usage. Does this implementation leaks any state after window is disposed? I am trying to clear all used states in clear method of the trigger but would that be enough?
Regards.
Each task has currentWatermark initialized to Long.MIN_VALUE, and this remains the local value of currentWatermark until larger watermarks have arrived from all of that task's input streams. Hopefully knowing that will help you better understand what's going on.
For what it's worth, often it's more straightforward to implement this kind of logic with a ProcessFunction than with the Window API.
I just started using Flink and have a problem I'm not sure how to solve. I get events from a Kafka Topic, these events represent a "beacon" signal from a mobile device. The device sends an event every 10 seconds.
I have an external customer that is asking for a beacon from our devices but every 60 seconds. Since we are already using Flink to process other events I thought I could solve this using a count window, but I'm struggling to understand how to "discard" the first 5 events and emit only the last one. Any ideas?
There are some ways to do this. As Far as I understand the idea is as follows: You receive beacon signal each 10 sec but You actually only need the most actual one and disard the others since the client asks for the data each 60 sec.
The simplest would be ofc to use ProcessFunction with count/event time window as You said. The type of the window actually depends on Your requirements. Then You sould do something like this:
stream.timeWindow([windowSize]).process(new CustomWindowProcessFunction())
The signature of the process() method of the ProcessWindowFunctionis as follows, depending on the type of the actual function def process(context: Context, elements: Iterable[IN], out: Collector[OUT]). So basically it gives you the acces to all window elements, so You can easily only push further the elements You like.
While this is the simplest idea, you may want also to take a look at the Flink timers, as they seem to be a good solution for Your issue. They are described here.
I have an always one application, listening to a Kafka stream, and processing events. Events are part of a session. And I need to do calculations based off of a sessions data. I am running into a problem trying to correctly run my calculations due to the length of my sessions. 90% of my sessions are done after 5 minutes. 99% are done after 1 hour. Sessions may last more than a day, due to this being a real-time system, there is no determined end. Session are unique, and show never collide.
I am looking for a way where I can process a window multiple times, either with an initial wait period and processing any later events after that, or a pure process per event type structure. I will need to keep all previous events around(ListState), as well as previously processed values(ValueState).
I previously thought allowedLateness would allow me to do this, but it seems the lateness is only considered for when the event should have been processed, it does not extend an actual window. GlobalWindows may also work, but I am unsure if there is a way to process a window multiple times. I believe I can used an evictor with GlobalWindows to purge the Windows after a period of inactivity(although admittedly, I did not research this yet, because I was unsure of how to trigger a GlobalWindow multiple times.
Any suggestions on how to achieve what I am looking to do would be greatly appreciated, I would also be happy to clarify any points needed.
If SessionWindows won't do the job, then you can use GlobalWindows with a custom Trigger and Evictor. The Trigger interface has onElement and timer-based callbacks that can fire whenever and as often as you like. If you go down this route, then yes, you'll also need to implement an Evictor to dispose of elements when they are no longer needed.
The documentation and the source code are helpful when trying to understand how this all fits together.
Is it possible, or more precisely how is it possible to use RX.Net to listen to a number and different variety of (WinForms) controls' .TextChanged/.RowsChanged/.SelectionChanged events and whenever one condition is fullfilled (ControlA.Text isn't empty, ControlB.RowsCount > 0 etc) enable that one DoSomething button.
I am asking because currently we have a lengthy if/then statement in each of these events' handlers and maintaining them if the condition changes is, due to duplicate code, quite error prone and that's why, if possible, I think it would be nice to take the stream of events and put the condition in one place.
Has anyone done that?
You can use Observable.FromEventPattern in combination with Join patterns, with Observable.When, Observable.And and Observable.Then, to create observables which will fire depending on various conditions, like combinations of events. For example, consider my answer here: Reactive Extensions for .NET (Rx): Take action once all events are completed
You should be able to use .FromEventPattern() to create observables for each of those events and then use CombineLatest() to do your logic on the current overall state and determine whether your button should be enabled, in one place.