flink aggregation with window AND state - apache-flink

I'm looking for a way to implement an aggregation/fold function on a window that also has state.
I understand how to aggregate on a window, and how to use keyed/global state, but not both.
To be clear, by "a window with state" I mean state that is initialized (cleared) every time the window changes/moves.
For example: I want to count the number of events keyed by event type every 5 minutes. In addition to the event type (which is the window key), each event has an id field, and I would like to count each id only once. So I need to keep state holding all the ids I've already counted in that window.
Is there a simple way to do this in Flink?

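Rich window functions (for example a ProcessWindowFunction) give you access to keyed state that is global across all windows for a given key; note that a RichReduceFunction cannot itself be passed to a window's reduce(), which rejects rich functions. If you need per-window state, see FLINK-5929, which adds per-window state to ProcessWindowFunction and will be part of Flink 1.3.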
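For the specific example in the question, separate per-window state may not be needed at all. In a keyed window, the accumulator of an AggregateFunction is already scoped to one key and one window, and Flink discards it when the window is cleaned up; if that accumulator is a set of ids, each id is counted once per window. The sketch below models this in plain Java so it runs without Flink. `Agg` is a local stand-in with the same four methods as Flink's `AggregateFunction`, and `Event`, `DistinctIdCount` and `TumblingModel` are hypothetical names. In a real job you would implement Flink's `AggregateFunction` and pass it to `.keyBy(...).window(TumblingEventTimeWindows.of(Time.minutes(5))).aggregate(...)`.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Local stand-in mirroring the four methods of Flink's AggregateFunction<IN, ACC, OUT>.
interface Agg<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC acc);
    OUT getResult(ACC acc);
    ACC merge(ACC a, ACC b);
}

// Hypothetical event: 'type' is the window key, 'id' is deduplicated within each window.
final class Event {
    final String type;
    final String id;
    final long timestampMs;

    Event(String type, String id, long timestampMs) {
        this.type = type;
        this.id = id;
        this.timestampMs = timestampMs;
    }
}

// Counts distinct ids; the accumulator (a Set) only lives as long as one key/window pair.
class DistinctIdCount implements Agg<Event, Set<String>, Long> {
    public Set<String> createAccumulator() { return new HashSet<>(); }
    public Set<String> add(Event e, Set<String> acc) { acc.add(e.id); return acc; }
    public Long getResult(Set<String> acc) { return (long) acc.size(); }
    public Set<String> merge(Set<String> a, Set<String> b) { a.addAll(b); return a; }
}

class TumblingModel {
    // Simulates keyBy(type) + tumbling windows of sizeMs: one accumulator per (key, window start).
    static Map<String, Long> runTumbling(List<Event> events, long sizeMs, Agg<Event, Set<String>, Long> f) {
        Map<String, Set<String>> accs = new LinkedHashMap<>();
        for (Event e : events) {
            long windowStart = e.timestampMs - Math.floorMod(e.timestampMs, sizeMs);
            String slot = e.type + "@" + windowStart;
            accs.put(slot, f.add(e, accs.computeIfAbsent(slot, k -> f.createAccumulator())));
        }
        Map<String, Long> results = new LinkedHashMap<>();
        accs.forEach((slot, acc) -> results.put(slot, f.getResult(acc)));
        return results;
    }
}
```

With 5-minute windows, two events with id `u1` and one with `u2` in the first window count as 2, and a later `u1` starts again at 1 in the next window, because it lands in a fresh accumulator.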

Related

Accessing flink state within Window via queryable state or storing state after each window aggregation

I am using Apache Flink to aggregate a Kafka stream over time-based windows.
After a window expires, the data is saved to storage. This is all implemented with the following structure:
SingleOutputStreamOperator<Aggregator> stream = source
.forceNonParallel()
.keyBy(Object::getKey)
.window(TumblingEventTimeWindows.of(Time.minutes(5)))
.aggregate(new CustomAggregator(), new KeyedWindowFunction("m5"));
Now I need to be able to read the window state at any point in time, because I am showing a real-time chart in the UI.
In this case, for example, the object is saved to storage, and becomes available for access, only after the 5 minutes have expired. That means users only see the last completed 5 minutes and not the current aggregation, which is not ideal.
As far as I can see, there are 2 options:
Either Flink exposes a way to access window state via queryable state,
or I need to create a custom state object and, on each element, store the result of the aggregation in it so I can retrieve it via queryable state.
ProcessWindowFunction has access to the context and state, but it's called only after the window has expired, which is no use to me since at that point the data will be written to the sink.
The problem with the second approach is that the AggregateFunction passed to .aggregate() does not have access to the context, so I cannot save the state there.
So the question is: is there a way to combine a window, an AggregateFunction, and the ability to store each intermediate window aggregation in a custom state object?

Flink window aggregation with state

I would like to do a window aggregation with early-trigger logic (the aggregation is triggered either when the window closes or by a specific event), and I read this in the docs: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/operators/windows.html#incremental-window-aggregation-with-aggregatefunction
The docs say "Note that using ProcessWindowFunction for simple aggregates such as count is quite inefficient," and suggest pairing it with incremental window aggregation.
My question is about the AverageAggregate in the docs: its state is not saved anywhere, so if the application crashes, AverageAggregate will lose all of its intermediate values, right?
If that is the case, is there a way to do a window aggregation that still supports incremental aggregation and has a state backend to recover from a crash?
The AggregateFunction is indeed only describing the mechanism for combining the input events into some result, that specific class does not store any data.
The state is persisted for us by Flink behind the scenes, though, when we write something like this:
input
.keyBy(<key selector>)
.window(<window assigner>)
.aggregate(new AverageAggregate(), new MyProcessWindowFunction());
The .keyBy(<key selector>).window(<window assigner>) part tells Flink to hold a piece of state for us for each key and time bucket, and to call our code in AverageAggregate() and MyProcessWindowFunction() when relevant.
In case of a crash or restart, no data is lost (assuming checkpointing and the state backend are configured properly): as with other parts of Flink state, the state here will either be retrieved from the state backend or recomputed from first principles from upstream data.
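To make the division of labour concrete: the AggregateFunction is pure logic, and the accumulator it returns is the value that Flink keeps in window state and writes into checkpoints. The plain-Java sketch below mirrors the `AverageAggregate` from the docs, but uses a `long[]{sum, count}` accumulator instead of the docs' `Tuple2<Long, Long>`. `AggFn` is a local stand-in for Flink's `AggregateFunction`, and `CheckpointModel` only simulates a snapshot and restore by copying the accumulator; that copy is an assumption standing in for what the state backend does.

```java
import java.util.Arrays;

// Local stand-in mirroring the four methods of Flink's AggregateFunction<IN, ACC, OUT>.
interface AggFn<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC acc);
    OUT getResult(ACC acc);
    ACC merge(ACC a, ACC b);
}

// Stateless: every intermediate value lives in the accumulator that Flink keeps per key and window.
class AverageAggregate implements AggFn<Long, long[], Double> {
    public long[] createAccumulator() { return new long[] {0L, 0L}; } // {sum, count}
    public long[] add(Long value, long[] acc) { return new long[] {acc[0] + value, acc[1] + 1}; }
    public Double getResult(long[] acc) { return acc[1] == 0 ? Double.NaN : (double) acc[0] / acc[1]; }
    // Used when windows merge (e.g. session windows): combine two partial accumulators.
    public long[] merge(long[] a, long[] b) { return new long[] {a[0] + b[0], a[1] + b[1]}; }
}

class CheckpointModel {
    static double averageWithSimulatedRestore(long[] before, long[] after) {
        AverageAggregate f = new AverageAggregate();
        long[] acc = f.createAccumulator();
        for (long v : before) acc = f.add(v, acc);
        // "Checkpoint": the accumulator is plain data, so a copy of it is all that must survive.
        long[] snapshot = Arrays.copyOf(acc, acc.length);
        // "Restore" after a simulated failure: continue from the snapshot, not from scratch.
        long[] restored = Arrays.copyOf(snapshot, snapshot.length);
        for (long v : after) restored = f.add(v, restored);
        return f.getResult(restored);
    }
}
```

The `merge` method matters for merging windows such as session windows, where Flink combines the accumulators of two windows into one.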

Does Flink automatically checkpoint AggregateFunction's state and how to use AggregatingStateDescriptor?

I am implementing an AggregateFunction to measure the duration between two events after .window(EventTimeSessionWindows.withGap(gap)). After the second event is processed, the window is closed.
Will Flink automatically checkpoint the state of the AggregateFunction so that existing data in the accumulator is not lost on restart?
Since I am not sure about that, I tried to implement AggregatingState in a RichAggregateFunction:
class MyAgg extends RichAggregateFunction<IN, ACC, OUT>
AggregatingState requires an AggregatingStateDescriptor. Its constructor has this signature:
public AggregatingStateDescriptor(String name, AggregateFunction<IN, ACC, OUT> aggFunction, Class<ACC> stateType)
I am very confused by the aggFunction. What should be put here? Isn't it the MyAgg that I am trying to define in the first place?
An AggregateFunction doesn't have any state. But the aggregating state used in a streaming window (and manipulated by an AggregateFunction) is checkpointed as part of the window's state.
A RichAggregateFunction cannot be used in a window context, and an AggregateFunction cannot have its own state. It's designed this way because if an AggregateFunction were allowed to use a state descriptor to define ValueState, for example, then that state wouldn't be mergeable -- and to keep the Window API reasonably clean, all window state needs to be mergeable (for the sake of session windows).
AggregatingState is something you might use in a KeyedProcessFunction, for example. In that context, you need to define how elements are to be aggregated into the accumulator (i.e., the AggregatingState), which you do with an AggregateFunction.
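So the `aggFunction` argument is not the class you are writing around the state; it is the AggregateFunction that tells the state how to fold each added element into the stored accumulator. The plain-Java model below mimics that relationship: `AggregatingStateModel` plays the role of Flink's `AggregatingState` (an `add` and a `get`), and `DurationAgg` measures the time between the earliest and latest event, as in the question. Both names, and the `AggregateFn` stand-in interface, are hypothetical. In Flink you would instead create the state in a `KeyedProcessFunction`'s `open()` with something like `getRuntimeContext().getAggregatingState(new AggregatingStateDescriptor<>("duration", new DurationAgg(), long[].class))`, where `DurationAgg` implements Flink's own `AggregateFunction`.

```java
// Local stand-in mirroring the four methods of Flink's AggregateFunction<IN, ACC, OUT>.
interface AggregateFn<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC acc);
    OUT getResult(ACC acc);
    ACC merge(ACC a, ACC b);
}

// Measures the span between the earliest and latest event timestamps seen.
class DurationAgg implements AggregateFn<Long, long[], Long> {
    public long[] createAccumulator() { return new long[] {Long.MAX_VALUE, Long.MIN_VALUE}; } // {min, max}
    public long[] add(Long ts, long[] acc) { return new long[] {Math.min(acc[0], ts), Math.max(acc[1], ts)}; }
    public Long getResult(long[] acc) { return acc[1] < acc[0] ? 0L : acc[1] - acc[0]; }
    public long[] merge(long[] a, long[] b) { return new long[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])}; }
}

// Models AggregatingState<IN, OUT>: it stores only ACC and delegates folding to the descriptor's function.
class AggregatingStateModel<IN, ACC, OUT> {
    private final AggregateFn<IN, ACC, OUT> fn;
    private ACC acc; // null means "no state for this key yet"

    AggregatingStateModel(AggregateFn<IN, ACC, OUT> fn) { this.fn = fn; }

    void add(IN value) {
        if (acc == null) acc = fn.createAccumulator();
        acc = fn.add(value, acc);
    }

    OUT get() { return acc == null ? null : fn.getResult(acc); }

    void clear() { acc = null; }
}
```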

How is keyed state managed for KeyedBroadcastProcessFunction in Flink?

I am using BroadcastState to perform streaming computation in Flink. I have defined a class extending KeyedBroadcastProcessFunction for my job. Say I have a stream A, which is keyed by (user_id, location), and a stream B, which is broadcast to all executors and used by my class to process the elements of A. I understand that I can register a timer in processBroadcastElement or processElement in this class so that, when it fires, I can delete the state associated with a specific key group by calling state.clear(). I wonder: after that, does this key group still exist?
For example, a new message with (user_id=1, location='usa') arrives in stream A, and that key group and its associated state are created. If another message with (user_id=1, location='usa') comes after that, it triggers processElement() and emits a result.
Say that after 24 hours I am no longer interested in this key group (user_id=1, location='usa'). I can register a timer to clear the associated state, but I have no control over the key group itself. As a result, when another message with (user_id=1, location='usa') arrives after 24 hours, processElement() will still be invoked, since the key group still exists. As the job runs, will key groups accumulate even though their associated state is cleared after 24 hours, or is that not a concern for memory usage?
Relevant blogs: https://www.da-platform.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink
Flink's keyed state is organized as a distributed (or sharded) key-value store, where the keys can be simple things, like integers and strings, or composites, like (user_id=1, location='usa'). Key groups are something different from composite keys. A key group is a runtime construct that was introduced in Flink 1.2 (see FLINK-3755) to permit efficient rescaling of key-value state. A key group is a subset of the key space, and is checkpointed as an independent unit. At runtime, all of the keys in the same key group are partitioned together in the job graph: each subtask has the key-value state for one or more complete key groups. This design doc gives more details. As a user working with the DataStream API, key groups are an implementation detail, and not something you work with directly.
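One consequence answers the memory question: the number of key groups is fixed by the job's max parallelism, so key groups themselves never accumulate; only per-key state entries can. The sketch below models the assignment in plain Java. Flink's own `KeyGroupRangeAssignment` applies a murmur hash to `key.hashCode()` before taking the modulo; here `Math.floorMod(key.hashCode(), maxParallelism)` is a simplified stand-in, so the concrete group numbers will differ from Flink's, while the range mapping `keyGroup * parallelism / maxParallelism` has the same shape. `KeyGroupModel` and the composite key format are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class KeyGroupModel {
    // Simplified stand-in: Flink murmur-hashes key.hashCode() before the modulo.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Each subtask owns a contiguous range of key groups.
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // However many distinct keys arrive, they fall into at most maxParallelism key groups.
    static int distinctKeyGroups(int numKeys, int maxParallelism) {
        Set<Integer> groups = new HashSet<>();
        for (int i = 0; i < numKeys; i++) {
            String compositeKey = "user_id=" + i + ",location=usa"; // hypothetical composite key
            groups.add(keyGroupFor(compositeKey, maxParallelism));
        }
        return groups.size();
    }
}
```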
As for timers in a KeyedBroadcastProcessFunction, they can be registered in the processElement or onTimer method, but not in the processBroadcastElement method. This is because timers are always associated with a key, and there is no key associated with a broadcast element. You can, however, manipulate any or all of the keyed state during your processBroadcastElement method by using the applyToKeyedState method on the KeyedBroadcastProcessFunction.Context object. See the docs for more details.
Once you call state.clear(), the state entry for that key is deleted. New stream events for that key may, of course, arrive after the state has been cleared, and you are able to once again store value state for that key, if you wish. In order to avoid unbounded memory usage due to keeping state for no-longer-relevant keys, you do need to be careful. You might want some logic like this to expire the state 24 hours after each time it is created:
processElement:
  if state.value() is null, register a timer for 24 hours from now
  state.update(...)
onTimer:
  state.clear()
Or you might need more complex logic that extends the lifetime of the state whenever it is updated or accessed.
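The pseudocode can be modelled without Flink to check its behaviour: state is created on the first event for a key, a timer 24 hours later clears it, and a later event for the same key simply starts over. `ExpiringKeyedState` is a hypothetical plain-Java stand-in for a `KeyedProcessFunction` with a `ValueState` and processing-time timers; the `refreshOnUpdate` flag sketches the "extend the lifetime whenever it is updated" variant by moving the timer. In Flink, moving a timer means deleting the old one with `ctx.timerService().deleteProcessingTimeTimer(...)` and registering a new one, since registering does not replace an existing timer.

```java
import java.util.HashMap;
import java.util.Map;

class ExpiringKeyedState {
    static final long TTL_MS = 24L * 60 * 60 * 1000;

    private final boolean refreshOnUpdate;
    private final Map<String, Long> state = new HashMap<>();       // key -> value (a counter here)
    private final Map<String, Long> timerForKey = new HashMap<>(); // key -> pending timer time

    ExpiringKeyedState(boolean refreshOnUpdate) { this.refreshOnUpdate = refreshOnUpdate; }

    // processElement: register a timer when the state is new (or move it when refreshing).
    void processElement(String key, long now) {
        if (!state.containsKey(key) || refreshOnUpdate) {
            timerForKey.put(key, now + TTL_MS); // models delete-old-timer + register-new-timer
        }
        state.merge(key, 1L, Long::sum);
    }

    // Fires every timer due at or before 'now'; onTimer clears the state for that key only.
    void advanceTime(long now) {
        timerForKey.entrySet().removeIf(e -> {
            if (e.getValue() <= now) { state.remove(e.getKey()); return true; }
            return false;
        });
    }

    Long value(String key) { return state.get(key); }
}
```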
Another option would be to use the state time-to-live feature.
Update:
Whenever you are in a processElement or onTimer method of any of the ProcessFunction types, there is a specific key implicitly in context, and anything done to keyed state (such as .update() or .clear()) will only affect the state for that one key.
Broadcast state works differently. Broadcast state is always MapState, and is replicated into all of the parallel subtasks. Broadcast state is keyless -- if you read broadcast state during the processElement method you will see the same value for the broadcast state regardless of what key is in context during that call.
Only in the processBroadcastElement method of a KeyedBroadcastProcessFunction can you modify (or clear) broadcast state, and it's important that whatever modifications (or deletions) occur be done in the same way in all of the parallel instances. This is designed this way so as to guarantee that every parallel instance will have the same contents in broadcast state. Ignoring this rule will lead to inconsistencies in the state, which can be very difficult to debug. See the docs for more info.
So yes, if you call .clear() on the broadcast state, then all of the broadcast state for all keys will be removed. Or you might remove a specific item from the broadcast state (remember, broadcast state is MapState), in which case that specific item will be removed for all keys.
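The difference in scope can be modelled in a few lines of plain Java: one object stands for one parallel subtask, holding keyed state (one entry per key) next to broadcast state (one map shared by all keys). `SubtaskModel` and its methods are hypothetical; `applyToAllKeys` only imitates what `ctx.applyToKeyedState(...)` lets `processBroadcastElement` do, and the broadcast update is deterministic so every subtask ends up with the same map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

class SubtaskModel {
    final Map<String, Long> keyedCounts = new HashMap<>();      // keyed state: one value per key
    final Map<String, String> broadcastRules = new HashMap<>(); // broadcast state: keyless MapState

    // processElement: only the key in context is touched; broadcast state is read-only here.
    String processElement(String key, String ruleName) {
        keyedCounts.merge(key, 1L, Long::sum);
        return broadcastRules.get(ruleName); // same answer regardless of which key is in context
    }

    // processBroadcastElement: may modify broadcast state, and must do it identically on every subtask.
    void processBroadcastElement(String ruleName, String ruleValue) {
        if (ruleValue == null) broadcastRules.remove(ruleName);
        else broadcastRules.put(ruleName, ruleValue);
    }

    // Rough analogue of ctx.applyToKeyedState(...): visit every key; a null result models state.clear().
    void applyToAllKeys(BiFunction<String, Long, Long> fn) {
        keyedCounts.replaceAll(fn);
        keyedCounts.values().removeIf(Objects::isNull);
    }

    // Keyed clear(): affects only the one key in context.
    void clearKey(String key) { keyedCounts.remove(key); }
}
```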
There are several examples of working with broadcast state on the Flink training site. See
https://training.da-platform.com/exercises/ongoingRides.html
https://training.da-platform.com/exercises/nearestTaxi.html
https://training.da-platform.com/exercises/taxiQuery.html

collect the data before the end of the time windows in flink

I use an apply function to get a unique count, but I want to collect the count whenever the number of unique values changes.
Code:
hashMap
.keyBy(x => x.hash)
.timeWindow(Time.minutes(15))
.apply(new DataWindow())
But the apply function is only triggered when the time window ends. How can I get the value more frequently without using a sliding window?
I would recommend using a ProcessFunction rather than a window. You will want to use key-partitioned state to hold whatever data structure you decide to use to track the unique values. You can use either an event time timer or a processing time timer to clear the state every 15 minutes, depending on what kind of time is appropriate to your application.
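A plain-Java model of that suggestion: per key, a set of seen values plays the role of keyed state (in Flink, for example, a `MapState<String, Boolean>` used as a set), a new count is emitted only when an unseen value arrives, and a timer at the end of each 15-minute period clears the state. `UniqueCountModel` is hypothetical, and it advances time directly rather than through watermarks. Aligning timers to multiples of 15 minutes is one choice; the timer could equally be set 15 minutes after the first element for a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniqueCountModel {
    static final long PERIOD_MS = 15L * 60 * 1000;

    private final Map<String, Set<String>> seen = new HashMap<>(); // keyed state per hash key
    private final Map<String, Long> timers = new HashMap<>();      // one pending timer per key
    final List<String> emitted = new ArrayList<>();                 // stands in for the Collector

    // processElement: emit only when the distinct count for this key actually changes.
    void processElement(String key, String value, long time) {
        advanceTo(time); // fire due timers first, as happens when time passes in Flink
        timers.putIfAbsent(key, (time / PERIOD_MS + 1) * PERIOD_MS); // end of the current period
        Set<String> values = seen.computeIfAbsent(key, k -> new HashSet<>());
        if (values.add(value)) {
            emitted.add(key + "=" + values.size());
        }
    }

    // onTimer: clear the state of every key whose timer is due.
    void advanceTo(long time) {
        timers.entrySet().removeIf(e -> {
            if (e.getValue() <= time) { seen.remove(e.getKey()); return true; }
            return false;
        });
    }
}
```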
But if you want to stick with windowing, you could implement a custom Trigger. In this case you would need to keep your state in the partitioned state available on the TriggerContext. Also see more info about windows and triggers.
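For the Trigger route, one option is for the trigger to keep the set of ids it has seen in its partitioned (per key and per window) state, obtained in Flink through `ctx.getPartitionedState(...)`, and to return `FIRE` whenever an id is new, so the window function runs early without purging the window. The model below is plain Java: `Result` is a local enum whose constants are named after Flink's `TriggerResult` values, and `DistinctChangeTrigger` is hypothetical. With event time, the final firing would come from a timer at `window.maxTimestamp()`, and `clear()` is where the trigger must drop its own state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Local enum named after Flink's TriggerResult constants.
enum Result { CONTINUE, FIRE, FIRE_AND_PURGE, PURGE }

class DistinctChangeTrigger {
    // Models partitioned trigger state: one id set per (key, window start).
    private final Map<String, Set<String>> idsPerWindow = new HashMap<>();

    // onElement: fire early only when this window sees a new id; window contents are kept.
    Result onElement(String key, long windowStart, String id) {
        Set<String> ids = idsPerWindow.computeIfAbsent(key + "@" + windowStart, k -> new HashSet<>());
        return ids.add(id) ? Result.FIRE : Result.CONTINUE;
    }

    // onEventTime: final firing when the timer for the window's max timestamp goes off.
    Result onEventTime(long time, long windowMaxTimestamp) {
        return time == windowMaxTimestamp ? Result.FIRE_AND_PURGE : Result.CONTINUE;
    }

    // clear: called when the window is discarded; the trigger's own state must be removed here.
    void clear(String key, long windowStart) {
        idsPerWindow.remove(key + "@" + windowStart);
    }

    int stateSize() { return idsPerWindow.size(); }
}
```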
