How is keyed state managed for KeyedBroadcastProcessFunction in Flink? - apache-flink

I am using BroadcastState to perform streaming computation in Flink. I have defined a class extending KeyedBroadcastProcessFunction for my job. Say I have a stream A which is keyed by (user_id, location), and a stream B, which is broadcast to all executors to process elements in A using my defined class. I understand I can register a timer in processBroadcastElement or processElement in this class so that when it fires, I can delete the associated state for a specific key group by calling state.clear(). I wonder whether, after that, this key group still exists?
For example, in stream A, a new message comes with (user_id=1, location='usa'), and such a key group and its associated state are generated. After that, if another message with (user_id=1, location='usa') comes, it will trigger processElement() and emit a result.
Say after 24 hours I'm no longer interested in this key group (user_id=1, location='usa'). I can register a timer to clear the associated state, but I have no control over this key group. As a result, after 24 hours, when another message with (user_id=1, location='usa') comes, since this key group still exists, processElement() will still be invoked. As the job runs, even though the associated states are cleared after 24 hours, will key groups accumulate, or should that not be a concern for memory usage?
Relevant blogs: https://www.da-platform.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink

Flink's keyed state is organized as a distributed (or sharded) key-value store, where the keys can be simple things, like integers and strings, or composites, like (user_id=1, location='usa'). Key groups are something different from composite keys. A key group is a runtime construct that was introduced in Flink 1.2 (see FLINK-3755) to permit efficient rescaling of key-value state. A key group is a subset of the key space, and is checkpointed as an independent unit. At runtime, all of the keys in the same key group are partitioned together in the job graph -- each subtask has the key-value state for one or more complete key groups. This design doc gives more details. As a user working with the DataStream API, key groups are an implementation detail, and not something you work with directly.
As for timers in a KeyedBroadcastProcessFunction, they can be registered in the processElement or onTimer method, but not in the processBroadcastElement method. This is because timers are always associated with a key, and there is no key associated with a broadcast element. You can, however, manipulate any or all of the keyed state during your processBroadcastElement method by using the applyToKeyedState method on the KeyedBroadcastProcessFunction.Context object. See the docs for more details.
Once you call state.clear(), the state entry for that key is deleted. New stream events for that key may, of course, arrive after the state has been cleared, and you are able to once again store value state for that key, if you wish. In order to avoid unbounded memory usage due to keeping state for no-longer-relevant keys, you do need to be careful. You might want some logic like this to expire the state 24 hours after each time it is created:
// in processElement:
if (state.value() == null) {
    ctx.timerService().registerProcessingTimeTimer(
        ctx.timerService().currentProcessingTime() + 24 * 60 * 60 * 1000L);
}
state.update(...);

// in onTimer:
state.clear();
Or you might need more complex logic that extends the lifetime of the state whenever it is updated or accessed.
Another option would be to use the state time-to-live feature.
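For example, here is a minimal sketch of enabling a 24-hour TTL on a state descriptor (the "myState" name and Long type are placeholders):

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.hours(24))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();

ValueStateDescriptor<Long> descriptor =
    new ValueStateDescriptor<>("myState", Long.class);
descriptor.enableTimeToLive(ttlConfig);

With this in place, Flink drops each key's entry on its own, so you don't need to manage deletion timers yourself.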
Update:
Whenever you are in a processElement or onTimer method of any of the ProcessFunction types, there is a specific key implicitly in context, and anything done to keyed state (such as .update() or .clear()) will only affect the state for that one key.
Broadcast state works differently. Broadcast state is always MapState, and is replicated into all of the parallel subtasks. Broadcast state is keyless -- if you read broadcast state during the processElement method you will see the same value for the broadcast state regardless of what key is in context during that call.
Only in the processBroadcastElement method of a KeyedBroadcastProcessFunction can you modify (or clear) broadcast state, and it's important that whatever modifications (or deletions) occur be done in the same way in all of the parallel instances. It is designed this way to guarantee that every parallel instance has the same contents in broadcast state. Ignoring this rule will lead to inconsistencies in the state, which can be very difficult to debug. See the docs for more info.
So yes, if you call .clear() on the broadcast state, then all of the broadcast state for all keys will be removed. Or you might remove a specific item from the broadcast state (remember, broadcast state is MapState), in which case that specific item will be removed for all keys.
There are several examples of working with broadcast state on the Flink training site. See
https://training.da-platform.com/exercises/ongoingRides.html
https://training.da-platform.com/exercises/nearestTaxi.html
https://training.da-platform.com/exercises/taxiQuery.html

Related

Flink AggregateFunction vs KeyedProcessFunction with ValueState

We have an application that consumes events from a Kafka source. The logic for processing each element needs to take into account the events that were previously received (having the same partition key), without using time for windowing. The first implementation used a GlobalWindow, with an AggregateFunction for keeping the current state information and a trigger that would always fire in the onElement call. I am guessing that the alternative of using a KeyedProcessFunction that holds the state in a ValueState object would be more adequate, since we are not really taking timing into account, nor using any custom triggering. Is this assumption correct, and are there any downsides to either one of these approaches?
I prefer using a KeyedProcessFunction in cases like this. It puts all of the related logic into one object -- rather than having to coordinate what's going on in a GlobalWindow, an AggregateFunction, and a Trigger (and perhaps also an Evictor). I find this results in implementations that are more maintainable and testable, plus you have more straightforward control over state management.
I don't see any advantages to a solution based on windows.
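As a rough sketch, assuming hypothetical Event and Result types and a simple running count as the state, the KeyedProcessFunction version might look like this:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public static class RunningAggregate extends KeyedProcessFunction<String, Event, Result> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Result> out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        // emit on every element, mirroring the always-firing trigger
        out.collect(new Result(ctx.getCurrentKey(), updated));
    }
}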

How to process an already available state based on an event comes in a different stream in flink

We are working on deriving the status of accounts based on the activity on them. We calculate and keep the expiryOn date (the tentative, future date on which the account expires) based on the user activity on the account.
We also have a manual date-change event, which gives a date based on which the status of the account should be emitted as Expired.
I would like to know on what would be the best way to achieve this.
So, my question is since the date change event occurs in future when compared to the calculation of the expiryOn date, can the broadcasted state be a solution for this? If yes, please suggest the way.
Or are there any better approaches, like the Table API, to solve this problem?
Broadcast state is suitable in cases (like this one) where you need to either share information or invoke actions that aren't keyed, and so cannot be sent to one relevant instance.
If you need to store the broadcast state, keep in mind that each instance will store a copy of the broadcast state on the heap, and include that copy in its checkpoints.
If you are using ctx.applyToKeyedState, be careful to make changes to the keyed state that are deterministic -- otherwise, in the event of a failure and recovery at a point where some instances of the broadcast operator have applied the changes to keyed state and others have not, you could end up with inconsistencies.
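For the broadcast side of the date-change event, a minimal sketch might look like this (DateChangeEvent, AccountStatus, and manualDateDescriptor are hypothetical names; manualDateDescriptor is the MapStateDescriptor used to register the broadcast state):

// inside a KeyedBroadcastProcessFunction
@Override
public void processBroadcastElement(DateChangeEvent event, Context ctx, Collector<AccountStatus> out) throws Exception {
    // Every parallel instance receives the same broadcast element and applies
    // the same deterministic update, so all copies of the state stay identical.
    ctx.getBroadcastState(manualDateDescriptor).put("manualExpiryDate", event.getDate());
}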

Flink window aggregation with state

I would like to do a window aggregation with early trigger logic (you can think of the aggregation as being triggered either when the window closes or by a specific event), and I read in the doc: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/operators/windows.html#incremental-window-aggregation-with-aggregatefunction
The doc mentions that "using ProcessWindowFunction for simple aggregates such as count is quite inefficient", so the suggestion is to pair it with incremental window aggregation.
My question is: for the AverageAggregate in the doc, the state is not saved anywhere, so if the application crashes, the AverageAggregate will lose all of the intermediate values, right?
So if that is the case, is there a way to do a window aggregation that still supports incremental aggregation and has a state backend to recover from a crash?
The AggregateFunction is indeed only describing the mechanism for combining the input events into some result; that specific class does not store any data.
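For reference, the AverageAggregate from that doc looks roughly like this; note that the class has no fields at all, just pure functions over an accumulator:

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

private static class AverageAggregate
        implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {

    @Override
    public Tuple2<Long, Long> createAccumulator() {
        return new Tuple2<>(0L, 0L);
    }

    @Override
    public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> accumulator) {
        // running sum and running count
        return new Tuple2<>(accumulator.f0 + value.f1, accumulator.f1 + 1L);
    }

    @Override
    public Double getResult(Tuple2<Long, Long> accumulator) {
        return ((double) accumulator.f0) / accumulator.f1;
    }

    @Override
    public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
        return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
    }
}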
The state is persisted for us by Flink behind the scenes, though, when we write something like this:
input
.keyBy(<key selector>)
.window(<window assigner>)
.aggregate(new AverageAggregate(), new MyProcessWindowFunction());
The .keyBy(<key selector>).window(<window assigner>) tells Flink to hold a piece of state for us for each key and time bucket, and to call our code in AverageAggregate() and MyProcessWindowFunction() when relevant.
In case of a crash or restart, no data is lost (assuming the state backend is configured properly): as with other parts of Flink state, the state here will either be retrieved from the state backend or recomputed from scratch from upstream data.

How to clear the whole MapState with only one call

I know that if I call mapState.clear() I will clear all of the values in the state for the current key, but my question is: is there a way to do something like mapState.clear() that clears the state for all keys with just one call? Then something like mapState.isEmpty() would say "true" because all of the keys in the mapState were cleaned up, not just the current one.
Because we are talking about a situation with nested maps, it's easy to get our terminology confused. So let's put this question into the context of an example.
Suppose you have a stream of events about users, and inside a KeyedProcessFunction you are using a MapState<ATTR, VALUE> to maintain a map of attribute/value pairs for each user:
userEvents
.keyBy(e -> e.userId)
.process(new ManageUserData())
Inside the process function, any time you are working with MapState you can only manipulate the one map for the user corresponding to the event being processed,
public static class ManageUserData extends KeyedProcessFunction<...> {
MapState<ATTR, VALUE> userMap;
}
so userMap.clear() will clear the entire map of attribute/value pairs for one user, but leave the other maps alone.
I believe you are asking if there's some way to clear all of the MapStates for all users at once. And yes, there is a way to do this, though it's a bit obscure and not entirely straightforward to implement.
If you change the KeyedProcessFunction in this example to a KeyedBroadcastProcessFunction, and connect a broadcast stream to the stream of user events, then in that KeyedBroadcastProcessFunction you can use the applyToKeyedState method on KeyedBroadcastProcessFunction.Context inside of the processBroadcastElement() method to iterate over all of the users and, for each user, clear their MapState.
You will have to arrange to send an event on the broadcast stream whenever you want this to happen.
You should pay attention to the warnings in the documentation regarding working with broadcast state. And keep in mind that the logic implemented in processBroadcastElement() must have the same deterministic behavior across all parallel instances.
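A minimal sketch of what that could look like, using the same MapStateDescriptor that was used to create userMap (ClearAll and Output are hypothetical types):

// inside the KeyedBroadcastProcessFunction
@Override
public void processBroadcastElement(ClearAll event, Context ctx, Collector<Output> out) throws Exception {
    ctx.applyToKeyedState(
        userMapDescriptor,
        new KeyedStateFunction<String, MapState<ATTR, VALUE>>() {
            @Override
            public void process(String userId, MapState<ATTR, VALUE> state) throws Exception {
                // called once for every user key this instance is responsible for;
                // together the parallel instances cover all users
                state.clear();
            }
        });
}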

When to use CoProcess Function in Flink?

I am just trying to understand the use case when to use CoProcessFunction in Flink. Explanation with an example would help me to understand the concept better.
A CoProcessFunction is similar to a RichCoFlatMap, but with the addition of also being able to use timers. The timers are useful for expiring state for stale keys, or for raising alarms when keep-alive messages fail to arrive, for example.
A CoProcessFunction allows you to use one stream to influence how another is processed, or to enrich another stream. For example, an e-commerce site might have a stream of order events and a stream of shipment events, and they want to create a stream of events for orders that haven't shipped within 24 hours of the order being placed. The two streams can be keyed by the orderId and connected together. As an order arrives, it's recorded in keyed state and a timer is created to fire 24 hours later. When a shipment event arrives, the state and timer are cleared. If a timer does fire, the state is used to send the order out to the unfilled order service.
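A minimal sketch of that example, assuming hypothetical Order, Shipment, and Alert types (with a timestamp field on Order), and both streams keyed by orderId before being connected:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public static class UnshippedOrderAlert extends CoProcessFunction<Order, Shipment, Alert> {

    private static final long DAY_MS = 24 * 60 * 60 * 1000L;

    private transient ValueState<Order> pendingOrder;

    @Override
    public void open(Configuration parameters) {
        pendingOrder = getRuntimeContext().getState(
                new ValueStateDescriptor<>("pendingOrder", Order.class));
    }

    @Override
    public void processElement1(Order order, Context ctx, Collector<Alert> out) throws Exception {
        // record the order and set an alarm for 24 hours after it was placed
        pendingOrder.update(order);
        ctx.timerService().registerEventTimeTimer(order.timestamp + DAY_MS);
    }

    @Override
    public void processElement2(Shipment shipment, Context ctx, Collector<Alert> out) throws Exception {
        Order order = pendingOrder.value();
        if (order != null) {
            // shipped in time: cancel the alarm and drop the state
            ctx.timerService().deleteEventTimeTimer(order.timestamp + DAY_MS);
            pendingOrder.clear();
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Alert> out) throws Exception {
        Order order = pendingOrder.value();
        if (order != null) {
            // still unshipped after 24 hours
            out.collect(new Alert(order));
            pendingOrder.clear();
        }
    }
}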
For more on this, and examples with code, see connected streams and process function and the labs that accompany those tutorials.
