Flink window aggregation with state - apache-flink

I would like to do a window aggregation with an early trigger logic (you can think that the aggregation is triggered either by window is closed, or by a specific event), and I read on the doc: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/operators/windows.html#incremental-window-aggregation-with-aggregatefunction
The doc mentioned that Note that using ProcessWindowFunction for simple aggregates such as count is quite inefficient. so the suggestion is to pair with incremental window aggregation.
My question is that AverageAggregate in the doc, the state is not saved anywhere, so if the application crashed, the averageAggregate will loose all the intermediate value, right?
So If that is the case, is there a way to do a window aggregation, still supports incremental aggregation, and has a state backend to recover from crash?

The AggregateFunction is indeed only describing the mechanism for combining the input events into some result, that specific class does not store any data.
The state is persisted for us by Flink behind the scene though, when we write something like this:
input
.keyBy(<key selector>)
.window(<window assigner>)
.aggregate(new AverageAggregate(), new MyProcessWindowFunction());
the .keyBy(<key selector>).window(<window assigner>) is indicating to Flink to hold a piece of state for us for each key and time bucket, and to call our code in AverageAggregate() and MyProcessWindowFunction() when relevant.
In case of crash or restart, no data is lost (assuming state backend are configured properly): as with other parts of Flink state, the state here will either be retrieved from the state backend or recomputed from first principles from upstream data.

Related

Flink AggregateFunction vs KeyedProcessFunction with ValueState

We have an application that consumes events from a kafka source. The logic from processing each element needs to take into account the events that were previously received (having the same partition key), without using time for windowing. The first implementation used a GlobalWindow, with an AggregateFunction for keeping the current state information and a trigger that would always fire in onElement call. I am guessing that the alternative of using a KeyedProcessFunction that and holds the state in a ValueState object would be more adequate, since we are not really taking timing into account, nor using any custom triggering. Is this assumption correct and are there any downsides to either one of these approaces?
In prefer using a KeyedProcessFunction in cases like this. It puts all of the related logic into one object -- rather than having to coordinate what's going on in a GlobalWindow, an AggregateFunction, and a Trigger (and perhaps also an Evictor). I find this results in implementations that are more maintainable and testable, plus you have more straightforward control over state management.
I don't see any advantages to a solution based on windows.

How to process an already available state based on an event comes in a different stream in flink

We are working on deriving the status of accounts based on the activity on it. We calculate and keep the expiryOn date(which says the tentative, future date on which account expires) based on the user activity on the account.
We have a manual date change event which gives a date based on which the status of the account is emitted as Expired.
I would like to know on what would be the best way to achieve this.
So, my question is since the date change event occurs in future when compared to the calculation of the expiryOn date, can the broadcasted state be a solution for this? If yes, please suggest the way.
Or, is there any better approaches like Table API to solve this problem?
Broadcast state is suitable in cases (like this one) where you need to either share information or invoke actions that aren't keyed, and so cannot be sent to one relevant instance.
If you need to store the broadcast state, keep in mind that each instance will store a copy of the broadcast state on the heap, and include that copy in its checkpoints.
If you are using context.applytokeyedstate, be careful to make changes to the keyed state that are deterministic -- otherwise, in the event of a failure and recovery at a point where some instances of the broadcast operator have applied the changes to keyed state, and other instances have not, you could end up with inconsistencies.

Flink windowAll aggregate than window process?

We are aggregating some data for 1 minute which we then flush onto a file. The data itself is like a map where key is an object and value is also an object.
Since we need to flush the data together hence we are not doing any keyBy and hence are using windowAll.
The problem that we are facing is that we get better throughput if we use window function with ProcessAllWindowFunction and then aggregate in the process call vs when we use aggregate with window function. We are also seeing timeouts in state checkpointing when we use aggregate.
I tried to go through the code base and the only hypothesis I could think of is probably it is easier to checkpoint ListState that process will use vs the AggregateState that aggregate will use.
Is the hypothesis correct? Are we doing something wrong? If not, is there a way to improve the performance on aggregate?
Based on what you've said, I'm going to jump to some conclusions.
I assume you are using the RocksDB state backend, and are aggregating each incoming event into into some sort of collection. In that case, the RocksDB state backend is having to deserialize that collection, add the new event to it, and then re-serialize it -- for every event. This is very expensive.
When you use a ProcessAllWindowFunction, each incoming event is appended to a ListState object, which has a very efficient implementation -- the serialized bytes for the new event are simply appended (the list doesn't have to be deserialized and re-serialized).
Checkpoints are timing out because the throughput is so poor.
Switching to the FsStateBackend would help. Or use a ProcessAllWindowFunction. Or implement your own windowing with a KeyedProcessFunction, and then use ListState or MapState for the aggregation.

Flink window state size and state management

After reading flink's documentation and searching around, i couldn't entirely understand how flink's handles state in its windows.
Lets say i have an hourly tumbling window with an aggregation function that accumulate msgs into some java pojo or scala case class.
Will The size of that window be tied to the number of events entering that window in a single hour, or will it just be tied to the pojo/case class, as im accumalting the events into that object. (e.g if counting 10000 msgs into an integer, will the size be close to 10000 * msg size or size of an int?)
Also, if im using pojos or case classes, does flink handle the state for me (spills to disk if memory exhausted/saves state at check points etc) or must i use flink's state objects for that?
Thanks for your help!
The state size of a window depends on the type of function that you apply. If you apply a ReduceFunction or AggregateFunction, arriving data is immediately aggregated and the window only holds the aggregated value. If you apply a ProcessWindowFunction or WindowFunction, Flink collects all input records and applies the function when time (event or processing time depending on the window type) passes the window's end time.
You can also combine both types of functions, i.e., have an AggregateFunction followed by a ProcessWindowFunction. In that case, arriving records are immediately aggregated and when the window is closed, the aggregation result is passed as single value to the ProcessWindowFunction. This is useful because you have incremental aggregation (due to ReduceFunction / AggregateFunction) but also access to the window metadata like begin and end timestamp (due to ProcessWindowFunction).
How the state is managed depends on the chosen state backend. If you configure the FsStateBackend all local state is kept on the heap of the TaskManager and the JVM process is killed with an OutOfMemoryError if the state grows too large. If you configure the RocksDBStateBackend state is spilled to disk. This comes with de/serialization costs for every state access but gives much more storage for state.

Apache Flink: Is there a way to transform queryable state before it is returned to the client?

Based on my reading of the docs, one must retrieve the entire state value associated with a key when using queryable state. I would like to be able to transform the value on the TaskManager before it is returned to the client, i.e. in the QueryableStateClientProxy or in the QueryableStateServer.
For example, in the case of MapState, it could be useful to be able to retrieve data for a particular key in the map and not have to return the entire MapState to the client (particularly if the MapState is large).
Am I right that there is no way to do this currently? And, if so, does anyone know if this might be on the roadmap somewhere? I see that the query state is marked as beta and may change in the future.
Thanks.
In the current version (Flink 1.7.0), the fetched value cannot be modified before it is returned.
AFAIK, this feature is also not on the roadmap.

Resources