Flink add a TTL to an existing value state - apache-flink

For one of our Flink jobs, we found a piece of state that is causing a state leak. To fix this we need to add a TTL to the state causing the leak; however, we would like to keep the existing state (savepoint). If we add a TTL to a value state, would we be able to use the existing savepoint? Thank you.

No, according to the docs this won't work:
Trying to restore state, which was previously configured without TTL, using TTL enabled descriptor or vice versa will lead to compatibility failure and StateMigrationException.
However, you may be able to use the State Processor API to accomplish this. Exactly how to go about it depends on what kind of state it is, how it was serialized, and whether the operator has a UID.
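As a rough illustration of that approach, the sketch below uses the DataSet-based State Processor API from Flink 1.12 to read the old state with a TTL-less descriptor and write a new savepoint with a TTL-enabled one. Treat it as a sketch under assumptions: the uid "my-uid", the state name "my-state", the paths, the Long value type, and the 7-day TTL are all placeholders, and the state backend passed to Savepoint.load should match the one your job actually uses.

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class AddTtlToSavepoint {

    // Carrier POJO for one key's state as read from the old savepoint.
    public static class Entry {
        public String key;
        public Long value;
    }

    // Reads the existing state with the original, TTL-less descriptor.
    static class Reader extends KeyedStateReaderFunction<String, Entry> {
        ValueState<Long> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext().getState(
                new ValueStateDescriptor<>("my-state", Long.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<Entry> out) throws Exception {
            Entry e = new Entry();
            e.key = key;
            e.value = state.value();
            out.collect(e);
        }
    }

    // Writes the same values back through a TTL-enabled descriptor.
    static class Writer extends KeyedStateBootstrapFunction<String, Entry> {
        ValueState<Long> state;

        @Override
        public void open(Configuration parameters) {
            ValueStateDescriptor<Long> desc =
                new ValueStateDescriptor<>("my-state", Long.class);
            desc.enableTimeToLive(StateTtlConfig.newBuilder(Time.days(7)).build());
            state = getRuntimeContext().getState(desc);
        }

        @Override
        public void processElement(Entry e, Context ctx) throws Exception {
            state.update(e.value);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        ExistingSavepoint old =
            Savepoint.load(env, "hdfs://savepoints/old", new MemoryStateBackend());

        DataSet<Entry> entries = old.readKeyedState("my-uid", new Reader());

        BootstrapTransformation<Entry> withTtl = OperatorTransformation
            .bootstrapWith(entries)
            .keyBy(e -> e.key)
            .transform(new Writer());

        old.removeOperator("my-uid")
           .withOperator("my-uid", withTtl)
           .write("hdfs://savepoints/new");
        env.execute();
    }
}

The new savepoint should then be restorable by the job whose descriptor has TTL enabled, since the state was written through a TTL-enabled descriptor.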

Related

Cleanup configuration for ProcessWindowFunction's window state without TTL with RocksDB as backend

Flink offers TTL configuration for managed state and, when using RocksDB as the backend, it executes cleanup in a custom compaction filter (if I understand correctly). However, in the case of keyed window state in a ProcessWindowFunction, the expectation is that we override the clear method and explicitly call something like context.windowState().*.clear().
If the state descriptor does not configure TTL, does cleanup still occur after the clear callback? If not, and cleanup for this type of state depends solely on sizes in RocksDB's levels, what is the default setting and is it configurable?
If the state descriptor does not configure TTL, does cleanup still occur after the clear callback?
Yes, unless the state descriptor was used to create state stored in the KeyedStateStore returned by ProcessWindowFunction.Context#globalState. This global state is the only state that is kept after windows are cleared. If you have an ever-growing key space, you should configure state TTL for any globalState you use, as otherwise globalState for stale keys will never be cleaned up.
FWIW, there's nothing RocksDB-specific about this. The answer is the same for any of the state backends.
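For instance, if you keep a per-key counter in globalState, you might attach TTL to its descriptor. A minimal sketch (the state name, types, and 30-day TTL are illustrative):

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class CountingWindowFunction
        extends ProcessWindowFunction<Long, String, String, TimeWindow> {

    private transient ValueStateDescriptor<Long> globalDesc;

    @Override
    public void open(Configuration parameters) {
        globalDesc = new ValueStateDescriptor<>("windows-seen", Long.class);
        // Without TTL, globalState for stale keys would linger forever.
        globalDesc.enableTimeToLive(StateTtlConfig.newBuilder(Time.days(30)).build());
    }

    @Override
    public void process(String key, Context ctx, Iterable<Long> elements,
                        Collector<String> out) throws Exception {
        // globalState survives window cleanup; windowState would not.
        ValueState<Long> seen = ctx.globalState().getState(globalDesc);
        long count = (seen.value() == null ? 0L : seen.value()) + 1;
        seen.update(count);
        out.collect(key + " has fired " + count + " windows");
    }
}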

How to process already available state based on an event that arrives in a different stream in Flink

We are working on deriving the status of accounts based on the activity on them. We calculate and keep an expiryOn date (the tentative, future date on which the account expires) based on the user's activity on the account.
We also have a manual date-change event that supplies a date, based on which the status of the account should be emitted as Expired.
I would like to know what would be the best way to achieve this.
So, my question is: since the date-change event occurs later than the calculation of the expiryOn date, can broadcast state be a solution for this? If yes, please suggest the way.
Or are there better approaches, like the Table API, to solve this problem?
Broadcast state is suitable in cases (like this one) where you need to either share information or invoke actions that aren't keyed, and so cannot be sent to one relevant instance.
If you need to store the broadcast state, keep in mind that each instance will store a copy of the broadcast state on the heap, and include that copy in its checkpoints.
If you are using Context#applyToKeyedState, be careful to make the changes to the keyed state deterministic -- otherwise, in the event of a failure and recovery at a point where some instances of the broadcast operator have applied the changes to keyed state and other instances have not, you could end up with inconsistencies.
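A rough sketch of that pattern with a KeyedBroadcastProcessFunction follows; the Account, DateChange, and StatusEvent types, their fields, and the 90-day expiry logic are all hypothetical:

import org.apache.flink.api.common.state.KeyedStateFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class ExpiryFunction
        extends KeyedBroadcastProcessFunction<String, ExpiryFunction.Account, ExpiryFunction.DateChange, ExpiryFunction.StatusEvent> {

    private final ValueStateDescriptor<Long> expiryDesc =
        new ValueStateDescriptor<>("expiryOn", Long.class);

    @Override
    public void processElement(Account account, ReadOnlyContext ctx,
                               Collector<StatusEvent> out) throws Exception {
        // Recompute the tentative expiry date from activity on this account.
        long expiryOn = account.lastActivityTimestamp + 90L * 24 * 60 * 60 * 1000;
        getRuntimeContext().getState(expiryDesc).update(expiryOn);
    }

    @Override
    public void processBroadcastElement(DateChange change, Context ctx,
                                        Collector<StatusEvent> out) throws Exception {
        // Runs on every parallel instance; keep these changes deterministic.
        ctx.applyToKeyedState(expiryDesc, new KeyedStateFunction<String, ValueState<Long>>() {
            @Override
            public void process(String accountId, ValueState<Long> expiryOn) throws Exception {
                if (expiryOn.value() != null && expiryOn.value() <= change.effectiveDate) {
                    out.collect(new StatusEvent(accountId, "Expired"));
                }
            }
        });
    }

    // Hypothetical event types, included only to keep the sketch self-contained.
    public static class Account { public String accountId; public long lastActivityTimestamp; }
    public static class DateChange { public long effectiveDate; }
    public static class StatusEvent {
        public String accountId; public String status;
        public StatusEvent(String accountId, String status) {
            this.accountId = accountId; this.status = status;
        }
    }
}

To wire this up, broadcast the date-change stream with a MapStateDescriptor, connect it to the keyed account stream, and .process(new ExpiryFunction()).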

Flink window aggregation with state

I would like to do a window aggregation with early-trigger logic (you can think of the aggregation as triggered either when the window closes or by a specific event), and I read in the docs: https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/operators/windows.html#incremental-window-aggregation-with-aggregatefunction
The docs mention that "using ProcessWindowFunction for simple aggregates such as count is quite inefficient", so the suggestion is to pair it with incremental window aggregation.
My question is about the AverageAggregate in the doc: the state is not saved anywhere, so if the application crashes, the AverageAggregate will lose all its intermediate values, right?
So if that is the case, is there a way to do a window aggregation that still supports incremental aggregation and has a state backend to recover from a crash?
The AggregateFunction is indeed only describing the mechanism for combining the input events into some result; that specific class does not store any data.
The state is persisted for us by Flink behind the scenes, though, when we write something like this:
input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .aggregate(new AverageAggregate(), new MyProcessWindowFunction());
The .keyBy(<key selector>).window(<window assigner>) tells Flink to hold a piece of state for us for each key and time bucket, and to call our code in AverageAggregate() and MyProcessWindowFunction() when relevant.
In case of a crash or restart, no data is lost (assuming the state backend is configured properly): as with other parts of Flink state, the state here will either be retrieved from the state backend or recomputed from first principles from upstream data.
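For reference, the AverageAggregate from the linked docs looks like the following; the Tuple2 accumulator (sum, count) is exactly the per-key, per-window state that Flink checkpoints on our behalf:

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

public class AverageAggregate
        implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {

    @Override
    public Tuple2<Long, Long> createAccumulator() {
        // The (sum, count) accumulator is managed by Flink as window state.
        return new Tuple2<>(0L, 0L);
    }

    @Override
    public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> accumulator) {
        return new Tuple2<>(accumulator.f0 + value.f1, accumulator.f1 + 1L);
    }

    @Override
    public Double getResult(Tuple2<Long, Long> accumulator) {
        return ((double) accumulator.f0) / accumulator.f1;
    }

    @Override
    public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
        return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
    }
}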

Can we update a state's TTL value?

We have a topology that uses state (ValueState and ListState) with TTL (StateTtlConfig) because we cannot use timers (we would generate hundreds of millions of timers per day, and that does not scale: a savepoint/checkpoint would take hours to generate and might even get stuck while running).
However, we need to update the value of the TTL at runtime depending on the type of some incoming events and other logic. Is it alright to recreate a new state with a new StateTtlConfig (and an updated TTL time) and copy the values from "old" to "new" in the processElement1() and processElement2() methods of a CoProcessFunction (instead of once in open() like we usually do)?
I guess the "old" state would be garbage collected (?).
Would this solution scale? Would it be performant? Would it generate any issues? Anything bad?
I think your approach can work with state re-creation at runtime to some extent, but it is brittle. The problem I see is that the old state's meta information can linger somewhere, depending on the backend implementation.
For the heap (FS) backend, the checkpoint/savepoint will eventually contain no records for the expired old state, but the meta info can linger in memory while the job is running. It will go away if the job is restarted.
For RocksDB, the column family of the old state can linger. Moreover, the background cleanup runs only during compaction. If the table is too small, like the part which is in memory, that part (and maybe even a bit on disk) will linger. It will go away after a restart if cleanup on full snapshot is active (not for incremental checkpoints).
All in all, it depends on how often you have to create the new state and restart your job from a savepoint/checkpoint.
I created a ticket to document what can be changed in the TTL config and when, so check the issue for details.
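For concreteness, the runtime re-creation the question describes might look roughly like the snippet below, placed inside processElement1() once the new TTL is known. This is only a sketch: MyValue, the state names, and newTtlHours are hypothetical placeholders.

// Register a fresh state under a new name, with the updated TTL.
ValueStateDescriptor<MyValue> newDesc =
    new ValueStateDescriptor<>("account-state-v2", MyValue.class);
newDesc.enableTimeToLive(StateTtlConfig.newBuilder(Time.hours(newTtlHours)).build());
ValueState<MyValue> newState = getRuntimeContext().getState(newDesc);

// Copy this key's value over and clear the old entry. Note that clear()
// removes only this key's entry; the old state's meta info may still linger.
ValueState<MyValue> oldState = getRuntimeContext().getState(
    new ValueStateDescriptor<>("account-state-v1", MyValue.class));
if (oldState.value() != null) {
    newState.update(oldState.value());
    oldState.clear();
}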
I guess the "old" state would be garbage collected (?).
From the Flink documentation, Cleanup of Expired State:
By default, expired values are explicitly removed on read, such as ValueState#value, and periodically garbage collected in the background if supported by the configured state backend. Background cleanup can be disabled in the StateTtlConfig:
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(1))
    .disableCleanupInBackground()
    .build();
or execute the cleanup after a full snapshot:
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(1))
    .cleanupFullSnapshot()
    .build();
You can change the TTL at any time according to the documentation. However, you have to restart the job (it is not picked up at runtime):
For existing jobs, this cleanup strategy can be activated or deactivated anytime in StateTtlConfig, e.g. after restart from savepoint.
But why don't you store the timers in RocksDB, as David suggested in the referenced answer?
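(For reference, with the RocksDB backend the timers themselves can be kept in RocksDB rather than on the heap, which is the default in recent Flink versions and is controlled by this flink-conf.yaml setting:)

state.backend.rocksdb.timer-service.factory: ROCKSDB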

Apache Flink: Is there a way to transform queryable state before it is returned to the client?

Based on my reading of the docs, one must retrieve the entire state value associated with a key when using queryable state. I would like to be able to transform the value on the TaskManager before it is returned to the client, i.e. in the QueryableStateClientProxy or in the QueryableStateServer.
For example, in the case of MapState, it could be useful to be able to retrieve data for a particular key in the map and not have to return the entire MapState to the client (particularly if the MapState is large).
Am I right that there is no way to do this currently? And, if so, does anyone know if this might be on the roadmap somewhere? I see that queryable state is marked as beta and may change in the future.
Thanks.
In the current version (Flink 1.7.0), the fetched value cannot be modified before it is returned.
AFAIK, this feature is also not on the roadmap.
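For illustration, with the current API any narrowing has to happen client-side, after the complete value has crossed the wire. A sketch against the Flink 1.7-era client, where the host, port, job id, state name, and map key are all placeholders:

import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class QueryExample {
    public static void main(String[] args) throws Exception {
        QueryableStateClient client = new QueryableStateClient("proxy-host", 9069);

        MapStateDescriptor<String, Long> descriptor =
            new MapStateDescriptor<>("per-day-counts", String.class, Long.class);

        // The entire MapState for this key is shipped to the client...
        CompletableFuture<MapState<String, Long>> future = client.getKvState(
            JobID.fromHexString("00000000000000000000000000000000"),
            "counts-query", "user-42",
            BasicTypeInfo.STRING_TYPE_INFO, descriptor);

        // ...and only then can the single entry of interest be picked out.
        Long oneEntry = future.get().get("2019-01-01");
        System.out.println(oneEntry);
        client.shutdownAndWait();
    }
}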
