We have a Flink job that persists large keyed state in the RocksDB backend, and we are using the incremental checkpointing strategy. As time goes by, the size of the state becomes a problem. We have looked at the state TTL solution, but it does not support incremental RocksDB state.
What would be the best approach to this problem if I really need incremental checkpoints?
One approach that is often used is to manipulate the state in some kind of ProcessFunction, and use a timer to clear the state when it is no longer needed -- e.g., if it hasn't been accessed for several hours. ProcessFunctions are able to have both event-time and processing-time timers, so you can choose whichever is more appropriate for your use case.
See the expiring state exercise on the Flink training site for an example of using timers to clear state.
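A minimal sketch of that pattern, assuming a processing-time timer, a String event type, and a one-hour idle timeout (all of these are placeholders):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Counts events per key and clears the key's state after one hour without activity.
public class ExpiringCountFunction extends KeyedProcessFunction<String, String, Long> {

    private static final long IDLE_TIMEOUT_MS = 60 * 60 * 1000L;

    private transient ValueState<Long> count;
    private transient ValueState<Long> cleanupTimer;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
        cleanupTimer = getRuntimeContext().getState(new ValueStateDescriptor<>("cleanupTimer", Long.class));
    }

    @Override
    public void processElement(String event, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(updated);

        // Push the cleanup timer forward: delete the previous timer and register a new one.
        Long previous = cleanupTimer.value();
        if (previous != null) {
            ctx.timerService().deleteProcessingTimeTimer(previous);
        }
        long newTimer = ctx.timerService().currentProcessingTime() + IDLE_TIMEOUT_MS;
        ctx.timerService().registerProcessingTimeTimer(newTimer);
        cleanupTimer.update(newTimer);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) {
        // The key has been idle for the timeout period; drop its state so it does not grow forever.
        count.clear();
        cleanupTimer.clear();
    }
}

It would be wired in after keyBy, e.g. events.keyBy(e -> e).process(new ExpiringCountFunction()).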
Related
I am interested in processing large state using Flink.
There are several ways to handle this, such as incremental checkpoints.
I understand the concept from the Flink documentation.
I also found that there is a changelog state backend, which was introduced in Flink 1.16.
I think the changelog state backend can reduce the time it takes to make a snapshot by capturing only the difference between the current state and the previous snapshot.
But I am a bit confused and cannot fully understand the difference between incremental checkpoints and the changelog state backend.
I want to use both methods to handle large state in Flink, but I also want to understand where each one comes from. Although I have read the articles linked from the Flink documentation, I still cannot fully differentiate between incremental checkpoints and the changelog state backend.
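For reference, this is roughly how I plan to enable each mechanism when setting up the job (a sketch only; it assumes Flink 1.16 APIs, and the checkpoint path is a placeholder):

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LargeStateJobSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds

        // Incremental checkpoints: each checkpoint uploads only the RocksDB files
        // created since the previous checkpoint rather than the full state.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true)); // true = incremental

        // Changelog state backend: state changes are continuously written to a changelog,
        // so a checkpoint only needs to persist the small recent part of that log.
        env.enableChangelogStateBackend(true);

        env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/checkpoints"); // placeholder

        // ... define sources, transformations, sinks, then env.execute(...) ...
    }
}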
Is it almost same except that incremental checkpoint is focused on checkpoint mechanism while the change log state back end is focused on snapshot?
Any comments will be appreciated.
We have a scenario where we have to persist/save some value into the checkpoint and retrieve it back during failure recovery/application restart.
We have tried a few things such as ValueState and ValueStateDescriptor, but it is still not working.
https://github.com/realtime-storage-engine/flink-spillable-statebackend/blob/master/flink-spillable-benchmark/src/main/java/org/apache/flink/spillable/benchmark/WordCount.java
https://towardsdatascience.com/heres-how-flink-stores-your-state-7b37fbb60e1a
https://github.com/king/flink-state-cache/blob/master/examples/src/main/java/com/king/flink/state/Example.java
We can't externalize it to a DB as it may cause some performance issues.
Any lead on how to do this with checkpoints would be helpful. How do we put a value into a checkpoint and get it back?
All of your managed application state is automatically written into Flink checkpoints (and savepoints). This includes
keyed state (ValueState, ListState, MapState, etc)
operator state (ListState, BroadcastState, etc)
timers
This state is automatically restored during recovery, and can optionally be restored during manual restarts.
The Flink Operations Playground shows how to work with checkpoints and savepoints, and lets you observe their behavior during failure/recovery and restarts/rescaling.
If you want to read from a checkpoint yourself, that's what the State Processor API is for. Here's an example.
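As a minimal sketch of the first point (the names and types are illustrative): any keyed state you register through a state descriptor is included in checkpoints and restored after a failure without any extra code.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Keeps a running sum per key in managed keyed state. Because the state is managed by
// Flink, it is written into every checkpoint and restored automatically on recovery.
public class RunningSum extends RichFlatMapFunction<Long, Long> {

    private transient ValueState<Long> sum;

    @Override
    public void open(Configuration parameters) {
        sum = getRuntimeContext().getState(new ValueStateDescriptor<>("sum", Long.class));
    }

    @Override
    public void flatMap(Long value, Collector<Long> out) throws Exception {
        Long current = sum.value();              // null the first time this key is seen
        long updated = (current == null ? 0L : current) + value;
        sum.update(updated);                     // persisted in the next checkpoint
        out.collect(updated);
    }
}

It has to run on a keyed stream, e.g. values.keyBy(v -> v % 10).flatMap(new RunningSum()); nothing else is needed for the sum to survive a restart from a checkpoint.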
Apache Flink allows me to use state in a RichMapFunction. I am planning to build a continuously running job that analyses a stream of web events. Part of the processing will be the creation of a session context with session-scoped metrics (such as the position of the event within the session, the session duration, etc.) and additionally a user context.
A session context will timeout after 30 minutes, but a user context may exist for a year to handle returning users.
There will be millions of sessions and users so I would end up in millions of individual states. Every state is just a few KB in size.
Is this something that can be handled properly with the Flink states?
How does Flink actually clean up stale state?
Would it make sense to think about providing a custom backend to store the state in a KV cluster?
For large state I would recommend using Flink's RocksDBStateBackend. This state backend uses RocksDB to store state. Since RocksDB gracefully spills to disk, it is only limited by your available disk space. Thus, Flink should be able to handle your use case.
At the moment you need to register timers to clean up state. However, with the next Flink release, the community will add TTL-based cleanup for state. This will then automatically clean up your state once it has expired.
Keeping your state close to your computation with periodic checkpoints which are persisted will keep your application fast. If every state access went to a remote KV cluster, it would considerably slow down the processing.
I need the ability to remove keys from map state that are older than a fixed amount of time.
I currently keep the timestamp of each event in the keyed map state, and I'd like to have an asynchronous process that removes these stale keys.
I'm using RocksDB as the state backend, and I don't think the Java API of RocksDB supports opening with TTL, as noted here.
So my questions are:
Is it at all possible to have an async thread that has access to the MapState, given that it runs in an operator function?
Is there a better practice in this case?
Thanks in advance,
One straightforward approach for expiring state in Flink is to use a ProcessFunction operator to hold the state. You can then use a timer (either a processing time timer or an event time timer, depending on what makes sense for your application) and clear the state in the onTimer method.
As of Flink 1.6.0, the state TTL feature has been implemented. It allows you to explicitly define a TTL for records in the state backend. The catch is that removal of a key happens lazily, when the key is read. If the key is never accessed, it will stay there. This limitation will most likely be removed in a future version.
State Time-To-Live (TTL) Flink Documentation
State TTL for Apache Flink: How to Limit the Lifetime of State
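A minimal sketch of configuring TTL on a map state like the one in the question (the one-hour TTL and the descriptor name are illustrative; assumes Flink 1.6+):

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

public class TtlConfigExample {
    // Entries that have not been written for one hour are considered expired; expired
    // entries are removed lazily, when that key is read again (the caveat noted above).
    static MapStateDescriptor<String, Long> buildDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.hours(1))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        MapStateDescriptor<String, Long> lastSeen =
                new MapStateDescriptor<>("lastSeen", String.class, Long.class);
        lastSeen.enableTimeToLive(ttlConfig);
        return lastSeen;
    }
}

The descriptor is then used as usual, e.g. getRuntimeContext().getMapState(buildDescriptor()) inside open().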
Is it possible to trigger checkpoint from Flink streaming job?
My use case is this: I have two streams, R and S, to join with tumbling time windows. The source is Kafka. I use event-time processing and a BoundedOutOfOrdernessGenerator to make sure events from the two streams end up in the same window.
The problem is that my state is large and a regular periodic checkpoint sometimes takes too much time. At first, I wanted to disable checkpointing and rely on the Kafka offsets. But out-of-orderness means I already have some data in future windows beyond the current offset, so I need checkpointing.
If it were possible to trigger checkpoints after a window is cleaned up, instead of periodically, it would be more efficient. Maybe in the evictAfter method.
Does that make sense, and is it possible? If not, I'd appreciate a workaround.
It seems the issue here is checkpoint efficiency. Consider using the RocksDB state backend with incremental checkpoints, discussed in the docs under Debugging and Tuning Checkpoints and Large State.
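For example, a sketch of enabling that when building the job (the checkpoint URI and interval are placeholders; newer Flink versions use EmbeddedRocksDBStateBackend instead):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The second constructor argument enables incremental checkpoints: each checkpoint
        // uploads only the RocksDB files created since the previous one, not the whole state.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        env.enableCheckpointing(300_000); // e.g. every 5 minutes

        // ... define the windowed join and call env.execute(...) ...
    }
}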