What is the difference between an incremental checkpoint and the changelog state backend in Flink? - apache-flink

I am interested in processing large state using Flink.
There are several mechanisms for handling it, such as incremental checkpoints.
I understand the concept from the Flink documentation.
I also found that there is a changelog state backend, which was introduced in Flink 1.16.
I think the changelog state backend can reduce the time taken by a snapshot by capturing the difference between the current state and the previous snapshot.
But I am a little confused and cannot fully understand the difference between incremental checkpoints and the changelog state backend.
I want to use both methods to process large state in Flink, but I also want to understand their underlying nature. Although I read the articles linked from the Flink documentation, I still cannot fully differentiate between incremental checkpoints and the changelog state backend.
Are they almost the same, except that incremental checkpoints focus on the checkpointing mechanism while the changelog state backend focuses on the snapshot itself?
Any comments will be appreciated.
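Not from the docs, but a toy model may help frame the question. An incremental (RocksDB) checkpoint uploads only the files that changed since the previous checkpoint, while the changelog state backend continuously appends every state mutation to a durable log, so the checkpoint itself only has to flush a small tail of that log. A minimal sketch, with illustrative names and data (this is not Flink code):

```python
# Toy model contrasting three snapshot strategies for a key/value state.
# All names and values here are illustrative, not Flink APIs.

state = {}          # the live keyed state
changelog = []      # changelog backend: every mutation is appended as it happens
last_snapshot = {}  # what the previous checkpoint captured

def update(key, value):
    state[key] = value
    changelog.append((key, value))   # changelog backend pays a tiny cost per write

def full_checkpoint():
    """Classic snapshot: copy the entire state every time."""
    return dict(state)

def incremental_checkpoint():
    """Upload only entries that changed since the last checkpoint."""
    global last_snapshot
    delta = {k: v for k, v in state.items() if last_snapshot.get(k) != v}
    last_snapshot = dict(state)
    return delta

def changelog_checkpoint():
    """Checkpoint = flush the already-logged tail of mutations; near-instant."""
    tail = list(changelog)
    changelog.clear()
    return tail

update("a", 1); update("b", 2)
incremental_checkpoint()          # first increment captures everything
changelog_checkpoint()            # log tail flushed
update("a", 10)
print(full_checkpoint())          # {'a': 10, 'b': 2} -- whole state again
print(incremental_checkpoint())   # {'a': 10}         -- only the changed entry
print(changelog_checkpoint())     # [('a', 10)]       -- only the logged mutation
```

The practical difference this tries to show: incremental checkpointing reduces how much is uploaded *at checkpoint time*, while the changelog backend moves the upload work out of the checkpoint entirely, which is why it shortens checkpoint duration.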

Related

Apache Flink Checkpointing (Manually put a value into RocksDB Checkpoint and retrieve during recovery or Restart)

We have a scenario where we have to persist some value into the checkpoint and retrieve it during failure recovery or an application restart.
We tried a few things like ValueState with a ValueStateDescriptor, but it is still not working.
https://github.com/realtime-storage-engine/flink-spillable-statebackend/blob/master/flink-spillable-benchmark/src/main/java/org/apache/flink/spillable/benchmark/WordCount.java
https://towardsdatascience.com/heres-how-flink-stores-your-state-7b37fbb60e1a
https://github.com/king/flink-state-cache/blob/master/examples/src/main/java/com/king/flink/state/Example.java
We can't externalize it to a DB, as that may cause performance issues.
Any lead on doing this with checkpoints would be helpful. How do we put a value in and get it back from a checkpoint?
All of your managed application state is automatically written into Flink checkpoints (and savepoints). This includes:
keyed state (ValueState, ListState, MapState, etc.)
operator state (ListState, BroadcastState, etc.)
timers
This state is automatically restored during recovery, and can optionally be restored during manual restarts.
The Flink Operations Playground shows how to work with checkpoints and savepoints, and lets you observe their behavior during failure/recovery and restarts/rescaling.
If you want to read from a checkpoint yourself, that's what the State Processor API is for. Here's an example.
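The point of the answer is that you never put values into the checkpoint yourself: you keep them in managed state, and the runtime snapshots and restores that state around your function. A toy sketch of that division of labor (illustrative names only, not Flink code):

```python
# Toy model of how managed state rides along with checkpoints: the user
# function only reads/writes its state; snapshotting and restoring are the
# runtime's job. All class and method names here are illustrative.

class ValueStateCounter:
    """Stands in for a KeyedProcessFunction holding a ValueState counter."""
    def __init__(self):
        self.count = 0            # "managed state"

    def process(self, _event):
        self.count += 1
        return self.count

class Runtime:
    """Stands in for the Flink runtime's checkpoint/recovery machinery."""
    def __init__(self, fn):
        self.fn = fn
        self.checkpoint = None

    def take_checkpoint(self):            # periodic and automatic in Flink
        self.checkpoint = self.fn.count

    def recover(self):                    # after a failure or restart
        self.fn = ValueStateCounter()
        self.fn.count = self.checkpoint   # state restored, not re-inserted by you

rt = Runtime(ValueStateCounter())
for e in range(3):
    rt.fn.process(e)
rt.take_checkpoint()          # count == 3 is captured
rt.fn.process("late event")   # count == 4, then the job "crashes"
rt.recover()
print(rt.fn.count)            # 3 -- rolled back to the checkpointed value
```

In real Flink the equivalent of `self.count` would be a ValueState obtained from a ValueStateDescriptor in `open()`, and the snapshot/restore steps happen without any code on your side.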

Flink - Lazy start with operators working during savepoint startup

I am using Apache Flink with the RocksDBStateBackend and am running into some trouble when the job is restarted from a savepoint.
Apparently, it takes some time for the state to be ready again, but even though the state isn't ready yet, the DataStreams from Kafka seem to be moving data around, which causes some invalid misses, as the state isn't ready for my KeyedProcessFunction.
Is it the expected behavior? I couldn't find anything in the documentation, and apparently, no related configuration.
The ideal for us would be to have the state fully ready to be queried before any data is moved.
For example, this shows that during a deployment, the estimate_num_keys metric was slowly increasing.
However, if we look at an application counter from an operator, the operators were already working during that "warm-up" phase.
I found some discussion here (Apache flink: Lazy load from save point for RocksDB backend) where it was suggested to use externalized checkpoints.
I will look into it, but currently, our state isn't too big (~150 GB), so I am not sure if that is the only path to try.
Starting a Flink job that uses RocksDB from a savepoint is an expensive operation, as all of the state must first be loaded from the savepoint into new RocksDB instances. On the other hand, if you use a retained, incremental checkpoint, then the SST files in that checkpoint can be used directly by RocksDB, leading to much faster start-up times.
But, while it's normal for starting from a savepoint to be expensive, this shouldn't lead to any errors or dropped data.
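Restarting from a retained incremental checkpoint rather than a savepoint is mostly a matter of configuration. A minimal flink-conf.yaml sketch, assuming a recent Flink release (key names should be checked against your version's documentation; the checkpoint directory is an illustrative placeholder):

```yaml
# Retain checkpoints after cancellation so they can be used for restarts.
# Paths and values here are illustrative, not recommendations.
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: hdfs:///flink/checkpoints   # placeholder location
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
```

The job can then be resubmitted pointing at the retained checkpoint path (the same `-s` flag used for savepoints accepts a checkpoint path), letting RocksDB reuse the SST files instead of rebuilding its state.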

Flink keyed state clean up for incremental rocksdb checkpointing

We have a Flink job that persists large keyed state in the RocksDB backend. We are using the incremental checkpointing strategy. As time goes by, the size of the state becomes a problem. We have looked at the state TTL feature, but it does not support incremental RocksDB state.
What would be the best approach for this problem if I really need incremental checkpoints?
One approach that is often used is to manipulate the state in some kind of ProcessFunction, and use a timer to clear the state when it is no longer needed -- e.g., if it hasn't been accessed for several hours. ProcessFunctions are able to have both event-time and processing-time timers, so you can choose whichever is more appropriate for your use case.
See the expiring state exercise on the Flink training site for an example of using timers to clear state.
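The timer pattern described above can be sketched as a toy model: on every access, (re)register a cleanup timer for that key; when a timer fires, clear the key's state only if no later access has superseded it. Times are plain integers and all names are illustrative (this is not Flink code):

```python
# Toy model of timer-based state expiry. In Flink, on_event would live in a
# KeyedProcessFunction's processElement and on_timer in its onTimer callback.

TTL = 100   # clear state not accessed for 100 time units (illustrative)

state = {}   # key -> value
timers = {}  # key -> time at which that key's state should be cleared

def on_event(key, value, now):
    state[key] = value
    timers[key] = now + TTL          # push the cleanup timer forward

def on_timer(key, now):
    if timers.get(key) == now:       # fire only if not superseded by a later access
        del state[key]
        del timers[key]

on_event("a", 1, now=0)
on_event("b", 2, now=50)
on_event("a", 3, now=90)             # "a" touched again; its timer moves to 190
on_timer("b", now=150)               # "b" idle for 100 units -> cleared
on_timer("a", now=100)               # stale timer for "a" -> ignored
print(sorted(state))                 # ['a']
```

In real Flink you would use `ctx.timerService().registerEventTimeTimer(...)` (or the processing-time variant) and clear the state in `onTimer`; the superseded-timer check above mirrors the common trick of storing the expected firing time in state.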

Flink map state size and number of keys

I was wondering if there is a way to retrieve the total state size stored in the state backend.
I am currently using Flink 1.3 on EMR with the RocksDB backend, asynchronous checkpointing, and incremental checkpoints.
The Flink dashboard displays the state size under "Checkpoints", but I assume that because I'm using incremental checkpoints, the checkpoint history page shows a fluctuating state size.
The only way I currently use, and I'm not sure it is a good fit, is running "ls" on the HDFS checkpoint location.
I assume there is a better way, and I would appreciate your help.
Currently, one of the things that may help you with your problem is Queryable State, which allows you to query the state of an operator. It is not available through the web UI, though; you need to create a separate Queryable State client for this. More info can be found here.

Manual checkpoint from Flink stream

Is it possible to trigger checkpoint from Flink streaming job?
My use case is this: I have two streams, R and S, to join with tumbling time windows. The source is Kafka. I use event-time processing and a BoundedOutOfOrdernessGenerator to make sure events from the two streams end up in the same window.
The problem is that my state is large, and a regular periodic checkpoint sometimes takes too much time. At first, I wanted to disable checkpointing and rely on Kafka offsets. But out-of-orderness means I already have some data in future windows beyond the current offset. So I need checkpointing.
If it were possible to trigger checkpoints after a window is cleaned, instead of periodically, it would be more efficient, perhaps in the evictAfter method.
Does that make sense, and is it possible? If not, I'd appreciate a workaround.
The issue here seems to be checkpoint efficiency. Consider using the RocksDB state backend with incremental checkpoints, discussed in the docs under "Debugging and Tuning Checkpoints and Large State".
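That suggestion is configuration rather than code. A sketch of the relevant flink-conf.yaml entries, assuming a recent Flink release (names per the Flink configuration docs; the interval is an arbitrary example):

```yaml
# Incremental checkpointing: only new or changed SST files are uploaded
# at each checkpoint, so large-but-slowly-changing state checkpoints quickly.
state.backend: rocksdb
state.backend.incremental: true
execution.checkpointing.interval: 60s    # illustrative interval, tune for your job
```

With incremental checkpoints, periodic checkpointing of large state is usually cheap enough that manually triggering checkpoints per window is unnecessary.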
