Savepoint in Apache Flink with Large State - apache-flink

I want to keep about 2 TB of state in Flink using the RocksDB state backend. I will use incremental checkpoints, which should reduce checkpoint time dramatically.
But I sometimes have to change the job, e.g. rescaling, bug fixes, adding new filters/mappings, adding new sources/sinks, etc.
All of these can affect the job topology. When a change affects the state itself, I can bootstrap the state again. But at other times, re-bootstrapping is hard to justify because it wastes time.
In these cases, I have to take a savepoint to restart my job. I also take savepoints periodically (e.g. every 15 minutes) while the job is running, so that I can restart from the latest savepoint when the job fails. But taking a savepoint will take too long with such large state. MTTR (mean time to recovery) is very important for me. How can I improve savepoint performance?

You can use retained checkpoints for redeployments that don't change the topology, don't require a state migration, and don't upgrade the Flink version (e.g., rescaling, or simple code changes that don't affect state); otherwise you should use a savepoint. And with large state, that can take quite a while (and I don't have any ideas for how to speed it up).
Rather than trying to improve savepoint performance, you might consider whether some sort of blue/green deployment strategy could work for you. For example, see Zero-downtime upgrades of Flink applications.
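For the retained-checkpoint route, a minimal flink-conf.yaml sketch might look like the following. The key names are the ones used in Flink 1.x releases (some were renamed in later versions, so check the configuration reference for your release), and the bucket path and interval are placeholders, not values from the question.

```yaml
# Sketch only: key names as in Flink 1.x; path and interval are hypothetical.
state.backend: rocksdb
# Upload only the SST files created since the previous checkpoint.
state.backend.incremental: true
# Hypothetical durable location for checkpoint data.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
execution.checkpointing.interval: 1min
# Keep the latest checkpoint when the job is cancelled, so a redeploy can start from it.
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
```

With a setup along these lines, periodic savepoints are only needed for the cases listed above (topology changes, state migration, Flink upgrades), not for routine failure recovery.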

Related

What is the difference between incremental checkpoint and change log statebackend in Flink?

I am interested in processing large state using Flink.
There are several ways to handle large state, such as incremental checkpoints.
I understand that concept from the Flink documentation.
I also found that there is a changelog state backend, which was introduced in Flink 1.16.
I think the changelog state backend can reduce the time it takes to snapshot state by capturing the difference between the current snapshot and the previous one.
But I am a little confused and cannot fully understand the difference between incremental checkpoints and the changelog state backend.
I want to use both methods to process large state in Flink, but I want to understand how they fundamentally differ. Although I read the articles linked from the Flink documentation, I still cannot fully differentiate between the two.
Is it almost the same, except that incremental checkpoints focus on the checkpoint mechanism while the changelog state backend focuses on snapshots?
Any comments will be appreciated.

Apache Flink Checkpointing (Manually put a value into RocksDB Checkpoint and retrieve during recovery or Restart)

We have a scenario where we have to persist/save some value into the checkpoint and retrieve it back during failure recovery/application restart.
We tried a few things, such as ValueState and ValueStateDescriptor, but it is still not working.
https://github.com/realtime-storage-engine/flink-spillable-statebackend/blob/master/flink-spillable-benchmark/src/main/java/org/apache/flink/spillable/benchmark/WordCount.java
https://towardsdatascience.com/heres-how-flink-stores-your-state-7b37fbb60e1a
https://github.com/king/flink-state-cache/blob/master/examples/src/main/java/com/king/flink/state/Example.java
We can't externalize it to a DB as it may cause some performance issues.
Any lead on how to do this with checkpoints would be helpful. How do we put a value into a checkpoint and get it back?
All of your managed application state is automatically written into Flink checkpoints (and savepoints). This includes:
keyed state (ValueState, ListState, MapState, etc)
operator state (ListState, BroadcastState, etc)
timers
This state is automatically restored during recovery, and can optionally be restored during manual restarts.
The Flink Operations Playground shows how to work with checkpoints and savepoints, and lets you observe their behavior during failure/recovery and restarts/rescaling.
If you want to read from a checkpoint yourself, that's what the State Processor API is for. Here's an example.

Which Flink deleteMode to use in production?

Currently we're using the Savepoint deleteMode, which writes a savepoint on shutdown. Unfortunately sometimes Flink crash loops, i.e. when restarting it isn't able to write the savepoint so it repeatedly tries to restart. In this case we manually change the deleteMode to None and restart the application. Are savepoints recommended, or are checkpoints sufficient for Flink to self-recover? I don't think we've ever manually recovered from a savepoint.
If you arrange for the checkpoints to be retained, as in
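import org.apache.flink.streaming.api.environment.CheckpointConfig;
// Retain the latest checkpoint when the job is cancelled, so it can be used for a restart.
CheckpointConfig config = env.getCheckpointConfig();
config.enableExternalizedCheckpoints(
    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);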
then you can rely on them for restarts and rescaling. But for re-deployments that require state migration or a topology change, or if you are doing a Flink version upgrade, then savepoints are recommended.
The operational capabilities and guarantees of both checkpoints and savepoints are covered in more detail in the Flink documentation.

Flink - Lazy start with operators working during savepoint startup

I am using Apache Flink with RocksDBStateBackend and having some trouble when the job is restarted from a savepoint.
Apparently, it takes some time for the state to be ready again, but even though the state isn't ready yet, the DataStreams from Kafka seem to be moving data around, which causes spurious state misses in my KeyedProcessFunction.
Is this the expected behavior? I couldn't find anything in the documentation, and apparently there is no related configuration.
Ideally, the state would be fully ready to be queried before any data is moved.
For example, this shows that during a deployment, the estimate_num_keys metric was slowly increasing.
However, if we look at an application counter from an operator, they were working during that "warm-up phase".
I found some discussion in "Apache flink: Lazy load from save point for RocksDB backend", where it was suggested to use Externalized Checkpoints.
I will look into it, but currently, our state isn't too big (~150 GB), so I am not sure if that is the only path to try.
Starting a Flink job that uses RocksDB from a savepoint is an expensive operation, as all of the state must first be loaded from the savepoint into new RocksDB instances. On the other hand, if you use a retained, incremental checkpoint, then the SST files in that checkpoint can be used directly by RocksDB, leading to much faster start-up times.
But, while it's normal for starting from a savepoint to be expensive, this shouldn't lead to any errors or dropped data.
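As a rough sketch of the retained-checkpoint route, the restore path can point at a retained checkpoint directory instead of a savepoint. The option below is the configuration counterpart of the -s flag of flink run, with the key name as used in Flink 1.x; the path is a made-up placeholder, and chk-42 stands for whichever checkpoint was retained.

```yaml
# Sketch only: hypothetical path. Point it at the retained checkpoint's directory,
# i.e. the chk-<n> directory that contains the _metadata file.
execution.savepoint.path: s3://my-bucket/flink/checkpoints/<job-id>/chk-42
```

This assumes the job was already running with incremental, retained checkpoints before the restart.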

Difference between savepoint and checkpoint in Flink

I know there are similar questions on Stack Overflow, but after investigating several of them, I know that:
savepoints are triggered manually, while checkpoints are triggered automatically
they use different storage formats
But these are not the confusing points. I have no idea when to use one and when to use the other.
Consider the following two scenarios:
If I need to shut down or restart the whole application for some reason (e.g. a bug fix or an unexpected crash), will I have to use a savepoint to restore the whole application?
I thought that checkpoints are only used internally in Flink for fault tolerance while the application is running; that is, the application itself keeps running, but tasks or other things may fail, and Flink then uses a checkpoint for state recovery. Is that right?
There are also externalized checkpoints. I think they are the same as savepoints in functionality; that is, can an externalized checkpoint also be used to recover a restarted application?
Does Flink use checkpoint for state recovery?
Basically you're right. As you said, checkpoints are usually used internally in Flink for fault tolerance, and they are more of a concept inside the framework. When your application fails, it will try to restart from the latest checkpoint. That's how checkpoints work in Flink, without any manual intervention.
Should I use savepoint to restore the whole application for bug fix?
Yes. In these cases, you don't want to restore from a checkpoint, because the latest checkpoint may have been taken several minutes ago. Instead, you'd like to snapshot the current state of the whole application and restart it from that savepoint, which may be the quickest way to restore the application without too much delay.
Externalized checkpoint.
It's still a checkpoint, but it is persisted externally based on your configuration. It can be used to restore the application, but the restored state may be somewhat stale, because there is an interval between checkpoints.
For more information, take a look at this blog article: https://data-artisans.com/blog/differences-between-savepoints-and-checkpoints-in-flink.

Resources