Which Flink deleteMode to use in production? - apache-flink

Currently we're using the Savepoint deleteMode, which writes a savepoint on shutdown. Unfortunately sometimes Flink crash loops, i.e. when restarting it isn't able to write the savepoint so it repeatedly tries to restart. In this case we manually change the deleteMode to None and restart the application. Are savepoints recommended, or are checkpoints sufficient for Flink to self-recover? I don't think we've ever manually recovered from a savepoint.

If you arrange for the checkpoints to be retained, as in
// Keep the checkpoint when the job is cancelled instead of deleting it.
CheckpointConfig config = env.getCheckpointConfig();
config.enableExternalizedCheckpoints(
    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
then you can rely on them for restarts and rescaling. But for re-deployments that require state migration or a topology change, or if you are doing a Flink version upgrade, then savepoints are recommended.
The operational capabilities and guarantees of both checkpoints and savepoints are covered in more detail in the Flink documentation.
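The rule of thumb above can be written down as a small decision helper. This is only a sketch in plain Java: the class, the enum names and the recommend method are made up for illustration and are not part of any Flink API. It simply encodes "restart or rescale: a retained checkpoint is enough; state migration, topology change or Flink upgrade: take a savepoint".

```java
public class SnapshotChoice {
    /** Kinds of change a redeployment might involve (hypothetical names). */
    enum Change { RESTART, RESCALE, STATE_MIGRATION, TOPOLOGY_CHANGE, FLINK_UPGRADE }

    /** Which kind of snapshot to restore from. */
    enum Snapshot { RETAINED_CHECKPOINT, SAVEPOINT }

    /** Recommends a savepoint as soon as any change goes beyond a restart or rescale. */
    static Snapshot recommend(Change... changes) {
        for (Change c : changes) {
            if (c == Change.STATE_MIGRATION
                    || c == Change.TOPOLOGY_CHANGE
                    || c == Change.FLINK_UPGRADE) {
                return Snapshot.SAVEPOINT;
            }
        }
        return Snapshot.RETAINED_CHECKPOINT;
    }

    public static void main(String[] args) {
        // Plain restart plus rescaling: a retained checkpoint is sufficient.
        System.out.println(recommend(Change.RESTART, Change.RESCALE));
        // Rescaling combined with a Flink upgrade: a savepoint is recommended.
        System.out.println(recommend(Change.RESCALE, Change.FLINK_UPGRADE));
    }
}
```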

Related

Does Flink RocksDB statebackend help restoring state?

I'm considering using RocksDB as a statebackend of flink job which has state size up to 1TB.
My environment
checkpoint dir: hdfs
flink job submit: yarn-per-job (per-job mode on yarn cluster)
If the job fails, the retry attempts exceed the maximum retry count, and the job dies completely (or the job is cancelled), I think the checkpoint and the RocksDB files will be deleted (because I'm deploying the job in per-job mode, so the task manager would also terminate).
Here, I think I lose all state and have no way to restore it, but I expected that using RocksDB would help to restore the state because it is a disk-based state backend. If not, what is the advantage of using the RocksDB state backend?
Would retaining the checkpoint on cancellation and restarting the job from the checkpoint (or savepoint) help in this case?
Thank you
I would recommend checking out https://nightlies.apache.org/flink/flink-docs-master/docs/ops/production_ready/ for an overview of steps to consider before putting a Flink application in production. Choosing the right state backend is one of them.
What is important for state recovery is that you enable the snapshotting mechanism. That can be either checkpoints or savepoints, which you use with the configured state backend (like RocksDB). When configured properly, your state will be snapshotted to a durable storage, so you can recover from it in case of failures. RocksDB is commonly used for large state sizes, which can't fit into memory anymore.

Apache Flink Checkpointing (Manually put a value into RocksDB Checkpoint and retrieve during recovery or Restart)

We have a scenario where we have to persist/save some value into the checkpoint and retrieve it back during failure recovery/application restart.
We tried a few things such as ValueState and ValueStateDescriptor, but it is still not working.
https://github.com/realtime-storage-engine/flink-spillable-statebackend/blob/master/flink-spillable-benchmark/src/main/java/org/apache/flink/spillable/benchmark/WordCount.java
https://towardsdatascience.com/heres-how-flink-stores-your-state-7b37fbb60e1a
https://github.com/king/flink-state-cache/blob/master/examples/src/main/java/com/king/flink/state/Example.java
We can't externalize it to a DB as it may cause some performance issues.
Any lead on how to do this using checkpoints would be helpful. How do we put a value into a checkpoint and get it back?
All of your managed application state is automatically written into Flink checkpoints (and savepoints). This includes
keyed state (ValueState, ListState, MapState, etc)
operator state (ListState, BroadcastState, etc)
timers
This state is automatically restored during recovery, and can optionally be restored during manual restarts.
The Flink Operations Playground shows how to work with checkpoints and savepoints, and lets you observe their behavior during failure/recovery and restarts/rescaling.
If you want to read from a checkpoint yourself, that's what the State Processor API is for. Here's an example.
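To make the "automatically written, automatically restored" behaviour concrete, here is a toy model in plain Java. It is not Flink code: ToyKeyedState and its methods are hypothetical stand-ins. The value and update methods loosely mirror what user code does with a ValueState, while checkpoint and failAndRecover stand in for what the framework does on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyKeyedState {
    // Working state that user code reads and writes; lost when the job fails.
    private Map<String, Long> live = new HashMap<>();
    // Copy taken at the last checkpoint; stands in for durable storage.
    private Map<String, Long> lastCheckpoint = new HashMap<>();

    /** User-facing read, returns null if nothing is stored for the key. */
    Long value(String key) { return live.get(key); }

    /** User-facing write. */
    void update(String key, long v) { live.put(key, v); }

    /** Framework side: snapshot the whole working state. */
    void checkpoint() { lastCheckpoint = new HashMap<>(live); }

    /** Framework side: discard the working state and reload the last snapshot. */
    void failAndRecover() { live = new HashMap<>(lastCheckpoint); }

    public static void main(String[] args) {
        ToyKeyedState state = new ToyKeyedState();
        state.update("sensor-1", 42L);
        state.checkpoint();
        state.update("sensor-1", 43L);   // written after the checkpoint
        state.failAndRecover();
        System.out.println(state.value("sensor-1")); // prints 42
    }
}
```

Note that the user-facing part never touches the snapshot. In the same way, a Flink job only reads and updates its state objects, and there is no separate "put into checkpoint" call.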

Savepoint in Apache Flink with Large State

I want to keep state about 2TB in Flink using the Rocksdb state backend. I will use the incremental checkpoint, thus it will reduce the checkpoint time dramatically.
But I have to change the code sometimes, e.g. for re-scaling, bug fixes, adding new filters/mappings, adding new sources/sinks, etc.
All of these can affect the job topology. I can bootstrap the state again whenever the state changes, but at other times bootstrapping the state is difficult because it wastes time.
In these cases, I have to take a savepoint to restart my job. I also take savepoints periodically (e.g. every 15 minutes) while the job is running, so that I can restart from the latest savepoint when the job fails. But taking a savepoint takes too long because of the large state. MTTR (mean time to recovery) is very important to me. How can I improve savepoint performance?
You can use retained checkpoints for redeployments that neither change the topology, nor require a state migration, nor upgrade the Flink version (e.g., rescaling, or simple code changes that don't affect state). Otherwise you should use a savepoint. With large state, that can take quite a while (and I don't have any ideas for how to speed it up).
Rather than trying to improve savepoint performance, you might consider whether some sort of blue/green deployment strategy could work for you. For example, see Zero-downtime upgrades of Flink applications.

Difference between savepoint and checkpoint in Flink

I know there are similar questions on Stack Overflow, but after investigating several of them, I know that:
a savepoint is triggered manually, while a checkpoint is triggered automatically
they use different storage formats
But these are not the confusing points. I have no idea when to use one and when to use the other.
Consider the following two scenarios:
If I need to shut down or restart the whole application for some reason (e.g. a bug fix or an unexpected crash), will I have to use a savepoint to restore the whole application?
I thought that checkpoints are only used internally in Flink for fault tolerance while the application is running. That is, the application itself keeps running, but tasks or other things may fail, and Flink will use a checkpoint for state recovery?
There is also the externalized checkpoint. I think it is the same as a savepoint in functionality, that is, an externalized checkpoint can also be used to recover a restarted application?
Does Flink use checkpoint for state recovery?
Basically you're right. As you said, the checkpoint is usually used internally in Flink for fault tolerance, and it's more like a concept inside the framework. When your application fails, the program will try to restart from the latest checkpoint. That's how checkpoints work in Flink, without any manual intervention.
Should I use savepoint to restore the whole application for bug fix?
Yes. In these cases, you don't want to restore from a checkpoint, because the latest checkpoint may have been taken several minutes ago. Instead, you'd like to snapshot the current state of the whole application and restart it from that savepoint, which may be the quickest way to restore the application without too much delay.
Externalized checkpoint.
It's still a checkpoint, but it will be persisted externally based on your configuration. It can be used to restore the application, but the state is not as up to date because there is an interval between checkpoints.
For more information, take a look at this blog article: https://data-artisans.com/blog/differences-between-savepoints-and-checkpoints-in-flink.
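The point about the checkpoint interval can be illustrated with a little arithmetic. The sketch below is a toy model, not Flink code: it assumes checkpoints complete instantly at exact multiples of the interval, and the class and method names are made up. It computes how much processing time has to be replayed when restoring from the last checkpoint, whereas a savepoint taken at the moment of the planned stop leaves no such gap.

```java
public class ReplayWindow {
    /**
     * Time of the latest checkpoint at or before stopTimeMs, assuming
     * checkpoints complete at exact multiples of intervalMs.
     */
    static long lastCheckpointBefore(long stopTimeMs, long intervalMs) {
        return (stopTimeMs / intervalMs) * intervalMs;
    }

    /** Processing time that must be replayed when restoring from that checkpoint. */
    static long replayMs(long stopTimeMs, long intervalMs) {
        return stopTimeMs - lastCheckpointBefore(stopTimeMs, intervalMs);
    }

    public static void main(String[] args) {
        long intervalMs = 60_000;  // assumed checkpoint interval: 60 s
        long stopTimeMs = 150_000; // job stops 150 s after start
        // Last checkpoint was at 120 s, so 30 s of work is replayed.
        System.out.println(replayMs(stopTimeMs, intervalMs)); // prints 30000
    }
}
```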

Apache Flink - Difference between Checkpoints & Save points?

Can someone please help me understand the difference between Apache Flink's Checkpoints & Savepoints.
While I read the documentation, I couldn't understand the difference! :s
Apache Flink's Checkpoints and Savepoints are similar in that they are both mechanisms for preserving the internal state of Flink applications.
Checkpoints are taken automatically and are used to restart a job automatically in case of a failure.
Savepoints, on the other hand, are taken manually, are always stored externally, and are used for starting a "new" job with the previous internal state, e.g. in case of
bug fixing
flink version upgrade
A/B testing, etc.
Underneath they are in fact the same mechanism/code path with some subtle nuances.
Edit:
You can also find a very good explanation in the official documentation https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint :
A Savepoint is a consistent image of the execution state of a streaming job, created via Flink’s checkpointing mechanism. You can use Savepoints to stop-and-resume, fork, or update your Flink jobs. Savepoints consist of two parts: a directory with (typically large) binary files on stable storage (e.g. HDFS, S3, …) and a (relatively small) meta data file. The files on stable storage represent the net data of the job’s execution state image. The meta data file of a Savepoint contains (primarily) pointers to all files on stable storage that are part of the Savepoint, in form of absolute paths.
Attention: In order to allow upgrades between programs and Flink versions, it is important to check out the following section about assigning IDs to your operators.
Conceptually, Flink’s Savepoints are different from Checkpoints in a similar way that backups are different from recovery logs in traditional database systems. The primary purpose of Checkpoints is to provide a recovery mechanism in case of unexpected job failures. A Checkpoint’s lifecycle is managed by Flink, i.e. a Checkpoint is created, owned, and released by Flink - without user interaction. As a method of recovery and being periodically triggered, two main design goals for the Checkpoint implementation are i) being as lightweight to create and ii) being as fast to restore from as possible. Optimizations towards those goals can exploit certain properties, e.g. that the job code doesn’t change between the execution attempts. Checkpoints are usually dropped after the job was terminated by the user (except if explicitly configured as retained Checkpoints).
In contrast to all this, Savepoints are created, owned, and deleted by the user. Their use-case is for planned, manual backup and resume. For example, this could be an update of your Flink version, changing your job graph, changing parallelism, forking a second job like for a red/blue deployment, and so on. Of course, Savepoints must survive job termination. Conceptually, Savepoints can be a bit more expensive to produce and restore and focus more on portability and support for the previously mentioned changes to the job.
Those conceptual differences aside, the current implementations of Checkpoints and Savepoints are basically using the same code and produce the same format. However, there is currently one exception from this, and we might introduce more differences in the future. The exception are incremental checkpoints with the RocksDB state backend. They are using some RocksDB internal format instead of Flink’s native savepoint format. This makes them the first instance of a more lightweight checkpointing mechanism, compared to Savepoints.
Savepoints
Savepoints usually apply to an individual transaction; a savepoint marks a point to which the transaction can be rolled back, so subsequent changes can be undone if necessary.
More See Here
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cli.html#savepoints
Checkpoints
Checkpoints usually apply to whole systems. You can configure periodic checkpoints to be persisted externally. Externalized checkpoints write their meta data out to persistent storage and are not automatically cleaned up when the job fails.
More See Here:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/checkpoints.html
One difference I would like to add is that a savepoint can be applied manually when we upgrade the pipeline, whereas a checkpoint kicks in when the pipeline restarts or crashes abruptly. However, there can be side effects later on, where the application (pipeline) has to handle scenarios like re-processing duplicate data.

Resources