Difference between savepoint and checkpoint in Flink - apache-flink

I know there are similar questions on Stack Overflow, but after reading several of them, I know that:
a savepoint is triggered manually, while a checkpoint is triggered automatically
they use different storage formats
But these are not the confusing points; I still have no idea when to use one and when to use the other.
Consider the following two scenarios:
If I need to shut down or restart the whole application for some reason (e.g., a bug fix or an unexpected crash), will I have to use a savepoint to restore the whole application?
I thought checkpoints are only used internally by Flink for fault tolerance while the application is running, that is, the application itself keeps running but tasks or other things may fail, and Flink uses the checkpoint for state recovery?
There are also externalized checkpoints. I think they are functionally the same as savepoints, that is, an externalized checkpoint can also be used to recover a restarted application?

Does Flink use checkpoint for state recovery?
Basically you're right. As you said, checkpoints are normally used internally by Flink for fault tolerance, and they are more of a concept inside the framework. When your application fails, Flink will try to restart it from the latest checkpoint. That's how checkpointing works in Flink, without any manual intervention.
Should I use savepoint to restore the whole application for bug fix?
Yes. In these cases, you don't want to restore from a checkpoint, because the latest checkpoint may have been taken several minutes ago. Instead, you'd like to snapshot the current state of the whole application and restart it from that savepoint, which may be the quickest way to restore the application without too much delay.
Externalized checkpoint.
It's still a checkpoint, but it is persisted externally based on your configuration. It can be used to restore the application, but the state is not quite as current because there is an interval between checkpoints.
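For reference, here is a minimal sketch of turning on periodic checkpointing; the 60-second interval is just an example value, and the gap between two completed checkpoints is the interval mentioned above:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointIntervalSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Take a checkpoint every 60 seconds. On failure, Flink restores the latest
            // completed checkpoint, so at most roughly one interval of input is replayed.
            env.enableCheckpointing(60_000);

            // ... build the job topology here, then:
            // env.execute("my-job");
        }
    }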
For more information, take a look at this blog article: https://data-artisans.com/blog/differences-between-savepoints-and-checkpoints-in-flink.

Related

Apache Flink Checkpointing (Manually put a value into RocksDB Checkpoint and retrieve during recovery or Restart)

We have a scenario where we have to persist/save some value into the checkpoint and retrieve it back during failure recovery/application restart.
We tried a few things like ValueState and ValueStateDescriptor, but it is still not working.
https://github.com/realtime-storage-engine/flink-spillable-statebackend/blob/master/flink-spillable-benchmark/src/main/java/org/apache/flink/spillable/benchmark/WordCount.java
https://towardsdatascience.com/heres-how-flink-stores-your-state-7b37fbb60e1a
https://github.com/king/flink-state-cache/blob/master/examples/src/main/java/com/king/flink/state/Example.java
We can't externalize it to a DB as it may cause some performance issues.
Any lead on doing this with checkpoints would be helpful. How do we put a value into a checkpoint and get it back?
All of your managed application state is automatically written into Flink checkpoints (and savepoints). This includes
keyed state (ValueState, ListState, MapState, etc)
operator state (ListState, BroadcastState, etc)
timers
This state is automatically restored during recovery, and can optionally be restored during manual restarts.
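As a rough illustration (the class and state names here are made up), keyed ValueState declared in a KeyedProcessFunction like this is included in every checkpoint and savepoint without any extra code:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Counts events per key; the ValueState below is snapshotted into every
    // checkpoint/savepoint and restored automatically on recovery or restart.
    public class CountingFunction extends KeyedProcessFunction<String, String, Long> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
            Long current = count.value();              // null for the first event of a key
            long updated = (current == null) ? 1L : current + 1L;
            count.update(updated);                     // this value ends up in the next checkpoint
            out.collect(updated);
        }
    }

There is nothing to put into a checkpoint explicitly; whatever the state holds when the checkpoint is taken is what gets restored later.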
The Flink Operations Playground shows how to work with checkpoints and savepoints, and lets you observe their behavior during failure/recovery and restarts/rescaling.
If you want to read from a checkpoint yourself, that's what the State Processor API is for. Here's an example.
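If you want to inspect such state outside of a running job, here is a rough sketch of the older DataSet-based State Processor API; the operator uid "counter", the state name "count", the path, and the state backend are assumptions, and newer Flink releases expose the same functionality through SavepointReader instead:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.Savepoint;
    import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
    import org.apache.flink.util.Collector;

    public class ReadCheckpointSketch {

        // One output record per key found in the snapshot.
        public static class KeyedCount {
            public String key;
            public Long count;
        }

        // Reads the "count" ValueState of the operator that was given the uid "counter".
        public static class CountReader extends KeyedStateReaderFunction<String, KeyedCount> {
            private transient ValueState<Long> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("count", Types.LONG));
            }

            @Override
            public void readKey(String key, Context ctx, Collector<KeyedCount> out) throws Exception {
                KeyedCount result = new KeyedCount();
                result.key = key;
                result.count = count.value();
                out.collect(result);
            }
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Point this at a retained checkpoint or savepoint, and pass the state backend
            // the snapshot was written with.
            ExistingSavepoint savepoint =
                    Savepoint.load(env, "hdfs:///flink/checkpoints/chk-42", new MemoryStateBackend());

            DataSet<KeyedCount> counts = savepoint.readKeyedState("counter", new CountReader());
            counts.print();
        }
    }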

Flink - Lazy start with operators working during savepoint startup

I am using Apache Flink with RocksDBStateBackend and running into some trouble when the job is restarted from a savepoint.
Apparently, it takes some time for the state to be ready again, but even before the state is ready, the DataStreams from Kafka seem to be moving data around, which causes some invalid misses in my KeyedProcessFunction because its state isn't available yet.
Is this the expected behavior? I couldn't find anything in the documentation and, apparently, no related configuration.
The ideal for us would be to have the state fully ready to be queried before any data is moved.
For example, during one deployment I could see the estimate_num_keys metric slowly increasing.
However, application counters from the operators show that they were already working during that "warm-up phase".
I found some discussion in "Apache flink: Lazy load from save point for RocksDB backend", where it was suggested to use externalized checkpoints.
I will look into it, but currently our state isn't too big (~150 GB), so I am not sure that is the only path to try.
Starting a Flink job that uses RocksDB from a savepoint is an expensive operation, as all of the state must first be loaded from the savepoint into new RocksDB instances. On the other hand, if you use a retained, incremental checkpoint, then the SST files in that checkpoint can be used directly by RocksDB, leading to much faster start-up times.
But, while it's normal for starting from a savepoint to be expensive, this shouldn't lead to any errors or dropped data.
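For context, incremental checkpointing is just a flag on the RocksDB state backend. Here is a minimal sketch (the checkpoint URI and interval are placeholders, and checkpoint retention still has to be enabled separately in the CheckpointConfig):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class IncrementalCheckpointSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // 'true' enables incremental checkpoints: only new or changed RocksDB SST files
            // are uploaded, and a retained checkpoint's SST files can be reused directly
            // on restart, which is what makes start-up faster than from a savepoint.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

            env.enableCheckpointing(60_000);
            // ... topology and env.execute() ...
        }
    }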

Savepoint in Apache Flink with Large State

I want to keep about 2 TB of state in Flink using the RocksDB state backend. I will use incremental checkpoints, which reduces the checkpointing time dramatically.
But I sometimes have to change the code, e.g., for re-scaling, bug fixes, adding new filters/mappings, adding new sources/sinks, etc.
All of these can affect the job topology. I can bootstrap the state again whenever the state changes, but at other times bootstrapping the state is difficult because it means wasted time for me.
In these cases, I have to take a savepoint to restart my job. I also take savepoints periodically while the job is running (e.g., every 15 minutes) so that I can restart from the latest savepoint if the job fails. But taking a savepoint takes too long because of the large state. MTTR (mean time to recovery) is very important for me. How can I improve savepoint performance?
You can use retained checkpoints for redeployments that don't change the topology, don't require a state migration, and don't upgrade the Flink version (e.g., rescaling, or simple code changes that don't affect state) -- but otherwise you should use a savepoint. And with large state, that can take quite a while (and I don't have any ideas for how to speed it up).
Rather than trying to improve savepoint performance, you might consider whether some sort of blue/green deployment strategy could work for you. For example, see Zero-downtime upgrades of Flink applications.

Apache Flink - Difference between Checkpoints & Save points?

Can someone please help me understand the difference between Apache Flink's Checkpoints & Savepoints?
I read the documentation but couldn't understand the difference! :s
Apache Flink's Checkpoints and Savepoints are similar in that they are both mechanisms for preserving the internal state of Flink applications.
Checkpoints are taken automatically and are used to automatically restart the job in case of a failure.
Savepoints, on the other hand, are taken manually, are always stored externally, and are used for starting a "new" job with the previous internal state in case of, e.g.:
bug fixing
flink version upgrade
A/B testing, etc.
Underneath they are in fact the same mechanism/code path with some subtle nuances.
Edit:
You can also find a very good explanation in the official documentation https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint :
A Savepoint is a consistent image of the execution state of a streaming job, created via Flink’s checkpointing mechanism. You can use Savepoints to stop-and-resume, fork, or update your Flink jobs. Savepoints consist of two parts: a directory with (typically large) binary files on stable storage (e.g. HDFS, S3, …) and a (relatively small) meta data file. The files on stable storage represent the net data of the job’s execution state image. The meta data file of a Savepoint contains (primarily) pointers to all files on stable storage that are part of the Savepoint, in form of absolute paths.
Attention: In order to allow upgrades between programs and Flink versions, it is important to check out the following section about assigning IDs to your operators.
Conceptually, Flink’s Savepoints are different from Checkpoints in a similar way that backups are different from recovery logs in traditional database systems. The primary purpose of Checkpoints is to provide a recovery mechanism in case of unexpected job failures. A Checkpoint’s lifecycle is managed by Flink, i.e. a Checkpoint is created, owned, and released by Flink - without user interaction. As a method of recovery and being periodically triggered, two main design goals for the Checkpoint implementation are i) being as lightweight to create and ii) being as fast to restore from as possible. Optimizations towards those goals can exploit certain properties, e.g. that the job code doesn’t change between the execution attempts. Checkpoints are usually dropped after the job was terminated by the user (except if explicitly configured as retained Checkpoints).
In contrast to all this, Savepoints are created, owned, and deleted by the user. Their use-case is for planned, manual backup and resume. For example, this could be an update of your Flink version, changing your job graph, changing parallelism, forking a second job like for a red/blue deployment, and so on. Of course, Savepoints must survive job termination. Conceptually, Savepoints can be a bit more expensive to produce and restore and focus more on portability and support for the previously mentioned changes to the job.
Those conceptual differences aside, the current implementations of Checkpoints and Savepoints are basically using the same code and produce the same format. However, there is currently one exception from this, and we might introduce more differences in the future. The exception are incremental checkpoints with the RocksDB state backend. They are using some RocksDB internal format instead of Flink’s native savepoint format. This makes them the first instance of a more lightweight checkpointing mechanism, compared to Savepoints.
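As a small illustration of the "Attention" note above about assigning IDs to your operators (the operator names here are invented), explicit uids let Flink map the state in a savepoint back to the right operators even after the job graph changes:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class OperatorUidSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements("a", "b", "c")
               .uid("letters-source")        // the ID this operator's state (if any) is stored under
               .map(String::toUpperCase)
               .uid("uppercase-map")         // stays valid even if the surrounding code is reordered
               .print();

            env.execute("uid-sketch");
        }
    }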
Savepoints
Savepoints usually apply to an individual transaction; a savepoint marks a point to which the transaction can be rolled back, so subsequent changes can be undone if necessary.
See here for more:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cli.html#savepoints
Checkpoints
Checkpoints usually apply to whole systems. You can configure periodic checkpoints to be persisted externally. Externalized checkpoints write their metadata out to persistent storage and are not automatically cleaned up when the job fails.
See here for more:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/checkpoints.html
One difference I would like to add is that a savepoint can be applied manually when we upgrade the pipeline, whereas a checkpoint kicks in when the pipeline restarts or crashes abruptly. However, the latter can have side effects that the application (pipeline) has to handle, such as re-processing duplicate data.

When are flink checkpoint files cleaned?

I have a streaming job that:
reads from Kafka --> maps events to some other DataStream --> keyBy(0) --> reduces over a 15-second processing-time window and writes back to a Redis sink.
When starting up, everything works great. The problem is that after a while, the disk fills up with what I think are Flink checkpoints.
My question is: are the checkpoints supposed to be cleaned/deleted while the Flink job is running? I could not find any resources on this.
I'm using a filesystem state backend that writes to /tmp (no HDFS setup).
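For reference, here is a rough sketch of the topology described above, with a placeholder source and sink instead of Kafka and Redis; the per-key window contents are exactly the state that the checkpoints have to persist:

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowedJobSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000);

            // Placeholder source instead of the Kafka consumer, just to show the topology.
            DataStream<Tuple2<String, Long>> events =
                    env.fromElements(Tuple2.of("key-a", 1L), Tuple2.of("key-b", 1L));

            events
                .keyBy(0)                              // key by the first tuple field
                .timeWindow(Time.seconds(15))          // 15 s processing-time window
                .reduce(new ReduceFunction<Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
                        return Tuple2.of(a.f0, a.f1 + b.f1);  // partial sums live in window state
                    }
                })
                .print();                              // stands in for the Redis sink

            env.execute("windowed-job-sketch");
        }
    }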
Flink cleans up checkpoint files while it is running. There were some corner cases where it "forgot" to clean up all files in case of system failures.
But for Flink 1.3 the community is working on fixing all these issues.
In your case, I'm assuming that you don't have enough disk space to store the data of your windows on disk.
Checkpoints are by default not persisted externally and are only used to resume a job from failures. They are deleted when a program is cancelled.
If you are taking externalized checkpoints, there are two cleanup policies (a configuration sketch follows the list):
ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: Retain the externalized checkpoint when the job is cancelled. Note that you have to manually clean up the checkpoint state after cancellation in this case.
ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: Delete the externalized checkpoint when the job is cancelled. The checkpoint state will only be available if the job fails.
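A minimal sketch of selecting the retain-on-cancellation policy mentioned above (the interval is a placeholder, the checkpoint directory itself is configured via state.checkpoints.dir, and the exact method name has changed in newer Flink versions):

    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RetainedCheckpointSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000);

            // Keep completed checkpoints when the job is cancelled, so they can be used to
            // resume the job later; they then have to be cleaned up manually.
            env.getCheckpointConfig().enableExternalizedCheckpoints(
                    ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

            // ... topology and env.execute() ...
        }
    }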
For more details
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html
