Apache Flink: restoring state from a checkpoint with a changed Kafka topic - apache-flink

I ran into unexpected behavior when I needed to start a job from a checkpoint and change the Kafka topic at the same time. In this case Flink restores the Kafka consumer state with the previously defined topic, last committed offset, and consumer group id; as a result, the Kafka consumer starts consuming messages from two topics: the old one restored from state and the new one defined in the configuration at job start.
It's very confusing, and in the end it's not entirely clear whether this is a bug or a feature. Is there a way to recover a job from a checkpoint while not restoring the state of the Kafka consumers, and instead initialize them from the configuration parameters?
I need the previous job state, but I want to read new data from another topic!

If you change the UID of the KafkaSource (or FlinkKafkaConsumer) and restart the job with allowNonRestoredState enabled, then you'll get the behavior you are looking for.
Changing the UID (or setting one, if you haven't explicitly set one) will prevent the saved Kafka offsets from being restored, and allowNonRestoredState will override Flink's built-in protections against losing state.
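A minimal sketch of that approach with the KafkaSource API (Flink 1.15+). The bootstrap servers, topic, group id, and UID values here are only illustrative, and allowNonRestoredState is passed when submitting the job:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("orders-v2")                              // the new topic
        .setGroupId("my-consumer-group")
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

DataStream<String> stream = env
        .fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
        .uid("kafka-source-v2");                             // new UID: the old source state is skipped

// Resume from the checkpoint, allowing the orphaned Kafka source state to be dropped:
//   flink run -s <checkpointPath> --allowNonRestoredState <jobJar>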

Related

What is the difference between incremental checkpoints and the changelog state backend in Flink?

I am interested in processing large state using Flink.
There are some ways to handle this, such as incremental checkpoints, and I understand the concept from the Flink documentation.
I also found that there is a changelog state backend, which was introduced in Flink 1.16.
I think the changelog state backend can reduce the time it takes to take a snapshot by capturing only the difference between the current snapshot and the previous one.
But I am a little confused and cannot fully understand the difference between incremental checkpoints and the changelog state backend.
I want to use both methods to process large state in Flink, but I want to understand their underlying nature. Although I read the articles referenced in the Flink documentation, I still cannot fully differentiate between incremental checkpoints and the changelog state backend.
Are they almost the same, except that incremental checkpoints focus on the checkpointing mechanism while the changelog state backend focuses on the snapshot?
Any comments will be appreciated.
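For context, a minimal sketch of how the two features the question compares are enabled programmatically, assuming a Flink 1.16-era API (the checkpoint interval is illustrative, and both can equally be set in flink-conf.yaml via state.backend.incremental and state.backend.changelog.enabled):

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);   // checkpoint every 60s (illustrative); a durable checkpoint directory should also be configured

// Incremental checkpoints: RocksDB uploads only the SST files created since the previous checkpoint.
env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

// Changelog state backend: state changes are continuously streamed to a changelog, so each checkpoint
// only has to persist a small delta (plus periodic materialization in the background).
env.enableChangelogStateBackend(true);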

Apache Flink Checkpointing (manually put a value into a RocksDB checkpoint and retrieve it during recovery or restart)

We have a scenario where we have to persist/save some value into the checkpoint and retrieve it back during failure recovery/application restart.
We tried a few things like ValueState and ValueStateDescriptor, but it is still not working. These are some of the references we followed:
https://github.com/realtime-storage-engine/flink-spillable-statebackend/blob/master/flink-spillable-benchmark/src/main/java/org/apache/flink/spillable/benchmark/WordCount.java
https://towardsdatascience.com/heres-how-flink-stores-your-state-7b37fbb60e1a
https://github.com/king/flink-state-cache/blob/master/examples/src/main/java/com/king/flink/state/Example.java
We can't externalize it to a DB as it may cause some performance issues.
Any lead on how to do this using checkpoints would be helpful. How do we put a value into a checkpoint and get it back?
All of your managed application state is automatically written into Flink checkpoints (and savepoints). This includes:
- keyed state (ValueState, ListState, MapState, etc.)
- operator state (ListState, BroadcastState, etc.)
- timers
This state is automatically restored during recovery, and can optionally be restored during manual restarts.
The Flink Operations Playground shows how to work with checkpoints and savepoints, and lets you observe their behavior during failure/recovery and restarts/rescaling.
If you want to read from a checkpoint yourself, that's what the State Processor API is for. Here's an example.
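As a sketch of how this looks in practice for the question above: keyed state declared like this is included in every checkpoint and restored automatically on recovery (the class, field, and state names are illustrative, and checkpointing must be enabled on the environment):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class RunningCount extends KeyedProcessFunction<String, String, Long> {

    private transient ValueState<Long> countState;

    @Override
    public void open(Configuration parameters) {
        // Registered keyed state is part of every checkpoint/savepoint and restored on recovery
        countState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = countState.value();            // null for the first event of a key
        long updated = (current == null ? 0L : current) + 1;
        countState.update(updated);                   // persisted with the next checkpoint
        out.collect(updated);
    }
}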

Is there a way to programmatically check if a Flink streaming job started from a savepoint before executing the stream?

Before calling execute on the StreamExecutionEnvironment and starting the stream job, is there a way to programmatically find out whether or not the job was restored from a savepoint? I need to know such information so that I can set the offset of a Kafka source depending on it while building the job graph.
It seems that the FlinkKafkaConsumerBase class, which has an initializeState method, has access to such information (code). However, there is no way to intercept the FunctionInitializationContext and retrieve the isRestored() value, since initializeState is a final method. Also, the initializeState method gets called after the job graph is executed, so I don't think there is a feasible solution associated with it.
Another attempt I made was to find a Flink job parameter that indicates whether or not the job was started from a savepoint. However, I don't think such a parameter exists.
You can get the effect you are looking for by simply doing this:
FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>(...);
myConsumer.setStartFromEarliest();
If you use setStartFromEarliest then Flink will ignore the offsets stored in Kafka and instead begin reading from the earliest record. However, even with setStartFromEarliest, if Flink is resuming from a checkpoint or savepoint it will use the offsets stored in that snapshot.
Note that Flink does its own Kafka offset management, and when recovering from a checkpoint ignores the offsets stored in Kafka. Flink does this as a part of providing exactly-once guarantees, which requires knowing exactly how much of the input was consumed to produce the results present in the rest of the state captured in a checkpoint or savepoint. For this reason, Flink always stores the offsets as part of every state snapshot (checkpoint or savepoint).
This is documented here and here.
As for your original question about initializeState, this is available if you implement the CheckpointedFunction interface, but it's quite rare to actually need this.
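For completeness, a minimal sketch of what that looks like. Note that isRestored() is only available at runtime inside the function, not while the job graph is being built; the class name and behavior here are purely illustrative:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

public class RestoreAwareMapper extends RichMapFunction<String, String>
        implements CheckpointedFunction {

    private transient boolean restored;

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // true when this parallel task instance is being restored from a checkpoint or savepoint
        restored = context.isRestored();
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        // nothing extra to snapshot in this sketch
    }

    @Override
    public String map(String value) {
        return (restored ? "restored: " : "fresh: ") + value;
    }
}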

Flink Kinesis Consumer not storing last successfully processed sequence nos

We are using the Flink Kinesis Consumer to consume data from a Kinesis stream into our Flink application.
The KCL library uses a DynamoDB table to store the last successfully processed Kinesis stream sequence numbers, so that the next time the application starts, it resumes from where it left off.
But it seems that the Flink Kinesis Consumer does not maintain any such sequence numbers in a persistent store. As a result, we need to rely on the ShardIteratorType (TRIM_HORIZON, LATEST, etc.) to decide where to resume Flink application processing upon application restart.
A possible solution could be to rely on Flink's checkpointing mechanism, but that only works when the application resumes after a failure, not when the application has been deliberately cancelled and needs to be restarted from the last successfully consumed Kinesis stream sequence number.
Do we need to store these last successfully consumed sequence numbers ourselves?
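For reference, the kind of setup described above looks roughly like this (the stream name, region, and schema are illustrative); without a snapshot to restore from, STREAM_INITIAL_POSITION is all that decides where reading starts:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

Properties consumerConfig = new Properties();
consumerConfig.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");                  // illustrative
consumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "TRIM_HORIZON");  // or "LATEST"

FlinkKinesisConsumer<String> consumer =
        new FlinkKinesisConsumer<>("my-stream", new SimpleStringSchema(), consumerConfig);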
Best practice with Flink is to use checkpoints and savepoints, as these create consistent snapshots that contain offsets into your message queues (in this case, Kinesis stream sequence numbers) together with all of the state throughout the rest of the job graph that resulted from having consumed the data up to those offsets. This makes it possible to recover or restart without any loss or duplication of data.
Flink's checkpoints are snapshots taken automatically by Flink itself for the purpose of recovery from failures, and are in a format optimized for rapid restoration. Savepoints use the same underlying snapshot mechanism, but are triggered manually, and their format is more concerned about operational flexibility than performance.
Savepoints are what you are looking for. In particular, cancel with savepoint and resume from savepoint are very useful.
Another option is to use retained checkpoints with ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION.
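A sketch of that second option, assuming a Flink 1.15-era API (the interval is illustrative, and the exact method and enum names have moved around between versions):

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);   // illustrative interval
// Keep the latest checkpoint around when the job is cancelled, so it can be used to restart:
env.getCheckpointConfig().setExternalizedCheckpointCleanup(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

// For the savepoint route instead:
//   flink stop --savepointPath <targetDir> <jobId>     (stop with savepoint)
//   flink run -s <savepointPath> <jobJar>              (resume from it)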
To add to David's response, I'd like to explain the reasoning behind not storing sequence numbers.
Any kind of offset committing into the source system would limit the checkpointing/savepointing feature to fault tolerance only; that is, only the latest checkpoint/savepoint would be usable for recovery.
However, Flink actually supports jumping back to a previous checkpoint/savepoint. Consider an application upgrade: you take a savepoint before upgrading, then let the upgraded job run for a couple of minutes, during which it creates a few checkpoints. Then you discover a critical bug. You would like to roll back to the savepoint you took and discard all the checkpoints created since.
Now, if Flink committed the source offsets only to the source system, we would not be able to replay the data between now and the restored savepoint. So Flink needs to store the offsets in the savepoint itself, as David pointed out. At that point, additionally committing offsets to the source system yields no benefit and is confusing when restoring to a previous savepoint/checkpoint.
Do you see any benefit in storing the offsets additionally?

Restore MapState after Job restart/cancellation

I have to aggregate a count/sum over an event stream for various entities.
Event logs (JSON strings) are received from Kafka and populate a map with the entity name as the key and the count of the selected attributes (as a JSON string) as the value.
MapState sourceAggregationMap = getRuntimeContext().getMapState(sourceAggregationDesc);
For each event in the stream, the value is repopulated.
The problem is that whenever the job is stopped (fails) or cancelled and is then restarted, the MapState is not reinitialized/restored; the count starts again from 0.
Using Apache Flink 1.6.0
state.backend: rocksdb
Checkpoints are used for automatic recovery from failures, and need to be explicitly enabled and configured. Savepoints are triggered manually and are used for restarts and upgrades. Both rely on the same snapshotting mechanism which is described in detail here.
These snapshots capture the entire state of the distributed pipeline, recording offsets into the input queues as well as the state throughout the job graph that has resulted from having ingested the data up to that point. When a failure occurs, the sources are rewound, the state is restored, and processing is resumed.
With the RocksDB state backend, the working state is held on the local disk (in a location you configure), and checkpoints are durably persisted to a distributed file system (again, configurable). When a job is cancelled, the checkpoints are normally deleted (as they will no longer be needed for recovery), but they can be configured to be retained. If your jobs aren't recovering their state after failures, perhaps the checkpoints are failing, or the job is failing before the first checkpoint can complete. The web ui has a section that displays information about checkpoints, and the logs should also have helpful information.
Update: see also this answer.
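A minimal sketch of the configuration that has to be in place for the MapState above to survive restarts, assuming the Flink 1.6-era API (the checkpoint URI and interval are illustrative):

// inside main(String[] args) throws Exception
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));   // durable checkpoint location
env.enableCheckpointing(10_000);                                             // checkpointing must be explicitly enabled
// Keep checkpoints when the job is cancelled, so the state can be restored on a manual restart:
env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);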
