Restoring a Flink job from state after code changes

I am using Apache Flink 1.9 with the standard checkpoint/savepoint mechanism, writing to a file system.
My question is: what is the proper way to restore a job from a savepoint if the job's code has changed?
For example, after a refactoring I renamed a few classes, and after that I could not restore from the old checkpoint.
I lost my data and want to ask: what can I do in such cases?
All operators have a uid and a name.

In short: it depends.
In more detail: it generally shouldn't be an issue if you have only reordered and renamed classes, as long as the UIDs have not changed. Refactoring, however, may change how the state is stored and thus prevent it from being restored. In that case you can use the --allowNonRestoredState parameter, which lets the job restore the state that can still be mapped from the savepoint and start the remaining operators with clean state. Keep in mind that this may not restore all state. In general you should avoid refactoring operators once they are running, since doing so can effectively prevent restoring from a savepoint.
It's worth noting that it may not be possible to restore from a savepoint if you are using SQL; see issue FLINK-6966.
I assume that you are dealing with savepoints, not externalized checkpoints; otherwise there are a few more things to keep in mind, especially when changing parallelism.
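The restore rule described above can be pictured with a small plain-Java model. This is only a sketch and uses no Flink classes: RestoreModel, restore, and the map-based "savepoint" are hypothetical names chosen for illustration, and the model assumes state is matched purely by operator UID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Plain-Java model (not Flink API): state in a savepoint is keyed by operator UID.
final class RestoreModel {

    // Returns the state each operator of the new job starts with.
    // savepoint: operator UID -> stored state; jobUids: UIDs present in the new job.
    static Map<String, String> restore(Map<String, String> savepoint,
                                       Set<String> jobUids,
                                       boolean allowNonRestoredState) {
        Map<String, String> restored = new HashMap<>();
        for (Map.Entry<String, String> entry : savepoint.entrySet()) {
            if (jobUids.contains(entry.getKey())) {
                restored.put(entry.getKey(), entry.getValue());
            } else if (!allowNonRestoredState) {
                // State with no matching operator fails the restore by default.
                throw new IllegalStateException(
                        "No operator with uid '" + entry.getKey() + "' in the new job");
            }
            // With the flag set, unmatched state is silently skipped.
        }
        // Operators that have no entry in the savepoint start with empty state.
        for (String uid : jobUids) {
            restored.putIfAbsent(uid, "");
        }
        return restored;
    }
}
```

The model deliberately leaves out serializer compatibility, which is the separate problem that refactoring a state class can cause even when every UID still matches.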

It seems your state cannot be treated as a POJO (POJOs are classes that follow a certain bean-like pattern). When a user-defined data type can’t be recognized as a POJO type, it is processed as a GenericType and serialized with Kryo.
Currently, Flink supports schema evolution only for POJO and Avro types. Therefore, if you care about schema evolution for your state, it is recommended to always use either POJO or Avro types for state data.
Some docs FYI:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/schema_evolution.html
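As a rough illustration of the bean-like pattern mentioned above, here is a minimal sketch of a state class shaped to meet the POJO rules: a public class, a public no-argument constructor, and fields that are either public or reachable through getters and setters. The names PojoStateExample and UserFeature and both fields are hypothetical, and the class is nested only to keep the sketch in one file; in a real job it would normally be a public top-level class.

```java
// Sketch of a state type shaped to meet Flink's POJO rules (names are hypothetical).
final class PojoStateExample {

    public static class UserFeature {
        // Public, non-final field: directly accessible to the serializer.
        public String userId;

        // Private field exposed through a conventional getter and setter.
        private long clickCount;

        // Public no-argument constructor.
        public UserFeature() {}

        public long getClickCount() {
            return clickCount;
        }

        public void setClickCount(long clickCount) {
            this.clickCount = clickCount;
        }
    }
}
```

A class that breaks one of these rules, for example by having only a constructor with arguments, would fall back to the Kryo path described above.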

Related

How does Flink handle unused keyed state fields when we update our job?

We have a job in which all user features and information are stored in keyed state. Each user feature is represented by its own state descriptor. Our features evolve, so some of them are abandoned in a later release, and we no longer declare the state descriptors for those abandoned features in our code. My question is: how does Flink handle that abandoned state? Does it simply stop restoring it automatically?
If you are using Flink POJOs or Avro types, then Flink will automatically migrate the types and state for you. Otherwise, it will not, and you could implement a custom serializer instead. Or you could use the State Processor API to clean things up.

What is the best way to have a cache of an external database in Flink?

The external database consists of a set of rules for each key, these rules should be applied on each stream element in the Flink job. Because it is very expensive to make a DB call for each element and retrieve the rules, I want to fetch the rules from the database at initialization and store it in a local cache.
When rules are updated in the external database, a status change event is published to the Flink job which should be used to fetch the rules and refresh this cache.
What is the best way to achieve what I've described? I looked into keyed state but initializing all keys and refreshing the keys on update doesn't seem possible.
I think you can use BroadcastProcessFunction or KeyedBroadcastProcessFunction for your use case. A detailed blog post is available here.
In short: define a source, such as Kafka, and publish the rules to it so that the job can consume them as a second stream. Connect the actual data stream with the rules stream. processBroadcastElement then receives the rules, and you can update the broadcast state there. Finally, the updated state (the rules) can be read in processElement, the method that handles the actual events.
Points to consider: Broadcast state will be kept on the heap always, not in state store (RocksDB). So, it has to be small enough to fit in memory. Each slot will copy all of the broadcast state into its checkpoints, so all checkpoints and savepoints will have n (parallelism) copies of the broadcast state.
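The division of work between the two callbacks can be sketched in plain Java. This is a model only, not the Flink API: the real methods take a value, a context, and a collector, while RuleEnricherModel and its simplified signatures are hypothetical and use an in-memory map in place of broadcast state.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the two callbacks (not Flink API; signatures are simplified).
final class RuleEnricherModel {

    // Stands in for broadcast state: key -> rule, held in memory.
    private final Map<String, String> rules = new HashMap<>();

    // Called for each element of the rules stream: update the local copy.
    void processBroadcastElement(String key, String rule) {
        rules.put(key, rule);
    }

    // Called for each element of the data stream: look up the rule locally,
    // so no database call is needed per element.
    String processElement(String key, String payload) {
        String rule = rules.getOrDefault(key, "no-rule");
        return payload + " -> " + rule;
    }
}
```

Because every parallel instance would hold its own copy of the map, the memory and checkpoint-size caveats listed above apply to this pattern.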
A few different mechanisms in Flink may be relevant to this use case, depending on your detailed requirements.
Broadcast State
Jaya Ananthram has already covered the idea of using broadcast state in his answer. This makes sense if the rules should be applied globally, for every key, and if you can find a way to collect and broadcast the updates.
Note that the Context passed to processBroadcastElement() of a KeyedBroadcastProcessFunction provides the method applyToKeyedState(StateDescriptor<S, VS> stateDescriptor, KeyedStateFunction<KS, S> function). This means you can register a KeyedStateFunction that will be applied to the state of every key associated with the provided stateDescriptor.
State Processor API
If you want to bootstrap state in a Flink savepoint from a database dump, you can do that with this library. You'll find a simple example of using the State Processor API to bootstrap state in this gist.
Change Data Capture
The Table/SQL API supports Debezium, Canal, and Maxwell CDC streams, and Kafka upsert streams. This may be a solution. There's also flink-cdc-connectors.
Lookup Joins
Flink SQL can do temporal lookup joins against a JDBC database, with a configurable cache. Not sure this is relevant.
In essence David's answer summarizes it well. If you are looking for more detail: not long ago, I gave a webinar [1] on this topic including running code examples. [2]
[1] https://www.youtube.com/watch?v=cJS18iKLUIY
[2] https://github.com/knaufk/enrichments-with-flink

Is there a way to programmatically check if a Flink streaming job started from a savepoint before executing the stream?

Before calling execute on the StreamExecutionEnvironment and starting the stream job, is there a way to programmatically find out whether or not the job was restored from a savepoint? I need to know such information so that I can set the offset of a Kafka source depending on it while building the job graph.
It seems that the FlinkKafkaConsumerBase class, which has a method initializeState, has access to this information (code). However, there is no way to intercept the FunctionInitializationContext and retrieve the isRestored() value, since initializeState is a final method. Also, initializeState is called only after the job graph has been submitted for execution, so I don't think a feasible solution can be based on it.
Another attempt I made was to look for a Flink job parameter that indicates whether or not the job was started from a savepoint. However, I don't think such a parameter exists.
You can get the effect you are looking for by simply doing this:
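import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
// Constructor arguments (topic, deserialization schema, Kafka properties) are omitted.
FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>(...);
myConsumer.setStartFromEarliest(); // takes effect only when the job starts without restored state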
If you use setStartFromEarliest, Flink ignores the offsets stored in Kafka and begins reading from the earliest record. However, even with setStartFromEarliest, if Flink is resuming from a checkpoint or savepoint, it uses the offsets stored in that snapshot instead.
Note that Flink does its own Kafka offset management, and when recovering from a checkpoint ignores the offsets stored in Kafka. Flink does this as a part of providing exactly-once guarantees, which requires knowing exactly how much of the input was consumed to produce the results present in the rest of the state captured in a checkpoint or savepoint. For this reason, Flink always stores the offsets as part of every state snapshot (checkpoint or savepoint).
This is documented here and here.
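The precedence described above can be written down as a tiny plain-Java model. It is a sketch, not Flink code: StartOffsetModel and resolveStartOffset are hypothetical names, and the model assumes a single partition and a "start from earliest" configuration.

```java
import java.util.OptionalLong;

// Plain-Java model (not Flink API) of the start-position rule described above.
final class StartOffsetModel {

    // restoredOffset: offset found in the checkpoint or savepoint, empty on a fresh start.
    // earliestOffset: earliest offset available in the partition.
    static long resolveStartOffset(OptionalLong restoredOffset, long earliestOffset) {
        // A restored offset always wins over the configured startup mode.
        return restoredOffset.orElse(earliestOffset);
    }
}
```

Seen this way, the job-building code does not need to know whether it is restoring: the same configuration produces the right start position in both cases.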
As for your original question about initializeState, this is available if you implement the CheckpointedFunction interface, but it's quite rare to actually need this.

Prevent redundant CRUD operations in multi-container pod

If I have multiple identical containers deployed simultaneously, and each contains a job to periodically create an artifact and save to a database, and what they save is deterministic, how should I go about preventing redundant operations?
Should I check the key in the database to see if it exists first, and if it doesn't, begin the saving operation? The artifact creation process is lengthy, so it's quite likely that one container may check the DB, see that it hasn't been saved to yet, and start the artifact creation process ... in the meantime, the other container may do the same.
I realize that having multiple clones of the same container is good for preventing downtime / keeping the application robust, but how should you deal with side effects?
This is a pretty open-ended question, so there isn't going to be one definitive answer without knowing the exact specifics of your situation.
Generally speaking in situations like this you should try to make the action that is being performed idempotent if possible, thus removing the issues if multiple requests are sent to perform the same action.
The question I would be asking myself is whether your architecture and technology stack are suitable for this task. Not every activity needs to be performed in Kubernetes.
Would a Kubernetes CronJob be more suitable for this?
What about using a message queue?
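To make the idempotency suggestion concrete, here is a minimal sketch in which an in-memory map stands in for a database table with a unique key. IdempotentArtifactStore and saveOnce are hypothetical names; across several containers the same guarantee would have to come from the shared database itself (for example a unique constraint), which this single-process sketch does not show.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Sketch: an in-memory map stands in for a database table with a unique key.
final class IdempotentArtifactStore {

    private final ConcurrentMap<String, String> table = new ConcurrentHashMap<>();

    // Builds and stores the artifact at most once per key, even under concurrent
    // calls within this process, and returns whatever is stored for the key.
    String saveOnce(String key, Supplier<String> buildArtifact) {
        return table.computeIfAbsent(key, k -> buildArtifact.get());
    }
}
```

Calling saveOnce a second time with the same key returns the stored artifact without running the lengthy build again, so repeated requests are harmless.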

Apache Flink - Difference between Checkpoints & Save points?

Can someone please help me understand the difference between Apache Flink's Checkpoints and Savepoints?
I read the documentation but couldn't understand the difference.
Apache Flink's Checkpoints and Savepoints are similar in that both are mechanisms for preserving the internal state of Flink applications.
Checkpoints are taken automatically and are used to restart a job automatically after a failure.
Savepoints, on the other hand, are taken manually, are always stored externally, and are used to start a "new" job with the previous internal state, for example for:
bug fixing
Flink version upgrades
A/B testing, etc.
Underneath, they are in fact the same mechanism and code path, with some subtle differences.
Edit:
You can also find a very good explanation in the official documentation https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint :
A Savepoint is a consistent image of the execution state of a streaming job, created via Flink’s checkpointing mechanism. You can use Savepoints to stop-and-resume, fork, or update your Flink jobs. Savepoints consist of two parts: a directory with (typically large) binary files on stable storage (e.g. HDFS, S3, …) and a (relatively small) meta data file. The files on stable storage represent the net data of the job’s execution state image. The meta data file of a Savepoint contains (primarily) pointers to all files on stable storage that are part of the Savepoint, in form of absolute paths.
Attention: In order to allow upgrades between programs and Flink versions, it is important to check out the following section about assigning IDs to your operators.
Conceptually, Flink’s Savepoints are different from Checkpoints in a similar way that backups are different from recovery logs in traditional database systems. The primary purpose of Checkpoints is to provide a recovery mechanism in case of unexpected job failures. A Checkpoint’s lifecycle is managed by Flink, i.e. a Checkpoint is created, owned, and released by Flink - without user interaction. As a method of recovery and being periodically triggered, two main design goals for the Checkpoint implementation are i) being as lightweight to create and ii) being as fast to restore from as possible. Optimizations towards those goals can exploit certain properties, e.g. that the job code doesn’t change between the execution attempts. Checkpoints are usually dropped after the job was terminated by the user (except if explicitly configured as retained Checkpoints).
In contrast to all this, Savepoints are created, owned, and deleted by the user. Their use-case is for planned, manual backup and resume. For example, this could be an update of your Flink version, changing your job graph, changing parallelism, forking a second job like for a red/blue deployment, and so on. Of course, Savepoints must survive job termination. Conceptually, Savepoints can be a bit more expensive to produce and restore and focus more on portability and support for the previously mentioned changes to the job.
Those conceptual differences aside, the current implementations of Checkpoints and Savepoints are basically using the same code and produce the same format. However, there is currently one exception from this, and we might introduce more differences in the future. The exception are incremental checkpoints with the RocksDB state backend. They are using some RocksDB internal format instead of Flink’s native savepoint format. This makes them the first instance of a more lightweight checkpointing mechanism, compared to Savepoints.
Savepoints
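In general database usage, a savepoint applies to an individual transaction: it marks a point to which the transaction can be rolled back, so subsequent changes can be undone if necessary. In Flink, a savepoint plays a comparable role for a whole job: it is a manually triggered snapshot from which the job can later be resumed.
See more here: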
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cli.html#savepoints
Checkpoints
Checkpoints usually apply to whole systems. You can configure periodic checkpoints to be persisted externally. Externalized checkpoints write their metadata out to persistent storage and are not automatically cleaned up when the job fails.
See more here:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/checkpoints.html
One difference I would like to add: a savepoint is applied manually, for example when upgrading the pipeline, whereas a checkpoint comes into play when the pipeline restarts or crashes abruptly. Either way, there can be side effects afterwards, and the application (pipeline) has to handle scenarios such as re-processing duplicate data.
