Setup:
Java 8
Flink 1.2 (Mac OSX)
Kafka 0.10.0 (VirtualBox/Ubuntu)
FlinkKafkaConsumer010
FlinkKafkaProducer010
Created a simple example program to consume 1M messages from one Kafka topic and produce them to another, running in local execution mode. Both topics have 32 partitions.
When I let it run from start to finish, it consumes and produces all the messages. If I start it, stop it (SIGINT) before it completes, and then restart it, the producer only receives a subset of the original 1M messages.
I have confirmed the consumer's offsets, and it did read all 1M messages.
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(32);
env.enableCheckpointing(1000L, CheckpointingMode.EXACTLY_ONCE);
--
producer.setFlushOnCheckpoint(true);
producer.setLogFailuresOnly(false);
In local execution mode, is this expected? Do I need to enable savepoints to stop and restart a streaming job? It appears the producer is not committing all the messages when this happens.
Thanks in advance!
First of all, on subsequent runs it only receives a subset of the messages because the FlinkKafkaConsumer uses the offsets committed in Kafka as its starting positions. In the current releases (up to 1.2.0 as of now), the only way to avoid this is to always assign a new group.id. The next release will add explicit options for the start position: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html#kafka-consumers-start-position-configuration.
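A minimal sketch of that workaround on 1.2, assuming a local broker; the topic and group names are placeholders, and on 1.3+ you could instead call one of the new start-position methods such as setStartFromEarliest():

import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// A fresh group.id means there are no committed offsets to resume from, so the
// consumer falls back to the auto.offset.reset position (e.g. "earliest").
props.setProperty("group.id", "my-consumer-group-" + System.currentTimeMillis());
props.setProperty("auto.offset.reset", "earliest");

FlinkKafkaConsumer010<String> consumer =
        new FlinkKafkaConsumer010<>("input-topic", new SimpleStringSchema(), props);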
As a side note, please also note that the committed offsets in Kafka are not used at all for the exactly-once processing guarantees in Flink. Flink only relies on the checkpointed offsets for that. More details on this can be found in the Flink Kafka connector docs in the link above.
Related
I have a question about the Flink Kafka source:
Suppose a Flink application starts up after being restored from a checkpoint and runs fine.
If several partitions are added to the Kafka topic while it is running, will the running Flink application become aware of these added partitions and read them without manual effort? Or do I have to restart the application so that Flink discovers the partitions during startup?
Could you please point me to the code where Flink handles Kafka partition changes, if adding partitions doesn't require manual effort? I didn't find the logic in the code.
Thanks!
It looks like Flink does become aware of new topics and new partitions at runtime; the method call sequence is:
FlinkKafkaConsumerBase#run
FlinkKafkaConsumerBase#runWithPartitionDiscovery
FlinkKafkaConsumerBase#createAndStartDiscoveryLoop
In the last method, it kicks off a new thread that periodically discovers new topics/partitions.
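For reference, that discovery loop only runs if it is enabled; it is off by default. A rough sketch of switching it on in Flink versions that support it (broker address, topic, group, and interval are placeholders; the property key is the constant FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "my-group");
// How often (in ms) the discovery thread checks for new topics/partitions.
props.setProperty("flink.partition-discovery.interval-millis", "10000");

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props);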
We are using Flink Kinesis Consumer to consume data from Kinesis stream into our Flink application.
KCL library uses a DynamoDB table to store last successfully processed Kinesis stream sequence nos. so that the next time application starts, it resumes from where it left off.
But it seems that the Flink Kinesis Consumer does not maintain any such sequence numbers in a persistent store. As a result, we need to rely on the ShardIteratorType (TRIM_HORIZON, LATEST, etc.) to decide where to resume processing when the Flink application restarts.
A possible solution could be to rely on Flink's checkpointing mechanism, but that only works when the application resumes after a failure, not when it has been deliberately cancelled and needs to be restarted from the last successfully consumed Kinesis stream sequence number.
Do we need to store these last successfully consumed sequence numbers ourselves?
Best practice with Flink is to use checkpoints and savepoints, as these create consistent snapshots that contain offsets into your message queues (in this case, Kinesis stream sequence numbers) together with all of the state throughout the rest of the job graph that resulted from having consumed the data up to those offsets. This makes it possible to recover or restart without any loss or duplication of data.
Flink's checkpoints are snapshots taken automatically by Flink itself for the purpose of recovery from failures, and are in a format optimized for rapid restoration. Savepoints use the same underlying snapshot mechanism, but are triggered manually, and their format is more concerned about operational flexibility than performance.
Savepoints are what you are looking for. In particular, cancel with savepoint and resume from savepoint are very useful.
Another option is to use retained checkpoints with ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION.
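A minimal sketch of the retained-checkpoint option; the checkpoint interval is just an illustrative value:

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60000L); // checkpoint every 60 s (illustrative)
// Keep the latest checkpoint when the job is cancelled, so it can be used to
// restart the job later; it then has to be cleaned up manually.
env.getCheckpointConfig().enableExternalizedCheckpoints(
        ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);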
To add to David's response, I'd like to explain the reasoning behind not storing sequence numbers.
Any kind of offset committing into the source system would limit the checkpointing/savepointing feature to fault tolerance only. That is, only the latest checkpoint/savepoint could be used for recovery.
However, Flink actually supports jumping back to a previous checkpoint/savepoint. Consider an application upgrade: you take a savepoint beforehand, upgrade, and let the job run for a couple of minutes, during which it creates a few checkpoints. Then you discover a critical bug. You would like to roll back to the savepoint you took and discard all the checkpoints.
Now, if Flink committed the source offsets only to the source systems, we would not be able to replay the data between now and the restored savepoint. So Flink needs to store the offsets in the savepoint itself, as David pointed out. At that point, additionally committing them to the source system does not yield any benefit and is confusing when restoring to a previous savepoint/checkpoint.
Do you see any benefit in storing the offsets additionally?
In a Flink streaming application that is ingesting messages from Kafka,
1) How do I disable auto-committing?
2) How do I manually commit from Flink after successfully processing a message?
Thanks.
By default Flink commits offsets on checkpoints. You can disable it as follows:
val consumer = new FlinkKafkaConsumer011[T](...)
consumer.setCommitOffsetsOnCheckpoints(false)
If you don't have checkpoints enabled, see here.
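For reference, when checkpointing is not enabled the connector falls back to Kafka's own periodic committing, which is controlled by the regular Kafka client properties passed to the consumer. A sketch (broker and group names are placeholders):

import java.util.Properties;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "my-group");
// Without Flink checkpointing, this switches off Kafka's periodic auto-commit.
props.setProperty("enable.auto.commit", "false");
// Pass props to the FlinkKafkaConsumer constructor as usual.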
Why would you do that though? Flink's checkpointing mechanism is there to solve this problem for you. Flink won't commit offsets in the presence of failures. If you throw an exception at some point downstream of the Kafka consumer Flink will attempt to restart the stream from previous successful checkpoint. If the error persists then Flink will repeatedly restart for the configured number of times before failing the stream.
This means it is unlikely you will lose messages due to Flink committing offsets of messages your code hasn't successfully processed.
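As a side note, the "configured number of times" mentioned above comes from the restart strategy. A sketch of setting one explicitly (the attempt count and delay are just example values):

import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Retry up to 3 times, waiting 10 seconds between attempts, before failing the job.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));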
I'm struggling with an issue regarding event time with Flink's Kafka consumer connector.
Citing the Flink docs:
Since Apache Kafka 0.10+, Kafka’s messages can carry timestamps, indicating the time the event has occurred (see “event time” in Apache Flink) or the time when the message has been written to the Kafka broker.
The FlinkKafkaConsumer010 will emit records with the timestamp attached, if the time characteristic in Flink is set to TimeCharacteristic.EventTime (StreamExecutionEnvironment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)).
The Kafka consumer does not emit watermarks.
Some questions and issues come to mind:
How do I know whether the timestamp attached is the time the event occurred or the time it was written to the Kafka broker?
If the consumer does not emit watermarks and TimeCharacteristic.EventTime is set, does this mean a message late by a few days can still enter and be processed?
The main flow does not contain a window function and basically looks like the following: source (Kafka) -> filter -> processFunction -> sink. Does this mean the event is fired the moment it is consumed by the Kafka connector?
I currently use Kafka connector 0.10.0 with TimeCharacteristic.EventTime set and a processFunction that is expected to do some state cleanup every X minutes.
However, I'm seeing a strange situation: when I start the Flink program, the OnTimerContext contains timestamps that start from 0 and grow up to the current timestamp. Is this a bug?
Thanks in advance to all helpers!
That depends on how the Kafka topic (or broker) is configured rather than on the producer alone: the message.timestamp.type property determines whether the broker keeps the producer-supplied timestamp (CreateTime) or overwrites it with the time the message was appended to the log (LogAppendTime).
Your Flink application is responsible for creating watermarks; the Kafka consumer will take care of the timestamps, but not the watermarks (see the sketch after these answers). It doesn't matter how late an event is, it will still enter your pipeline.
Yes.
It's not clear to me what part of this is strange.
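On the watermark point, a sketch of one common pattern: attach a periodic watermark assigner directly to the consumer. MyEvent, its getTimestamp() accessor, and the 10-second out-of-orderness bound are placeholders for whatever your event type and latency tolerance actually are:

import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

consumer.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(MyEvent element) {
                // Use whatever event-time field your records carry.
                return element.getTimestamp();
            }
        });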
I've setup a Flink 1.2 standalone cluster with 2 JobManagers and 3 TaskManagers and I'm using JMeter to load-test it by producing Kafka messages / events which are then processed. The processing job runs on a TaskManager and it usually takes ~15K events/s.
The job has set EXACTLY_ONCE checkpointing and is persisting state and checkpoints to Amazon S3.
If I shut down the TaskManager running the job, it takes a bit (a few seconds), then the job is resumed on a different TaskManager. The job mainly logs the event ids, which are consecutive integers (e.g. from 0 to 1200000).
When I check the output on the TaskManager I shut down, the last count is, for example, 500000; when I then check the output of the resumed job on a different TaskManager, it starts at around 400000. This means ~100K duplicated events. This number depends on the speed of the test and can be higher or lower.
Not sure if I'm missing something but I would expect the job to display the next consecutive number (like 500001) after resuming on the different TaskManager.
Does anyone know why this is happening, or what extra settings I have to configure to obtain exactly-once?
You are seeing the expected behavior for exactly-once. Flink implements fault-tolerance via a combination of checkpointing and replay in the case of failures. The guarantee is not that each event will be sent into the pipeline exactly once, but rather that each event will affect your pipeline's state exactly once.
Checkpointing creates a consistent snapshot across the entire cluster. During recovery, operator state is restored and the sources are replayed from the most recent checkpoint.
For a more thorough explanation, see this data Artisans blog post: High-throughput, low-latency, and exactly-once stream processing with Apache Flink™, or the Flink docs.