Flink batch mode does not sort by event timestamp - apache-flink

I have a custom Flink Source and a SerializableTimestampAssigner that assigns event timestamps to the records it emits. The source may emit records out of order because of the nature of the underlying data storage. However, in BATCH mode I expect Flink to sort these records by event timestamp before any operator processes them.
Excerpted from the Flink documentation on execution mode:
In BATCH mode, where the input dataset is known in advance, there is no need for such a heuristic as, at the very least, elements can be sorted by timestamp so that they are processed in temporal order.
However, this doesn't seem to be the case. If I create a DataStream from the Source (StreamExecutionEnvironment.fromSource) with my timestamp assigner, and then call datastream.addSink(x => println(extractTimestamp(x))), the output isn't strictly ascending. Is my understanding of the documentation wrong? Or does Flink expect users to sort the input dataset themselves?

BATCH execution mode first sorts by key, and within each key, it sorts by timestamp. By operating this way, it only needs to keep state for one key at a time, so this keeps the runtime simple and efficient.
If your pipeline isn't using keyed streams, then you won't be using keyed state or timers, so the ordering shouldn't matter (and I'm not sure what happens).
For keyed co-streams, both streams are keyed in the same way and sorted by those keys, and the keys are advanced in lockstep.
Broadcast streams are sent in their entirety before anything else.
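To make the ordering concrete, here is a minimal sketch in plain Python (not Flink code) of "sort by key, then by timestamp within each key". The sample records and the batch_order helper are made up for illustration, and the order in which keys appear is just Python's natural ordering, which is an assumption; only the grouping matters.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical bounded input: (key, event_timestamp) pairs, out of order.
records = [("b", 3), ("a", 5), ("b", 1), ("a", 2), ("a", 4)]

def batch_order(records):
    """Order records by key, then by timestamp within each key."""
    return sorted(records, key=itemgetter(0, 1))

ordered = batch_order(records)

# Timestamps as a downstream operator would see them, across all keys.
timestamps = [ts for _, ts in ordered]

# Timestamps grouped per key: each group is ascending.
per_key = {
    key: [ts for _, ts in group]
    for key, group in groupby(ordered, key=itemgetter(0))
}

print(timestamps)
print(per_key)
```

Within each key the timestamps ascend, but the sequence across keys does not, which matches what the question observes when printing timestamps from a sink.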

Related

Flink : Handling Keyed Streams with data older than application watermark

I'm using Flink with a Kinesis source and event-time keyed windows. The application will listen to a live stream of data, window it (event-time windows), and process each keyed stream. I have another use case where I also need to support backfilling older data for certain keyed streams (these will be new keyed streams with event time < watermark).
Because I'm using watermarks, this is a problem: Flink doesn't support per-key watermarks. Any keyed stream used for backfill will therefore end up being ignored, since its event time will be < the application watermark maintained by the live stream.
I have gone through other similar questions but wasn't able to get a possible approach.
Here are possible approaches I'm considering but still have some open questions.
Possible Approach - 1
(i) Maintain a copy of the application specifically for backfill. Backfill jobs will run rarely (~ a few times a month). The stream of data sent to the application copy will carry start and stop indicators, which I plan to use to start / reset the watermark.
Open question: Is it possible to reset the watermark using an indicator from the stream? I understand that this is not best practice, but I can't think of an alternative solution.
Follow-up to: Clear Flink watermark state in DataStream [No definitive solution provided.]
Possible Approach - 2
Run a parallel instance for each key, since it's possible to have a different watermark per task. -> Not going with this, since I'll have > 5k keyed streams.
Let me know if any other details are needed.
You can address this by running the backfill jobs in BATCH execution mode. When the DataStream API operates in batch mode, the input is bounded (finite), and known in advance. This allows Flink to sort the input by key and by timestamp, and the processing will proceed correctly according to event time without any concern for watermarks or late events.
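The difference can be illustrated with a small simulation in plain Python (not Flink code). It assumes a deliberately simplified model: one task, one watermark shared by all keys, and an event counted as late when its timestamp is at or below the current watermark. The function names and sample events are hypothetical.

```python
def streaming_pass(events):
    """Process events in arrival order with one watermark for all keys.
    Simplified rule: an event is late if its timestamp <= the watermark."""
    watermark = float("-inf")
    accepted, late = [], []
    for key, ts in events:
        if ts <= watermark:
            late.append((key, ts))
        else:
            accepted.append((key, ts))
        watermark = max(watermark, ts)
    return accepted, late

def batch_pass(events):
    """Bounded input: sort by key, then timestamp. Nothing can be late."""
    return sorted(events)

# Live traffic around t=100, with a backfilled key whose events are much older.
events = [("live", 100), ("live", 101), ("backfill", 10),
          ("backfill", 11), ("live", 102)]

accepted, late = streaming_pass(events)
print(late)                # the backfill events fall behind the shared watermark
print(batch_pass(events))  # every event is kept, in event-time order per key
```

In a real job the mode is selected through the execution.runtime-mode setting (or setRuntimeMode on the StreamExecutionEnvironment), and the sources must be bounded.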

Are Flink Stream messages sent to downstream in order

I am new to Flink. My question is whether all the messages sent to downstream nodes arrive in order, or whether they can be reordered. For example,
[Stream] -> [DownStream]
Stream: [1,2,3,4,5,6,7,8,9]
Downstream gets [3,2,1,4,5,7,6,8,9]
If so, how do we handle this case if we want it in order?
Any help would be very appreciated!
An operator can have multiple input channels. It will process the events from each channel in the order in which they were received. (Operators can also have multiple output channels.)
If your job has more than one pathway between stream and downstream, then the events can race and the ordering will be non-deterministic. Otherwise the ordering will be preserved.
An example: Suppose you are reading, in parallel, from a Kafka topic with multiple partitions. Further imagine that all events from a given user are in the same Kafka partition (and are in order, by timestamp, for each user). Then in Flink you can use keyBy(user) and be sure that the event stream for each user will remain in order. On the other hand, if the events for a given user are spread across multiple partitions, then keyBy(user) will end up creating a stream of events for each user that is (almost certainly) out of order, because it will be pulling together events from several different FlinkKafkaConsumer instances that are reading in parallel.
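The Kafka example can be sketched as a toy model in plain Python (not Flink or Kafka code). Each partition is an in-order list, and a hypothetical schedule decides which partition delivers its next event. The helper names and data are made up for illustration.

```python
from collections import defaultdict

def interleave(partitions, schedule):
    """Merge partitions in the order given by `schedule` (partition indexes),
    preserving the order of events within each partition."""
    cursors = [0] * len(partitions)
    merged = []
    for p in schedule:
        merged.append(partitions[p][cursors[p]])
        cursors[p] += 1
    return merged

def per_user_in_order(events):
    """True if every user's timestamps arrive in ascending order."""
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)
    return all(ts == sorted(ts) for ts in by_user.values())

# Case A: each user lives in exactly one partition.
one_partition_per_user = [
    [("alice", 1), ("alice", 2), ("alice", 3)],
    [("bob", 1), ("bob", 2)],
]
merged_a = interleave(one_partition_per_user, [1, 0, 0, 1, 0])

# Case B: one user's events are spread across two partitions.
user_spread = [
    [("alice", 1), ("alice", 3)],
    [("alice", 2), ("alice", 4)],
]
merged_b = interleave(user_spread, [1, 1, 0, 0])

print(per_user_in_order(merged_a))  # per-user order survives any interleaving
print(per_user_in_order(merged_b))  # this interleaving breaks per-user order
```

In case A no schedule can reorder a single user's events, because they all travel through one channel. In case B the result depends on how the parallel readers happen to race.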

Flink timestamps in events & stream records

Looking at the Flink documentation and books, I have a doubt about timestamps: if a stream is set to event-time mode, which means the timestamps carry the time at the source before the event gets into Flink (even before going through a messaging queue, which could be Kafka), why does Flink attach timestamps to records as metadata?
Slide 3 shows the different types of timestamps according to what they account for:
https://www.slideshare.net/dataArtisans/apache-flink-training-time-and-watermarks
If the timestamp comes inside the event, why pass that value into the record's metadata? Also, what exactly is the difference between the event and the record?
The timestamps don't always come from inside of the events. For example, the Flink Kafka consumer copies the timestamps in the Kafka metadata to the Flink metadata. (You can supply a timestamp assigner if you wish to overwrite these timestamps.)
These timestamps carried in the stream record metadata are used internally in various ways:
the built-in event-time window assigners use these timestamps to assign events to windows
CEP uses these timestamps to sort the stream into event time order
Flink SQL can also use these timestamps for sorting, windowing, etc.
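As a rough illustration of the event/record distinction, the sketch below (plain Python, not Flink's classes) models a record as a payload plus a timestamp carried beside it, and assigns records to tumbling windows using only that metadata. The Record type and both helper functions are hypothetical; the window-start arithmetic is the usual "round down to a multiple of the window size", with no offset.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Record:
    value: Any      # the event itself (the payload)
    timestamp: int  # metadata in milliseconds, carried alongside the event

def window_start(timestamp: int, size: int) -> int:
    """Start of the tumbling window (no offset) that contains `timestamp`."""
    return timestamp - (timestamp % size)

def assign_windows(records, size):
    """Group payloads into [start, end) windows using only record metadata."""
    windows = {}
    for r in records:
        start = window_start(r.timestamp, size)
        windows.setdefault((start, start + size), []).append(r.value)
    return windows

# The payloads are plain strings with no timestamp field of their own; the
# timestamp could have come from, say, the message queue's metadata.
records = [Record("a", 1_000), Record("b", 4_999), Record("c", 5_000)]
windows = assign_windows(records, size=5_000)
print(windows)
```

Because the window logic reads only the metadata, it works the same way whether the timestamp was extracted from the event or copied from somewhere else.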

Aggregate two different types of records in Apache Flink

I have a specific task to join two data streams in one aggregation using Apache Flink with some additional logic.
Basically I have two data streams: a stream of events and a stream of so-called meta-events. I use Apache Kafka as a message backbone. What I'm trying to achieve is to trigger evaluation of the aggregation/window based on the information given in a meta-event. The basic scenario is:
1. The data stream of events starts to emit records of Type A.
2. The records keep accumulating in some aggregation or window based on some key.
3. The data stream of meta-events receives a new meta-event with the given key, which also defines the total number of events that will be emitted in the data stream of events.
4. The number of events from step 3 becomes the trigger criterion for the aggregation. Once the total count of Type A events with a given key equals the number defined in the meta-event for that key, the aggregation should be triggered for evaluation.
Steps 1 and 3 occur in non-deterministic order, so they can be reordered.
What I've tried is looking into Flink's Global Windows, but I'm not sure whether that would be a good and adequate solution. I'm also not sure whether such a problem has a solution in Apache Flink.
Any possible help is highly appreciated.
The simplistic answer is to .connect() the two streams, keyBy() the appropriate fields in each stream, and then run them into a custom KeyedCoProcessFunction. You'd save the current aggregation result & count in the left hand (Type A) stream state, and the target count in the right hand (meta-event) stream state, and generate results when the aggregation count == the target count.
But there is an issue here - what happens if you get N records in the Type A stream before you get the meta-event record for that key, and N > the target count? Essentially you either have to guarantee that doesn't happen, or you need to buffer Type A events (in state) until you get the meta-event record.
Though similar situations could occur if the meta-event target can be changed to a smaller value, of course.
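The state handling described above can be sketched in plain Python (this is not the KeyedCoProcessFunction API, only a model of its per-key state). The class and method names are hypothetical, and a running sum stands in for the aggregation. The sketch assumes the number of Type A events for a key never exceeds the target, so the N > target case is deliberately not handled.

```python
class CountTriggeredAggregator:
    """Per-key state: running sum, event count, and optional target count."""

    def __init__(self):
        self.state = {}  # key -> {"sum": ..., "count": ..., "target": ...}

    def _entry(self, key):
        return self.state.setdefault(key, {"sum": 0, "count": 0, "target": None})

    def _maybe_emit(self, key):
        entry = self.state[key]
        if entry["target"] is not None and entry["count"] == entry["target"]:
            del self.state[key]  # clear state once the result is produced
            return (key, entry["sum"])
        return None

    def on_event(self, key, value):
        """Left-hand input: a Type A event for `key`."""
        entry = self._entry(key)
        entry["sum"] += value
        entry["count"] += 1
        return self._maybe_emit(key)

    def on_meta(self, key, target):
        """Right-hand input: a meta-event carrying the target count."""
        self._entry(key)["target"] = target
        return self._maybe_emit(key)

agg = CountTriggeredAggregator()

# Meta-event arrives in the middle of the events.
results_k = [agg.on_event("k", 10), agg.on_meta("k", 3),
             agg.on_event("k", 20), agg.on_event("k", 5)]

# Meta-event arrives after all events.
results_j = [agg.on_event("j", 1), agg.on_event("j", 2), agg.on_meta("j", 2)]

print(results_k)
print(results_j)
```

Checking the trigger condition on both inputs is what makes the arrival order of events and meta-events irrelevant.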

Flink keyed windows watermark

I'm using flink with event time keyed windows.
It seems like some of the windows are not being emitted.
Is the watermark being advanced for each key individually?
For example, if my key is (id,type) and a specific pair of id and type is not being ingested at the source, will the watermark for its window not advance?
If this is the case, how can I make sure that all my keyed windows get evicted after some time? (We have many keys, so sending a periodic dummy message for each key is not an option.)
I'll appreciate any help
Flink has separate watermarks for each task (i.e., each parallel instance) -- otherwise there would have to be some sort of horribly expensive global coordination -- but not for each key. In the case of a keyed window, each instance of the window operator will be handling the events for some disjoint subset of the keyspace, and all of the windows for those keys will be using the same watermark.
Keep in mind that empty windows do not produce results. So if there is some key for which there are no events during a window, that window will not produce results for that key.
Or it could be that you have an idle source holding back the watermarks. If one of your source tasks becomes idle, then its watermark won't advance. You could inspect the current watermark in the web UI, and check to see if it is advancing in every task.
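A small model of how one task combines the watermarks of its inputs may help here. It is plain Python rather than Flink code, the OperatorTask class is hypothetical, and it assumes the task's watermark is the minimum over its non-idle input channels.

```python
class OperatorTask:
    """One parallel operator instance with several input channels."""

    def __init__(self, num_inputs):
        self.input_watermarks = [float("-inf")] * num_inputs
        self.idle = [False] * num_inputs

    def on_watermark(self, channel, watermark):
        # Watermarks never move backwards on a channel.
        self.input_watermarks[channel] = max(
            self.input_watermarks[channel], watermark
        )

    def mark_idle(self, channel):
        self.idle[channel] = True

    def current_watermark(self):
        # Simplification: with no active inputs, report "no progress yet".
        active = [wm for wm, idle in zip(self.input_watermarks, self.idle)
                  if not idle]
        return min(active) if active else float("-inf")

task = OperatorTask(num_inputs=2)

task.on_watermark(0, 100)
wm_one_silent = task.current_watermark()  # channel 1 has sent nothing yet

task.on_watermark(1, 40)
task.on_watermark(0, 200)
wm_held_back = task.current_watermark()   # slow channel 1 holds it at 40

task.mark_idle(1)
wm_after_idle = task.current_watermark()  # channel 1 is now ignored

print(wm_one_silent, wm_held_back, wm_after_idle)
```

All keys handled by this task share that single value, so a quiet key does not stall its own windows, but a silent input channel stalls every window on the task. In Flink, a WatermarkStrategy can be configured with withIdleness so that a source which stops producing is marked idle instead of holding the watermark back.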
