We are using Flink 1.9.1.
We have a source, a process function, and a sink. The application consumes from and produces to Kinesis.
The input rate (produced by a simulator) is 20 events per second, but the output rate of the process function shows 14 events per second. The back pressure status for the source is shown as OK (green). The number of events sent by the source and the number of events received by the process function also match, with very little delay.
But this count does not match the number of events pushed by the simulator; it matches the 14 per second rate.
Now my question is, does Flink regulate the input rate automatically?
In my case, how is the input rate controlled at 14 per second?
If it is not, is there any other metric that I should be looking at that I'm missing?
It's not possible to force a Flink pipeline to consume events at a particular rate. By design, there is limited buffering in the network stack, and the slowest task in the execution graph will dictate the rate at which the pipeline will consume and process events.
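To make the "slowest task dictates the rate" point concrete, here is a toy model in plain Java. It is not Flink code: the class name, the one-second tick, and the buffer capacity of 50 are assumptions chosen only for illustration. A source offers 20 events per tick, a downstream task drains 14 per tick, and a bounded buffer sits between them.

```java
// Toy model of a bounded buffer between a source and a slower downstream task.
// All names and numbers are illustrative assumptions, not Flink internals.
final class BackPressureSketch {

    /** Returns how many events the source managed to hand off after the given number of ticks. */
    static long eventsAccepted(int offeredPerTick, int drainedPerTick, int bufferCapacity, int ticks) {
        long accepted = 0;
        int buffered = 0;
        for (int t = 0; t < ticks; t++) {
            // The downstream task drains what it can from the buffer.
            int drained = Math.min(buffered, drainedPerTick);
            buffered -= drained;
            // The source can only hand off as many events as there is free buffer space.
            int free = bufferCapacity - buffered;
            int taken = Math.min(offeredPerTick, free);
            buffered += taken;
            accepted += taken;
        }
        return accepted;
    }
}
```

Once the buffer has filled, the source in this model hands off exactly 14 events per tick, no matter how many are offered, which is the same pattern as the 20-in, 14-out rates described in the question.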
The back pressure monitoring (that green OK signal) is not a definitive guide to whether back pressure is occurring. So long as the job is able to make steady forward progress, it probably won't indicate that there's a problem. You could examine some of the network queue metrics to get more insight: e.g., inPoolUsage, outPoolUsage, inputQueueLength. See Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing for a lot more on this topic.
20 events per second seems very slow, so I am a bit surprised that something can't keep up with that rate, but that appears to be what's happening.
I'm using KDA with a Flink job that should analyse messages emitted by different IoT device sources. There is a Kinesis stream with 4 shards, each of which contains more or less the same amount of data (there are no hot shards). The Kinesis stream is filled by AWS Greengrass Streammanager, which uses an increasing sequence number as the partition key. Each message contains a single value (something like temperature = 5).
With this setup, the stream read by the Kinesis consumer in Flink is unordered, but I need to preserve the order of the messages. To do so, I have written a small buffer function, which is more or less the logic from CepOperator, to buffer messages and restore the order. For this, the stream is keyed by the id of a message. Let's say a temperature message always has a unique id, and the stream is keyed by this id.
To create the respective watermarks, I'm using the FlinkKinesisConsumer and registering a BoundedOutOfOrdernessTimestampExtractor on it. With an out-of-orderness time of 10 seconds, everything works fine except that almost 50% of the events arrive late, which is not the desired behaviour. But if I increase the time to 60 seconds, the iterator of the Kinesis stream falls significantly behind (growing linearly over time). The documentation of the Kinesis consumer says a little about the settings here. I have also tried registering a JobManagerWatermarkTracker, but it does not seem to change the behaviour.
I do not understand why a higher out-of-orderness causes the iterator to fall increasingly behind, while a smaller setting drops a significant number of messages. What measures do I need to take to find the proper settings, or is my implementation wrong?
UPDATE:
While investigating the issue, I found that if the JobManagerWatermarkTracker isn't properly configured (I still don't understand how to configure it), the alignment to the global watermark stops subtasks from reading from the Kinesis stream, which causes the iterator to fall behind. I calculated a delta for how much "latency" a dropped event has and set this as the out-of-orderness (in this case 60 seconds). With the JobManagerWatermarkTracker deactivated, everything works as expected.
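The delta calculation mentioned above could look roughly like the following plain-Java sketch. It is not the code used in the job: the class and method names are hypothetical, and it simplifies by treating the watermark as "highest timestamp seen so far minus the bound", updated on every event rather than periodically.

```java
import java.util.List;

// Hypothetical helper for sizing a bounded out-of-orderness setting from sample data.
final class OutOfOrdernessEstimator {

    /** Largest gap between the highest timestamp seen so far and an event that arrives after it. */
    static long requiredBoundMillis(List<Long> timestampsInArrivalOrder) {
        long maxSeen = Long.MIN_VALUE;
        long worstLag = 0;
        for (long ts : timestampsInArrivalOrder) {
            if (ts > maxSeen) {
                maxSeen = ts;
            } else {
                worstLag = Math.max(worstLag, maxSeen - ts);
            }
        }
        return worstLag;
    }

    /** Counts events whose timestamp is below (maxSeen - bound) at the moment they arrive. */
    static int lateCount(List<Long> timestampsInArrivalOrder, long boundMillis) {
        long maxSeen = Long.MIN_VALUE;
        int late = 0;
        for (long ts : timestampsInArrivalOrder) {
            if (maxSeen != Long.MIN_VALUE && ts < maxSeen - boundMillis) {
                late++;
            }
            maxSeen = Math.max(maxSeen, ts);
        }
        return late;
    }
}
```

Running `lateCount` over a recorded sample with a few candidate bounds gives a rough idea of how many events each setting would treat as late.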
Furthermore, it seems that the AWS Greengrass Streammanager isn't optimal for such use cases: it distributes the load evenly across shards, but with an increasing number of shards this becomes a problem, since one temperature datapoint might be spread across all shards of a stream. That introduces a lot of unnecessary latency. I would appreciate any input on how to configure the JobManagerWatermarkTracker.
I see that there are a lot of discussions going on about adding support for watermarks per key. But does Flink support per-partition watermarks?
Currently, the minimum of all the watermarks (of non-idle partitions) is taken into account. Because of this, the last hanging records in a window are stuck as well (when the watermark is incremented using periodic emit).
Any info on this is really appreciated!
Some of the sources, such as the FlinkKafkaConsumer, support per-partition watermarking. You get this by calling assignTimestampsAndWatermarks on the source, rather than on the stream produced by the source.
With this, each consumer instance tracks the maximum timestamp within each partition, and takes as its watermark the minimum of these maximums, less the configured bounded out-of-orderness. Idle partitions will be ignored, if you configure it to do so.
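As a rough illustration of that arithmetic, here is a plain-Java sketch. It is not the connector's implementation; the class name, the map of per-partition maximums, and the explicit set of idle partitions are assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;

// Illustrative only: combines per-partition maximum timestamps into one watermark.
final class PerPartitionWatermarkSketch {

    /**
     * Returns min(max timestamp per active partition) - outOfOrdernessMillis,
     * or Long.MIN_VALUE if every partition is idle (no watermark can be derived).
     */
    static long watermark(Map<Integer, Long> maxTimestampPerPartition,
                          Set<Integer> idlePartitions,
                          long outOfOrdernessMillis) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (Map.Entry<Integer, Long> entry : maxTimestampPerPartition.entrySet()) {
            if (idlePartitions.contains(entry.getKey())) {
                continue; // idle partitions do not hold the watermark back
            }
            anyActive = true;
            min = Math.min(min, entry.getValue());
        }
        return anyActive ? min - outOfOrdernessMillis : Long.MIN_VALUE;
    }
}
```

With maximums of 10000, 7000 and 2000 and a bound of 1000, the watermark is 1000. If the partition stuck at 2000 is marked idle, it rises to 6000.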
Not only does this yield more accurate watermarking, but if your events are in-order within each partition, this also makes it possible to take advantage of the WatermarkStrategy.forMonotonousTimestamps() strategy.
See Watermark Strategies and the Kafka Connector for more details.
As for why the last window isn't being triggered, this is related to watermarking, but not to per-partition watermarking. The problem is simply that windows are triggered by watermarks, and the watermarks are trailing behind the timestamps in the events. So the watermarks can never catch up to the final events, and can never trigger the last window.
This isn't a problem for unbounded streaming jobs, since they never stop and never have a last window. And it isn't a problem for batch jobs, since they are aware of all of the data. But for bounded streaming jobs, you need to do something to work around this issue. Broadly speaking, what you must do is to inform Flink that the input stream has ended -- whenever the Flink sources detect that they have reached the end of an event-time-based input stream, they emit one last watermark whose value is MAX_WATERMARK, and this will trigger any open windows.
One way to do this is to use a KafkaDeserializationSchema with an implementation of isEndOfStream that returns true when the job reaches its end.
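The effect of that final watermark can be seen in a toy model written in plain Java. This is not Flink's window operator: the class name, the event-counting windows, and the rule "a window fires when the watermark reaches its last timestamp" are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy event-time tumbling windows that fire only when a watermark passes their end.
final class FinalWatermarkSketch {
    private final long windowSizeMillis;
    // window start -> number of events buffered in that window
    private final TreeMap<Long, Integer> openWindows = new TreeMap<>();
    final List<Long> firedWindowStarts = new ArrayList<>();

    FinalWatermarkSketch(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    void onEvent(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, windowSizeMillis);
        openWindows.merge(start, 1, Integer::sum);
    }

    void onWatermark(long watermark) {
        // A window [start, start + size) fires once the watermark reaches start + size - 1.
        while (!openWindows.isEmpty()
                && openWindows.firstKey() + windowSizeMillis - 1 <= watermark) {
            firedWindowStarts.add(openWindows.pollFirstEntry().getKey());
        }
    }
}
```

With 10-second windows and a watermark that trails each event by 2 seconds, events at 1000 and 12000 fire only the first window. The second window stays open until `onWatermark(Long.MAX_VALUE)` is called, which plays the role of the final MAX_WATERMARK here.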
I have a pipeline where I'm applying transformation rules (from broadcast state) to a stream of events. When I run the broadcast stream and the original stream in parallel without connecting them, stream performance is really good, but the moment I broadcast, performance goes down drastically. How can I achieve better performance? The data passed between operators is byte[], and the data footprint is small as well.
I've attached snapshots of both scenarios:
1. The top row shows the stream consuming events from Kafka, and the bottom row shows rules consumed from another topic. With this setup I could achieve a throughput of up to ~20K msg/sec per task manager, processing 12Gb of data in 4 mins.
2. I've connected the broadcast stream with the data stream for future processing. Note that, to measure only the performance of the broadcast, I've made sure no records are consumed in the data stream (top row). On the processing side of the broadcast state, I'm only storing received messages in MapState. With this setup I can get a throughput of up to ~1000 msg/sec per task manager, processing 12Gb of data in 18 mins.
You've done more than simply connect the broadcast and keyed streams. Before, each event went through just one network shuffle (the rebalance, hash, and broadcast connections), and now there are four or five shuffles for each event.
Every shuffle is expensive. Try to reduce the number of times you change parallelism or use keyBy.
Is there a way to enforce a steady poll rate using the google-cloud-pubsub client? I want to avoid scenarios where a spike in the publish rate causes the pull request rate to increase as well.
The client provides FlowControl settings, such as the maximum number of outstanding messages. From my understanding, this sets the max batch size during a pull operation.
I want to understand how to create a constant pull rate, say 1000 RPS.
Message Flow Control can be used to set the maximum number of messages being processed at a given time (i.e., setting max_messages in the case of the python client), which indirectly sets the maximum rate at which messages are received.
While it doesn’t allow you to directly set the exact number of messages received per second (that would depend on the time it takes to process a message and the number of messages being processed), it should prevent a spike in the publish rate from turning into a matching spike in the receive rate.
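The indirect limit can be estimated with a one-line calculation. The following plain-Java sketch is a back-of-the-envelope bound, not part of any Pub/Sub client library; the class and method names are hypothetical.

```java
// Rough upper bound on throughput when at most N messages may be in flight at once.
final class FlowControlBound {

    /** With N messages outstanding and each taking T seconds on average, at most N / T complete per second. */
    static double maxMessagesPerSecond(long maxOutstandingMessages, double avgProcessingSeconds) {
        if (avgProcessingSeconds <= 0) {
            throw new IllegalArgumentException("avgProcessingSeconds must be positive");
        }
        return maxOutstandingMessages / avgProcessingSeconds;
    }
}
```

For example, 500 outstanding messages at half a second each caps the receive rate at about 1000 messages per second, however fast the publisher is.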
If you really need to set a rate in messages received per second, AFAIK it’s not made available directly on the client libraries, so you’d have to implement it yourself using an asynchronous pull and using some timers to acknowledge the messages at your desired rate.
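One possible shape for such a do-it-yourself pacer is sketched below in plain Java. It does not use the Pub/Sub client library, and every name in it is hypothetical. The clock is passed in as a parameter, so the logic stays deterministic and easy to test.

```java
// Hypothetical pacer: spaces out message handling (or acks) to a fixed rate.
final class AckPacer {
    private final long intervalNanos;
    private long nextAllowedNanos;

    AckPacer(double permitsPerSecond, long startNanos) {
        this.intervalNanos = (long) (1_000_000_000L / permitsPerSecond);
        this.nextAllowedNanos = startNanos;
    }

    /**
     * Reserves the next free slot and returns how many nanoseconds the caller
     * should wait before handling (or acknowledging) the message.
     */
    synchronized long reserve(long nowNanos) {
        long slot = Math.max(nextAllowedNanos, nowNanos);
        nextAllowedNanos = slot + intervalNanos;
        return slot - nowNanos;
    }
}
```

In a message callback, one would call `reserve(System.nanoTime())` and delay the acknowledgement by the returned duration, for example with a `ScheduledExecutorService`. Combined with a flow control limit on outstanding messages, this keeps the effective rate near the configured value.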
Suppose I have a data stream that contains event-time data. I want to collect the input stream into time windows of 8 milliseconds and reduce the data of every window. I do that using the following code:
aggregatedTuple
    .keyBy(0)
    .timeWindow(Time.milliseconds(8))
    .reduce(new ReduceFunction<Tuple2<Long, JSONObject>>()
Note: The key of the data stream is the processing-time timestamp rounded down to the nearest multiple of 8 milliseconds; for example, 1531569851297 is mapped to 1531569851296.
But data may arrive late and enter the wrong time window. For example, suppose I set the window time to 8 milliseconds. If data enters the Flink engine in order, or at least with a delay smaller than the window time (8 milliseconds), that is the best case. But suppose an event (its event time is also a field in the data stream) arrives with a latency of 30 milliseconds. It will then enter the wrong window. I think that if I check the event time of every event as it is about to enter the window, I can filter out such late data.
So I have two questions:
How can I filter data stream as it wants to enter the window and check if the data created at the right timestamp for the window?
How can I gather such late data in a variable to do some processing on them?
Flink has two different, related abstractions that deal with different aspects of computing windowed analytics on streams with event-time timestamps: watermarks and allowed lateness.
First, watermarks, which come into play whenever working with event-time data (whether or not you are using windows). Watermarks provide information to Flink about the progress of event-time, and give you, the application writer, a means of coping with out-of-order data. Watermarks flow with the data stream, and each one marks a position in the stream and carries a timestamp. A watermark serves as an assertion that at that point in the stream, the stream is now (probably) complete up to that timestamp -- or in other words, the events that follow the watermark are unlikely to be from before the time indicated by the watermark. The most common watermarking strategy is to use a BoundedOutOfOrdernessTimestampExtractor, which assumes that events arrive within some fixed, bounded delay.
This now provides a definition of lateness -- events that follow a watermark with timestamps less than the watermark's timestamp are considered late.
The window API provides a notion of allowed lateness, which is set to zero by default. If allowed lateness is greater than zero, then the default Trigger for event-time windows will accept late events into their appropriate windows, up to the limit of the allowed lateness. The window action will fire once at the usual time, and then again for each late event, up to the end of the allowed lateness interval. After that, late events are discarded (or collected to a side output if one is configured).
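The three possible outcomes for an event can be summarised in a small plain-Java sketch. It only mirrors the rules described above for tumbling windows; it is not Flink's window operator, and the class, enum, and parameter names are assumptions.

```java
// Illustrative classification of an event relative to the watermark and allowed lateness.
final class LatenessSketch {
    enum Outcome { ON_TIME, LATE_BUT_ACCEPTED, DROPPED_OR_SIDE_OUTPUT }

    static Outcome classify(long eventTimestamp, long windowSizeMillis,
                            long currentWatermark, long allowedLatenessMillis) {
        long windowStart = eventTimestamp - Math.floorMod(eventTimestamp, windowSizeMillis);
        long windowMaxTimestamp = windowStart + windowSizeMillis - 1;
        if (windowMaxTimestamp + allowedLatenessMillis <= currentWatermark) {
            // The window has already been cleaned up: the event is discarded or side-output.
            return Outcome.DROPPED_OR_SIDE_OUTPUT;
        }
        if (windowMaxTimestamp <= currentWatermark) {
            // The window has already fired once, but still accepts the event and fires again.
            return Outcome.LATE_BUT_ACCEPTED;
        }
        return Outcome.ON_TIME;
    }
}
```

For an 8 ms window, an event with timestamp 100 belongs to the window covering 96 to 103. With a watermark of 110 it is dropped when allowed lateness is 0, but still accepted when allowed lateness is 30.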
How can I filter data stream as it wants to enter the window and check if the data created at the right timestamp for the window?
Flink's window assigners are responsible for assigning events to the appropriate windows -- the right thing will happen automatically. New window instances will be created as needed.
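For tumbling windows with no offset, the assignment comes down to simple arithmetic on the timestamp. The plain-Java sketch below shows that arithmetic. It is an illustration rather than Flink's assigner code, and the class and method names are made up for this example.

```java
// Illustrative arithmetic for tumbling windows aligned to multiples of the window size.
final class TumblingWindowSketch {

    /** Start (inclusive) of the window that contains the timestamp. */
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    /** End (exclusive) of the window that contains the timestamp. */
    static long windowEnd(long timestampMillis, long windowSizeMillis) {
        return windowStart(timestampMillis, windowSizeMillis) + windowSizeMillis;
    }
}
```

With 8 ms windows, the timestamp 1531569851297 from the question falls into the window starting at 1531569851296, so there is no need to compute that value by hand and use it as a key.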
How can I gather such late data in a variable to do some processing on them?
You can be sufficiently generous in your watermarking so as to avoid having any late data, and/or configure the allowed lateness to be long enough to accommodate the late events. Be aware, however, that Flink will be forced to keep all windows open that are still accepting late events, which will delay garbage collecting old windows and may consume considerable memory.
Note that this discussion assumes you want to work with time windows -- e.g. the 8msec long windows you are working with. Flink also supports count windows (e.g. group events into batches of 100), session windows, and custom window logic. Watermarks and lateness don't play any role if you are using count windows, for example.
If you want per-key results for your analytics, then use keyBy to partition the stream by key (e.g., by userId) before applying windowing. For example,
stream
    .keyBy(e -> e.userId)
    .timeWindow(Time.seconds(10))
    .reduce(...)
will produce separate results for each userId.
Update: Note that in recent versions of Flink it is now possible for windows to collect late events to a side output.
Some relevant documentation:
Event Time and Watermarks
Allowed Lateness