Non-blocking streaming on Flink - apache-flink

Hi, I'm trying to run a Flink job that should process incoming data as shown below. In the process operator right after keyBy(), some records take a very long time to process, depending on a property of the data. Even though incoming records have different ids (which are used to keyBy() the stream), the long-running code in the process function blocks other incoming data; in fact, it blocks the entire stream.
SingleOutputStreamOperator<Envelope> processingStream = deviceStream
.map(e -> (Envelope) e)
.keyBy((KeySelector<Envelope, String>) value -> value.eventId) // key by scenarios
.process(new RuleProcessFunction());
In RuleProcessFunction.java:
...
@Override
public void processElement(Envelope value, Context ctx, Collector<Envelope> out) throws Exception {
    //handleEvent(value, ctx, out);
    if (value.getEventId().equals("I")) {
        System.out.println("hello i");
        // busy loop simulating a very long computation for events with id "I"
        for (long i = 0; i < 10000000000L; i++) {
        }
    }
    out.collect(value);
}
I expect the long-running code block not to block the entire stream. I know there is AsyncFunction for blocking I/O situations, but I don't know whether it's the correct solution for this.

Since you aren't pulling data from an external database like Cassandra, I don't think you need to use an AsyncFunction.
What could be happening is that you are running the Flink job with a parallelism of 1. keyBy() does not give each key its own thread: each parallel instance of the process function handles all of the keys assigned to it one record at a time, so a slow record delays every other key on the same instance. Try increasing the parallelism so that one core isn't responsible for all of the processing as well as for receiving data. Granted, there can still be backpressure if you do this: if the core responsible for ingesting data from the source reads data faster than the core(s) running the processFunction, Flink's backpressure handling will slow the rate of ingestion.
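To illustrate why one slow key can stall others, here is a plain-Java sketch (no Flink dependency) of a simplified key-to-subtask assignment. It is a deliberately simplified model: Flink really hashes keys into key groups with a murmur hash and maps key groups to subtasks, whereas this sketch just uses `Math.floorMod(key.hashCode(), parallelism)`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of keyBy(): NOT Flink's real partitioner (Flink uses key groups
// and a murmur hash). It only shows that keys sharing a subtask are handled one
// record at a time by the same thread.
class KeyPartitionDemo {

    // Hypothetical stand-in for Flink's key-to-subtask mapping.
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Keys that land on the same subtask as slowKey and therefore wait behind it.
    static List<String> keysBlockedBy(String slowKey, List<String> keys, int parallelism) {
        int slowSubtask = subtaskFor(slowKey, parallelism);
        List<String> blocked = new ArrayList<>();
        for (String key : keys) {
            if (!key.equals(slowKey) && subtaskFor(key, parallelism) == slowSubtask) {
                blocked.add(key);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        List<String> eventIds = List.of("A", "B", "C", "D", "I");
        // With parallelism 1, every other key waits behind the slow "I" record.
        System.out.println("parallelism 1: " + keysBlockedBy("I", eventIds, 1));
        // With parallelism 4, only keys hashed to the same subtask as "I" wait.
        System.out.println("parallelism 4: " + keysBlockedBy("I", eventIds, 4));
    }
}
```

With this toy hash, raising the parallelism from 1 to 4 shrinks the set of keys stuck behind "I" from all four others to just "A". The exact assignment differs in a real job, but only keys that share the slow subtask are delayed directly; backpressure can still slow the source, as noted above.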

Related

Flink 1.12.x DataSet --> Flink 1.14.x DataStream

I am trying to migrate from the Flink 1.12.x DataSet API to the Flink 1.14.x DataStream API. mapPartition is not available in the Flink DataStream API.
Our code using the Flink 1.12.x DataSet API:
dataset
.<few operations>
.mapPartition(new SomeMapPartitionFn())
.<few more operations>
public static class SomeMapPartitionFn extends RichMapPartitionFunction<InputModel, OutputModel> {
@Override
public void mapPartition(Iterable<InputModel> records, Collector<OutputModel> out) throws Exception {
for (InputModel record : records) {
/*
do some operation
*/
if (/* some condition based on processing *MULTIPLE* records */) {
out.collect(...); // Conditional collect ---> (1)
}
}
// At the end of the data, collect
out.collect(...); // Collect processed data ---> (2)
}
}
(1) Collector.collect is invoked conditionally, after processing a few records.
(2) Collector.collect is invoked at the end of the data.
Initially we thought of using flatMap instead of mapPartition, but the collector is not available in the close function.
https://issues.apache.org/jira/browse/FLINK-14709 (only available in the case of chained drivers)
How can this be implemented with the Flink 1.14.x DataStream API? Please advise.
Note: our application works only with a finite set of data (batch mode).
In Flink's DataSet API, a MapPartitionFunction has two parameters: an iterator for the input and a collector for the result of the function. A MapPartitionFunction in a Flink DataStream program would never return from the first function call, because the iterator would iterate over an endless stream of records. However, Flink's internal stream processing model requires that user functions return in order to checkpoint function state. Therefore, the DataStream API does not offer a mapPartition transformation.
To implement similar functionality, you need to define a window over the stream. Windows discretize streams, which is somewhat similar to mini-batches, but windows offer much more flexibility.
Solution provided by Zhipeng:
One solution could be to use a stream operator that implements the BoundedOneInput interface. Its endInput() method is called once the bounded input has been fully consumed, which is where the final collect (2) can happen.
Example code can be found here [1].
[1]
https://github.com/apache/flink-ml/blob/56b441d85c3356c0ffedeef9c27969aee5b3ecfc/flink-ml-core/src/main/java/org/apache/flink/ml/common/datastream/DataStreamUtils.java#L75
Flink user mailing list thread: https://lists.apache.org/thread/ktck2y96d0q1odnjjkfks0dmrwh7kb3z
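The BoundedOneInput suggestion amounts to splitting the old mapPartition body into a per-record method and an end-of-input method. The plain-Java sketch below models only that lifecycle, without Flink; `SimpleCollector`, `GroupSumOperator` and the "emit every 3 records" condition are hypothetical stand-ins. In a real job the class would be a stream operator implementing OneInputStreamOperator and BoundedOneInput (see the linked DataStreamUtils code for an actual implementation).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a bounded-input operator's lifecycle: processElement() per
// record, then one endInput() call once the bounded input is exhausted (the hook
// that BoundedOneInput provides). This is NOT a Flink operator.
class MapPartitionDemo {

    // Hypothetical stand-in for Flink's Collector; only collect() is modelled.
    interface SimpleCollector<T> {
        void collect(T record);
    }

    static final class GroupSumOperator {
        private final SimpleCollector<Long> out;
        private final int groupSize; // hypothetical condition spanning several records
        private long groupSum = 0;
        private long total = 0;
        private int seenInGroup = 0;

        GroupSumOperator(SimpleCollector<Long> out, int groupSize) {
            this.out = out;
            this.groupSize = groupSize;
        }

        void processElement(long value) {
            groupSum += value;
            total += value;
            seenInGroup++;
            if (seenInGroup == groupSize) { // conditional collect, like (1)
                out.collect(groupSum);
                groupSum = 0;
                seenInGroup = 0;
            }
        }

        void endInput() { // end-of-data collect, like (2)
            out.collect(total);
        }
    }

    public static void main(String[] args) {
        List<Long> emitted = new ArrayList<>();
        GroupSumOperator op = new GroupSumOperator(emitted::add, 3);
        for (long v = 1; v <= 7; v++) {
            op.processElement(v);
        }
        op.endInput();
        System.out.println(emitted); // [6, 15, 28]
    }
}
```

State that must survive until endInput() (here `total`) lives in the operator itself, which is exactly what the stateless flatMap could not offer.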

Flink and Kinesis stream app for non-continuous data

We've built a Flink app to process data from a Kinesis stream. The execution flow of the app contains basic operations for filtering data based on registered types, assigning watermarks based on event timestamps, and map, process and aggregate functions applied on 5-minute windows of data, as shown below:
final SingleOutputStreamOperator<Object> inputStream = env.addSource(consumer)
.setParallelism(..)
.filter(..)
.assignTimestampsAndWatermarks(..);
// Processing flow
inputStream
.map(..)
.keyBy(..)
.window(..)
.sideOutputLateData(outputTag)
.aggregate(aggregateFunction, processWindowFunction);
// store processed data to external storage
AsyncDataStream.unorderedWait(...);
Reference code for my watermark assigner:
@Override
public void onEvent(@NonNull final MetricSegment metricSegment,
                    final long eventTimestamp,
                    @NonNull final WatermarkOutput watermarkOutput) {
    if (eventTimestamp > eventMaxTimestamp) {
        currentMaxTimestamp = Instant.now().toEpochMilli();
    }
    eventMaxTimestamp = Math.max(eventMaxTimestamp, eventTimestamp);
}

@Override
public void onPeriodicEmit(@NonNull final WatermarkOutput watermarkOutput) {
    final Instant maxEventTimestamp = Instant.ofEpochMilli(eventMaxTimestamp);
    final Duration timeElapsed = Duration.between(Instant.ofEpochMilli(lastCurrentTimestamp), Instant.now());
    if (timeElapsed.getSeconds() >= emitWatermarkIntervalSec) {
        final long watermarkTimestamp = maxEventTimestamp.plus(1, ChronoUnit.MINUTES).toEpochMilli();
        watermarkOutput.emitWatermark(new Watermark(watermarkTimestamp));
    }
}
This app used to perform well (latency on the order of a few seconds). However, there was recently a change in the upstream system, after which data is published to the Kinesis stream in bursts (only for 2-3 hours every day). Since this change, we have seen a huge spike in the latency of our app. We measure latency with a Flink gauge: the start time is recorded in the first filter method, and the async method emits the difference between the current timestamp and that start timestamp. Is there any issue with using Flink apps with a Kinesis stream for bursty traffic or a non-continuous stream of data?
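Since the input stream is now idle for long periods of time, this is probably creating situations where the watermarks are held up: your generator only advances eventMaxTimestamp when an event arrives, so when the stream goes quiet the watermark stops moving, and any window whose end the watermark has not yet passed cannot fire. If this is the case, then I would expect to see a lot of variance in the latency, as it would (probably) only be the final windows of each burst whose results are delayed until the arrival of the next burst.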
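If held-up watermarks are the cause, one possible remedy is a generator that also advances the watermark while the input is idle. Below is a plain-Java sketch of that idea, not Flink's WatermarkGenerator API; the class names, the idle timeout, the injectable clock and the assumption that event time roughly follows wall-clock time are all hypothetical. In a real job, the same logic would sit in onEvent/onPeriodicEmit.

```java
import java.util.function.LongSupplier;

// Plain-Java sketch, not Flink's WatermarkGenerator interface: a bounded-out-of-orderness
// watermark that keeps advancing with the wall clock once the input has been idle
// longer than idleTimeoutMs. ASSUMPTION: event time roughly tracks wall-clock time,
// so moving event time forward while no data arrives is acceptable.
class IdleAwareWatermarkDemo {

    static final class Generator {
        private final long maxOutOfOrdernessMs;
        private final long idleTimeoutMs;
        private final LongSupplier wallClock;
        private long maxEventTs = Long.MIN_VALUE;
        private long lastEventWallClock;
        private long lastWatermark = Long.MIN_VALUE;

        Generator(long maxOutOfOrdernessMs, long idleTimeoutMs, LongSupplier wallClock) {
            this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
            this.idleTimeoutMs = idleTimeoutMs;
            this.wallClock = wallClock;
            this.lastEventWallClock = wallClock.getAsLong();
        }

        // Counterpart of onEvent(): remember the largest event time and when data arrived.
        void onEvent(long eventTs) {
            maxEventTs = Math.max(maxEventTs, eventTs);
            lastEventWallClock = wallClock.getAsLong();
        }

        // Counterpart of onPeriodicEmit(): the watermark to emit, never decreasing.
        long currentWatermark() {
            if (maxEventTs == Long.MIN_VALUE) {
                return lastWatermark; // no events yet, nothing meaningful to emit
            }
            long candidate = maxEventTs - maxOutOfOrdernessMs;
            long idleMs = wallClock.getAsLong() - lastEventWallClock;
            if (idleMs > idleTimeoutMs) {
                // Input looks idle: let event time advance along with the wall clock.
                candidate += idleMs - idleTimeoutMs;
            }
            lastWatermark = Math.max(lastWatermark, candidate);
            return lastWatermark;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake wall clock so the example is deterministic
        Generator gen = new Generator(1_000L, 5_000L, () -> now[0]);
        gen.onEvent(10_000L);
        System.out.println(gen.currentWatermark()); // 9000: normal bounded delay
        now[0] = 8_000L; // 8 s without data, 3 s past the idle timeout
        System.out.println(gen.currentWatermark()); // 12000: advanced while idle
    }
}
```

The trade-off: if the next burst carries events older than the advanced watermark, they become late data and end up in the late-data side output the job already defines.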

How to handle transient/application failures in Apache Flink?

My Flink processor listens to Kafka, and the business logic in the processor involves calling external REST services, which may be down. I would like to replay the tuple back into the processor. Is there any way to do this? I have used Storm, where we can fail the tuple so that it is not acknowledged, and the same tuple is then replayed to the processor.
In Flink, the tuple is acknowledged automatically once the message is consumed by the Flink Kafka consumer. There are ways to solve this; one is to publish the message back to the same queue or to a retry queue. But I am looking for a solution similar to Storm's.
I know that Flink's savepoints/checkpoints are used for fault tolerance. But in my understanding, the tuples will be replayed only in case of a Flink failure. I would like to get ideas on how to handle transient failures.
Thank you
When interacting with external systems, I would recommend using Flink's async I/O operator. It allows you to execute asynchronous tasks without blocking the execution of an operator.
If you want to retry failed operations without restarting the Flink job from the last successful checkpoint, then I would suggest implementing the retry policy yourself. It could look like the following:
new AsyncFunction<IN, OUT>() {
    @Override
    public void asyncInvoke(IN input, ResultFuture<OUT> resultFuture) throws Exception {
        // Retry triggerAsyncOperation every second until decideWhetherToRetry accepts
        // the result or the 10-second deadline expires.
        // triggerAsyncOperation and decideWhetherToRetry are your own methods (not shown).
        FutureUtils
            .retrySuccessfulWithDelay(
                () -> triggerAsyncOperation(input),
                Time.seconds(1L),
                Deadline.fromNow(Duration.ofSeconds(10L)),
                this::decideWhetherToRetry,
                new ScheduledExecutorServiceAdapter(new DirectScheduledExecutorService()))
            .whenComplete((result, throwable) -> {
                if (throwable == null) {
                    resultFuture.complete(Collections.singleton(result));
                } else {
                    resultFuture.completeExceptionally(throwable);
                }
            });
    }
}
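Here triggerAsyncOperation encapsulates your asynchronous operation and decideWhetherToRetry encapsulates your retry strategy. decideWhetherToRetry acts as the acceptance predicate: if it returns true, resultFuture is completed with the value of that operation attempt; if it returns false, the operation is retried after the delay until the deadline expires. Note that retrySuccessfulWithDelay only retries attempts that completed normally but were rejected by the predicate; an attempt that completes exceptionally fails the result right away, so triggerAsyncOperation should turn a failed call into a result that the predicate can reject.
If resultFuture is completed exceptionally (for example, because the deadline expired), it will trigger a failover, which will cause the job to restart from the last successful checkpoint.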
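The retry policy can also be sketched with only the JDK, which makes the control flow explicit. This is not Flink's FutureUtils: the method names, the attempt limit (used instead of a Deadline) and the simulated flaky service are assumptions for illustration. This sketch retries both exceptional failures and rejected results. Inside an AsyncFunction, you would call retryWithDelay from asyncInvoke and complete the ResultFuture in whenComplete, as in the code above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Plain-JDK sketch of "retry with delay" for an asynchronous call. NOT Flink's FutureUtils;
// retries are bounded by an attempt count instead of a Deadline.
class AsyncRetryDemo {

    static <T> CompletableFuture<T> retryWithDelay(
            Supplier<CompletableFuture<T>> operation, // the async call, e.g. a REST request
            Predicate<T> accept,                      // true = keep this result, false = retry
            int maxAttempts,
            long delayMs,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, accept, maxAttempts, delayMs, scheduler, result, 1);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> operation, Predicate<T> accept, int maxAttempts,
            long delayMs, ScheduledExecutorService scheduler, CompletableFuture<T> result, int attemptNo) {
        operation.get().whenComplete((value, error) -> {
            if (error == null && accept.test(value)) {
                result.complete(value);
            } else if (attemptNo >= maxAttempts) {
                // Give up; in an AsyncFunction this becomes resultFuture.completeExceptionally(...).
                result.completeExceptionally(error != null ? error
                        : new IllegalStateException("result rejected after " + attemptNo + " attempts"));
            } else {
                // Schedule the next attempt without blocking the calling thread.
                scheduler.schedule(
                        () -> attempt(operation, accept, maxAttempts, delayMs, scheduler, result, attemptNo + 1),
                        delayMs, TimeUnit.MILLISECONDS);
            }
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical flaky service: the first two calls fail, the third succeeds.
        Supplier<CompletableFuture<String>> flakyCall = () -> calls.incrementAndGet() < 3
                ? CompletableFuture.<String>failedFuture(new RuntimeException("service down"))
                : CompletableFuture.completedFuture("ok");
        String value = retryWithDelay(flakyCall, v -> true, 5, 100L, scheduler)
                .orTimeout(5, TimeUnit.SECONDS)
                .join();
        System.out.println(value + " after " + calls.get() + " calls"); // ok after 3 calls
        scheduler.shutdown();
    }
}
```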

Buffering transformed messages(example, 1000 count) using Apache Flink stream processing

I'm using Apache Flink for stream processing.
After consuming messages from a source (e.g. Kafka, AWS Kinesis Data Streams) and applying transformations, aggregations, etc. with Flink operators on the streaming data, I want to buffer the final messages (e.g. 1000 of them) and post each batch in a single request to an external REST API.
How can I implement a buffering mechanism (grouping every 1000 records into a batch) in Apache Flink?
Flink pipeline: streaming source --> transform/reduce using operators --> buffer 1000 messages --> post to REST API
Appreciate your help!
I'd create a sink with state that would hold on to the messages that are passed in. When the count gets high enough (1000) the sink sends the batch. The state can be in memory (e.g. an instance variable holding an ArrayList of messages), but you should use checkpoints so that you can recover that state in case of a failure of some kind.
When your sink has checkpointed state, it needs to implement CheckpointedFunction (in org.apache.flink.streaming.api.checkpoint), which means you need to add two methods to your sink:
@Override
public void snapshotState(FunctionSnapshotContext context) throws Exception {
    checkpointedState.clear();
    // HttpSinkStateItem is a user-written class
    // that just holds a collection of messages (Strings, in this case)
    //
    // buffer is declared as ArrayList<String> and initialized to new ArrayList<>()
    checkpointedState.add(new HttpSinkStateItem(buffer));
}

@Override
public void initializeState(FunctionInitializationContext context) throws Exception {
    // Mix and match different kinds of states as needed:
    // - Use context.getOperatorStateStore() to get basic (non-keyed) operator state
    // - types are list and union
    // - Use context.getKeyedStateStore() to get state for the current key (only for processing keyed streams)
    // - types are value, list, reducing, aggregating and map
    // - Distinguish between state data using state name (e.g. "HttpSink-State")
    ListStateDescriptor<HttpSinkStateItem> descriptor =
        new ListStateDescriptor<>(
            "HttpSink-State",
            HttpSinkStateItem.class);
    checkpointedState = context.getOperatorStateStore().getListState(descriptor);
    if (context.isRestored()) {
        // After rescaling, this subtask may receive several items,
        // so append them all instead of keeping only the last one.
        for (HttpSinkStateItem item : checkpointedState.get()) {
            buffer.addAll(item.getPending());
        }
    }
}
You can also use a timer (which requires the input stream to be keyed/partitioned) to send the batch periodically if the count doesn't reach your threshold.
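The count-based buffering described above can be modeled in plain Java to check the flush and restore logic before wiring it into Flink. In this sketch, the Flink-specific parts (SinkFunction, CheckpointedFunction, ListState) are replaced by ordinary methods, and the `poster` callback is a hypothetical stand-in for the REST call. The restore step appends every entry because operator list state can be redistributed across subtasks on restore, so one subtask may receive several entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of a buffering sink; NOT a Flink SinkFunction.
class BatchingBufferDemo {

    static final class BatchingBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> poster; // hypothetical stand-in for the REST call
        private final List<T> buffer = new ArrayList<>();

        BatchingBuffer(int batchSize, Consumer<List<T>> poster) {
            this.batchSize = batchSize;
            this.poster = poster;
        }

        // Like invoke(): buffer the message and post once a full batch is collected.
        void add(T message) {
            buffer.add(message);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // Post whatever is buffered (also usable from a timer or at shutdown).
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            poster.accept(new ArrayList<>(buffer)); // hand over a copy, then reset
            buffer.clear();
        }

        // Like snapshotState(): the pending messages that must survive a failure.
        List<T> snapshot() {
            return new ArrayList<>(buffer);
        }

        // Like initializeState() on restore: append every restored entry.
        void restore(List<List<T>> snapshots) {
            for (List<T> pending : snapshots) {
                buffer.addAll(pending);
            }
        }

        int pendingCount() {
            return buffer.size();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> posted = new ArrayList<>();
        BatchingBuffer<Integer> sink = new BatchingBuffer<>(3, posted::add);
        for (int i = 1; i <= 7; i++) {
            sink.add(i);
        }
        System.out.println(posted);          // [[1, 2, 3], [4, 5, 6]]
        System.out.println(sink.snapshot()); // [7]
    }
}
```

With a batch size of 1000 instead of 3, the same structure gives the "buffer 1000 messages, then post" step of the pipeline.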

Flink Running out of Memory

I have some fairly simple stream code that aggregates data via time windows. The windows are on the large side (1 hour, with a 2-hour out-of-orderness bound), and the values in the streams are metrics coming from hundreds of servers. I keep running out of memory, so I added the RocksDBStateBackend. This caused the JVM to segfault. Next I tried the FsStateBackend. Neither backend ever wrote any data to disk; each simply created a directory named after the JobID. I'm running this code in standalone mode, not deployed. Any thoughts as to why the state backends aren't writing data, and why the job runs out of memory even when given 8 GB of heap?
final SingleOutputStreamOperator<Metric> metricStream =
objectStream.map(node -> new Metric(node.get("_ts").asLong(), node.get("_value").asDouble(), node.get("tags"))).name("metric stream");
final WindowedStream<Metric, String, TimeWindow> hourlyMetricStream = metricStream
.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Metric>(Time.hours(2)) { // set how long metrics can come late
@Override
public long extractTimestamp(final Metric metric) {
return metric.get_ts() * 1000; // needs to be in ms since Java epoch
}
})
.keyBy(metric -> metric.getMetricName()) // key the stream so we can run the windowing in parallel
.timeWindow(Time.hours(1)); // setup the time window for the bucket
// create a stream for each type of aggregation
hourlyMetricStream.sum("_value") // we want to sum by the _value
.addSink(new MetricStoreSinkFunction(parameters, "sum"))
.name("hourly sum stream")
.setParallelism(6);
hourlyMetricStream.aggregate(new MeanAggregator())
.addSink(new MetricStoreSinkFunction(parameters, "mean"))
.name("hourly mean stream")
.setParallelism(6);
hourlyMetricStream.aggregate(new ReMedianAggregator())
.addSink(new MetricStoreSinkFunction(parameters, "remedian"))
.name("hourly remedian stream")
.setParallelism(6);
env.execute("flink test");
It is tough to say why you would run out of memory unless you have a very large number of metric names (that is the only explanation I can come up with based on the code you posted).
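A rough, back-of-the-envelope way to see why the number of metric names matters (an estimate under simplifying assumptions, not a measurement): with tumbling windows of size W and an out-of-orderness bound B, a window only fires once the watermark (maximum timestamp minus B) passes its end, so each key keeps about ⌈(W+B)/W⌉ windows open, and each of the A window aggregations keeps its own accumulator per key and window. With K distinct metric names:

```latex
N_{\mathrm{acc}} \approx K \cdot \left\lceil \frac{W + B}{W} \right\rceil \cdot A
= K \cdot \left\lceil \frac{1\,\mathrm{h} + 2\,\mathrm{h}}{1\,\mathrm{h}} \right\rceil \cdot 3
= 9K
```

Accumulators for sum and mean are small, but the size of a ReMedianAggregator accumulator is not shown in the question, so 9K of them can add up if K is large.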
With respect to writing to disk, RocksDB actually uses a temporary directory by default for its database files. You can also pass an explicit directory during configuration by calling state.setDbStoragePath(someDirectory), where state is your RocksDBStateBackend instance.
Somewhat confusingly, the FsStateBackend only writes to disk during checkpointing; otherwise it is entirely heap-based. So you likely did not see anything in the directory if you did not have checkpointing enabled. This would also explain why you might still run out of memory when the FsStateBackend is used.
Assuming you do have the RocksDB (or any) state backend working, you can enable checkpointing by doing:
env.enableCheckpointing(5000); // interval in ms, so set it to however frequently you want to checkpoint
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000); // guarantees at least 5000 ms of normal processing between checkpoints, so the job keeps making progress even if checkpointing is slow. For large state, checkpointing can take multiple seconds
