Is it possible to have a window per sub-task/partition - apache-flink

I am working with Flink using data from a Kafka topic that has multiple partitions. Is it possible to have a window on each parallel sub-task/partition without having to use keyBy (as I want to avoid the shuffle)? Based on the documentation, I can only choose between keyed windows (which require a shuffle) and global windows (which reduce parallelism to 1).
The motivation is that I want to use a CountWindow to batch the messages with a custom trigger that also fires after a set amount of processing time. So per Kafka partition, I want to batch N records together or wait X amount of processing time before sending the batch downstream.
Thanks!

There's no good way to do that.
One workaround would be to implement the batching and timeout logic in a custom sink. You'd want to implement the CheckpointedFunction interface to make your solution fault tolerant, and you could use the Sink.ProcessingTimeService.ProcessingTimeCallback interface for the timeouts.
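For illustration, here is a rough sketch of such a batching sink. The String element type, the batch size, and the emitBatch call are placeholders, and this version only checks the elapsed time when a new record arrives; a strict timeout would additionally register a processing-time callback as described above.

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: batches records per sink subtask and flushes when either the batch size
    // or the maximum wait time is reached. The pending batch is kept in operator state
    // so it survives a failure/restart.
    public class BatchingSink extends RichSinkFunction<String> implements CheckpointedFunction {

        private static final int BATCH_SIZE = 100;        // "N records"
        private static final long MAX_WAIT_MS = 5_000L;   // "X amount of processing time"

        private final List<String> batch = new ArrayList<>();
        private transient ListState<String> checkpointedBatch;
        private long lastFlushTime;

        @Override
        public void invoke(String value, Context context) {
            batch.add(value);
            long now = context.currentProcessingTime();
            if (batch.size() >= BATCH_SIZE || now - lastFlushTime >= MAX_WAIT_MS) {
                // emitBatch(batch) would hand the batch to the external system (placeholder)
                batch.clear();
                lastFlushTime = now;
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
            checkpointedBatch.clear();
            checkpointedBatch.addAll(batch);
        }

        @Override
        public void initializeState(FunctionInitializationContext ctx) throws Exception {
            checkpointedBatch = ctx.getOperatorStateStore()
                    .getListState(new ListStateDescriptor<>("pending-batch", String.class));
            if (ctx.isRestored()) {
                for (String s : checkpointedBatch.get()) {
                    batch.add(s);
                }
            }
        }
    }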
UPDATE:
Just thought of another solution, similar to the one in your comment below. You could implement a custom source that sends a periodic heartbeat, and broadcast that to a BroadcastProcessFunction.
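A minimal sketch of that second idea, assuming the heartbeat source emits a Long on some interval. The buffer here is in-memory only, so for fault tolerance you'd still want CheckpointedFunction as above.

    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: the main (non-keyed) stream is buffered per parallel subtask, and every
    // broadcast heartbeat flushes whatever has accumulated, so batches go out even when
    // the main stream is idle.
    public class HeartbeatBatcher extends BroadcastProcessFunction<String, Long, List<String>> {

        private static final int BATCH_SIZE = 100;
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void processElement(String value, ReadOnlyContext ctx, Collector<List<String>> out) {
            buffer.add(value);
            if (buffer.size() >= BATCH_SIZE) {
                out.collect(new ArrayList<>(buffer));
                buffer.clear();
            }
        }

        @Override
        public void processBroadcastElement(Long heartbeat, Context ctx, Collector<List<String>> out) {
            if (!buffer.isEmpty()) {
                out.collect(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }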

Related

Flink Sorting A Global Window On A Bounded Stream

I've built a Flink application to consume data directly from Kafka, but in the event of a system failure or a need to re-process this data, I need to instead consume the data from a series of files in S3. The order in which messages are processed is very important, so I'm trying to figure out how I can sort this bounded stream before pushing these messages through my existing application.
I've tried inserting the stream into a temporary table using the Table API, but the sort operator always uses a maximum parallelism of 1 despite sorting on two keys. Can I leverage these keys somehow to increase this parallelism?
I've been thinking of using a keyed global window but I'm not sure how to trigger on a bounded stream and sort the window. Is Flink a good choice for this kind of batch processing and would it be a good idea to write this using the old Dataset API?
Edit
After some experimentation, I've decided that Flink isn't the correct solution and Spark is just more feature-rich for this particular use case. I'm trying to consume and sort over 1.5 TB of data in each job. Unfortunately some of these partitions contain maybe 100 GB or more, and everything must be in order before I can break those groups up further, which makes sorting this data in the operators difficult.
My requirements are simple: ingest the data from S3 and sort by channel ID before flushing it to disk. Having to think about windows and timestamp assigners just complicates a relatively simple task that can be achieved in 4 lines of Spark code.
Have you considered using the HybridSource for your use case, since this is exactly what it was designed for? https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/
The DataSet API is deprecated and I would not recommend using it.
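For reference, a minimal sketch of how the two sources could be stitched together with HybridSource. Class names follow the Flink 1.14 docs linked above and may differ in other versions; the S3 path, topic, brokers, and text-line format are placeholders.

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.base.source.hybrid.HybridSource;
    import org.apache.flink.connector.file.src.FileSource;
    import org.apache.flink.connector.file.src.reader.TextLineFormat;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.core.fs.Path;

    // Sketch: read the historical data from S3 first, then switch over to the live Kafka topic.
    public class HybridSourceExample {

        public static HybridSource<String> buildSource() {
            FileSource<String> fileSource =
                    FileSource.forRecordStreamFormat(new TextLineFormat(), new Path("s3://bucket/history/"))
                            .build();

            KafkaSource<String> kafkaSource =
                    KafkaSource.<String>builder()
                            .setBootstrapServers("broker:9092")
                            .setTopics("events")
                            .setValueOnlyDeserializer(new SimpleStringSchema())
                            .setStartingOffsets(OffsetsInitializer.earliest())
                            .build();

            return HybridSource.builder(fileSource)
                    .addSource(kafkaSource)
                    .build();
        }
    }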

Need advice on migrating from Flink DataStream Job to Flink Stateful Functions 3.1

I have a working Flink job built on the Flink DataStream API. I want to REWRITE the entire job based on Flink Stateful Functions 3.1.
The functions of my current Flink Job are:
Read message from Kafka
Each message is a slice of a data packet, e.g. (s stands for slice):
s-0, s-1 are for packet 0
s-4, s-5, s-6 are for packet 1
The job merges slices into several data packets and then sinks the packets to HBase
Window functions are applied to deal with out-of-order slice arrival
My Objectives
Currently I already have a Flink Stateful Functions demo running on my k8s. I want to rewrite my entire job on top of Stateful Functions.
Save data into MinIO instead of HBase
My current plan
I have read the doc and got some ideas. My plans are:
There's no need to deal with Kafka directly anymore; the Kafka Ingress (https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/apache-kafka/) handles it
Rewrite my job based on the Java SDK. Merging is straightforward, but how about window functions?
Maybe I should use persistent state with TTL to mimic window function behaviors
An egress for MinIO is not in the list of default Flink I/O connectors, so I need to write a custom Flink I/O connector for MinIO myself, according to https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/flink-connectors/
I want to avoid the embedded module because it prevents scaling. Auto-scaling is the key reason why I want to migrate to Flink Stateful Functions
My Questions
I don't feel confident about my plan. Is there anything wrong with my understanding/plan?
Are there any best practices I should refer to?
Update:
windows were used to assemble results
get a slice, inspect its metadata, and learn that it is the last one of the packet
it also knows the packet should contain 10 slices
if there are already 10 slices, merge them
if there are not enough slices yet, wait for some time (e.g. 10 minutes) and then either merge or record a packet error
I want to get rid of windows during the rewrite, but I don't know how
Background: Use KeyedProcessFunctions Rather than Windows to Assemble Related Events
With the DataStream API, windows are not a good building block for assembling together related events. The problem is that windows begin and end at times that are aligned to the clock, rather than being aligned to the events. So even if two related events are only a few milliseconds apart they might be assigned to different windows.
In general, it's more straightforward to implement this sort of use case with keyed process functions, and use timers as needed to deal with missing or late events.
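For example, here is a rough sketch of that pattern. Slice and Packet are placeholder types, and expectedSliceCount()/merge() stand in for your metadata and merge logic.

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: the stream is keyed by packet id, slices are buffered in keyed state, and a
    // processing-time timer acts as the 10-minute timeout. Slice and Packet are placeholders.
    public class PacketAssembler extends KeyedProcessFunction<String, Slice, Packet> {

        private static final long TIMEOUT_MS = 10 * 60 * 1000L;

        private transient ListState<Slice> slices;
        private transient ValueState<Long> timeoutTimer;

        @Override
        public void open(Configuration parameters) {
            slices = getRuntimeContext().getListState(new ListStateDescriptor<>("slices", Slice.class));
            timeoutTimer = getRuntimeContext().getState(new ValueStateDescriptor<>("timeout", Long.class));
        }

        @Override
        public void processElement(Slice slice, Context ctx, Collector<Packet> out) throws Exception {
            slices.add(slice);

            // register a timeout when the first slice of a packet arrives
            if (timeoutTimer.value() == null) {
                long timer = ctx.timerService().currentProcessingTime() + TIMEOUT_MS;
                ctx.timerService().registerProcessingTimeTimer(timer);
                timeoutTimer.update(timer);
            }

            List<Slice> buffered = new ArrayList<>();
            slices.get().forEach(buffered::add);
            if (buffered.size() == slice.expectedSliceCount()) {   // assumed metadata accessor
                out.collect(Packet.merge(buffered));               // assumed merge helper
                cleanup(ctx);
            }
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Packet> out) throws Exception {
            // timeout fired before the packet completed: emit a partial packet or record an error
            List<Slice> buffered = new ArrayList<>();
            slices.get().forEach(buffered::add);
            if (!buffered.isEmpty()) {
                out.collect(Packet.merge(buffered));
            }
            cleanup(ctx);
        }

        private void cleanup(Context ctx) throws Exception {
            Long timer = timeoutTimer.value();
            if (timer != null) {
                ctx.timerService().deleteProcessingTimeTimer(timer);
            }
            slices.clear();
            timeoutTimer.clear();
        }
    }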
Doing this with the Statefun API
You can use the same pattern mentioned above. The function id will play the same role as the key, and you can use a delayed message instead of a timer:
as each slice arrives, add it to the packet that's being assembled
if it is the first slice, send a delayed message that will act as a timeout
when all the slices have arrived, merge them and send the packet
if the delayed message arrives before the packet is complete, do whatever is appropriate (e.g., go ahead and send the partial packet)
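A rough sketch of that pattern with the StateFun Java SDK. The "timeout" message convention, the slice/merge/egress details, and the value specs are all assumptions; a real function would also buffer the slices themselves in additional ValueSpecs registered on its StatefulFunctionSpec.

    import org.apache.flink.statefun.sdk.java.Context;
    import org.apache.flink.statefun.sdk.java.StatefulFunction;
    import org.apache.flink.statefun.sdk.java.ValueSpec;
    import org.apache.flink.statefun.sdk.java.message.Message;
    import org.apache.flink.statefun.sdk.java.message.MessageBuilder;

    import java.time.Duration;
    import java.util.concurrent.CompletableFuture;

    // Sketch: the function id is the packet id, the number of slices seen so far is kept in
    // persisted state, and a delayed message to self acts as the timeout.
    public class PacketAssemblerFn implements StatefulFunction {

        static final ValueSpec<Integer> SEEN = ValueSpec.named("seen").withIntType();

        @Override
        public CompletableFuture<Void> apply(Context context, Message message) {
            if (message.isUtf8String() && message.asUtf8String().equals("timeout")) {
                // timeout arrived before the packet completed: handle the partial packet here
                context.storage().remove(SEEN);
                return context.done();
            }

            int seen = context.storage().get(SEEN).orElse(0);
            if (seen == 0) {
                // first slice: schedule the timeout as a delayed message to ourselves
                context.sendAfter(
                        Duration.ofMinutes(10),
                        MessageBuilder.forAddress(context.self().type(), context.self().id())
                                .withValue("timeout")
                                .build());
            }
            context.storage().set(SEEN, seen + 1);

            // when all expected slices have arrived, merge them and send to the egress (omitted)
            return context.done();
        }
    }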

Flink Exponential Backoff On Just One Task

I have a Flink app with multiple tasks. In the event that one of those tasks has an error during processing, I'd like to do an exponential backoff on that one task without restarting the whole job. When using Kafka directly rather than through Flink, I can pause the consumer and then resume it later after a certain amount of time has passed. Is it possible to pause a Flink data source or task? Is there another way to accomplish an exponential backoff on just one task while not affecting the other tasks?
In general, Flink itself does not offer such a capability. It may, however, be possible to mimic it in some operators, such as specific sinks or AsyncIO. For Kafka, for example, you can configure the producer so that it retries failed messages and waits a given amount of time before each subsequent retry. This isn't exactly exponential backoff, but it's as close as you can get without writing your own sink.
So it generally depends on where you want to achieve this; a backoff that is not exponential may be possible out of the box. As a last resort, you can simply write your own sink that implements exponential backoff.
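For example, with the KafkaSink you can pass standard producer properties that make the Kafka client retry with a fixed delay. This is just a sketch; the broker, topic, and property values are placeholders.

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.flink.connector.kafka.sink.KafkaSink;

    import java.util.Properties;

    // Sketch: the retry behaviour is delegated to the Kafka producer via its standard client
    // properties. This gives a fixed delay between retries, not a true exponential backoff.
    public class RetryingKafkaSink {

        public static KafkaSink<String> build() {
            Properties producerProps = new Properties();
            producerProps.setProperty("retries", "10");
            producerProps.setProperty("retry.backoff.ms", "1000");
            producerProps.setProperty("delivery.timeout.ms", "120000");

            return KafkaSink.<String>builder()
                    .setBootstrapServers("broker:9092")
                    .setKafkaProducerConfig(producerProps)
                    .setRecordSerializer(
                            KafkaRecordSerializationSchema.builder()
                                    .setTopic("output-topic")
                                    .setValueSerializationSchema(new SimpleStringSchema())
                                    .build())
                    .build();
        }
    }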

Would it be possible for us to create an object that is accessible to all operators in Apache Flink?

I am building a class that helps monitor the numerical performance of multiple operators. My current idea of doing it is to create a method like update(), and call this method every time there is a need for the operators to update something. However, this means I need to create an object that is visible to every single operator that I wish to monitor. Would this be possible? Or would there be any better solutions? Thanks!
If you know that all operators are running in a single JVM (you have one Task Manager) then you can create a singleton that all operators can use to log activity.
If it's a Flink cluster with multiple TMs, then each is running in their own JVM, so you'd have to use some distributed system to record this activity.
The cheesy solution would be to use logging, and then post-process the logs to extract the information you need.
Or you might be able to use Flink's built-in metrics to collect the information you need.
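For example, a user-defined counter registered through the operator's metric group. This is a minimal sketch; the metric then shows up in whichever metrics reporter the cluster is configured with (JMX, Prometheus, ...).

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;

    // Sketch: each operator registers its own counter and updates it as records flow through.
    public class MonitoredMap extends RichMapFunction<String, String> {

        private transient Counter processed;

        @Override
        public void open(Configuration parameters) {
            processed = getRuntimeContext().getMetricGroup().counter("recordsProcessed");
        }

        @Override
        public String map(String value) {
            processed.inc();   // the "update()" call from the question becomes a metric update
            return value;
        }
    }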

Data/event exchange between jobs

Is it possible in Apache Flink to create an application that consists of multiple jobs which together build a pipeline to process some data?
For example, consider a process with an input/preprocessing stage, a business logic and an output stage.
In order to be flexible in development and (re)deployment, I would like to run these as independent jobs.
Is it possible in Flink to build this and directly pipe the output of one job into the input of another (without external components)?
If yes, where can I find documentation about this and can it buffer data if one of the jobs is restarted?
If not, does anyone have experience with such a setup who could point me to a possible solution?
Thank you!
If you really want separate jobs, then one way to connect them is via something like Kafka, where job A publishes, and job B (downstream) subscribes. Once you disconnect the two jobs, though, you no longer get the benefit of backpressure or unified checkpointing/saved state.
Kafka can do buffering of course (up to some max amount of data), but that's not a solution to a persistent difference in performance, if the upstream job is generating data faster than the downstream job can consume it.
I imagine you could also use files as the 'bridge' between jobs (streaming file sink and then streaming file source), though that would typically create significant latency, as the downstream job has to wait for the upstream job to decide to complete a file before it can be consumed.
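As a sketch, the Kafka 'bridge' is just a sink in the upstream job and a source in the downstream job. The topic, brokers, group id, and record types here are placeholders.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.flink.connector.kafka.sink.KafkaSink;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class JobBridge {

        // Job A: publish its output to the bridge topic.
        public static void attachSink(DataStream<String> output) {
            output.sinkTo(
                    KafkaSink.<String>builder()
                            .setBootstrapServers("broker:9092")
                            .setRecordSerializer(
                                    KafkaRecordSerializationSchema.builder()
                                            .setTopic("bridge-topic")
                                            .setValueSerializationSchema(new SimpleStringSchema())
                                            .build())
                            .build());
        }

        // Job B: subscribe to the same topic as its input.
        public static DataStream<String> attachSource(StreamExecutionEnvironment env) {
            KafkaSource<String> source =
                    KafkaSource.<String>builder()
                            .setBootstrapServers("broker:9092")
                            .setTopics("bridge-topic")
                            .setGroupId("job-b")
                            .setStartingOffsets(OffsetsInitializer.earliest())
                            .setValueOnlyDeserializer(new SimpleStringSchema())
                            .build();
            return env.fromSource(source, WatermarkStrategy.noWatermarks(), "bridge-source");
        }
    }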
An alternative approach that's been successfully used a number of times is to provide the details of the preprocessing and business logic stages dynamically, rather than compiling them into the application. This means that the overall topology of the job graph is static, but you are able to modify the processing logic while the job is running.
I've seen this done with purpose-built DSLs, PMML models, Javascript (via Rhino), Groovy, Java classloading, ...
You can use a broadcast stream to communicate/update the dynamic portions of the processing.
Here's an example of this pattern, described in a Flink Forward talk by Erik de Nooij from ING Bank.
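Below is a minimal sketch of the broadcast mechanics, assuming purely for illustration that the dynamic logic can be represented as String "rules" keyed by name; in practice the broadcast elements would carry whatever DSL, model, or class reference you use.

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    // Sketch: rule updates arrive on a broadcast stream and are stored in broadcast state,
    // where the main stream can look them up for each event.
    public class DynamicLogic {

        static final MapStateDescriptor<String, String> RULES =
                new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

        public static DataStream<String> apply(DataStream<String> events, DataStream<String> ruleUpdates) {
            BroadcastStream<String> broadcastRules = ruleUpdates.broadcast(RULES);

            return events
                    .connect(broadcastRules)
                    .process(new BroadcastProcessFunction<String, String, String>() {
                        @Override
                        public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                            // read-only view of the current rules; apply them to the event
                            String rule = ctx.getBroadcastState(RULES).get("default");
                            out.collect(rule == null ? event : rule + ":" + event);
                        }

                        @Override
                        public void processBroadcastElement(String update, Context ctx, Collector<String> out) throws Exception {
                            // each parallel instance receives every update and stores it
                            ctx.getBroadcastState(RULES).put("default", update);
                        }
                    });
        }
    }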
