Flink stateful function address resolution for messaging

In the Flink DataStream API, suppose that an upstream operator is hosted on machine/task manager m. How does the upstream operator know the machine (task manager) m’ on which the downstream operator is hosted? Is it during the initial scheduling of the job's sub/tasks (operators) by the JobManager that such data flow paths between downstream/upstream operators are established, and are those data flow paths fixed for the lifetime of the application?
More generally, consider Flink Stateful Functions, where dynamic messaging is supported and data flows are not fixed or predefined. Given a function with key k that needs to send a message/event to another function with key k’, how would function k find the address of function k’ in order to message it? Does the Flink runtime keep key-to-machine mappings in some distributed data structure (e.g., a DHT, as in Microsoft Orleans), and does every invocation of a function involve an access to that data structure?
Note that I come from a Spark background where, given the RDD/batch model, job graph tasks are executed consecutively (broken at shuffle boundaries), and each shuffle subtask is told which machines hold the subset of keys it should pull/process.
Thank you.

Even with Stateful Functions, the topology of the underlying Flink job is fixed at the time the job is launched. Every Stateful Functions job uses more or less the same job graph (the ingresses vary, but the rest is always the same):
All loaded ingresses become Flink source operators emitting the input messages, and routers become flatmap operators chained to those sources. The flatmaps acting as routers transform the input messages into internal event envelopes, which essentially just wrap the message payload with its destination logical address. Envelopes are the on-the-wire data type for all messages flowing through the stream graph.
The Stateful Functions runtime is centered on a function dispatcher operator, which runs instances of all loaded functions across all modules. Between the router flatmap operator and the function dispatcher operator is a keyBy operation which re-partitions the input streams using the target destination id as the key. This network shuffle guarantees that all messages intended for a given id are sent to the same instance of the function dispatcher operator.
On receipt, the function dispatcher extracts the target function address from the envelope, loads that function instance, and then invokes the function with the wrapped input (which was also in the envelope).
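At the DataStream level, the shape just described looks roughly like the following sketch. This is illustrative only: Message, Envelope, RouterFlatMap, FunctionDispatcher, and KafkaIngressSource are made-up stand-ins, not the actual StateFun runtime classes.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Each loaded ingress becomes a source operator.
    DataStream<Message> ingress = env.addSource(new KafkaIngressSource());

    // Routers are flatmaps chained to the sources: they wrap each message in
    // an envelope carrying its destination logical address.
    DataStream<Envelope> envelopes = ingress.flatMap(new RouterFlatMap());

    // The keyBy re-partitions by destination id, so every message for a given
    // id lands on the same dispatcher instance, which loads and invokes the
    // target function.
    envelopes
        .keyBy(envelope -> envelope.targetAddress().id())
        .process(new FunctionDispatcher());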
How do different instances of the function dispatcher send messages to each other?
This is done by co-locating each function dispatcher with a feedback operator.
All outgoing messages go through another network shuffle using the target function id as the key.
This feedback operator creates a loop, or iteration, in the job graph. Stateful Functions can have cycles, or loops, in their messaging patterns, and are not limited to processing data with a DAG.
The feedback channel is checkpointed; messages are never lost in the case of failure.
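Continuing the sketch above, the cycle is analogous to what the DataStream iterations API expresses; the actual StateFun runtime uses its own checkpointed feedback implementation, and isFunctionToFunction and EgressSink are made up here for illustration.

    import org.apache.flink.streaming.api.datastream.IterativeStream;

    // Open a loop over the envelope stream.
    IterativeStream<Envelope> loop = envelopes.iterate();

    DataStream<Envelope> dispatched = loop
        .keyBy(envelope -> envelope.targetAddress().id())
        .process(new FunctionDispatcher());

    // Messages addressed to other functions are fed back into the loop...
    loop.closeWith(dispatched.filter(Envelope::isFunctionToFunction));

    // ...while egress-bound messages leave the cycle.
    dispatched.filter(e -> !e.isFunctionToFunction()).addSink(new EgressSink());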
For more on this, I recommend the Flink Forward talk by Tzu-Li (Gordon) Tai: Stateful Functions: Polyglot Event-Driven Functions for Stateful Distributed Applications.

Related

Is there a way to broadcast configuration into all task managers or all FlatMapFunctions?

We currently have a Flink-based streaming job (the task is composed of a complex DAG of FlatMapFunctions) and an HTTP interface for fetching configuration.
Now I would like to read the configuration from the HTTP interface through a source function with a parallelism of 1 every 5 minutes, and then distribute it to all task managers or FlatMapFunctions of the job. In the FlatMapFunctions, the configuration will only ever be read, never changed.
I have read the documentation on The Broadcast State Pattern, but the method in the documentation seems to apply only to the first function downstream of the broadcast; subsequent downstream FlatMapFunctions cannot read the broadcast state. As shown in the figure below, only Co-Process-Broadcast can obtain the broadcast, but map func 1 and map func 2 cannot.
[Figure: broadcast state graph]
Similar to QUESTION but different, I have many downstream FlatMapFunctions and expect them all to get the broadcast configuration.
You can send the broadcast stream to multiple functions, so if your config state isn't big then that's likely what I'd do.
If the config state is very small (relative to the size of records being processed) then you could attach it to every incoming record in your BroadcastProcessFunction, so downstream operators have it in hand when processing each of their records.
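A minimal sketch of the first option, assuming hypothetical Config and Event types and two existing main streams: the same broadcast stream can be connected to each downstream operator, so every one of them receives its own copy of the broadcast state.

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;

    MapStateDescriptor<String, Config> configDescriptor =
        new MapStateDescriptor<>("config", String.class, Config.class);

    // Broadcast the parallelism-1 config stream once...
    BroadcastStream<Config> configBroadcast = configStream.broadcast(configDescriptor);

    // ...and connect it to every function that needs the configuration.
    DataStream<Event> out1 = mainStream1
        .connect(configBroadcast)
        .process(new FirstBroadcastFunction());   // extends BroadcastProcessFunction

    DataStream<Event> out2 = mainStream2
        .connect(configBroadcast)
        .process(new SecondBroadcastFunction());  // extends BroadcastProcessFunction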

Reading two streams (main and configs) sequentially in Flink

I have two streams: a main stream (in a fraud-detection example, a stream of transactions) and a second stream of configs (in our example, rules). I connect the main stream to the config stream in order to do the processing. But when Flink starts for the first time and the job is added, it starts consuming from the transactions and configs streams in parallel, and when it wants to process a transaction it sometimes sees that there is no config yet, so we have to send the transaction to a dead letter queue. What I want instead is: if there is a potential config that I could get a bit later, I want to get that config first and then process the transaction, rather than sending it to the dead letter queue. I have the same key for transactions and configs.
Long story short: is there a way of telling Flink, when the job first starts, to consume one stream until there are no new values, and only then start processing the main stream? How can I make them somewhat sequential?
The recommended way to approach this is to connect the 2 streams and apply a RichCoFlatMap that will allow you to buffer events from main while you're waiting to receive the config events.
Check out this useful section of the Flink tutorials. The very last paragraph actually describes your problem.
It is important to recognize that you have no control over the order in which the flatMap1 and flatMap2 callbacks are called. These two input streams are racing against each other, and the Flink runtime will do what it wants to regarding consuming events from one stream or the other. In cases where timing and/or ordering matter, you may find it necessary to buffer events in managed Flink state until your application is ready to process them. (Note: if you are truly desperate, it is possible to exert some limited control over the order in which a two-input operator consumes its inputs by using a custom Operator that implements the InputSelectable interface.)
So in a nutshell you should connect your 2 streams and have some kind of ListState where you can "buffer" your main elements while waiting to receive the rules. When you receive an element from the config stream, you check whether you had some pending elements "waiting" for that config in your ListState (your buffer). If you do, you can then process these elements and emit them through the collector of your flatmap.
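A minimal sketch of that approach, with placeholder Transaction, Config, and Result types; it assumes both streams are keyed by the same field and then connected:

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class BufferUntilConfigFn extends RichCoFlatMapFunction<Transaction, Config, Result> {

        private ValueState<Config> config;        // latest config for this key
        private ListState<Transaction> buffered;  // transactions waiting for a config

        @Override
        public void open(Configuration parameters) {
            config = getRuntimeContext().getState(
                new ValueStateDescriptor<>("config", Config.class));
            buffered = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered", Transaction.class));
        }

        @Override
        public void flatMap1(Transaction tx, Collector<Result> out) throws Exception {
            Config c = config.value();
            if (c == null) {
                buffered.add(tx);  // no config yet for this key: buffer, don't dead-letter
            } else {
                out.collect(process(tx, c));
            }
        }

        @Override
        public void flatMap2(Config c, Collector<Result> out) throws Exception {
            config.update(c);
            for (Transaction tx : buffered.get()) {  // drain anything that was waiting
                out.collect(process(tx, c));
            }
            buffered.clear();
        }

        private Result process(Transaction tx, Config c) {
            return new Result(tx, c);  // placeholder business logic
        }
    }

You would apply it along the lines of transactions.keyBy(t -> t.getKey()).connect(configs.keyBy(c -> c.getKey())).flatMap(new BufferUntilConfigFn()).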
Starting with version 1.14, you can use the hybrid source support in Flink to read all of one source (configs, in your case) before reading the second source. Though I imagine you'd have to map the events to an Either<Config, Transaction> so that the data stream has a consistent record type.
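A rough sketch of that idea, assuming configSource and transactionSource already exist and both emit one common Event type (HybridSource requires a single record type, and the first source must be bounded so Flink knows when to switch over):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.base.source.hybrid.HybridSource;

    // Read the bounded config source to its end first, then switch to transactions.
    HybridSource<Event> source = HybridSource.<Event>builder(configSource)
        .addSource(transactionSource)
        .build();

    DataStream<Event> events = env.fromSource(
        source, WatermarkStrategy.noWatermarks(), "configs-then-transactions");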

Flink stateful functions: compensating callback on a timeout

I am implementing a use case in Flink Stateful Functions. My specification requires that, starting from a stateful function f, a business workflow is launched (in other words, a group of stateful functions f1, f2, … fn is called, either sequentially or in parallel or both). Stateful function f waits for a result to be returned in order to update a local state; it also starts a timeout callback, i.e., a message to itself. At the timeout, f checks whether the local state has been updated (i.e., whether it has received a result); if so, life is good.
However, if at the timeout f discovers that it has not received a result yet, it has to launch a compensating workflow to undo any changes that stateful functions f1, f2, … fn might have made.
Does the Flink Stateful Functions framework support such a design pattern/use case, or should it be implemented at the application level? What is the simplest design that achieves such a solution? For instance, how do we know which functions of the workflow f1, f2, … fn were affected by the timed-out invocation (where the control flow timed out)? How do Flink Stateful Functions and the concept of integrated messaging and state facilitate such a pattern?
Thank you.
I posted the question on the Apache Flink mailing list and got the following response from Igal Shilman. Thanks, Igal.
The first thing that I would like to mention is that, if your original motivation for this scenario is a concern about transient failures, such as:
did function Y ever receive a message sent by function X?
did sending a message fail?
is the target function there to accept a message sent to it?
did the order of messages get mixed up?
etc.
then StateFun eliminates all of these problems and a whole class of transient errors that you would otherwise have to deal with yourself in your business logic (like retries, backoffs, service discovery, etc.).
Now, if your motivating scenario is not about transient errors but more about transactional workflows, then, as Dawid mentioned, you would have to implement this in your application logic. I think that the way you have described the flow should map directly to a coordinating function (per flow instance) that keeps track of results/timeouts in its internal state.
Here is a sketch:
A Flow Coordinator Function: it would be invoked with the input necessary to kick off a flow. It would start invoking the relevant functions (as defined by the flow's DAG) and would keep an internal state indicating which functions (addresses) were invoked and their completion statuses. When the flow completes successfully, the coordinator can safely discard its state. In any case where the coordinator decides to abort the flow (an internal timeout / an external message / etc.), it would have to check its internal state and kick off a compensating workflow (sending a special message to the already-succeeded/in-progress functions).
Each function in the flow has to accept a message from the coordinator, in turn, and reply with either a success or a failure.
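A minimal sketch of such a coordinator using the embedded Java SDK; StartFlow, InvokeTask, TaskResult, Timeout, Compensate, and FlowProgress are hypothetical application types invented for illustration, and only the SDK calls are real:

    import java.time.Duration;
    import org.apache.flink.statefun.sdk.Address;
    import org.apache.flink.statefun.sdk.Context;
    import org.apache.flink.statefun.sdk.StatefulFunction;
    import org.apache.flink.statefun.sdk.annotations.Persisted;
    import org.apache.flink.statefun.sdk.state.PersistedValue;

    public class FlowCoordinatorFn implements StatefulFunction {

        @Persisted
        private final PersistedValue<FlowProgress> progress =
            PersistedValue.of("progress", FlowProgress.class);

        @Override
        public void invoke(Context context, Object input) {
            if (input instanceof StartFlow) {
                // Record the participants, invoke them, and arm the timeout by
                // sending a delayed message to ourselves.
                FlowProgress p = FlowProgress.of(((StartFlow) input).participants());
                progress.set(p);
                for (Address participant : p.pending()) {
                    context.send(participant, new InvokeTask(context.self()));
                }
                context.sendAfter(Duration.ofMinutes(5), context.self(), new Timeout());
            } else if (input instanceof TaskResult) {
                FlowProgress p = progress.get();
                p.markCompleted(((TaskResult) input).from());
                if (p.allCompleted()) {
                    progress.clear();  // flow succeeded: discard coordinator state
                } else {
                    progress.set(p);
                }
            } else if (input instanceof Timeout && progress.get() != null) {
                // Still incomplete at the deadline: compensate everything invoked so far.
                for (Address invoked : progress.get().invoked()) {
                    context.send(invoked, new Compensate());
                }
                progress.clear();
            }
        }
    }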

Flink Statefun concurrent state update

I'm trying to implement a messaging scenario using Apache Flink Stateful Functions.
One of my states can be updated by two different functions, which are provided to a MatchBinder. These two functions basically check the current state and update it accordingly.
What happens if these two functions are called concurrently for the same key?
Is there a queue mechanism for stateful functions called for the same key?
Can we lock the state access/update for sequential access?
What happens if these two functions are called concurrently for the same key?
The MatchBinder is basically a convenient way to write a single StateFun function that starts its execution by first matching the type (or properties) of the incoming message. It is a way to avoid writing code like this:
    ...
    if (message instanceof A) {
        handleA((A) message);
    } else if (message instanceof B) {
        handleB((B) message);
    }
    ...
So in reality, although you are providing "different" Java functions to each bind case, this is the same StateFun function being invoked and the correct bind case would be selected.
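For illustration, a single StatefulMatchFunction with two bind cases might look like the sketch below; Increment and Reset are made-up message types:

    import org.apache.flink.statefun.sdk.Context;
    import org.apache.flink.statefun.sdk.match.MatchBinder;
    import org.apache.flink.statefun.sdk.match.StatefulMatchFunction;

    public class CounterFn extends StatefulMatchFunction {

        @Override
        public void configure(MatchBinder binder) {
            binder
                .predicate(Increment.class, this::onIncrement)
                .predicate(Reset.class, this::onReset);
        }

        // Both bind cases run inside the same function instance, so for a given
        // address they are never applied concurrently.
        private void onIncrement(Context context, Increment message) {
            // read and update the state for context.self() ...
        }

        private void onReset(Context context, Reset message) {
            // clear the state for context.self() ...
        }
    }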
Is there a queue mechanism for stateful functions called for the same key?
Yes, StateFun functions would be invoked sequentially per address. While a function is applied for a specific address, no other message for that address would be applied concurrently. This comes almost for free, thanks to having Apache Flink as the actual runtime.
Can we lock the state access/update for sequential access?
State access and modifications are atomic and sequential per address.

All sources' readiness before data flows in across the whole Flink job/data flow

If we have several sources in our data flow/job, and some of them implement RichSourceFunction, can we assume that RichSourceFunction.open of these sources will be called and will complete before any data enters the entire data flow (through any of the many sources), even if the sources are distributed on different task managers?
Flink guarantees to call the open() method of a function instance before it passes the first record to that instance. The guarantee is scoped only to the individual function instance: it can happen that the open() method of one function instance has not been called yet while another function instance (of the same or a different function) has already started processing records.
Flink does not globally coordinate open() calls across function instances.
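As a small sketch of what that scope means in practice (HttpConfigClient is a hypothetical helper, not a Flink class):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class EnrichFn extends RichFlatMapFunction<String, String> {

        private transient HttpConfigClient client;  // hypothetical helper

        @Override
        public void open(Configuration parameters) {
            // Runs once per parallel instance, before flatMap() is first called
            // on this instance. Instances on other task managers are not
            // coordinated: one of them may already be processing records now.
            client = HttpConfigClient.connect("http://config-service/config");
        }

        @Override
        public void flatMap(String value, Collector<String> out) {
            out.collect(client.enrich(value));
        }
    }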
