Flink Statefun concurrent state update - apache-flink

I'm trying to implement messaging scenario using apache flink stateful functions.
One of my state is able to updated by two different functions which is provided to MatchBinder. These two functions basically checks the current state and updates the state accordingly.
What happens if these two functions are called concurrently for the same key?
Is there a queue mechanism for stateful functions called for the same key?
Can we lock the state access/update for sequential access ?

What happens if these two functions are called concurrently for the
same key?
The MatchBinder is basically a convenient way to write a single StateFun function, that starts its execution by first matching the type (or properties) of the incoming message. It is basically a way to avoid writing code like this:
...
if (message instanceof A) {
handleA((A) message);
} else if (message instanceof B) {
handleB((B) message);
}
...
So in reality, although you are providing "different" Java functions to each bind case, this is the same StateFun function being invoked and the correct bind case would be selected.
Is there a queue mechanism for stateful functions called for the same
key?
Yes, StateFun functions would be invoked sequentially per address. While a function is applied for a specific address, no other message for that address would be applied concurrently. This comes almost for free, thanks to having Apache Flink as the actual runtime.
Can we lock the state access/update for sequential access ?
State access and modifications are atomic and sequential per address.

Related

Multithreading inside Flink's Map/Process function

I have an use case where I need to apply multiple functions to every incoming message, each producing 0 or more results.
Having a loop won't scale for me, and ideally I would like to be able to emit results as soon as they are ready instead of waiting for the all the functions to be applied.
I thought about using AsyncIO for this, maintaining a ThreadPool but if I am not mistaken I can only emit one record using this API, which is not a deal-breaker but I'd like to know if there are other options, like using a ThreadPool but in a Map/Process function so then I can send the results as they are ready.
Would this be an anti-pattern, or cause any problems in regards to checkpointing, at-least-once guarantees?
Depending on the number of different functions involved, one solution would be to fan each incoming message out to n operators, each applying one of the functions.
I fear you'll get into trouble if you try this with a multi-threaded map/process function.
How about this instead:
You could have something like a RichCoFlatMap (or KeyedCoProcessFunction, or BroadcastProcessFunction) that is aware of all of the currently active functions, and for each incoming event, emits n copies of it, each being enriched with info about a specific function to be performed. Following that can be an async i/o operator that has a ThreadPool, and it takes care of executing the functions and emitting results if and when they become available.

About States and what is better for Flink

Lets assume that I have a job with max.parallelism=4 and a RichFlatMapFunction which is working with MapState. What is the best way to create the MapStateDescriptor? into the RichFlatMapFunction which means that for each instance of this class I will have a descriptor, or create a single instance of the descriptor, for example: public static MapStateDescriptor descriptor in a single class and call it from the RichFlatMapFunction? Because doing it on this way I will have just one MapStateDescriptor instead of 4, or did I misunderstood something?
Kind regards!
A few points...
Since each of your RichFlatMapFunction sub-tasks can be running in a different JVM on a different server, how would they share a static MapStateDescriptor?
Note that Flink's "max parallelism" isn't the same as the default environment parallelism. In general you want to leave the max parallelism value alone, and (if necessary) set your environment parallelism equal to the number of slots in your cluster.
The MapStateDescriptor doesn't store state. It tells Flink how to create the state. In your RichFlatMapFunction operator's open() call is where you'll be creating the state using the state descriptor.
So net-net is don't bother using a static MapStateDescriptor, it won't help. Just create your state (as per many examples) in your open() method.

Flink stateful functions : compensating callback on a timeout

I am implementing a use case in Flink stateful functions. My specification highlights that starting from a stateful function f a business workflow (in other words a group of stateful functions f1, f2, … fn are called either sequentially or in parallel or both ). Stateful function f waits for a result to be returned to update a local state, it as well starts a timeout callback i.e. a message to itself. At timeout, f checks if the local state is updated (it has received a result), if this is the case life is good.
However, if at timeout f discovers that it has not received a result yet, it has to launch a compensating workflow to undo any changes that stateful functions f1, f2, … fn might have received.
Does Flink stateful functions framework support such as a design pattern/use case, or it should be implemented at the application level? What is the simplest design to achieve such a solution? For instance, how to know what functions of the workflow stateful functions f1, f2, … fn were affected by the timedout invocation (where the control flow has been timed out)? How does Flink sateful functions and the concept of integrated messaging and state facilitate such a pattern?
Thank you.
I posted the question on Apache Flink mailing list and got the following response by Igal Shilman, Thanks to Igal.
The first thing that I would like to mention is that, if your original
motivation for that scenario is a concern of a transient failures such as:
did function Y ever received a message sent by function X ?
did sending a message failed?
did the target function is there to accept a message sent to it?
did the order of message got mixed up?
etc'
Then, StateFun eliminates all of these problems and a whole class of
transient errors that otherwise you would have to deal with by yourself in
your business logic (like retries, backoffs, service discovery etc').
Now if your motivating scenario is not about transient errors but more
about transactional workflows, then as Dawid mentioned you would have to
implement
this in your application logic. I think that the way you have described the
flow should map directly to a coordinating function (per flow instance)
that keeps track of results/timeouts in its internal state.
Here is a sketch:
A Flow Coordinator Function - it would be invoked with the input
necessary to kick off a flow. It would start invoking the relevant
functions (as defined by the flow's DAG) and would keep an internal state
indicating
what functions (addresses) were invoked and their completion statues.
When the flow completes successfully the coordinator can safely discard its
state.
In any case that the coordinator decides to abort the flow (an internal
timeout / an external message / etc') it would have to check its internal
state and kick off a compensating workflow (sending a special message to
the already succeed/in progress functions)
Each function in the flow has to accept a message from the coordinator,
in turn, and reply with either a success or a failure.

Flink stateful function address resolution for messaging

In Flink datastream suppose that an upstream operator is hosted on machine/task manager m, How does the upstream operator knows the machine (task manager) m’ on which the downstream operator is hosted. Is it during initial scheduling of the job sub/tasks (operators) by the JobManager that such data flow paths between downstream/upstream operators are established, and such data flow paths are fixed for the application lifetime?
More generally, consider Flink stateful functions where dynamic messaging is supported and data flow are not fixed or predefined, and given a function with key k that needs to send a message/event to a another function with key k’ how would function k finds the address of function k’ for messaging it? Does Flink runtime keeps key-machine mappings in some distributed data structure ( e.g, DHT as in Microsoft Orleans ) and every invocation of a function involves access to such data structure?
Note that I came from Spark background where given the RDD/batch model, job graph tasks are executed consecutively (broken at shuffle boundaries), and each shuffle subtasks are instructed of the machines holding the subset of keys that should be pulled/processed by that subtask….
Thank you.
Even with stateful functions, the topology of the underlying Flink job is fixed at the time the job is launched. Every stateful functions job uses a job graph more or less like this one (the ingresses vary, but the rest is always like this):
Here you see that all loaded ingresses become Flink source operators emitting the input messages,
and routers become flatmap operators chained to those sources.
The flatmaps acting as routers transform the input messages into internal event envelopes, which
essentially just wrap the message payload with its destination logical address. Envelopes are the
on-the-wire data type for all messages flowing through the stream graph.
The Stateful Functions runtime is centered on a function dispatcher operator,
which runs instances of all loaded functions across all modules.
In between the router flatmap operator and the function dispatcher operator is a keyBy operation
which re-partitions the input streams using the target destination id as the key. This
network shuffle guarantees that all messages intended for a given id are sent to the same
instance of the function dispatch operator.
On receipt, the function dispatcher extracts the target function address from the envelope, loads
that function instance, and then invokes the function with the wrapped input (which was also in the
envelope).
How do different instances of the function dispatcher send messages to each other?
This is done by co-locating each function dispatcher with a feedback operator.
All outgoing messages go through another network shuffle using the target function id as the key.
This feedback operator creates a loop, or iteration, in the job graph. Stateful Functions can have cycles, or loops, in their messaging patterns, and are not limited to processing data with a DAG.
The feedback channel is checkpointed; messages are never lost in the case of failure.
For more on this, I recommend this Flink Forward talk by Tzu-Li (Gordon) Tai: Stateful Functions: Polyglot Event-Driven Functions for Stateful Distributed Applications. The figure above is from his talk.

All sources readiness before data flows-in aross whole Flink job/data flow

If we have several sources in our data flow/job, and some of them implement RichSourceFunction, can we assume that RichSourceFunction.open of these sources will be called and complete before any data will enter into this entire data flow (through any of the many sources) - that is even if the sources are distributed on different task managers?
Flink guarantees to call the open() method of a function instance before it passes the first record to that instance. The guarantee is scoped only to a function instance, i.e., it might happen that the open() method of a function instance was not called yet, while another function instance (of the same or another function) started processing records already.
Flink does not globally coordinate open() calls across function instances.

Resources