Flink stateful functions: compensating callback on a timeout

I am implementing a use case in Flink Stateful Functions. My specification highlights that, starting from a stateful function f, a business workflow (in other words, a group of stateful functions f1, f2, … fn) is called either sequentially, in parallel, or both. Stateful function f waits for a result to be returned in order to update a local state; it also starts a timeout callback, i.e. a message to itself. At the timeout, f checks whether the local state has been updated (i.e. it has received a result); if so, all is well.
However, if at the timeout f discovers that it has not received a result yet, it has to launch a compensating workflow to undo any changes that the stateful functions f1, f2, … fn might have made.
Does the Flink Stateful Functions framework support such a design pattern/use case, or should it be implemented at the application level? What is the simplest design that achieves such a solution? For instance, how can one know which of the workflow's stateful functions f1, f2, … fn were affected by the timed-out invocation (i.e. where the control flow timed out)? How do Flink Stateful Functions, with their concept of integrated messaging and state, facilitate such a pattern?
Thank you.

I posted the question on the Apache Flink mailing list and got the following response from Igal Shilman. Thanks, Igal.
The first thing that I would like to mention is that, if your original motivation for this scenario is a concern about transient failures, such as:
did function Y ever receive a message sent by function X?
did sending a message fail?
is the target function there to accept a message sent to it?
did the order of messages get mixed up?
etc.
then StateFun eliminates all of these problems, and a whole class of transient errors that you would otherwise have to deal with yourself in your business logic (such as retries, backoffs, service discovery, etc.).
Now, if your motivating scenario is not about transient errors but rather about transactional workflows, then, as Dawid mentioned, you would have to implement this in your application logic. I think that the way you have described the flow should map directly to a coordinating function (per flow instance) that keeps track of results/timeouts in its internal state.
Here is a sketch:
A flow coordinator function: it would be invoked with the input necessary to kick off a flow. It would start invoking the relevant functions (as defined by the flow's DAG) and would keep internal state indicating which functions (addresses) were invoked and their completion statuses. When the flow completes successfully, the coordinator can safely discard its state. In any case where the coordinator decides to abort the flow (an internal timeout, an external message, etc.), it would have to check its internal state and kick off a compensating workflow (sending a special message to the already-succeeded/in-progress functions).
Each function in the flow has to accept a message from the coordinator and, in turn, reply with either a success or a failure.
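To make that sketch concrete, here is a minimal, untested coordinator written against the embedded Java SDK of StateFun 2.x. StartFlow, DoWork, TaskSucceeded, FlowTimeout and Compensate are hypothetical application message types, and WORKER stands in for the functions f1 … fn; the timeout is exactly the self-message described in the question, sent via sendAfter:

    import java.time.Duration;
    import java.util.HashSet;
    import java.util.List;
    import org.apache.flink.statefun.sdk.Address;
    import org.apache.flink.statefun.sdk.Context;
    import org.apache.flink.statefun.sdk.FunctionType;
    import org.apache.flink.statefun.sdk.StatefulFunction;
    import org.apache.flink.statefun.sdk.annotations.Persisted;
    import org.apache.flink.statefun.sdk.state.PersistedValue;

    public class FlowCoordinator implements StatefulFunction {
      static final FunctionType WORKER = new FunctionType("example", "worker");

      // Ids of the functions f1..fn this flow instance has invoked.
      @Persisted
      private final PersistedValue<HashSet> started = PersistedValue.of("started", HashSet.class);
      // How many of them have not replied yet.
      @Persisted
      private final PersistedValue<Integer> pending = PersistedValue.of("pending", Integer.class);

      @Override
      public void invoke(Context context, Object input) {
        if (input instanceof StartFlow) {
          HashSet<String> ids = new HashSet<>(((StartFlow) input).workerIds);
          for (String id : ids) {
            context.send(new Address(WORKER, id), new DoWork());
          }
          started.set(ids);
          pending.set(ids.size());
          // The timeout callback: a delayed message to ourselves.
          context.sendAfter(Duration.ofMinutes(5), context.self(), new FlowTimeout());
        } else if (input instanceof TaskSucceeded) {
          int left = pending.getOrDefault(0) - 1;
          pending.set(left);
          if (left == 0) {              // flow completed: discard state
            started.clear();
            pending.clear();
          }
        } else if (input instanceof FlowTimeout && pending.getOrDefault(0) > 0) {
          // Not everyone replied in time: compensate every function we touched.
          for (Object id : started.getOrDefault(new HashSet())) {
            context.send(new Address(WORKER, (String) id), new Compensate());
          }
          started.clear();
          pending.clear();
        }
      }
    }

    // Hypothetical application message types (in practice these would be protobuf types).
    class StartFlow { List<String> workerIds; }
    class DoWork {}
    class TaskSucceeded {}
    class FlowTimeout {}
    class Compensate {}

A fuller version would also record which workers already succeeded versus are still in flight, so the compensating message could be tailored per function rather than broadcast to everything that was started.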

Related

Multithreading inside Flink's Map/Process function

I have a use case where I need to apply multiple functions to every incoming message, each producing 0 or more results.
Having a loop won't scale for me, and ideally I would like to be able to emit results as soon as they are ready, instead of waiting for all the functions to be applied.
I thought about using AsyncIO for this, maintaining a ThreadPool, but if I am not mistaken I can only emit one record using this API. That is not a deal-breaker, but I'd like to know if there are other options, like using a ThreadPool inside a Map/Process function so that I can send the results as they are ready.
Would this be an anti-pattern, or cause any problems with regard to checkpointing or at-least-once guarantees?
Depending on the number of different functions involved, one solution would be to fan each incoming message out to n operators, each applying one of the functions.
I fear you'll get into trouble if you try this with a multi-threaded map/process function.
How about this instead:
You could have something like a RichCoFlatMap (or KeyedCoProcessFunction, or BroadcastProcessFunction) that is aware of all of the currently active functions and, for each incoming event, emits n copies of it, each enriched with info about a specific function to be performed. That can be followed by an async i/o operator that has a ThreadPool and takes care of executing the functions, emitting results if and when they become available.
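As a minimal sketch of the async-i/o half of that pipeline (TaggedEvent and Result are hypothetical types, where a TaggedEvent carries the event plus the function to apply to it). Note also that ResultFuture.complete() takes a Collection, so an async operator can in fact emit more than one record per input:

    import java.util.Collections;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class ApplyTaggedFunction extends RichAsyncFunction<TaggedEvent, Result> {
      private transient ExecutorService pool;

      @Override
      public void open(Configuration parameters) {
        pool = Executors.newFixedThreadPool(10);   // pool size is an arbitrary choice here
      }

      @Override
      public void asyncInvoke(TaggedEvent input, ResultFuture<Result> resultFuture) {
        pool.submit(() -> {
          Result r = input.function.apply(input.event);      // run the tagged function
          resultFuture.complete(Collections.singleton(r));   // complete() accepts a Collection
        });
      }

      @Override
      public void close() {
        pool.shutdown();
      }
    }

Wired up, with tagged being the fanned-out stream of enriched copies:

    DataStream<Result> results =
        AsyncDataStream.unorderedWait(tagged, new ApplyTaggedFunction(), 30, TimeUnit.SECONDS, 100);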

x.stop_sequences() is causing this UVM FATAL Item_done() called with no outstanding requests

x.stop_sequences() is causing this UVM FATAL:
Item_done() called with no outstanding requests. Each call to item_done() must be paired with a previous call to get_next_item()
Can someone tell me how to use stop_sequences while making sure the driver is inactive?
I don't think there is any built-in mechanism; you have to write the code yourself. Basically, you need to implement a reset or interrupt mechanism in your driver. Here is a skeleton idea:
task run_phase (uvm_phase phase);
  forever begin
    @(posedge <ENABLE INPUT>);   // event control (@), not a delay (#): wait until enabled
    fork
      <DO DRIVERY THINGS>;
    join_none
    @(negedge <ENABLE INPUT>);   // when the enable drops, kill any in-flight driving
    disable fork;
  end
endtask: run_phase
In addition to Matthew Taylor's suggestion, you may also need to consider the monitor, since it will need to discard partially assembled data collections.
If you have a reactive driver, this gets even trickier. It would be prudent to provide a boolean validity attribute in your transactions. Construction would set it to true (1'b1). If responses are outstanding upon reset, send all the outstanding responses after setting the validity field to false (1'b0). This will keep the sequencer from jamming. Any consumer of transaction data would then need to examine the validity flag. To simplify, you could build the check into accessor functions and make all attributes local. This would also work for the monitor.

Flink stateful function address resolution for messaging

In a Flink datastream, suppose that an upstream operator is hosted on machine/task manager m. How does the upstream operator know the machine (task manager) m’ on which the downstream operator is hosted? Is it during the initial scheduling of the job's subtasks (operators) by the JobManager that such data-flow paths between downstream/upstream operators are established, and are those paths then fixed for the lifetime of the application?
More generally, consider Flink Stateful Functions, where dynamic messaging is supported and data flows are not fixed or predefined. Given a function with key k that needs to send a message/event to another function with key k’, how does function k find the address of function k’ in order to message it? Does the Flink runtime keep key-to-machine mappings in some distributed data structure (e.g., a DHT, as in Microsoft Orleans), and does every invocation of a function involve an access to that data structure?
Note that I come from a Spark background where, given the RDD/batch model, job-graph tasks are executed consecutively (broken at shuffle boundaries), and each shuffle subtask is told which machines hold the subset of keys that it should pull/process.
Thank you.
Even with Stateful Functions, the topology of the underlying Flink job is fixed at the time the job is launched. Every Stateful Functions job uses more or less the same job graph (the ingresses vary, but the rest is always the same):
All loaded ingresses become Flink source operators emitting the input messages, and routers become flatmap operators chained to those sources. The flatmaps acting as routers transform the input messages into internal event envelopes, which essentially just wrap the message payload with its destination logical address. Envelopes are the on-the-wire data type for all messages flowing through the stream graph.
The Stateful Functions runtime is centered on a function dispatcher operator, which runs instances of all loaded functions across all modules. In between the router flatmap operator and the function dispatcher operator is a keyBy operation which re-partitions the input streams using the target destination id as the key. This network shuffle guarantees that all messages intended for a given id are sent to the same instance of the function dispatcher operator.
On receipt, the function dispatcher extracts the target function address from the envelope, loads that function instance, and then invokes the function with the wrapped input (which was also in the envelope).
How do different instances of the function dispatcher send messages to each other? This is done by co-locating each function dispatcher with a feedback operator; all outgoing messages go through another network shuffle using the target function id as the key. This feedback operator creates a loop, or iteration, in the job graph, so Stateful Functions applications can have cycles in their messaging patterns and are not limited to processing data as a DAG. The feedback channel is checkpointed; messages are never lost in the case of failure.
For more on this, I recommend the Flink Forward talk by Tzu-Li (Gordon) Tai: Stateful Functions: Polyglot Event-Driven Functions for Stateful Distributed Applications, which includes a figure of this job graph.
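A much-simplified conceptual sketch of that routing, assuming a hypothetical Envelope type (this is not the actual StateFun internals, and RouterFlatMap/FunctionDispatcher are stand-in names): the key point is that there is no key-to-machine lookup table at all. Addressing is just a keyBy on the logical target id, and Flink's key-group hashing determines which parallel dispatcher instance owns any given id.

    // Hypothetical envelope: a payload wrapped with its logical destination address.
    class Envelope {
      String targetFunctionType;
      String targetId;
      byte[] payload;
    }

    // Routers wrap raw inputs into envelopes; the keyBy performs the network shuffle
    // that delivers each envelope to the dispatcher instance owning its target id.
    DataStream<Envelope> envelopes = ingress.flatMap(new RouterFlatMap());
    envelopes
        .keyBy(env -> env.targetId)          // hash of the id picks the owning instance
        .process(new FunctionDispatcher());  // loads and invokes the target function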

What are the different types of state in calling (telephony)?

I want to know, briefly, about the different types of state in telephony (like waiting, pending, ringing) and the difference between the waiting and pending states of a call.
There are many different terms for identifying telephony states, but the CSTA (Computer Supported Telecommunications Applications) standard from ECMA defines a quite usable telephony model.
The goal of the telephony model is to describe the relationship between telephone devices and calls. The difficulty is that there are two legitimate points of view: on one hand, a device-oriented point of view (the endpoint view), where the focus is a device involved in possibly several calls; on the other hand, a call-oriented point of view (the global view), where the call evolves over time across several devices.
The endpoint states in CSTA are:
Alerting/Offered – Indicates an incoming call at an endpoint. Typically the connection may be ringing or it may be in a pre-alerting (e.g. offered) condition.
Connected – Indicates that a connection is actively participating in a call. This connection state can be the result of an incoming or outgoing call.
Failed – Indicates that call progression has stalled. Typically this could represent an outgoing call attempt that encountered a busy endpoint.
Held – Indicates that an endpoint is no longer actively participating in a call. For implementations that support multiple calls per endpoint (i.e. line), a connection could be Held while the line is used to place another call (consultation transfer on an analog line, for example).
Initiated – A transient state, usually indicating that the endpoint is initiating a service (e.g. dial-tone) or the device is being prompted to go off-hook.
Null – There is no relationship between the call and the endpoint.
Queued – Indicates that the call is temporarily suspended at a device (e.g. call has been parked, camped on).
The global view in CSTA is more complicated, because a call state is the set of its endpoint states, but I will try to briefly describe the basic call states when Alice calls Bob:
Null/Idle(no call) -> Alice(Null)-Bob(Null)
Pending(Alice dials) -> Alice(Initiated)-Bob(Null)
Originated(Alice wait) -> Alice(Connected)-Bob(Null)
Delivered(Bob set is ringing) -> Alice(Connected)-Bob(Alerting)
Established(Bob answers) -> Alice(Connected)-Bob(Connected)
Terminated(Bob hangs up) -> Alice(Connected)-Bob(Null)
And to get back to your specific concern about pending versus waiting: waiting implies that the call has been put in a wait queue:
Queued(call is queued) -> Alice(Connected)-Bob(Queued)
Pending is a transient state, whereas waiting can last quite long; in that case a voice prompt or music is played.
I don't know where you got that "pending" state from, but in TelephonyManager there are only 3 states:
CALL_STATE_IDLE - No activity
CALL_STATE_OFFHOOK - There's an active call (either incoming or outgoing)
CALL_STATE_RINGING - There's an incoming call waiting for the user to answer
You can distinguish between an incoming and an outgoing call by the state transition:
CALL_STATE_IDLE => CALL_STATE_OFFHOOK - suggests an outgoing call
CALL_STATE_RINGING => CALL_STATE_OFFHOOK - suggests an incoming call
See: https://developer.android.com/reference/android/telephony/TelephonyManager.html#CALL_STATE_IDLE
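As a minimal sketch of detecting those transitions, using the PhoneStateListener API that goes with the constants linked above (it is deprecated in API 31 in favor of TelephonyCallback, and listening for call state requires the READ_PHONE_STATE permission):

    import android.content.Context;
    import android.telephony.PhoneStateListener;
    import android.telephony.TelephonyManager;

    // Inside an Activity or Service:
    TelephonyManager tm = (TelephonyManager) getSystemService(Context.TELEPHONY_SERVICE);
    tm.listen(new PhoneStateListener() {
      private int lastState = TelephonyManager.CALL_STATE_IDLE;

      @Override
      public void onCallStateChanged(int state, String phoneNumber) {
        if (state == TelephonyManager.CALL_STATE_OFFHOOK) {
          if (lastState == TelephonyManager.CALL_STATE_IDLE) {
            // IDLE => OFFHOOK: an outgoing call was placed
          } else if (lastState == TelephonyManager.CALL_STATE_RINGING) {
            // RINGING => OFFHOOK: an incoming call was answered
          }
        }
        lastState = state;
      }
    }, PhoneStateListener.LISTEN_CALL_STATE);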

What is difference between MQTTAsync_onSuccess and MQTTAsync_deliveryComplete callbacks?

I'm learning about MQTT (specifically the paho C library) by reading and experimenting with variations on the async pub/sub examples.
What's the difference between the MQTTAsync_deliveryComplete callback that you set with MQTTAsync_setCallbacks() vs. the MQTTAsync_onSuccess or MQTTAsync_onSuccess5 callbacks that you set in the MQTTAsync_responseOptions struct that you pass to MQTTAsync_sendMessage() ?
All seem to deal with "successful delivery" of published messages, but from reading the example code and the doxygen I can't tell how they relate to, conflict with, or supplement each other. Grateful for any guidance.
Basically, MQTTAsync_deliveryComplete and MQTTAsync_onSuccess do the same thing: they notify you via a callback about the delivery of a message. Both callbacks are executed asynchronously, on a thread separate from the one the client application is running on.
(Both callbacks even use the same thread in the current version of the Paho client, but this is an undocumented implementation detail. That thread is of course not the application thread, otherwise these would not be asynchronous callbacks.)
The difference is that the MQTTAsync_deliveryComplete callback is set once via MQTTAsync_setCallbacks, and you are then informed about every delivery of a message. In contrast, MQTTAsync_onSuccess informs you once, for exactly the message that you sent out via MQTTAsync_sendMessage().
You can even define both callbacks, in which case both will be called when a message is delivered. This gives you the flexibility to choose the approach that best suits your needs.
Artificial example
Suppose you have three different functions, each sending a specific type of message (e.g. sendTemperature(), sendHumidity(), sendAirPressure()), each calling MQTTAsync_sendMessage, and after each delivery you want a matching callback to be invoked. Then you would choose MQTTAsync_onSuccess, because you would not need to keep track of the MQTTAsync_token and associate it with your callbacks.
If, on the other hand, you wanted to implement something like a logging function, it would be more useful to use MQTTAsync_deliveryComplete, because it is called for every delivery.
And of course one can imagine wanting both: the specific callback with some per-message actions and the generic one for logging. In this case both variants can be used at the same time.
Documentation
Note that the documentation of MQTTAsync_deliveryComplete explicitly states that it takes the configured quality of service into account. The MQTTAsync_onSuccess documentation does not say this, which of course does not mean that the implementation does not do it; but if this is important to you, you should check the source code explicitly.
