Apache Flink Stateful Functions: forwarding the same message to N functions

I'm trying to send incoming messages to multiple stateful functions, but I couldn't fully understand how to do it. For the sake of clarity, let's say one of my stateful functions receives some integers and sends them to a couple of remote functions. These functions add the integers to their state values and save the result as the new state.
When one of these two remote functions fails, the other should continue to work the same way.
When the failed function recovers, it should process the messages it could not process during the failure.
I thought about sending them one after another as below, but I'm not sure it will work:
    // Forward the same input to each remote function type in turn.
    context.send(RemoteFuncType1, someID, someInteger);
    context.send(RemoteFuncType2, someID, someInteger);
    ...
How can I do this in a fault-tolerant way?
And, if possible, how does it work in the background?

The way you are suggesting is the correct way!
StateFun would deliver the messages to the remote functions in a consistent manner. If one of the functions is experiencing a short downtime, StateFun would retry sending the message until either:
it is successfully delivered (with backoff), or
a maximum retry timeout is reached, in which case the whole StateFun job is rewound to a previously consistent checkpoint.
Since StateFun manages both message delivery and the state of the functions (remote functions included), it makes sure that a consistent state and a consistent stream of messages are delivered to each function.
In your example: once recovered, the second remote function would receive someInteger together with whatever state it had before the failure.
To get a deeper understanding of how checkpointing works in Flink, and how it enables exactly-once processing, I'd recommend the following:
https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html

Related

How to synchronize read and multiple writes on sockets

I am asking some theoretical questions, since it would be really hard for me to post the code of the project involved, which is spread across too many files.
I am writing code for a server program, which has to communicate with several clients that send a variety of different requests and expect answers for each of them.
The server is multi-threaded, so each thread accesses a shared list of client connections under mutual exclusion and performs all the operations a request involves.
The two parts communicate via AF_UNIX sockets, and for mutual exclusion I have used locks and condition variables.
Now my issue is this: with certain interleavings of execution, the server side ends up making two simultaneous writes to the client (actually, two worker threads within the server both send a message to the same client), which is expecting just one. With unlucky interleavings the client only gets to read one of the messages sent by the server. I have noticed that sometimes this doesn't happen and everything works fine; that is because the two requests to the client happen to be spaced out in time.
The issue happens even when one server thread does a write before the client calls read, and in between those two events, another server thread calls another write to the same client. In this case, only the most recent write is received by the client.
From what I have understood regarding blocking mode (which is what I am using), read() and write() should block when no one is receiving on the other side. Now I don't understand why the second write from the server worker gets completely lost. Shouldn't it block if no one is receiving and then resume when the client calls read()?
Should I use mutual exclusion on the socket so that the second write waits for the previous one to be completely finished?
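To illustrate what I mean, here is a hypothetical sketch (not code from the project) of serializing writes to one client socket with a mutex:

    #include <pthread.h>
    #include <unistd.h>

    /* One mutex per client connection; only one worker writes at a time. */
    pthread_mutex_t write_mutex = PTHREAD_MUTEX_INITIALIZER;

    ssize_t locked_write(int fd, const void *buf, size_t len)
    {
        pthread_mutex_lock(&write_mutex);
        ssize_t n = write(fd, buf, len);
        pthread_mutex_unlock(&write_mutex);
        return n;
    }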
I hope that my issue is clear even though I am not showing the project's actual code; if necessary, please tell me and I will try to post some pieces of it. I think that the issue might just be a conceptual one, regarding my understanding of read(), write() and mutual exclusion, but I understand that it might be somewhere else and that without code it would be hard to figure out! Thank you!

select() equivalence in I/O Completion Ports

I am developing a proxy server using WinSock 2.0 on Windows. If I were developing it with the blocking model, select() would be the way to wait for data from the client or the remote server. Is there an applicable way to do this using I/O Completion Ports?
I used to have two contexts for the two directions of data flow using I/O Completion Ports. But a pending WSARecv couldn't receive any data from the remote server! I couldn't find the problem.
Thanks in advance.
EDIT: Here is the WorkerThread code from my current I/O Completion Ports implementation. But I am asking about how to implement a select() equivalent.
I/O Completion Ports provide an indication of when an I/O operation completes; they do not indicate when it is possible to initiate an operation. In many situations this doesn't actually matter. Most of the time the overlapped I/O model will work perfectly well if you assume it is always possible to initiate an operation. The underlying operating system will, in most cases, simply do the right thing and queue the data for you until it is possible to complete the operation.
However, there are some situations when this is less than ideal. For example, you can always send to a socket using overlapped I/O. You can do this even when the remote peer is not reading and the TCP stack has started to use flow control and has filled the TCP window. This simply uses resources on your local machine in a completely uncontrolled manner (not entirely uncontrolled, but controlled by the peer, which is not ideal). I write about this here, and in many situations you DO need to actively manage this kind of thing by tracking how many outstanding I/O write requests you have and using that as an indication of 'readiness to send'.
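As a rough illustration of that bookkeeping (the names and the limit are mine, not a prescribed API):

    #include <winsock2.h>
    #include <windows.h>

    #define MAX_PENDING_SEND_BYTES (64 * 1024)   /* arbitrary example limit */

    typedef struct {
        SOCKET sock;
        volatile LONG pendingSendBytes;  /* bumped before WSASend,
                                            decremented on completion */
    } Connection;

    /* Treat a connection with too much data in flight as "not ready to send". */
    BOOL ReadyToSend(Connection *conn)
    {
        return InterlockedCompareExchange(&conn->pendingSendBytes, 0, 0)
               < MAX_PENDING_SEND_BYTES;
    }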
Likewise, if you want a 'readiness to recv' indication, you could issue a 'zero byte' read on the socket. This is a read issued with a zero-length buffer. The read completes when there is data to read, but no data is returned. This gives you an indication that there is data to be read on the connection but is, IMHO, pointless unless you are suffering from the very unlikely situation of hitting the I/O page-lock limit, as you may as well read the data when it becomes available rather than forcing multiple kernel-to-user-mode transitions.
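A sketch of such a zero-byte read, assuming sock is an overlapped socket already associated with the completion port and ioContext is whatever per-operation context your design uses:

    WSABUF wsabuf;
    DWORD flags = 0;

    wsabuf.buf = NULL;   /* zero-length buffer: nothing is copied... */
    wsabuf.len = 0;      /* ...the completion just signals readability */

    if (WSARecv(sock, &wsabuf, 1, NULL, &flags,
                &ioContext->overlapped, NULL) == SOCKET_ERROR
        && WSAGetLastError() != WSA_IO_PENDING)
    {
        /* genuine failure: close the connection */
    }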
In summary, you don't really need an answer to your question. You need to look at how the API works and write your code to work with it, rather than trying to force the API to work the way other APIs you are familiar with do.

Libevent: multithreading to handle HTTP keep-alive connections

I am writing an HTTP reverse-proxy in C using Libevent and I would like to implement multithreading to make use of all available CPU cores. I had a look at this example: http://roncemer.com/software-development/multi-threaded-libevent-server-example/
In this example it appears that one thread is used for the full duration of a connection, but for HTTP 1.1 I don't think this will be the most effective solution, as connections are kept alive by default after each request so that they can be reused later. I have noticed that even one browser tab can open several connections to one server and keep them open until the tab is closed, which would quickly exhaust the thread pool. For an HTTP 1.1 proxy there will be many open connections, but only very few of them actively transferring data at a given moment.
So I was thinking of an alternative: have one event base for all incoming connections and have the event callback functions delegate to worker threads. This way we could have many open connections and only use a thread when data arrives on a connection, returning it to the pool once the data has been dealt with.
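Roughly what I have in mind, as a sketch; task_queue_push is a hypothetical hand-off to the worker pool, not a Libevent function:

    #include <event2/bufferevent.h>
    #include <event2/buffer.h>
    #include <stdlib.h>

    /* Hypothetical: enqueue a chunk of data and wake one worker thread. */
    void task_queue_push(void *queue, struct bufferevent *bev,
                         char *data, size_t len);

    /* Runs in the event-loop thread: drain the input buffer quickly,
       then hand the copied data to a worker so the loop never blocks. */
    static void read_cb(struct bufferevent *bev, void *ctx)
    {
        struct evbuffer *input = bufferevent_get_input(bev);
        size_t len = evbuffer_get_length(input);
        char *chunk = malloc(len);

        evbuffer_remove(input, chunk, len);
        task_queue_push(ctx, bev, chunk, len);
    }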
My question is: is this a suitable implementation of threads with Libevent?
Specifically – is there any need to have one event base per connection as in the example or is one for all connections sufficient?
Also – are there any other issues I should be aware of?
Currently the only problem I can see is with burstiness: when data is received in many small chunks, many read events are triggered per HTTP response, which would lead to a lot of handing-off to worker threads. Would this be a problem? If it would be, it could be somewhat mitigated using Libevent's watermarking, although I'm not sure how that works if a request arrives in two chunks and the second chunk is small enough to leave the buffered size below the watermark. Would it then stay there until more data arrives?
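For reference, this is the watermark call I mean; LOW is a hypothetical threshold in bytes:

    /* Don't fire read_cb until at least LOW bytes are buffered;
       a high watermark of 0 means "unlimited". */
    bufferevent_setwatermark(bev, EV_READ, LOW, 0);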
Also, I would need to implement scheduling so that a chunk is only sent once the previous chunk has been fully sent.
The second problem I thought of is when the thread pool is exhausted, i.e. all threads are currently doing something, and another read event occurs – this would lead to the read event callback blocking. Does that matter? I thought of putting these into another queue, but surely that's exactly what happens internally in the event base. On the other hand, a second queue might be a good way to organise scheduling of the chunks without blocking worker threads.

Reconnecting with hiredis

I'm trying to reconnect to the Redis server on disconnect.
I'm using redisAsyncConnect and I've set up a callback on disconnect. In the callback I try to reconnect with the same command I use at the very start of the program to establish the connection, but it's not working: I can't seem to reconnect.
Can anyone help me out with an example?
Managing Redis (re)connections asynchronously is a bit tricky when an event loop is used.
Here is an example implementing a small zset polling daemon connecting to a list of Redis instances, which is resilient to disconnection events. The ae event loop is used (it is the one used by Redis itself).
http://gist.github.com/4149768
Check the following functions:
connectCallback
disconnectCallback
checkConnections
reconnectIfNeeded
The main daemon loop does its activity only when the connection is available. Once per second, a timer-initiated callback checks whether some connections have to be reestablished. We have found this mechanism quite reliable.
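A condensed sketch of the pattern (not the gist's exact code; it assumes a hiredis build where redisAsyncContext carries a user data pointer, c->data, and the ae adapter is available):

    #include <hiredis/async.h>
    #include <hiredis/adapters/ae.h>

    typedef struct { redisAsyncContext *ctx; int connected; } Conn;

    static void disconnectCallback(const redisAsyncContext *c, int status) {
        Conn *conn = c->data;    /* recover our bookkeeping from the context */
        conn->connected = 0;     /* hiredis frees the context after this call */
        conn->ctx = NULL;
    }

    /* Run from a once-per-second timer event on the ae loop. */
    static void reconnectIfNeeded(aeEventLoop *loop, Conn *conn) {
        if (conn->connected) return;
        redisAsyncContext *c = redisAsyncConnect("127.0.0.1", 6379);
        if (c->err) { redisAsyncFree(c); return; }   /* try again next tick */
        c->data = conn;
        redisAeAttach(loop, c);
        redisAsyncSetDisconnectCallback(c, disconnectCallback);
        conn->ctx = c;
        conn->connected = 1;  /* real code would set this in a connect callback */
    }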
Note: error management is crude in this example for brevity's sake. Real production code should manage errors in a more graceful way.
One tricky point when dealing with multiple asynchronous connections is that there is no user-defined contextual data passed as a parameter to the corresponding callbacks. Cleaning up the data associated with a connection after a disconnection event can therefore be a bit difficult.

Is there any benefit to using epoll with a very small number of file descriptors?

Would the following single threaded UDP client application see a performance benefit from using epoll over simply calling recvfrom/sendto on non-blocking sockets?
Let me explain the client.
I am writing a single-threaded, UDP-based client (custom protocol) that both sends and receives data using non-blocking I/O, and my colleague suggested I use epoll for this.
The client sends and receives multiple packets of information that are all associated with a unique session id and multiple sessions can be run simultaneously.
If I use epoll, there will be a limited number of file descriptors, maybe 10-20, for epoll_wait to wait on. Each file descriptor would be associated with one session, so that's a maximum of 10-20 sessions, and this number will be enforced.
Each session has its own state machine. From a single thread I need to run each state machine reasonably frequently and poll the associated socket as well.
In my case, I'd have to use epoll_wait with a timeout of zero or some very small value so that I can give CPU time to run the state machines for each session.
If there is data for a session then it needs to be directed to the associated state machine.
However, I can't really see much benefit of this design with such a small number of file descriptors.
The way I see it, I have two design options:
1. In my main loop, poll the descriptors using epoll_wait with either a small timeout or no timeout.
How to handle data at that point is where I'm getting a bit stuck... either I read it right away and throw it into a queue for each state machine to pick up when it runs, or I set a flag on the state machine telling it that data is waiting, and it picks the data up with a call to recvfrom when it runs. Or I read the data, handle it right away, and run the state machine for it.
Or...
2. Just run each state machine from the main loop and call recvfrom (roughly as in the sketch below). If I get some data, handle it; if I don't, do whatever else the state machine requires. Is there huge overhead in calling recvfrom when there is no data?
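Option 2 would look roughly like this; handle_packet, handle_socket_error and run_state_machine are hypothetical stand-ins for my session logic:

    #include <errno.h>
    #include <sys/socket.h>

    for (;;) {
        for (int i = 0; i < num_sessions; i++) {
            char buf[2048];
            ssize_t n = recvfrom(sessions[i].fd, buf, sizeof buf, 0,
                                 NULL, NULL);

            if (n >= 0)
                handle_packet(&sessions[i], buf, (size_t)n);
            else if (errno != EAGAIN && errno != EWOULDBLOCK)
                handle_socket_error(&sessions[i]);

            run_state_machine(&sessions[i]);  /* always give the FSM a turn */
        }
    }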
Going the epoll route means coding in some extra complexity. If there is a strong likelihood of it being faster in my case, then I will start doing it. However, if the really simple second way works just as well, then I would not use epoll.
Any thoughts?
No, and in fact performance will be much worse using epoll if adding and removing file descriptors from the set to poll is anything but an extremely rare event. With poll, a single syscall performs the entire operation. With epoll, you need multiple syscalls to modify the set and then wait on it.
Unless you're writing a server that's intended to scale to tens or hundreds of thousands of long-term persistent connections, epoll is not only premature optimization, but actually a pessimization. It's also completely nonstandard and non-portable.
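For a set this small, plain poll() keeps the whole readiness sweep to a single syscall per pass; a minimal sketch (the session bookkeeping is hypothetical):

    #include <poll.h>

    struct pollfd pfds[MAX_SESSIONS];   /* MAX_SESSIONS: the 10-20 cap */

    for (int i = 0; i < num_sessions; i++) {
        pfds[i].fd = sessions[i].fd;
        pfds[i].events = POLLIN;
    }

    /* Timeout 0: check readiness without blocking,
       then run the state machines. */
    if (poll(pfds, num_sessions, 0) > 0)
        for (int i = 0; i < num_sessions; i++)
            if (pfds[i].revents & POLLIN)
                handle_readable(&sessions[i]);   /* hypothetical */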
