I have a D-Bus server that exposes a method which takes a long time to complete (about 3 minutes).
The client performs a synchronous call to this method.
The problem is that after exactly 25 seconds the client fails with a 'did not receive a reply' error.
Unfortunately, I cannot change the client, so I cannot make the call asynchronous, as it should be.
I tried to use this line in my server configuration:
<limit name = "reply_timeout">240000</limit>
but the situation does not change.
Any idea?
That limit parameter configures the bus daemon, which is only one of the processes involved. The others are the client and the server, and the particular D-Bus library used on each end may have a default timeout for synchronous messages. And 25 seconds is indeed the _DBUS_DEFAULT_TIMEOUT_VALUE in libdbus, the C reference implementation.
Changing the timeout in the client, for example in dbus_connection_send_with_reply_and_block, is easier than changing the API to be asynchronous.
Related
I'm using VxWorks 5.4 and attempting to connect over TCP to a server that I'm going to send logs to. For some reason, at boot the connection attempt either fails or takes up to 6 seconds, and it blocks the task in which the attempt was made, which is obviously a big no-no.
I checked whether the problem is on the server side by writing a simple C program on Windows that connects to that server, and it takes no time at all (milliseconds).
I have "solved" the problem by creating a task that attempts connectWithTimeout every 1-2 seconds, and it does work (it establishes the connection after around 2 failures, in around 20 ms). I don't really like this approach, though. I would rather initiate the actual connection once whatever I'm missing is up and ready, instead of repeatedly checking whether I can connect.
After investigating, it turned out that the problem was how the session between my system and the server gets closed.
When a client runs inside an application on Windows or any other system, shutting it down goes through steps that close the session properly.
That is not the case on my system: to shut it down I essentially unplug the wire, so it never goes through a shutdown process that properly closes the session.
After the system is up again, connect cannot complete because my system tries to open the same session as the "dead" one, which the server still thinks is running.
Solving the problem from the server side was easy: add keepalive functionality, so that if the client does not respond for a period you choose, the server closes the session.
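For illustration, here is a minimal Python sketch of the keepalive idea, assuming the server is able to set socket options on the connections it accepts. The helper name enable_keepalive and the timing values are made up for this example. The TCP_KEEP* options are platform-specific, so they are applied only where the platform exposes them.

```python
import socket

def enable_keepalive(sock, idle=30, interval=10, probes=3):
    """Enable TCP keepalive so that a silently dead peer is eventually detected.

    The helper name and default timings are illustrative assumptions.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timing knobs are not available on every platform.
    tuning = (("TCP_KEEPIDLE", idle),       # seconds of silence before probing
              ("TCP_KEEPINTVL", interval),  # seconds between probes
              ("TCP_KEEPCNT", probes))      # failed probes before giving up
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # the platform rejected this knob; keep its default
    return sock
```

On the server this would typically be called on each socket returned by accept(). Once the probes go unanswered, the next read or write on that socket fails and the session can be cleaned up.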
I'm trying to send incoming messages to multiple stateful functions, but I don't fully understand how to do it. For the sake of clarity, let's say one of my stateful functions receives integers and sends them to a couple of remote functions. These functions add the integers to their state values and save the result as the new state.
When one of these 2 remote functions fails, the other should continue to work the same way.
When the failed function recovers, it should process the messages that it could not process during the failure.
I thought about sending them one after another as below, but I don't think it will work:
context.send(RemoteFuncType1,someID,someInteger);
context.send(RemoteFuncType2,someID,someInteger);
...
How can I do this in a fault-tolerant way?
And, if possible, how does it work in the background?
The way you are suggesting to do it is the correct way!
StateFun delivers the messages to the remote functions in a consistent manner. If one of the functions experiences a short downtime, StateFun retries sending the message until either:
The message is delivered successfully (retries use backoff).
A maximum timeout for retries is reached. When that timeout is reached, the whole StateFun job is rewound to a previously consistent checkpoint.
Since StateFun is managing message delivery and the state of the functions (remote included) it would make sure that a consistent state and message would be delivered to each function.
In your example: the second remote function would receive someInteger with whatever state it had before, once recovered.
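To make the retry behaviour more concrete, here is a rough Python sketch of "retry with backoff until a deadline". It is not StateFun's actual implementation: deliver_with_backoff, RetriesExhausted and all parameters are hypothetical names, and raising the exception only stands in for the point at which StateFun would roll the job back to a checkpoint.

```python
import time

class RetriesExhausted(Exception):
    """Stand-in for 'give up and roll back to the last checkpoint'."""

def deliver_with_backoff(send, message, max_elapsed=5.0, first_delay=0.01,
                         factor=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call send(message) until it succeeds or the retry budget is used up.

    All names and defaults here are illustrative assumptions.
    """
    start = clock()
    delay = first_delay
    while True:
        try:
            return send(message)
        except ConnectionError:
            # Stop if waiting once more would exceed the overall budget.
            if clock() - start + delay > max_elapsed:
                raise RetriesExhausted(message)
            sleep(delay)
            delay *= factor  # exponential backoff between attempts
```

Because each target function gets its own delivery attempts, a failure of one remote function does not stop messages from reaching the other.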
To get a deeper understanding of how checkpointing works in Flink and how it enables exactly once processing I’d recommend the following:
https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html
I'm using OkHttp3 with Retrofit to send a simple POST request to a service inside the local network. I use the same implementation in different projects. The only difference in this scenario is that it runs on a Raspberry Pi (armv6l) platform.
The symptom is simple to describe: whether I invoke a request synchronously or asynchronously makes no difference, and the requests are executed with a delay of around 30 to 60 seconds.
I don't know how to investigate this any deeper. Wireshark shows exactly the same delay.
If I invoke the request via cURL, it works as expected.
Thank you for all assistance to solve this issue.
I have a program that needs to:
Handle 20 connections. My program will act as client in every connection, each client connecting to a different server.
Once connected my client should send a request to the server every second and wait for a response. If no request is sent within 9 seconds, the server will time out the client.
It is unacceptable for one connection to cause problems for the rest of the connections.
I do not have access to threads and I do not have access to non-blocking sockets. I have a single-threaded program with blocking sockets.
Edit: The reason I cannot use threads and non-blocking sockets is that I am on a non-standard system. I have a single RTOS (Real-Time Operating System) task available.
To solve this, use of select is necessary but I am not sure if it is sufficient.
Initially I connect to all the servers. But select can only be used to see whether a read or write will block, not whether a connect will.
So when I have connected to, say, 2 servers and they are both waiting to be served, what if the 3rd does not work? The connect call will block, causing the first 2 connections to time out as well.
Can this be solved?
I think the connection issue can be solved by setting a timeout for the connect operation, so that it fails fast enough. Of course, that will limit you when the network really is working but the path to some of the servers is very long (slow). That's bad design, but your requirements are pretty harsh.
See this answer for details on connection-timeouts.
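As a sketch of the connect-timeout idea in Python: the function names try_connect and connect_all and the 0.5 s default are assumptions for this example. On the RTOS in question, the equivalent would be whatever connect-timeout primitive the platform offers.

```python
import socket

def try_connect(host, port, timeout=0.5):
    """Return a connected socket, or None if the attempt fails or times out."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:  # covers timeouts and refused connections
        return None

def connect_all(servers, timeout=0.5):
    """Try every (host, port) once; a dead server costs at most `timeout` seconds."""
    return {addr: try_connect(addr[0], addr[1], timeout) for addr in servers}
```

With 20 servers and a 9 second limit per connection, the timeout has to be small enough that one unreachable server cannot starve the others. Retrying the failed ones on a later pass keeps the worst case bounded.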
It seems you need to isolate the connections. Well, if you cannot use threads you can always resort to good-old-processes.
Spawn each client by forking your server process and use traditional IPC mechanisms if communication between them is required.
If you cannot use a multiprocess approach either, I'm afraid you'll have a hard time doing that.
I'm trying to reconnect to the Redis server on disconnect.
I'm using redisAsyncConnect and I've set up a callback on disconnect. In the callback I try to reconnect with the same command I use at the very start of the program to establish the connection, but it's not working. I can't seem to reconnect.
Can anyone help me out with an example?
Managing Redis (re)connections asynchronously is a bit tricky when an event loop is used.
Here is an example implementing a small zset polling daemon connecting to a list of Redis instances, which is resilient to disconnection events. The ae event loop is used (it is the one used by Redis itself).
http://gist.github.com/4149768
Check the following functions:
connectCallback
disconnectCallback
checkConnections
reconnectIfNeeded
The main daemon loop does its work only when the connection is available. Once per second, a second, timer-driven callback checks whether some connections have to be reestablished. We have found this mechanism quite reliable.
Note: error management is crude in this example for brevity's sake. Real production code should manage errors more gracefully.
One tricky point when dealing with multiple asynchronous connections is that no user-defined contextual data is passed as a parameter to the corresponding callbacks. Cleaning up the data associated with a connection after a disconnection event can be a bit difficult.
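The periodic-check pattern can be sketched independently of hiredis. The Python below uses plain sockets rather than the Redis client, and ManagedConnection and check_connections are hypothetical names, not the functions from the gist. Keeping the per-connection state in one object is one way to make the cleanup after a disconnect easier.

```python
import socket

class ManagedConnection:
    """Holds the per-connection state so that cleanup has one obvious place."""

    def __init__(self, host, port):
        self.host, self.port = host, port
        self.sock = None

    def connected(self):
        return self.sock is not None

    def mark_disconnected(self):
        # Called from whatever detects the disconnect (e.g. a failed read).
        if self.sock is not None:
            self.sock.close()
            self.sock = None

    def reconnect_if_needed(self, timeout=0.5):
        if self.sock is None:
            try:
                self.sock = socket.create_connection(
                    (self.host, self.port), timeout=timeout)
            except OSError:
                self.sock = None  # still down; try again on the next tick
        return self.connected()

def check_connections(connections):
    """Intended to run from a timer, for example once per second."""
    return [conn.reconnect_if_needed() for conn in connections]
```

The important point is that the disconnect handler only records the state, and the actual reconnect happens later from the timer rather than inside the disconnect callback.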