detecting when a client-side application shuts down the connection uncleanly - c

I am developing a 9P server; it is pretty much like an NFS server. Repeated mounting and unmounting causes no socket descriptor leakage because I am able to close the socket. However, in the following scenario the server does not do a proper cleanup and close the socket. The scenario is: a client on Machine A mounts a FS from the server machine, and then for some reason Machine A restarts or is shut down. When this happens, I expect the server to clean up and close the socket, but instead it blocks on read(). I thought a read() should return 0 when a connection is closed, but it doesn't. I assume that's because a proper TCP termination has not occurred, so the server is waiting for data from the client. Here is pseudocode of my server:
while (1) {
    n = read(sockfd, buffer, 4); /* 4-byte protocol header that specifies the size */
    if (n <= 0) break;           /* 0: connection closed, -1: read error */
    /* iteratively read the rest of the bytes until the incoming message ends */
}
cleanup(); /* close the socket and some other tasks */
However, when the client restarts while the server is blocking on read(), nothing happens. What is the best and easiest way to solve this? Some people suggest running a separate thread that checks connections but this is too involved. I am sure there must be a simpler way.

When the client shuts down cleanly, the client OS terminates all its TCP connections. But when the client crashes, is switched off, or a network problem occurs somewhere on the path between the client and the server, there is no way to deliver that information to the server, and the server may block in the read() call forever.
There are two possible solutions: either use standard TCP keep-alive probes, or implement an application-level health check.
TCP keep-alive
TCP keep-alive is well described, for example, at http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html:
In order to understand what TCP keepalive (which we will just call keepalive) does, you need do nothing more than read the name: keep TCP alive. This means that you will be able to check your connected socket (also known as TCP sockets), and determine whether the connection is still up and running or if it has broken...
If you want your application to use TCP keep-alive, just set the socket option (error checking omitted):
int optval = 1;
socklen_t optlen = sizeof(optval);
/* enable periodic keep-alive probes on the connected socket */
setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, optlen);
TCP keep-alive is easy to use, but it depends on the OS configuration: an application cannot portably set its own timeouts, because they are configured system-wide (though some systems offer per-socket options such as Linux's TCP_KEEPIDLE).
Application-level health check
Use an application-level mechanism when you need application-specific timeouts for disconnection detection. There are plenty of ways to implement it. The idea is to periodically send a small piece of otherwise useless data and to assume the connection is destroyed when it stops arriving.
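As an illustration only, here is a minimal sketch of such a check, assuming the protocol lets you send a one-byte ping that the peer echoes back (the message format and timeout are placeholders):
#include <poll.h>
#include <unistd.h>

/* health-check sketch: send one ping byte, then wait up to timeout_ms for
   the echo; returns 0 if the peer looks alive, -1 if the link seems dead */
int check_peer(int sockfd, int timeout_ms)
{
    char ping = 'p', echo;
    if (write(sockfd, &ping, 1) != 1)
        return -1;                      /* send failed: connection is gone */
    struct pollfd pfd = { .fd = sockfd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;                      /* no reply in time: assume dead */
    return read(sockfd, &echo, 1) == 1 ? 0 : -1;
}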

I want to amend Zaboj Campula's good answer with the most important way to deal with this: timeouts. Normally you would assign a timeout to any socket operation; a typical value is 30 seconds. That way there is no need for keep-alives most of the time, and a connection failure will be detected within 30 seconds.
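For a plain blocking socket, a sketch of such a per-operation timeout using SO_RCVTIMEO (sockfd and the 30-second value come from the discussion above; error checking omitted):
#include <sys/socket.h>
#include <sys/time.h>

/* give every subsequent read() on sockfd a 30-second deadline */
struct timeval tv = { .tv_sec = 30, .tv_usec = 0 };
setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
/* read() now returns -1 with errno set to EAGAIN/EWOULDBLOCK when the
   deadline passes without data, instead of blocking forever */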
Some people suggest running a separate thread that checks connections but this is too involved.
That does not work because your machine does not know that the connection is gone. There is nothing to check.

Related

detecting connection state in epoll linux

There are many threads regarding how to detect whether a socket is connected or not, using various methods like getpeername / getsockopt with SO_ERROR. getpeername (https://man7.org/linux/man-pages/man2/getpeername.2.html) would be a good way for me to detect whether a socket is connected or not. The problem is, it does not say anything about whether the connection is in progress... So if I call connect, the connection is in progress, and I then call getpeername, will it report an error (-1) even though the connection is still in progress?
If it does, I can implement a counter-like system that will eventually kill the socket if it is still in progress after x seconds.
Short Answer
I think that, if getpeername() returns ENOTCONN, that simply means that the TCP connection request has not yet succeeded. For it not to return ENOTCONN, I think the client end needs to have received the SYN+ACK from the server and sent its own ACK, and the server end needs to have received the client's ACK.
Thereafter all bets are off. The connection might subsequently be interrupted, but getpeername() has no way of knowing this has happened.
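For illustration, a sketch of that probe (fd is assumed to be the socket in question):
#include <sys/socket.h>
#include <errno.h>

/* sketch: getpeername() distinguishes "handshake completed" from "not connected" */
struct sockaddr_storage peer;
socklen_t len = sizeof(peer);
if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0) {
    /* the handshake completed at some point; this says nothing about now */
} else if (errno == ENOTCONN) {
    /* not (yet) connected */
}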
Long Answer
A lot of it depends on how fussy and short-term one wants to be about knowing if the connection is up.
Strictly Speaking...
Strictly speaking with maximum fussiness, one cannot know. In a packet switched network there is nothing in the network that knows (at any single point in time) for sure that there is a possible connection between peers. It's a "try it and see" thing.
This contrasts to a circuit switched network (e.g. a plain old telephone call), where there is a live circuit for exclusive use between peers (telephones); provided current is flowing, you know the circuit is complete even if the person at the other end of the phone call is silent.
Note that if the two computers were connected by a single Ethernet cable (no router, no switches, just a cable between NICs), that is effectively a fixed circuit (not even a circuit-switched network).
Relaxing a Little...
Focusing on what one can know about a connection in a packet-switched network: as others have already said, the answer is that, really, one has to send and receive packets constantly to know if the network can still connect the two peers.
Such an exchange of packets occurs with a TCP socket connect() - the connecting peer sends a special packet to say "please can I connect to you", the serving peer replies "yes", and the client then says "thank you!" (SYN->, <-SYN+ACK, ACK->). But thereafter packets flow between the peers only if the applications send and receive data, or elect to close the connection (FIN).
Calling something like getpeername() I think is somewhat misleading, depending on your requirements. It's fine, if you trust the network infrastructure and remote computer and its application to not break, and not crash.
It's possible for the connect() to succeed, then something breaks somewhere in the network (e.g. the peer's network connection is unplugged, or the peer crashes), and there is no knowledge at your end of the network that that has happened.
The first thing you can know about it is if you send some traffic and fail to get a response. The response is, initially, the tcp acks (which allows your network stack to clear out some of its buffers), and then possibly an actual message back from the peer application. If you keep sending data out into the void, the network will quite happily route packets as far as it can, but your tcp stack's buffers will fill up due to the lack of acks coming back from the peer. Eventually, your network socket blocks on a call to write(), because the local buffers are full.
Various Options...
If you're writing both applications (server and client), you can write the application to "ping pong" the connection periodically; just send a message that means nothing other than "tell me you heard this". Successful ping-ponging means that, at least within the last few seconds, the connection was OK.
Use a library like ZeroMQ. This library solves many issues with using network connections, and also includes (in modern versions) socket heartbeats (i.e. a ping pong). It's neat, because ZeroMQ looks after the messy business of making, restoring and monitoring connections with a heartbeat, and can notify the application whenever the connection state changes. Again, you need to be writing both the client and server applications, because ZeroMQ has its own protocol on top of TCP that is not compatible with just a plain old socket. If you're interested in this approach, the words to look for in the API documentation are socket monitor and ZMQ_HEARTBEAT_IVL; a short sketch appears below, after these options.
If, really, only one end needs to know the connection is still available, that can be accomplished by having the other end just send out "pings". That might fit a situation where you're not writing the software at both ends. For example, a server application might be configured (rather than rewritten) to stream out data regardless of whether the client wants it or not, and the client ignores most of it. However, if the client is receiving data, it then also knows there is a connection. The server does not know (it's just blindly sending out data, up until its write()s eventually block), but may not need to know.
Ping ponging is also good in that it gives some indication of the performance of the network. If one end is expecting a pong within 5 seconds of sending a ping but doesn't get it, that indicates that all is not as expected (even if packets are eventually turning up).
This allows discrimination between networks that are usefully working, and networks that are delivering packets but too slowly to be useful. The latter is still technically "connected" and is probably represented as connected by other tests (e.g. calling getpeername()), but it may as well not be.
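As an illustrative sketch of the ZeroMQ heartbeat option above, using the libzmq C API (this assumes libzmq >= 4.2; the endpoint and timing values are placeholders):
#include <zmq.h>

/* a DEALER socket that pings its peer every 5 s and treats 10 s of
   silence as a dead connection */
void *make_heartbeat_socket(void *ctx, const char *endpoint)
{
    void *sock = zmq_socket(ctx, ZMQ_DEALER);
    int ivl_ms = 5000;   /* ZMQ_HEARTBEAT_IVL: send a PING every 5 seconds */
    int tmo_ms = 10000;  /* ZMQ_HEARTBEAT_TIMEOUT: give up after 10 seconds */
    zmq_setsockopt(sock, ZMQ_HEARTBEAT_IVL, &ivl_ms, sizeof(ivl_ms));
    zmq_setsockopt(sock, ZMQ_HEARTBEAT_TIMEOUT, &tmo_ms, sizeof(tmo_ms));
    zmq_connect(sock, endpoint);
    return sock;
}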
Limited Local Knowledge...
There are only limited things one can do locally at a peer. A peer can know whether its own connection to the network exists (e.g. the NIC reports a live link), but that's about it.
My Opinion
Personally speaking, I default to ZeroMQ these days if at all possible. Even if it means a software rewrite, that's not as bad as it seems. This is because one is generally replacing code such as connect() with zmq_connect(), and recv() with zmq_recv(), etc. There's often a lot of code removal too. ZeroMQ is message-oriented; a TCP socket is stream-oriented. Quite a lot of applications have to adapt TCP into a message orientation, and ZeroMQ replaces all the code that does that.
ZeroMQ is also well supported across numerous languages, via bindings and/or re-implementations.
man connect
If the initiating socket is connection-mode, .... If the connection cannot be established immediately and O_NONBLOCK is not set for the file descriptor for the socket, connect() shall block for up to an unspecified timeout interval until the connection is established. If the timeout interval expires before the connection is established, connect() shall fail and the connection attempt shall be aborted.
If connect() is interrupted by a signal that is caught while blocked waiting to establish a connection, connect() shall fail and set errno to [EINTR], but the connection request shall not be aborted, and the connection shall be established asynchronously.
If the connection cannot be established immediately and O_NONBLOCK is set for the file descriptor for the socket, connect() shall fail and set errno to [EINPROGRESS], but the connection request shall not be aborted, and the connection shall be established asynchronously.
When the connection has been established asynchronously, select() and poll() shall indicate that the file descriptor for the socket is ready for writing.
If the socket is in blocking mode, connect will block while the connection is in progress. After connect returns, you'll know if a connection has been established (or not).
A signal could interrupt the (blocking/waiting) process, the connection routine will then switch to asynchronous mode.
If the socket is in non-blocking mode (O_NONBLOCK) and the connection cannot be established immediately, connect will fail with the error EINPROGRESS and, as above, switch to asynchronous mode; that means you'll have to use select or poll to figure out when the socket is ready for writing (which indicates an established connection).
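Putting that together, a sketch of a non-blocking connect checked with poll() and SO_ERROR (this assumes addr, a struct sockaddr_in, has already been filled in; SOCK_NONBLOCK is Linux-specific, use fcntl(fd, F_SETFL, O_NONBLOCK) elsewhere):
#include <sys/socket.h>
#include <poll.h>
#include <errno.h>

int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0
        && errno != EINPROGRESS) {
    /* immediate failure */
}
struct pollfd pfd = { .fd = fd, .events = POLLOUT };
if (poll(&pfd, 1, 5000) > 0) {              /* wait up to 5 seconds */
    int err; socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    /* err == 0 means the connection is established; otherwise it failed */
}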

Connecting to server TCP in C

I need to clarify something. I'm making a server/client TCP program in C.
What happens if a client tries to connect (using connect()) when the server is not stuck in accept()? I mean, when it's busy? What does connect() return?
EDIT:
I'm on Linux environment.
if (connect(...) < 0) {
    // error: report and leave
}
This is what I'm doing in my client. From what I've read and learned, if the server is busy and not accepting, connect() should wait a little bit, and then return -1, if the server is still busy. Is that right?
If so, how do I avoid that "little bit"? I want it to return -1 right away.
From what I've read and learned, if the server is busy and not accepting, connect() should wait a little bit, and then return -1, if the server is still busy. Is that right?
The acceptance of the TCP connection, i.e. the TCP handshake, is done entirely in the OS kernel, independent of any call to accept(); accept() just returns already-accepted connections to user space. Thus, even if the server is currently busy, the connection will succeed as long as there is still space in the pending queue. The size of the pending queue is set with listen(). If the pending queue is full because the application has not retrieved accepted connections from it for some time while clients kept connecting, then the server OS will reject the connection attempt, i.e. connect() will fail.
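A small sketch of the pieces involved (the backlog value is illustrative):
#include <sys/socket.h>

/* server side: the kernel completes TCP handshakes on its own and queues
   up to `backlog` already-accepted connections until accept() is called */
int backlog = 8;
listen(listen_fd, backlog);
/* ... even while the application is busy here, clients' connect() calls
   succeed until the pending queue fills; after that new attempts fail ... */
int conn_fd = accept(listen_fd, NULL, NULL);  /* pops one queued connection */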

is there a way to see if a client socket on my server side is dead?

When a client connects to my server, and after connecting they switch to a VPN or something, the server side still says the socket is alive and still tries to read from it. I tried using another thread to constantly check all my sockets with read() and close any that return -1, but it still doesn't do anything.
It depends very much on what type of protocol you use, but the generalized answer is: yes and no. You have to learn the network protocol stack to know what you can do in your situation, the details of which you did not disclose.
The usual way to solve this problem is to establish some policy or two-way communication. E.g. if there has been no data or "I'm alive" message sent from client X for duration Y, we close the connection. Or send a regular "ping" message to client C and expect a response before period Y expires.
If we're talking TCP, and if the client's connection is properly closed, a message is sent to the server, so the server will know the connection is closed, so read/recv will return 0 bytes indicating EOF.
But you're asking about the times when the client becomes unable to communicate with the server. Detecting an absence of messages is necessarily done using a timeout.
You can have the server "ping" the client (send a message to which the client must respond) periodically.
You can have the client send a message periodically (a "heartbeat") when idle.
Either way, no message (of any kind) for X seconds indicates a broken connection.
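On the server side, a sketch of that timeout, assuming the client is expected to say something at least every 30 seconds (client_fd and the interval are placeholders):
#include <poll.h>
#include <unistd.h>

/* sketch: treat 30 s without any client traffic as a broken connection */
struct pollfd pfd = { .fd = client_fd, .events = POLLIN };
int rc = poll(&pfd, 1, 30 * 1000);
if (rc == 0) {
    /* timeout: no heartbeat, no data; assume the peer is gone */
    close(client_fd);
} else if (rc > 0) {
    /* data (or EOF) is available; recv() will not block now */
}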
If you enable the SO_KEEPALIVE socket option on each new TCP connection, the OS will automatically ping the remote side periodically to see if it still responds, and close the connection if it doesn't. The default timeout is several hours, but many OSes allow you to configure a lower timeout on a per-socket basis. Unfortunately, each one is different in how to do this. Linux, for example, uses the TCP_KEEPIDLE socket option. NetBSD (And probably other BSDs) uses TCP_KEEPALIVE. And so on.
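As a sketch of that per-socket tuning, choosing whichever option name the platform provides (fd and the 60-second value are illustrative):
#include <sys/socket.h>
#include <netinet/tcp.h>

/* set the per-socket idle time before the first keep-alive probe */
int idle = 60; /* seconds */
#if defined(TCP_KEEPIDLE)            /* Linux and others */
setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
#elif defined(TCP_KEEPALIVE)         /* NetBSD and some other systems */
setsockopt(fd, IPPROTO_TCP, TCP_KEEPALIVE, &idle, sizeof(idle));
#endif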

Detecting disconnected client side socket connection (cross-platform applicability)

I know there are hundreds of answers for this question, but I cannot get it done with respect to my situation. The scenario is like this: we have a server written using TCP/IP, and we have multiple clients connected to this server. The client here is a software module which, before starting on a client machine, registers its presence with the server and loads the functionalities. But the problem is that this software module is crashing, so no socket.close() is called, and its footprint remains in the server even though it has crashed. How do I recognize this?
I am using select() in the client to be notified of any info from the server (and vice versa).
I cannot create a separate process for each client request in the server, nor can I create a parent-child mechanism on the client machine.
TCP keep-alive is not applicable, as we would need to tweak the registry on Windows? I need a cross-platform compatible solution.
I have read that recv() on the connected socket in the server code will return some values from 'that' client, such as 0 for socket closed? Can I use this to clear off the client socket registration in the server database? Will this work?
You didn't specify what method you are using to handle socket events in your server side code. Whatever method for polling your sockets you are using, recv() will return 0 or possibly -1/SOCKET_ERROR when a client crashes.
To detect inactive client connections, most server applications send out some form of heartbeat or ping message periodically within the application-layer protocol. When the client fails to acknowledge, the server application will be notified that the client disconnected via recv() returning 0 or SOCKET_ERROR with an error code such as WSAENETRESET, WSAECONNABORTED, WSAETIMEDOUT, or WSAECONNRESET (see the various error codes here). Often, after the server sends the heartbeat to a client TCP port that is no longer active, an ICMP packet is sent in response that alerts your server that the port or host is not active (recv() will immediately notify you of this event).
If you wish to turn the TCP keep alive timer on you can use the socket option SO_KEEPALIVE. The interval can also be set using SIO_KEEPALIVE_VALS.
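For the Win32 case, a minimal sketch of setting those values per socket (the timing values are illustrative; error checking omitted):
#include <winsock2.h>
#include <mstcpip.h>

/* per-socket keep-alive timing via SIO_KEEPALIVE_VALS */
struct tcp_keepalive ka;
ka.onoff = 1;
ka.keepalivetime = 30000;    /* ms of idle time before the first probe */
ka.keepaliveinterval = 5000; /* ms between unanswered probes */
DWORD bytes = 0;
WSAIoctl(sock, SIO_KEEPALIVE_VALS, &ka, sizeof(ka), NULL, 0, &bytes, NULL, NULL);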
Edit: Keep in mind that the various error codes and the option SIO_KEEPALIVE_VALS are Win32-specific. To handle these events on other operating systems you will need to use OS-specific ways of retrieving error codes and setting the TCP keep-alive interval, if you choose to do so. My best suggestion for keeping your code cross-platform compatible is to simply implement an application-layer heartbeat message in your protocol, or some other application-layer timeout. Doing so will allow you to forget about managing TCP keep-alives.
Update
I cannot comment on EJP's answer, but it's important to point out that by calling send() he is effectively recommending you implement a heartbeat/ping message in the application layer of your protocol. While checking the return value of send() is important, if you are polling/selecting read events you will be notified of the TCP connection being disconnected immediately upon calling recv() the moment the connection is deemed broken by the TCP stack. If you wait for your application timer to try to send some data using send() that could be many seconds (depending on the length of your interval timer) after recv() has already notified you that the connection is broken. In other words: pay attention to the recv() return values as well as your send() return values.
TCP keep-alive is not applicable, as we would need to tweak the registry on Windows?
TCP keepalive is an option if you can accept the default timeout of two hours.
I need a cross-platform compatible solution.
TCP keepalive is cross-platform.
I have read that recv() on the connected socket in the server code will return some values from 'that' client, such as 0 for socket closed?
It will return zero if the peer closes its socket, and on some platforms if the peer process merely exits without closing it.
Can I use this to clear off the client socket registration in the server database? Will this work?
Only if you can rely on the peer closing the socket properly.
It seems to me that what you should be doing is debugging the client code so it doesn't crash, and using TCP keepalive as a long-term backup.
You should also be aware that send() to a peer that has exited will sooner or later fail with an ECONNRESET error.
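For illustration, a sketch of catching that on the sending side (MSG_NOSIGNAL is Linux-specific and merely suppresses SIGPIPE so the failure is reported via errno; buf, len and fd are placeholders):
#include <sys/socket.h>
#include <errno.h>

/* sketch: a send() to a vanished peer eventually fails */
ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
if (n < 0 && (errno == ECONNRESET || errno == EPIPE)) {
    /* the peer is gone: clean up the connection */
}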

How to detect that the client is still connected (and not hung-up) using recv()?

I have written a multi-client server program in C on SuSE Linux Enterprise Server 12.3 (x86_64); I am using one thread per client to receive data.
My problem is:
I am using one terminal to run the server, and several other terminals to telnet to my server (as clients). I have used recv() in the server to receive data from the clients, and I check the return value of recv(): error on -1; connection closed on 0; normal operation otherwise. I have not used any flags in recv().
My program works fine if I just close the telnet session (i.e. disconnect the client) normally using Ctrl+] and close, but if I forcefully terminate the client using kill <pid>, then my server is unable to detect the loss of connection.
How to fix that?
Constraint: I do not want to put conditions on the client side; I want to fix this on the server side only.
You can enable SO_KEEPALIVE on the socket in your server.
/* enable keep-alive on the socket */
int one = 1;
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one));
By default, when keep-alive is enabled, the connection has to be idle for 2 hours before a keep-alive probe is attempted. You can adjust the keep-alive times to be a little more aggressive by adjusting the TCP_KEEPIDLE parameter:
int idletime = 120; /* in seconds */
setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idletime, sizeof(idletime));
When a probe is sent, it expects an acknowledgement from the other end. If there is an acknowledgement, the probe stays silent until the idle timer expires again. The keep-alive probe is retried again, by default every 75 seconds, if no acknowledgement to the probe is received. This can be adjusted with the TCP_KEEPINTVL option. The TCP_KEEPCNT option controls how many successive failures triggers the connection to be dropped. By default, that number is 9.
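Continuing the example above, those two parameters can be adjusted the same way (the values shown are just examples, not the defaults):
int interval = 10; /* TCP_KEEPINTVL: seconds between unacknowledged probes */
int count = 5;     /* TCP_KEEPCNT: failed probes before the connection drops */
setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));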
These options are available on Linux. BSD has similar options, but they are named differently.
About all you'd be able to do is implement a timeout of some sort. You won't be able to determine for certain that the client has disconnected unless it actually does the disconnect itself. The closest you'll get is noticing that the client was required to send something and failed to do so in a timely manner.
As for why: TCP is just a layer over top of IP. There's nothing actually connecting the two computers; a "connection" is simply an acknowledgement that another machine exists and has agreed to exchange info with you using TCP. The "connection" abstraction only holds as long as both sides act according to the rules. Forcefully killing the client makes it unable to hold up its end of the deal, so the server is left hanging.
My program works fine if I just close the telnet session (i.e. disconnect client) normally using Ctrl+] and close, but if I forcefully terminate the client using kill or closing the terminal, then my server is unable to detect loss of connection.
In either case the client socket gets closed, either by telnet or by the kernel when it destroys the telnet process. Your server must receive a FIN segment, which causes recv() to return 0 (after all pending data has been read from the socket).
You are probably not processing all return codes from recv() correctly.
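For completeness, a sketch of handling every class of recv() result (fd and buf are placeholders):
#include <sys/socket.h>
#include <errno.h>

ssize_t n = recv(fd, buf, sizeof(buf), 0);
if (n > 0) {
    /* n bytes of data received: process them */
} else if (n == 0) {
    /* peer sent FIN: orderly close, clean up the connection */
} else if (errno == EINTR) {
    /* interrupted by a signal: retry */
} else {
    /* real error, e.g. ECONNRESET: clean up the connection */
}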
