How to notify server that client is closing connection - c

I have a server that is running a select() loop that sometimes continues blocking when the client closes the connection from its side. The select() loop handles all other read/write operations correctly and sets the correct file descriptor in the fd_set, leading me to believe that it is not an issue with the file descriptor setup on the server-side.
The way I planned on handling the client closing the connection was to have the select() break due to activity on the socket (closing it from the client-side), see that the fd was set for that socket, and then try to read from it - and if the read returned 0, then close the connection. However, because the select() doesn't always return when the client side closes the connection, there is no attempt to check the fd_set and subsequently try to read from the socket.
As a workaround, I implemented a "stop code" that the client writes to the server just before closing the connection, and this write causes the select() to break and the server reads the "stop code" and knows to close the socket. The only problem with this solution is the "stop code" is an arbitrary string of bytes that could potentially appear in regular traffic, as the normal data being written can contain random strings that could potentially contain the "stop code". Is there a better way to handle the client closing the connection from its end? Or is the method I described the general "best practice"?
I think my issue has something to do with OpenSSL, as the connection in question is an OpenSSL tunnel, and it is the only file descriptor in the set giving me issues.

The way I planned on handling the client closing the connection was to have the select() break due to activity on the socket (closing it from the client-side), see that the fd was set for that socket, and then try to read from it - and if the read returned 0, then close the connection. However, because the select() doesn't always return when the client side closes the connection, there is no attempt to check the fd_set and subsequently try to read from the socket.
Regardless of whether you are using SSL or not, select() can tell you when the socket is readable (has data available to read), and a graceful closure is a readable condition (a subsequent read operation reports 0 bytes read). It is only abnormal disconnects that select() can't report (unless you use the exceptfds parameter, but even that is not always guaranteed). The best way to handle abnormal disconnects is to simply use timeouts in your own code. If you don't receive data from the client for awhile, just close the connection. The client will have to send data periodically, such as a small heartbeat command, if it wants to stay connected.
Also, when using OpenSSL, if you are using the older ssl_... API functions (ssl_new(), ssl_set_fd(), ssl_read(), ssl_write(), etc), make sure you are NOT just blindly calling select() whenever you want, that you call it ONLY when OpenSSL tells you to (when an SSL read/write operation reports an SSL_ERROR_WANT_(READ|WRITE) error). This is an area where alot of OpenSSL newbies tend to make the same mistake. They try to use OpenSSL on top of pre-existing socket logic that waits for a readable notification before then reading data. This is the wrong way to use the ssl_... API. You are expected to ask OpenSSL to perform a read/write operation unconditionally, and then if it needs to wait for new data to arrive, or pending data to send out, it will tell you and you can then call select() accordingly before retrying the SSL read/write operation again.
On the other hand, if you are using the newer bio_... API functions (bio_new(), bio_read(), bio_write(), etc), you can take control of the underlying socket I/O and not let OpenSSL manage it for you, thus you can do whatever you want with select() (or any other socket API you want).
As a workaround, I implemented a "stop code" that the client writes to the server just before closing the connection, and this write causes the select() to break and the server reads the "stop code" and knows to close the socket.
That is a very common approach in many Internet protocols, regardless of whether SSL is used or not. It is a very distinct and explicit way for the client to say "I'm done" and both parties can then close their respective sockets.
The only problem with this solution is the "stop code" is an arbitrary string of bytes that could potentially appear in regular traffic, as the normal data being written can contain random strings that could potentially contain the "stop code".
Then either your communication protocol is not designed properly, or your code is not processing the protocol correctly. In a properly-designed and correctly-processed protocol, there will not be any such ambiguity. There needs to be a clear distinction between the various commands that your protocol defines. Your "stop code" would be one such command amongst other commands. Random data in one command should not be mistakenly treated as a different command. If you are experiencing that problem, you need to fix it.

Related

detecting connection state in epoll linux

There are many threads regarding how to detect if a socket is connected or not using various methods like getpeername / getsockopt w/ SO_ERROR. https://man7.org/linux/man-pages/man2/getpeername.2.html would be a good way for me to detect if a socket is connected or not. The problem is, it does not say anything about if the connection is in progress... So if i call connect, it is in progress, then i call getpeername, will it say it is an error (-1) even though the connection is still in progress?
If it does, I can implement a counter-like system that will eventually kill the socket if it is still in progress after x seconds.
Short Answer
I think that, if getpeername() returns ENOTCONN, that simply means that the tcp connection request has not yet succeeded. For it to not return ENOTCONN, I think the client end needs to have received the syn+ack from the server and sent its own ack, and the server end needs to have received the client's ack.
Thereafter all bets are off. The connection might subsequently be interrupted, but getpeername() has no way of knowing this has happened.
Long Answer
A lot of it depends on how fussy and short-term one wants to be about knowing if the connection is up.
Strictly Speaking...
Strictly speaking with maximum fussiness, one cannot know. In a packet switched network there is nothing in the network that knows (at any single point in time) for sure that there is a possible connection between peers. It's a "try it and see" thing.
This contrasts to a circuit switched network (e.g. a plain old telephone call), where there is a live circuit for exclusive use between peers (telephones); provided current is flowing, you know the circuit is complete even if the person at the other end of the phone call is silent.
Note that if the two computers were connected by a single Ethernet cable (no router, no switches, just a cable between NICs), that is effectively a fixed circuit (not even a circuit-switched network).
Relaxing a Little...
Focusing on what one can know about a connection in a packet switched network. As others have already said, the answer is that, really, one has to send and receive packets constantly to know if the network can still connect the two peers.
Such an exchange of packets occurs with a tcp socket connect() - the connecting peer sends a special packet to say "please can I connect to you", and the serving peer replies "yes", the client then says "thank you!" (syn->, <-syn+ack, ack->). But thereafter the packets flow between peers only if the applications send and receive data, or elects to close the connection (fin).
Calling something like getpeername() I think is somewhat misleading, depending on your requirements. It's fine, if you trust the network infrastructure and remote computer and its application to not break, and not crash.
It's possible for the connect() to succeed, then something breaks somewhere in the network (e.g. the peer's network connection is unplugged, or the peer crashes), and there is no knowledge at your end of the network that that has happened.
The first thing you can know about it is if you send some traffic and fail to get a response. The response is, initially, the tcp acks (which allows your network stack to clear out some of its buffers), and then possibly an actual message back from the peer application. If you keep sending data out into the void, the network will quite happily route packets as far as it can, but your tcp stack's buffers will fill up due to the lack of acks coming back from the peer. Eventually, your network socket blocks on a call to write(), because the local buffers are full.
Various Options...
If you're writing both applications (server and client), you can write the application to "ping pong" the connection periodically; just send a message that means nothing other than "tell me you heard this". Successful ping-ponging means that, at least within the last few seconds, the connection was OK.
Use a library like ZeroMQ. This library solves many issues with using network connections, and also includes (in modern version) socket heartbeats (i.e. a ping pong). It's neat, because ZeroMQ looks after the messy business of making, restoring and monitoring connections with a heartbeat, and can notify the application whenever the connection state changes. Again, you need to be writing both client and server applications, because ZeroMQ has it's own protocol on top of tcp that is not compatible with just a plain old socket. If you're interested in this approach, the words to look for in the API documentation is socket monitor and ZMQ_HEARTBEAT_IVL;
If, really, only one end needs to know the connection is still available, that can be accomplished by having the other end just sending out "pings". That might fit a situation where you're not writing the software at both ends. For example, a server application might be configured (rather than re-written) to stream out data regardless of whether the client wants it or not, and the client ignores most of it. However, the client knows that if it is receiving data it then also knows there is a connection. The server does not know (it's just blindly sending out data, up until its writes() eventually block), but may not need to know.
Ping ponging is also good in that it gives some indication of the performance of the network. If one end is expecting a pong within 5 seconds of sending a ping but doesn't get it, that indicates that all is not as expected (even if packets are eventually turning up).
This allows discrimination between networks that are usefully working, and networks that are delivering packets but too slowly to be useful. The latter is still technically "connected" and is probably represented as connected by other tests (e.g. calling getpeername()), but it may as well not be.
Limited Local Knowledge...
There is limited things one can do locally to a peer. A peer can know whether its connection to the network exists (e.g. the NIC reports a live connection), but that's about it.
My Opinion
Personally speaking, I default to ZeroMQ these days if at all possible. Even if it means a software re-write, that's not so bad as it seems. This is because one is generally replacing code such as connect() with zmq_connect(), and recv() with zmq_revc(), etc. There's often a lot of code removal too. ZeroMQ is message orientated, a tcp socket is stream orientated. Quite a lot of applications have to adapt tcp into a message orientation, and ZeroMQ replaces all the code that does that.
ZeroMQ is also well supported across numerous languages, either in bindings and / or re-implementations.
man connect
If the initiating socket is connection-mode, .... If the connection cannot be established immediately and O_NONBLOCK is not set for the file descriptor for the socket, connect() shall block for up to an unspecified timeout interval until the connection is established. If the timeout interval expires before the connection is established, connect() shall fail and the connection attempt shall be aborted.
If connect() is interrupted by a signal that is caught while blocked waiting to establish a connection, connect() shall fail and set errno to [EINTR], but the connection request shall not be aborted, and the connection shall be established asynchronously.
If the connection cannot be established immediately and O_NONBLOCK is set for the file descriptor for the socket, connect() shall fail and set errno to [EINPROGRESS], but the connection request shall not be aborted, and the connection shall be established asynchronously.
When the connection has been established asynchronously, select() and poll() shall indicate that the file descriptor for the socket is ready for writing.
If the socket is in blocking mode, connect will block while the connection is in progress. After connect returns, you'll know if a connection has been established (or not).
A signal could interrupt the (blocking/waiting) process, the connection routine will then switch to asynchronous mode.
If the socket is in non blocking mode (O_NONBLOCK) and the connection cannot be established immediately, connect will fail with the error EINPROGRESS and like above switching to asynchronous mode, that means, you'll have to use select or poll to figure out if the socket is ready for writing (indicates established connection).

client-server application using fifos

I'm trying to write a client-server application in C using 2 fifos (client_to_server and server_to_client).
A version of the app where the client writes a command to the server who reads it works well, but when I add in client the lines in order to read the answer from the server it doesn't work anymore: the server gets blocked in reading the command from the client (as if there is nothing in client_to_server fifo, although the client written in it). What could be the problem in this case?
You are using fputs to send data to the server. That means that the data could stay in a local buffer until the buffer is full or you explicitely flush it. When you do not wait for the answer but exit from the client, the fifo is implicitely flushed and closed, causing the server to recieve something. But if you start waiting in the client without a prior flush, you end with a deadlock.
But remember: pipes were invented for single way communications. If you want 2 way communications with acknowlegements and/or synchronizations, you should considere using sockets.

Programmatically detect if local web server has hung

I realise that I'll get at least one answer along the lines of "(re)write the code so it doesn't hang" but let's assume we don't live in that shiny happy utopia just yet...
In our embedded system we have a big SDK including a web-server (Boa) which is the primary method of user interaction.
It's possible, during certain phases of the moon, that something can cause the web server to hang or become otherwise stuck in such a way that the process appears running normally (not crashed/dead/using 100% CPU) but does not serve any web pages.
So, the question is, how do we test/detect this situation?
To test whether the server is hung, create a TCP socket and connect to port 80 on IP address 127.0.0.1 (loopback address). Then send the following text over the socket
GET / HTTP/1.1\r\n\r\n
Most servers will interpret that as a request for index.html. Alternatively, you could implement an undocumented URL for testing (which allows for a shorter, predetermined response), e.g.
GET /test/fdoaoqfaf12491r2h1rfda HTTP/1.1\r\n\r\n
You then need to read the response from the server. This involves using select with a reasonable timeout to determine whether any data came back from the server, and if so, use recv to read the data. The response from the server will consist of a header followed by content. The header consists of lines of text, with a blank line at the end of the header. Lines end with \r\n, so the end of the header is \r\n\r\n.
Getting the content involves calling select and recv until recv returns 0. This assumes that the server will send the response and then close the socket. Some sophisticated servers will leave a socket open to allow multiple requests over the same socket. A simple embedded server should not be doing that. (If your server is trying to use the same socket for multiple requests, then you need to figure out how to turn that feature off.)
That's all very well and good, but you really need to rewrite your code so it doesn't hang.
The mostly likely cause of the problem is that the server has a bunch of dangling sockets, i.e. connections from clients that were never properly cleaned up. Dangling sockets will eventually prevent the server from accepting more connections, either because the server has a limit on the number of open connections, or because the process that's running the server uses up all of its file descriptors.
The first thing to check is the TCP timeout value. One project that I worked on had a default timeout of 5 hours, which meant that dangling sockets stayed open for 5 hours. A reasonable timeout is 1 minute.
Then you need to create a client that deliberately misbehaves. Clients can misbehave by
leaving a socket open without reading the server's response
abruptly closing the socket while reading the response
gracefully closing the socket while reading the response
The first situation should be handled by the TCP timeout. The other two need to be properly handled by the server code. Graceful and abrupt socket closure is controlled via the SO_LINGER option of ioctl and the shutdown function. After the client misbehaves, check the number of open file descriptors in the server process, to verify that the server has handled the situation correctly.

Lost messages with non-blocking OpenSSL in C

Context: I'm developing a client-server application that is fairly solid most of the time, despite frequent network problems, outages, broken pipes, and so on. I use non-blocking sockets, select(), and OpenSSL to deliver messages between one or more nodes in a cluster, contingent on application-level heartbeats. Messages are queued and not removed from the queue until the entire message has been transferred and all the SSL_write()s return successfully. I maintain two sockets for each relationship, one incoming and one outgoing. I do this for a reason, and that's because it's much easier to detect a failed connection (very frequent) on a write than it is on a read. If a client is connecting, and I already have a connection, I replace it. Basically, the client performing the write is responsible for detecting errors and initiating a new connection (which will then replace the existing (dead) read connection on the server). This has worked well for me with one exception.
Alas, I'm losing messages. 99.9% of the time, the messages go through fine. But every now and then, I'll send, and I have no errors detected on either side for a few minutes... and then I'll get an error on the socket. The problem is that SSL_write has already returned successfully.
Let me guess: if I was blocking this would be fine, but since I'm non-blocking, I don't wait for the read on my remote end. As long as my TCP buffer can fit more, I keep stuffing things in the pipe. And when my socket goes poof, I lose anything in that buffer yet to be delivered?
How can I deal with this? Are application-level acks really necessary? (I'd rather not travel down the long road of complicated lost-acks and duplicate message complexity) Is there an elegant way to know what message I've lost? Or is there a way I can delay removal from my queue until I know it has been delivered? (Without an ack, how?)
Thanks for any help in advance.

Not getting sigpipe on first call to 'send' after client disconnected

I am working on a TCP server side app, which forwards data to a client.
The problem I am facing is that I try to find out on my server side app if my client disconnected and which data was sent and which not.
My research showed that there are basically two ways to find that out:
1) read from the socket and check if the FIN signal came back
2) waiting for the sigpipe signal on the send call
The first solution doesn't seem reliable to me, as I can't guarantee that the client doesn't send any random data and as such would make my test succeed even though it shouldn't.
The problem with the second solution is that I only get the sigpipe after X following calls to send and as such can't guarantee which data was really sent and which not. I read here on SO and on other sites, that the sigpipe is only supposed to come after the second call to send, I can reproduce that behavior if I only send and receive over localhost but not if I really use the network.
My question now is if it's normal that X can vary and if yes which parameters I might look at to alter that behavior or if that is not reliable possible due to TCP nature.
TCP connection is bidirectional. A FIN from a client signals that the client won't be sending any more data, but the data in the other direction (from the server to the client) can still be sent (if client does not reset the connection with the RST). The reliable way to detect the FIN from the client, is to read from the client socket (if you are using socket interface) until the read returns 0.
TCP guarantees that if both ends terminate connection with a FIN that is acknowledged, all data that was exchanged within the connection, was received by the other side. If the connection is terminated with the RST, TCP by itself gives you no way to determine which data was successfully read by the other side. To do it, you need some application level mechanism, such as application level acknowledgements. But the best way would be to design your protocol in such a way, that connection, under normal circumstances, is always closed gracefully (FINs from both sides, no RSTs).

Resources