How to cleanly interrupt a thread blocking on a recv call? - c

I have a multithreaded server written in C, with each client thread looking something like this:
ssize_t n;
struct request request;
// Main loop: receive requests from the client and send responses.
while(running && (n = recv(sockfd, &request, sizeof(request), 0)) == sizeof(request)) {
// Process request and send response.
}
if(n == -1)
perror("Error receiving request from client");
else if(n != sizeof(act))
fprintf(stderr, "Error receiving request from client: Incomplete data\n");
// Clean-up code.
At some point, a client meets a certain criteria where it must be disconnected. If the client is regularly sending requests, this is fine because it can be informed of the disconnection in the responses; However sometimes the clients take a long time to send a request, so the client threads end up blocking in the recv call, and the client does not get disconnected until the next request/response.
Is there a clean way to disconnect the client from another thread while the client thread is blocking in the recv call? I tried close(sockfd) but that causes the error Error receiving request from client: Bad file descriptor to occur, which really isn't accurate.
Alternatively, is there a better way for me to be handling errors here?

So you have at least these possibilities:
(1) pthread_kill will blow the thread out of recv with errno == EINTR and you can clean up and exit the thread on your own. Some people think this is nasty. Depends, really.
(2) Make your client socket(s) non-blocking and use select to wait on input for a specific period of time before checking if a switch used between the threads has been set to indicated they should shut down.
(3) In combo with (2) have each thread share a pipe with the master thread. Add it to the select. If it becomes readable and contains a shutdonw request, the thread shuts itself down.
(4) Look into the pthread_cancel mechanism if none of the above (or variations thereof) do not meet your needs.

Shutdown the socket for input from another thread. That will cause the reading thread to receive an EOS, which should cause it to close the socket and terminate if it is correctly written.

To interrupt the thread, make the socket non-blocking (set O_NONBLOCK using fcntl) and then signal the thread with pthread_kill. This way, recv will fail with either EINTR if it was sleeping, or EAGAIN or EWOULDBLOCK if it wasn’t (also maybe if SA_RESTART is in effect, didn’t check). Note that the socket doesn’t need to, and actually should not, be non-blocking before that. (And of course the signal needs to be handled; empty handler is sufficient).
To be sure to catch the stop-signal but not anything else, use a flag; there are things that may go wrong. For example, recv may fail with EINTR on some spurious signal. Or it may succeed if there was some data available, effectively ignoring the stop request.
And what not to do:
Don’t use pthread_kill alone or with any plain check. It may arrive right before issuing the recv syscall, too early to interrupt it but after all the checks.
Don’t close the socket. That may not even work, and as #R.. pointer out, is dangerous as the socket file descriptor may be reused between close and recv (unless you’re sure nothing opens file descriptors).

Related

Is it possible to do epoll on accept event?

Let's suppose I've created a listening socket:
sock = socket(...);
bind(sock,...);
listen(sock, ...);
Is it possible to do epoll_wait on sock to wait for incoming connection? And how do I get client's socket fd after that?
The thing is on the platform I'm writing for sockets cannot be non-blocking, but there is working epoll implementation with timeouts, and I need to accept connection and work with it in a single thread so that it doesn't hang if something goes wrong and connection doesn't come.
Without knowing what this non-standard platform is it's impossible to know exactly what semantics they gave their epoll call. But on the standard epoll on Linux, a listening socket will be reported as "readable" when an incoming connection arrives, and then you can accept the connection by calling accept. If you leave the socket in blocking mode, and always check for readability using epoll's level-triggered mode before each call to accept, then this should work – the only risk is that if you somehow end up calling accept when no connection has arrived, then you'll get stuck. For example, this could happen if there are two processes sharing a listening socket, and they both try to accept the same connection. Or maybe it could happen if an incoming connection arrives, and then is closed again before you call accept. (Pretty sure in this case Linux still lets the accept succeed, but this kind of edge case is exactly where I'd be suspicious of a weird platform doing something weird.) You'd want to check these things.
Non-blocking mode is much more reliable because in the worst case, accept just reports that there's nothing to accept. But if that's not available, then you might be able to get away with something like this...
Since this answer is the first up in the results in duckduckgo. I will just chime in to say that under GNU/Linux 4.18.0-18-generic (Ubuntu 18.10).
The asynchronously accept an incoming connection using one has to watch for errno value EWOULDBLOCK (11) and then add the socket to epoll read set.
Here is a small shot of scheme code that achieves that:
(define (accept fd)
(let ((out (socket:%accept fd 0 0)))
(if (= out -1)
(let ((code (socket:errno)))
(if (= code EWOULDBLOCK)
(begin
(abort-to-prompt fd 'read)
(accept fd))
(error 'accept (socket:strerror code))))
out)))
In the above (abort-to-prompt fd 'read) will pause the coroutine and add fd to epoll read set, done as follow:
(epoll-ctl epoll EPOLL-CTL-ADD fd (make-epoll-event-in fd)))
When the coroutine is unpaused, the code proceed after the abort to call itself recursively (in tail-call position)
In the code I am working in Scheme, it is a bit more involving since I rely on call/cc to avoid callbacks. The full code is at source hut.
That is all.

Trying to exit from a blocking UDP socket read

This is a question similar to Proper way to close a blocking UDP socket. I have a thread in C which is reading from a UDP socket. The read is blocking. I would like to know if it is possible to be able to exit the thread, without relying on the recv() returning? For example can I close the socket from another thread and safely expect the socket read thread to exit? Didn't see any high voted answer on that thread, thats why I am asking it again.
This really depends on what system you're running under. For example, if you're running under a POSIX-compliant system and your thread is cancelable, the recv() call will be interrupted when you cancel the thread since it's a cancel point.
If you're using an older socket implementation, you could set a signal handler for your thread for something like SIGUSR1 and hope nobody else wanted it and signal, since recv() will interrupt on a signal. Your best option is not to block, if at all possible.
I don't think closing a socket involved in a blocking operation is a safe guaranteed way of terminating the operation. For instance, kernel.org warns darkly:
It is probably unwise to close file descriptors while they may be in
use by system calls in other threads in the same process. Since a
file descriptor may be reused, there are some obscure race conditions
that may cause unintended side effects.
Instead you could use a signal and make recv fail with EINTR
(make sure SA_RESTART is not enabled). You can send a signal to a
specific thread with pthread_kill
You could enable SO_RCVTIMEO on the socket before starting the recv
call
Personally I usually try to stay clear of all the signal nastiness but it's a viable option.
You've got a couple of options for that. A signal will interrupt the read operation, so all you need to do is make sure a signal goes off. The recv operation should fail with error number EINTR.
The simplest option is to set up a timer to interrupt your own process after some timeout e.g. 30 seconds:
itimerval timer
timeval time;
time.tv_sec = 30;
time.tv_usec = 0;
timer.it_value = time;
if( setitimer( ITIMER_REAL, &timer, NULL ) != 0 )
printf( "failed to start timer\n" );
You'll get a SIGALRM after the specified time, which will interrupt your blocking operation, and give you the chance to repeat the operation or quit.
You cannot deallocate a shared resource while another thread is or might be using it. In practice, you will find that you cannot even write code to do what you suggest.
Think about it. When you go to call close, how can you possibly know that the other thread is actually blocked in recv? What if it's about to call recv, but then another thread calls socket and gets the descriptor you just closed? Now, not only will that thread not detect any error, but it will be calling recv on the wrong socket!
There is probably a good way to solve your outer problem, the reason you need to exit from a blocking UDP socket read. There are also several ugly hacks available. The basic approach is to make the socket non-blocking and instead of making a blocking UDP socket read, fake a blocking read with select or poll. You can then abort this loop several ways:
One way is to have select time out and check an 'abort' flag when select returns.
Another way is to also select on the read end of a pipe. Send a single byte to the pipe to abort the select.
If posix complient system, you can try to monitor your thread:
pthread_create with a function that makes your recv and pthread_cond_signal just after, then returns.
The calling thread makes a pthread_cond_timedwait with the desired timeout and terminates the called thread if timed_out.

"Connection Reset by Peer" if server calls close() immediately after write()

I have a AF_INET/SOCK_STREAM server written in C running on Android/Linux which looks more ore less like this:
...
for (;;) {
client = accept(...);
read(client, &message, sizeof(message));
response = process(&message);
write(client, response, sizeof(*response));
close(client);
}
As far as I know, the call to close should not terminate the connection to the client immediately, but it apparently does: The client reports "Connection Reset by Peer" before it has had a chance to read the server's response.
If I insert a delay between write() and close() the client can read the response as expected.
I got a hint that it might have to do with the SO_LINGER option, but I checked it's value and both members of struct linger (l_onoff, l_linger) have a value of zero.
Any ideas?
Stevens describes a configuration in which this can happen, but it depends on the client sending more data after the server has called close() (after the client should “know” that the connection is being closed). UNP 2nd ed s5.12.
Try tcpdumping the conversation to find out what’s really going on. If there's any possibility that a “clever” gateway (e.g. NAT) is between the two endpoints, tcpdump both ends and look for discrepancies.
Connection gets reset when you call close() on connection with data being sent. Specially for this case the sequence of shutdown() with SHUT_WR flag and then blocking read() is used.
Shutting down the writing end of the socket sends FIN and returns immediately, and the said read() blocks and returns 0 as soon as your peer replies with FIN in due turn. Basically, this is what you need in place of the delay between write() and close() you are talking about.
You do not need do anything with linger options in this case, leave it all to default.
SO_LINGER should be set (i.e. set to 1 not 0) if you want queued data to be sent before a close is effected.
SO_LINGER
Lingers on a close() if data is present. This option controls the
action taken when unsent messages
queue on a socket and close() is
performed. If SO_LINGER is set, the
system shall block the calling thread
during close() until it can transmit
the data or until the time expires. If
SO_LINGER is not specified, and
close() is issued, the system handles
the call in a way that allows the
calling thread to continue as quickly
as possible. This option takes a
linger structure, as defined in the
header, to specify the
state of the option and linger
interval.

close vs shutdown socket?

In C, I understood that if we close a socket, it means the socket will be destroyed and can be re-used later.
How about shutdown? The description said it closes half of a duplex connection to that socket. But will that socket be destroyed like close system call?
This is explained in Beej's networking guide. shutdown is a flexible way to block communication in one or both directions. When the second parameter is SHUT_RDWR, it will block both sending and receiving (like close). However, close is the way to actually destroy a socket.
With shutdown, you will still be able to receive pending data the peer already sent (thanks to Joey Adams for noting this).
None of the existing answers tell people how shutdown and close works at the TCP protocol level, so it is worth to add this.
A standard TCP connection gets terminated by 4-way finalization:
Once a participant has no more data to send, it sends a FIN packet to the other
The other party returns an ACK for the FIN.
When the other party also finished data transfer, it sends another FIN packet
The initial participant returns an ACK and finalizes transfer.
However, there is another "emergent" way to close a TCP connection:
A participant sends an RST packet and abandons the connection
The other side receives an RST and then abandon the connection as well
In my test with Wireshark, with default socket options, shutdown sends a FIN packet to the other end but it is all it does. Until the other party send you the FIN packet you are still able to receive data. Once this happened, your Receive will get an 0 size result. So if you are the first one to shut down "send", you should close the socket once you finished receiving data.
On the other hand, if you call close whilst the connection is still active (the other side is still active and you may have unsent data in the system buffer as well), an RST packet will be sent to the other side. This is good for errors. For example, if you think the other party provided wrong data or it refused to provide data (DOS attack?), you can close the socket straight away.
My opinion of rules would be:
Consider shutdown before close when possible
If you finished receiving (0 size data received) before you decided to shutdown, close the connection after the last send (if any) finished.
If you want to close the connection normally, shutdown the connection (with SHUT_WR, and if you don't care about receiving data after this point, with SHUT_RD as well), and wait until you receive a 0 size data, and then close the socket.
In any case, if any other error occurred (timeout for example), simply close the socket.
Ideal implementations for SHUT_RD and SHUT_WR
The following haven't been tested, trust at your own risk. However, I believe this is a reasonable and practical way of doing things.
If the TCP stack receives a shutdown with SHUT_RD only, it shall mark this connection as no more data expected. Any pending and subsequent read requests (regardless whichever thread they are in) will then returned with zero sized result. However, the connection is still active and usable -- you can still receive OOB data, for example. Also, the OS will drop any data it receives for this connection. But that is all, no packages will be sent to the other side.
If the TCP stack receives a shutdown with SHUT_WR only, it shall mark this connection as no more data can be sent. All pending write requests will be finished, but subsequent write requests will fail. Furthermore, a FIN packet will be sent to another side to inform them we don't have more data to send.
There are some limitations with close() that can be avoided if one uses shutdown() instead.
close() will terminate both directions on a TCP connection. Sometimes you want to tell the other endpoint that you are finished with sending data, but still want to receive data.
close() decrements the descriptors reference count (maintained in file table entry and counts number of descriptors currently open that are referring to a file/socket) and does not close the socket/file if the descriptor is not 0. This means that if you are forking, the cleanup happens only after reference count drops to 0. With shutdown() one can initiate normal TCP close sequence ignoring the reference count.
Parameters are as follows:
int shutdown(int s, int how); // s is socket descriptor
int how can be:
SHUT_RD or 0
Further receives are disallowed
SHUT_WR or 1
Further sends are disallowed
SHUT_RDWR or 2
Further sends and receives are disallowed
This may be platform specific, I somehow doubt it, but anyway, the best explanation I've seen is here on this msdn page where they explain about shutdown, linger options, socket closure and general connection termination sequences.
In summary, use shutdown to send a shutdown sequence at the TCP level and use close to free up the resources used by the socket data structures in your process. If you haven't issued an explicit shutdown sequence by the time you call close then one is initiated for you.
I've also had success under linux using shutdown() from one pthread to force another pthread currently blocked in connect() to abort early.
Under other OSes (OSX at least), I found calling close() was enough to get connect() fail.
"shutdown() doesn't actually close the file descriptor—it just changes its usability. To free a socket descriptor, you need to use close()."1
Close
When you have finished using a socket, you can simply close its file descriptor with close; If there is still data waiting to be transmitted over the connection, normally close tries to complete this transmission. You can control this behavior using the SO_LINGER socket option to specify a timeout period; see Socket Options.
ShutDown
You can also shut down only reception or transmission on a connection by calling shutdown.
The shutdown function shuts down the connection of socket. Its argument how specifies what action to perform:
0
Stop receiving data for this socket. If further data arrives, reject it.
1
Stop trying to transmit data from this socket. Discard any data waiting to be sent. Stop looking for acknowledgement of data already sent; don’t retransmit it if it is lost.
2
Stop both reception and transmission.
The return value is 0 on success and -1 on failure.
in my test.
close will send fin packet and destroy fd immediately when socket is not shared with other processes
shutdown SHUT_RD, process can still recv data from the socket, but recv will return 0 if TCP buffer is empty.After peer send more data, recv will return data again.
shutdown SHUT_WR will send fin packet to indicate the Further sends are disallowed. the peer can recv data but it will recv 0 if its TCP buffer is empty
shutdown SHUT_RDWR (equal to use both SHUT_RD and SHUT_WR) will send rst packet if peer send more data.
linux: shutdown() causes listener thread select() to awake and produce error. shutdown(); close(); will lead to endless wait.
winsock: vice versa - shutdown() has no effect, while close() is successfully catched.

Can a socket be closed from another thread when a send / recv on the same socket is going on?

Can a socket be closed from another thread when a send / recv on the same socket is going on?
Suppose one thread is in blocking recv call and another thread closes the same socket, will the thread in the recv call know this and come out safely?
I would like to know if the behavior will differ between different OS / Platforms. If yes, how will it behave in Solaris?
In linux closing a socket won't wake up recv(). Also, as #jxh says:
If a thread is blocked on recv() or send() when the socket is closed
by a different thread, the blocked thread will receive an error.
However, it is difficult to detect the correct remedial action after
receiving the error. This is because the file descriptor number
associated with the socket may have been picked up by yet a different
thread, and the blocked thread has now been woken up on an error for a
"valid" socket. In such a case, the woken up thread should not call
close() itself.
The woken up thread will need some way to differentiate whether the
error was generated by the connection (e.g. a network error) that
requires it to call close(), or if the error was generated by a
different thread having called close() on it, in which case it should
just error out without doing anything further to the socket.
So the best way to avoid both problems is to call shutdown() instead of close(). shutdown() will make the file descriptor still available, so won't be allocated by another descriptor, also will wake up recv() with an error and the thread with the recv() call can close the socket the normal way, like a normal error happened.
I don't know Solaris network stack implementation but I'll throw out my theory/explanation of why it should be safe.
Thread A enters some blocking system call, say read(2), for this given socket. There's no data in socket receive buffer, so thread A is taken off the processor an put onto wait queue for this socket. No network stack events are initiated here, connection state (assuming TCP) has not changed.
Thread B issues close(2) on the socket. While kernel socket structure should be locked while thread B is accessing it, no other thread is holding that lock (thread A released the lock when it was put to sleep-wait). Assuming there's no outstanding data in the socket send buffer, a FIN packet is sent and the connection enters the FIN WAIT 1 state (again I assume TCP here, see connection state diagram)
I'm guessing that socket connection state change would generate a wakeup for all threads blocked on given socket. That is thread A would enter a runnable state and discover that connection is closing. The wait might be re-entered if the other side has not sent its own FIN, or the system call would return with eof otherwise.
In any case, internal kernel structures will be protected from inappropriate concurrent access. This does not mean it's a good idea to do socket I/O from multiple threads. I would advise to look into non-blocking sockets, state machines, and frameworks like libevent.
For me, shutdown() socket from another thread do the job in Linux
If a thread is blocked on recv() or send() when the socket is closed by a different thread, the blocked thread will receive an error. However, it is difficult to detect the correct remedial action after receiving the error. This is because the file descriptor number associated with the socket may have been picked up by yet a different thread, and the blocked thread has now been woken up on an error for a "valid" socket. In such a case, the woken up thread should not call close() itself.
The woken up thread will need some way to differentiate whether the error was generated by the connection (e.g. a network error) that requires it to call close(), or if the error was generated by a different thread having called close() on it, in which case it should just error out without doing anything further to the socket.
Yes, it is ok to close the socket from another thread. Any blocked/busy threads that are using that socket will report a suitable error.

Resources