On Linux, if we call a blocking recv() from one thread and close() on the same socket from another thread, recv() doesn't return.
Why?
The "why" is simply that that's how it works, by design.
Within the kernel, the recv() call has called fget() on the struct file corresponding to the file descriptor, and this will prevent it from being deallocated until the corresponding fput().
You will simply have to change your design (your design is inherently racy anyway - for this to happen, you must have no locking protecting the file descriptor in userspace, which means that the close() could have happened just before the recv() call - and the file descriptor could even have been reused for something else).
If you want to wake up another thread that's blocking on a file descriptor, you should have it block on select() instead, with a pipe included in the file descriptor set that can be written to by the main thread.
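A minimal sketch of that select()-plus-pipe arrangement, assuming the blocked thread waits on one data descriptor and the read end of a wake-up pipe. The names wait_for_data(), wake_waiter() and the WAIT_* codes are made up for illustration and are not from any library.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Illustrative result codes for wait_for_data(). */
enum { WAIT_ERROR = -1, WAIT_DATA = 0, WAIT_WOKEN = 1 };

/* Block until data_fd is readable or a byte arrives on wake_fd, the read
 * end of a pipe whose write end belongs to the controlling thread.
 * A select() failure (including EINTR) is reported as WAIT_ERROR. */
static int wait_for_data(int data_fd, int wake_fd)
{
    fd_set rfds;
    int maxfd = data_fd > wake_fd ? data_fd : wake_fd;

    /* select() can only handle descriptors below FD_SETSIZE. */
    assert(data_fd >= 0 && wake_fd >= 0 && maxfd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_SET(data_fd, &rfds);
    FD_SET(wake_fd, &rfds);

    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return WAIT_ERROR;

    /* The wake-up pipe takes priority over pending data. */
    if (FD_ISSET(wake_fd, &rfds)) {
        char c;
        if (read(wake_fd, &c, 1) < 0)   /* drain the wake-up byte */
            return WAIT_ERROR;
        return WAIT_WOKEN;
    }
    return WAIT_DATA;   /* the caller may now recv()/read() without blocking */
}

/* Called by the controlling thread instead of close(). */
static int wake_waiter(int wake_write_fd)
{
    return write(wake_write_fd, "x", 1) == 1 ? 0 : -1;
}
```

On WAIT_WOKEN the waiting thread would stop using the socket and close it itself, so no other thread ever closes a descriptor that might still be in use.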
Check that all file descriptors for the socket have been closed. If any remain open at the "remote end" (assuming this is the one you attempt to close), the peer "has not performed an orderly shutdown", so recv() will not return 0.
If this still doesn't work, call shutdown(sock, SHUT_RDWR) on the remote end; this will shut the socket down regardless of reference counts.
I am reading in a blocked way from a device/filedescriptor.
It might happen that the device is closed in a different thread and the file descriptor is deleted. Unfortunately, the read doesn't return or take notice, and keeps blocking.
As a workaround I could loop around select() with a timeout. If a timeout happens, I can check the file descriptor and, if it is gone, return instead of calling read().
I am wondering if there is a better way to do this in C on Linux.
The code you are describing has an inherent race condition - if another thread could be in a blocking read() on a file descriptor when you close() that file descriptor, the other thread could just as well be just about to call read() instead.
You can't call close() unless you know that all other threads are no longer in a position to be using that file descriptor at all.
The easiest way to handle cases like you describe is for one thread to be the 'owning' thread of each file descriptor, that is responsible for closing the file descriptor. Other threads don't directly close it - instead they mark the file descriptor as "to be closed" in some shared data structure and wake up the owning thread.
You can make it possible to wake the owning thread by having it not block in read() but instead block in select() or poll() with another file descriptor - usually a pipe - in the set as well as the target file descriptor. The thread is woken by writing to the other end of that pipe.
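A rough sketch of the owning-thread idea, assuming one small bookkeeping structure per descriptor. struct owned_fd and the owned_fd_* functions are hypothetical names invented for this example. Other threads only set a flag and write to the pipe; only the owner ever calls close() on the target descriptor.

```c
#include <assert.h>
#include <poll.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one descriptor and its owning thread. */
struct owned_fd {
    int fd;                       /* target descriptor, closed only by the owner */
    int wake_rd, wake_wr;         /* pipe used to wake the owner */
    atomic_bool close_requested;  /* the "to be closed" mark */
};

static int owned_fd_init(struct owned_fd *o, int fd)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    o->fd = fd;
    o->wake_rd = p[0];
    o->wake_wr = p[1];
    atomic_init(&o->close_requested, false);
    return 0;
}

/* Any thread: mark the descriptor and wake the owner. Never calls close(). */
static int owned_fd_request_close(struct owned_fd *o)
{
    atomic_store(&o->close_requested, true);
    return write(o->wake_wr, "x", 1) == 1 ? 0 : -1;
}

/* Owning thread: one iteration of its loop. Returns 1 if o->fd has an
 * event to handle (data, EOF or error), 0 if it was closed on request,
 * -1 if poll() failed. */
static int owned_fd_wait(struct owned_fd *o)
{
    struct pollfd pfds[2] = {
        { .fd = o->fd,      .events = POLLIN },
        { .fd = o->wake_rd, .events = POLLIN },
    };

    assert(o->fd >= 0);
    if (poll(pfds, 2, -1) < 0)
        return -1;

    if (atomic_load(&o->close_requested)) {
        close(o->fd);   /* safe: no other thread uses this descriptor */
        o->fd = -1;
        return 0;
    }
    return 1;
}

/* Release the pipe once no other thread can reach the structure. */
static void owned_fd_destroy(struct owned_fd *o)
{
    close(o->wake_rd);
    close(o->wake_wr);
}
```

The flag is stored before the wake-up byte is written, so whenever the owner is woken by the pipe it also sees the request.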
Once a file descriptor is closed by another thread, it's not easy to verify that it happened. What if the other thread re-opened a file and got the same file descriptor? After a successful close() call, you must not use the file descriptor again. On a failed close() call, POSIX leaves the state of the file descriptor unspecified.
The select() option suffers from the same problem as described above.
Your problem is really no different from any other data race in a multi-threaded program. I suggest you rewrite the code so that threads don't access the file descriptor without synchronization. Or, avoid multiple threads reading from the same file descriptor if that's possible.
Say that I call select() on an fd_set containing a bunch of read file descriptors. What happens if, during the select() call, one of the file descriptors is closed? Assuming that some sort of error occurs, is it then my responsibility to find and remove the closed file descriptor from the set?
I don't believe this is specified anywhere; some systems may immediately return from select while others may continue blocking. Note that the only way this can happen is in a multi-threaded process (otherwise, the close cannot happen during select; even if it happened from a signal handler, select would have already been interrupted by the signal). As such, this situation arising probably indicates you have bigger issues to worry about. If one of the file descriptors you're polling can be closed during select, the bigger issue is that the same file descriptor might be reassigned to a newly opened file (e.g. one opened in another unrelated thread) immediately after the close, and the thread that's polling might then wrongly perform IO on the new file "belonging to" a different thread.
If you have a data object that consists of a set of file descriptors that will be polled with select in a multithreaded program, you almost surely need to be using some sort of synchronization primitive to control access to that set, and adding or removing file descriptors should require a lock that's mutually exclusive with the possibility that select (or any IO on the members) is in progress.
Of course in a multi-threaded program, it may be better not to use select at all and instead let blocking IO in multiple threads achieve the desired result without complicated locking logic.
The select() system call takes three fd_set parameters: read, write, and exception. To check whether an error occurs on a file descriptor you are reading from, include it in both the read set and the exception set - seeing it in the exception set on return from select() means an exceptional condition has occurred on that socket, giving you the chance to find out what.
In general network sockets with any sort of exception will no longer be fit to send and receive.
Even if you've read all the sent data, a closed socket is always regarded as ready to read. Select will unblock, signaling that socket to be available.
How is select() for reading handled on Linux systems when the process forks after opening a UDP socket?
Especially - is it possible that in this kind of program:
so = open socket
fork
for(;;) {
    select() for reading on socket so
    recv from so
}
two packets will wake up only one of the processes (in case they arrive before the waiting process is notified / exits select) and the second one of those packets will not be received?
Or can I assume that for UDP, every packet will always wake up a process or leave the flag set?
Each process, the parent and the child, has a file descriptor for the same socket. The per-file-descriptor attributes are independent (e.g. blocking, being able to close the socket).
In your scenario it is indeed possible, and legal, for one of the processes to be woken and read the data from the socket before the other one even gets into select().
Your question is not actually affected by the fork() at all.
select() returns if one of the file descriptors in the read set is readable. If you don't read from it and call select() again, it will still be readable. It will remain readable until there is no more data to read from it.
In other words, select() is level-triggered, not edge-triggered.
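The level-triggered behaviour can be checked with a short experiment. This is only a sketch: an AF_UNIX datagram socketpair stands in for the UDP socket of the question (an assumption made to keep the example self-contained), and is_readable() and demo_two_datagrams() are made-up names.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error.
 * A zero timeout makes select() return immediately. */
static int is_readable(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Queue two datagrams, then check readiness between reads.
 * Returns 0 if select() behaved as level-triggered, -1 otherwise. */
static int demo_two_datagrams(void)
{
    int sv[2];
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (send(sv[1], "one", 3, 0) != 3 || send(sv[1], "two", 3, 0) != 3)
        return -1;

    int ok = is_readable(sv[0]) == 1                /* two datagrams queued */
          && is_readable(sv[0]) == 1                /* asking again changes nothing */
          && recv(sv[0], buf, sizeof buf, 0) == 3   /* take the first one */
          && is_readable(sv[0]) == 1                /* second one still pending */
          && recv(sv[0], buf, sizeof buf, 0) == 3
          && is_readable(sv[0]) == 0;               /* queue empty again */

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

Readiness tracks the state of the receive queue rather than the arrival of a packet, so a datagram that arrives while a process is between select() calls is still reported on the next call.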
Can a socket be closed from another thread when a send / recv on the same socket is going on?
Suppose one thread is in blocking recv call and another thread closes the same socket, will the thread in the recv call know this and come out safely?
I would like to know if the behavior will differ between different OS / Platforms. If yes, how will it behave in Solaris?
On Linux, closing a socket won't wake up recv(). Also, as @jxh says:
If a thread is blocked on recv() or send() when the socket is closed
by a different thread, the blocked thread will receive an error.
However, it is difficult to detect the correct remedial action after
receiving the error. This is because the file descriptor number
associated with the socket may have been picked up by yet a different
thread, and the blocked thread has now been woken up on an error for a
"valid" socket. In such a case, the woken up thread should not call
close() itself.
The woken up thread will need some way to differentiate whether the
error was generated by the connection (e.g. a network error) that
requires it to call close(), or if the error was generated by a
different thread having called close() on it, in which case it should
just error out without doing anything further to the socket.
So the best way to avoid both problems is to call shutdown() instead of close(). shutdown() leaves the file descriptor open, so its number cannot be handed out to another thread; it also wakes up recv() with an error, and the thread in the recv() call can then close the socket the normal way, as if an ordinary error had happened.
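A sketch of that division of labour, with made-up function names stop_socket() and read_until_stopped(). It assumes a connected stream socket on Linux, where recv() after shutdown() is expected to return 0 (end-of-stream) or -1 instead of blocking; either way the reader leaves its loop and does the close() itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Controlling thread: stop all I/O on the socket. The descriptor number
 * stays allocated, so it cannot be reused until the reader closes it. */
static int stop_socket(int fd)
{
    assert(fd >= 0);
    return shutdown(fd, SHUT_RDWR);
}

/* Reader thread: receive until end-of-stream or error, then close.
 * Returns the total number of bytes received, or -1 on error. */
static long read_until_stopped(int fd)
{
    char buf[256];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;     /* a real reader would process buf here */
            continue;
        }
        close(fd);          /* only the reader closes, after recv() returned */
        return n == 0 ? total : -1;
    }
}
```

Because only the reading thread calls close(), it never has to guess whether the descriptor number it holds still refers to its own socket.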
I don't know Solaris network stack implementation but I'll throw out my theory/explanation of why it should be safe.
Thread A enters some blocking system call, say read(2), for this given socket. There's no data in the socket receive buffer, so thread A is taken off the processor and put onto the wait queue for this socket. No network stack events are initiated here; the connection state (assuming TCP) has not changed.
Thread B issues close(2) on the socket. While the kernel socket structure should be locked while thread B is accessing it, no other thread is holding that lock (thread A released the lock when it was put to sleep-wait). Assuming there's no outstanding data in the socket send buffer, a FIN packet is sent and the connection enters the FIN_WAIT_1 state (again I assume TCP here; see the TCP connection state diagram).
I'm guessing that socket connection state change would generate a wakeup for all threads blocked on given socket. That is thread A would enter a runnable state and discover that connection is closing. The wait might be re-entered if the other side has not sent its own FIN, or the system call would return with eof otherwise.
In any case, internal kernel structures will be protected from inappropriate concurrent access. This does not mean it's a good idea to do socket I/O from multiple threads. I would advise to look into non-blocking sockets, state machines, and frameworks like libevent.
For me, calling shutdown() on the socket from another thread does the job on Linux.
If a thread is blocked on recv() or send() when the socket is closed by a different thread, the blocked thread will receive an error. However, it is difficult to detect the correct remedial action after receiving the error. This is because the file descriptor number associated with the socket may have been picked up by yet a different thread, and the blocked thread has now been woken up on an error for a "valid" socket. In such a case, the woken up thread should not call close() itself.
The woken up thread will need some way to differentiate whether the error was generated by the connection (e.g. a network error) that requires it to call close(), or if the error was generated by a different thread having called close() on it, in which case it should just error out without doing anything further to the socket.
Yes, it is ok to close the socket from another thread. Any blocked/busy threads that are using that socket will report a suitable error.
A situation I have under Windows XP (SP3) has been driving me nuts, and I'm reaching the end of my tether, so maybe someone can provide some inspiration.
I have a C++ networking program (non-GUI). This program is built to compile and run under Windows, MacOS/X, and Linux, so it uses select() and non-blocking I/O as the basis for its event loop.
In addition to its networking duties, this program needs to read text commands from stdin, and exit gracefully when stdin is closed. Under Linux and MacOS/X, that's easy enough -- I just include STDIN_FILENO in my read fd_set to select(), and select() returns when stdin is closed. I check to see that FD_ISSET(STDIN_FILENO, &readSet) is true, try to read some data from stdin, recv() returns 0/EOF, and so I exit the process.
Under Windows, on the other hand, you can't select on STDIN_FILE_HANDLE, because it's not a real socket. You can't do non-blocking reads on STDIN_FILE_HANDLE, either. That means there is no way to read stdin from the main thread, since ReadFile() might block indefinitely, causing the main thread to stop serving its network function.
No problem, says I, I'll just spawn a thread to handle stdin for me. This thread will run in an infinite loop, blocking in ReadFile(stdinHandle), and whenever ReadFile() returns data, the stdin-thread will write that data to a TCP socket. That socket's connection's other end will be select()'d on by the main thread, so the main thread will see the stdin data coming in over the connection, and handle "stdin" the same way it would under any other OS. And if ReadFile() returns false to indicate that stdin has closed, the stdin-thread just closes its end of the socket-pair so that the main thread will be notified via select(), as described above.
Of course, Windows doesn't have a nice socketpair() function, so I had to roll my own using listen(), connect(), and accept() (as seen in the CreateConnectedSocketPair() function here). But I did that, and it seems to work, in general.
The problem is that it doesn't work 100%. In particular, if stdin is closed within a few hundred milliseconds of when the program starts up, about half the time the main thread doesn't get any notification that the stdin-end of the socket-pair has been closed. What I mean by that is, I can see (by my printf()-debugging) that the stdin-thread has called closesocket() on its socket, and I can see that the main thread is select()-ing on the associated socket (i.e. the other end of the socket-pair), but select() never returns as it should... and if it does return, due to some other socket selecting ready-for-whatever, FD_ISSET(main_thread_socket_for_socket_pair, &readSet) returns 0, as if the connection wasn't closed.
At this point, the only hypothesis I have is that there is a bug in Windows' select() implementation that causes the main thread's select() not to notice that the other end of the socket-pair has been closed by the stdin-thread. Is there another explanation? (Note that this problem has been reported under Windows 7 as well, although I haven't looked at it personally on that platform.)
Just for the record, this problem turned out to be a different issue entirely, unrelated to threading, Windows, or stdin. The actual problem was an inter-process deadlock, where the parent process was blocked, waiting for the child processes to quit, but sometimes the child processes would be simultaneously blocked, waiting on the parent to supply them with some data, and so nothing would move forward.
Apologies to all for wasting your time on a red herring; if there's a standard way to close this case as unwarranted, let me know and I'll do it.
-Jeremy
Is it possible you have a race condition? Eg. Do you ensure that the CreateConnectedSocketPair() function has definitely returned before the stdin-thread has a chance to try closing its socket?
I am studying your code. In CreateConnectedSocketPair(), socket1 is used for listen(), and newfd is used to send/recv data. So why does it do "socket1 = newfd"? How is the listening socket closed then?
Not a solution, but as a workaround, couldn't you send some magic "stdin has closed" message across the TCP socket and have your receiving end disconnect its socket when it sees that and run whatever 'stdin has closed' handler?
Honestly your code is too long and I don't have time right now to spend on it.
Most likely the problem is in some cases closing the socket doesn't cause a graceful (FIN) shutdown.
Checking for exceptions returning from your select may catch the remainder of cases. There is also the (slim) possibility that no notification is actually being sent to the socket that the other end has closed. In that case, there is no way other than timeouts or "keep alive"/ping messages between the endpoints to know that the socket has closed.
If you want to figure out exactly what is happening, break out Wireshark and look for FINs and RSTs (and the absence of anything). If you see the proper FIN sequence going across when your socket is closed, then the problem must be in your code. If you see an RST, it may be caught by the exception set, and if you don't see anything you'll need to devise a way in your protocol to 'ping' each side of the connection to make sure they are still alive, or set a sufficiently short timeout for more data.
Rather than chasing perceived bugs in select(), I'm going to address your original fallacy that drove you away from simple, reliable, single-threaded design.
You said "You can't do non-blocking reads on STDIN_FILE_HANDLE, either. That means there is no way to read stdin from the main thread, since ReadFile() might block indefinitely" but this simply isn't the whole story. Look at ReadConsoleInput, WSAEventSelect, and WaitForMultipleObjects. The stdin handle will be signalled only when there is input and ReadConsoleInput will return immediately (pretty much the same idea behind select() in Unix).
Or, use ReadFileEx and WaitForMultipleObjectsEx to have the console reads fire off an APC (which isn't all that asynchronous, it runs on the main thread and only during WaitForMultipleObjectsEx or another explicit wait function).
If you want to stick with using a second thread to get async I/O on stdin, then you might try closing the handle being passed to select instead of doing a socket shutdown (via closesocket on the other end). In my experience select() tends to return really quickly when one of the fds it is waiting on gets closed.
Or, maybe your problem is the other way around. The select docs say "For connection-oriented sockets, readability can also indicate that a request to close the socket has been received from the peer". Typically you'd send that "request to close the socket" by calling shutdown(), not closesocket().