How to close a SOCKET when using IOCP? - c

I have a server application that uses IOCP. I want to know what is the proper way to close a SOCKET.
If I simply call closesocket() (for a SOCKET with a handle of for example 12345), and this SOCKET has pending IO operations (for example: a pending WSARecv() request), then the following scenario could happen:
I call closesocket() which will destroy the SOCKET.
I accept another SOCKET with the same handle of 12345.
I dequeue the pending WSARecv() completion packet for the SOCKET with the handle of 12345. Now I would assume that this completion packet is for the current SOCKET with the handle of 12345, but in fact it is for the SOCKET that was previously closed (this is the main problem with this approach).
So this is obviously a bad approach.
The second approach that seems correct is the following:
I associate a struct instance with each SOCKET. The struct has the following members: an int called number_of_pending_IO_operations, and a boolean called Is_SOCKET_being_closed.
When I issue an IO operation for the SOCKET (for example: a WSASend() request), I increment number_of_pending_IO_operations by 1, and when I dequeue a completion packet for the SOCKET, I decrement number_of_pending_IO_operations by 1.
Now, when I want to close the SOCKET, I don't simply call closesocket(), but rather I call CancelIoEx() to cancel all pending IO operations for the SOCKET, and I also set Is_SOCKET_being_closed to true.
When I am about to issue another IO operation (for example: a WSASend() request), I would check the value of Is_SOCKET_being_closed, and if it is true, I would not issue the IO operation.
Now I simply wait for all of the completion packets to be dequeued, and when number_of_pending_IO_operations reaches 0 and Is_SOCKET_being_closed is set to true, I call closesocket().
Of course I would have race conditions, so I would use critical sections.
Is the second approach a correct way to close a SOCKET, or is there a better way?
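The bookkeeping described above can be sketched in portable C. This is only a minimal illustration, not a complete IOCP implementation: the struct fields use the names from the question, while conn_state and the three helper functions are hypothetical names. The helpers assume the caller already holds the per-socket critical section, and the comments mark where the Windows calls (WSARecv/WSASend, CancelIoEx, closesocket) would go.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-socket bookkeeping. Field names follow the question; conn_state and
   the helper names below are hypothetical. Every helper assumes the caller
   holds the critical section that protects this struct. */
struct conn_state {
    int  number_of_pending_IO_operations;
    bool Is_SOCKET_being_closed;
};

/* Call right before issuing WSARecv()/WSASend(). Returns false if the socket
   is being closed, in which case the operation must not be issued. If the
   call then fails synchronously (no completion packet will be queued), the
   caller has to undo the increment with conn_end_io(). */
bool conn_begin_io(struct conn_state *c)
{
    if (c->Is_SOCKET_being_closed)
        return false;
    c->number_of_pending_IO_operations++;
    return true;
}

/* Call after dequeuing a completion packet for the socket. Returns true when
   this was the last pending operation of a closing socket, i.e. when the
   caller should now call closesocket(). */
bool conn_end_io(struct conn_state *c)
{
    assert(c->number_of_pending_IO_operations > 0);
    c->number_of_pending_IO_operations--;
    return c->Is_SOCKET_being_closed &&
           c->number_of_pending_IO_operations == 0;
}

/* Call when deciding to close; the caller follows it with CancelIoEx().
   Returns true only for the first close request made while nothing is
   pending, in which case no completion packet will arrive and the caller
   should call closesocket() right away. */
bool conn_request_close(struct conn_state *c)
{
    bool first_request = !c->Is_SOCKET_being_closed;
    c->Is_SOCKET_being_closed = true;
    return first_request && c->number_of_pending_IO_operations == 0;
}
```

Note the case the description does not spell out: if nothing is pending when the close is requested, no completion packet will ever arrive, so the close has to happen at request time.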

I had a similar issue in my client-server application. I think I have no races, no rundown protection, and no critical sections. There are some tradeoffs though.
only one WSARecv() at a time. Multiple buffers -- maybe, but only one WSARecv(). WSARecv completes (possibly, inline), packet pops from completion port, I quickly check what I got and issue another WSARecv()
I have a flag (actually, a counter that can only go up) for Is_SOCKET_being_closed. Counter because I may simultaneously decide to kill the socket in two different places.
after setting Is_SOCKET_being_closed, I call shutdown() followed by CancelIoEx() which will break the previously issued WSARecv().
there is a race between CancelIoEx and completion of WSARecv() -- I eliminate it by double-checking Is_SOCKET_being_closed immediately before issuing WSARecv(). Even if the race occurs, WSARecv() is doomed to fail because of shutdown().
the structure is refcounted, where one ref is held by the receive state machine, and one ref is held for every legal owner of the ptr (as in, potential sender). If you have a legal ref, you can make a copy which you can hand off to someone else (it differs here from rundown references as their AddRef() may fail). I will only closesocket() when that ref drains to zero, because I cannot guarantee that there is no thread that has passed all the checks and about to issue a WSARecv/WSASend.
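A rough sketch of the reference counting and the "counter that can only go up", written with C11 atomics so it needs no critical section. The names sock_ctx, ctx_addref, ctx_release, ctx_request_close and ctx_is_closing are hypothetical, and letting only the first close request perform shutdown() and CancelIoEx() is one possible way to use the counter, not necessarily what the answer's code does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-socket context. */
struct sock_ctx {
    atomic_int refs;            /* one for the receive state machine, one per owner */
    atomic_int close_requests;  /* only ever incremented */
};

/* The initial reference belongs to the receive state machine. */
void ctx_init(struct sock_ctx *c)
{
    atomic_init(&c->refs, 1);
    atomic_init(&c->close_requests, 0);
}

/* Copy a reference to hand to someone else. The caller must already hold a
   legal reference, so the count cannot be zero here and this cannot fail. */
void ctx_addref(struct sock_ctx *c)
{
    int old = atomic_fetch_add(&c->refs, 1);
    assert(old > 0);
    (void)old;
}

/* Drop a reference. Returns true when the last one drained; only then is it
   safe for the caller to call closesocket() and free the context. */
bool ctx_release(struct sock_ctx *c)
{
    return atomic_fetch_sub(&c->refs, 1) == 1;
}

/* Request a close from any thread. Returns true for the first requester
   only; in this sketch that caller performs shutdown() and CancelIoEx(). */
bool ctx_request_close(struct sock_ctx *c)
{
    return atomic_fetch_add(&c->close_requests, 1) == 0;
}

/* Check immediately before issuing WSARecv()/WSASend(). */
bool ctx_is_closing(struct sock_ctx *c)
{
    return atomic_load(&c->close_requests) != 0;
}
```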

Related

When a non-blocking send() only transfers partial data, can we assume it would return EWOULDBLOCK the next call?

Two cases are well-documented in the man pages for non-blocking sockets:
If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not be in a state of returning EAGAIN/EWOULDBLOCK the next call with >0 bytes to transfer.
If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket is ready for more data (EPOLLOUT in the epoll case).
What's not documented for nonblocking sockets is:
If send() returns a positive value smaller than the buffer size.
Is it safe to assume that the send() would return EAGAIN/EWOULDBLOCK on even one more byte of data? Or should a non-blocking program try to send() one more time to get a conclusive EAGAIN/EWOULDBLOCK? I'm worried about putting an EPOLLOUT watcher on the socket if it's not actually in a "would block" state to respond to it coming out of.
Obviously, the latter strategy (trying again to get something conclusive) has well-defined behavior, but it's more verbose and puts a hit on performance.
A call to send has three possible outcomes:
There is at least one byte available in the send buffer →send succeeds and returns the number of bytes accepted (possibly fewer than you asked for).
The send buffer is completely full at the time you call send.
→if the socket is blocking, send blocks
→if the socket is non-blocking, send fails with EWOULDBLOCK/EAGAIN
An error occurred (e.g. user pulled network cable, connection reset by peer) →send fails with another error
If the number of bytes accepted by send is smaller than the amount you asked for, then this consequently means that the send buffer is now completely full. However, this is purely circumstantial and non-authoritative with respect to any future calls to send.
The information returned by send is merely a "snapshot" of the current state at the time you called send. By the time send has returned or by the time you call send again, this information may already be outdated. The network card might put a datagram on the wire while your program is inside send, or a nanosecond later, or at any other time -- there is no way of knowing. You'll know when the next call succeeds (or when it doesn't).
In other words, this does not imply that the next call to send will return EWOULDBLOCK/EAGAIN (or would block if the socket wasn't non-blocking). Trying until what you called "getting a conclusive EWOULDBLOCK" is the correct thing to do.
If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not be in a blocking state.
No. The socket remains in the mode it was in: in this case, non-blocking mode, assumed below throughout.
If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket isn't blocking anymore.
Until the send buffer isn't full any more. The socket remains in non-blocking mode.
If send() returns a positive value smaller than the buffer size.
There was only that much room in the socket send buffer.
Is it safe to assume that the send() would block on even one more byte of data?
It isn't 'safe' to 'assume [it] would block' at all. It won't. It's in non-blocking mode. EWOULDBLOCK means it would have blocked in blocking mode.
Or should a non-blocking program try to send() one more time to get a conclusive EAGAIN/EWOULDBLOCK?
That's up to you. The API works whichever you decide.
I'm worried about putting an EPOLLOUT watcher on the socket if it's not actually blocking on that.
It isn't 'blocking on that'. It isn't blocking on anything. It's in non-blocking mode. The send buffer got filled at that instant. It might be completely empty a moment later.
I don't see what you're worried about. If you have pending data and the last write didn't send it all, select for writability, and write when you get it. If such a write sends everything, don't select for writability next time.
Sockets are usually writable, unless their send buffer is full, so don't select for writability all the time, as you just get a spin loop.
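The strategy from these answers (keep calling send() until it gives a conclusive EAGAIN/EWOULDBLOCK, and only then register for writability) might look roughly like this on a POSIX system. send_pending and set_nonblocking are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Puts fd into non-blocking mode while keeping its other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Tries to send buf[0..len) on a non-blocking socket.
   Returns the number of bytes the kernel accepted, or -1 on a real error.
   *want_write is set to true only when send() reported EAGAIN/EWOULDBLOCK
   with data still left over; that is the moment to start watching the
   socket for writability (EPOLLOUT, or the write set of select()). */
ssize_t send_pending(int fd, const char *buf, size_t len, bool *want_write)
{
    size_t off = 0;

    assert(want_write != NULL);
    *want_write = false;

    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;       /* partial or full progress: try again */
        } else if (n < 0 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            *want_write = true;     /* conclusive: the send buffer is full */
            break;
        } else {
            return -1;              /* any other failure */
        }
    }
    return (ssize_t)off;
}
```

If the whole buffer goes out, want_write stays false and the socket should not be selected for writability next time, which avoids the spin loop mentioned above.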

what is the best way to handle new clients using select() in a server?

I want to write an asynchronous socket server in C, but before I do I'm doing some research. Looking at the select() socket example shown here: http://www.gnu.org/s/hello/manual/libc/Server-Example.html#Server-Example I can see that the example program will only accept one client per select loop (if I'm reading it right). So if there are 20 clients and two more try to connect, will it only accept the 21st client then process the other 20 (worst case, assuming all 20 others require reading) and THEN accept the 22nd? Would it be better if I break the loop after accepting a client so it can select() again and take care of all pending clients before processing connected ones? Or does that defeat the purpose of using select()? Thanks.
The server pattern shown in the example you linked to is fine; there isn't any significant problem introduced by the loop only accepting one socket per iteration.
The key point to keep in mind is that in a well-designed select() loop, the only place the process should ever block is inside the select() call. In particular, if coded correctly, the server will never block inside send(), recv(), or accept(). It's best to set all sockets to non-blocking mode (via fcntl(fd, F_SETFL, O_NONBLOCK)) in order to guarantee this behavior.
Given that, the precise ordering of "which clients get serviced first within any particular event loop iteration" doesn't matter, because all clients' sockets get handled very soon after they have data ready-for-read (or buffer space ready-for-write), and all new connections get accepted quickly.
select() leaves it up to the programmer to decide how to handle notifications. One call to select() can indicate that any or all sockets have bytes to be read and some connection requests to process. The application can process all the notifications before calling select() or process one notification before calling select() again.
You can use poll() on the listening socket after accept() to see if there are more clients waiting to connect.
Note that the number of concurrent connection attempts is governed by the backlog parameter to listen(server_sock, backlog); backlog is 1 in the sample you reference.
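If you do want to drain every queued connection in one loop iteration, a non-blocking listening socket makes that straightforward: call accept() until it reports EAGAIN/EWOULDBLOCK. A minimal POSIX sketch follows; make_listener and accept_pending are hypothetical helper names, and binding to the loopback address is just an assumption to keep the example local.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a non-blocking TCP listening socket on 127.0.0.1:port
   (port 0 lets the kernel choose). Returns the descriptor or -1. */
int make_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Call when select() reports the listening socket as readable. Accepts every
   connection that is queued right now, up to max, storing the descriptors in
   out[]. Returns the number accepted, or -1 if accept() failed before any
   connection was taken. Never blocks because listen_fd is non-blocking. */
int accept_pending(int listen_fd, int *out, int max)
{
    int n = 0;

    assert(out != NULL && max > 0);
    while (n < max) {
        int c = accept(listen_fd, NULL, NULL);
        if (c >= 0) {
            out[n++] = c;   /* the caller should make c non-blocking too */
            continue;
        }
        if (errno == EINTR || errno == ECONNABORTED)
            continue;       /* transient: try the next queued connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;          /* accept queue is empty */
        return n > 0 ? n : -1;
    }
    return n;
}
```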

Removing a handle from a I/O completion port and other questions about IOCP

The CreateIoCompletionPort function allows the creation of a new I/O completion port and the registration of file handles to an existing I/O completion port.
Then, I can use any function, like a recv on a socket or a ReadFile on a file with a OVERLAPPED structure to start an asynchronous operation.
I have to check whether the function call returned synchronously although it was called with an OVERLAPPED structure and in this case handle it directly. In the other case, when ERROR_IO_PENDING is returned, I can use the GetQueuedCompletionStatus function to be notified when the operation completes.
The questions that arise are:
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
And finally, is it possible for example to recv asynchronously but to send synchronously? For example when a simple echo service is implemented: Can I wait with an asynchronous recv for new data but send the response in a synchronous way so that code complexity is reduced? In my case, I wouldn't recv a second time anyways before the first request was processed.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed. Will the ReadFile be cancelled with an error message and I have to restart the read process as soon as the write is complete? Or do I have to cancel the ReadFile manually before writing? This question arises in combination with a communication device; so, the write and read should not cause problems if they happen concurrently.
How can I remove a handle from the I/O completion port?
In my experience you can't disassociate a handle from a completion port. However, you may disable completion port notification by setting the low-order bit of your OVERLAPPED structure's hEvent field: See the documentation for GetQueuedCompletionStatus.
For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
It is not necessary to explicitly disassociate a handle from an I/O completion port; closing the handle is sufficient. You may associate multiple handles with the same completion key; the best way to figure out which request is associated with the I/O completion is by using the OVERLAPPED structure. In fact, you may even extend OVERLAPPED to store additional data.
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
That is the default behavior, even when ReadFile/WriteFile returns TRUE. You must explicitly call SetFileCompletionNotificationModes to tell Windows to not enqueue a completion packet when TRUE and ERROR_SUCCESS are returned.
is it possible for example to recv asynchronously but to send synchronously?
Not by using recv and send; you need to use functions that accept OVERLAPPED structures, such as WSARecv, WSASend, or alternatively ReadFile and WriteFile. It might be handier to use the latter if your code is meant to work with multiple types of I/O handles, such as both sockets and named pipes. Those functions provide a synchronous mode, so if you use them you can mix asynchronous and synchronous calls.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed?
There is no implicit cancellation. As long as you're using separate OVERLAPPED structures for each read/write to a full-duplex device, I see no reason why you can't do concurrent I/O operations.
As I’ve already pointed out there, the commonly held belief that it is impossible to remove handles from completion ports is wrong, probably caused by the absence of any hint whatsoever on how to do this from nearly all documentation I could find. Actually, it’s pretty easy:
Call NtSetInformationFile with the FileReplaceCompletionInformation enumerator value for FileInformationClass and a pointer to a FILE_COMPLETION_INFORMATION structure for the FileInformation parameter. In this structure, set the Port member to NULL (or nullptr, in C++) to disassociate the file from the port it’s currently attached to (I guess if it isn’t attached to any port, nothing would happen), or set Port to a valid HANDLE to another completion port to associate the file with that one instead.
First some important corrections.
In case the overlapped I/O operation completes immediately (ReadFile or similar I/O function returns success) - the I/O completion is already scheduled to the IOCP.
Also, judging by your questions, I think you are confusing the file/socket handles with the specific I/O operations issued on them.
Now, regarding your questions:
AFAIK there is no conventional way to remove a file/socket handle from the IOCP (usually you just don't have to do this). You talk about removing closed handles from the IOCP, which is absolutely incorrect. You can't remove a closed handle, because it does not reference a valid kernel object anymore!
A more correct question is how the file/socket should be properly closed. The answer is: just close your handle. All the outstanding I/O operations (issued on this handle) will return soon with an error code (aborted). Then your completion routine (the one that calls GetQueuedCompletionStatus in a loop) should perform the needed per-I/O cleanup.
As I've already said, all the I/O completion arrives at IOCP in both synchronous and asynchronous cases. The only situation where it does not arrive at IOCP is when an I/O completes synchronously with an error. Anyway, if you want a unified processing - in such a case you may post an artificial completion data to IOCP (use PostQueuedCompletionStatus).
You should use WSASend and WSARecv (not recv and send) for overlapped I/O. Nevertheless, even if the socket was opened with the WSA_FLAG_OVERLAPPED flag, you are allowed to call the I/O functions without specifying the OVERLAPPED structure. In such a case those functions work synchronously, so you may decide between synchronous and asynchronous mode for every function call.
There is no problem with mixing overlapped read/write requests. The only delicate point here is what happens if you try to read the data from the file position where you're currently writing to. The result may depend on subtle things, such as the order in which the hardware completes the I/Os, some PC timing parameters, etc. Such a situation should be avoided.
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
You've got it the wrong way around. You set the I/O completion port to be used by a file object - when the file object is deleted, you have nothing to worry about. The reason you're getting confused is because of the way Win32 exposes the underlying native API functionality (CreateIoCompletionPort does two very different things in one function).
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
This is how it's always been. Only starting with Windows Vista can you customize how the completion notifications are handled.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed. Will the ReadFile be cancelled with an error message and I have to restart the read process as soon as the write is complete?
I/O operations in Windows are asynchronous inherently, and requests are always queued. You may not think this is so because you have to specify FILE_FLAG_OVERLAPPED in CreateFile to turn on asynchronous I/O. However, at the native layer, synchronous I/O is really an add-on, convenience thing where the kernel keeps track of the file position for you and waits for the I/O to complete before returning.

Call recv() on the same blocking socket from two threads

What happens if I have one socket, s, there is no data currently available on it, it is a blocking socket, and I call recv on it from two threads at once? Will one of the threads get the data? Will both get it? Will the 2nd call to recv return with an error?
One thread will get it, and there's no way to tell which.
This doesn't seem like a reasonable design. Is there a reason why you need two threads calling recv() on the same socket?
Socket implementations should be thread-safe, so exactly one thread should get the data when it becomes available. The other call should just block.
I can't find a reference for this, but here's my understanding:
A vendor's guarantee of thread-safety may mean only that multiple threads can each safely use their own sockets; it does not guarantee atomicity across a single call, and it doesn't promise any particular allocation of the socket's data among multiple threads.
Suppose thread A calls recv() on a socket that's receiving TCP data streaming in at a high rate. If recv() needs to be an atomic call, then thread A could block all other threads from executing, because it needs to be running continuously to pull in all the data (until its buffer is full, anyway.) That wouldn't be good. Hence, I would not assume that recv() is immune to context switching.
Conversely, suppose thread A makes a non-blocking call to recv() on a TCP socket, and the data is coming in slowly. If nothing has arrived yet, the call to recv() returns -1 with errno set to EAGAIN.
In either of these cases, suppose thread B calls recv() on the same socket while thread A is still receiving data. When does thread A stop getting data handed to it so that thread B can start receiving data? I don't know of a Unix implementation that will try to remember that thread A was in the middle of an operation on the socket; instead, it's up to the application (threads A and B) to negotiate their use of it.
Generally, it's best to design the app so that only one of the threads will call recv() on a single socket.
From the man page on recv
A recv() on a SOCK_STREAM socket returns as much available information as the size of the buffer supplied can hold.
Let's assume you are using TCP, since it was not specified in the question. So suppose you have thread A and thread B both blocking on recv() for socket s. Once s has some data to be received it will unblock one of the threads, let's say A, and return the data. The data returned will be of some random size as far as we are concerned. Thread A inspects the data received and decides if it has a complete "message", where a message is an application level concept.
Thread A decides it does not have a complete message, so it calls recv() again. BUT in the meantime B was already blocking on the same socket, and has received the rest of the "message" that was intended for thread A. I am using intended loosely here.
Now both thread A and thread B have an incomplete message, and will, depending on how the code is written, throw the data away as invalid, or cause weird and subtle errors.
I wish I could say I didn't know this from experience.
So while recv() itself is technically thread safe, it is a bad idea to have two threads calling it simultaneously if you are using it for TCP.
As far as I know it is completely safe when you are using UDP.
I hope this helps.
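To illustrate why a single reader is the simpler design: when one thread owns the receiving side, assembling a complete application-level message is just a loop around recv(). A minimal POSIX sketch, assuming a blocking stream socket and a message whose length is known in advance (recv_exact is a hypothetical helper name):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Reads exactly len bytes from a blocking stream socket, calling recv() as
   many times as needed. Returns len on success, a smaller count if the peer
   closed the connection early, or -1 on error. Only one thread should call
   this for a given socket; with two readers the pieces of one message could
   be split between them, as described above. */
ssize_t recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n > 0)
            got += (size_t)n;   /* recv() may return any size up to the rest */
        else if (n == 0)
            break;              /* orderly shutdown by the peer */
        else if (errno != EINTR)
            return -1;          /* real error; EINTR simply retries */
    }
    return (ssize_t)got;
}
```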

How to signal select() to return immediately?

I have a worker thread that is listening to a TCP socket for incoming traffic, and buffering the received data for the main thread to access (let's call this socket A). However, the worker thread also has to do some regular operations (say, once per second), even if there is no data coming in. Therefore, I use select() with a timeout, so that I don't need to keep polling. (Note that calling receive() on a non-blocking socket and then sleeping for a second is not good: the incoming data should be immediately available for the main thread, even though the main thread might not always be able to process it right away, hence the need for buffering.)
Now, I also need to be able to signal the worker thread to do some other stuff immediately; from the main thread, I need to make the worker thread's select() return right away. For now, I have solved this as follows (approach basically adopted from here and here):
At program startup, the worker thread creates for this purpose an additional socket of the datagram (UDP) type, and binds it to some random port (let's call this socket B). Likewise, the main thread creates a datagram socket for sending. In its call to select(), the worker thread now lists both A and B in the fd_set. When the main thread needs to signal, it sendto()'s a couple of bytes to the corresponding port on localhost. Back in the worker thread, if B remains in the fd_set after select() returns, then recvfrom() is called and the bytes received are simply ignored.
This seems to work very well, but I can't say I like the solution, mainly as it requires binding an extra port for B, and also because it adds several additional socket API calls which may fail I guess – and I don't really feel like figuring out the appropriate action for each of the cases.
I think ideally, I would like to call some function which takes A as input, and does nothing except makes select() return right away. However, I don't know such a function. (I guess I could for example shutdown() the socket, but the side effects are not really acceptable :)
If this is not possible, the second best option would be creating a B which is much dummier than a real UDP socket, and doesn't really require allocating any limited resources (beyond a reasonable amount of memory). I guess Unix domain sockets would do exactly this, but: the solution should not be much less cross-platform than what I currently have, though some moderate amount of #ifdef stuff is fine. (I am targeting mainly for Windows and Linux – and writing C++ by the way.)
Please don't suggest refactoring to get rid of the two separate threads. This design is necessary because the main thread may be blocked for extended periods (e.g., doing some intensive computation – and I can't start periodically calling receive() from the innermost loop of calculation), and in the meanwhile, someone needs to buffer the incoming data (and due to reasons beyond what I can control, it cannot be the sender).
Now that I was writing this, I realized that someone is definitely going to reply simply "Boost.Asio", so I just had my first look at it... Couldn't find an obvious solution, though. Do note that I also cannot (easily) affect how socket A is created, but I should be able to let other objects wrap it, if necessary.
You are almost there. Use the "self-pipe" trick. Open a pipe, add its read end to your select() read fd_set, and write to its write end from the main thread to unblock the worker thread. It is portable across POSIX systems.
I have seen a variant of similar technique for Windows in one system (in fact used together with the method above, separated by #ifdef WIN32). Unblocking can be achieved by adding a dummy (unbound) datagram socket to fd_set and then closing it. The downside is that, of course, you have to re-open it every time.
However, in the aforementioned system, both of these methods are used rather sparingly, and for unexpected events (e.g., signals, termination requests). Preferred method is still a variable timeout to select(), depending on how soon something is scheduled for a worker thread.
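On the POSIX side, the self-pipe trick combined with a select() timeout can be sketched as follows. The pipe itself is created once with pipe(); wake_worker, wait_for_event and the EV_* constants are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Result bits of wait_for_event(); 0 means the timeout expired. */
enum { EV_TIMEOUT = 0, EV_SOCKET = 1, EV_WAKEUP = 2 };

/* Main thread: wake the worker by writing one byte to the pipe's write end. */
int wake_worker(int wake_wr)
{
    char b = 1;
    return write(wake_wr, &b, 1) == 1 ? 0 : -1;
}

/* Worker thread: wait until sock_fd is readable, a wake-up byte arrives on
   the pipe's read end, or timeout_ms elapses. Returns a combination of
   EV_SOCKET and EV_WAKEUP, EV_TIMEOUT (0) on timeout, or -1 if select()
   failed (including EINTR, which the caller may simply retry). */
int wait_for_event(int sock_fd, int wake_rd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = sock_fd > wake_rd ? sock_fd : wake_rd;
    int ev = 0;

    /* select() cannot handle descriptors at or above FD_SETSIZE. */
    assert(sock_fd >= 0 && sock_fd < FD_SETSIZE);
    assert(wake_rd >= 0 && wake_rd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_SET(sock_fd, &rfds);
    FD_SET(wake_rd, &rfds);     /* only the read end, only in the read set */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;

    if (FD_ISSET(sock_fd, &rfds))
        ev |= EV_SOCKET;
    if (FD_ISSET(wake_rd, &rfds)) {
        char drain[64];
        ssize_t n = read(wake_rd, drain, sizeof drain); /* discard wake bytes */
        (void)n;
        ev |= EV_WAKEUP;
    }
    return ev;
}
```

The worker calls wait_for_event() in its loop with the time remaining until its next scheduled task as the timeout, and the main thread calls wake_worker() whenever it needs attention immediately.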
Using a pipe rather than socket is a bit cleaner, as there is no possibility for another process to get hold of it and mess things up.
Using a UDP socket definitely creates the potential for stray packets to come in and interfere.
An anonymous pipe will never be available to any other process (unless you give it to it).
You could also use signals, but in a multithreaded program you'll want to make sure that all threads except for the one you want have that signal masked.
On Unix this is straightforward using a pipe. If you are on Windows and want to keep using select() so that your code stays compatible with Unix, the trick of creating an unbound UDP socket and closing it works well and is easy. But you have to make it thread-safe.
The only way I found to make this thread-safe is to close and recreate the socket in the same thread that runs select(). Of course this is difficult if the thread is blocked in select(). This is where the Windows call QueueUserAPC comes in. While Windows is blocking in select(), the thread can handle Asynchronous Procedure Calls. You can schedule one from a different thread using QueueUserAPC. Windows interrupts the select(), executes your function in the same thread, and continues with the select(). In your APC function you can now close the socket and recreate it. This is guaranteed thread-safe and you will never lose a signal.
To be simple:
a global var saves the socket handle, then close the global socket, the select() will return immediately: closesocket(g_socket);

Resources