How should I handle file descriptor 'dependencies' when using epoll?

I'm writing an HTTP/2 server in C, using epoll. Let's say a client asks for /index.html - I need to open a file descriptor pointing to that file and then send it back to the socket whenever I read a chunk of it. So I'd have an event loop that looks something like this:
while (true)
    events = epoll_wait()
    for event in events
        if event is on a socket
            handle socket i/o
        else if event is on a disk file
            read as much as possible, and send to associated socket
However this poses a problem. If the socket then closes (for whatever reason), the file descriptor for index.html will get closed too. But it's possible that the index.html FD has already been queued for reading (i.e., it's already in events, because the FD was closed between calls to epoll_wait), and so when the for loop gets to that FD I'll be accessing a 'dangling' FD.
If this were a single-threaded program I'd try to hack around the issue by tracking file descriptor numbers, but unfortunately I'm running the same epoll loop on multiple threads, which means I can't predict which FD numbers will be in use at a given moment. It's entirely possible that by the time the invalid read on the file comes around, another thread will have claimed that FD number, so the call to read won't explicitly fail, but I'll probably get a use-after-free anyway by trying to send the data to a socket that no longer exists.
What's the best way of dealing with this issue? Maybe I should take an entirely different approach and not have file I/O on the same epoll loop at all.
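For concreteness, here is a minimal sketch of the registration being described, assuming a hypothetical struct conn that ties the two FDs together and is carried through epoll_data.ptr (a common alternative to storing the raw fd):

    #include <stdbool.h>
    #include <sys/epoll.h>

    /* Hypothetical per-connection state shared by the socket and the file. */
    struct conn {
        int  sock_fd;
        int  file_fd;
        bool closing;   /* set once the socket has gone away */
    };

    /* Register the file FD with a pointer to the connection, not the raw
     * number; epoll_data is a union, so .ptr replaces .fd entirely. */
    static int watch_file(int epfd, struct conn *c)
    {
        struct epoll_event ev = { .events = EPOLLIN, .data.ptr = c };
        return epoll_ctl(epfd, EPOLL_CTL_ADD, c->file_fd, &ev);
    }

Note the pointer alone doesn't remove the race described above: an entry already sitting in the events array from the current epoll_wait call still refers to the connection after the close, so the struct itself has to outlive the loop iteration (for example via deferred freeing or reference counting).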

Related

How to properly handle socket accept returning "Too many open files"

I have a listening socket on a tcp port. The process itself is using setrlimit(RLIMIT_NOFILE,&...); to configure how many sockets are allowed for the process.
For tests RLIMIT_NOFILE is set to 20 and of course for production it will be set to a sanely bigger number. 20 is good for easily reaching the limit in a test environment.
The server itself has no issues like descriptor leaks, but trying to solve the problem by increasing RLIMIT_NOFILE obviously won't do, because in real life there is no guarantee that the limit will not be reached, no matter how high it is set.
The problem is that after reaching the limit, accept returns Too many open files, and unless a file or socket is closed the event loop starts spinning without delay, eating 100% of one core. Even if the client closes the connection (e.g. because of a timeout), the server will loop until a file descriptor is available to process and close the already-dead incoming connection. EDIT: On the other hand, the client stalls and there is no good way to know that the server is overloaded.
My question: is there some standard way to handle this situation by closing the incoming connection after accept returns Too many open files?
Several dirty approaches come to mind:
To close and reopen the listening socket with the hope that all pending connections will be closed (this is quite dirty because in a threaded server some other thread may get the freed file descriptor)
To track open file descriptor count (this cannot be properly done with external libraries that will have some untracked file descriptors)
To check if file descriptor number is near the limit and start closing incoming connections before the situation happens (this is rather implementation specific and while it will work on Linux, there is no guarantee that other OS will handle file descriptors in the same way)
EDIT: One more dirty and ugly approach:
To keep one spare fd (e.g. dup(STDIN_FILENO) or open("/dev/null",...)) that will be used in case accept fails. The sequence will be:
... accept failed
// stop threads
close(sparefd);
newconnection = accept(...);
close(newconnection);
sparefd = open("/dev/null",...);
// release threads
The main drawback of this approach is the thread synchronization needed to prevent other threads from grabbing the just-freed spare fd.
You shouldn't use setrlimit to control how many simultaneous connections your process can handle. Your tiny little bit of socket code is saying to the whole rest of the application, "I only want to have N connections open at a time, and this is the only way I know how to do it, so... nothing else in the process can have any files!". What would happen if everybody did that?
The proper way to do what you want is easy -- keep track of how many connections you have open, and just don't call accept until you can handle another one.
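As an illustration of that bookkeeping, a minimal sketch (MAX_CONNS, open_conns, and maybe_accept are all hypothetical names):

    #include <sys/socket.h>

    #define MAX_CONNS 1000            /* self-imposed cap, example value */
    static int open_conns = 0;        /* decrement when a connection closes */

    static void maybe_accept(int listen_fd)
    {
        while (open_conns < MAX_CONNS) {
            int fd = accept(listen_fd, NULL, NULL);
            if (fd < 0)
                break;                /* EAGAIN, EMFILE, ...: handled elsewhere */
            open_conns++;
            /* hand fd over to the event loop here */
        }
    }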
I understand that your code is in a library. The library encounters a resource limit event. I would distinguish, generally, between events which are catastrophic (memory exhaustion, can't open listening socket) and those which are probably temporary. Catastrophic events are hard to handle: without memory, even logging or an orderly shutdown may be impossible.
Too many open files, by contrast, is a condition which is probably temporary, not least because we ourselves are the resource hog. Temporary error conditions are luckily trivial to handle: by waiting. This is what you are not doing: you should wait for a spell after accept returns "Too many open files" before you call accept again. That will solve the 100% CPU load problem. (I assume that our server performs some work on each connection which at some point finishes, so that the file descriptors of the client connections which our library holds are eventually closed.)
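A minimal sketch of that waiting, assuming a listening socket listen_fd; the 100 ms pause is an arbitrary placeholder:

    #include <errno.h>
    #include <poll.h>
    #include <sys/socket.h>

    static int accept_with_backoff(int listen_fd)
    {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0 && (errno == EMFILE || errno == ENFILE))
            poll(NULL, 0, 100);   /* back off ~100 ms without burning CPU */
        return fd;
    }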
There remains the problem that the library cannot know the requirements of the user code. (How long should the pause between accepts be?[1] Is it acceptable (sic) to let connection requests wait at all? Do we give up at some point?) It is imperative to report errors back to the user code, so that the user code has a chance to see and fix the error.
If the user code gets the file descriptor back, that's easy: return accept's error code (and make sure to document that possibility). So I assume that the user code never sees gritty details like file descriptors but instead gets, for example, some data. It may even be that the library performs only side effects, possibly concurrently, so that the user code never sees any return value which could be used to communicate errors. Then the library must provide some other way to signal the error condition to the user code. This may impose restrictions on how the user code can use the library: perhaps before or after certain function calls, or simply periodically, an error status must be actively checked.
[1] By the way, it is not clear to me, even after reading the accept man page, whether the client's connect fails (because the connection request has been de-queued on the server side but cannot be handled), or whether the request simply stays in the queue so that the client is oblivious of the server's problems, apart from a delay.
Notice that multiplexing syscalls such as poll(2) can wait (without busy spin-looping) on accepting sockets (and on connected sockets too, or any other kind of stream file descriptor).
So just have your event loop handle them (probably with other readable & writable file descriptors). And don't call accept(2) when you don't want to.
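In an epoll-based loop, one hedged way to "not call accept when you don't want to" is to deregister the listening socket while at capacity and re-add it once a slot frees up; a sketch (helper names are hypothetical):

    #include <sys/epoll.h>

    /* Stop polling the listening socket while at the connection cap. */
    static void pause_accepting(int epfd, int listen_fd)
    {
        epoll_ctl(epfd, EPOLL_CTL_DEL, listen_fd, NULL);
    }

    /* Start polling it again once a connection slot has been freed. */
    static void resume_accepting(int epfd, int listen_fd)
    {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
    }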

Avoid send to block when not using O_NONBLOCK

I have to write a chat client-server for a class using Unix sockets (without O_NONBLOCK) and select for asynchronous I/O on them. At the moment, on the server, I read 1024 bytes from the client and handle them directly.
For example, in the case of a message, I will receive a command formatted as MSG <msg> (representing a client sending a message); I will then go through all the sockets of the connected clients and write the message to them.
This approach actually works, but I recently found by reading the man page of send that it can block if the socket buffer is full and the flag O_NONBLOCK is not set on the socket.
I think this problem could happen when a client does not read for some reason (crash, bug, etc.), and this would be critical for my server since it will basically block until this client reads again.
So here is my question:
What is the correct approach on a potentially blocking socket to keep send from blocking if the socket buffer is full?
I'm currently using select only to check if there is something to read on the sockets, but maybe I should also use it to see if I can write on a particular socket. Also, can I know how many bytes I can read/write when select returns? For example, if select "tells" me that I can write on this socket, how can I know how many bytes I can write at most before writing to this socket actually blocks?
You could use setsockopt() together with SO_SNDTIMEO to set up a maximum amount of time send() will try to do its work.
See man setsockopt and man 7 socket for details.
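A minimal sketch of that option, assuming a connected socket sock_fd; the 5-second value is just an example:

    #include <sys/socket.h>
    #include <sys/time.h>

    /* Cap how long send() may block on this socket. */
    static int set_send_timeout(int sock_fd)
    {
        struct timeval tv = { 5, 0 };   /* 5 seconds, arbitrary example */
        return setsockopt(sock_fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
    }

If the timeout expires, send() fails with EAGAIN/EWOULDBLOCK and may have transmitted only part of the data, so the return value still has to be checked.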
It might be horrible: if you don't use non-blocking mode and instead call select(), the kernel puts the process to sleep for the given timeout value, which means the fd will be blocked for that time period.
This approach actually works, but I recently found by reading the man page of send that it can block if the socket buffer is full and the flag O_NONBLOCK is not set on the socket.
This is why you use select, but it still isn't reliable, as man select states:
Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
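Given that caveat, a hedged sketch of the writability check: select() reports readiness, not a byte count, so a partial send() still has to be handled by the caller (send_if_writable is a hypothetical helper):

    #include <sys/select.h>
    #include <sys/socket.h>

    static ssize_t send_if_writable(int fd, const void *buf, size_t len)
    {
        fd_set wfds;
        struct timeval tv = { 0, 0 };   /* zero timeout: just poll */

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        if (select(fd + 1, NULL, &wfds, NULL, &tv) > 0 && FD_ISSET(fd, &wfds))
            return send(fd, buf, len, 0);   /* may still be a partial write */
        return 0;   /* not writable right now: caller should queue the data */
    }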

Unix: What happens when a read file descriptor closes while calling select()

Say that I call select() on a FD_SET containing a bunch of read file descriptors. What happens if during the select() call, one of the file descriptor closes? Assuming that some sort of error occurs, then is it my responsibility to find and remove the closed file descriptor from the set?
I don't believe this is specified anywhere; some systems may immediately return from select while others may continue blocking. Note that the only way this can happen is in a multi-threaded process (otherwise, the close cannot happen during select; even if it happened from a signal handler, select would have already been interrupted by the signal). As such, this situation arising probably indicates you have bigger issues to worry about. If one of the file descriptors you're polling can be closed during select, the bigger issue is that the same file descriptor might be reassigned to a newly opened file (e.g. one opened in another unrelated thread) immediately after the close, and the thread that's polling might then wrongly perform IO on the new file "belonging to" a different thread.
If you have a data object that consists of a set of file descriptors that will be polled with select in a multithreaded program, you almost surely need to be using some sort of synchronization primitive to control access to that set, and adding or removing file descriptors should require a lock that's mutually exclusive with the possibility that select (or any IO on the members) is in progress.
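A minimal sketch of that locking idea (all names hypothetical):

    #include <pthread.h>
    #include <sys/select.h>

    static pthread_mutex_t set_lock = PTHREAD_MUTEX_INITIALIZER;
    static int watched[FD_SETSIZE];   /* fds currently being polled */
    static int nwatched = 0;

    /* Any thread adding or removing an fd must hold the lock. */
    void add_fd(int fd)
    {
        pthread_mutex_lock(&set_lock);
        watched[nwatched++] = fd;
        pthread_mutex_unlock(&set_lock);
    }

    /* The polling thread rebuilds its fd_set under the same lock
     * before each call to select(). */

A complete version also needs a way to wake a thread already blocked inside select(), for example the self-pipe trick, so that the lock is genuinely mutually exclusive with an in-progress select.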
Of course in a multi-threaded program, it may be better not to use select at all and instead let blocking IO in multiple threads achieve the desired result without complicated locking logic.
The select() system call takes three fd_set parameters: send, receive, and exception. To check whether an error occurs on a reading file descriptor, include it in both the read (receive) set and the error (exception) set; seeing it in the exception set on return from select() means an exception has occurred on that socket, giving you the chance to find out what.
In general, network sockets with any sort of exception will no longer be fit to send and receive.
Even if you've read all the sent data, a closed socket is always regarded as ready to read: select will unblock, signaling that the socket is available.
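That suggests the usual idiom for the reading side: treat a zero return from read as end-of-stream. A minimal sketch (on_readable is a hypothetical handler):

    #include <unistd.h>

    static void on_readable(int fd)
    {
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0) {
            /* Peer closed the connection: remove fd from the fd_set,
             * then close it. */
            close(fd);
        }
    }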

Removing a handle from a I/O completion port and other questions about IOCP

The CreateIoCompletionPort function allows the creation of a new I/O completion port and the registration of file handles to an existing I/O completion port.
Then, I can use any function, like a recv on a socket or a ReadFile on a file with a OVERLAPPED structure to start an asynchronous operation.
I have to check whether the function call returned synchronously although it was called with an OVERLAPPED structure and in this case handle it directly. In the other case, when ERROR_IO_PENDING is returned, I can use the GetQueuedCompletionStatus function to be notified when the operation completes.
The questions which arise are:
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
Also, is there a way to make the calls ALWAYS go over the I/O completion port and not return synchronously?
And finally, is it possible for example to recv asynchronously but to send synchronously? For example when a simple echo service is implemented: Can I wait with an asynchronous recv for new data but send the response in a synchronous way so that code complexity is reduced? In my case, I wouldn't recv a second time anyway before the first request was processed.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed? Will the ReadFile be cancelled with an error, meaning I have to restart the read as soon as the write is complete? Or do I have to cancel the ReadFile manually before writing? This question arises in combination with a communication device; the write and read should not cause problems if they happen concurrently.
How can I remove a handle from the I/O completion port?
In my experience you can't disassociate a handle from a completion port. However, you may disable completion port notification by setting the low-order bit of your OVERLAPPED structure's hEvent field: See the documentation for GetQueuedCompletionStatus.
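A sketch of that trick, assuming an overlapped file handle h; the result is then waited for on the event rather than delivered to the port:

    #include <windows.h>

    static BOOL read_without_port_notification(HANDLE h, void *buf, DWORD len,
                                               OVERLAPPED *ov)
    {
        HANDLE ev = CreateEvent(NULL, TRUE, FALSE, NULL);
        /* Setting the low-order bit of hEvent suppresses the completion
         * packet for this one operation (see GetQueuedCompletionStatus);
         * the OVERLAPPED must stay alive until the I/O completes. */
        ov->hEvent = (HANDLE)((ULONG_PTR)ev | 1);
        /* Caller waits on ev (mask off the low bit) and closes it after. */
        return ReadFile(h, buf, len, NULL, ov);
    }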
For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
It is not necessary to explicitly disassociate a handle from an I/O completion port; closing the handle is sufficient. You may associate multiple handles with the same completion key; the best way to figure out which request is associated with the I/O completion is by using the OVERLAPPED structure. In fact, you may even extend OVERLAPPED to store additional data.
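A sketch of the extended-OVERLAPPED pattern (IO_REQUEST is a hypothetical name); CONTAINING_RECORD recovers the enclosing request from the OVERLAPPED pointer that GetQueuedCompletionStatus hands back:

    #include <winsock2.h>

    typedef struct {
        OVERLAPPED ov;       /* conventionally the first member */
        SOCKET     sock;     /* hypothetical per-request context */
        char       buf[4096];
    } IO_REQUEST;

    /* In the completion loop:
     *   DWORD bytes; ULONG_PTR key; OVERLAPPED *pov;
     *   GetQueuedCompletionStatus(port, &bytes, &key, &pov, INFINITE);
     *   IO_REQUEST *req = CONTAINING_RECORD(pov, IO_REQUEST, ov);
     */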
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
That is the default behavior, even when ReadFile/WriteFile returns TRUE. You must explicitly call SetFileCompletionNotificationModes to tell Windows to not enqueue a completion packet when TRUE and ERROR_SUCCESS are returned.
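A sketch of that call, for a handle h already associated with a port (the API exists since Windows Vista):

    #include <windows.h>

    /* After this succeeds, a TRUE return from ReadFile/WriteFile means no
     * completion packet will be queued; handle the result inline instead. */
    static BOOL skip_success_packets(HANDLE h)
    {
        return SetFileCompletionNotificationModes(
                   h, FILE_SKIP_COMPLETION_PORT_ON_SUCCESS);
    }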
is it possible for example to recv asynchronously but to send synchronously?
Not by using recv and send; you need to use functions that accept OVERLAPPED structures, such as WSARecv and WSASend, or alternatively ReadFile and WriteFile. It might be more handy to use the latter if your code is meant to work with multiple types of I/O handles, such as both sockets and named pipes. Those functions provide a synchronous mode, so if you use them you can mix asynchronous and synchronous calls.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed?
There is no implicit cancellation. As long as you're using separate OVERLAPPED structures for each read/write to a full-duplex device, I see no reason why you can't do concurrent I/O operations.
As I’ve already pointed out there, the commonly held belief that it is impossible to remove handles from completion ports is wrong, probably caused by the absence of any hint whatsoever on how to do this in nearly all documentation I could find. Actually, it’s pretty easy:
Call NtSetInformationFile with the FileReplaceCompletionInformation enumerator value for FileInformationClass and a pointer to a FILE_COMPLETION_INFORMATION structure for the FileInformation parameter. In this structure, set the Port member to NULL (or nullptr, in C++) to disassociate the file from the port it’s currently attached to (I guess if it isn’t attached to any port, nothing would happen),
or set Port to a valid HANDLE to another completion port to associate the file with that one instead.
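A sketch of that call. FILE_COMPLETION_INFORMATION and the FileReplaceCompletionInformation value (61) normally come from the driver-kit headers, so they are declared by hand here, and NtSetInformationFile has to be resolved from ntdll.dll; treat these details as assumptions to check against the DDK documentation:

    #include <windows.h>
    #include <winternl.h>

    typedef struct _FILE_COMPLETION_INFORMATION {
        HANDLE Port;   /* NULL detaches; another port handle re-attaches */
        PVOID  Key;
    } FILE_COMPLETION_INFORMATION;

    #define FileReplaceCompletionInformation ((FILE_INFORMATION_CLASS)61)

    typedef NTSTATUS (NTAPI *NtSetInformationFile_t)(
        HANDLE, PIO_STATUS_BLOCK, PVOID, ULONG, FILE_INFORMATION_CLASS);

    static NTSTATUS detach_from_port(HANDLE file)
    {
        NtSetInformationFile_t pNtSetInformationFile =
            (NtSetInformationFile_t)GetProcAddress(
                GetModuleHandleW(L"ntdll.dll"), "NtSetInformationFile");
        IO_STATUS_BLOCK iosb;
        FILE_COMPLETION_INFORMATION fci = { NULL, NULL };
        return pNtSetInformationFile(file, &iosb, &fci, sizeof fci,
                                     FileReplaceCompletionInformation);
    }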
First some important corrections.
In case the overlapped I/O operation completes immediately (ReadFile or a similar I/O function returns success), the I/O completion is still scheduled to the IOCP.
Also, judging by your questions, I think you are confusing the file/socket handles with the specific I/O operations issued on them.
Now, regarding your questions:
AFAIK there is no conventional way to remove a file/socket handle from the IOCP (usually you just don't have to do this). You talk about removing closed handles from the IOCP, which is absolutely incorrect. You can't remove a closed handle, because it does not reference a valid kernel object anymore!
A more correct question would be how the file/socket should be properly closed. The answer is: just close your handle. All the outstanding I/O operations (issued on this handle) will soon return with an error code (abortion). Then your completion routine (the one that calls GetQueuedCompletionStatus in a loop) should perform the needed per-I/O cleanup.
As I've already said, all I/O completions arrive at the IOCP in both the synchronous and asynchronous cases. The only situation where a completion does not arrive at the IOCP is when an I/O completes synchronously with an error. Anyway, if you want unified processing, in such a case you may post artificial completion data to the IOCP (use PostQueuedCompletionStatus).
You should use WSASend and WSARecv (not recv and send) for overlapped I/O. Nevertheless, even if the socket was opened with the flag WSA_FLAG_OVERLAPPED, you are allowed to call the I/O functions without specifying the OVERLAPPED structure; in such a case those functions work synchronously.
So you may decide between synchronous and asynchronous mode for every function call.
There is no problem mixing overlapped read/write requests. The only delicate point here is what happens if you try to read data from the file position you're currently writing to. The result may depend on subtle things, such as the order in which the hardware completes the I/Os, some PC timing parameters, etc. Such a situation should be avoided.
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
You've got it the wrong way around. You set the I/O completion port to be used by a file object - when the file object is deleted, you have nothing to worry about. The reason you're getting confused is because of the way Win32 exposes the underlying native API functionality (CreateIoCompletionPort does two very different things in one function).
Also, is there a way to make the calls ALWAYS go over the I/O completion port and not return synchronously?
This is how it's always been. Only starting with Windows Vista can you customize how the completion notifications are handled.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed? Will the ReadFile be cancelled with an error, so that I have to restart the read process as soon as the write is complete?
I/O operations in Windows are asynchronous inherently, and requests are always queued. You may not think this is so because you have to specify FILE_FLAG_OVERLAPPED in CreateFile to turn on asynchronous I/O. However, at the native layer, synchronous I/O is really an add-on, convenience thing where the kernel keeps track of the file position for you and waits for the I/O to complete before returning.

Broken Pipe error

I am using write() on an opened data socket in an FTP implementation to send the file out. But after writing some data it hangs for some time, and after that it returns with a Broken pipe error. Any help with this will be greatly appreciated. My process reads packets from one buffer and writes them into the socket. I noticed this problem with increased bandwidth: if I increase the number of packets to be processed, the problem appears. I am using FreeBSD.
I am using two threads: one reads packets and writes them into a buffer; the second thread reads these packets from the buffer and writes them into the socket.
SIGPIPE is sent to your process by the kernel when an attempt to write data to a broken pipe is detected. This might happen, for example, if the receiving side has closed the socket while you were writing, or if the socket was accidentally closed from another thread, etc. There are a lot of possible reasons for that. Most applications tend to ignore this signal and handle errors based on write's return code, because there is nothing reasonable you can do in a SIGPIPE signal handler. Basically, set the SIGPIPE handler to SIG_IGN in order to ignore it, and look at the list of possible return codes from the write system call and handle them accordingly.
EPIPE may be set as an error code, and/or SIGPIPE raised (depending on flags), when you attempt to write to a file descriptor whose other end has been closed. It is likely that the remote endpoint of your connection has closed, and you've not checked for the close/EOF event (typically reported via the read event when polling/selecting, or by a return value of zero from read/recv).
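Putting the two answers together, a minimal sketch (setup and send_checked are hypothetical names):

    #include <errno.h>
    #include <signal.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Ignore SIGPIPE once at startup so a broken connection surfaces as an
     * EPIPE error from send()/write() instead of killing the process. */
    static void setup(void)
    {
        signal(SIGPIPE, SIG_IGN);
    }

    static ssize_t send_checked(int fd, const void *buf, size_t len)
    {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0 && (errno == EPIPE || errno == ECONNRESET)) {
            close(fd);   /* peer is gone: stop writing and clean up */
            return -1;
        }
        return n;
    }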
