On an edge-triggered epoll event I read a socket (or multiple sockets, if required) until there is no more data (EAGAIN or EWOULDBLOCK), then loop back to epoll_wait. What happens if, while processing that read, another socket (one that is not currently being read) becomes ready to read? Would edge-triggered epoll ignore this, since I wasn't blocking in epoll_wait at the time of the trigger, or would it return with that socket in the events array on the next call to epoll_wait?
This will indeed work: epoll acts as if every event that happened on the epoll set before you made the epoll_wait call happened the moment you made the call. epoll is designed to be used this way, so do not worry about this kind of usage. As long as you handle all of the events that had been triggered by the time epoll_wait returns, you need not worry about any that happen between calls; they will be caught the next time you call it.
Basically: Your usage is fine, keep going :)
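For reference, here is a minimal sketch of the edge-triggered drain loop described above (Linux C; error handling trimmed, the fd/buffer names are illustrative):

```c
#include <sys/epoll.h>
#include <errno.h>
#include <unistd.h>

#define MAX_EVENTS 64

void event_loop(int epfd)
{
    struct epoll_event events[MAX_EVENTS];
    char buf[4096];

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            /* Edge-triggered: drain this socket completely before
               returning to epoll_wait(). Whatever becomes ready on
               other sockets meanwhile is simply reported by the next
               epoll_wait() call. */
            for (;;) {
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r > 0) {
                    /* process buf[0..r) */
                } else if (r == 0) {
                    close(fd);                /* peer closed */
                    break;
                } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                    break;                    /* fully drained */
                } else {
                    close(fd);                /* real error */
                    break;
                }
            }
        }
    }
}
```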
I have a server application that uses IOCP. I want to know what is the proper way to close a SOCKET.
If I simply call closesocket() (for a SOCKET with a handle of for example 12345), and this SOCKET has pending IO operations (for example: a pending WSARecv() request), then the following scenario could happen:
I call closesocket() which will destroy the SOCKET.
I accept another SOCKET with the same handle of 12345.
I dequeue the pending WSARecv() completion packet for the SOCKET with the handle of 12345. Now I would assume that this completion packet is for the current SOCKET with the handle of 12345, but in fact it is for the SOCKET that was previously closed (this is the main problem with this approach).
So this is obviously a bad approach.
The second approach that seems correct is the following:
I associate a struct instance with each SOCKET. The struct has the following members: an int called number_of_pending_IO_operations, and a boolean called Is_SOCKET_being_closed.
When I issue an IO operation for the SOCKET (for example: a WSASend() request), I increment number_of_pending_IO_operations by 1, and when I dequeue a completion packet for the SOCKET, I decrement number_of_pending_IO_operations by 1.
Now, when I want to close the SOCKET, I don't simply call closesocket(), but rather I call CancelIoEx() to cancel all pending IO operations for the SOCKET, and I also set Is_SOCKET_being_closed to true.
When I am about to issue another IO operation (for example: a WSASend() request), I would check the value of Is_SOCKET_being_closed, and if it is true, I would not issue the IO operation.
Now I simply wait for all of the completion packets to be dequeued, and when number_of_pending_IO_operations reaches 0 and Is_SOCKET_being_closed is set to true, I call closesocket().
Of course I would have race conditions, so I would use critical sections.
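A rough sketch of the bookkeeping described above might look like this (Windows C; names are illustrative and the critical section is assumed to be initialized elsewhere with InitializeCriticalSection()):

```c
#include <winsock2.h>
#include <windows.h>

/* Hypothetical per-socket state mirroring the description above. */
typedef struct PER_SOCKET_STATE {
    SOCKET           s;
    LONG             number_of_pending_IO_operations;
    BOOL             is_socket_being_closed;
    CRITICAL_SECTION lock;   /* guards the two fields above */
} PER_SOCKET_STATE;

/* Call before issuing WSASend()/WSARecv(); returns FALSE if the
   socket is being closed and no new I/O should be started. */
BOOL try_begin_io(PER_SOCKET_STATE *st)
{
    BOOL ok;
    EnterCriticalSection(&st->lock);
    ok = !st->is_socket_being_closed;
    if (ok)
        st->number_of_pending_IO_operations++;
    LeaveCriticalSection(&st->lock);
    return ok;
}

/* Call after dequeuing a completion packet for this socket. */
void end_io(PER_SOCKET_STATE *st)
{
    BOOL last;
    EnterCriticalSection(&st->lock);
    st->number_of_pending_IO_operations--;
    last = (st->number_of_pending_IO_operations == 0 &&
            st->is_socket_being_closed);
    LeaveCriticalSection(&st->lock);
    if (last)
        closesocket(st->s);   /* no completion can reference this handle anymore */
}
```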
Is the second approach a correct way to close a SOCKET, or is there a better way?
I had a similar issue in my client-server application. I think I have no races, no rundown protection, and no critical sections. There are some tradeoffs though.
Only one WSARecv() at a time. Multiple buffers, maybe, but only one WSARecv(). When WSARecv() completes (possibly inline), the packet pops from the completion port, I quickly check what I got, and I issue another WSARecv().
I have a flag (actually, a counter that can only go up) for Is_SOCKET_being_closed. It is a counter because I may simultaneously decide to kill the socket in two different places.
After setting Is_SOCKET_being_closed, I call shutdown() followed by CancelIoEx(), which breaks the previously issued WSARecv().
There is a race between CancelIoEx() and the completion of WSARecv(); I eliminate it by double-checking Is_SOCKET_being_closed immediately before issuing WSARecv(). Even if the race occurs, WSARecv() is doomed to fail because of the shutdown().
The structure is refcounted, where one ref is held by the receive state machine and one ref is held for every legal owner of the pointer (as in, a potential sender). If you have a legal ref, you can make a copy that you can hand off to someone else (it differs here from rundown references, whose AddRef() may fail). I will only call closesocket() when that refcount drains to zero, because I cannot guarantee that there is no thread that has passed all the checks and is about to issue a WSARecv()/WSASend().
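A condensed sketch of that shutdown sequence, under the same assumptions (one outstanding WSARecv(), a kill counter that only goes up, and a reference count; names are illustrative):

```c
#include <winsock2.h>
#include <windows.h>

typedef struct CONN {
    SOCKET        s;
    volatile LONG closing;   /* the "counter that can only go up"    */
    volatile LONG refs;      /* 1 for the receive state machine plus */
                             /* 1 per legal owner of the pointer     */
} CONN;

/* Request shutdown; safe to call from several places at once. */
void conn_kill(CONN *c)
{
    if (InterlockedIncrement(&c->closing) == 1) {
        shutdown(c->s, SD_BOTH);          /* dooms pending and future I/O */
        CancelIoEx((HANDLE)c->s, NULL);   /* breaks the outstanding WSARecv() */
    }
}

/* Drop one reference; the last owner out closes the socket. */
void conn_release(CONN *c)
{
    if (InterlockedDecrement(&c->refs) == 0)
        closesocket(c->s);   /* nobody can still be about to issue I/O */
}

/* Anyone about to issue WSARecv()/WSASend() first checks c->closing
   (while holding a legal reference); if it is non-zero, skip the I/O. */
```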
The following strategies seem to work well:
Using a single thread/process with a nonblocking accept() call on the listener socket, regardless of how the program handles the accepted request.
Using multiple threads/processes with a blocking accept() call in each process. When a connection comes in, this wakes up exactly one accept().
What doesn't work well is having each thread/process watch the listener socket with EPOLLIN and call accept() in a callback. This wakes every thread/process up, though only one can succeed in actually accept()ing. It is just like the bad old days of a blocking accept() causing a thundering-herd stampede when a connection came in.
Is there a way to only have a single thread/process wake up to accept() while still using EPOLLIN? Or should I rewrite to use blocking accept()s, just isolated using threads?
It's not an option to have only a single thread/process run accept() because I'm trying to manage the processes as a pool in a way where each process doesn't need to know whether it's the only daemon accept()ing on the listener socket.
You need to use EPOLLET or EPOLLONESHOT so that exactly one thread gets woken by the EPOLLIN event when a new connection comes in. The handling thread then needs to call accept() in a loop until it returns EAGAIN (for EPOLLET), or re-arm the descriptor with epoll_ctl() (for EPOLLONESHOT), in order for more connections to be handled.
In general when using multiple threads and epoll, you want to use EPOLLET or EPOLLONESHOT. Otherwise when an event happens, multiple threads will be woken to handle it and they may interfere with each other. At best, they'll just waste time figuring out that some other thread is handling the event before waiting again. At worst they'll deadlock or corrupt stuff.
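A sketch of what the handling thread does under EPOLLET (Linux C; error handling trimmed, the registration shown in the comment is assumed to have happened once at startup):

```c
#define _GNU_SOURCE          /* for accept4() */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <errno.h>

/* Listener registered once, shared by all workers:
     ev.events = EPOLLIN | EPOLLET;
     epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);                 */

void handle_listener_ready(int epfd, int listen_fd)
{
    /* Only the thread woken for this edge runs this; it must keep
       accepting until EAGAIN, or connections could sit unserved
       until the next edge. */
    for (;;) {
        int conn = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
        if (conn < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                  /* backlog drained */
            if (errno == EINTR)
                continue;
            break;                      /* real error */
        }
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET,
                                  .data.fd = conn };
        epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
    }
}
```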
How about multiple sockets listening on the same proto+address+port? This can be accomplished with the Linux SO_REUSEPORT option: https://lwn.net/Articles/542629/. I have not tried it, but I think it should work even with epoll, since only one socket gets the actual event.
Caveat emptor: this is a non-portable, Linux-only solution. SO_REUSEPORT also suffers from some bugs/features that are detailed in the linked article.
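If you do go that route, the setup is just an extra setsockopt() on each worker's own listening socket before bind() (Linux 3.9+; sketch with error checks omitted):

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Each worker process/thread creates its own listener on the same port;
   the kernel then load-balances incoming connections across them. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr = { 0 };
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, SOMAXCONN);
    return fd;
}
```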
I've seen a few write-ups comparing select() with poll() or epoll(), and I've seen many guides discussing the actual usage of select() with multiple sockets.
However, what I can't seem to find is a comparison with a non-blocking recv() call used without select(). In the case of having only one socket to read from and one socket to write to, is there any justification for using select()? recv() can be set up not to block and to return an error (WSAEWOULDBLOCK) when there is no data available, so why bother calling select() when you have no other sockets to examine? Is the non-blocking recv() call much slower?
You wouldn't want a non-blocking call to recv() without some other means of waiting for data on the socket, because you would poll in a tight loop and eat up CPU time.
If you have no other sockets to examine and nothing else to do in the same thread, a blocking call to read is likely to be the most efficient solution, although in such a situation worrying about its efficiency is likely to be premature optimisation.
These kinds of considerations only tend to come into play as the socket count increases.
Nonblocking calls are only faster in the context of handling multiple sockets on a single thread.
If there is no data available, and you use non-blocking IO, recv() will return immediately.
Then what should the program do? You would need to call recv() in a loop until data became available; this just uses CPU for pretty much no reason.
Spinning on recv() and burning CPU in that manner is very undesirable; you'd rather have the process wait until data becomes available and get woken up, which is what select()/poll() and similar calls do.
And putting a sleep() in the loop in order not to burn CPU is not a good solution either: you'd introduce high latency, as the program would not be able to process data as soon as it becomes available.
select() and friends let you design the workflow in such a way that slowness of one socket does not impede the speed at which you can serve another. Imagine that data arrives fast on the receiving socket and you want to accept it as fast as possible and store it in memory buffers, but the sending socket is slow. When you've filled up the OS's send buffers and send() gave you EWOULDBLOCK, you can issue select() to wait on both the receiving and the sending socket. select() will return when either new data has arrived on the receiving socket or some buffer space has been freed and you can write more data to the sending socket, whichever happens first.
Of course a more realistic use case for select() is when you have multiple sockets to read from and/or to write to, or when you must pass the data between your two sockets in both directions.
In fact, select() tells you when the next read or write operation on a socket is known to succeed, so if you only try to read and write when select() allows you to, your program will almost work even if you didn't make the sockets non-blocking! It is still unwise to rely on this, because there are edge cases in which the next operation may still block even though select() reported the socket as "ready".
On the other hand, making the sockets non-blocking and not using select() is almost never advisable, for the reason explained by @Troy.
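A sketch of that two-socket pattern with select(), assuming rx_fd is the fast receiving socket and tx_fd is the slow sending one (names are illustrative):

```c
#include <sys/select.h>

/* Block until either new data arrives on rx_fd or tx_fd's send buffer
   has room again, whichever happens first. want_to_send should be set
   only after send() on tx_fd returned EWOULDBLOCK. */
void wait_for_progress(int rx_fd, int tx_fd, int want_to_send)
{
    fd_set rfds, wfds;
    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_SET(rx_fd, &rfds);
    if (want_to_send)
        FD_SET(tx_fd, &wfds);

    int maxfd = (rx_fd > tx_fd ? rx_fd : tx_fd) + 1;
    select(maxfd, &rfds, &wfds, NULL, NULL);

    /* FD_ISSET(rx_fd, &rfds) -> recv() more data into your buffers;
       FD_ISSET(tx_fd, &wfds) -> send() more from the buffered backlog. */
}
```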
My application has ONLY 1 Unix TCP socket that it uses to recv() and send(). The socket is non-blocking. Given this, is there an advantage in doing a select() before a send()/recv()?
If the underlying TCP pipe is not ready for an I/O, the send()/recv() should immediately return with an EWOULDBLOCK or EAGAIN. So, what's the point of doing a select()? Seems like, it might only cause an additional system call overhead in this case. Am I missing anything?
EDIT: Forgot to mention: The application is single-threaded.
If your socket is non-blocking, then you need select() (or preferably poll(), which does not have the broken FD_SETSIZE limit and its associated dangers) to block for you in place of the blocking that would otherwise take place in send() and recv(). Otherwise you will spin, using 100% CPU time to do nothing. In most cases you could just as easily make the socket blocking and do away with select()/poll(). However, there is one interesting case to consider: blocking I/O can deadlock if your program is blocked in send() while the program at the other end of the socket is also blocked in send() (or the opposite). With non-blocking I/O and select()/poll(), you naturally detect this situation and process the pending input when writing your output is not possible.
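A poll()-based sketch of that single-socket loop (one non-blocking socket in a single thread; the actual buffering is left as comments):

```c
#include <poll.h>

void io_loop(int fd)
{
    int have_output = 0;   /* set when queued data is waiting to be sent */

    for (;;) {
        struct pollfd p = { .fd = fd, .events = POLLIN };
        if (have_output)
            p.events |= POLLOUT;       /* only while output is pending */

        poll(&p, 1, -1);               /* block here instead of spinning */

        if (p.revents & POLLIN) {
            /* recv() until EWOULDBLOCK/EAGAIN; process input, possibly
               producing output -> set have_output = 1 */
        }
        if (p.revents & POLLOUT) {
            /* send() queued output until it's gone (have_output = 0)
               or send() returns EWOULDBLOCK */
        }
        if (p.revents & (POLLERR | POLLHUP))
            break;
    }
}
```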
You could just do recv() in a loop, but then you'll be consuming a lot of CPU time.
Better to wait on select() to avoid the extra CPU overhead. If you need to do background tasks, add a timeout to select() so you can wake up periodically even when there is no network traffic.
If your application is latency sensitive, it may be justified to spin in a tight recv() loop without select() and give it a dedicated CPU (otherwise the scheduler will punish it and you will end up with massive latency). If your app cannot afford that but still dedicates a thread to serving this socket, then just make the socket blocking on the read side and let the scheduler wake your thread up when data is available. On the sending side it again depends on what you need: either make the socket blocking or spin.
Only if your application is single-threaded and the logic is "receive-process-reply" do you absolutely need a non-blocking read/write socket, a selector, and a write queue: receive when data is there, process it, put the response on the queue, register for writability, flush the queue to the socket when it becomes writable, then unregister from writability. Readability should stay registered the whole time.
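A sketch of that register/unregister-for-writability step, shown here with epoll (the same idea works with select() or poll(); the write queue itself is assumed to live elsewhere):

```c
#include <sys/epoll.h>

/* Keep EPOLLIN registered all the time; toggle EPOLLOUT depending on
   whether the write queue still holds unsent data. */
void update_interest(int epfd, int fd, int queue_is_empty)
{
    struct epoll_event ev;
    ev.data.fd = fd;
    ev.events  = EPOLLIN | (queue_is_empty ? 0 : EPOLLOUT);
    epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Readable:  recv(), process, append the response to the queue,
              then update_interest(epfd, fd, 0).
   Writable:  send() from the queue until it is empty or EWOULDBLOCK,
              then update_interest(epfd, fd, queue_is_empty).        */
```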
I've been working on a polling TCP daemon for some time now. Recently, I've read that non-blocking sockets can sometimes throw an EWOULDBLOCK error during a send() or recv(). My understanding is that if recv() throws an EWOULDBLOCK, this (usually) means that there's nothing to receive. But what I'm unclear on is under what circumstances send() would throw an EWOULDBLOCK, and what would be proper procedure for handling such an event?
If send() throws an EWOULDBLOCK, should the daemon simply move on from that event, onto the next one? Using a polling interface like epoll, will a new event be fired when the descriptor becomes ready for writing?
"what I'm unclear on is under what circumstances send() would throw an EWOULDBLOCK"
When the send buffer (typically held by the OS, but in any case somewhere in the TCP/IP stack) is full and the counterpart hasn't yet acknowledged any of the bytes sent to it from that buffer (so the stack must retain everything in the buffer in case a resend is necessary).
"what would be proper procedure for handling such an event?"
In one way or another you must wait until the counterpart acknowledges some of the packets sent to it, thereby allowing the TCP/IP stack to free some space for more "sending". Both classical select() and more modern epoll (and, in other OSes, kqueue and the like) provide smart ways to perform such waiting, whether you're waiting to read something, write something, or "whichever of the two happens first". Yes, watched descriptors becoming ready (be it for reading or for writing) is the typical reason for epoll events!
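Concretely, with an epoll-based loop that usually means: when send() reports EWOULDBLOCK, keep the unsent remainder, ask for EPOLLOUT, and resume sending when the writable event fires (sketch; names are illustrative):

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <errno.h>

/* Try to push len bytes; returns how many the kernel accepted.
   If the send buffer is full, enable EPOLLOUT so the event loop
   wakes us when the peer has acknowledged data and room opens up. */
ssize_t send_some(int epfd, int fd, const char *buf, size_t len)
{
    ssize_t sent = send(fd, buf, len, 0);
    if (sent < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLOUT,
                                  .data.fd = fd };
        epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);   /* wake on writable */
        return 0;       /* caller keeps buf and retries on EPOLLOUT */
    }
    return sent;        /* may be a partial send; caller keeps the rest */
}
```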