I've been working on a polling TCP daemon for some time now. Recently, I've read that non-blocking sockets can sometimes throw an EWOULDBLOCK error during a send() or recv(). My understanding is that if recv() throws an EWOULDBLOCK, this (usually) means that there's nothing to receive. But what I'm unclear on is under what circumstances send() would throw an EWOULDBLOCK, and what would be proper procedure for handling such an event?
If send() throws an EWOULDBLOCK, should the daemon simply move on from that event, onto the next one? Using a polling interface like epoll, will a new event be fired when the descriptor becomes ready for writing?
what I'm unclear on is under what circumstances send() would throw an EWOULDBLOCK
When the sending-buffer (typically held by the OS, but, anyway, somewhere in the TCP/IP stack) is full and the counterpart hasn't acknowledged any of the bits sent to it from the buffer yet (so the stack must retain everything in the buffer in case a resend is necessary).
what would be proper procedure for handling such an event?
In one way or another you must wait until the counterpart does acknowledge some of the packets sent to it, thereby allowing the TCP/IP stack to free some space for more "sending". Both classical select and more modern epoll (and in other OS's, kqueue &c) provide smart ways to perform such waiting (whether you're waiting to read something, write something, or "whichever of the two happens first"). Yep, watched-descriptors becoming ready (be it for reading or for writing) is the typical reason for epoll events!
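To make the full-buffer case concrete, here is a minimal Python sketch. It uses a local socketpair() as a stand-in for a TCP connection (an assumption made so the example is self-contained), and the function names fill_send_buffer and demo are hypothetical. It fills the send buffer until send() fails with EWOULDBLOCK, then shows that select() reports the socket writable again only after the peer has drained some data.

```python
import select
import socket


def fill_send_buffer(sock, chunk=b"x" * 65536):
    """Send on a non-blocking socket until the kernel refuses more data.

    Returns the total number of bytes the kernel accepted.
    """
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # errno is EAGAIN/EWOULDBLOCK: the send buffer is full.
            return total


def demo():
    a, b = socket.socketpair()  # local stand-in for a TCP connection
    a.setblocking(False)
    b.setblocking(False)
    try:
        queued = fill_send_buffer(a)
        # Nothing has been read yet, so the sender should not be writable.
        _, writable_before, _ = select.select([], [a], [], 0)
        # The peer reads everything, which frees buffer space.
        drained = 0
        while drained < queued:
            try:
                drained += len(b.recv(65536))
            except BlockingIOError:
                break
        # Now the readiness notification for writing arrives.
        _, writable_after, _ = select.select([], [a], [], 1.0)
        return queued, bool(writable_before), bool(writable_after)
    finally:
        a.close()
        b.close()
```

With epoll the same wait would be expressed by registering the descriptor for EPOLLOUT instead of passing it in select()'s write list.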
On an edge triggered epoll event I read a socket (or multiple sockets, if required) until there is no more data (EAGAIN or EWOULDBLOCK) then loop back to epoll_wait. What happens if, while processing that read, another socket (one that is not currently being read) becomes ready to read? Would the edge triggered epoll ignore this as it wasn't blocking in an epoll_wait at the time of the trigger/signal or would it return with the socket in the events array immediately on the next call to epoll_wait?
This will indeed work. epoll acts as if all events that happened to the epoll group before you made the epoll_wait call happened the moment you make the call. epoll is designed to be used this way, so do not worry about this kind of usage. As long as you handle all of the events that had been triggered by the time epoll_wait returns, you need not worry about any that happen between calls to it; they will be caught the next time you call it.
Basically: Your usage is fine, keep going :)
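The behaviour described above can be checked with a small Linux-only Python sketch (select.epoll is not available on other platforms). The socketpairs and the function name edge_triggered_demo are assumptions for illustration. Data arrives on a second socket while the first one is being drained, outside of any epoll wait, and the next wait still reports it.

```python
import select
import socket


def edge_triggered_demo():
    a1, b1 = socket.socketpair()
    a2, b2 = socket.socketpair()
    for s in (b1, b2):
        s.setblocking(False)
    ep = select.epoll()
    try:
        ep.register(b1.fileno(), select.EPOLLIN | select.EPOLLET)
        ep.register(b2.fileno(), select.EPOLLIN | select.EPOLLET)

        a1.send(b"first")
        first = [fd for fd, _ in ep.poll(1.0)]

        # While b1 is being processed, data arrives on b2.
        # No thread is inside an epoll wait at this moment.
        a2.send(b"second")
        while True:
            try:
                if not b1.recv(4096):
                    break
            except BlockingIOError:
                break  # b1 fully drained (EAGAIN/EWOULDBLOCK)

        # The edge on b2 was recorded and is delivered by the next wait.
        second = [fd for fd, _ in ep.poll(1.0)]
        return first == [b1.fileno()], second == [b2.fileno()]
    finally:
        ep.close()
        for s in (a1, b1, a2, b2):
            s.close()
```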
Two cases are well-documented in the man pages for non-blocking sockets:
If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not be in a state of returning EAGAIN/EWOULDBLOCK the next call with >0 bytes to transfer.
If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket is ready for more data (EPOLLOUT in the epoll case).
What's not documented for nonblocking sockets is:
If send() returns a positive value smaller than the buffer size.
Is it safe to assume that the send() would return EAGAIN/EWOULDBLOCK on even one more byte of data? Or should a non-blocking program try to send() one more time to get a conclusive EAGAIN/EWOULDBLOCK? I'm worried about putting an EPOLLOUT watcher on the socket if it's not actually in a "would block" state to respond to it coming out of.
Obviously, the latter strategy (trying again to get something conclusive) has well-defined behavior, but it's more verbose and puts a hit on performance.
A call to send has three possible outcomes:
There is at least one byte available in the send buffer →send succeeds and returns the number of bytes accepted (possibly fewer than you asked for).
The send buffer is completely full at the time you call send.
→if the socket is blocking, send blocks
→if the socket is non-blocking, send fails with EWOULDBLOCK/EAGAIN
An error occurred (e.g. user pulled network cable, connection reset by peer) →send fails with another error
If the number of bytes accepted by send is smaller than the amount you asked for, then this consequently means that the send buffer is now completely full. However, this is purely circumstantial and non-authoritative with respect to any future calls to send.
The information returned by send is merely a "snapshot" of the current state at the time you called send. By the time send has returned or by the time you call send again, this information may already be outdated. The network card might put a datagram on the wire while your program is inside send, or a nanosecond later, or at any other time -- there is no way of knowing. You'll know when the next call succeeds (or when it doesn't).
In other words, this does not imply that the next call to send will return EWOULDBLOCK/EAGAIN (or would block if the socket wasn't non-blocking). Trying until what you called "getting a conclusive EWOULDBLOCK" is the correct thing to do.
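As a rough illustration of the three outcomes and of "trying until a conclusive EWOULDBLOCK", here is a short Python sketch. The helper names try_send and send_until_would_block are hypothetical; in Python, EAGAIN/EWOULDBLOCK surfaces as BlockingIOError, and any other failure is raised as a different OSError.

```python
import socket


def try_send(sock, data):
    """Attempt one non-blocking send and return the bytes still unsent.

    Real errors (e.g. connection reset) propagate as OSError.
    """
    try:
        sent = sock.send(data)
    except BlockingIOError:
        # Buffer full: nothing was accepted (EAGAIN/EWOULDBLOCK).
        return data
    # `sent` bytes were accepted, possibly fewer than len(data).
    return data[sent:]


def send_until_would_block(sock, data):
    """Keep sending until everything is accepted or the kernel refuses."""
    while data:
        remaining = try_send(sock, data)
        if len(remaining) == len(data):
            break  # conclusive EWOULDBLOCK: wait for writability now
        data = remaining
    return data  # whatever is left must be queued by the caller
```

A short count from one call does not stop the loop; only a call that accepts nothing does, which matches the point that a partial send says nothing reliable about the next call.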
If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not be in a blocking state.
No. The socket remains in the mode it was in: in this case, non-blocking mode, assumed below throughout.
If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket isn't blocking anymore.
Until the send buffer isn't full any more. The socket remains in non-blocking mode.
If send() returns a positive value smaller than the buffer size.
There was only that much room in the socket send buffer.
Is it safe to assume that the send() would block on even one more byte of data?
It isn't 'safe' to 'assume [it] would block' at all. It won't. It's in non-blocking mode. EWOULDBLOCK means it would have blocked in blocking mode.
Or should a non-blocking program try to send() one more time to get a conclusive EAGAIN/EWOULDBLOCK?
That's up to you. The API works whichever you decide.
I'm worried about putting an EPOLLOUT watcher on the socket if it's not actually blocking on that.
It isn't 'blocking on that'. It isn't blocking on anything. It's in non-blocking mode. The send buffer got filled at that instant. It might be completely empty a moment later.
I don't see what you're worried about. If you have pending data and the last write didn't send it all, select for writability, and write when you get it. If such a write sends everything, don't select for writability next time.
Sockets are usually writable, unless their send buffer is full, so don't select for writability all the time, as you just get a spin loop.
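One way to follow this advice is sketched below in Python with the standard selectors module. The Connection class and its method names are hypothetical. Read interest is always registered, and write interest is added only while unsent data is pending, so the loop does not spin on a socket that is writable most of the time.

```python
import selectors
import socket


class Connection:
    """Hypothetical wrapper: one non-blocking socket plus an outbound queue."""

    def __init__(self, sel, sock):
        self.sel, self.sock, self.pending = sel, sock, b""
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, self)

    def write(self, data):
        """Queue data and try to send it immediately."""
        self.pending += data
        self.flush()

    def flush(self):
        """Send what the kernel accepts, then update write interest."""
        if self.pending:
            try:
                sent = self.sock.send(self.pending)
                self.pending = self.pending[sent:]
            except BlockingIOError:
                pass  # buffer full: keep everything queued
        events = selectors.EVENT_READ
        if self.pending:
            events |= selectors.EVENT_WRITE  # watch writability only now
        self.sel.modify(self.sock, events, self)

    def wants_write(self):
        key = self.sel.get_key(self.sock)
        return bool(key.events & selectors.EVENT_WRITE)
```

In an event loop, a write event for the socket would simply call flush() again; once the queue is empty the write interest drops away by itself.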
I've seen a few write-ups comparing select() with poll() or epoll(), and I've seen many guides discussing the actual usage of select() with multiple sockets.
However, what I can't seem to find is a comparison to a non-blocking recv() call without select(). In the event of only having 1 socket to read from and 1 socket to write to, is there any justification for using the select() call? The recv() method can be set up to not block and return an error (WSAEWOULDBLOCK) when there is no data available, so why bother to call select() when you have no other sockets to examine? Is the non-blocking recv() call much slower?
You wouldn't want a non-blocking call to recv without some other means of waiting for data on the socket, because you would poll endlessly, eating up CPU time.
If you have no other sockets to examine and nothing else to do in the same thread, a blocking call to read is likely to be the most efficient solution. Although in such a situation, considering the efficiency of this is likely to be premature optimisation.
These kinds of considerations only tend to come into play as the socket count increases.
Nonblocking calls are only faster in the context of handling multiple sockets on a single thread.
If there is no data available, and you use non-blocking IO, recv() will return immediately.
Then what should the program do? You would need to call recv() in a loop until data becomes available, which just uses CPU for pretty much no reason.
Spinning on recv() and burning CPU in that manner is very undesirable; you'd rather want the process to wait until data becomes available and get woken up; that's what select()/poll() and similar does.
And, sleep() in the loop in order to not burn CPU is not a good solution either. You'd introduce high latency in the processing as the program will not be able to process data as soon as the data is available.
select() and friends let you design the workflow in such a way that slowness of one socket does not impede the speed at which you can serve another. Imagine that data arrives fast from the receiving socket and you want to accept it as fast as possible and store in memory buffers. But the sending socket is slow. When you've filled up the sending buffers of the OS and send() gave you EWOULDBLOCK, you can issue select() to wait on both receiving and sending sockets. select() will fall through if either new data on the receiving socket arrived, or some buffers are freed and you can write more data to the sending socket, whichever happens first.
Of course a more realistic use case for select() is when you have multiple sockets to read from and/or to write to, or when you must pass the data between your two sockets in both directions.
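The fast-receiver, slow-sender scenario can be sketched as a single select() round in Python. The function name relay_once is hypothetical, both sockets are assumed to be non-blocking, and end-of-stream handling is left out to keep the sketch short.

```python
import select
import socket


def relay_once(src, dst, buffered, timeout=1.0):
    """One select() round: read from src if ready, write to dst if ready.

    Returns the bytes received so far but not yet forwarded.
    """
    # Ask about dst's writability only when there is something to send.
    want_write = [dst] if buffered else []
    readable, writable, _ = select.select([src], want_write, [], timeout)
    if src in readable:
        try:
            buffered += src.recv(65536)  # EOF handling omitted
        except BlockingIOError:
            pass
    if dst in writable:
        try:
            sent = dst.send(buffered)
            buffered = buffered[sent:]
        except BlockingIOError:
            pass
    return buffered
```

Calling relay_once in a loop lets incoming data accumulate in memory while the slow side catches up, and select() returns on whichever side becomes ready first.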
In fact, select() tells you when the next read or write operation on a socket is expected to succeed, so if you only try to read and write when select() allows you, your program will almost work even if you didn't make the sockets non-blocking! It is still unwise to do, because there are edge cases where the next operation may still block even though select() reported the socket as "ready".
On the other hand, making the sockets non-blocking and not using select() is almost never advisable because of the reason explained by @Troy.
My application has ONLY 1 Unix TCP socket that it uses to recv() and send(). The socket is non-blocking. Given this, is there an advantage in doing a select() before a send()/recv()?
If the underlying TCP pipe is not ready for an I/O, the send()/recv() should immediately return with an EWOULDBLOCK or EAGAIN. So, what's the point of doing a select()? Seems like, it might only cause an additional system call overhead in this case. Am I missing anything?
EDIT: Forgot to mention: The application is single-threaded.
If your socket is non-blocking, then you need select (or preferably poll, which does not have the broken FD_SETSIZE limit and associated dangers) to block for you in place of the blocking that would be taking place (if the socket were not non-blocking) in send and recv. Otherwise you will spin, using 100% CPU time to do nothing. In most cases, you could just as easily make the socket blocking and do away with select/poll. However, there is one interesting case to consider: blocking IO could deadlock if your program is blocked in send and the program at the other end of the socket is also blocked in send (or the opposite). With non-blocking IO and select/poll, you naturally detect this situation and process the pending input when writing your output is not possible.
You could just do recv() in a loop, but then you'll be consuming a lot of CPU time.
Better to wait on select() to avoid the extra CPU overhead. If you need to be doing background tasks, add a timeout to select() so you can wake periodically, even with no network traffic.
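A minimal Python sketch of waiting in select() with a timeout follows; the function name wait_for_data is hypothetical. The process sleeps in the kernel until data arrives or the timeout expires, and a None result is the caller's cue to run its periodic background work.

```python
import select
import socket


def wait_for_data(sock, timeout):
    """Sleep in select() until sock is readable or timeout seconds pass.

    Returns the received bytes, or None if the timeout expired first.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out: time for background tasks
    return sock.recv(4096)
```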
If your application is latency sensitive, then it may be justified to spin in a tight recv() loop without select() and give it a dedicated CPU (otherwise the scheduler will punish it and you end up with massive latency). If your app cannot afford that but still does give a thread to serve this socket, then just make the socket blocking on the read side and let the scheduler wake your thread up when data is available. The sending side again depends on what you need: either make the socket blocking or spin.
Only if your application is single-threaded and the logic is "receive-process-reply" do you absolutely need a non-blocking read/write socket, a selector, and a write queue, so that you receive when data is there, process it, put the response in the queue, register for writability, flush the queue to the socket when writable, and unregister from writability. Readability should stay registered all the time.
From the manual of epoll_ctl:
EPOLLRDHUP (since Linux 2.6.17)
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
From the manual of recv:
If no messages are available to be received and the peer has performed an orderly shutdown, recv() shall return 0.
It seems to me then that both of the above cover the same scenarios, and that as long as I catch EPOLLRDHUP events first, I should never receive a read() or recv() of length 0 (and thus don't need to bother checking for such). But is this guaranteed to be true?
If you get an event with EPOLLRDHUP=1 then just close the connection right away without reading. If you get an event with EPOLLRDHUP=0 and EPOLLIN=1 then go ahead and read, but you should be prepared to handle the possibility of recv() still returning 0, just in case. Perhaps a FIN arrives after you got EPOLLIN=1 but before you actually call recv().
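The decision logic above might look like the following Python sketch. It is Linux-only because it uses the select.EPOLLRDHUP and select.EPOLLIN constants, and the function name handle_event and its return convention are assumptions. It takes the event mask reported by epoll for a non-blocking socket and still treats an empty recv() result as a closed connection.

```python
import select
import socket


def handle_event(sock, events):
    """Return received bytes, or None if the connection should be closed."""
    if events & select.EPOLLRDHUP:
        # Peer closed or shut down its writing half: close without reading.
        return None
    if events & select.EPOLLIN:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            return b""  # nothing to read after all
        if data == b"":
            # recv() returned 0: a FIN arrived after the event was reported.
            return None
        return data
    return b""
```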