I am having trouble understanding what it means to add a descriptor to the writefds set for select() in Linux. I wrote some simple code that adds the descriptor for stdout to the writefds set and uses a timeout of NULL. Now, my code just loops forever checking whether this descriptor is set, and if it is, it prints "WRITING". When I run my code it just keeps printing "WRITING" to infinity. The same thing happens when I do this for stdin. Again, there is no other code in the loop. Are stdin/stdout always just ready for writing?
It means you can call write on that fd and the kernel promises not to block and to consume at least 1 byte.
More details. If your socket is not in non-blocking mode and the kernel buffers associated with the socket are full, the kernel will put your thread to sleep until it can empty some of the buffer and be able to consume part of your write.
If your socket is in non-blocking mode and the kernel buffers are full, the write will return immediately without consuming any bytes.
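The behaviour described above can be observed with a pipe instead of a socket. The following is a minimal sketch, not the asker's code: `is_writable` and `fill_until_eagain` are hypothetical helper names, and the descriptor passed to `fill_until_eagain` is assumed to already be in non-blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if select() reports fd as writable,
 * 0 if it does not, -1 on error. The zero timeout turns select() into
 * a pure readiness check instead of a wait. */
int is_writable(int fd)
{
    fd_set writefds;
    struct timeval tv = { 0, 0 };

    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    if (select(fd + 1, NULL, &writefds, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &writefds) ? 1 : 0;
}

/* Hypothetical helper: writes to a NON-BLOCKING fd until the kernel
 * buffer is full. Returns the number of bytes the kernel accepted,
 * or -1 on an unexpected error. */
long fill_until_eagain(int fd)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;   /* buffer full: write consumed nothing */
        return -1;
    }
}
```

With a fresh pipe whose write end has O_NONBLOCK set, `is_writable` should report 1 before `fill_until_eagain` is called and 0 afterwards, until a reader drains some data.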
The answer to the question "Is stdout always ready for writing" is "It depends."
stdout can be connected to anything that can be opened as a file descriptor - like a disk file, a network socket, or a pipe. The usual case is that it's connected to a terminal device.
Most of these types of file descriptors can block on writing (which means they might not be marked writeable after select() returns), but usually only if you've just written a very large amount of data to them (and so filled some kind of buffer). "Large amount" varies between the device types - if your stdout terminal is a 9600 baud serial device, then you could fill the write buffer pretty easily; an xterm, not so much.
Some devices will never block - like disk files, or /dev/null, for example. (write() to a disk file might not complete immediately, but this isn't considered "blocking" - it's a "disk wait").
Yes, a truthy return from FD_ISSET(fd, &writefds) means fd is writeable. If you call select() with that FD set in the writefds after you get EWOULDBLOCK or EAGAIN (equivalent on Linux, at least) it blocks until the FD is again writeable.
There's more to it than that. For instance, an FD is also considered writeable if you've done a non-blocking connect() on it, you got EINPROGRESS, and you call select() to wait for the connection to be established. That establishment is signalled in the writefds.
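The non-blocking connect() case can be sketched as follows. This is an illustration under assumptions, not code from the thread: `connect_loopback_nonblocking` is a hypothetical helper that connects to 127.0.0.1 on a given TCP port, waits for the socket to appear in writefds, and then reads SO_ERROR to find out whether the connection actually succeeded.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking TCP connect to 127.0.0.1:port.
 * Returns a connected fd, or -1 on failure or timeout. */
int connect_loopback_nonblocking(uint16_t port, int timeout_sec)
{
    struct sockaddr_in addr;
    struct timeval tv = { timeout_sec, 0 };
    fd_set writefds;
    socklen_t len;
    int err = 0;
    int flags;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;                  /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;

    /* Connection is in progress: completion is reported in writefds. */
    FD_ZERO(&writefds);
    FD_SET(fd, &writefds);
    if (select(fd + 1, NULL, &writefds, NULL, &tv) <= 0)
        goto fail;                  /* timeout or select() error */

    /* Writable means "finished", not "succeeded": check SO_ERROR. */
    len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

The SO_ERROR check matters because a refused connection also makes the socket writable.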
Related
I'm using C Sockets to send ICMP packets with the MSG_DONTWAIT flag set.
My program is single threaded but it is expected to send messages at high frequency, so I'm making the message send non-blocking.
Is it safe to share/modify/reuse the message buffer after each call ? (Unless EAGAIN or EWOULDBLOCK is returned).
msg_control (the ancillary data) is reused and msg_control->struct in_pktinfo->ipi_ifindex (outbound interface ifindex) is modified between calls.
The iov.iov_base buffer content (not pointer!) and iov.iov_len can also change between calls.
(Less likely but still possible).
Is it OK to change ifindex and the iov_base content between calls at high frequency in non-blocking mode? (Unless I get back EAGAIN or EWOULDBLOCK.)
Thanks !
Yes, it's safe. On Linux, all the data you specify gets immediately copied into a buffer in the kernel, before send returns. If the kernel's buffer is full, it returns EAGAIN or EWOULDBLOCK (which are the same thing in Linux, apparently) and nothing happens. You don't have to worry that the kernel will go and send the packet later after you've changed the data in the buffer.
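The copy-before-return behaviour can be checked with a small experiment. The sketch below is an assumption-laden stand-in for the asker's setup: it works on any datagram socket (for example one end of an AF_UNIX socketpair) rather than a raw ICMP socket, which would need elevated privileges, and `send_and_reuse` is a hypothetical helper name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: sends buf with MSG_DONTWAIT and then immediately
 * overwrites the caller's buffer with '#'. Returns the number of bytes
 * queued, or -1 (errno is EAGAIN/EWOULDBLOCK if the kernel buffer was
 * full, in which case nothing was queued and buf is left untouched). */
ssize_t send_and_reuse(int fd, char *buf, size_t len)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    iov.iov_base = buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = sendmsg(fd, &msg, MSG_DONTWAIT);
    if (n >= 0)
        memset(buf, '#', len);  /* the kernel already holds its own copy */
    return n;
}
```

If the answer above holds, the receiver on the other end should still see the original bytes, not the '#' characters written after sendmsg() returned.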
On Windows, non-blocking "overlapped" operations do remember your buffer and use it later - so watch out for that if you ever do non-blocking I/O on Windows. (You'll know if you do, because it's totally different from blocking I/O)
POSIX supports blocking and non-blocking file descriptors. The latter may be opened with the O_NONBLOCK flag. I have a main loop in my app, which polls a set of file descriptors (poll syscall) for POLLIN and POLLOUT events. May I still use blocking file descriptors, given that I write only when POLLOUT is set and read only when POLLIN is set?
According to the poll(2) man page:
POLLOUT Writing is now possible, though a write larger than the available space in a socket or pipe will still block (unless O_NONBLOCK is set).
In other words: if the kernel buffer associated with this fd does not have enough space, writing a chunk of data larger than the available space will block. If there is enough space, blocking and non-blocking descriptors behave identically.
So you must set all your file descriptors to be non-blocking, especially TCP sockets: if the process on the other side has a slow connection, your write call may block until the peer has acknowledged enough of the data already sent to free up space in the send buffer.
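A minimal sketch of this advice, assuming a poll()-driven loop: `set_nonblocking` and `write_when_ready` are hypothetical helper names. The second helper waits for POLLOUT and then issues a single write, which may be partial when the data is larger than the free buffer space.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: adds O_NONBLOCK to the fd's existing flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: waits up to timeout_ms for POLLOUT, then writes.
 * Returns the number of bytes accepted (possibly fewer than len),
 * 0 if the fd was not writable in time, or -1 on error. */
ssize_t write_when_ready(int fd, const void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    ssize_t n;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                       /* timed out */
    if (!(pfd.revents & POLLOUT))
        return -1;                      /* POLLERR, POLLHUP or POLLNVAL */

    n = write(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                       /* lost the race; try again later */
    return n;
}
```

The caller keeps the unwritten tail of the buffer and retries it on the next POLLOUT, which is the part a blocking descriptor would otherwise handle by putting the whole event loop to sleep.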
I'm writing a TCP proxy, using edge-triggered epoll to monitor fd, splice to transmit data. Here is the problem:
How do I know the socket receive buffer is empty?
For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure of having exhausted the read I/O space for the file descriptor.
But I found that even splice(sock, 0, pfd[1], 0, 65536, SPLICE_F_NONBLOCK) < 65536 may sometimes lead to starvation.
O_NONBLOCK enabled, n > PIPE_BUF
If the pipe is full, then write(2) fails, with errno set to EAGAIN. Otherwise, from 1 to n bytes may be written (i.e., a "partial write" may occur; the caller should check the return value from write(2) to see how many bytes were actually written), and these bytes may be interleaved with writes by other processes.
So I should repeat calling splice till EAGAIN? But how can I know whether the socket receive buffer is empty or the pipe buffer is full?
Maybe you can use the getsockopt syscall with SO_ERROR; then you will know which socket is really returning EAGAIN, and you can use epoll to watch for the read/write event on that socket.
I also have this problem when adding reverse http proxy to my web server, I deem it should work, though I'm not sure if it is the best solution.
The goal is to read data from a socket without blocking. The Linux manual page says:
The receive calls normally return any data available, up to the
requested amount, rather than waiting for receipt of the full amount
requested.
Does it mean that I don't have to pass MSG_DONTWAIT flag to recv() after polling the socket descriptor with select()/poll()/epoll()?
The behaviour of recv/read depends on the characteristics of the socket itself. If the socket is marked as non-blocking, these calls should immediately return EAGAIN/EWOULDBLOCK rather than blocking the process.
The socket can be marked as non-blocking prior to reading from it, usually via fcntl or ioctl.
What this excerpt from the manual says is that, basically, reads on both blocking and non-blocking sockets are not required to fill the whole buffer that is supplied. That is why it is important to check the result of the recv/read calls in order to know how much of the buffer contains the actual data and how much is garbage.
It is not a good idea at all to use blocking sockets in conjunction with the IO polling calls such as select/poll/epoll. Even if the polling call indicates that a particular socket is ready for reading, a blocking socket would sometimes still block.
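One way to combine polling with a read that can never block, whatever the socket's mode, is to pass MSG_DONTWAIT on the recv() call itself. The sketch below assumes a single descriptor and uses a hypothetical helper name, `recv_ready`.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: waits up to timeout_ms for POLLIN, then performs
 * one non-blocking recv().
 * Returns > 0 for bytes received (possibly fewer than len),
 *         0 if the peer closed the connection,
 *        -1 on error; errno is EAGAIN if no data was available. */
ssize_t recv_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = EAGAIN;     /* nothing arrived before the timeout */
        return -1;
    }
    /* MSG_DONTWAIT guards against the case where readiness was reported
     * but the data is gone by the time recv() runs. */
    return recv(fd, buf, len, MSG_DONTWAIT);
}
```

The return value still has to be checked: a successful call only says how many bytes of the buffer are valid, not that the buffer was filled.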
In Linux if we call blocking recv from one thread and close for the same socket from another thread, recv doesn't exit.
Why?
The "why" is simply that that's how it works, by design.
Within the kernel, the recv() call has called fget() on the struct file corresponding to the file descriptor, and this will prevent it from being deallocated until the corresponding fput().
You will simply have to change your design (your design is inherently racy anyway - for this to happen, you must have no locking protecting the file descriptor in userspace, which means that the close() could have happened just before the recv() call - and the file descriptor even been reused for something else).
If you want to wake up another thread that's blocking on a file descriptor, you should have it block on select() instead, with a pipe included in the file descriptor set that can be written to by the main thread.
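The pipe-based wakeup can be sketched like this. The helper name `wait_readable_or_wakeup` is hypothetical, and the sketch assumes the waiting thread owns the read end of a pipe (`wake_fd`) while another thread holds the write end.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: blocks in select() until either fd has data or a
 * byte arrives on wake_fd (the read end of a wakeup pipe).
 * Returns 1 if fd is readable and no wakeup was requested,
 *         0 if a wakeup was requested (checked first, so it wins),
 *        -1 on error. */
int wait_readable_or_wakeup(int fd, int wake_fd)
{
    fd_set readfds;
    int maxfd = fd > wake_fd ? fd : wake_fd;

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        FD_SET(wake_fd, &readfds);

        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (FD_ISSET(wake_fd, &readfds))
            return 0;               /* another thread asked us to stop */
        if (FD_ISSET(fd, &readfds))
            return 1;               /* safe to call recv()/read() now */
    }
}
```

To stop the waiting thread, the main thread writes a single byte to the write end of the pipe; the waiter then returns 0 and can close the socket itself, which avoids the close()/recv() race described above.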
Check that all file descriptors for the socket have been closed. If any remain open at the "remote end" (assuming this is the one you attempt to close), the "peer has not performed an orderly shutdown".
If this still doesn't work, call shutdown(sock, SHUT_RDWR) on the remote end. This will shut the socket down regardless of reference counts.