I'm looking for a solution to the following problem. I want to monitor changes to a procfs file with select (I want to use select rather than inotify, because I am also watching another descriptor for a socket).
I've tried with something like this:
fd1 = open("/proc/my_file", O_RDONLY, 0);
FD_ZERO(&rfds);
FD_SET(fd1, &rfds);
tv.tv_sec = 500;
tv.tv_usec = 0;
retval = select(fd1+1, &rfds, NULL, NULL, &tv);
This does not work, because the file is always reported as ready to read.
How can I be notified if there was any change in the file with select?
select(2) does not report "changes"; it reports whether a descriptor is "ready to read". From the OpenGroup select() manpage:
File descriptors associated with regular files always select true for ready to read, ready to write, and error conditions.
Kernel-provided files are a little strange -- they're not quite "regular files" -- but select(2) is not the tool to determine changes in these files.
If you want to spot changes, then you must use inotify(7). Though I really wouldn't be surprised if not all files in procfs(5) use this mechanism -- many might not know when their underlying data changes. Consider /proc/loadavg -- it might change every single time you read it, but there's no real point making those changes visible via inotify(7).
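An inotify descriptor is itself something select(2) can wait on, so it can sit in the same fd set as the socket. Here is a minimal sketch of that, assuming Linux and a file that actually generates inotify events (as noted above, many procfs files may not). The helper names watch_file and wait_for_change are made up for illustration, and other_fd stands in for the socket.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/select.h>
#include <unistd.h>

/* Create a non-blocking inotify instance that watches `path` for writes.
 * Returns the inotify descriptor, or -1 on error. */
int watch_file(const char *path)
{
    assert(path != NULL);
    int ifd = inotify_init1(IN_NONBLOCK);
    if (ifd < 0) {
        perror("inotify_init1");
        return -1;
    }
    if (inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
        perror("inotify_add_watch");
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Wait up to timeout_sec for an inotify event or for other_fd to become
 * readable. Returns a bitmask (1 = file changed, 2 = other_fd readable),
 * 0 on timeout, -1 on error. */
int wait_for_change(int ifd, int other_fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int maxfd = ifd > other_fd ? ifd : other_fd;

    FD_ZERO(&rfds);
    FD_SET(ifd, &rfds);
    FD_SET(other_fd, &rfds);

    int n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;

    int result = 0;
    if (FD_ISSET(ifd, &rfds)) {
        /* Discard the queued events; ifd is non-blocking, so the loop
         * ends with EAGAIN once the queue is empty. */
        char buf[4096];
        while (read(ifd, buf, sizeof buf) > 0)
            ;
        result |= 1;
    }
    if (FD_ISSET(other_fd, &rfds))
        result |= 2;
    return result;
}
```

A caller would create the watch once with watch_file() and then call wait_for_change() in its loop, re-reading the file whenever bit 1 is set.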
If you want to know if there is any change in a file descriptor, you could use the library libevent.
libevent is an asynchronous event notification software library. The libevent API provides a mechanism to execute a callback function when a specific event occurs on a file descriptor or after a timeout has been reached. Furthermore, libevent also supports callbacks due to signals or regular timeouts.
Currently, libevent supports /dev/poll, kqueue(2), select(2), poll(2), epoll(4) and Solaris's event ports.
Or you could also use epoll.
epoll is a scalable I/O event notification mechanism for Linux, first introduced in Linux 2.5.44 [1]. It is meant to replace the older POSIX select(2) and poll(2) system calls, to achieve better performance in more demanding applications, where the number of watched file descriptors is large (unlike the older system calls, which operate at O(n), epoll operates in O(1) [2]). epoll is similar to FreeBSD's kqueue, in that it operates on a configurable kernel object, exposed to user space as a file descriptor of its own.
Example of using epoll
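What follows is only a rough sketch of basic epoll usage, not the linked example. It registers two descriptors for readability (level-triggered, the default) and waits for whichever becomes ready first. The function names make_epoll and wait_one are invented for this illustration. Note that epoll_ctl refuses ordinary regular files with EPERM, so this only helps for descriptors that support polling, such as sockets and pipes.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Create an epoll instance that watches fd_a and fd_b for readability.
 * Returns the epoll descriptor, or -1 on error. */
int make_epoll(int fd_a, int fd_b)
{
    assert(fd_a >= 0 && fd_b >= 0);
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    int fds[2] = { fd_a, fd_b };
    for (int i = 0; i < 2; i++) {
        /* data.fd is handed back by epoll_wait so we know who is ready. */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Wait up to timeout_ms for one ready descriptor.
 * Returns that descriptor, or -1 on timeout or error. */
int wait_one(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}
```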
Related
Some Unix code I am working on depends on being able to poll over a small number of pipes. poll is a POSIX system call that (much like the older select) allows the process to wait until one or more file descriptors is "ready" for reading or writing, which means one can proceed to do so without blocking. This is useful to implement event loops where waiting is clearly separated from the rest of the communication.
Is it possible to do the same for Windows pipe handles - wait for one or more of them to become "ready" for reading/writing?
Existing SO advice on the matter, such as answers to this question, recommend the use of completion ports. However as far as I can tell, completion ports require initiating reading/writing beforehand, and then waiting for (or being notified of) the completion of those operations. This approach does not fit the architecture of the code, which strongly separates the polling code from the reading/writing code, the latter calling into a library that uses the regular ReadFile and WriteFile on the underlying handle.
If there is no direct equivalent to poll, could one abuse completion ports to provide something similar? In other words, is it possible to create IO completion events that announce "you can now call ReadFile (WriteFile) on this handle without it blocking" and wait for them using WaitForMultipleObjects or GetQueuedCompletionStatus?
I'm new to Linux programming and not entirely familiar with all the synchronization facilities, so I'd like to ask more knowledgeable people how they might go about solving this problem.
I have a single thread that I would like to run through a loop. The stopping point in the loop will be a read operation on a socket. I want the read operation to block for some period of time and then time out. However, I need a way to unblock the thread from the read if some event needs attention. The "event" could be any one of a number of different things, so I need some way to tell the thread what caused the read to unblock.
I know that you can unblock a blocked read with a signal but I'm not sure how that's done.
See the select() system call.
This is especially useful for waiting for multiple file channels.
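You can set a timeout on socket receive operations. Example:
struct timeval timeout;
timeout.tv_sec = TIMEOUT_SEC;
timeout.tv_usec = TIMEOUT_MSEC * 1000;  /* tv_usec is in microseconds and must stay below 1000000 */
setsockopt(sock_fd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
/* now receive msg; on timeout recvmsg() returns -1 with errno set to EAGAIN or EWOULDBLOCK */
recvmsg(sock_fd, &msg, 0);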
When you want the socket to block indefinitely again, set the timeout back to zero:
timeout.tv_sec = 0;
timeout.tv_usec = 0;
setsockopt(sock_fd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
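The snippets above can be folded into one helper that also tells a timeout apart from a real error. This is a sketch only: recv_with_timeout and its -2 return value are conventions invented here, not part of the sockets API.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Receive with a timeout given in milliseconds.
 * Returns the number of bytes received, 0 if the peer closed the
 * connection, -1 on error, or -2 if the timeout expired first. */
ssize_t recv_with_timeout(int sock_fd, void *buf, size_t len, long timeout_ms)
{
    /* A zero timeval would mean "block forever", so insist on > 0 here. */
    assert(timeout_ms > 0);

    struct timeval tv = {
        .tv_sec  = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,  /* remainder in microseconds */
    };
    if (setsockopt(sock_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;

    ssize_t n = recv(sock_fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;  /* nothing arrived within the timeout */
    return n;
}
```

This only covers the timeout half of the question. It does not let another part of the program wake the thread early, which is what select() or epoll are for.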
epoll seems to be the way to go:
The epoll API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them. The epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file descriptors. The following system calls are provided to create and manage an epoll instance:
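See man epoll for more information, including the list of system calls (epoll_create, epoll_ctl and epoll_wait). You might also want to read the "Example for suggested usage" section of the manual.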
See also epoll vs select
Sounds like you want to use select() as others have mentioned, but you also want a way to interrupt it when a "message" of some sort is available. A typical way of interrupting a select() is to use the self pipe trick. Basically you create a pipe() and also select() on the read file descriptor of the pipe. When a message arrives in the queue maintained by your program, write a byte to the pipe. This will cause your select call to return and you'll be able to check to see if your pipe is ready for reading. If it is then you know you have a message to process (whatever that is in your context), so you process it and then go back to select(). Better yet, you could have your pipe actually be your message queue. If you just use the pipe as a way to signal that messages are on your queue, make sure you actually read() the bytes out of your pipe each time through, or it will fill up eventually and block you from writing more notifications to it.
Although, as others have mentioned, why not just have one thread service your queue and do your writes to the socket, while another thread does the reads? Probably a lot simpler.
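A minimal sketch of the self-pipe trick described above, under the assumption that the pipe is only used as a wake-up signal and the real messages live in a queue elsewhere. The names notify_loop, wait_for_wakeup and the wake_reason values are made up for this example.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

enum wake_reason { WAKE_ERROR = -1, WAKE_TIMEOUT = 0, WAKE_SOCKET = 1, WAKE_MESSAGE = 2 };

/* Called by whoever queues a message: writing one byte makes the
 * select() in the loop thread return. Returns 0 on success, -1 on error. */
int notify_loop(int pipe_wr)
{
    char byte = 1;
    return write(pipe_wr, &byte, 1) == 1 ? 0 : -1;
}

/* Block until sock_fd is readable, a notification arrives on pipe_rd,
 * or timeout_sec elapses. If both are ready, the message is reported
 * first and the socket shows up on the next call. */
int wait_for_wakeup(int sock_fd, int pipe_rd, int timeout_sec)
{
    assert(sock_fd >= 0 && pipe_rd >= 0);

    fd_set rfds;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int maxfd = sock_fd > pipe_rd ? sock_fd : pipe_rd;

    FD_ZERO(&rfds);
    FD_SET(sock_fd, &rfds);
    FD_SET(pipe_rd, &rfds);

    int n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return WAKE_ERROR;
    if (n == 0)
        return WAKE_TIMEOUT;

    if (FD_ISSET(pipe_rd, &rfds)) {
        /* Consume pending notification bytes so the pipe never fills up.
         * select() said the pipe is readable, so this read will not block. */
        char buf[64];
        ssize_t r = read(pipe_rd, buf, sizeof buf);
        if (r < 0)
            return WAKE_ERROR;
        return WAKE_MESSAGE;
    }
    return WAKE_SOCKET;
}
```

The loop thread calls wait_for_wakeup() instead of a bare read on the socket, then either processes its queue (WAKE_MESSAGE), reads from the socket (WAKE_SOCKET), or does its periodic work (WAKE_TIMEOUT).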
Perhaps these two libraries may be of use to you:
libev
libuv
They both use the event-driven paradigm on one or more threads (if so desired). Of course, you can implement your own event-driven framework using the already mentioned APIs and condition variables, but that might be more work than necessary.
From epoll's man page:
epoll is a variant of poll(2) that can be used either as an edge-triggered
or a level-triggered interface
When would one use the edge triggered option? The man page gives an example that uses it, but I don't see why it is necessary in the example.
When an FD becomes read or write ready, you might not necessarily want to read (or write) all the data immediately.
Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered won't bother you again until new activity arrives on the FD. To be sure you don't miss anything, you have to keep reading or writing until you get an EAGAIN (so it's more complicated to code around, but can be more efficient depending on what you need to do).
Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing. If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.
If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can. Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.
The same applies for reading, because you might not want to pull all the data into user-space before you're ready to do whatever you want to do with it (thus having to buffer it, etc etc). With edge-triggered epoll you get told when it's ready to read, and then can remember that and do the actual reading "as and when".
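To make the read side concrete, here is a small sketch of edge-triggered registration plus the "read until EAGAIN" loop that goes with it. The helper names add_edge_triggered and drain are hypothetical, and a real program would process the data instead of just counting bytes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for edge-triggered read notifications. The descriptor is
 * switched to non-blocking mode, since draining relies on EAGAIN.
 * Returns 0 on success, -1 on error. */
int add_edge_triggered(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Read everything currently available. Returns the number of bytes
 * consumed before EAGAIN or end-of-file, or -1 on a real error. */
long drain(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;  /* end-of-file */
        if (errno == EINTR)
            continue;      /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;  /* nothing left; wait for the next edge */
        return -1;
    }
}
```

With this registration, a second epoll_wait() issued before any new data arrives reports nothing, even if the first batch has not been read yet. With level-triggered registration it would report the descriptor again.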
In my experiments, ET doesn't guarantee that only one thread wakes up, although it often wakes up only one. The EPOLLONESHOT flag is for this purpose.
Level triggered
Use level trigger mode when you can't consume all the data in the FD and want epoll to keep triggering while data is available.
For example, if you are receiving a large file from an FD and cannot consume all of its data in one go, level-triggered mode keeps notifying you until the remaining data has been consumed.
Disadvantage
thundering herd
The EPOLLEXCLUSIVE flag is meant to prevent the thundering herd phenomenon
lower efficiency
When a read/write event occurs on the monitored file descriptor, epoll_wait() notifies the handler to read or write. If you don’t read or write all the data at once (e.g., the read/write buffer is too small), then the next time epoll_wait() is called, it will notify you to continue reading or writing on the file descriptor you didn’t finish reading or writing on, but of course, if you never read or write, it will keep notifying you.
If the system has a large number of ready file descriptors that you don’t need to read or write, and they return every time, this can greatly reduce the efficiency of the handler retrieving the ready file descriptors it cares about.
use cases
Redis epoll: since the I/O thread of Redis is single-threaded, level-triggered mode is used.
Edge triggered
Use edge triggered mode and make sure all data available is buffered and will be handled eventually.
As Chris Dodd mentioned in the comments
ET is also particularly nice with a multithreaded server on a multicore machine. You can run one thread per core and have all of them call epoll_wait on the same FD. When data comes in on an FD, exactly one thread will be woken to handle it
use cases
nginx epoll model
golang netpoll
Can anyone tell me the use and application of the select function in socket programming in C?
The select() function allows you to implement an event driven design pattern, when you have to deal with multiple event sources.
Let's say you want to write a program that responds to events coming from several event sources e.g. network (via sockets), user input (via stdin), other programs (via pipes), or any other event source that can be represented by an fd. You could start separate threads to handle each event source, but you would have to manage the threads and deal with concurrency issues. The other option would be to use a mechanism where you can aggregate all the fd into a single entity fdset, and then just call a function to wait on the fdset. This function would return whenever an event occurs on any of the fd. You could check which fd the event occurred on, read that fd, process the event, and respond to it. After you have done that, you would go back and sit in that wait function - till another event on some fd arrives.
The select facility is such a mechanism, and the select() function is the wait function. You can find the details on how to use it in any number of books and online resources.
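As a rough illustration of the "aggregate the fds and wait" idea, here is a sketch that waits on an array of descriptors and reports which one has an event. wait_any is a made-up name, and a real event loop would handle every ready descriptor rather than just the first one.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms for any of fds[0..nfds-1] to become readable.
 * Returns the index of the first readable descriptor, or -1 on
 * timeout or error. */
int wait_any(const int *fds, int nfds, long timeout_ms)
{
    assert(nfds > 0);

    fd_set rfds;
    int maxfd = -1;

    FD_ZERO(&rfds);
    for (int i = 0; i < nfds; i++) {
        /* select() cannot handle descriptors numbered FD_SETSIZE or above. */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv = {
        .tv_sec  = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return -1;

    /* select() rewrote rfds so that only the ready descriptors remain set. */
    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rfds))
            return i;
    return -1;
}
```

The event loop then becomes: call wait_any(), read from the descriptor it names, process the event, and go back to waiting. The fd set has to be rebuilt on every call because select() modifies it.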
The select function allows you to check on several different sockets or pipes (or any file descriptors at all if you are not on Windows), and do something based on whichever one is ready first. More specifically, the arguments for the select function are split up into three groups:
Reading: When any of the file descriptors in this category are ready for reading, select will return them to you.
Writing: When any of the file descriptors in this category are ready for writing, select will return them to you.
Exceptional: When any of the file descriptors in this category have an exceptional condition pending -- on most Unix systems this means out-of-band data on a socket, whereas a closed or broken connection is normally reported as readable instead -- select will return them to you.
The power of select is that individual file/socket/pipe functions are often blocking. Select allows you to monitor the activity of several different file descriptors without having to have a dedicated thread of your program to each function call.
In order for you to get a more specific answer, you will probably have to mention what language you are programming in. I have tried to give as general an answer as possible on the conceptual level.
select() is the low-tech way of polling sockets for new data to read or for an open TCP window to write. Unless there's some compelling reason not to, you're probably better off using poll(), or epoll_wait() if your platform has it, for better performance.
I like the description at gnu.org:
Sometimes a program needs to accept input on multiple input channels whenever input arrives. For example, some workstations may have devices such as a digitizing tablet, function button box, or dial box that are connected via normal asynchronous serial interfaces; good user interface style requires responding immediately to input on any device. [...]
You cannot normally use read for this purpose, because this blocks the program until input is available on one particular file descriptor; input on other channels won’t wake it up. You could set nonblocking mode and poll each file descriptor in turn, but this is very inefficient.
A better solution is to use the select function. This blocks the program until input or output is ready on a specified set of file descriptors, or until a timer expires, whichever comes first.
Per the documentation for Linux manpages and MSDN for Windows,
select() and pselect() allow a program to monitor multiple file
descriptors, waiting until one or more of the file descriptors become
"ready" for some class of I/O operation (e.g., input possible). A file
descriptor is considered ready if it is possible to perform the
corresponding I/O operation (e.g., read(2)) without blocking.
A simple explanation: an application often needs to do multiple things at once. For example, you may access multiple sites in a web browser, and a web server may want to serve multiple clients simultaneously. One needs a mechanism to monitor each socket so that the application is not stuck waiting for one communication to complete.
An example: imagine downloading a large Facebook page on your smart phone whilst traveling on a train. Your connection is intermittent and slow; the web server should be able to process other clients while waiting for your communication to finish.
select(2) - Linux man page
select Function - Winsock Functions
I do not understand the difference between calling recv() on a non-blocking socket and calling it on a blocking socket once select has reported the socket as ready for reading. It seems to me that a blocking socket will never block in this situation anyway.
Also, I have heard that one model for using non-blocking sockets is to try making calls (recv/send/etc) on them after some amount of time has passed instead of using something like select. This technique seems slow and wasteful compared to using something like select (but then I don't get the purpose of non-blocking at all, as described above). Is this common in network programming today?
There's a great overview of all of the different options for doing high-volume I/O called The C10K Problem. It has a fairly complete survey of a lot of the different options, at least as of 2006.
Quoting from it, on the topic of using select on non-blocking sockets:
Note: it's particularly important to remember that readiness notification from the kernel is only a hint; the file descriptor might not be ready anymore when you try to read from it. That's why it's important to use nonblocking mode when using readiness notification.
And yes, you could use non-blocking sockets and then have a loop that waits if nothing is ready, but that is fairly wasteful compared to using something like select or one of the more modern replacements (epoll, kqueue, etc). I can't think of a reason why anyone would actually want to do this; all of the select like options have the ability to set a timeout, so you can be woken up after a certain amount of time to perform some regular action. I suppose if you were doing something fairly CPU intensive, like running a video game, you may want to never sleep but instead keep computing, while periodically checking for I/O using non-blocking sockets.
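To illustrate the point in the quote, here is a sketch of how a read after a readiness notification might be written so that a stale "ready" hint cannot hang the program. The names set_nonblocking and read_if_ready, and the -2 return value, are conventions invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Read from a non-blocking fd after select/poll/epoll reported it ready.
 * Returns the number of bytes read, 0 on end-of-file, -1 on error, or
 * -2 if the readiness hint turned out to be stale and nothing was there. */
ssize_t read_if_ready(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);  /* retry if a signal interrupted us */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;  /* go back to waiting instead of blocking here */
    return n;
}
```

On a blocking descriptor the same read() would simply wait in the stale-hint case, stalling every other descriptor the loop is responsible for.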
The select, poll, epoll, kqueue, etc. facilities target scenarios where multiple sockets or file descriptors must be handled. Imagine a heavily loaded web server with hundreds of simultaneously connected sockets. How would you know when to read, and from which socket, without blocking everything?
If you call read on a non-blocking socket, it will return immediately if no data has been received since the last call to read. If you only had read, and you wanted to wait until there was data available, you would have to busy wait. This wastes CPU.
poll and select (and friends) allow you to sleep until there's data to read (or write, or a signal has been received, etc.).
If the only thing you're doing is sending and receiving on that socket, you might as well just use a blocking socket. Being asynchronous is important when you have other things to do in the meantime, such as update a GUI or handle other sockets.
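For your first question, there's usually no difference in that scenario. The only difference is what they do when there is nothing to be read, and since you're checking that before calling recv(), you'll rarely see one. The exception is a stale readiness report: the blocking socket then blocks, while the non-blocking one returns immediately with EAGAIN.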
For the second question, the way I see it done in all the libraries is to use select, poll, epoll, kqueue for testing if data is available. The select method is the oldest, and least desirable from a performance standpoint (particularly for managing large numbers of connections).