Equivalent of select or poll for pipes on Windows - c

Some Unix code I am working on depends on being able to poll over a small number of pipes. poll is a POSIX system call that (much like the older select) allows the process to wait until one or more file descriptors are "ready" for reading or writing, which means one can proceed to do so without blocking. This is useful for implementing event loops where waiting is clearly separated from the rest of the communication.
Is it possible to do the same for Windows pipe handles - wait for one or more of them to become "ready" for reading/writing?
Existing SO advice on the matter, such as answers to this question, recommends the use of completion ports. However, as far as I can tell, completion ports require initiating reading/writing beforehand, and then waiting for (or being notified of) the completion of those operations. This approach does not fit the architecture of the code, which strongly separates the polling code from the reading/writing code, the latter calling into a library that uses the regular ReadFile and WriteFile on the underlying handle.
If there is no direct equivalent to poll, could one abuse completion ports to provide something similar? In other words, is it possible to create IO completion events that announce "you can now call ReadFile (WriteFile) on this handle without it blocking" and wait for them using WaitForMultipleObjects or GetQueuedCompletionStatus?
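For reference, the Unix readiness model described above can be sketched as follows. This is only a minimal POSIX illustration of what "ready for reading" means for a pipe; it does not answer the Windows part of the question, and the names pipe_is_readable and demo_poll_pipe are made up for the example.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd can be read without blocking,
 * 0 if it is not ready within timeout_ms, -1 on error. */
int pipe_is_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Polls an empty pipe, then the same pipe after one byte is written.
 * Stores the two readiness results in *before and *after.
 * Returns 0 on success, -1 on error. */
int demo_poll_pipe(int *before, int *after)
{
    int fds[2];
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;
    *before = pipe_is_readable(fds[0], 0);   /* nothing written yet */
    if (write(fds[1], "x", 1) == 1) {
        *after = pipe_is_readable(fds[0], 0); /* one byte is waiting */
        rc = 0;
    }
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The polling code never touches the data itself; the read would happen later, in separate code, which is exactly the separation the question wants to keep.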

Related

What is the purpose of epoll's edge triggered option?

From epoll's man page:
epoll is a variant of poll(2) that can be used either as an edge-triggered
or a level-triggered interface
When would one use the edge triggered option? The man page gives an example that uses it, but I don't see why it is necessary in the example.
When an FD becomes read or write ready, you might not necessarily want to read (or write) all the data immediately.
Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered won't bother you again until the next time you get an EAGAIN (so it's more complicated to code around, but can be more efficient depending on what you need to do).
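A small experiment makes the difference visible. The sketch below (Linux only; count_wakeups is a name invented for this example) writes one byte to a pipe, never reads it, and counts how many consecutive epoll_wait calls report the read end as ready.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Writes one byte to a pipe, never reads it, and calls epoll_wait
 * `rounds` times with a zero timeout. Returns how many of those calls
 * reported the read end as ready, or -1 on error.
 * Pass 0 for level-triggered or EPOLLET for edge-triggered. */
int count_wakeups(uint32_t extra_flags, int rounds)
{
    int fds[2];
    int count = -1;

    if (pipe(fds) != 0)
        return -1;

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | extra_flags };
    ev.data.fd = fds[0];

    if (ep >= 0 &&
        epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        count = 0;
        for (int i = 0; i < rounds; i++) {
            struct epoll_event got;
            int n = epoll_wait(ep, &got, 1, 0);
            if (n < 0) {
                count = -1;
                break;
            }
            count += n; /* n is 0 or 1 because only one fd is registered */
        }
    }

    if (ep >= 0)
        close(ep);
    close(fds[0]);
    close(fds[1]);
    return count;
}
```

With three rounds, count_wakeups(0, 3) should report the descriptor every time, while count_wakeups(EPOLLET, 3) should report it only once, because no new data arrives after the first notification.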
Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing. If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.
If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can. Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.
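The "write as much as you can, then stop on EAGAIN" step might look roughly like this. It is a sketch under the assumption that the descriptor is already in non-blocking mode; write_until_eagain and demo_fill_pipe are hypothetical names, and the 1 MiB buffer is assumed to be larger than the platform's default pipe buffer.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes from buf until everything is written or the fd would block.
 * Returns the number of bytes written, or -1 on a real error.
 * The fd is assumed to have O_NONBLOCK set. */
ssize_t write_until_eagain(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, just retry */
        if (n == 0 || errno == EAGAIN || errno == EWOULDBLOCK)
            break;                  /* not writable now: wait for the next notification */
        return -1;
    }
    return (ssize_t)done;
}

/* Fills a non-blocking pipe that nobody reads from.
 * Returns 1 if the writer stopped early instead of blocking,
 * 0 if everything was written, -1 on error. */
int demo_fill_pipe(void)
{
    static char big[1 << 20]; /* assumed larger than the default pipe buffer */
    int fds[2];
    int result = -1;

    if (pipe(fds) != 0)
        return -1;
    int flags = fcntl(fds[1], F_GETFL);
    if (flags >= 0 && fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == 0) {
        ssize_t n = write_until_eagain(fds[1], big, sizeof big);
        if (n >= 0)
            result = (n > 0 && (size_t)n < sizeof big) ? 1 : 0;
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

After the loop breaks, the caller remembers how much is left and goes back to epoll_wait; with edge-triggered registration the next wake-up arrives only when the descriptor becomes writable again.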
The same applies for reading, because you might not want to pull all the data into user-space before you're ready to do whatever you want to do with it (thus having to buffer it, etc etc). With edge-triggered epoll you get told when it's ready to read, and then can remember that and do the actual reading "as and when".
In my experiments, ET doesn't guarantee that only one thread wakes up, although it often wakes up only one. The EPOLLONESHOT flag is for this purpose.
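To illustrate what EPOLLONESHOT does, here is a minimal single-threaded sketch (Linux only; demo_oneshot is an invented name). After one event is delivered the descriptor stays disabled until it is rearmed with EPOLL_CTL_MOD, which is what lets one thread own the descriptor until it has finished with it.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe with EPOLLONESHOT, writes one byte,
 * and records three zero-timeout epoll_wait results:
 *   out[0]  first wait
 *   out[1]  second wait, without rearming
 *   out[2]  wait after rearming with EPOLL_CTL_MOD
 * Returns 0 on success, -1 on error. */
int demo_oneshot(int out[3])
{
    int fds[2];
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT };
    struct epoll_event got;
    ev.data.fd = fds[0];

    if (ep >= 0 &&
        epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        out[0] = epoll_wait(ep, &got, 1, 0);
        out[1] = epoll_wait(ep, &got, 1, 0);
        /* Rearm: the unread byte is still there, so it is reported again. */
        if (epoll_ctl(ep, EPOLL_CTL_MOD, fds[0], &ev) == 0) {
            out[2] = epoll_wait(ep, &got, 1, 0);
            rc = 0;
        }
    }

    if (ep >= 0)
        close(ep);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```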
Level triggered
Use level trigger mode when you can't consume all the data in the FD and want epoll to keep triggering while data is available.
For example, if you are receiving a large file from an FD and cannot consume all of its data at one time, you want the notification to keep firing until the next read. Level-triggered mode suits this case.
Disadvantage
thundering herd
The EPOLLEXCLUSIVE flag is meant to prevent the thundering herd phenomenon.
less efficiency
When a read/write event occurs on the monitored file descriptor, epoll_wait() notifies the handler to read or write. If you don’t read or write all the data at once (e.g., the read/write buffer is too small), then the next time epoll_wait() is called, it will notify you to continue reading or writing on the file descriptor you didn’t finish reading or writing on, but of course, if you never read or write, it will keep notifying you.
If the system has a large number of ready file descriptors that you don’t need to read or write, and they return every time, this can greatly reduce the efficiency of the handler retrieving the ready file descriptors it cares about.
use cases
Redis epoll: since the I/O thread of Redis is single-threaded, level-triggered mode is used.
Edge triggered
Use edge triggered mode and make sure all data available is buffered and will be handled eventually.
As Chris Dodd mentioned in the comments
ET is also particularly nice with a multithreaded server on a multicore machine. You can run one thread per core and have all of them call epoll_wait on the same FD. When data comes in on an FD, exactly one thread will be woken to handle it
use cases
nginx epoll model
golang netpoll

Removing a handle from an I/O completion port and other questions about IOCP

The CreateIoCompletionPort function allows the creation of a new I/O completion port and the registration of file handles to an existing I/O completion port.
Then, I can use any function, like a recv on a socket or a ReadFile on a file with an OVERLAPPED structure, to start an asynchronous operation.
I have to check whether the function call returned synchronously although it was called with an OVERLAPPED structure and in this case handle it directly. In the other case, when ERROR_IO_PENDING is returned, I can use the GetQueuedCompletionStatus function to be notified when the operation completes.
The questions that arise are:
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
And finally, is it possible for example to recv asynchronously but to send synchronously? For example when a simple echo service is implemented: Can I wait with an asynchronous recv for new data but send the response in a synchronous way so that code complexity is reduced? In my case, I wouldn't recv a second time anyways before the first request was processed.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed. Will the ReadFile be cancelled with an error message and I have to restart the read process as soon as the write is complete? Or do I have to cancel the ReadFile manually before writing? This question arises in connection with a communication device, so the write and the read should not cause problems if they happen concurrently.
How can I remove a handle from the I/O completion port?
In my experience you can't disassociate a handle from a completion port. However, you may disable completion port notification by setting the low-order bit of your OVERLAPPED structure's hEvent field: See the documentation for GetQueuedCompletionStatus.
For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
It is not necessary to explicitly disassociate a handle from an I/O completion port; closing the handle is sufficient. You may associate multiple handles with the same completion key; the best way to figure out which request is associated with the I/O completion is by using the OVERLAPPED structure. In fact, you may even extend OVERLAPPED to store additional data.
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
That is the default behavior, even when ReadFile/WriteFile returns TRUE. You must explicitly call SetFileCompletionNotificationModes to tell Windows to not enqueue a completion packet when TRUE and ERROR_SUCCESS are returned.
is it possible for example to recv asynchronously but to send synchronously?
Not by using recv and send; you need to use functions that accept OVERLAPPED structures, such as WSARecv, WSASend, or alternatively ReadFile and WriteFile. It might be handier to use the latter if your code is meant to work with multiple types of I/O handles, such as both sockets and named pipes. Those functions also provide a synchronous mode, so if you use them you can mix asynchronous and synchronous calls.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed?
There is no implicit cancellation. As long as you're using separate OVERLAPPED structures for each read/write to a full-duplex device, I see no reason why you can't do concurrent I/O operations.
As I’ve already pointed out there, the commonly held belief that it is impossible to remove handles from completion ports is wrong, probably caused by the absence of any hint whatsoever on how to do this from nearly all documentation I could find. Actually, it’s pretty easy:
Call NtSetInformationFile with the FileReplaceCompletionInformation enumerator value for FileInformationClass and a pointer to a FILE_COMPLETION_INFORMATION structure for the FileInformation parameter. In this structure, set the Port member to NULL (or nullptr, in C++) to disassociate the file from the port it’s currently attached to (I guess if it isn’t attached to any port, nothing would happen), or set Port to a valid HANDLE to another completion port to associate the file with that one instead.
First some important corrections.
In case the overlapped I/O operation completes immediately (ReadFile or similar I/O function returns success) - the I/O completion is already scheduled to the IOCP.
Also, according to your questions I think you confuse between the file/socket handles, and the specific I/O operations issued on them.
Now, regarding your questions:
AFAIK there is no conventional way to remove a file/socket handle from the IOCP (usually you just don't have to do this). You talk about removing closed handles from the IOCP, which is absolutely incorrect. You can't remove a closed handle, because it does not reference a valid kernel object anymore!
A more correct question would be how the file/socket should be properly closed. The answer is: just close your handle. All the outstanding I/O operations (issued on this handle) will return soon with an error code (aborted). Then your completion routine (the one that calls GetQueuedCompletionStatus in a loop) should perform the needed per-I/O cleanup.
As I've already said, all the I/O completion arrives at IOCP in both synchronous and asynchronous cases. The only situation where it does not arrive at IOCP is when an I/O completes synchronously with an error. Anyway, if you want a unified processing - in such a case you may post an artificial completion data to IOCP (use PostQueuedCompletionStatus).
You should use WSASend and WSARecv (not recv and send) for overlapped I/O. Nevertheless, even if the socket was opened with the WSA_FLAG_OVERLAPPED flag, you are allowed to call the I/O functions without specifying the OVERLAPPED structure. In such a case those functions work synchronously, so you may decide between synchronous and asynchronous mode for every function call.
There is no problem with mixing overlapped read/write requests. The only delicate point here is what happens if you try to read data from the file position you're currently writing to. The result may depend on subtle things, such as the order in which the hardware completes the I/Os, timing parameters of the machine, and so on. Such a situation should be avoided.
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
You've got it the wrong way around. You set the I/O completion port to be used by a file object - when the file object is deleted, you have nothing to worry about. The reason you're getting confused is because of the way Win32 exposes the underlying native API functionality (CreateIoCompletionPort does two very different things in one function).
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
This is how it's always been. Only starting with Windows Vista can you customize how the completion notifications are handled.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed. Will the ReadFile be cancelled with an error message and I have to restart the read process as soon as the write is complete?
I/O operations in Windows are asynchronous inherently, and requests are always queued. You may not think this is so because you have to specify FILE_FLAG_OVERLAPPED in CreateFile to turn on asynchronous I/O. However, at the native layer, synchronous I/O is really an add-on, convenience thing where the kernel keeps track of the file position for you and waits for the I/O to complete before returning.

Select function in socket programming

Can anyone tell me the use and application of select function in socket programming in c?
The select() function allows you to implement an event driven design pattern, when you have to deal with multiple event sources.
Let's say you want to write a program that responds to events coming from several event sources e.g. network (via sockets), user input (via stdin), other programs (via pipes), or any other event source that can be represented by an fd. You could start separate threads to handle each event source, but you would have to manage the threads and deal with concurrency issues. The other option would be to use a mechanism where you can aggregate all the fd into a single entity fdset, and then just call a function to wait on the fdset. This function would return whenever an event occurs on any of the fd. You could check which fd the event occurred on, read that fd, process the event, and respond to it. After you have done that, you would go back and sit in that wait function - till another event on some fd arrives.
select facility is such a mechanism, and the select() function is the wait function. You can find the details on how to use it in any number of books and online resources.
The select function allows you to check on several different sockets or pipes (or any file descriptors at all if you are not on Windows), and do something based on whichever one is ready first. More specifically, the arguments for the select function are split up into three groups:
Reading: When any of the file descriptors in this category are ready for reading, select will return them to you.
Writing: When any of the file descriptors in this category are ready for writing, select will return them to you.
Exceptional: When any of the file descriptors in this category have an exceptional case -- that is, they close uncleanly, a connection breaks or they have some other error -- select will return them to you.
The power of select is that individual file/socket/pipe functions are often blocking. Select allows you to monitor the activity of several different file descriptors without having to have a dedicated thread of your program to each function call.
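As a rough illustration of the above, here is a minimal POSIX sketch that watches two pipes with a single select call and reports whichever is readable. Only the read set is used; the helper names wait_for_either and demo_select_two_pipes are invented for this example.

```c
#include <sys/select.h>
#include <unistd.h>

/* Waits up to timeout_ms for either descriptor to become readable.
 * Returns the ready descriptor (fd_a wins if both are ready),
 * or -1 on timeout or error. */
int wait_for_either(int fd_a, int fd_b, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000
    };
    int maxfd = fd_a > fd_b ? fd_a : fd_b;

    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);

    /* The write and exceptional sets are not needed here, so pass NULL. */
    if (select(maxfd + 1, &readfds, NULL, NULL, &tv) <= 0)
        return -1;
    if (FD_ISSET(fd_a, &readfds))
        return fd_a;
    if (FD_ISSET(fd_b, &readfds))
        return fd_b;
    return -1;
}

/* Creates two pipes, writes only to the second, and checks that
 * select picks the second one. Returns 1 if it did, 0 if not, -1 on error. */
int demo_select_two_pipes(void)
{
    int a[2], b[2];
    int ok = -1;

    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    if (write(b[1], "x", 1) == 1)
        ok = (wait_for_either(a[0], b[0], 1000) == b[0]);

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ok;
}
```

Note that select overwrites the sets it is given, so a real event loop has to rebuild them before every call.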
In order for you to get a more specific answer, you will probably have to mention what language you are programming in. I have tried to give as general an answer as possible on the conceptual level.
select() is the low-tech way of polling sockets for new data to read or for an open TCP window to write. Unless there's some compelling reason not to, you're probably better off using poll(), or epoll_wait() if your platform has it, for better performance.
I like the description at gnu.org:
Sometimes a program needs to accept input on multiple input channels whenever input arrives. For example, some workstations may have devices such as a digitizing tablet, function button box, or dial box that are connected via normal asynchronous serial interfaces; good user interface style requires responding immediately to input on any device. [...]
You cannot normally use read for this purpose, because this blocks the program until input is available on one particular file descriptor; input on other channels won’t wake it up. You could set nonblocking mode and poll each file descriptor in turn, but this is very inefficient.
A better solution is to use the select function. This blocks the program until input or output is ready on a specified set of file descriptors, or until a timer expires, whichever comes first.
Per the documentation for Linux manpages and MSDN for Windows,
select() and pselect() allow a program to monitor multiple file
descriptors, waiting until one or more of the file descriptors become
"ready" for some class of I/O operation (e.g., input possible). A file
descriptor is considered ready if it is possible to perform the
corresponding I/O operation (e.g., read(2)) without blocking.
As a simple explanation: an application often needs to do multiple things at once. For example, you may access multiple sites in a web browser, or a web server may want to serve multiple clients simultaneously. One needs a mechanism to monitor each socket so that the application is not stuck waiting for one communication to complete.
An example: imagine downloading a large Facebook page on your smart phone whilst traveling on a train. Your connection is intermittent and slow, the web server should be able to process other clients when waiting for your communication to finish.
select(2) - Linux man page
select Function - Winsock Functions

Asynchronous File I/O using threads in C

I'm trying to understand how asynchronous file operations are emulated using threads. I've found next to nothing to read on the subject.
Is it possible that:
a process uses a thread to open a regular file (HDD).
the parent gets the file descriptor from the thread, now it may close the thread.
the parent uses the file descriptor with a new thread, reading X bytes from the file.
the parent gets the file descriptor with the seek-position of the current file state.
the parent may repeat these operations, without the need to open, or seek, every time it wishes to "continue" reading a new chunk of the file?
This is just a wild guess of mine; I would appreciate it if anybody could shed more light on how this is emulated efficiently.
UPDATE:
By efficient I actually mean that I don't want the thread to "wait" from the moment the file is opened. Think of a non-blocking HTTP daemon which serves a client a huge file: you want to use the thread to read chunks of the file without blocking the daemon, but you don't want to keep the thread busy while "waiting" for the actual transfer to take place; you want to use the thread for other blocking operations of other clients.
To understand asynchronous I/O better, it may be helpful to think in terms of overlapping operations. That is, the number of pending operations (operations that have been started but not yet completed) can simultaneously go above one.
A diagram that explains asynchronous I/O might look like this: http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx
If you are using the asynchronous I/O capabilities provided by the underlying operating system, then it is possible to asynchronously read from multiple files without spawning an equal number of threads.
If your underlying operating system does not provide asynchronous I/O, or if you decide not to use it, in other words, you wish to emulate asynchronous operation using only blocking I/O (the regular Read/Write provided by the operating system), then it is necessary to spawn as many threads as the number of simultaneous I/O operations. This is because when a thread makes a blocking I/O call, it cannot continue its execution until the operation finishes. In order to start another blocking I/O operation, that operation has to be issued from another thread that is not already occupied.
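A minimal sketch of this thread-per-operation emulation, assuming POSIX threads, is shown below. All names (async_read_t, async_read_start, async_read_wait, demo_threaded_chunks) are invented for the example. It also touches on the original question: the descriptor belongs to the process, not to the thread that used it, so a second short-lived thread continues reading where the first one stopped.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* One pending read: the request plus the result filled in by the worker. */
typedef struct {
    int fd;
    void *buf;
    size_t len;
    ssize_t result;
    pthread_t tid;
} async_read_t;

static void *async_read_worker(void *arg)
{
    async_read_t *r = arg;
    r->result = read(r->fd, r->buf, r->len); /* blocks only this thread */
    return NULL;
}

/* Starts a blocking read on its own thread; returns 0 on success. */
int async_read_start(async_read_t *r, int fd, void *buf, size_t len)
{
    r->fd = fd;
    r->buf = buf;
    r->len = len;
    r->result = -1;
    return pthread_create(&r->tid, NULL, async_read_worker, r);
}

/* Waits for the read to finish and returns its byte count, or -1. */
ssize_t async_read_wait(async_read_t *r)
{
    if (pthread_join(r->tid, NULL) != 0)
        return -1;
    return r->result;
}

/* Reads two 4-byte chunks from one descriptor using two short-lived
 * threads. Returns 1 if the second read continued where the first one
 * stopped, 0 if not, -1 on error. */
int demo_threaded_chunks(void)
{
    FILE *tmp = tmpfile();
    char a[4], b[4];
    async_read_t r;
    int ok = -1;

    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);

    if (write(fd, "abcdefgh", 8) == 8 &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        async_read_start(&r, fd, a, 4) == 0 && async_read_wait(&r) == 4 &&
        async_read_start(&r, fd, b, 4) == 0 && async_read_wait(&r) == 4)
        ok = (memcmp(a, "abcd", 4) == 0 && memcmp(b, "efgh", 4) == 0);

    fclose(tmp);
    return ok;
}
```

Between async_read_start and async_read_wait the caller is free to do other work. If several reads on the same descriptor are in flight at once, the shared file position becomes a race, so pread with explicit offsets would be the safer choice there.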
When you open/create a file fire up a thread. Now store that thread id/ptr as your file handle.
Basically the thread will do nothing except sit in a loop waiting for an "event". A semaphore would be good here. When you want to do a read, you add the read command to a queue (remember to protect the queue insertion with a critical section), return a unique id, and then increment the semaphore. If the thread is asleep it will now wake up, grab the first message off the queue and process it. When it has completed, the command is removed from the queue.
To poll whether a file read has completed you can simply check whether it's still in the command queue. If it's not there, the command has completed.
Furthermore, if you want to allow synchronous reads as well, then after sending the message you can wait for an "event" that is triggered by the completion. You then check whether the unique id is still in the queue, and if it isn't you return control. If it still is, you go back to a wait state until the relevant unique id has been processed.

Nonblocking sockets with Select

I do not understand what the difference is between calling recv() on a non-blocking socket and calling it on a blocking socket, when in both cases recv() is only called after select reports the socket as ready for reading. It would seem to me that a blocking socket will never block in this situation anyway.
Also, I have heard that one model for using non-blocking sockets is to try making calls (recv/send/etc.) on them after some amount of time has passed, instead of using something like select. This technique seems slow and wasteful compared to using something like select (but then I don't get the purpose of non-blocking at all, as described above). Is this common in network programming today?
There's a great overview of all of the different options for doing high-volume I/O called The C10K Problem. It has a fairly complete survey of a lot of the different options, at least as of 2006.
Quoting from it, on the topic of using select on non-blocking sockets:
Note: it's particularly important to remember that readiness notification from the kernel is only a hint; the file descriptor might not be ready anymore when you try to read from it. That's why it's important to use nonblocking mode when using readiness notification.
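In code, that advice amounts to putting the socket in non-blocking mode and treating EAGAIN after a readiness notification as "nothing to do yet" rather than as an error. The following is a minimal POSIX sketch using a local socket pair; recv_if_ready, RECV_WOULD_BLOCK and demo_nonblocking_recv are names made up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

enum { RECV_WOULD_BLOCK = -2 };

/* Returns the number of bytes received, 0 if the peer closed the
 * connection, RECV_WOULD_BLOCK if nothing is available right now,
 * or -1 on a real error. The socket is assumed to be non-blocking. */
ssize_t recv_if_ready(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return RECV_WOULD_BLOCK;
    return n;
}

/* Returns 1 if a premature recv returned immediately and a recv after
 * select reported readiness delivered the data, 0 if not, -1 on error. */
int demo_nonblocking_recv(void)
{
    int sv[2];
    char buf[8];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int flags = fcntl(sv[0], F_GETFL);
    if (flags >= 0 && fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == 0) {
        /* Nothing has been sent yet: this returns at once instead of blocking. */
        ssize_t first = recv_if_ready(sv[0], buf, sizeof buf);

        if (send(sv[1], "hi", 2, 0) == 2) {
            fd_set rfds;
            struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };

            FD_ZERO(&rfds);
            FD_SET(sv[0], &rfds);
            int ready = select(sv[0] + 1, &rfds, NULL, NULL, &tv);
            ssize_t second = recv_if_ready(sv[0], buf, sizeof buf);

            ok = (first == RECV_WOULD_BLOCK && ready == 1 && second == 2);
        }
    }

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

With a blocking socket, the first call would have stalled the whole event loop, which is the situation the quoted note warns about when a readiness hint turns out to be stale.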
And yes, you could use non-blocking sockets and then have a loop that waits if nothing is ready, but that is fairly wasteful compared to using something like select or one of the more modern replacements (epoll, kqueue, etc). I can't think of a reason why anyone would actually want to do this; all of the select like options have the ability to set a timeout, so you can be woken up after a certain amount of time to perform some regular action. I suppose if you were doing something fairly CPU intensive, like running a video game, you may want to never sleep but instead keep computing, while periodically checking for I/O using non-blocking sockets.
The select, poll, epoll, kqueue, etc. facilities target scenarios where multiple sockets or file descriptors must be handled. Imagine a heavily loaded web server with hundreds of simultaneously connected sockets. How would you know when to read, and from which socket, without blocking everything?
If you call read on a non-blocking socket, it will return immediately if no data has been received since the last call to read. If you only had read, and you wanted to wait until there was data available, you would have to busy wait. This wastes CPU.
poll and select (and friends) allow you to sleep until there's data to read (or write, or a signal has been received, etc.).
If the only thing you're doing is sending and receiving on that socket, you might as well just use a blocking socket. Being asynchronous is important when you have other things to do in the meantime, such as updating a GUI or handling other sockets.
For your first question, there's no difference in that scenario. The only difference is what they do when there is nothing to be read. Since you're checking that before calling recv() you'll see no difference.
For the second question, the way I see it done in all the libraries is to use select, poll, epoll, kqueue for testing if data is available. The select method is the oldest, and least desirable from a performance standpoint (particularly for managing large numbers of connections).

Resources