Non-blocking access to the file system - c

When writing a non-blocking program (handling multiple sockets) which at a certain point needs to open files using open(2), stat(2) files or open directories using opendir(2), how can I ensure that the system calls do not block?
To me it seems that there's no other alternative than using threads or fork(2).

As Mel Nicholson replied, for everything file descriptor based you can use select/poll/epoll. For everything else, you can use a proxy thread per item (or a thread pool) with small stacks. By way of the kernel scheduler, the proxy converts any synchronous blocking wait into an asynchronous event that select/poll/epoll can watch, using eventfd or, where portability is required, a Unix pipe.
The proxy thread blocks until the operation completes and then writes to the eventfd or the pipe to wake up the select/poll/epoll loop.
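A minimal sketch of that proxy-thread idea, assuming a portable pipe rather than eventfd. The names stat_request, stat_proxy and stat_via_proxy are made up for illustration; a real server would keep the pipe's read end in the same poll set as its sockets instead of waiting on it alone.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical request record shared by the event loop and the proxy thread. */
struct stat_request {
    const char *path;   /* input: file to stat */
    struct stat result; /* output: filled in by the proxy thread */
    int status;         /* output: return value of stat() */
    int notify_fd;      /* write end of the pipe watched by poll() */
};

/* Proxy thread: performs the blocking call, then wakes the event loop. */
static void *stat_proxy(void *arg)
{
    struct stat_request *req = arg;
    req->status = stat(req->path, &req->result); /* may block on disk I/O */
    char done = 1;
    ssize_t n = write(req->notify_fd, &done, 1); /* completion event */
    (void)n;
    return NULL;
}

/* Runs stat() on a proxy thread and waits for completion through poll().
 * Returns the stat() status (0 or -1), or -2 on setup failure or timeout. */
static int stat_via_proxy(const char *path, struct stat *out, int timeout_ms)
{
    assert(path != NULL && out != NULL);
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -2;

    struct stat_request req = { .path = path, .notify_fd = pipefd[1] };
    pthread_t tid;
    if (pthread_create(&tid, NULL, stat_proxy, &req) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -2;
    }

    /* The pipe's read end behaves like any other pollable descriptor. */
    struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    /* Joining keeps this sketch simple because req lives on this stack frame;
     * an event loop would heap-allocate the request and not join here. */
    pthread_join(tid, NULL);
    close(pipefd[0]);
    close(pipefd[1]);
    if (ready <= 0)
        return -2;
    *out = req.result;
    return req.status;
}
```

Calling stat_via_proxy("/etc/hosts", &st, 1000) from the event loop thread would then cost one thread hand-off plus one poll wake-up, instead of a stat() that might stall every client.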

Indeed there is no other method.
Actually there is another kind of blocking that can't be dealt with other than by threads, and that is page faults. Those may happen in program code, program data, memory allocation or data mapped from files. It's almost impossible to avoid them (you can lock some pages in memory, but that's a privileged operation and would probably backfire by making the kernel do a poor job of memory management somewhere else). So:
You can't really weed out every last chance of blocking for a particular client, so don't bother with the likes of open and stat. The network will probably add larger delays than these functions anyway.
For optimal performance you should have enough threads so that some can be scheduled while the others are blocked on a page fault or a similarly unavoidable blocking point.
Also, if you need to read and process, or process and write, data while handling a network request, it's faster to access the file using memory-mapping, but that's blocking and can't be made non-blocking. So modern network servers tend to stick with blocking calls for most things and simply have enough threads to keep the CPU busy while other threads are waiting for I/O.
The fact that most modern servers are multi-core is another reason why you need multiple threads anyway.

You can use the poll() call to check any number of sockets for data using a single thread.
See here for Linux details, or man poll for the details on your system.
open() and stat() will block the thread they are called from on all POSIX-compliant systems unless they are run asynchronously somewhere else (for example, in a forked child process).
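As a rough illustration of the poll() approach, here is a small helper that watches several descriptors from one thread. The name wait_for_readable and the MAX_FDS cap are assumptions for this sketch, not part of any API.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 64 /* arbitrary cap for this sketch */

/* Waits until at least one descriptor in fds[] is readable.
 * Returns the index of the first ready descriptor, or -1 on timeout or error. */
static int wait_for_readable(const int *fds, int count, int timeout_ms)
{
    assert(count > 0 && count <= MAX_FDS);
    struct pollfd pfds[MAX_FDS];
    for (int i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    /* One system call covers every descriptor in the set. */
    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;

    for (int i = 0; i < count; i++)
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR))
            return i;
    return -1;
}
```

Note that this only helps for sockets, pipes and similar descriptors; poll() reports regular files as always ready, which is why open() and stat() still need a thread or a child process.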

Related

Why IOCP is used?

I am trying to understand why IOCP is used. I can think of two reasons:
Since WSARecv() will not block, then I can handle 1000s of clients without having to create a new thread for each client (also, there is a limit on how many threads you can create, and so the number of clients you can handle will be limited).
Since WSASend() will not block, then when I want to send a large file, I don't have to create a new thread to send it (if I did not create a new thread, the UI thread would block, of course).
What other reasons are there to use IOCP?
IOCP has the benefits that you mention but that is not exclusive to IOCP. I'm not that familiar with the native socket APIs but some Win32 APIs have "overlapped IO" which is asynchronous but does not require IOCP.
Another benefit is that with IOCP the number of request serving threads is (kind of) optimized by the kernel. The kernel is aware of all blocking that request serving threads do and it will see to it that there are enough, and not more, threads unblocked at all times so that the CPU is well-utilized. Ideally, you would never block and there would be as many threads as there are cores (assuming 100% load). That would be very efficient.
IOCP also helps to reduce context switching because instead of switching to another thread to process the results of an IO an existing thread that is busy already simply calls GetQueuedCompletionStatus again.
GetQueuedCompletionStatusEx can be used to reduce the number of transitions to the kernel because you can dequeue multiple IOs in one call.
Also, it cuts down on avoidable bulk data copying and protection ring cycles. Instead of the kernel having to copy data from the network stack buffers into a user-space buffer when requested by a recv() call, user-space buffers are supplied by WSARecv() and the stack can then load them directly in kernel space.

Notifying user-mode as soon as a packet arrives

(This is for a low latency system)
Assuming I have some code which transfers received UDP packets to a region of shared memory, how can I then notify the application (in user mode) that it is now time to read the shared memory? I do not want the application continuously polling eating up cpu cycles.
Is it possible to insert some code in the network stack which can call my application code immediately after it has written to the shared memory?
EDIT I added a C tag, but the application would be in C++
One way to signal an event from one Unix process to another is with POSIX semaphores. You would use sem_open to initialize and open a named semaphore that you can use cross-process.
See How can I get multiple calls to sem_open working in C?.
The lowest latency method to signal an event between processes on the same host is to spin-wait looking for a (shared) memory location to change... this avoids a system call. You expressly said you do not want the application polling, however in a multi-threaded application running on a multi-core system it may not be a bad tradeoff if you really care about latency.
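The spin-wait can be sketched with C11 atomics as below. The shared_region layout and both function names are assumptions for illustration, and the sketch presumes that 32-bit atomics are lock-free on the target, which is what makes them usable across processes in shared memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the shared region: the producer bumps seq
 * after each packet has been fully written. */
struct shared_region {
    _Atomic uint32_t seq;
    /* packet payload would follow here */
};

/* Producer: publish the packet. The release ordering makes the payload
 * writes visible before the new sequence number is. */
static void publish_packet(struct shared_region *r)
{
    assert(r != NULL);
    atomic_fetch_add_explicit(&r->seq, 1, memory_order_release);
}

/* Consumer: spin until seq differs from last_seen and return the new value.
 * This burns a core while waiting but makes no system call. */
static uint32_t spin_until_changed(struct shared_region *r, uint32_t last_seen)
{
    assert(r != NULL);
    uint32_t now;
    while ((now = atomic_load_explicit(&r->seq, memory_order_acquire)) == last_seen)
        ; /* busy-wait */
    return now;
}
```

Dedicating one core to the spinning consumer is the price of this design; on a machine with spare cores that is often acceptable for a latency-critical path.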
Unless you are planning to use a real-time OS, there is no "immediate" protocol. CPU time is handed out in quanta of a few milliseconds, and it usually takes some time before your user thread notices that it can continue.
Considering all of the above, any form of IPC would do: local sockets, signals, pipes, event descriptors etc. The practical difference in performance would be negligible.
Furthermore, using shared memory can lead to unnecessary complications in maintenance and debugging, but that's the designer's choice.

How Blocking IO Affects A Multithreaded Application/Service In Linux

I am exploring several concepts for a web crawler in C on Linux. To decide whether I'll use blocking IO, multiplexed IO, AIO, some combination, etc., I especially need to know (I probably should discover it for myself via some test code, but for expediency I'd prefer to hear from others): when a call to IO in blocking mode is made, is it the particular thread (assuming a multithreaded app/service) or the whole process that is blocked? Even more specifically, in a multithreaded (POSIX) app/service, can a thread dedicated to remote reads/writes block the entire process? If so, how can I unblock such a thread without terminating the entire process?
NB: Whether or not I should use blocking/nonblocking is not really the question here.
Kindly
Blocking calls block only the thread that made them, not the entire process.
Whether to use blocking I/O (with one socket per thread) or non-blocking I/O (with each thread managing multiple sockets) is something you are going to have to benchmark. But as a rule of thumb...
Linux handles multiple threads reasonably efficiently. So if you are only handling a few dozen sockets, using one thread for each is easy to code and should perform well. If you are handling hundreds of sockets, it is a closer call. And for thousands of sockets, you are almost certainly better off using one thread (or process) to manage large groups.
In the latter case, for optimal performance you probably want to use epoll, even though it is Linux-specific.
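For reference, a minimal sketch of the epoll calls involved (Linux only). The helper names watch_fd and next_ready_fd are invented here; a real crawler would fetch events in batches rather than one at a time.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers fd with the epoll instance so that readability is reported. */
static int watch_fd(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Waits for one ready descriptor.
 * Returns its fd, or -1 on timeout or error. */
static int next_ready_fd(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}
```

Unlike select or poll, the interest set is registered once with epoll_ctl and stays in the kernel, so each wait does not have to pass thousands of descriptors again.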

Event Driven IO And Blocking vs NonBlocking

Can someone explain to me how event-driven IO system calls like select, poll, and epoll relate to blocking vs non-blocking IO?
I don't understand how these concepts are related, if at all.
The select system call is supported in almost all Unixes and provides means for userland applications to watch over a group of descriptors and get information about which subset of this group is ready for reading/writing. Its particular interface is a bit clunky and the implementation in most kernels is mediocre at best.
epoll is provided only in Linux for the same purpose, but is a huge improvement over select in terms of efficiency and programming interface. Other Unixes have their specialised calls too.
That said, the event-driven IO system calls do not require either blocking or non-blocking descriptors. Blocking is a behaviour that affects system calls like read, write, accept and connect. select and epoll_wait do have blocking timeouts, but that is something unrelated to the descriptors.
Of course, using these event-driven system calls with blocking descriptors is a bit odd, because you would expect to be able to read the data immediately, without blocking, once you have been notified that it is available. Relying on a blocking descriptor never to block after you have been notified of its readiness is a bit risky, because race conditions are possible.
Non-blocking, event-driven IO can make server applications vastly more efficient because threads are not needed for each descriptor (connection). Compare the Apache web server to Nginx or Lighttpd in terms of performance and you'll see the benefit.
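To make the readiness race harmless, the descriptor can be switched to non-blocking mode so that a read with nothing available returns at once. A minimal sketch; the helper names and the -2 return convention are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Adds O_NONBLOCK to the descriptor's existing flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Reads without ever blocking on a non-blocking descriptor.
 * Returns the byte count (0 means end of file), -2 if nothing is available
 * right now, or -1 on any other error (including EINTR). */
static ssize_t try_read(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2; /* go back to select/poll/epoll and wait again */
    return n;
}
```

With this in place, a stale readiness notification costs one extra system call instead of stalling the whole event loop.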
They're largely unrelated, except that you may want to use non-blocking file descriptors with event-driven IO for the following reasons:
Old versions of Linux definitely have bugs in the kernel where read can block even after select indicated a socket was readable (it happened with UDP sockets and packets with bad checksums). Current versions of Linux may still have some such bugs; I'm not sure.
If there's any possibility that other processes have access to your file descriptors and will read/write to them, or if your program is multi-threaded and other threads might do so, then there is a race condition between select determining that the file descriptor is readable/writable and your program performing IO on it, which could result in blocking.
You almost surely want to make a socket non-blocking before calling connect; otherwise you'll block until the connection is made. Use select for writing to determine when it's successfully connected, and select for errors to determine if the connection failed.
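The connect case can be sketched as follows. This is one possible shape, not a canonical one: connect_with_timeout is a made-up name, it handles IPv4 only, and it checks the outcome with getsockopt(SO_ERROR) once the socket is writable rather than using select's exception set.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a connected, non-blocking socket, or -1 on failure or timeout. */
static int connect_with_timeout(const char *ipv4, uint16_t port, int timeout_sec)
{
    assert(ipv4 != NULL);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (fd >= FD_SETSIZE || flags == -1 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd; /* connected immediately, which can happen on loopback */
    if (errno != EINPROGRESS)
        goto fail;

    /* The socket becomes writable once the attempt has finished either way. */
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
        goto fail; /* timeout or select error */

    /* SO_ERROR tells success (0) apart from a failed connection. */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1 || err != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

In an event-driven program the select call would not sit inside the function; the socket would be added to the loop's write set and the SO_ERROR check would run when the loop reports it writable.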
select and similar functions (you mentioned a few) are usually used to implement an event loop in an event driven system.
I.e., instead of read()ing directly from a socket or file, potentially blocking if no data is available, the application calls select() on multiple file descriptors and waits for data to become available on any one of them.
When select() reports a file descriptor as ready, data is normally available and the read() operation will not block.
This is one way of processing data from multiple sources simultaneously without resorting to multiple threads.

Non-blocking native files access - single-threaded daemon in C?

I've found out that native file access has no "non-blocking" state. (Am I correct?)
I've been googling for daemons which are "non-blocking", and I've found one which achieved said behavior by threading file access operations, so that the daemon won't block.
My question is, wouldn't threading and IPC'ing such operations be rather expensive? Wouldn't it make more sense to either:
A) Pre-thread pool, simply have each client at a thread and let it block for which ever blocking operations it might need. Or,
B) In case of file access blocking, use a relatively small buffer, that way it's still blocking - but one would assume that a tiny buffer for multiple operations would make more sense than paying the price of threading each operation and IPC it?
If you use threading, little IPC overhead is needed. You have the same memory space for all your threads, so a simple mutex or semaphore may be all you need. Now, if you are blocking on a mutex or semaphore too long or too often, why use async I/O in the first place?
As to the actual computation performed by threads doing I/O, they are waiting for the kernel to wake them up most of the time, so I wouldn't worry.
If your application is going to revolve around reading files and other I/O sources, you may want to read up on Reactor patterns, and event-driven programming.
Also, you mentioned a daemon, and servicing clients. If the service you provide is reading files, the computational cost of spawning a new thread to serve each client is minimal, since each individual thread will take "long" to complete requests, and block most of the time anyway. There may be a memory problem if your client count is in the thousands, but otherwise I think you'll do okay.
Give us a little more detail about what you want to do, maybe there are more straightforward ways.

Resources