Why is IOCP used? - c

I am trying to understand why IOCP is used. I can think of two reasons:
Since WSARecv() will not block, I can handle thousands of clients without having to create a new thread for each client (there is also a limit on how many threads you can create, which would cap the number of clients you could handle).
Since WSASend() will not block, I don't have to create a new thread just to send a large file (without a separate thread, the UI thread would of course block).
What other reasons are there to use IOCP?

IOCP has the benefits you mention, but they are not exclusive to IOCP. I'm not that familiar with the native socket APIs, but some Win32 APIs offer "overlapped I/O", which is asynchronous but does not require IOCP.
Another benefit is that with IOCP the number of request serving threads is (kind of) optimized by the kernel. The kernel is aware of all blocking that request serving threads do and it will see to it that there are enough, and not more, threads unblocked at all times so that the CPU is well-utilized. Ideally, you would never block and there would be as many threads as there are cores (assuming 100% load). That would be very efficient.
IOCP also helps to reduce context switching because, instead of switching to another thread to process the result of an I/O, an existing thread that is already busy simply calls GetQueuedCompletionStatus again.
GetQueuedCompletionStatusEx can be used to reduce the number of transitions to the kernel, because you can dequeue multiple I/O completions in one call.
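For illustration, here is a minimal sketch of a worker-thread loop around GetQueuedCompletionStatus; it assumes the completion port was created elsewhere and each socket was associated with it using a pointer to a per-connection context (the CONNECTION struct here is made up for the example):

    /* Minimal sketch of an IOCP worker thread. Assumes the completion port was
     * created with CreateIoCompletionPort() and each socket was associated with
     * the port using a pointer to its connection context as the completion key.
     * The CONNECTION struct is illustrative only. */
    #include <winsock2.h>
    #include <windows.h>
    #include <stdio.h>

    typedef struct {
        SOCKET sock;
        char   buf[4096];
    } CONNECTION;

    DWORD WINAPI WorkerThread(LPVOID param)
    {
        HANDLE iocp = (HANDLE)param;

        for (;;) {
            DWORD        bytes = 0;
            ULONG_PTR    key   = 0;
            LPOVERLAPPED ov    = NULL;

            /* Blocks until an overlapped operation on any associated socket completes. */
            BOOL ok = GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE);

            if (!ok && ov == NULL)
                break;                      /* the call itself failed, e.g. the port was closed */

            CONNECTION *conn = (CONNECTION *)key;
            if (!ok || bytes == 0) {
                closesocket(conn->sock);    /* failed I/O or graceful close */
                continue;
            }

            /* ... process 'bytes' bytes in conn->buf, then post the next WSARecv() ... */
            printf("completed %lu bytes\n", (unsigned long)bytes);
        }
        return 0;
    }

Typically a small pool of such workers (roughly one per core) is started, all blocking on the same port.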

Also, it cuts down on avoidable bulk data copying and protection-ring transitions. Instead of the kernel having to copy data from the network stack's buffers into a user-space buffer when a recv() call asks for it, user-space buffers are supplied to WSARecv() up front, and the stack can then fill them directly from kernel mode.
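As a sketch of what posting such a receive looks like (simplified; the RECV_CTX struct is illustrative and must stay valid until the completion is dequeued):

    /* Sketch of posting an overlapped receive: the user-space buffer is handed
     * to the stack up front, so the data can land in it directly. */
    #include <winsock2.h>

    typedef struct {
        WSAOVERLAPPED ov;
        WSABUF        wsabuf;
        char          buf[4096];
    } RECV_CTX;

    static int post_recv(SOCKET sock, RECV_CTX *ctx)
    {
        DWORD flags = 0;

        ZeroMemory(&ctx->ov, sizeof(ctx->ov));
        ctx->wsabuf.buf = ctx->buf;
        ctx->wsabuf.len = (ULONG)sizeof(ctx->buf);

        /* 0 means it completed immediately; WSA_IO_PENDING means the completion
         * will be delivered to the I/O completion port later. */
        int rc = WSARecv(sock, &ctx->wsabuf, 1, NULL, &flags, &ctx->ov, NULL);
        if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
            return -1;    /* real error */
        return 0;
    }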

Related

Execution Pattern of Multi-Threaded Server on Linux

I would like to know what the execution pattern of a server's threads should be to implement a TCP request-response cycle on a high-performance server (e.g. handling dozens of packets with a single system call, or none, on Linux using packet mmap or some other mechanism).
Design 1) For simplicity, start two threads in main at server startup. One thread just reads packets directly from the network interface(s), e.g. wlan0/eth0, in a while loop around poll(). Once a batch of packets has been read in one cycle, it wakes the other thread with a condition-variable signal; after waking up, the other thread (the sender) processes the packets and sends the TCP responses. (A rough sketch of this handoff follows below.)
Design 2) Start only the receiver thread at the start of the program. The receiver reads packets from the interfaces in a while loop with poll(). When a batch of packets has been received, it creates a sender thread and passes the batch to it as a parameter. The sender thread processes the packets and sends the TCP responses.
(I think Design 2 will be easier to implement, but the question is whether it has any design or performance issues.) The buffer passed from the receiver to the sender has to be allocated before packets are received, so I know the size of the buffer to allocate. In this pattern I also create a new thread for every batch (the thread returns and exits after processing the packets and sending the responses), so I would like to know what the performance cost is of creating a new thread every time I get a batch of packets from the interfaces.
In the first approach I never create more than two threads (or some limited number of threads, which can be tracked easily for logging and debugging since I know how many were created at startup). In the second approach I don't know how many threads are hanging around and executing concurrently.
I would appreciate any advice on how real websites like YouTube and others may have handled this in their high-performance, front-facing servers, if they followed this way of implementing them.
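Here is the rough sketch of the Design 1 handoff I have in mind (simplified; the packet_batch type, sizes, and names are just placeholders):

    /* Rough sketch of Design 1: the receiver fills a batch and signals the sender
     * through a condition variable. */
    #include <pthread.h>
    #include <stdbool.h>

    #define MAX_BATCH 64

    struct packet_batch {
        int  count;
        char pkts[MAX_BATCH][2048];
    };

    static struct packet_batch batch;
    static bool            batch_ready = false;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    static void *receiver(void *arg)
    {
        (void)arg;
        for (;;) {
            /* ... poll() on the interface until packets are available ... */
            pthread_mutex_lock(&lock);
            /* ... read the batch into 'batch' (done under the lock in this sketch) ... */
            batch_ready = true;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    static void *sender(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (!batch_ready)
                pthread_cond_wait(&cond, &lock);
            /* ... process 'batch' and send the TCP responses; a real version would
             * double-buffer so receiving and sending can overlap ... */
            batch_ready = false;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t rx, tx;
        pthread_create(&rx, NULL, receiver, NULL);
        pthread_create(&tx, NULL, sender, NULL);
        pthread_join(rx, NULL);
        pthread_join(tx, NULL);
        return 0;
    }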
First, when it comes to a 'real' website the magic lies in having load balancers and a whole bunch of worker nodes to take the load, so you easily exceed the boundary of a single system. For example, take a look at the AWS reference architecture for serving web pages at scale: the AWS Cloud Architecture for serving web whitepaper.
That being said, taking this one level down, it is always interesting to look at how other well-known products have solved this issue. For example, NGINX has an excellent infographic and a matching blog post describing its architecture and threading.

Non-blocking access to the file system

When writing a non-blocking program (handling multiple sockets) which at some point needs to open files with open(2), stat(2) them, or open directories with opendir(2), how can I ensure that these system calls do not block?
To me it seems there is no alternative other than using threads or fork(2).
As Mel Nicholson replied, for everything file-descriptor based you can use select/poll/epoll. For everything else you can have a proxy thread per item (or a thread pool) with a small stack that converts (by means of the kernel scheduler) any synchronous blocking wait into a select/poll/epoll-able asynchronous event, using eventfd or a Unix pipe (where portability is required).
The proxy thread blocks until the operation completes and then writes to the eventfd or the pipe to wake up the select/poll/epoll loop (a sketch follows below).
Indeed there is no other method.
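A minimal sketch of that proxy-thread idea on Linux, using eventfd (the stat_req struct and names are made up for illustration, and error handling is omitted):

    /* Sketch: a proxy thread performs a blocking stat() and then signals an eventfd,
     * which the main loop can poll/epoll alongside its sockets. */
    #include <sys/eventfd.h>
    #include <sys/stat.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    struct stat_req {
        int         efd;     /* eventfd to signal when the operation is done */
        const char *path;
        struct stat st;
        int         result;  /* return value of stat() */
    };

    static void *stat_proxy(void *arg)
    {
        struct stat_req *req = arg;

        req->result = stat(req->path, &req->st);   /* the blocking part */

        uint64_t one = 1;
        if (write(req->efd, &one, sizeof(one)) < 0) {
            /* nothing useful to do in a sketch */
        }
        return NULL;
    }

    /* Usage sketch:
     *   req->efd = eventfd(0, EFD_NONBLOCK);
     *   pthread_create(&tid, NULL, stat_proxy, req);
     *   ... add req->efd to the same poll()/epoll set as the sockets; when it
     *   becomes readable, read its 8-byte counter and use req->st / req->result. */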
Actually there is another kind of blocking that can't be dealt with other than by threads, and that is page faults. These may happen in program code, program data, memory allocation, or data mapped from files. It's almost impossible to avoid them (you can lock some pages into memory, but that is a privileged operation and would probably backfire by making the kernel do a poor job of memory management elsewhere). So:
You can't really weed out every last chance of blocking for a particular client, so don't bother with the likes of open and stat. The network will probably add larger delays than these functions anyway.
For optimal performance you should have enough threads so some can be scheduled if the others are blocked on page fault or similar difficult blocking point.
Also, if you need to read and process, or process and write, data while handling a network request, it's faster to access the file using memory mapping (a sketch follows below), but that is blocking and can't be made non-blocking. So modern network servers tend to stick with blocking calls for most things and simply have enough threads to keep the CPU busy while other threads are waiting for I/O.
The fact that most modern servers are multi-core is another reason why you need multiple threads anyway.
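As a sketch of that memory-mapping approach (simplified; a real server would loop on partial writes, and the function names are just illustrative):

    /* Sketch of serving a file via memory mapping from a worker thread; the mapping
     * and the page faults while reading it may block, which is why this does not
     * belong in the event loop. */
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int send_file_mapped(int sock, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }
        if (st.st_size == 0) {      /* mmap() of length 0 would fail */
            close(fd);
            return 0;
        }

        void *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                  /* the mapping keeps the file contents available */
        if (data == MAP_FAILED)
            return -1;

        ssize_t sent = write(sock, data, (size_t)st.st_size);   /* may fault pages in */

        munmap(data, (size_t)st.st_size);
        return sent < 0 ? -1 : 0;
    }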
You can use the poll() call to check any number of sockets for data from a single thread (a sketch follows below).
See here for the Linux details, or man poll for the details on your system.
open() and stat() will block the thread they are called from on all POSIX-compliant systems, unless they are invoked via an asynchronous tactic (such as from a forked child process).
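A single-threaded poll() loop over a set of sockets looks roughly like this (a sketch; it assumes nsocks stays within the array bound and the descriptors are already connected):

    /* Sketch of a single-threaded poll() loop over several sockets. */
    #include <poll.h>
    #include <unistd.h>

    #define MAX_SOCKS 64

    void serve(int *socks, int nsocks)
    {
        struct pollfd fds[MAX_SOCKS];

        for (int i = 0; i < nsocks; i++) {
            fds[i].fd     = socks[i];
            fds[i].events = POLLIN;            /* interested in readable data */
        }

        for (;;) {
            if (poll(fds, nsocks, -1) < 0)     /* block until something is readable */
                break;                         /* error or signal; sketch only */

            for (int i = 0; i < nsocks; i++) {
                if (fds[i].revents & POLLIN) {
                    char buf[4096];
                    ssize_t n = read(fds[i].fd, buf, sizeof(buf));
                    if (n <= 0) {
                        close(fds[i].fd);
                        fds[i].fd = -1;        /* poll() ignores negative fds */
                    } else {
                        /* ... handle n bytes from this client ... */
                    }
                }
            }
        }
    }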

How Blocking IO Affects A Multithreaded Application/Service In Linux

I am exploring several concepts for a web crawler in C on Linux. To decide whether I'll use blocking I/O, multiplexed I/O, AIO, some combination, etc., I especially need to know (I probably should discover it for myself via some test code, but for expediency I prefer to hear from others): when a blocking I/O call is made, is it only the particular thread (assuming a multithreaded app/service) that is blocked, or the whole process itself? Even more specifically, in a multithreaded (POSIX) app/service, can a thread dedicated to remote reads/writes block the entire process? If so, how can I unblock such a thread without terminating the entire process?
NB: Whether or not I should use blocking/nonblocking is not really the question here.
Blocking calls block only the thread that made them, not the entire process.
Whether to use blocking I/O (with one socket per thread) or non-blocking I/O (with each thread managing multiple sockets) is something you are going to have to benchmark. But as a rule of thumb...
Linux handles multiple threads reasonably efficiently. So if you are only handling a few dozen sockets, using one thread for each is easy to code and should perform well. If you are handling hundreds of sockets, it is a closer call. And for thousands of sockets, you are almost certainly better off using one thread (or process) to manage large groups.
In the latter case, for optimal performance you probably want to use epoll, even though it is Linux-specific.
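For reference, a minimal epoll loop of that shape might look like this (a sketch; socket setup and error handling are omitted, and the function name is just illustrative):

    /* Sketch of the epoll variant: one thread managing a large group of sockets. */
    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 128

    void epoll_serve(int *socks, int nsocks)
    {
        int epfd = epoll_create1(0);
        if (epfd < 0)
            return;

        for (int i = 0; i < nsocks; i++) {
            struct epoll_event ev = { .events = EPOLLIN, .data.fd = socks[i] };
            epoll_ctl(epfd, EPOLL_CTL_ADD, socks[i], &ev);
        }

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            if (n < 0)
                break;

            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) {
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                } else {
                    /* ... handle r bytes from this client ... */
                }
            }
        }
        close(epfd);
    }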

Non-blocking native file access - single-threaded daemon in C?

I've found out that native file access has no "non-blocking" mode. (Am I correct?)
I've been googling for daemons which are "non-blocking", and I found one that achieves this by performing file-access operations in threads so that the daemon itself won't block.
My question is, wouldn't threading and IPC'ing such operations be rather expensive? Wouldn't it make more sense to either:
A) Pre-spawn a thread pool, give each client its own thread, and let it block on whichever blocking operations it needs. Or,
B) In the case of blocking file access, use a relatively small buffer, so it is still blocking - but one would assume that a tiny buffer across multiple operations makes more sense than paying the price of threading each operation and doing IPC for it?
If you use threading, little IPC overhead is needed. You have the same memory space for all your threads, so a simple mutex or semaphore may be all you need. Now, if you are blocking on a mutex or semaphore too long or too often, why use async I/O in the first place?
As to the actual computation performed by threads doing I/O, they are waiting for the kernel to wake them up most of the time, so I wouldn't worry.
If your application is going to revolve around reading files and other I/O sources, you may want to read up on the Reactor pattern and event-driven programming.
Also, you mentioned a daemon, and servicing clients. If the service you provide is reading files, the computational cost of spawning a new thread to serve each client is minimal, since each individual thread will take "long" to complete requests, and block most of the time anyway. There may be a memory problem if your client count is in the thousands, but otherwise I think you'll do okay.
Give us a little more detail about what you want to do; maybe there are more straightforward ways.

Questions about multithreading for sockets/TCP connections

I have a server that connects to multiple clients over TCP/IP, using C on Unix. Since it won't have more than 20 connections at a time, I figured I would use a thread per connection/socket. But the problem is writing to the sockets, as I'll be sending user-prompted messages to the clients. Once each socket is handled by a thread, how do I interact with that thread to write to its socket? Should each thread just read from its socket while I write to the sockets from the main program? I'm not sure if that's a good way to go about it.
My rule of thumb is that any given socket should only be operated on by a single thread(*). So if you spawn a separate I/O thread for each socket, and your main thread wants something written to an I/O thread's socket, then the main thread should send that data to the I/O thread, whereupon the I/O thread can write it to the socket.
Of course, this means you need to have a good communications method between the main thread and the I/O thread; which you could do by spawning a socket-pair for each I/O thread and having the I/O threads select()/poll() on their end of the socket-pair (to handle data coming from the main thread) as well as on their network socket.
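As a sketch of that wiring (a per-connection I/O thread with a socketpair back to the main thread; the io_ctx struct and names are illustrative only):

    /* Sketch: each I/O thread owns a network socket plus one end of a socketpair.
     * The main thread writes outgoing data into the other end; the I/O thread
     * poll()s both descriptors and forwards socketpair data to the network socket. */
    #include <sys/socket.h>
    #include <poll.h>
    #include <pthread.h>
    #include <unistd.h>

    struct io_ctx {
        int net_fd;   /* the client connection */
        int cmd_fd;   /* this thread's end of the socketpair */
    };

    static void *io_thread(void *arg)
    {
        struct io_ctx *ctx = arg;
        struct pollfd fds[2] = {
            { .fd = ctx->net_fd, .events = POLLIN },
            { .fd = ctx->cmd_fd, .events = POLLIN },
        };
        char buf[4096];

        for (;;) {
            if (poll(fds, 2, -1) < 0)
                break;

            if (fds[0].revents & POLLIN) {          /* data from the network */
                ssize_t n = read(ctx->net_fd, buf, sizeof(buf));
                if (n <= 0)
                    break;
                /* ... process incoming data ... */
            }
            if (fds[1].revents & POLLIN) {          /* data from the main thread */
                ssize_t n = read(ctx->cmd_fd, buf, sizeof(buf));
                if (n > 0 && write(ctx->net_fd, buf, n) < 0)   /* sketch: no short-write handling */
                    break;
            }
        }
        close(ctx->net_fd);
        close(ctx->cmd_fd);
        return NULL;
    }

    /* Main-thread side (sketch): create the pair with
     *   int sv[2];
     *   socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
     * keep sv[0], hand sv[1] to the I/O thread as cmd_fd, and write() to sv[0]
     * whenever something must be sent to that client. */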
But once you've done that, you're dealing with complexity of using select()/poll() AND multithreading, which is a lot of complexity overhead. So unless you absolutely need multithreading for some reason, I agree with the previous posters -- it's better to just handle all the sockets in a single thread, via select() or poll().
(*) It's possible to have multiple threads reading/writing to the same socket at the same time, but it's error-prone. In particular, startup and shutdown sequences can be tricky to get 100% right. That's why I try to avoid 'sharing' a given socket amongst multiple threads.
Sounds like you'd probably be better off with a single thread and multiplexing the sockets (using select, poll, etc.). This will avoid the race conditions and locking requirements which would otherwise make the program more difficult to write.
Unless you are doing significant processor-intensive work, or waiting for IO on behalf of these clients, you'll get no benefit from using threads anyway, but the race conditions will still be there.
So I'd say, get a working implementation using a single thread, THEN if in performance testing you discover that it is lacking, refactor it to use multithreading if that seems like the best option to beat the performance problems (of course you'll be profiling it etc).
Having the main thread write to the sockets is fine, you only need to worry about having multiple threads writing to a socket at the same time.
However, I'd test the performance of using a single thread and select/poll before bothering with the multithreaded approach.
