So I have just discovered that libuv is a fairly small library as far as C libraries go (compared to, say, FFmpeg). I have spent the past 6 hours reading through the source code to get a feel for the event loop at a deeper level, but I am still not seeing where the "non-blockingness" is implemented: where some event, interrupt, signal, or whatnot is being invoked in the codebase.
I have been using Node.js for over 8 years, so I am familiar with how to use an async, non-blocking event loop, but I never actually looked into the implementation.
My question is twofold:
Where exactly is the "looping" occurring within libuv?
What are the key steps in each iteration of the loop that make it non-blocking and async?
So we start with a hello world example. All that is required is this:
#include <stdio.h>
#include <stdlib.h>
#include <uv.h>
int main() {
uv_loop_t *loop = malloc(sizeof(uv_loop_t));
uv_loop_init(loop); // initialize datastructures.
uv_run(loop, UV_RUN_DEFAULT); // infinite loop as long as queue is full?
uv_loop_close(loop);
free(loop);
return 0;
}
The key function which I have been exploring is uv_run. The uv_loop_init function essentially just initializes data structures, so there's not too much fanciness there, I don't think. But the real magic seems to happen with uv_run, somewhere. A high-level set of code snippets from the libuv repo is in this gist, showing what the uv_run function calls.
Essentially it seems to boil down to this:
while (NOT_STOPPED) {
uv__update_time(loop)
uv__run_timers(loop)
uv__run_pending(loop)
uv__run_idle(loop)
uv__run_prepare(loop)
uv__io_poll(loop, timeout)
uv__run_check(loop)
uv__run_closing_handles(loop)
// ... cleanup
}
Those functions are in the gist.
uv__run_timers: runs timer callbacks? loops with for (;;) {.
uv__run_pending: runs regular callbacks? loops through queue with while (!QUEUE_EMPTY(&pq)) {.
uv__run_idle: no source code
uv__run_prepare: no source code
uv__io_poll: does I/O polling? (can't quite tell what this means, though). Has 2 loops: while (!QUEUE_EMPTY(&loop->watcher_queue)) {, and for (;;) {.
And then we're done. And the program exits, because there is no "work" to be done.
So I think I have answered the first part of my question after all this digging, and the looping is specifically in these 3 functions:
uv__run_timers
uv__run_pending
uv__io_poll
But not having implemented anything with kqueue or multithreading and having dealt relatively little with file descriptors, I am not quite following the code. This will probably help out others along the path to learning this too.
So the second part of the question is: what are the key steps in these 3 functions that implement the non-blockingness? Assuming this is where all the looping exists.
Not being a C expert: does for (;;) { "block" the event loop? Or can it run indefinitely while other parts of the code are jumped to by OS system events, or something like that?
So uv__io_poll calls poll(...) in that endless loop. I don't think that call is non-blocking; is that correct? That seems to be mainly all it does.
Looking into kqueue.c there is also a uv__io_poll, so I assume the poll implementation is a fallback and kqueue is used on Mac instead; is that the non-blocking part?
So is that it? Is it just looping in uv__io_poll and each iteration you can add to the queue, and as long as there's stuff in the queue it will run? I still don't see how it's non-blocking and async.
Can someone outline, similarly to this, how it is async and non-blocking, and which parts of the code to take a look at? Basically, I would like to see where the "free processor idleness" exists in libuv. Where is the processor ever free in the call to our initial uv_run? If it is free, how does it get reinvoked, like an event handler? (Like a browser event handler from the mouse, an interrupt.) I feel like I'm looking for an interrupt but not seeing one.
I ask this because I want to implement an MVP event loop in C, but I just don't understand how non-blockingness is actually implemented. Where the rubber meets the road.
I think that trying to understand libuv is getting in the way of your understanding of how reactors (event loops) are implemented in C, and it is the latter you need to understand, as opposed to the exact implementation details behind libuv.
(Note that when I say "in C", what I really mean is "at or near the system call interface, where userland meets the kernel".)
All of the different backends (select, poll, epoll, etc) are, more-or-less, variations on the same theme. They block the current process or thread until there is work to be done, like servicing a timer, reading from a socket, writing to a socket, or handling a socket error.
When the current process is blocked, it literally is not getting any CPU cycles assigned to it by the OS scheduler.
Part of the issue behind understanding this stuff, IMO, is the poor terminology: "async" and "sync" in JS-land don't really describe what these things are. Really, in C, we're talking about non-blocking vs. blocking I/O.
When we read from a blocking file descriptor, the process (or thread) is blocked -- prevented from running -- until the kernel has something for it to read; when we write to a blocking file descriptor, the process is blocked until the kernel accepts the entire buffer.
In non-blocking I/O, it's exactly the same, except the kernel won't stop the process from running when there is nothing to do: instead, when you read or write, it tells you how much you read or wrote (or if there was an error).
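For example, here is a minimal sketch of that difference (my own illustration, not libuv code): switch a descriptor to non-blocking mode with fcntl(), then read from it; instead of suspending the process, the kernel reports "would block" via errno.

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: put fd into non-blocking mode, then try to read.
 * Returns bytes read, 0 on EOF, -2 if the read would have blocked
 * (nothing available yet), or -1 on a real error. */
ssize_t try_read(int fd, char *buf, size_t len) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    ssize_t n = read(fd, buf, len);
    if (n >= 0)
        return n;                                 /* data (or EOF if 0) */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -2;                                /* would have blocked */
    return -1;                                    /* real error */
}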
The select system call (and friends) prevent the C developer from having to try and read from a non-blocking file descriptor over and over again -- select() is, in effect, a blocking system call that unblocks when any of the descriptors or timers you are watching are ready. This lets the developer build a loop around select, servicing any events it reports, like an expired timeout or a file descriptor that can be read. This is the event loop.
So, at its very core, what happens at the C-end of a JS event loop is roughly this algorithm:
while(true) {
    select(open fds, timeout);
    if (did_the_timeout_expire())
        run_js_timers();
    for (each error fd)
        run_js_error_handler(fdJSObjects[fd]);
    for (each read-ready fd)
        emit_data_events(fdJSObjects[fd], read_as_much_as_I_can(fd));
    for (each write-ready fd) {
        if (!pendingData(fd))
            continue;   /* nothing queued for this fd; check the next one */
        write_as_much_as_I_can(fd);
        pendingData = whatever_was_leftover_that_couldnt_write;
    }
}
FWIW - I have actually written an event loop for v8 based around select(): it really is this simple.
It's important also to remember that JS always runs to completion. So, when you call a JS function (via the v8 api) from C, your C program doesn't do anything until the JS code returns.
NodeJS uses some optimizations, like handling pending writes in separate pthreads, but these all happen in "C space" and you shouldn't think/worry about them when trying to understand this pattern, because they're not relevant.
You might also be fooled into thinking that JS isn't run to completion when dealing with things like async functions -- but it absolutely is, 100% of the time -- if you're not up to speed on this, do some reading with respect to the event loop and the microtask queue. Async functions are basically a syntax trick, and their "completion" involves returning a Promise.
I just took a dive into libuv's source code, and found at first that it seems like it does a lot of setup, and not much actual event handling.
Nonetheless, a look into src/unix/kqueue.c reveals some of the inner mechanics of event handling:
int uv__io_check_fd(uv_loop_t* loop, int fd) {
struct kevent ev;
int rc;
rc = 0;
EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
if (kevent(loop->backend_fd, &ev, 1, NULL, 0, NULL))
rc = UV__ERR(errno);
EV_SET(&ev, fd, EVFILT_READ, EV_DELETE, 0, 0, 0);
if (rc == 0)
if (kevent(loop->backend_fd, &ev, 1, NULL, 0, NULL))
abort();
return rc;
}
The file descriptor polling is done here, "setting" the event with EV_SET (similar to how you use FD_SET before checking with select()), and the check itself is done via the kevent() call.
This is specific to kqueue-style events (mainly used on BSD-likes a la MacOS), and there are many other implementations for different Unices, but they all use the same function name to do non-blocking I/O checks. See here for another implementation using epoll.
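For comparison, a hedged sketch of what an equivalent check could look like against epoll on Linux (illustrative only, not libuv's actual code):

#include <errno.h>
#include <sys/epoll.h>

/* Sketch: probe whether fd can be watched by epoll, by adding it and
 * immediately removing it again, analogous to the kqueue version above. */
int check_fd_with_epoll(int epoll_fd, int fd) {
    struct epoll_event e;
    int rc = 0;

    e.events = EPOLLIN;
    e.data.fd = fd;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &e))
        rc = -errno;                 /* e.g. EPERM for a regular file */

    if (rc == 0)
        if (epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, &e))
            return -errno;

    return rc;
}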
To answer your questions:
1) Where exactly is the "looping" occurring within libuv?
The QUEUE data structure is used for storing and processing events. This queue is filled by the platform- and IO- specific event types you register to listen for. Internally, it uses a clever linked-list using only an array of two void * pointers (see here):
typedef void *QUEUE[2];
I'm not going to get into the details of this list, all you need to know is it implements a queue-like structure for adding and popping elements.
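As a rough illustration of the idea (my own simplified sketch, not libuv's actual macros): each element embeds a two-pointer node that links it into a circular doubly-linked list, so pushing and popping are O(1) and require no extra allocation.

#include <stddef.h>

/* Sketch of an intrusive circular doubly-linked queue in the spirit of
 * libuv's QUEUE: every element carries its own pair of link pointers. */
typedef struct queue_node {
    struct queue_node *next;
    struct queue_node *prev;
} queue_node;

static void queue_init(queue_node *q)        { q->next = q; q->prev = q; }
static int  queue_empty(const queue_node *q) { return q->next == q; }

/* Insert node n at the tail of the list headed by q. */
static void queue_insert_tail(queue_node *q, queue_node *n) {
    n->next = q;
    n->prev = q->prev;
    q->prev->next = n;
    q->prev = n;
}

/* Unlink node n from whatever list it is on. */
static void queue_remove(queue_node *n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
}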
Once you have file descriptors in the queue that are generating data, the asynchronous I/O code mentioned earlier will pick them up. The backend_fd within the uv_loop_t structure is the generator of data for each type of I/O.
2) What are the key steps in each iteration of the loop that make it non-blocking and async?
libuv is essentially a wrapper (with a nice API) around the real workhorses here, namely kqueue, epoll, select, etc. To answer this question completely, you'd need a fair bit of background in kernel-level file descriptor implementation, and I'm not sure if that's what you want based on the question.
The short answer is that the underlying operating systems all have built-in facilities for non-blocking (and therefore async) I/O. How each system works is a little outside the scope of this answer, I think, but I'll leave some reading for the curious:
https://www.quora.com/Network-Programming-How-is-select-implemented?share=1
The first thing to keep in mind is that work must be added to libuv's queues using its API; one cannot just load up libuv, start its main loop, and then code up some I/O and get async I/O.
The queues maintained by libuv are managed by looping. The infinite loop in uv__run_timers isn't actually infinite; notice that the first check verifies that a soonest-expiring timer exists (presumably, if the list is empty, this is NULL), and if not, breaks the loop and the function returns. The next check breaks the loop if the current (soonest-expiring) timer hasn't expired. If neither of those conditions breaks the loop, the code continues: it restarts the timer, calls its timeout handler, and then loops again to check more timers. Most times when this code runs, it's going to break the loop and exit, allowing the other loops to run.
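A self-contained sketch of that shape, with made-up types standing in for libuv's real structures (illustrative only):

#include <stdint.h>
#include <stddef.h>

/* Illustrative types; libuv's actual timer structures are richer. */
typedef struct my_timer {
    uint64_t timeout;                 /* absolute expiry time, in ms */
    void (*cb)(struct my_timer *t);   /* user callback */
    struct my_timer *next;            /* pretend this is a sorted list/heap */
} my_timer;

/* Shape of the timer phase described above: run every timer that is
 * already due, then return so the other loop phases can run. */
void run_timers(my_timer **head, uint64_t now) {
    for (;;) {
        my_timer *t = *head;          /* soonest-expiring timer */
        if (t == NULL)
            break;                    /* no timers registered */
        if (t->timeout > now)
            break;                    /* soonest timer not due yet */

        *head = t->next;              /* "stop" the timer (one-shot here) */
        t->cb(t);                     /* run its callback */
        /* loop again: the next-soonest timer may also be due */
    }
}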
What makes all this non-blocking is the caller/user following the guidelines and API of libuv: adding your work to queues, and allowing libuv to perform its work on those queues. Processing-intensive work may block these loops and other work from running, so it's important to break your work into chunks.
BTW, the source code for uv__run_idle, uv__run_check, and uv__run_prepare is defined in src/unix/loop-watcher.c.
Related
I am writing a graphical program for Linux, in C, using Xlib. The problem is simple.
I have an event, that occurs outside of the Xlib event loop, that needs to be handled and will cause some change to what is displayed to the screen. In my case, I am trying to make a cursor blink inside of a textbox. My event loop looks like this:
XEvent event;
while(window.loop_running == 1) {
XNextEvent(window.dis, &event);
event_handler(&window, &event); // This is a custom function to handle events
}
Solution 1a:
My first thought was to create a 2nd loop with pthreads. This 2nd loop could handle the asynchronous event and draw as needed to the screen. However, Xlib doesn't play nice with multiple threads. Even if I used a mutex to block the event_handler() function during asynchronous drawing in the 2nd loop, I still periodically had crashes from X. Additionally, if the event loop is not cycled, the program locks up after about 10 calls from the pthread loop.
Solution 1b:
This could be solved by calling XInitThreads() at the start of my program. However, this causes valgrind to report a memory leak; it seems that there is some memory allocated that is not freed on exit. And calls to XCloseDisplay() just hang if I call XInitThreads(). I still haven't figured out how to destroy and clean up windows in my program, but that might be better saved for a separate question. Additionally, calling XInitThreads() at the start of my program stops the program from freezing after 10 calls are made from the pthread loop without cycling of the event loop. However, calls to X start to block after about 10 calls from the pthread loop. Things briefly resume after the event loop is cycled, such as by mousing over the window, but calls quickly start blocking again when in-loop event activity ceases. Interestingly, I noticed that I could replicate this issue in some other programs such as Bluefish: if I open Bluefish, start a cursor in the main textbox, and then mouse out, after about 10 seconds the cursor stops blinking. Clearly this isn't always an issue, since otherwise things like a video player's display would freeze after some period of no X events being triggered.
Solution 1c:
I can stop the window from freezing by using XSendEvent() to cycle the event loop after drawing is completed from the pthread loop. However, this seems really hacky, and I can't guarantee that it will work, since I don't know at exactly what point X will stop listening. I haven't been able to determine the root cause of this issue. As I said, it seems to happen after about 10 seconds, but this varies depending on how I change the cycle rate of the blinking cursor. I'm tempted to guess that it is a function of the actual number of calls to X being made: there are approximately 2 per pixel per redraw, since it has to 1) set the foreground color and 2) draw the pixel from the bitmap buffer to the screen. Currently, my window only supports a resolution of 640x480. Of course, I am just guessing that this can be used to determine the failure point, since I really don't know the cause.
Solution 2:
I can drop all of this and re-implement the event loop by polling the event queue with XEventsQueued(), handling events as they come. But I'll be honest, I hate this solution. It is a really hacky solution that will increase the processing power required for this application and increase the event response latency, since I would want to sleep the thread between polls to prevent just spinning the thread and pegging the CPU core. I am writing this program with the goal of a fast, stable, and lean program.
Does anyone have a solution? It's such a simple and fundamental problem, but I have only seen sample applications that use XNextEvent in an event loop. I haven't found any samples of how to handle out-of-event-loop events. Thanks for the help. I am a brand new member of Stack Overflow; this is my first post, so I apologize if I make a mistake.
User: mosvy
Wrote this comment:
You should use poll() with the fd obtained with ConnectionNumber() and the fds your other events come on. When the X11 fd is "ready", you process the events with while(XPending()){ XNextEvent(); ... }. Even then, X11 functions which are of the form request/reply (eg XQueryTree) may stall your event loop. The solution is to switch to xcb (where you could split those in their request/reply parts). IMHO xcb is just as ugly and not much better than Xlib, but it's the only thing readily available.
This works great! Thanks for pointing me in the right direction. I wish you would have written that as an answer. I would have accepted it for you.
EDIT: Deleted my previous edit because I later discovered that the problem was a mistake elsewhere in my full program. In fact, the edit was incorrect and the code didn't correctly read in events.
Here is how my event loop looks now. I'll probably reorganize it, to try to clean it up. But as a proof of concept, here:
// window is a custom struct defined and setup elsewhere in my program
// event_handler() is a custom function elsewhere in my program
window.fd = XConnectionNumber(window.dis);
struct pollfd fds;
fds.fd = window.fd;
fds.events = POLLIN;
int poll_ret;
XEvent event;
while(window.loop_running == 1) {
poll_ret = poll(&fds, 1, 10);
if (poll_ret < 0) {
window.loop_running = 0;
} else if(poll_ret > 0) {
while (XPending(window.dis) > 0) {
XNextEvent(window.dis, &event);
event_handler(&window, &event);
}
}
}
I can use poll() to listen for all events using kernel file descriptors, including X events. The file descriptor for X events is obtained with ConnectionNumber(). I also needed to set it to listen to file descriptor events of type POLLIN. poll() accepts an array of file descriptors; see the documentation for more information on the schema. When poll() returns, I can check whether it timed out or an event is enqueued, and then act accordingly.
I believe this will work for custom events if needed. To do that, I will probably just need to set up my own file descriptors to interact with. I'll look into it if needed.
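For instance, here is a hedged sketch of one way to do that with a self-pipe, whose read end simply becomes another entry in the pollfd array (the names are illustrative; x11_fd stands for the descriptor from XConnectionNumber()):

#include <poll.h>
#include <unistd.h>

/* Sketch: wake the poll() loop from another thread via a self-pipe.
 * Another thread triggers the custom event with: write(wake_pipe[1], "x", 1); */
void run_loop_with_wakeup(int x11_fd) {
    int wake_pipe[2];
    pipe(wake_pipe);                         /* [0] = read end, [1] = write end */

    struct pollfd fds[2];
    fds[0].fd = x11_fd;
    fds[0].events = POLLIN;
    fds[1].fd = wake_pipe[0];                /* our custom event source */
    fds[1].events = POLLIN;

    for (;;) {
        int ret = poll(fds, 2, 10);
        if (ret < 0)
            break;                           /* poll error */
        if (ret == 0)
            continue;                        /* timeout, nothing to do */

        if (fds[0].revents & POLLIN) {
            /* drain X events with XPending()/XNextEvent() as above */
        }
        if (fds[1].revents & POLLIN) {
            char byte;
            read(wake_pipe[0], &byte, 1);    /* consume the wake-up byte */
            /* handle the custom event, e.g. redraw the blinking cursor */
        }
    }
}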
This solution means that I do not need to call XInitThreads() since I am not ever making simultaneous calls to X.
What would be the correct way to prevent a soft lockup/unresponsiveness in a long-running while loop in a C program?
(dmesg is reporting a soft lockup)
Pseudo code is like this:
while( worktodo ) {
worktodo = doWork();
}
My code is of course way more complex, and also includes a printf statement which gets executed once a second to report progress, but the problem is that the program ceases to respond to Ctrl+C at this point.
Things I've tried which do work (but I want an alternative):
doing printf every loop iteration (don't know why, but the program becomes responsive again that way (???)) - wastes a lot of performance due to unneeded printf calls (each doWork() call does not take very long)
using sleep/usleep/... - also seems like a waste of (processing-)time to me, as the whole program will already be running several hours at full speed
What I'm thinking about is some kind of process_waiting_events() function or the like, and normal signals seem to be working fine as I can use kill on a different shell to stop the program.
Additional background info: I'm using GWAN and my code is running inside the main.c "maintenance script", which seems to be running in the main thread as far as I can tell.
Thank you very much.
P.S.: Yes I did check all other threads I found regarding soft lockups, but they all seem to ask about why soft lockups occur, while I know the why and want to have a way of preventing them.
P.P.S.: Optimizing the program (making it run shorter) is not really a solution, as I'm processing a 29GB bz2 file which extracts to about 400GB xml, at the speed of about 10-40MB per second on a single thread, so even at max speed I would be bound by I/O and still have it running for several hours.
While the posted answer using threads might possibly be an option, in reality it would just shift the problem to a different thread. My solution after all was using
sleep(0)
I also tested sched_yield / pthread_yield, both of which didn't really help. Unfortunately I've been unable to find a good resource which documents sleep(0) on Linux, but for Windows the documentation states that using a value of 0 lets the thread yield its remaining part of the current CPU slice.
It turns out that sleep(0) most probably relies on what is called timer slack in Linux - an article about this can be found here: http://lwn.net/Articles/463357/
Another possibility is using nanosleep(&(struct timespec){0}, NULL), which seems to not necessarily rely on timer slack - the Linux man pages for nanosleep state that if the requested interval is below clock granularity, it will be rounded up to clock granularity, which on Linux depends on CLOCK_MONOTONIC according to the man pages. Thus, a value of 0 nanoseconds is perfectly valid and should always work, as clock granularity can never be 0.
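In loop form, a sketch of how this fits into the original pseudo code (doWork() stands in for the real work step):

#include <time.h>

int doWork(void);   /* the long-running work step from the question */

void work_loop(void) {
    int worktodo = 1;
    while (worktodo) {
        worktodo = doWork();
        /* 0 ns, rounded up to clock granularity: enough of a yield for the
         * kernel to deliver pending signals and schedule other threads. */
        nanosleep(&(struct timespec){0}, NULL);
    }
}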
Hope this helps someone else as well ;)
Your scenario is not really a soft lockup; it is a process that is busy doing something.
How about this pseudo code:
void workerThread()
{
while(workToDo)
{
if(threadSignalled)
break;
workToDo = DoWork()
}
}
void sighandler()
{
signal worker thread to finish
waitForWorkerThreadFinished;
}
void main()
{
InstallSignalHandler;
CreateSemaphore
StartThread;
waitForWorkerThreadFinished;
}
Clearly a timing issue. Using a signalling mechanism should remove the problem.
The use of printf solves the problem because printf accesses the console, which is an expensive and time-consuming operation that, in your case, gives enough time for the worker to complete its work.
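For completeness, here is a hedged single-threaded variant of the same signalling idea: the SIGINT handler only sets a flag, and the work loop tests that flag each iteration (illustrative; doWork() stands in for the real work step).

#include <signal.h>

int doWork(void);                        /* the asker's work step (assumed) */

static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo) {
    (void)signo;
    stop_requested = 1;                  /* only set a flag; no I/O here */
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);

    int worktodo = 1;
    while (worktodo && !stop_requested)
        worktodo = doWork();             /* loop exits promptly on Ctrl+C */

    return 0;
}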
I have to send a command over serial and receive back an answer based on the command and do something based on the message received. I was told that I have to use callbacks as this is an asynchronous operation.
I have 2 threads, one that can send messages and one that receives the messages.
Example:
//Thread 1
sendMessage("Initialize");
//Thread 2
while(1)
{
checkForMessages();
}
How can I write a function that is initialized for a specific message and handles the message received?
Example:
CommHandle(Command,MsgReceived)
{
if(command)
{
if(MsgReceived == ok)
...
if(MsgReceived == error)
...
}
}
I was told that I have to use callbacks as this is an asynchronous operation.
Not necessarily. There is something in Windows called "asynchronous I/O"; this is to be regarded as an internal Windows term, and it is synonymous with "overlapped I/O" (explanation here). When you are using overlapped I/O, you will get a callback when the transmission is finished. This is nice, since it reduces CPU load, but it is not really necessary if your program has nothing better to do while waiting. So it depends on the nature of your application.
But no matter the nature of your application, you should indeed handle all serial communication through threads, so that you won't cause the main GUI thread to freeze up in embarrassing ways.
Having one rx and one tx thread gives you a dilemma though: they are using the same port handle and they cannot freely access it, because that wouldn't be thread-safe. The solution is to either make one single super-thread handling all transmissions, or to protect the port handle through a mutex.
I'm not sure which method is best; I have no recommendation. I have only used the "super-thread" one myself: one obvious advantage was that I could centralize WaitFor instructions like "kill thread", "port is open", "port is closed" in one place. But at the same time the code turned out rather complex.
How can I write a function that is initialized for a specific message and handles the message received?
Let your thread(s) shovel their received data into some buffers. A tx buffer and a rx buffer. Depending on your serial protocol and performance, you might have to use double buffers: one that is used for the current transmission and one that contains the most recently received data.
Then, from main, pick up the data from the buffers. They need to be thread-safe. Once you have gotten that far, simply write a parser like you would with any form of data and take actions from there.
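A hedged sketch of such a thread-safe rx buffer (the names, size, and overflow policy are all made up for illustration): the rx thread appends bytes as they arrive, and main drains them and feeds the parser.

#include <pthread.h>
#include <string.h>

#define RX_BUF_SIZE 4096

typedef struct {
    pthread_mutex_t lock;
    unsigned char data[RX_BUF_SIZE];
    size_t len;
} rx_buffer;

/* Called from the rx thread whenever bytes arrive from the serial port. */
void rx_buffer_append(rx_buffer *b, const unsigned char *bytes, size_t n) {
    pthread_mutex_lock(&b->lock);
    if (b->len + n <= RX_BUF_SIZE) {
        memcpy(b->data + b->len, bytes, n);
        b->len += n;
    }                                   /* else: drop or grow; sketch only */
    pthread_mutex_unlock(&b->lock);
}

/* Called from main: copy out whatever is buffered and hand it to the parser. */
size_t rx_buffer_drain(rx_buffer *b, unsigned char *out, size_t max) {
    pthread_mutex_lock(&b->lock);
    size_t n = b->len < max ? b->len : max;
    memcpy(out, b->data, n);
    memmove(b->data, b->data + n, b->len - n);
    b->len -= n;
    pthread_mutex_unlock(&b->lock);
    return n;
}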
From the Kqueue Wikipedia Page:
Kqueue provides efficient input and output event pipelines between the kernel and userland. Thus, it is possible to modify event filters as well as receive pending events while using only a single system call to kevent(2) per main event loop iteration. This contrasts with older traditional polling system calls such as poll(2) and select(2) which are less efficient, especially when polling for events on a large number of file descriptors
That sounds great. I target FreeBSD for my server, and I'm handling a significant amount of network socket fd's - using a select() on them all and figuring out who to read data from. I'd rather use kevent() calls to get higher performance, since that's what it's there for!
I've read the man page for kevent on FreeBSD here but it's cryptic to me and I'm not finding good resources that explain it. An example of using kevent to replace select would solve my problem, and would also help me get a better idea of how kevent() is used.
First, create new kqueue:
int kq=kqueue();
Now register your fd in kq:
struct kevent kev;
kev.ident=your_fd;
kev.flags=EV_ADD | EV_CLEAR;
kev.filter=EVFILT_READ;
kev.fflags=0;
kev.data=0;
kev.udata=&your_data;
int res=kevent(kq,&kev,1,0,0,0);
Finally, wait for data to arrive into your socket:
struct kevent res_kevs[5];
int res=kevent(kq,0,0,res_kevs,5,0);
After return, res_kevs[i].ident will contain your socket's descriptor, res_kevs[i].data - number of bytes ready to be read.
See man kevent for more details and features.
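Putting those pieces into a loop, a minimal sketch might look like this (handle_readable() is a placeholder for your own read-and-process logic, not a real API):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

void handle_readable(int fd, size_t bytes_ready);    /* your code goes here */

/* Sketch: the main loop of a kqueue-based server replacing select(). */
void run_loop(int kq) {
    struct kevent res_kevs[64];

    for (;;) {
        /* Blocks until at least one registered descriptor is ready. */
        int n = kevent(kq, NULL, 0, res_kevs, 64, NULL);
        if (n < 0)
            break;                                   /* real error; bail out */

        for (int i = 0; i < n; i++) {
            int fd = (int)res_kevs[i].ident;
            if (res_kevs[i].flags & EV_EOF) {
                close(fd);                           /* peer closed the socket */
                continue;
            }
            /* res_kevs[i].data bytes are ready to be read on this fd. */
            handle_readable(fd, (size_t)res_kevs[i].data);
        }
    }
}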
What you do normally is use libevent, which takes care of all the details for you, and also means that you can move your program to another OS which has a different scheme (e.g. Linux and epoll) for doing something similar.
select() is a great system call. You can pack any number of file descriptors, socket descriptors, pipes, etc. and get notified in a synchronous fashion when input becomes available.
Is there a way to create an interval/oneshot timer and use it with select()? That would save me from having multiple threads for IO and timing.
timerfd_create does exactly this. It's a fairly recent addition to the Linux kernel and might not be available on all distros yet, though.
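A minimal sketch of the idea, assuming a Linux new enough to have timerfd (error checks omitted):

#include <sys/timerfd.h>
#include <sys/select.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Sketch: a repeating 1-second timer that select() can wait on
 * alongside any other descriptors you already watch. */
void wait_on_timer_with_select(void) {
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);

    struct itimerspec its = {
        .it_interval = { .tv_sec = 1 },      /* repeat every second */
        .it_value    = { .tv_sec = 1 },      /* first expiry in 1 second */
    };
    timerfd_settime(tfd, 0, &its, NULL);

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(tfd, &rfds);
    /* ...FD_SET() your sockets/pipes here too, and track the max fd... */

    if (select(tfd + 1, &rfds, NULL, NULL, NULL) > 0 && FD_ISSET(tfd, &rfds)) {
        uint64_t expirations;
        read(tfd, &expirations, sizeof expirations);   /* clears readability */
        /* the timer has fired 'expirations' times since the last read */
    }

    close(tfd);
}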
Use the timeout parameter: keep your timer events in a priority queue, check the top item, and set the timeout accordingly. If the timeout is reached, you can check whether the event is ready to run, run it, and continue.
At least that's what I do.
Note that poll has a nicer interface (in some ways) and may be more efficient with lots of file descriptors.
MarkR has a nice portable solution, but here's another:
Use a POSIX timer (timer_create) and you can transform the problem into "select-able signals". This problem has a classic solution: writing to a pipe from the signal handler and selecting on the read end of the pipe.
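A hedged sketch of that self-pipe trick (error checks omitted; the names are illustrative):

#include <signal.h>
#include <time.h>
#include <unistd.h>

static int sigpipe_fds[2];                 /* [0] = read end, [1] = write end */

static void timer_handler(int signo) {
    (void)signo;
    char byte = 0;
    write(sigpipe_fds[1], &byte, 1);       /* async-signal-safe: just write */
}

/* Setup sketch: create the pipe, install the handler, arm a POSIX timer. */
void setup_selectable_timer(void) {
    pipe(sigpipe_fds);

    struct sigaction sa = {0};
    sa.sa_handler = timer_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    timer_t timerid;
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGALRM;
    timer_create(CLOCK_MONOTONIC, &sev, &timerid);

    struct itimerspec its = {
        .it_interval = { .tv_sec = 1 },    /* repeat every second */
        .it_value    = { .tv_sec = 1 },
    };
    timer_settime(timerid, 0, &its, NULL);

    /* Now add sigpipe_fds[0] to the read set passed to select(); when it
     * becomes readable, drain one byte and run the timer's callback. */
}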
Building on @MarkR's answer: use a sorted structure to store callback+closure together with an int and a pointer to an int. If the two ints have the same value, the event is active; otherwise it has been discarded.
This way events can be discarded simply by increasing an int. Perhaps not the most straightforward solution, but it was all I could think of.
https://github.com/cheako/tor2web/tree/6ac67f80daaea01d14a5d07e6026e1af4258dc96/src
hextree.c contains the code for the data structure used.
schedule.c:156 is where the int is changed.
gnutls.c:197 is where the timers are created.