What is the proper way to handle non-event-driven tasks with Xlib? (C)

I am writing a graphical program for Linux, in C, using Xlib. The problem is simple.
I have an event, that occurs outside of the Xlib event loop, that needs to be handled and will cause some change to what is displayed to the screen. In my case, I am trying to make a cursor blink inside of a textbox. My event loop looks like this:
XEvent event;
while (window.loop_running == 1) {
    XNextEvent(window.dis, &event);
    event_handler(&window, &event); // This is a custom function to handle events
}
Solution 1a:
My first thought was to create a second loop with pthreads. This second loop could handle the asynchronous event and draw to the screen as needed. However, Xlib doesn't play nice with multiple threads. Even if I used a mutex to block the event_handler() function during asynchronous drawing in the second loop, I still periodically had crashes from X. Additionally, if the event loop is not cycled, after about 10 calls from the pthread loop the program locks up.
Solution 1b:
This could be solved by calling XInitThreads() at the start of my program. However, this causes valgrind to report a memory leak; it seems that some memory is allocated that is not freed on exit. And calls to XCloseDisplay() just hang if I call XInitThreads(). I still haven't figured out how to destroy and clean up windows in my program, but that might be better saved for a separate question. Calling XInitThreads() does stop the program from freezing after 10 calls are made from the pthread loop without the event loop cycling. However, calls to X start to block after about 10 calls from the pthread loop. Things briefly resume after the event loop is cycled, such as by mousing over the window, but calls quickly start blocking again once in-loop event activity ceases. Interestingly, I noticed that I could replicate this issue in some other programs such as Bluefish: if I open Bluefish, start a cursor in the main textbox, and then mouse out, after about 10 seconds the cursor stops blinking. Clearly this isn't always an issue, since otherwise things like a video player's display would freeze after some period of no X events being triggered.
Solution 1c:
I can stop the window from freezing by using XSendEvent() to cycle the event loop after drawing is completed from the pthread loop. However, this seems really hacky, and I can't guarantee that it will work since I don't know at exactly what point X will stop listening. I haven't been able to determine the root cause of this issue. As I said, it seems to happen after about 10 seconds, but this varies depending on how I change the cycle rate of the blinking cursor. I'm tempted to guess that it is a function of the actual number of calls to X being made: there are approximately 2 per pixel per redraw, since it has to 1) set the foreground color and 2) draw the pixel from the bitmap buffer to the screen. Currently, my window only supports a resolution of 640x480. Of course, I am just guessing that this can be used to determine the failure point, since I really don't know the cause.
Solution 2:
I can drop all of this and re-implement the event loop by polling the event queue with XEventsQueued(), handling events as they come. But I'll be honest: I hate this solution. It is really hacky, it will increase the processing power required for this application, and it will increase the event response latency, since I would want to sleep the thread between polls to prevent spinning the thread and pegging a CPU core. I am writing this program with the goal of a fast, stable, and lean program.
Does anyone have a solution? It's such a simple and fundamental problem, but I have only seen sample applications that use XNextEvent in an event loop. I haven't found any samples of how to handle out-of-event-loop events. Thanks for the help. I am a brand new member of Stack Overflow and this is my first post, so I apologize if I make a mistake.

User: mosvy
Wrote this comment:
You should use poll() with the fd obtained with ConnectionNumber() and the fds your other events come in on. When the X11 fd is "ready", you process the events with while(XPending()){ XNextEvent(); ... }. Even then, X11 functions which are of the request/reply form (e.g. XQueryTree) may stall your event loop. The solution is to switch to xcb (where you could split those into their request/reply parts). IMHO xcb is just as ugly and not much better than Xlib, but it's the only thing readily available.
This works great! Thanks for pointing me in the right direction. I wish you would have written that as an answer. I would have accepted it for you.
EDIT: Deleted my previous edit because I later discovered that the problem was a mistake elsewhere in my full program. In fact, the edit was incorrect and the code didn't correctly read in events.
Here is how my event loop looks now. I'll probably reorganize it, to try to clean it up. But as a proof of concept, here:
// window is a custom struct defined and set up elsewhere in my program
// event_handler() is a custom function elsewhere in my program
window.fd = XConnectionNumber(window.dis);

struct pollfd fds;
fds.fd = window.fd;
fds.events = POLLIN;

int poll_ret;
XEvent event;
while (window.loop_running == 1) {
    poll_ret = poll(&fds, 1, 10);
    if (poll_ret < 0) {
        window.loop_running = 0;
    } else if (poll_ret > 0) {
        while (XPending(window.dis) > 0) {
            XNextEvent(window.dis, &event);
            event_handler(&window, &event);
        }
    }
}
I can use poll() to listen for all events using kernel file descriptors, including X events. The file descriptor for X events is obtained with ConnectionNumber(). I also needed to set it to listen for file descriptor events of type POLLIN. poll() accepts an array of file descriptors; see the documentation for more information on the schema. When poll() returns, I can check whether it timed out or whether an event is enqueued, and then act accordingly.
I believe this will work for custom events if needed. To do that, I will probably just need to set up my own file descriptors to interact with. I'll look into it if needed.
This solution means that I do not need to call XInitThreads() since I am not ever making simultaneous calls to X.

Related

How does the libuv implementation of *non-blockingness* work exactly?

So I have just discovered that libuv is a fairly small library as far as C libraries go (compared to FFmpeg). I have spent the past 6 hours reading through the source code to get a feel for the event loop at a deeper level. But I'm still not seeing where the "non-blockingness" is implemented, i.e. where some event interrupt signal or whatnot is being invoked in the codebase.
I have been using Node.js for over 8 years so I am familiar with how to use an async non-blocking event loop, but I never actually looked into the implementation.
My question is twofold:
Where exactly is the "looping" occurring within libuv?
What are the key steps in each iteration of the loop that make it non-blocking and async?
So we start with a hello world example. All that is required is this:
#include <stdio.h>
#include <stdlib.h>
#include <uv.h>

int main() {
    uv_loop_t *loop = malloc(sizeof(uv_loop_t));
    uv_loop_init(loop);           // initialize data structures
    uv_run(loop, UV_RUN_DEFAULT); // infinite loop as long as the queue is full?
    uv_loop_close(loop);
    free(loop);
    return 0;
}
The key function which I have been exploring is uv_run. The uv_loop_init function essentially initializes data structures, so not too much fanciness there, I don't think. But the real magic seems to happen with uv_run, somewhere. A high-level set of code snippets from the libuv repo is in this gist, showing what the uv_run function calls.
Essentially it seems to boil down to this:
while (NOT_STOPPED) {
    uv__update_time(loop)
    uv__run_timers(loop)
    uv__run_pending(loop)
    uv__run_idle(loop)
    uv__run_prepare(loop)
    uv__io_poll(loop, timeout)
    uv__run_check(loop)
    uv__run_closing_handles(loop)
    // ... cleanup
}
Those functions are in the gist.
uv__run_timers: runs timer callbacks? Loops with for (;;) {.
uv__run_pending: runs regular callbacks? Loops through the queue with while (!QUEUE_EMPTY(&pq)) {.
uv__run_idle: no source code
uv__run_prepare: no source code
uv__io_poll: does I/O polling? (Can't quite tell what this means, though.) Has 2 loops: while (!QUEUE_EMPTY(&loop->watcher_queue)) { and for (;;) {.
And then we're done. And the program exits, because there is no "work" to be done.
So I think I have answered the first part of my question after all this digging, and the looping is specifically in these 3 functions:
uv__run_timers
uv__run_pending
uv__io_poll
But not having implemented anything with kqueue or multithreading and having dealt relatively little with file descriptors, I am not quite following the code. This will probably help out others along the path to learning this too.
So the second part of the question is what are the key steps in these 3 functions that implement the nonblockingness? Assuming this is where all the looping exists.
Not being a C expert, does for (;;) { "block" the event loop? Or can that run indefinitely, with other parts of the code jumped to from OS system events or something like that?
So uv__io_poll calls poll(...) in that endless loop. I don't think that is non-blocking; is that correct? That seems to be all it mainly does.
Looking into kqueue.c there is also a uv__io_poll, so I assume the poll implementation is a fallback and kqueue on Mac is used, which is non-blocking?
So is that it? Is it just looping in uv__io_poll and each iteration you can add to the queue, and as long as there's stuff in the queue it will run? I still don't see how it's non-blocking and async.
Can someone outline, similarly to the above, how it is async and non-blocking, and which parts of the code to take a look at? Basically, I would like to see where the "free processor idleness" exists in libuv. Where is the processor ever free in the call to our initial uv_run? If it is free, how does it get reinvoked, like an event handler? (Like a browser event handler for the mouse: an interrupt.) I feel like I'm looking for an interrupt but not seeing one.
I ask this because I want to implement an MVP event loop in C, but just don't understand how nonblockingness actually is implemented. Where the rubber meets the road.
I think that trying to understand libuv is getting in your way of understanding how reactors (event loops) are implemented in C, and it is this that you need to understand, as opposed to the exact implementation details behind libuv.
(Note that when I say "in C", what I really mean is "at or near the system call interface, where userland meets the kernel".)
All of the different backends (select, poll, epoll, etc) are, more-or-less, variations on the same theme. They block the current process or thread until there is work to be done, like servicing a timer, reading from a socket, writing to a socket, or handling a socket error.
When the current process is blocked, it literally is not getting any CPU cycles assigned to it by the OS scheduler.
Part of the issue behind understanding this stuff IMO is the poor terminology: async, sync in JS-land, which don't really describe what these things are. Really, in C, we're talking about non-blocking vs blocking I/O.
When we read from a blocking file descriptor, the process (or thread) is blocked -- prevented from running -- until the kernel has something for it to read; when we write to a blocking file descriptor, the process is blocked until the kernel accepts the entire buffer.
In non-blocking I/O, it's exactly the same, except the kernel won't stop the process from running when there is nothing to do: instead, when you read or write, it tells you how much you read or wrote (or if there was an error).
The select system call (and friends) prevent the C developer from having to try and read from a non-blocking file descriptor over and over again -- select() is, in effect, a blocking system call that unblocks when any of the descriptors or timers you are watching are ready. This lets the developer build a loop around select, servicing any events it reports, like an expired timeout or a file descriptor that can be read. This is the event loop.
So, at its very core, what happens at the C-end of a JS event loop is roughly this algorithm:
while (true) {
    select(open fds, timeout);
    did_the_timeout_expire(run_js_timers());
    for (each error fd)
        run_js_error_handler(fdJSObjects[fd]);
    for (each read-ready fd)
        emit_data_events(fdJSObjects[fd], read_as_much_as_I_can(fd));
    for (each write-ready fd) {
        if (!pendingData(fd))
            break;
        write_as_much_as_I_can(fd);
        pendingData = whatever_was_leftover_that_couldnt_write;
    }
}
FWIW - I have actually written an event loop for v8 based around select(): it really is this simple.
It's important also to remember that JS always runs to completion. So, when you call a JS function (via the v8 api) from C, your C program doesn't do anything until the JS code returns.
NodeJS uses some optimizations, like handling pending writes in separate pthreads, but these all happen in "C space" and you shouldn't think/worry about them when trying to understand this pattern, because they're not relevant.
You might also be fooled into thinking that JS isn't run to completion when dealing with things like async functions, but it absolutely is, 100% of the time. If you're not up to speed on this, do some reading on the event loop and the microtask queue. Async functions are basically a syntax trick, and their "completion" involves returning a Promise.
I just took a dive into libuv's source code, and found at first that it seems like it does a lot of setup, and not much actual event handling.
Nonetheless, a look into src/unix/kqueue.c reveals some of the inner mechanics of event handling:
int uv__io_check_fd(uv_loop_t* loop, int fd) {
    struct kevent ev;
    int rc;

    rc = 0;
    EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
    if (kevent(loop->backend_fd, &ev, 1, NULL, 0, NULL))
        rc = UV__ERR(errno);

    EV_SET(&ev, fd, EVFILT_READ, EV_DELETE, 0, 0, 0);
    if (rc == 0)
        if (kevent(loop->backend_fd, &ev, 1, NULL, 0, NULL))
            abort();

    return rc;
}
The file descriptor polling is done here, "setting" the event with EV_SET (similar to how you use FD_SET before checking with select()), and the handling is done via the kevent handler.
This is specific to the kqueue style events (mainly used on BSD-likes a la MacOS), and there are many other implementations for different Unices, but they all use the same function name to do nonblocking IO checks. See here for another implementation using epoll.
To answer your questions:
1) Where exactly is the "looping" occurring within libuv?
The QUEUE data structure is used for storing and processing events. This queue is filled by the platform- and IO- specific event types you register to listen for. Internally, it uses a clever linked-list using only an array of two void * pointers (see here):
typedef void *QUEUE[2];
I'm not going to get into the details of this list, all you need to know is it implements a queue-like structure for adding and popping elements.
Once you have file descriptors in the queue that are generating data, the asynchronous I/O code mentioned earlier will pick it up. The backend_fd within the uv_loop_t structure is the generator of data for each type of I/O.
2) What are the key steps in each iteration of the loop that make it non-blocking and async?
libuv is essentially a wrapper (with a nice API) around the real workhorses here, namely kqueue, epoll, select, etc. To answer this question completely, you'd need a fair bit of background in kernel-level file descriptor implementation, and I'm not sure if that's what you want based on the question.
The short answer is that the underlying operating systems all have built-in facilities for non-blocking (and therefore async) I/O. How each system works is a little outside the scope of this answer, I think, but I'll leave some reading for the curious:
https://www.quora.com/Network-Programming-How-is-select-implemented?share=1
The first thing to keep in mind is that work must be added to libuv's queues using its API; one cannot just load up libuv, start its main loop, and then code up some I/O and get async I/O.
The queues maintained by libuv are managed by looping. The infinite loop in uv__run_timers isn't actually infinite; notice that the first check verifies that a soonest-expiring timer exists (presumably, if the list is empty, this is NULL), and if not, breaks the loop and the function returns. The next check breaks the loop if the current (soonest-expiring) timer hasn't expired. If neither of those conditions breaks the loop, the code continues: it restarts the timer, calls its timeout handler, and then loops again to check more timers. Most times when this code runs, it's going to break the loop and exit, allowing the other loops to run.
What makes all this non-blocking is the caller/user following the guidelines and API of libuv: adding your work to queues, and allowing libuv to perform its work on those queues. Processing-intensive work may block these loops and other work from running, so it's important to break your work into chunks.
By the way, the source code for uv__run_idle, uv__run_check, and uv__run_prepare is defined in src/unix/loop-watcher.c (the functions are generated by a macro there, which is why searching for their names directly turns up nothing).

Blocking Task with Event Flags halts the program

I'm integrating FreeRTOS cmsis_v2 on my STM32F303VCx and came across a problem when using Event Flags: blocking a task to wait for operation approval from another task.
If the task executes the following code, all other tasks get minimal runtime (understandably, because the OS is constantly checking evt_flg):
for (;;)
{
    flag = osEventFlagsWait(evt_flg, EventOccured, osFlagsWaitAny, 0);
    if (flag == EventOccured)
    {
        /* Task main route */
        osEventFlagsClear(evt_flg, EventOccured);
    }
}
But if I set the timeout to osWaitForever: osEventFlagsWait(evt_flg, EventOccured, osFlagsWaitAny, osWaitForever), the whole program goes into HardFault.
What's the best solution for such behavior? I need the task to wait for a flag without blocking other tasks, such as the terminal input reader, from running.
The task code the question provides is constantly busy polling the RTOS event.
This is a design antipattern; it is virtually always better to have the task block until the event source has fired. The only exception where a call to osEventFlagsWait() with a zero timeout could make more sense is if you have to monitor several different event/data sources for which there is no common RTOS API to wait on (and even then, this is only an "emergency exit"). Hence, osWaitForever should be used.
Next, the reason for the HardFault should be sought. In this task code alone, I don't see a reason for it; the HardFault source is likely somewhere else. Once you have narrowed down the area the HardFault comes from, that could be worth a new question (if it isn't already fixed by then). Good luck!
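For reference, a blocking version of the task loop would look something like the sketch below (CMSIS-RTOS2 API, assuming evt_flg and EventOccured as defined in the question). Note that osEventFlagsWait clears the returned flags by default, so the explicit osEventFlagsClear() becomes unnecessary:

```c
for (;;)
{
    /* Block here, consuming no CPU time, until another task sets the flag. */
    uint32_t flags = osEventFlagsWait(evt_flg, EventOccured,
                                      osFlagsWaitAny, osWaitForever);
    if ((flags & osFlagsError) == 0)
    {
        /* Task main route */
    }
}
```

With the task blocked in the kernel, other tasks such as the terminal reader get their full runtime; what remains is finding the actual HardFault cause (too-small task stacks and calling blocking RTOS functions from interrupt context are common suspects).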

GLX Vsync event

I'm wondering if I could catch the screen vsync event via some file descriptor and [select | poll | epoll] it.
Normally, if I'm right, glXSwapBuffers() doesn't block the process, so I could do something like:
int init() {
    create epollfd;
    add X connection number to it;
    add some other fds like socket, timer, tty, etc...;
    possibly add a vsync fd like dri/card0 or fb0 or other???;
    return epollfd;
}

main() {
    int run = 1;
    int epollfd = init();
    while (run) {
        epoll_wait(epollfd, ...) {
            if (trigedfd == socket) {
                do network computing;
            }
            if (trigedfd == timer) {
                do physics computing;
            }
            if (trigedfd == tty) {
                do electronic communications;
            }
            if (trigedfd == Xconnection number) {
                switch (Xevent) {
                case key event:
                    do key computing;
                case mouse event:
                    do mouse computing;
                case vsync???:
                    do GL computations;
                    glXSwapBuffers();
                }
            }
            if (trigedfd == dri/card0 or fb0 or other???) {
                do GL computations;
                glXSwapBuffers();
            }
        }
    }
}
That way I could handle any event as it happens, regardless of when the vsync event occurs, and at the same time avoid tearing in the case where I use only X drawing functions, and possibly GL, with vsync.
Could libdrm help me? The more general question is:
What fd do I have to use to catch the vsync event, and how can I make sure that the event that happened on this fd is actually a vsync event?
It looks like you can use the libdrm API to see vsync events. See this blog entry, and in particular this example code. A comment from the code explains how it works:
/* (...)
* The DRM_MODE_PAGE_FLIP_EVENT flag tells drmModePageFlip() to send us a
* page-flip event on the DRM-fd when the page-flip happened. The last argument
* is a data-pointer that is returned with this event.
* (...)
*/
You need to set up a page-flip event handler to be notified when a vsync occurs, which will be called by the drmHandleEvent method (from libdrm) which you can call when there is activity on the drm file descriptor.
However, mapping all this into an X client might prove difficult or impossible. It may be that you can open the drm device yourself and just listen for vsync events (without attempting to set the mode etc), but this might also prove impossible. Pertinent code:
drmEventContext ev;
memset(&ev, 0, sizeof(ev));
ev.version = DRM_EVENT_CONTEXT_VERSION;
ev.page_flip_handler = modeset_page_flip_event;
// When file descriptor input is available:
drmHandleEvent(fd, &ev);
// If above works, "modeset_page_flip_event" will be called on vertical refresh.
The problem is that a page flip event only seems to be generated if you have actually issued a page flip (buffer swap) request. Presumably it would be the X server that issued such requests, but it doesn't even necessarily flag that it wants to be notified when the vsync actually occurs (i.e. uses the DRM_MODE_PAGE_FLIP_EVENT flag).
There's also the difficulty of opening the correct dri device (/dev/dri/card0 or /dev/dri/card1 or ...?)
Given the difficulty/unreliability/general unworkability of all the above, the easiest solution is probably to:
Use a separate thread to wait for vsync using standard GL calls. According to this page on the OpenGL wiki you should use glXSwapIntervalEXT(1) to enable vsync, then glXSwapBuffers and glFinish to ensure that the vertical retrace actually occurs.
Then, notify the main thread that vertical retrace occurred. You can do this by sending data to a pipe or, on Linux, use an eventfd.
Update:
You may be able to use the GLX_INTEL_swap_event extension:
Accepted by the parameter of glXSelectEvent and returned in the parameter of glXGetSelectedEvent:
GLX_BUFFER_SWAP_COMPLETE_INTEL_MASK 0x04000000
Returned in the <event_type> field of a "swap complete" event:
GLX_EXCHANGE_COMPLETE_INTEL 0x8180
GLX_COPY_COMPLETE_INTEL 0x8181
GLX_FLIP_COMPLETE_INTEL 0x8182
...
A client can ask to receive swap complete GLX events on a window.
When an event is received, the caller can determine what kind of swap occurred by checking the event_type field.
This means that, if the extension is supported, you can receive swap notifications (which correspond to vertical retrace if vsync is enabled) via regular X events, which you'll see on the X connection file descriptor.
Unfortunately, from looking all over the web for extensions and interfaces to poll vsync [1], the best (least resource-intensive) method that is likely to be available is the following:
1. Query the vtrace rate (every 16.666 ms for a 60 Hz monitor).
2. nanosleep some portion of that delay (say 15 ms), depending on the querying accuracy and the expected variance of the vsync pulse.
3. clock_gettime our remaining time-quota and (optionally) do some real-time computing if we can. Typically computation should be done at step 7.
4. Busy-wait for vsync using one of the methods noted below. Add pause instructions.
5. Immediately clock_gettime to ensure an accurate sleep delay on the next cycle.
6. Do drawing or pixel-querying.
7. Do any other computations; use threading, watchdogs or real-time computing.
8. Repeat from step 2.
You could dynamically increase the delay time if you trust your own code (but be too aggressive and you start dropping frames).
If you really need a reliable poll and can't spinlock at all, you can do all this in another thread. The nanosleep should ensure the kernel doesn't schedule your thread for most of its lifetime.
You'll have to calibrate the delay but you can probably eyeball it. 15ms is extremely conservative and pretty much guaranteed to work.
The querying method I use results in a variance of around 16.665 to 16.667 ms, so any delay below that should be enough.
It all depends on how much you trust your nanosleep implementation and your thread scheduler. Use a real-time scheduling policy if your kernel supports it (Or don't worry about any of this, take the easy way out and busy-wait like everyone else).
This is the method most screen-recorders are using.
Querying Vblank frequency
If your driver supports it, you can call glXGetVideoSyncSGI (from the GLX_SGI_video_sync extension) repeatedly and time with clock_gettime to query vblank rate (e.g. 16.666ms).
Alternatives to finding vsync rate are glXGetSwapIntervalMESA and glXQueryDrawable(dpy, drawable, GLX_SWAP_INTERVAL_EXT, &interval) (from GLX_MESA_swap_control and GLX_EXT_swap_control respectively). Benefits of GLX_SGI_video_sync are that you only need one extension (see next section).
glXGetSyncValuesOML from the GLX_OML_sync_control extension returns the same counter, but is probably less efficient because it has to fetch other irrelevant values. It might potentially return a more accurate vsync rate. It's also not available on my system, so I don't know how portable it is.
Waiting for Vsync
You can set a timer after each frame for say ~15ms after the start of the current frame (Use clock_nanosleep and non-relative time if you can), and then busy-wait the rest until the glXGetVideoSyncSGI counter changes value. Connecting the timer to a poll/event loop should then be trivial.
You could maybe do some real-time computing inside the busy loop if you're brave enough.
glXWaitForMscOML from GLX_OML_sync_control does the same thing and might have some optimizations, but it's not available on my system and I wouldn't be surprised if it is just busy-waiting like everything else. I also don't know how reliable it is. There are most certainly other methods to wait for vsync too (MESA/DRM/KMS), but they are even less portable than GLX_SGI_video_sync, and once again, they typically busy-wait.
Even if no frame counter is available, you could assume that glXSwapBuffers blocks and still do something similar (Note that on all drivers I know of, users may turn off the blocking behaviour manually. Check out GLX_EXT_swap_control to try and override this behaviour). At this point it might just be better to loop calling glXSwapBuffers in another thread, though.
If all else fails, you can busy-wait on clock_gettime instead. nanosleep used to do this, but nowadays you have to do it yourself. If you're using this method, you should probably hardcode well-known vsync values instead of relying on querying (any variance in sleep delay might result in dropped frames, that said vsync has natural variance anyway due to the asynchronous nature of the hardware so reliable vsync without driver support might be impossible).
The "compton" open-source compositor manager mentions the following methods to wait for vsync:
DRM_IOCTL_WAIT_VBLANK
Linux-specific.
Very reliable for waiting, also doesn't require OpenGL.
Properly suspends thread instead of busy-waiting.
Unfortunately, _DRM_VBLANK_SIGNAL is not implemented and _DRM_VBLANK_EVENT is broken, so a blocking wait is still the only valid use-case.
DRM actually does give accurate vsync pulses for applications that manually swap buffers, but this can hardly be called lightweight. Also, if you're swapping buffers anyway, you might as well just use glXSwapBuffers.
Recommended if you need high performance and don't care about portability.
SGI_video_sync
Cross-platform. Almost universally available. This is what I recommend.
OML_sync_control
Recommended if it's available.
EXT_swap_control
SGI_swap_control
MESA_swap_control
I would not recommend this. This is the glXSwapBuffers approach. Most drivers support this, but it's also resource-intensive and prone to break if driver settings change. Most of these implementations spinlock.
Conclusion
Unfortunately, it seems there won't be a resource-friendly way to poll vsync anytime soon. The video driver clearly has to keep track of vsync to know when to send the next frame, so there's no technical reason why an easy interface to poll the vsync pulse couldn't exist, but unfortunately the consensus from Linux, NVIDIA and Khronos seems to be "nobody needs this".
Typically vtrace pulses are very important for pixel querying (case in point: literally every compositor manager ever has a bug report or mailing-list thread about vsync), but they also have uses for real-time computing.
[1] vsync a.k.a. vtrace a.k.a. buffer swap a.k.a. frame counter increment a.k.a. frame round-trip

WinAPI - Synchronizing SwapBuffers

Is it possible to synchronize SwapBuffers across many threads? When I try to turn on vertical synchronization (wglSwapIntervalEXT), it stops all threads until the next tick (e.g. when I open 3 windows, every window gets approximately 20 frames [60/3]).
Every window has a separate thread, and of course every thread has its own SwapBuffers call.
Swap Control is per-window in WGL (when you set it, it is applied to the window that the current render context is tied to). The association between render context and window is tied to the device context (see wglMakeCurrent (...)). You probably only need VSYNC for 1 of your 3 windows if you are reliably hitting < ~5.6 ms frame time in each.
What you should consider here is having one of your contexts set the swap interval to 1 and the remaining 2 use 0. The context that is synchronized to VBLANK (swap interval = 1) will lead the other two threads. That is to say, have the other two threads call glFlush (...) and then busy-wait until the first thread stops blocking for VSYNC before calling SwapBuffers (...). The reason for the glFlush (...) is so that the other two threads accomplish some useful rendering work while you wait for the first (synchronized) swap to finish.
It sounds funny - almost like a recipe for tearing - but given the nature of Windows Vista/7/8's compositing window manager, VSYNC does not actually prevent tearing anymore. The window manager itself does that by compositing asynchronously, it effectively performs triple buffering. What VSYNC will, however, allow you to do (when done correctly) is have all 3 of your windows update their contents each refresh (avoid late frames).
If you did not bother starting your series of buffer swaps at the beginning of VBLANK, you could run into a situation where the compositing window manager displays an old frame for 2 frames because you were swapping buffers in the middle of VBLANK. Granted, you can still run into this situation if you draw an excessively long frame, but it solves the case where a short frame was unfortunately swapped too close to the vertical retrace deadline to finish in time.
Why not just add a barrier before/after calling SwapBuffers?

How to prevent linux soft lockup/unresponsiveness in C without sleep

What would be the correct way to prevent a soft lockup/unresponsiveness in a long-running while loop in a C program?
(dmesg is reporting a soft lockup)
Pseudo code is like this:
while( worktodo ) {
worktodo = doWork();
}
My code is of course way more complex, and also includes a printf statement which gets executed once a second to report progress, but the problem is that the program ceases to respond to Ctrl+C at this point.
Things I've tried which do work (but I want an alternative):
doing printf on every loop iteration (I don't know why, but the program becomes responsive again that way (???)); this wastes a lot of performance due to unneeded printf calls (each doWork() call does not take very long)
using sleep/usleep/...; this also seems like a waste of (processing) time to me, as the whole program will already be running for several hours at full speed
What I'm thinking about is some kind of process_waiting_events() function or the like, and normal signals seem to be working fine as I can use kill on a different shell to stop the program.
Additional background info: I'm using GWAN and my code is running inside the main.c "maintenance script", which seems to be running in the main thread as far as I can tell.
Thank you very much.
P.S.: Yes I did check all other threads I found regarding soft lockups, but they all seem to ask about why soft lockups occur, while I know the why and want to have a way of preventing them.
P.P.S.: Optimizing the program (making it run shorter) is not really a solution, as I'm processing a 29GB bz2 file which extracts to about 400GB xml, at the speed of about 10-40MB per second on a single thread, so even at max speed I would be bound by I/O and still have it running for several hours.
While the proposed answer using threads might possibly be an option, in reality it would just shift the problem to a different thread. My solution after all was using
sleep(0)
I also tested sched_yield / pthread_yield, neither of which really helped. Unfortunately I've been unable to find a good resource documenting sleep(0) on Linux, but for Windows the documentation states that using a value of 0 lets the thread yield the remainder of its current CPU slice.
It turns out that sleep(0) most probably relies on what is called timer slack in Linux; an article about this can be found here: http://lwn.net/Articles/463357/
Another possibility is using nanosleep(&(struct timespec){0}, NULL), which does not seem to necessarily rely on timer slack: the Linux man pages for nanosleep state that if the requested interval is below clock granularity, it will be rounded up to clock granularity, which on Linux depends on CLOCK_MONOTONIC according to the man pages. Thus, a value of 0 nanoseconds is perfectly valid and should always work, as clock granularity can never be 0.
Hope this helps someone else as well ;)
Your scenario is not really a soft lockup; it is a process busy doing something.
How about something like this (sketched with POSIX threads and signals):
int doWork(void);   /* as in the question */

#include <pthread.h>
#include <signal.h>

static volatile sig_atomic_t stop_requested = 0;

void sighandler(int sig)
{
    (void)sig;
    stop_requested = 1;        /* signal worker thread to finish */
}

void *workerThread(void *arg)
{
    (void)arg;
    int workToDo = 1;
    while (workToDo && !stop_requested)
        workToDo = doWork();
    return NULL;
}

int main(void)
{
    pthread_t worker;
    signal(SIGINT, sighandler);             /* install signal handler */
    pthread_create(&worker, NULL, workerThread, NULL);
    pthread_join(worker, NULL);             /* wait for worker thread to finish */
    return 0;
}
Clearly a timing issue. Using a signalling mechanism should remove the problem.
The use of printf solves the problem because printf accesses the console which is an expensive and time consuming process which in your case gives enough time for the worker to complete its work.
