thread hang detector triggered by process stop or App Nap - c

I have written a thread hang detector which prints some debugging output (backtraces, etc.) when some thread hangs unexpectedly. Each thread which wants to be watched registers itself in the hang-detector system, specifies some timeout (in my case 5 secs) and calls some IAmAlife() function frequently (in my case about every 1-10ms).
It works great. However, in some cases I get false positives. For example, when I SIGSTOP the process and resume it later (as happens when attaching a debugger such as GDB or LLDB), the detector gets triggered. More rarely, when the process is just idling, I guess Mac OS X's App Nap kicks in and also triggers the hang detector.
How could I detect such system hangs? Looking at the processor time (clock()) doesn't help that much because if my app is in a deadlock, it will probably also not consume much (if any) processor time.

So, the implementation might be a little complex, but the basic idea is simple: track the wall clock time and use that to figure a "grace period" to be added on to the amount of time that each thread can be late phoning home.
Use something like gettimeofday() (http://linux.die.net/man/2/gettimeofday) to record the time when your watcher thread first starts. Each time it re-awakes, call gettimeofday() again, take the difference, and then subtract from that the amount of "processor time" elapsed. That gives you the rough "grace period" that you should grant, since it's time that your process was not running.
The only minor complexification arises because the grace period needs to be maintained separately for each of the threads you are watching. Since you clearly know enough to write threads, I'll assume that part is well within your ability :-).
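A minimal sketch of the grace-period idea follows. It is a variation rather than the exact formula above: instead of subtracting processor time (which stays near zero in a deadlock, as the question points out), it compares how long the watcher thread meant to sleep with how long it was actually away. The names grace_period_ns, watcher_sleep_and_measure and the slack_ns tuning knob are hypothetical, and CLOCK_MONOTONIC stands in for gettimeofday() so that clock adjustments do not distort the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static int64_t wall_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Grace period = how much longer the watcher was away than it intended.
 * slack_ns absorbs ordinary scheduling jitter (hypothetical tuning knob). */
int64_t grace_period_ns(int64_t intended_ns, int64_t actual_ns, int64_t slack_ns)
{
    int64_t excess = actual_ns - intended_ns - slack_ns;
    return excess > 0 ? excess : 0;
}

/* Sleep for intended_ns, then return the grace period that the caller
 * should add to the deadline of every watched thread. */
int64_t watcher_sleep_and_measure(int64_t intended_ns, int64_t slack_ns)
{
    struct timespec req = {
        .tv_sec  = (time_t)(intended_ns / 1000000000LL),
        .tv_nsec = (long)(intended_ns % 1000000000LL),
    };
    int64_t before = wall_now_ns();
    nanosleep(&req, NULL);
    return grace_period_ns(intended_ns, wall_now_ns() - before, slack_ns);
}
```

If the whole process was stopped with SIGSTOP for ten seconds while the watcher meant to sleep for one, the returned grace period is roughly nine seconds, and every watched thread gets that much extra time before it is reported as hung. Whether the clock keeps advancing while the machine itself is suspended is platform dependent, so that case would need separate checking.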

Related

Why are there many schedule() calls in different places?

I am tracing Linux 0.11
https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/old-versions/
I see there are many schedule() calls in different places, not just the one inside do_timer().
A few questions:
do_timer() (#sched.c) will be called every time the timer timeout? This timer is based on an x86 interrupt call?
Since there are many schedule() calls outside of do_timer(), can I say that is kind of preempting? or what's the purpose?
Any operation that blocks calls schedule() to yield control.
A task's state has changed and needs to be updated in schedule().
Some tasks are running and still have a lot of work to do, so schedule() is called for balance.
Since there are many schedule() calls outside of do_timer(), can I say that is kind of preempting? or what's the purpose?
For a real OS; most task switches occur because a task blocks waiting for something (user input, network packet, disk IO, ..) or a task unblocks because something it was waiting for happened (and the unblocked task has higher priority and preempts the currently running lower priority task).
The whole "task switch caused by timer IRQ" thing is mostly just a fallback to guard against malicious CPU hogs (denial of service attacks); and for normal software under normal conditions you could disable it (delete the schedule() from the timer IRQ handler) and nobody would notice or care. Note: Some people will say it's also for "non-malicious" CPU bound tasks, but CPU bound tasks are relatively rare, and (ignoring the fact that the Linux scheduler has never been good for task priorities) for CPU bound tasks it's better to rely on an effective system of task priorities (e.g. give the CPU bound tasks a low priority so that almost everything will preempt them).
Also note that various courses on OS theory start with "so simple it never actually happens in practice" concepts, which is almost always a pure round-robin scheduler with tasks that never block (often with "Hey, we can accurately predict the future and know exactly how long each task will run for" nonsense), which is mostly fine as a first step (in a "learn to walk before you run" way) but sucks big salty dog balls if it's not followed by more realistic and more complex concepts (better scheduling algorithms, task priorities, multiple simultaneous scheduling algorithms/"scheduler policies", multi-CPU, interactive/latency sensitive tasks, ..) because it leaves the student/victim with little more than misinformation (e.g. the ever re-occurring "all tasks switches are caused by timer IRQ" misconception).
do_timer() (#sched.c) will be called every time the timer timeout? This timer is based on an x86 interrupt call?
I'm guessing that the timer was the raw PIT chip's IRQ (given that Linux version 0.11 was "absolute beginner developer with no intention of making it portable" historical memorabilia from before thousands of volunteers fixed half of the worst parts).
Also don't forget that the scheduler uses time for two different things - the "current task has used too much CPU time" thing that almost never matters, and figuring out when tasks that are blocked/sleeping (e.g. because they called sleep()) should unblock/wake up. The do_timer() might be for either of these things and might be for both (I don't know without looking at it).

Determining cause of delay/pause - kernel scheduler etc

System is an embedded Linux/Busybox core on a small embedded board with a web server (Boa) running.
We are seeing some high latency in responses from the web server - sometimes >500ms for no good reason, so I've been digging...
On liberally scattering debug prints throughout the code it seems to come down to the entire process just... stopping for a bit, in a way which I can only assume must be the process/thread being interrupted by another process.
Using print statements and clock_gettime() to calculate time taken to process a request, I can see the code reach the bottom of a while() loop (parsing input), print something like "Time so far: 5ms" and then the next line at the top of the loop will print "Time so far: 350ms" - and all that the code does between the bottom of the loop and the 1st print back at the top is a basic check along the lines of while(position < end), it has nothing complicated that could hold it up.
There's no IO blocking, the data it's parsing has all arrived already, and it's not making any external calls or wandering off into complex functions.
I then looked into whether the kernel scheduler (CFS in our case) might be holding things up. After adding calls to clock() (processor time rather than wall-clock time) and comparing the wall-clock differences against the processor time used, I can see that the wall-clock delay may run beyond 300ms from one loop iteration to the next, while the reported processor time taken (which seems to have a ~10ms resolution) is more like 50ms.
So, that suggests the task scheduler is holding the process up for hundreds of milliseconds at a time. I've checked the scheduler granularity and max delay and it's nowhere near 100ms, scheduler latency is set at 6ms for example.
Any advice on what I can do now to try and track down the problem - identifying processes which could hog the CPU for >100ms, measuring/tracking what the scheduler is doing, etc.?
First you should try and run your program using strace to see if there are any system calls holding things up.
If that is ambiguous or does not help, I would suggest you try to profile the kernel. You could try OProfile.
This will create a call graph that you can analyze to see what is happening.
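Before reaching for a profiler, it can help to confirm from inside the process that it really is being descheduled. The sketch below is an assumption-laden illustration, not part of the answers above: it spins for a fixed duration, records the longest wall-clock gap between two consecutive loop iterations, and uses getrusage() to count involuntary context switches over the same interval. The names stall_report and measure_stalls are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/resource.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static int64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

typedef struct {
    int64_t max_gap_ns;         /* longest gap between two loop iterations */
    long involuntary_switches;  /* times the kernel preempted the process */
} stall_report;

/* Busy-loop for duration_ns and report the worst stall that was observed. */
stall_report measure_stalls(int64_t duration_ns)
{
    struct rusage before, after;
    stall_report r = { 0, 0 };

    getrusage(RUSAGE_SELF, &before);
    int64_t start = mono_ns(), prev = start, now;
    while ((now = mono_ns()) - start < duration_ns) {
        if (now - prev > r.max_gap_ns)
            r.max_gap_ns = now - prev;  /* nothing happens between reads, so a
                                           large gap means we were not running */
        prev = now;
    }
    getrusage(RUSAGE_SELF, &after);
    r.involuntary_switches = after.ru_nivcsw - before.ru_nivcsw;
    return r;
}
```

A max_gap_ns in the hundreds of milliseconds together with a nonzero involuntary_switches count would support the theory that something else is holding the CPU, which is the point at which kernel-level tracing becomes worthwhile.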

Using CLOCK_PROCESS_CPUTIME_ID in clock_gettime

I read http://linux.die.net/man/3/clock_gettime and http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/comment-page-1/#comment-681578
It said to use
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop_time);
to measure how long it takes for a function to run.
I tried that in my program. When I run it, it returns saying it took 15 sec. But when I compare against using a stop watch to measure it, it is 30 sec.
Can you please tell me why clock_gettime returns 1/2 of the actual time it took?
Thank you.
In a multi-process environment, processes are constantly migrating from CPU(s) to 'run queue(s)'.
When performance testing an application, it is often convenient to know the amount of time a process has been running on a CPU, while excluding time that the process was waiting for a CPU resource on a 'run queue'.
In the case of this question, where CPU-time is about half of REAL-time, it is likely that other processes were actively competing for CPU time while your process was also running. It appears that your process was fairly successful in acquiring roughly half the CPU resources during its run.
Instead of using CLOCK_PROCESS_CPUTIME_ID, you might consider using CLOCK_REALTIME?
For additional details, see: Understanding the different clocks of clock_gettime()
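The difference between the two kinds of clock is easy to see by timing the same call with both. This is a small sketch under stated assumptions: time_both, elapsed_times and mostly_idle are hypothetical names, and CLOCK_MONOTONIC is used for the wall-clock side instead of CLOCK_REALTIME because it is not affected by clock adjustments during the measurement.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Read the given clock and return its value in seconds. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

typedef struct {
    double wall_s;  /* elapsed real time */
    double cpu_s;   /* CPU time consumed by this process */
} elapsed_times;

/* Run fn() once and measure it with both clocks. */
elapsed_times time_both(void (*fn)(void))
{
    elapsed_times e;
    double wall0 = clock_seconds(CLOCK_MONOTONIC);
    double cpu0  = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    fn();
    e.cpu_s  = clock_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    e.wall_s = clock_seconds(CLOCK_MONOTONIC) - wall0;
    return e;
}

/* Example workload: sleeps for 100 ms, so it spends wall time but
 * almost no CPU time. */
void mostly_idle(void)
{
    struct timespec req = { .tv_sec = 0, .tv_nsec = 100000000L };
    nanosleep(&req, NULL);
}
```

Calling time_both(mostly_idle) should report a wall_s of at least about 0.1 and a cpu_s close to zero. A function that waits for I/O or for its turn on the CPU shows the same pattern, which is how a stopwatch can read 30 seconds while the CPU-time clock reads 15.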

What should C program do in idle time when running on Linux?

I've written many C programs for microcontrollers but never one that runs on an OS like linux. How does linux decide how much processing time to give my application? Is there something I need to do when I have idle time to tell the OS to go do something else and come back to me later so that other processes can get time to run as well? Or does the OS just do that automatically?
Edit: Adding More Detail
My C program has a task scheduler. Some tasks run every 100ms, some run every 50ms, and so on. In my main program loop I call ProcessTasks, which checks if any tasks are ready to run; if none are ready, it calls an idle function. The idle function does nothing, but it's there so that I could toggle a GPIO pin and monitor idle time with an O'scope... or something if I so desired. So maybe I should call sched_yield() in this idle function?
How does linux decide how much processing time to give my application
Each scheduler makes up its own mind. Some reward you for not using up your share, some roll dice trying to predict what you'll do, etc. In my opinion you can just consider it magic: after we enter the loop, the scheduler magically decides our time is up.
Is there something I need to do when I have idle time to tell the OS to go do something else
You might call sched_yield. I've never called it, nor do I know of any reasons why one would want to. The manual does say it could improve performance though.
Or does the OS just do that automatically
It most certainly does. That's why they call it "preemptive" multitasking.
It depends why and how you have "idle time". Any call to a blocking I/O function, waiting on a mutex or sleeping will automatically deschedule your thread and let the OS get on with something else. Only something like a busy loop would be a problem, but that shouldn't appear in your design in any case.
Your program should really only have one central "infinite loop". If there's any chance that the loop body "runs out of work", then it would be best if you could make the loop perform one of the above system functions which would make all the niceness appear automatically. For example, if your central loop is an epoll_wait and all your I/O, timers and signals are handled by epoll, call the function with a timeout of -1 to make it sleep if there's nothing to do. (By contrast, calling it with a timeout of 0 would make it busy-loop – bad!).
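For periodic tasks like the 50 ms and 100 ms ones in the question, one way to get this behaviour on Linux is to let a timerfd drive the loop and block in epoll_wait with a timeout of -1. The following is only a sketch under that assumption: add_periodic_timer and wait_for_tick are hypothetical helper names, and error handling is kept to the minimum.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <unistd.h>

/* Create a timer that fires every period_ms and register it with epfd.
 * Returns the timer fd, or -1 on failure. */
int add_periodic_timer(int epfd, long period_ms)
{
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (tfd < 0)
        return -1;

    struct timespec period = {
        .tv_sec  = period_ms / 1000,
        .tv_nsec = (period_ms % 1000) * 1000000L,
    };
    struct itimerspec its = { .it_interval = period, .it_value = period };
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };

    if (timerfd_settime(tfd, 0, &its, NULL) < 0 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) < 0) {
        close(tfd);
        return -1;
    }
    return tfd;
}

/* Sleep in the kernel (timeout -1) until a timer fires. Returns how many
 * periods have elapsed since the last read, or -1 on error. */
int64_t wait_for_tick(int epfd)
{
    struct epoll_event ev;
    uint64_t expirations = 0;

    if (epoll_wait(epfd, &ev, 1, -1) != 1)
        return -1;
    if (read(ev.data.fd, &expirations, sizeof expirations)
            != (ssize_t)sizeof expirations)
        return -1;
    return (int64_t)expirations;
}
```

The main loop would create the epoll instance with epoll_create1(0), register one timer per task period, and then call wait_for_tick() in place of the idle function. While no timer is due, the process consumes no CPU at all, so there is nothing left to yield.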
The other answers IMO are going into too much detail. The simple thing to do is:
while (1){
    if (iHaveWorkToDo()){
        doWork();
    } else {
        sleep(amountOfTimeToWaitBeforeNextCheck);
    }
}
Note: this is the simple solution, which is useful in a single-threaded application or in a case like yours where you don't have anything to do for a specified amount of time; it is just to get something decent working. The other thing about this is that sleep will call whatever yield function the OS prefers, so in that sense it is better than an OS-specific yield call.
If you want to go for high performance, you should be waiting on events.
If you have your own events it will be something like follows:
Lock *l;
ConditionVariable *cv;
while (1){
    l->acquire();
    if (iHaveWorkToDo()){
        doWork();
    } else {
        cv->wait(l);  // pass the lock itself; wait() is expected to release it while blocked
    }
    l->release();
}
In a networking type situation it will be more like:
while (1){
    fd_set readySet = currentSocketSet;  // select() overwrites its argument, so pass a copy
    int result = select(fd_max+1, &readySet, NULL, NULL, NULL);
    process_result();
}

How accurate is Sleep() or sleep()

I'm trying to simulate a key-down and key-up action with a specific delay between them.
For example: 2638 milliseconds.
SendMessage(hWnd, WM_KEYDOWN, keyCode, 0);
Sleep(2638);
SendMessage(hWnd, WM_KEYUP, keyCode, 0);
How would you know if it really worked?
You wouldn't with this code, since accurately measuring the time that code takes to execute is a difficult task.
To get to the question posed by your question title (you should really ask one question at a time...): the accuracy of said functions is dictated by the operating system. On Linux, it depends on the kernel's timer configuration: with a traditional 100 Hz tick the system clock granularity is 10ms, so timed process suspension via nanosleep() is only accurate to about 10ms (kernels with high-resolution timers do considerably better), and even then it's not guaranteed to sleep for exactly the time you specify. (See below.)
On Windows, the clock granularity can be changed to accommodate power management needs (e.g. decrease the granularity to conserve battery power). See MSDN's documentation on the Sleep function.
Note that with Sleep()/nanosleep(), the OS only guarantees that the process suspension will last for at least as long as you specify. The execution of other processes can always delay resumption of your process.
Therefore, the key-up event sent by your code above will be sent at least 2.638 seconds later than the key-down event, and not a millisecond sooner. But it would be possible for the event to be sent 2.7, 2.8, or even 3 seconds later. (Or much later if a realtime process grabbed hold of the CPU and didn't relinquish control for some time.)
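The "at least as long" guarantee can be checked empirically by timing the sleep itself. The sketch below is a POSIX illustration rather than Windows code, and sleep_overshoot_ns is a hypothetical helper name. It requests a sleep, restarts it if a signal interrupts it, and returns how much longer than requested the call actually took.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Sleep for requested_ms and return the overshoot in nanoseconds
 * (actual elapsed time minus requested time). */
int64_t sleep_overshoot_ns(long requested_ms)
{
    struct timespec req = {
        .tv_sec  = requested_ms / 1000,
        .tv_nsec = (requested_ms % 1000) * 1000000L,
    };
    struct timespec rem, t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* If a signal cuts the sleep short, continue with the remaining time. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int64_t elapsed_ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                       + (t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns - (int64_t)requested_ms * 1000000LL;
}
```

Running this in a loop on an idle machine and again under load gives a feel for how far the wake-up can drift. The overshoot should not be negative, but it has no upper bound.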
Sleep works in terms of the standard Windows thread scheduling. It is accurate up to about 20-50 milliseconds.
So it's OK for user-experience-dependent things. However, it's absolutely inappropriate for real-time things.
Besides this, there are much better ways to simulate keyboard/mouse events. Please see SendInput.
The sleep() function will return before the desired delay when the requested delay is shorter than the time left until the next interrupt occurs. But this only points out that you want to sleep for a shorter period of time than currently is supported by your system. It is advisable to setup the multimedia timer resource to a higher interrupt frequency to obtain better matching of the observed sleep delay with respect to the desired delay.
See the comments in the following threads:
How to get an accurate 1ms Timer Tick under WinXP
Sleep Less Than One Millisecond
Sleep() ensures that the thread is suspended for at least the amount of time given as its argument. The operating system does not guarantee that the thread resumes exactly when that time is up. For a detailed discussion, you can refer to the post below:
how is sleep implemented at OS level?
