Calling a C function periodically on OSX - c

I have a function which calculates a BPM for a track from incoming data packets from a CDJ. Let's say the BPM is 124.45 beats per minute; how would I go about calling a function every 0.482 seconds (i.e. once per beat)? Would it be possible to set up another thread and set a timer?

Maybe have a look at high precision timers, here, for which Apple claims 500 microsecond accuracy, which is about 0.1% of your 500 (ish) millisecond requirement. You can minimise skew by reading the time at the start of your processing and calculating an offset to the next beat. Also, if you find you are often getting scheduled late and missing beats, you can sleep for, say, 95% of the time to your next beat so the CPU can schedule something else, and then busy-wait for the last few percent so you don't hog the CPU.
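As an illustration of that sleep-then-spin idea, here is a generic POSIX sketch (not Apple's high-precision timer API); it assumes clock_gettime is available (macOS 10.12+), and bpm, on_beat and the 95% split are placeholders:

// Hedged sketch of "sleep most of the interval, then busy-wait the rest".
#include <time.h>
#include <stdint.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

void beat_loop(double bpm, void (*on_beat)(void), volatile int *running)
{
    const uint64_t beat_ns = (uint64_t)(60.0 / bpm * 1e9); // ~482 ms at 124.45 BPM
    uint64_t next = now_ns() + beat_ns;

    while (*running) {
        uint64_t t = now_ns();
        if (next > t) {
            // Sleep for ~95% of the remaining time so the CPU can do other work...
            uint64_t coarse = (next - t) * 95 / 100;
            struct timespec ts = { .tv_sec  = (time_t)(coarse / 1000000000ull),
                                   .tv_nsec = (long)(coarse % 1000000000ull) };
            nanosleep(&ts, NULL);
        }
        // ...then busy-wait the last few percent to hit the beat precisely.
        while (now_ns() < next)
            ; // spin
        on_beat();
        next += beat_ns; // schedule from the previous deadline so error doesn't accumulate
    }
}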

Related

Is it acceptable to measure time elapsed by iteratively blocking a thread for a fixed period and then multiply said period by the loop count?

I work for a company that produces automatic machines, and I help maintain the software that controls them. The software runs on a real-time operating system and consists of multiple threads running concurrently. The code bases are legacy and carry substantial technical debt. Among all the issues the code bases exhibit, one stands out as rather bizarre to me: most of the timing algorithms that compute elapsed time to realize common timed features, such as timeouts, delays, and recording time spent in a particular state, basically take the following form:
unsigned int shouldContinue = 1;
unsigned int blockDuration = 1; // Let's say 1 millisecond.
unsigned int loopCount = 0;
unsigned int elapsedTime = 0;
while (shouldContinue)
{
.
. // a bunch of statements, selections and function calls
.
blockingSystemCall(blockDuration);
.
. // a bunch of statements, selections and function calls
.
loopCount++;
elapsedTime = loopCount * blockDuration;
}
The blockingSystemCall function can be any operating system's API that suspends the current thread for the specified blockDuration. The elapsedTime variable is subsequently computed by basically multiplying loopCount by blockDuration or by any equivalent algorithm.
To me, this kind of timing algorithm is wrong and is not acceptable under most circumstances. All the instructions in the loop, including the loop condition, execute sequentially, and each instruction takes measurable CPU time. Therefore, the actual time elapsed is strictly greater than the value of elapsedTime at any point after the loop starts. Suppose the CPU time required to execute all the statements in the loop, denoted by d, is constant. Then elapsedTime lags behind the actual time elapsed by loopCount × d for any loopCount > 0; that is, the deviation grows as an arithmetic progression. This is only a lower bound on the deviation, because in reality there will be additional delays caused by thread scheduling and time slicing, among other factors.
In fact, not too long ago, while testing a new data-driven predictive maintenance feature which relies on the operation time of a machine, we discovered that the operation time reported by the software lagged behind that of a standard reference clock by a whopping three hours after the machine was in continuous operation for just over two days. It was through this test that I discovered the algorithm outlined above, which I swiftly determined to be the root cause.
Coming from a background where I used to implement timing algorithms on bare-metal systems using timer interrupts, which allow the CPU to carry on executing the business logic while the timer runs in parallel, it was shocking to discover that the algorithm outlined in the introduction is used in the industry to compute elapsed time. It is even more surprising given that a typical operating system already encapsulates the timer functions in various easy-to-use public APIs, freeing the programmer from the hassle of configuring a timer via hardware registers, raising events via interrupt service routines, and so on.
The kind of timing algorithm illustrated in the skeleton code above is found in at least two code bases independently developed by two distinct software engineering teams from two subsidiary companies located in different cities, albeit within the same state. This makes me wonder whether this is how things are normally done in the industry, or whether it is just an isolated case that is not widespread.
So the question is: is the algorithm shown above common or acceptable for calculating elapsed time, given that the underlying operating system already provides highly optimized time-management system calls that can be used right out of the box to accurately measure elapsed time, or as basic building blocks for higher-level timing facilities with more intuitive methods, similar to, e.g., the Timer class in C#?
You're right that calculating elapsed time that way is inaccurate, since it assumes that the blocking call will take exactly the amount of time indicated, and that everything that happens outside the blocking system call takes no time at all, which would only be true on an infinitely fast machine. Since actual machines are not infinitely fast, the elapsed time calculated this way will always be somewhat less than the actual elapsed time.
As to whether that's acceptable, it's going to depend on how much timing accuracy your program needs. If it's just doing a rough estimate to make sure a function doesn't run for "too long", this might be okay. OTOH if it is trying for accuracy (and in particular accuracy over a long period of time), then this approach won't provide that.
FWIW the more common (and more accurate) way to measure elapsed time would be something like this:
const unsigned int startTime = current_clock_time();
while (shouldContinue)
{
loopCount++;
elapsedTime = current_clock_time() - startTime;
}
This has the advantage of not "drifting away" from the accurate value over time, but it does assume that you have a current_clock_time() type of function available, and that it's acceptable to call it within the loop. (If current_clock_time() is very expensive, or doesn't provide some real-time performance guarantees that the calling routine requires, that might be a reason not to do it this way.)
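For example, on Linux/POSIX the hypothetical current_clock_time() could be backed by a monotonic clock, something like:

// One possible stand-in for the hypothetical current_clock_time() above,
// returning milliseconds from a monotonic clock (Linux/POSIX).
#include <stdint.h>
#include <time.h>

static uint64_t current_clock_time_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); // unaffected by wall-clock adjustments
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)(ts.tv_nsec / 1000000L);
}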
I don't think these loops do what you think they do.
In a RTOS, the purpose of a loop like this is usually to perform a task at regular intervals.
blockingSystemCall(N) probably does not just sleep for N milliseconds like you think it does. It probably sleeps until N milliseconds after the last time your thread woke up.
More accurately, all the sleeps your thread has performed since starting are added to the thread start time to get the time at which the OS will try to wake the thread up. If your thread woke up due to an I/O event, then the last one of those times could be used instead of the thread start time. The point is that the inaccuracies in all these start times are corrected, so your thread wakes up at regular intervals and the elapsed time measurement is perfectly accurate according to the RTOS master clock.
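On POSIX, one way to get that "sleep until an absolute deadline" behaviour, so that wake-up jitter does not accumulate, is clock_nanosleep with TIMER_ABSTIME; the sketch below illustrates the idea and is not the RTOS call from the question:

// Illustrative POSIX version of "sleep until N ms after the last wake-up":
// deadlines are absolute, so scheduling jitter does not accumulate over time.
#include <time.h>

void periodic_task(long period_ms, void (*work)(void), volatile int *running)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next); // first deadline = now + period

    while (*running) {
        next.tv_nsec += period_ms * 1000000L;
        while (next.tv_nsec >= 1000000000L) { // normalize the timespec
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }
        // Sleep until the absolute deadline, not for a relative duration.
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        work();
    }
}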
There could also be very good reasons for measuring elapsed time by the RTOS master clock instead of a more accurate wall clock time, in addition to simplicity. This is because all of the guarantees that an RTOS provides (which is the reason you are using a RTOS in the first place) are provided in that time scale. The amount of time taken by one task can affect the amount of time you are guaranteed to have available for other tasks, as measured by this clock.
It may or may not be a problem that your RTOS master clock runs slow by 3 hours every 2 days...

how to find scheduling delay for a process in c

I have a C program on linux. During execution of my program, I want to make some decisions if the process is facing scheduling delay above a threshold.
Any suggestions on how to find this statistic?
P.S.: By scheduling delay I mean time spent by the process waiting to be scheduled i.e. time spent in the scheduler queue.
The time() function allows you to measure the "wall clock" time: http://linux.die.net/man/2/time
On the other hand, the clock() function allows you to measure the CPU time used by your process: http://linux.die.net/man/3/clock
By subtracting the two, you can get an approximation of what you asked for.
PS: for more accurate measurements (time() has only one-second resolution) you can use clock_gettime: http://linux.die.net/man/3/clock_gettime
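A minimal sketch of that subtraction using clock_gettime, assuming a Linux/POSIX system; note the difference also includes time the process spent voluntarily blocked (sleeping, waiting on I/O), so it is only a rough proxy for scheduler-queue delay:

#include <time.h>

struct timing_sample { struct timespec wall, cpu; };

static void take_sample(struct timing_sample *s)
{
    clock_gettime(CLOCK_MONOTONIC, &s->wall);         // wall-clock time
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &s->cpu); // CPU time used by this process
}

static double diff_sec(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

// Approximate time spent off the CPU between two samples.
static double off_cpu_seconds(const struct timing_sample *a,
                              const struct timing_sample *b)
{
    return diff_sec(a->wall, b->wall) - diff_sec(a->cpu, b->cpu);
}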
You could set a timer to go off, say, every minute, or whatever interval seems appropriate, then gather stats with getrusage(), and based on those results (the difference between successive values) make your decision.
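A rough sketch of that approach; the one-minute interval and the thresholds are illustrative placeholders, not recommendations:

// Sample getrusage() at a fixed interval and compare successive readings.
#include <sys/resource.h>
#include <unistd.h>

static double tv_sec(struct timeval tv) { return tv.tv_sec + tv.tv_usec / 1e6; }

void monitor_loop(volatile int *running)
{
    struct rusage prev, cur;
    getrusage(RUSAGE_SELF, &prev);

    while (*running) {
        sleep(60); // sample once a minute, say
        getrusage(RUSAGE_SELF, &cur);
        double cpu_delta = (tv_sec(cur.ru_utime) + tv_sec(cur.ru_stime))
                         - (tv_sec(prev.ru_utime) + tv_sec(prev.ru_stime));
        long ctx_switches = cur.ru_nivcsw - prev.ru_nivcsw; // involuntary context switches
        if (cpu_delta < 1.0 || ctx_switches > 1000) {
            // e.g. decide the process is being starved and adjust behaviour
        }
        prev = cur;
    }
}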

Scheduling events at microsecond granularity in POSIX

I'm trying to determine the granularity I can accurately schedule tasks to occur in C/C++. At the moment I can reliably schedule tasks to occur every 5 microseconds, but I'm trying to see if I can lower this further.
Any advice on how to achieve this / if it is possible would be greatly appreciated.
Since I know timer granularity can often be OS dependent: I am currently running on Linux, but would switch to Windows if the timing granularity is better (although I don't believe it is, based on what I've found about QueryPerformanceCounter).
I execute all measurements on bare-metal (no VM). /proc/timer_info confirms nanosecond timer resolution for my CPU (but I know that doesn't translate to nanosecond alarm resolution)
Current
My current code can be found as a Gist here
At the moment, I'm able to execute a request every 5 microseconds (5000 nanoseconds) with less than 1% late arrivals. When late arrivals do occur, they are typically only one cycle (5000 nanoseconds) behind.
I'm doing 3 things at the moment:
Setting the process to real-time priority (as pointed out by Spudd86 here)
struct sched_param schedparm;
memset(&schedparm, 0, sizeof(schedparm));
schedparm.sched_priority = 99; // highest rt priority
sched_setscheduler(0, SCHED_FIFO, &schedparm);
Minimizing the timer slack
prctl(PR_SET_TIMERSLACK, 1);
Using timerfds (part of the 2.6 Linux kernel)
int timerfd = timerfd_create(CLOCK_MONOTONIC,0);
struct itimerspec timspec;
bzero(&timspec, sizeof(timspec));
timspec.it_interval.tv_sec = 0;
timspec.it_interval.tv_nsec = nanosecondInterval;
timspec.it_value.tv_sec = 0;
timspec.it_value.tv_nsec = 1;
timerfd_settime(timerfd, 0, &timspec, 0);
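For reference, a minimal sketch of the read loop that typically accompanies a timerfd configured like the one above; read() blocks until the timer fires and reports how many expirations have occurred since the previous read:

#include <stdint.h>
#include <unistd.h>
#include <sys/timerfd.h>

void timerfd_loop(int timerfd, void (*on_tick)(uint64_t), volatile int *running)
{
    while (*running) {
        uint64_t expirations = 0;
        if (read(timerfd, &expirations, sizeof(expirations)) == sizeof(expirations)) {
            // expirations > 1 means we were late and missed ticks
            on_tick(expirations);
        }
    }
}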
Possible improvements
Dedicate a processor to this process?
Use a nonblocking timerfd so that I can create a tight loop, instead of blocking (a tight loop will waste more CPU, but may also be quicker to respond to an alarm)
Using an external embedded device for triggering (can't imagine why this would be better)
Why
I'm currently working on creating a workload generator for a benchmarking engine. The workload generator simulates an arrival rate (X requests / second, etc.) using a Poisson process. From the Poisson process, I can determine the relative times at which requests must be made from the benchmarking engine.
So for instance, at 10 requests a second, we may have requests made at:
t = 0.02, 0.04, 0.05, 0.056, 0.09 seconds
These requests need to be scheduled in advance and then executed. As the number of requests per second increases, the granularity required for scheduling these requests increases (thousands of requests per second requires sub-millisecond accuracy). As a result, I'm trying to figure out how to scale this system further.
You're very close to the limits of what vanilla Linux will offer you, and it's way past what it can guarantee. Adding the real-time patches to your kernel and tuning for full pre-emption will help give you better guarantees under load. I would also remove any dynamic memory allocation from your time-critical code; malloc and friends can (and will) stall for a not-inconsequential (in a real-time sense) period of time if they have to reclaim memory from the I/O cache. I would also consider removing swap from that machine to help guarantee performance. Dedicating a processor to your task will help to prevent context switch times but, again, it's no guarantee.
I would also suggest that you be careful with that level of sched_priority; you're above various important bits of Linux there, which can lead to very strange effects.
What you gain from building a realtime kernel is more reliable guarantees (ie lower maximum latency) of the time between an IO/timer event handled by the kernel, and control being passed to your app in response. This comes at the price of lower throughput, and you might notice an increase in your best-case latency times.
However, the only reason for using OS timers to schedule events with high precision is if you're afraid of burning CPU cycles in a loop while you wait for your next due event. OS timers (especially on MS Windows) are not reliable for high-granularity timing events, and are very dependent on the sort of timing/HPET hardware available in your system.
When I require highly accurate event scheduling, I use a hybrid method. First, I measure the worst case latency - that is, the biggest difference between the time I requested to sleep, and the actual clock time after sleeping. Let's call this difference "D". (You can actually do this on-the-fly during normal running, by tracking "D" every time you sleep, with something like "D = (D*7 + lastD) / 8" to produce a temporal average).
Then never request to sleep beyond "N - D*2", where "N" is the time of the next event. When within "D*2" time of the next event, enter a spin loop and wait for "N" to occur.
This eats a lot more CPU cycles, but depending on the accuracy you require, you might be able to get away with a "sched_yield()" in your spin loop, which is more kind to your system.
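A minimal C sketch of this hybrid method, assuming clock_gettime(CLOCK_MONOTONIC) and nanosleep are available; the names and the use of sched_yield() in the spin loop are illustrative:

// Track an average sleep overshoot "D", sleep only until N - 2*D, then spin.
#include <sched.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

void wait_until_ns(uint64_t deadline, uint64_t *avg_overshoot /* "D" */)
{
    uint64_t t = now_ns();

    // Coarse phase: sleep, but stop 2*D short of the deadline.
    if (deadline > t + 2 * *avg_overshoot) {
        uint64_t chunk = deadline - t - 2 * *avg_overshoot;
        struct timespec req = { .tv_sec  = (time_t)(chunk / 1000000000ull),
                                .tv_nsec = (long)(chunk % 1000000000ull) };
        uint64_t before = now_ns();
        nanosleep(&req, NULL);
        uint64_t after = now_ns();
        uint64_t overshoot = (after > before + chunk) ? after - (before + chunk) : 0;
        *avg_overshoot = (*avg_overshoot * 7 + overshoot) / 8; // temporal average, as above
    }

    // Fine phase: spin (with an optional yield) until the deadline.
    while (now_ns() < deadline)
        sched_yield();
}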

How to generate requests at a "requests/sec" target rate?

Say I have a target of x requests/sec that I want to generate continuously. My goal is to start these requests at roughly the same interval, rather than just generating x requests and then waiting until 1 second has elapsed and repeating the whole thing over and over again. I'm not making any assumptions about these requests, some might take much longer than others, which is why my scheduler thread will not perform the requests (or wait for them to finish), but hand them over to a sufficiently sized Thread Pool.
Now if x is in the range of hundreds or less, I might get by with .net's Timers or Thread.Sleep and checking actually elapsed time using Stopwatch.
But if I want to go into the thousands or tens of thousands, I could try using a high-resolution timer to maintain my "roughly the same interval" approach. But this would (in most programming environments on a general OS) imply some amount of hand-coding with spin waiting and so forth, and I'm not sure it's worthwhile to take this route.
Extending the initial approach, I could instead use a Timer to sleep and do y requests on each Timer event, monitor the actual requests per second achieved doing this and fine-tune y at runtime. The effect is somewhere in between "put all x requests and wait until 1 second elapsed since start", which I'm trying not to do, and "wait more or less exactly 1/x seconds before starting the next request".
The latter seems like a good compromise, but is there anything that's easier while still spreading the requests somewhat evenly over time? This must have been implemented hundreds of times by different people, but I can't find good references on the issue.
So what's the easiest way to implement this?
One way to do it:
First find (good luck on Windows) or implement a usleep or nanosleep function. As a first step, this could be (on .net) a simple Thread.SpinWait() / Stopwatch.Elapsed > x combo. If you want to get fancier, do Thread.Sleep() if the time span is large enough and only do the fine-tuning using Thread.SpinWait().
That done, just take the inverse of the rate and you have the time interval you need to sleep between each event. Your basic loop, which you do on one dedicated thread, then goes
Fire event
Sleep(sleepTime)
Then every, say, 250ms (or more for faster rates), check the actually achieved rate and adjust the sleepTime interval, perhaps with some smoothing to dampen wild temporary swings, like this
newSleepTime = max(1, sleepTime / targetRate * actualRate)
sleepTime = 0.3 * sleepTime + 0.7 * newSleepTime
This adjusts to what is actually going on in your program and on your system, and makes up for the time spent to invoke the event callback, and whatever the callback is doing on that same thread etc. Without this, you will probably not be able to get high accuracy.
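The answer above is written against .NET; a rough C sketch of the same idea (fire, sleep about 1/rate, and periodically rescale the sleep time toward the target rate with smoothing) might look like this; the names and the 250 ms adjustment window are illustrative, not a definitive implementation:

#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

void generate(double target_rate, void (*fire)(void), volatile int *running)
{
    double sleep_ns = 1e9 / target_rate; // initial guess: the inverse of the rate
    uint64_t window_start = now_ns();
    uint64_t fired_in_window = 0;

    while (*running) {
        fire();
        fired_in_window++;

        uint64_t ns = (uint64_t)sleep_ns;
        struct timespec req = { .tv_sec  = (time_t)(ns / 1000000000ull),
                                .tv_nsec = (long)(ns % 1000000000ull) };
        nanosleep(&req, NULL);

        uint64_t elapsed = now_ns() - window_start;
        if (elapsed >= 250000000ull) {                   // adjust every ~250 ms
            double actual_rate = fired_in_window / (elapsed / 1e9);
            double new_sleep = sleep_ns * actual_rate / target_rate;
            if (new_sleep < 1.0) new_sleep = 1.0;
            sleep_ns = 0.3 * sleep_ns + 0.7 * new_sleep; // smooth the adjustment
            window_start = now_ns();
            fired_in_window = 0;
        }
    }
}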
Needless to say, if your rate is so high that you cannot use Sleep but always have to spin, one core will be spinning continuously. The good news: we get ever more cores on our machines, so one core matters less and less :) More seriously though, as you mentioned in the comment, if your program does actual work, your event generator will have less time (and need) to waste cycles.
Check out https://github.com/EugenDueck/EventCannon for a proof of concept implementation in .net. It's implemented roughly as described above and done as a library, so you can embed that in your program if you use .net.

1ms resolution timer under linux recommended way

I need a timer tick with 1ms resolution under Linux. It is used to increment a timer value that in turn is used to see if various Events should be triggered. The POSIX timerfd_create is not an option because of the glibc requirement. I tried timer_create and timer_settime, but the best I get from them is a 10ms resolution; smaller values seem to default to 10ms resolution. getitimer and setitimer have a 10ms resolution according to the manpage.
The only way to do this timer I can currently think of is to use clock_gettime with CLOCK_MONOTONIC in my main loop and test if a ms has passed, and if so to increase the counter (and then check if the various Events should fire).
Is there a better way to do this than to constantly query in the main loop? What is the recommended solution to this?
The language I am using is plain old C.
Update
I am using a 2.6.26 kernel. I know you can have it interrupt at 1kHz, and the POSIX timer_* functions can then be programmed with up to 1ms resolution, but that seems not to be reliable and I don't want to use it, because it may need a new kernel on some systems. Some stock kernels seem to still have 100Hz configured, and I would need to detect that. The application may be run on something other than my system :)
I can not sleep for 1ms because there may be network events I have to react to.
How I resolved it
Since it is not that important I simply declared that the global timer has a 100ms resolution. All events using their own timer have to set at least 100ms for timer expiration. I was more or less wondering if there would be a better way, hence the question.
Why I accepted the answer
I think the answer from freespace best described why it is not really possible without a realtime Linux System.
Polling in the main loop isn't an answer either - your process might not get much CPU time, so more than 10ms will elapse before your code gets to run, rendering it moot.
10ms is about the standard timer resolution for most non-realtime operating systems. But it is moot in a non-RTOS: the behaviour of the scheduler and dispatcher is going to greatly influence how quickly you can respond to a timer expiring. For example, even if you had a sub-10ms resolution timer, you can't respond to the timer expiring if your code isn't running. Since you can't predict when your code is going to run, you can't respond to timer expiration accurately.
There are of course realtime Linux kernels, see http://www.linuxdevices.com/articles/AT8073314981.html for a list. An RTOS offers facilities whereby you can get soft or hard guarantees about when your code is going to run. This is about the only way to reliably and accurately respond to timers expiring etc.
To get 1ms resolution timers do what libevent does.
Organize your timers into a min-heap; that is, the top of the heap is the timer with the earliest (absolute) expiry time (an rb-tree would also work, but with more overhead). Before calling select() or epoll() in your main event loop, calculate the delta in milliseconds between the expiry time of the earliest timer and now. Use this delta as the timeout to select(). select() and epoll() timeouts have 1ms resolution.
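A skeleton of that loop using epoll, with the min-heap reduced to a single "next expiry" value to keep the sketch short; the callback names are placeholders:

#include <sys/epoll.h>
#include <time.h>

static long ms_until(struct timespec expiry)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long ms = (expiry.tv_sec - now.tv_sec) * 1000L
            + (expiry.tv_nsec - now.tv_nsec) / 1000000L;
    return ms > 0 ? ms : 0;
}

void event_loop(int epfd, struct timespec next_expiry,
                void (*on_timer)(void), void (*on_io)(struct epoll_event *),
                volatile int *running)
{
    struct epoll_event events[64];

    while (*running) {
        int n = epoll_wait(epfd, events, 64, (int)ms_until(next_expiry));
        if (n == 0) {
            on_timer();        // timeout hit: the earliest timer has expired
            // ...pop it from the heap and recompute next_expiry here...
        }
        for (int i = 0; i < n; i++)
            on_io(&events[i]); // service ready file descriptors
    }
}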
I've got a timer resolution test that uses the mechanism explained above (but not libevent). The test measures the difference between the desired and actual expiry times of 1ms, 5ms and 10ms timers:
1000 deviation samples of 1msec timer: min= -246115nsec max= 1143471nsec median= -70775nsec avg= 901nsec stddev= 45570nsec
1000 deviation samples of 5msec timer: min= -265280nsec max= 256260nsec median= -252363nsec avg= -195nsec stddev= 30933nsec
1000 deviation samples of 10msec timer: min= -273119nsec max= 274045nsec median= 103471nsec avg= -179nsec stddev= 31228nsec
1000 deviation samples of 1msec timer: min= -144930nsec max= 1052379nsec median= -109322nsec avg= 1000nsec stddev= 43545nsec
1000 deviation samples of 5msec timer: min= -1229446nsec max= 1230399nsec median= 1222761nsec avg= 724nsec stddev= 254466nsec
1000 deviation samples of 10msec timer: min= -1227580nsec max= 1227734nsec median= 47328nsec avg= 745nsec stddev= 173834nsec
1000 deviation samples of 1msec timer: min= -222672nsec max= 228907nsec median= 63635nsec avg= 22nsec stddev= 29410nsec
1000 deviation samples of 5msec timer: min= -1302808nsec max= 1270006nsec median= 1251949nsec avg= -222nsec stddev= 345944nsec
1000 deviation samples of 10msec timer: min= -1297724nsec max= 1298269nsec median= 1254351nsec avg= -225nsec stddev= 374717nsec
The test ran as a real-time process on Fedora 13 kernel 2.6.34, the best achieved precision of 1ms timer was avg=22nsec stddev=29410nsec.
I'm not sure it's the best solution, but you might consider writing a small kernel module that uses the kernel high-res timers to do timing. Basically, you'd create a device file for which reads would only return on 1ms intervals.
An example of this type of approach is used in the Asterisk PBX, via the ztdummy module. If you google for ztdummy you can find the code that does this.
I think you'll have trouble achieving 1 ms precision with standard Linux even with constant querying in the main loop, because the kernel does not ensure your application will get CPU all the time. For example, you can be put to sleep for dozens of milliseconds because of preemptive multitasking and there's little you can do about it.
You might want to look into Real-Time Linux.
If you are targeting the x86 platform you should check out HPET timers. This is a hardware timer with high precision. It must be supported by your motherboard (right now all of them support it) and your kernel should contain a driver for it as well. I have used it a few times without any problems and was able to achieve a much better resolution than 1ms.
Here is some documentation and examples:
http://www.kernel.org/doc/Documentation/timers/hpet.txt
http://www.kernel.org/doc/Documentation/timers/hpet_example.c
http://fpmurphy.blogspot.com/2009/07/linux-hpet-support.html
I seem to recall getting OK results with gettimeofday/usleep based polling. I wasn't needing 1000 timers a second or anything, but I did need good accuracy for the ticks I did need; my app was a MIDI drum machine controller, and I seem to remember getting sub-millisecond accuracy, which you need for a drum machine if you don't want it to sound like a very bad drummer (especially counting MIDI's built-in latencies). IIRC (it was 2005, so my memory is a bit fuzzy) I was getting within 200 microseconds of target times with usleep.
However, I was not running much else on the system. If you have a controlled environment you might be able to get away with a solution like that. If there's more going on the system (watch cron firing up updatedb, etc.) then things may fall apart.
Are you running on a Linux 2.4 kernel?
From VMware KB article #1420 (http://kb.vmware.com/kb/1420):
Linux guest operating systems keep time by counting timer interrupts. Unpatched 2.4 and earlier kernels program the virtual system timer to request clock interrupts at 100Hz (100 interrupts per second). 2.6 kernels, on the other hand, request interrupts at 1000Hz - ten times as often. Some 2.4 kernels modified by distribution vendors to contain 2.6 features also request 1000Hz interrupts, or in some cases, interrupts at other rates, such as 512Hz.
There is a ktimer patch for the Linux kernel:
http://lwn.net/Articles/167897/
http://www.kernel.org/pub/linux/kernel/projects/rt/
HTH
First, get the kernel source and compile it with an adjusted HZ parameter.
If HZ=1000, the timer interrupts 1000 times per second. It is OK to use HZ=1000 for an i386 machine.
On an embedded machine, HZ might be limited to 100 or 200.
For good operation, the PREEMPT_KERNEL option should be on. There are kernels which do not support this option properly; you can check them out by searching.
Recent kernels, i.e. 2.6.35.10, support the NO_HZ option, which turns on dynamic ticks. This means that there will be no timer ticks when idle, but a timer tick will be generated at the specified moment.
There is an RT patch for the kernel, but its hardware support is very limited.
Generally, RTAI is a killer solution to your problem, but its hardware support is very limited. However, good CNC controllers, like emc2, use RTAI for their clocking, maybe 5000 Hz, but it can be hard work to install it.
If you can, you could add hardware to generate pulses. That would make a system which can be adapted to any OS version.
You don't need an RTOS for a simple real time application. All modern processors have General Purpose timers. Get a datasheet for whatever target CPU you are working on. Look in the kernel source, under the arch directory you will find processor specific source how to handle these timers.
There are two approaches you can take with this:
1) Your application is ONLY running your state machine, and nothing else. Linux is simply your "boot loader." Create a kernel object which installs a character device. On insertion into the kernel, set up your GP timer to run continuously. You know the frequency it's operating at. Now, in the kernel, explicitly disable your watchdog, then disable interrupts (hardware AND software). On a single-CPU Linux kernel, calling spin_lock() will accomplish this (never let go of it). The CPU is YOURS. Busy loop, checking the value of the GPT until the required number of ticks have passed; when they have, set a value for the next timeout and enter your processing loop. Just make sure that the burst time for your code is under 1ms.
2) A second option. This assumes you are running a preemptive Linux kernel. Set up an unused GPT alongside your running OS. Now, set up an interrupt to fire some configurable margin BEFORE your 1ms timeout happens (say 50-75 usec). When the interrupt fires, immediately disable interrupts and spin waiting for the 1ms window to occur, then enter your state machine and subsequently enable interrupts on your way OUT. This accounts for the fact that you are cooperating with OTHER things in the kernel which disable interrupts. This ASSUMES that there is no other kernel activity which locks out interrupts for a long time (more than 100us). Now you can MEASURE the accuracy of your firing event and make the window larger until it meets your need.
If instead you are trying to learn how RTOS's work...or if you are trying to solve a control problem with more than one real-time responsibility...then use an RTOS.
Can you at least use nanosleep in your loop to sleep for 1ms? Or is that a glibc thing?
Update: Never mind, I see from the man page "it can take up to 10 ms longer than specified until the process becomes runnable again"
What about using the "/dev/rtc0" (or "/dev/rtc") device and its related ioctl() interface? I think it offers an accurate timer counter. It is not possible to set the rate to exactly 1 ms, but you can get a close value of 1/1024 sec (1024Hz), or a higher frequency, like 8192Hz.
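A hedged sketch of that interface, following the kernel's RTC documentation; the rate must be a power of two (e.g. 1024 Hz is about 0.98 ms), and high rates may require root or raising the kernel's max-user-freq limit:

#include <fcntl.h>
#include <linux/rtc.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int rtc_tick_loop(void (*on_tick)(void), volatile int *running)
{
    int fd = open("/dev/rtc0", O_RDONLY);
    if (fd < 0)
        fd = open("/dev/rtc", O_RDONLY);
    if (fd < 0) { perror("open rtc"); return -1; }

    if (ioctl(fd, RTC_IRQP_SET, 1024) < 0 ||  // 1024 interrupts per second
        ioctl(fd, RTC_PIE_ON, 0) < 0) {       // enable periodic interrupts
        perror("rtc ioctl");
        close(fd);
        return -1;
    }

    while (*running) {
        unsigned long data;
        // Blocks until the next interrupt; the low byte holds the interrupt
        // type, the remaining bytes the number of interrupts since last read.
        if (read(fd, &data, sizeof(data)) == sizeof(data))
            on_tick();
    }

    ioctl(fd, RTC_PIE_OFF, 0);
    close(fd);
    return 0;
}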
