Timer/Counter in C for rate calculation?

Need a running (moving, rolling) average algorithm to calculate the 5-minute average of bits that are passed in. All I have to work with is a cumulative count of the bits passed in so far.
For example: I start with 0 bits; 5 minutes later, I have 10 bits, so my average is 10 bits. 5 minutes later, I have 15 bits, so now my average is 7.5 bits. Another 5 minutes later, I have 30 bits, so my average is now 10 bits.
My question is: how can I implement a timer/counter in C so that it polls the bits value at exact 5-minute intervals? Obviously I can't just delay for 300 seconds. But can I make a timer in the background that fires an event (polls the bit value) every 5 minutes?

The Code to my Previous Answer
#define _REENTRANT
//The above is necessary when using threads and must be defined before any includes are made
//Passing gcc -D_REENTRANT on the command line has the same effect
#include <pthread.h>
#include <unistd.h> //for sleep()

void callback(void); //defined elsewhere: polls the bit value

volatile char running = 1;

void* timer(void* dump){
    unsigned int i; //must be wider than char, since we count to 300
    while(running){
        for(i = 0; i < 300 && running; i++){
            sleep(1); //so we don't wait the full 300 seconds when we want to quit
        }
        if(running)
            callback(); //note that this is called from a different thread than main()
    }
    pthread_exit(NULL);
}

int main(){
    pthread_t thread;
    pthread_create(&thread, NULL, timer, NULL);
    //do some stuff
    running = 0;
    pthread_join(thread, NULL); //we told it to stop running; we may still wait up to a second here
    return 0;
}

The "dumb" solution is to use POSIX threads. You can make a thread and then put it in an infinite loop with sleep() in it.

Related

Understanding the calculation of the time (jiffies)

I want to sleep in the kernel for a specific amount of time, and I use time_before and jiffies to calculate how long I should sleep, but I don't understand how the calculation actually works. I know that HZ is 250 and that jiffies is a huge, constantly changing value. I know what they both are and what they are used for.
I calculate the time with jiffies + (10 * HZ).
static unsigned long j1;

static int __init sys_module_init(void)
{
    j1 = jiffies + (10 * HZ);
    while (time_before(jiffies, j1))
        schedule();
    printk("Hello World - %d - %ld\n", HZ, jiffies); // Hello World - 250 - 4296485594 (dynamic)
    return 0;
}
How does the calculation work and how many seconds will I sleep? I want to know that because in the future I'll probably want to sleep for a specific time.
HZ is the number of timer ticks per second, so multiplying it by 10 gives the number of ticks in 10 seconds. The calculation jiffies + 10 * HZ therefore yields the value jiffies will have 10 seconds from now.
However, calling schedule() in a loop until that value is reached is not the recommended way to go. If you want to sleep in the kernel, you don't need to reinvent the wheel; there is already a set of APIs just for this purpose, documented here, which will make your life a lot easier. The simplest way to sleep in your specific case is msleep(), passing the number of milliseconds you want to sleep for:
#include <linux/delay.h>

static int __init sys_module_init(void)
{
    msleep(10 * 1000);
    return 0;
}
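If for some reason msleep() doesn't fit, the schedule()-based pattern from the question can also be written properly with schedule_timeout(), which puts the task to sleep instead of spinning. A minimal sketch under the same 10-second assumption:
#include <linux/sched.h>

static int __init sys_module_init(void)
{
    /* Mark the task as sleeping, then let the scheduler wake us after 10 s.
       TASK_INTERRUPTIBLE means a signal can end the sleep early. */
    set_current_state(TASK_INTERRUPTIBLE);
    schedule_timeout(10 * HZ);
    return 0;
}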

Variable performance of busy wait loop?

I am evaluating the performance of a busy wait loop for firing events at consistent intervals. I have noticed some odd behavior using the following code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

/* computes t2 - t1; definition not shown here */
int timespec_subtract(struct timespec *, struct timespec, struct timespec);

int main(int argc, char *argv[]) {
    int iterations = atoi(argv[1]) + 1;
    struct timespec t[2], diff;
    for (int i = 0; i < iterations; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t[0]);
        static volatile int i; /* shadows the outer i; volatile keeps the loop from being optimized out */
        for (i = 0; i < 200000; i++)
            ;
        clock_gettime(CLOCK_MONOTONIC, &t[1]);
        timespec_subtract(&diff, t[1], t[0]);
        printf("%ld\n", diff.tv_sec * 1000000000 + diff.tv_nsec);
    }
}
On the test machine (dual 14-core E5-2683 v3 @ 2.00GHz, 256GB DDR4), 200k iterations of the for loop take approximately 1ms. Or maybe not:
1030854
1060237
1012797
1011479
1025307
1017299
1011001
1038725
1017361
... (about 700 lines later)
638466
638546
638446
640422
638468
638457
638468
638398
638493
640242
... (about 200 lines later)
606460
607013
606449
608813
606542
606484
606990
606436
606491
606466
... (about 3000 lines later)
404367
404307
404309
404306
404270
404370
404280
404395
404342
406005
When the times shift down the third time, they stay mostly consistent (within about 2 or 3 microseconds), except for occasionally jumping up to about 450us for a few hundred iterations. This behavior is repeatable on similar machines and over many runs.
I understand that busy loops can be optimized out by the compiler, but I don't think that's the issue here. I don't think the cache should be affecting it, because no invalidation should be taking place, and it wouldn't explain the sudden speedup. I also tried using a register int for the loop counter, with no noticeable effect.
Any thoughts on what is going on, and how to make this (more) consistent?
EDIT: For information, running this program with usleep, nanosleep, or the shown busy wait for 10k iterations all show ~20000 involuntary context switches with time -v.
I'd make 2 points:
- Due to context switching, sleep/usleep may sleep for more time than expected.
- Moreover, if there is some higher-priority task, such as interrupt handling, there may be situations where sleep is not executed at all.
Thus, if you want an exact delay in your application, you can use gettimeofday to measure the time gap and subtract it from the delay in the sleep/usleep call.
One big issue with busy waiting is that, besides using up CPU resources, the amount of time you wait will be highly dependent on the CPU clock speed, so the same loop can run for wildly different times on different machines.
The problem with any method of sleeping is that, due to OS scheduling, you may end up sleeping longer than intended. The man page for nanosleep says it will use the rem argument to tell you the remaining time if a signal interrupted the sleep, but it says nothing about sleeping too long.
You need to grab a timestamp after each call to usleep so you know how long you actually slept. If you slept too short, you add the deficit. If you slept too long, you subtract the overage.
Here's an example of how I did this in UFTP, a multicast file transfer application, in order to send packets at a consistent speed:
int64_t diff_usec(struct timeval t2, struct timeval t1)
{
    return (t2.tv_usec - t1.tv_usec) +
           (int64_t)1000000 * (t2.tv_sec - t1.tv_sec);
}
...
int32_t packet_wait = 10000;
int64_t overage = 0, tdiff;
struct timeval current_sent, last_sent;

gettimeofday(&last_sent, NULL);
while(...) {
    ...
    if (packet_wait > overage) {
        usleep(packet_wait - (int32_t)overage);
    }
    gettimeofday(&current_sent, NULL);
    tdiff = diff_usec(current_sent, last_sent);
    overage += tdiff - packet_wait;
    last_sent = current_sent;
    ...
}
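(One design note on the snippet above: gettimeofday() follows the wall clock, which can jump if the clock is stepped by an administrator or NTP; on Linux, clock_gettime() with CLOCK_MONOTONIC measures intervals without that risk.)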

Using multiple threads to calculate data, but it doesn't reduce the time

My CPU has four cores (macOS). I use 4 threads to calculate an array, but the calculation time is not reduced. If I don't use multithreading, the calculation takes about 52 seconds. But even with 4 threads, or 2 threads, the time doesn't change.
(Solved, see my answer below: the problem is that I used clock() to measure the time, which is wrong in a multithreaded program because it sums CPU time across all threads. Measuring with time() gives the correct result.)
The output of using 2 threads:
id 1 use time = 43 sec to finish
id 0 use time = 51 sec to finish
time for round 1 = 51 sec
id 1 use time = 44 sec to finish
id 0 use time = 52 sec to finish
time for round 2 = 52 sec
id 1 and id 0 are threads 1 and 0; "time for round" is the time for both threads to finish. If I don't use multithreading, a round also takes about 52 seconds.
This is the part that starts the 4 threads:
for (i = 1; i <= round; i++)
{
    time_round_start = clock();
    for (j = 0; j < THREAD_NUM; j++)
    {
        cal_arg[j].roundth = i;
        pthread_create(&thread_t_id[j], NULL, Multi_Calculate, &cal_arg[j]);
    }
    for (j = 0; j < THREAD_NUM; j++)
    {
        pthread_join(thread_t_id[j], NULL);
    }
    time_round_end = clock();
    int round_time = (int)((time_round_end - time_round_start) / CLOCKS_PER_SEC);
    printf("time for round %d = %d sec\n", i, round_time);
}
This is the code inside the thread function:
void *Multi_Calculate(void *arg)
{
    struct multi_cal_data cal = *((struct multi_cal_data *)arg);
    int p_id = cal.thread_id;
    int i = 0;
    int root_level = 0;
    int leaf_addr = 0;
    int neighbor_root_level = 0;
    int neighbor_leaf_addr = 0;
    Neighbor *locate_neighbor = NULL; /* the original malloc here leaked: the pointer is overwritten below */
    printf("id:%d, start:%d end:%d,round:%d\n", p_id, cal.start_num, cal.end_num, cal.roundth);
    for (i = cal.start_num; i <= cal.end_num; i++)
    {
        root_level = i / NUM_OF_EACH_LEVEL;
        leaf_addr = i % NUM_OF_EACH_LEVEL;
        if (root_addr[root_level][leaf_addr].node_value != i)
        {
            /* ignore: this is a gap, there is no such node */
        }
        else
        {
            int k = 0;
            locate_neighbor = root_addr[root_level][leaf_addr].head;
            double tmp_credit = 0;
            for (k = 0; k < root_addr[root_level][leaf_addr].degree; k++)
            {
                neighbor_root_level = locate_neighbor->neighbor_value / NUM_OF_EACH_LEVEL;
                neighbor_leaf_addr = locate_neighbor->neighbor_value % NUM_OF_EACH_LEVEL;
                tmp_credit += root_addr[neighbor_root_level][neighbor_leaf_addr].g_credit[cal.roundth - 1] / root_addr[neighbor_root_level][neighbor_leaf_addr].degree;
                locate_neighbor = locate_neighbor->next;
            }
            root_addr[root_level][leaf_addr].g_credit[cal.roundth] = tmp_credit;
        }
    }
    return 0;
}
The array is very large; each thread calculates part of it.
Is there something wrong with my code?
It could be a bug, but if you feel the code is correct, then the overhead of parallelization (thread creation, mutexes, and such) might mean the overall performance (runtime) is the same as for the non-parallelized code at this problem size.
It might be an interesting study to run the looped, single-threaded code and the threaded code against very large arrays (100k elements?) and see if the results start to diverge, with the parallel/threaded code coming out faster.
Amdahl's law, also known as Amdahl's argument,[1] is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.
https://en.wikipedia.org/wiki/Amdahl%27s_law
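In symbols, Amdahl's law says that if a fraction $p$ of the program can be parallelized across $n$ processors, the speedup is

$$S(n) = \frac{1}{(1 - p) + p/n}$$

so even with unlimited processors the speedup never exceeds $1/(1 - p)$. With $p = 0.5$, for example, no number of threads can do better than 2x.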
You don't always gain speed by multi-threading a program. There is a certain amount of overhead that comes with threading. Unless there are enough inefficiencies in the non-threaded code to make up for that overhead, you'll not see an improvement. A lot can be learned about how multi-threading works even if the program you write ends up running slower.
I know why this happens now. The problem is that I used clock() to measure the time. That is wrong in a multithreaded program, because clock() measures CPU time summed across all threads, so it scales with the number of busy threads. When I measure with time() instead, the result is correct.
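To see the difference concretely, here is a minimal sketch that times the same work both ways; clock() sums CPU time across all busy threads, while clock_gettime(CLOCK_MONOTONIC) gives elapsed wall time (on older glibc you may need to link with -lrt):
#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t c0 = clock();
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* ... run the threaded computation here ... */

    clock_t c1 = clock();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* CPU seconds: grows with the number of busy threads. */
    printf("cpu:  %.2f s\n", (double)(c1 - c0) / CLOCKS_PER_SEC);
    /* Wall seconds: what the user actually waits for. */
    printf("wall: %.2f s\n", (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}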

Audio tone for Microcontroller p18f4520 using a for loop

I'm using C to program an audio tone for the PIC18F4520 microcontroller.
I am using a for loop and delays to do this. I have not learned any other way to do it, and in any case I'm required to use a for loop and delays to generate an audio tone on the target board. The speaker is on port RA4. This is what I have done so far.
#include <p18f4520.h>
#include <delays.h>

void tone(float, int);

void main()
{
    ADCON1 = 0x0F;
    TRISA = 0b11101111;
    /*tone(38.17, 262); //C (1)
    tone(34.01, 294); //D (2)
    tone(30.3, 330);  //E (3)
    tone(28.57, 350); //F (4)
    tone(25.51, 392); //G (5)
    tone(24.04, 416); //G^(6)
    tone(20.41, 490); //B (7)
    tone(11.36, 880); //A (8)*/
    tone(11.36, 880); //A (8)
}
void tone(float n, int cycles)
{
    unsigned int i;
    for (i = 0; i < cycles; i++)
    {
        PORTAbits.RA4 = 0;
        Delay10TCYx(n);
        PORTAbits.RA4 = 1;
        Delay10TCYx(n);
    }
}
So as you can see, I have created a tone function where n sets the delay and cycles sets the number of iterations of the for loop. I am not sure whether my calculations are correct, but it does produce a tone; I'm just not sure whether it is really an A tone or a G tone and so on.

Here is how I calculate it. First I find the frequency of the tone: an A, for example, is 440Hz, so its period is 1/440Hz. For the duty cycle, I want the pin high for only half the period (50%), so I divide the period by 2: (1/440Hz)/2 = 0.001136s, or 1.136ms. One instruction cycle on the microcontroller takes 4*(1/2MHz) = 2µs, so the ratio is 2µs per cycle. To delay 1.136ms I therefore need 1.136ms/2µs = 568 cycles. At this point I asked around what n should be in Delay10TCYx(n), and what I was told is to just use a factor of 10, so for a tone A it will be Delay10TCYx(11.36). For the duration, I would like the tone to last 1 second, so cycles is 1/1.136ms, which is 880. That's why for tone A it is tone(11.36, 880). It generates a tone, and I have found a range of tones, but I'm not really sure they are actually the tones C D E F G G^ B A.
So my questions are:
1. How do I correctly calculate the delay and frequency for tone A?
2. The cycles argument sets the number of loop iterations, but given the answer to question 1, how many cycles should I use to vary how long tone A plays? Do more cycles mean a longer tone, and if so, how do I work out the duration?
3. When I play the tone from a function, it somehow generates a different tone than when I put the same loop directly in main. Why is this?
4. Lastly, how do I stop the code? I've tried using a for loop but it doesn't work.
A simple explanation would be great, as I am just a student working on a project to produce tones using a for loop and delays. I've searched elsewhere, and people use other approaches such as WAV playback, but I would simply like to know how to use a for loop and delays for audio tones.
Your help would be greatly appreciated.
First, you need to understand the general approach to generating an interrupt at an arbitrary time interval; this lets you make a specific action occur every x microseconds [1-2].
There are already existing projects that play a specific tone (in Hz) on a PIC, like what you are trying to do [3-4].
Next, you'll want to take an alternate approach to generating the tone. With the delay functions, you are burning the CPU doing nothing when it could be doing something else. You are better off driving the pin from timer interrupts directly, so you're not idling away cycles; a rough sketch follows the references below.
Once you have this implemented, you just need to know the corresponding frequency for the note you are trying to generate, either from a formula that derives the frequency of a given musical note [5], or from a look-up table [6].
What is the procedure for PIC timer0 to generate a 1 ms or 1 sec interrupt?, Accessed 2014-07-05, <http://www.edaboard.com/thread52636.html>
Introduction to PIC18′s Timers – PIC Microcontroller Tutorial, Accessed 2014-07-05, <http://extremeelectronics.co.in/microchip-pic-tutorials/introduction-to-pic18s-timers-pic-microcontroller-tutorial/>
Generate Ring Tones on your PIC16F87x Microcontroller, Accessed 2014-07-05, <http://retired.beyondlogic.org/pic/ringtones.htm>
AN655 - D/A Conversion Using PWM and R-2R Ladders to Generate Sine and DTMF Waveforms, Accessed 2014-07-05, <http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1824&appnote=en011071>
Equations for the Frequency Table, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/NoteFreqCalcs.html>
Frequencies for equal-tempered scale, A4 = 440 Hz, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/notefreqs.html>
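To make the interrupt-driven approach concrete, here is a rough sketch in Microchip C18 style for the 18F4520. The Timer0 configuration bits and the 141-cycle half period (for a 440 Hz A, as derived in the next answer) are my assumptions; check them against your datasheet and actual clock before use:
#include <p18f4520.h>

void tone_isr(void);

//Interrupt vector (C18 convention): jump to the real ISR
#pragma code high_vector = 0x08
void interrupt_at_high_vector(void)
{
    _asm GOTO tone_isr _endasm
}
#pragma code

#pragma interrupt tone_isr
void tone_isr(void)
{
    if (INTCONbits.TMR0IF)
    {
        INTCONbits.TMR0IF = 0;          //clear the interrupt flag
        TMR0L = 256 - 141;              //reload: overflow again in ~141 cycles
        PORTAbits.RA4 = !PORTAbits.RA4; //toggle the speaker pin
    }
}

void main(void)
{
    ADCON1 = 0x0F;
    TRISA = 0b11101111;
    T0CON = 0b11001000;    //Timer0 on, 8-bit mode, no prescaler
    TMR0L = 256 - 141;     //first half period
    INTCONbits.TMR0IE = 1; //enable Timer0 overflow interrupt
    INTCONbits.GIE = 1;    //enable interrupts globally
    while (1)
    {
        //the CPU is free here; the ISR keeps the tone going
    }
}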
How do you compute the number of delay cycles to get a tone of 440Hz? I will assume that your clock speed is 1/2MHz, i.e. 500kHz, as written in your question.
1) A 500kHz clock corresponds to a tick every 2µs. Since an instruction cycle is 4 clock ticks, a cycle lasts 8µs.
2) A frequency of 440Hz corresponds to a period of 2.27ms, or 2270µs, or 283 cycles.
3) The delay is called twice per period, so each delay should be about 141 cycles for an A.
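The same arithmetic as a tiny helper (a sketch; the name and the F_CYC constant are mine, using the 8µs instruction cycle above, i.e. 125,000 cycles per second):
#define F_CYC 125000UL /* instruction cycles per second: 1 cycle = 8 us */

/* Delay, in instruction cycles, for each half period of a square wave. */
unsigned char half_period_cycles(unsigned int freq_hz)
{
    return (unsigned char)(F_CYC / freq_hz / 2);
}
/* half_period_cycles(440) == 142, matching the ~141 derived above. */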
About your tone function: as you compile your code, you must be getting some kind of warning, something like warning: (42) implicit conversion of float to integer.... The prototype of the delay function is void Delay10TCYx(unsigned char);: it expects an unsigned char, not a float, so you will not get any floating-point precision. You may try something like:
void tone(unsigned char n, unsigned int cycles)
{
    unsigned int i;
    for (i = 0; i < cycles; i++)
    {
        PORTAbits.RA4 = 0;
        Delay1TCYx(n);
        PORTAbits.RA4 = 1;
        Delay1TCYx(n);
    }
}
I changed to Delay1TCYx() for accuracy. Now a 1-second A tone is tone(141, 440), and a 1-second 880Hz tone is tone(70, 880).
There is always a while(1) in PIC examples; if you just need one beep at startup, do something like:
void main()
{
    ADCON1 = 0x0F;
    TRISA = 0b11101111;
    tone(141, 440);
    tone(70, 880);
    tone(141, 440);
    while (1)
    {
    }
}
Regarding the change of tone when the loop is wrapped in a function, keep in mind that every operation takes at least one cycle, and calling a function takes a few cycles of its own. Declaring static inline void tone(unsigned char, int) might be a good idea.
However, as @Dogbert signaled, delays.h is a good start for beginners, but do not get used to it! Microcontrollers have lots of features that avoid counting by hand and save time for useful computations:
timers: think of them as alarm clocks. The 18F4520 has 4 of them.
interrupts: the PIC stops the current operation, runs the code registered for the interrupt, clears the flag, and comes back to its previous task. A timer can trigger an interrupt.
PWM, pulse-width modulation: the 18F4520 has 2 PWM modules. Basically, it generates your signal for you!

Linux, timerfd accuracy

I have a system that needs at least 10 milliseconds of timer accuracy.
I went with timerfd as it suits me perfectly, but found that it is not accurate at all even for intervals up to 15 milliseconds; either that, or I don't understand how it works.
The delays I have measured were up to 21 milliseconds on a 10 millisecond timer.
I have put together a quick test that shows my problem.
Here is the test:
#include <sys/timerfd.h>
#include <time.h>
#include <string.h>
#include <strings.h> /* for bzero() */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <inttypes.h>

int main(int argc, char *argv[]){
    int timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
    int milliseconds = atoi(argv[1]);
    struct itimerspec timspec;
    bzero(&timspec, sizeof(timspec));
    timspec.it_interval.tv_sec = 0;
    timspec.it_interval.tv_nsec = milliseconds * 1000000;
    timspec.it_value.tv_sec = 0;
    timspec.it_value.tv_nsec = 1;

    int res = timerfd_settime(timerfd, 0, &timspec, 0);
    if(res < 0){
        perror("timerfd_settime:");
        return 1;
    }

    uint64_t expirations = 0;
    int iterations = 0;
    while((res = read(timerfd, &expirations, sizeof(expirations)))){
        if(res < 0){ perror("read:"); continue; }
        if(expirations > 1){
            printf("%" PRIu64 " expirations, %d iterations\n", expirations, iterations);
            break;
        }
        iterations++;
    }
    return 0;
}
And executed like this:
Zack ~$ for i in 2 4 8 10 15; do echo "intervals of $i milliseconds"; ./test $i;done
intervals of 2 milliseconds
2 expirations, 1 iterations
intervals of 4 milliseconds
2 expirations, 6381 iterations
intervals of 8 milliseconds
2 expirations, 21764 iterations
intervals of 10 milliseconds
2 expirations, 1089 iterations
intervals of 15 milliseconds
2 expirations, 3085 iterations
Even allowing for some delay, 15 milliseconds sounds like too much to me.
Try altering it as follows. This should pretty much guarantee that it never misses a wakeup, but be careful with it: running at realtime priority can lock your machine hard if the task doesn't sleep. You may also need to configure things so that your user is allowed to run at realtime priority (see /etc/security/limits.conf).
#include <sys/timerfd.h>
#include <time.h>
#include <string.h>
#include <strings.h> /* for bzero() */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <inttypes.h>
#include <sched.h>

int main(int argc, char *argv[])
{
    int timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
    int milliseconds = atoi(argv[1]);
    struct itimerspec timspec;
    struct sched_param schedparm;

    memset(&schedparm, 0, sizeof(schedparm));
    schedparm.sched_priority = 1; // lowest rt priority
    sched_setscheduler(0, SCHED_FIFO, &schedparm);

    bzero(&timspec, sizeof(timspec));
    timspec.it_interval.tv_sec = 0;
    timspec.it_interval.tv_nsec = milliseconds * 1000000;
    timspec.it_value.tv_sec = 0;
    timspec.it_value.tv_nsec = 1;

    int res = timerfd_settime(timerfd, 0, &timspec, 0);
    if(res < 0){
        perror("timerfd_settime:");
    }

    uint64_t expirations = 0;
    int iterations = 0;
    while((res = read(timerfd, &expirations, sizeof(expirations)))){
        if(res < 0){ perror("read:"); continue; }
        if(expirations > 1){
            printf("%" PRIu64 " expirations, %d iterations\n", expirations, iterations);
            break;
        }
        iterations++;
    }
}
If you are using threads you should use pthread_setschedparam instead of sched_setscheduler.
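A minimal sketch of that per-thread variant, giving the calling thread the same lowest RT priority used above:
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Put the calling thread under SCHED_FIFO at the lowest RT priority. */
static int make_thread_realtime(void)
{
    struct sched_param schedparm;
    memset(&schedparm, 0, sizeof(schedparm));
    schedparm.sched_priority = 1;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &schedparm);
}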
Realtime also isn't about low latency, it's about guarantees. RT means that if you want to wake up exactly once every second, on the second, you WILL; the normal scheduler does not give you this. It might decide to wake you up 100ms later, because it had some other work to do at that time anyway. If you want to wake up every 10ms and you REALLY do need to, then you should set yourself to run as a realtime task; the kernel will then wake you every 10ms without fail, unless a higher-priority realtime task is busy doing stuff.
If you need to guarantee that your wakeup interval is exactly some time, it doesn't matter whether it's 1ms or 1 second: you won't get it unless you run as a realtime task. The kernel has good reasons to do this to you (saving power is one of them, higher throughput is another, there are others), and it's well within its rights, since you never told it you need better guarantees. Most code doesn't actually need to be this accurate, or to never miss a wakeup, so you should think hard about whether you really do.
Quoting from http://www.ganssle.com/articles/realtime.htm:

A hard real time task or system is one where an activity simply must be completed - always - by a specified deadline. The deadline may be a particular time or time interval, or may be the arrival of some event. Hard real time tasks fail, by definition, if they miss such a deadline.

Notice this definition makes no assumptions about the frequency or period of the tasks. A microsecond or a week - if missing the deadline induces failure, then the task has hard real time requirements.
Soft realtime is pretty much the same, except that missing a deadline, while undesirable, is not the end of the world (for example, video and audio playback are soft realtime tasks: you don't want to miss displaying a frame or run out of buffer, but if you do, it's just a momentary hiccup and you simply continue). If what you are trying to do is 'soft' realtime, I wouldn't bother running at realtime priority, since you should generally get your wakeups in time (or at least close to it).
EDIT:
If you aren't running realtime, the kernel will by default give any timer you create some 'slack', so that it can merge your wakeup with other events that happen at times close to the one you asked for. That is, if another event falls within your slack window, the kernel will not wake you at exactly the time you asked for, but a little earlier or later, at a time when it was already going to do something else; this saves power.
For a little more info see High- (but not too high-) resolution timeouts and Timer slack. (Note I'm not sure whether either of those is exactly what's in the kernel, since both articles are about lkml mailing-list discussions, but something like the first one really is in the kernel.)
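If you want to shrink that slack without running realtime, the kernel also exposes it per process through prctl(); a minimal sketch (the default slack is 50µs, and realtime tasks effectively get none):
#include <sys/prctl.h>

int main(void)
{
    /* Ask for ~1 ns of timer slack instead of the 50 us default
       (the kernel may still round this up). */
    prctl(PR_SET_TIMERSLACK, 1UL, 0, 0, 0);
    /* ... set up and read the timerfd as before ... */
    return 0;
}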
I've got a feeling your test is very hardware dependent. When I ran your sample program on my system, it appeared to hang at 1 millisecond. To make your test at all meaningful on my computer, I had to change from milliseconds to microseconds. (I changed the multiplier from 1000000 to 1000.)
$ grep 1000 test.c
timspec.it_interval.tv_nsec = microseconds * 1000;
$ for i in 1 2 4 5 7 8 9 15 16 17\
31 32 33 47 48 49 63 64 65 ; do\
echo "intervals of $i microseconds";\
./test $i;done
intervals of 1 microseconds
11 expirations, 0 iterations
intervals of 2 microseconds
5 expirations, 0 iterations
intervals of 4 microseconds
3 expirations, 0 iterations
intervals of 5 microseconds
2 expirations, 0 iterations
intervals of 7 microseconds
2 expirations, 0 iterations
intervals of 8 microseconds
2 expirations, 0 iterations
intervals of 9 microseconds
2 expirations, 0 iterations
intervals of 15 microseconds
2 expirations, 7788 iterations
intervals of 16 microseconds
4 expirations, 1646767 iterations
intervals of 17 microseconds
2 expirations, 597 iterations
intervals of 31 microseconds
2 expirations, 370969 iterations
intervals of 32 microseconds
2 expirations, 163167 iterations
intervals of 33 microseconds
2 expirations, 3267 iterations
intervals of 47 microseconds
2 expirations, 1913584 iterations
intervals of 48 microseconds
2 expirations, 31 iterations
intervals of 49 microseconds
2 expirations, 17852 iterations
intervals of 63 microseconds
2 expirations, 24 iterations
intervals of 64 microseconds
2 expirations, 2888 iterations
intervals of 65 microseconds
2 expirations, 37668 iterations
(Somewhat interesting that I got the longest runs from 16 and 47 microseconds, but 17 and 48 were awful.)
time(7) has some suggestions on why our platforms are so different:
High-Resolution Timers

Before Linux 2.6.21, the accuracy of timer and sleep system calls (see below) was also limited by the size of the jiffy.

Since Linux 2.6.21, Linux supports high-resolution timers (HRTs), optionally configurable via CONFIG_HIGH_RES_TIMERS. On a system that supports HRTs, the accuracy of sleep and timer system calls is no longer constrained by the jiffy, but instead can be as accurate as the hardware allows (microsecond accuracy is typical of modern hardware). You can determine whether high-resolution timers are supported by checking the resolution returned by a call to clock_getres(2) or looking at the "resolution" entries in /proc/timer_list.

HRTs are not supported on all hardware architectures. (Support is provided on x86, arm, and powerpc, among others.)
All the 'resolution' lines in my /proc/timer_list are 1ns on my (admittedly ridiculously powerful) x86_64 system.
I decided to try to figure out where the 'breaking point' is on my computer, but gave up on the 110 microsecond run:
$ for i in 70 80 90 100 110 120 130\
; do echo "intervals of $i microseconds";\
./test $i;done
intervals of 70 microseconds
2 expirations, 639236 iterations
intervals of 80 microseconds
2 expirations, 150304 iterations
intervals of 90 microseconds
4 expirations, 3368248 iterations
intervals of 100 microseconds
4 expirations, 1964857 iterations
intervals of 110 microseconds
^C
90 microseconds ran for three million iterations before it failed a few times; that's 22 times better resolution than your very first test, so I'd say that given the right hardware, 10ms shouldn't be anywhere near difficult. (90 microseconds is 111 times better resolution than 10 milliseconds.)
But if your hardware doesn't provide a high-resolution timer source, then Linux can't help you without resorting to SCHED_RR or SCHED_FIFO. And even then, perhaps another kernel could better provide the software timer support you need.
Good luck. :)
Here's a theory. If HZ is set to 250 on your system (as is typical), then you have a 4 millisecond timer resolution. Once your process is swapped out by the scheduler, it's likely that a number of other processes will be scheduled and run before your process gets another time slice. This might explain why you see timer resolutions in the 15 to 21 millisecond range. The only way around this is to run a real-time kernel.
The typical solution for high resolution timing on non-realtime systems is basically to busy wait with a call to select.
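A minimal sketch of that hybrid (the helper name is mine): sleep the bulk of the interval with select(), then poll the clock for the last millisecond:
#include <sys/select.h>
#include <time.h>

/* Wait until 'deadline' (CLOCK_MONOTONIC): coarse sleep, then a short spin. */
static void wait_until(const struct timespec *deadline)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);

    long remain_us = (deadline->tv_sec - now.tv_sec) * 1000000L
                   + (deadline->tv_nsec - now.tv_nsec) / 1000L;
    if (remain_us > 1000) {
        /* Sleep for all but the final millisecond. */
        struct timeval tv;
        tv.tv_sec  = (remain_us - 1000) / 1000000L;
        tv.tv_usec = (remain_us - 1000) % 1000000L;
        select(0, NULL, NULL, NULL, &tv);
    }
    do { /* busy-poll the clock for the remainder */
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (now.tv_sec < deadline->tv_sec ||
             (now.tv_sec == deadline->tv_sec && now.tv_nsec < deadline->tv_nsec));
}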
Depending on what else the system is doing, it may be a bit slow in switching back to your task. Unless you have a "real" realtime system, there's no guarantee it will do better than what you're seeing, although I agree that result is a bit disappointing.
You can (mostly) eliminate that task switch / scheduler time. If you have CPU power (and electrical power!) to spare, a brutal but effective solution would be a busy wait spin loop.
The idea is to run your program in a tight loop that continuously polls the clock for what time it is, and then calls your other code when the time is right. At the expense of making your system act very sluggish for everything else and heating up your CPU, you will end up with task scheduling that is mostly jitter free.
I wrote a system like this once under Windows XP to spin a stepper motor, supplying evenly spaced pulses up to 40K times per second, and it worked fine. Of course, your mileage may vary.
