Getting more precise timing control in Linux


I am trying to create a low-jitter multicast source for digital TV. The program in question should buffer the input, calculate the intended send times from the PCR values in the stream and then send the packets at relatively precise intervals. However, this is not running on an RTOS, so some timing variance is expected.
This is the basic code (the relevant variables are initialized, I just omitted the code here):
while (!sendstop) {
//snip
//put 7 MPEG packets in one UDP packet buffer "outpkt"
//snip
waittime = //calculate from PCR values - value is in microseconds
//waittime is in the order of 2000 -> 2ms
sleeptime=curtime;
sleeptime.tv_nsec += waittime * 1000L;
sleeptime.tv_sec += sleeptime.tv_nsec / 1000000000;
sleeptime.tv_nsec %= 1000000000;
while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &sleeptime, NULL) == EINTR) { //clock_nanosleep returns the error number directly and does not set errno
printf("I");
}
sendto(sck,outpkt,1316,0,res->ai_addr,res->ai_addrlen); //send the packet
clock_gettime(CLOCK_MONOTONIC,&curtime);
}
However, this results in the sending being too slow (since there is some processing that also takes time), so the buffer fills up. So, I thought that I should get the difference between "sleeptime" (the time that should have been) and "curtime" (the actual time) and then subtract it from the future "waittime". This almost works, but now it is a bit too fast and I get an empty buffer.
My next idea was to multiply the difference by some value before subtracting it, like this (just above "while..."):
difn=curtime.tv_nsec-ostime.tv_nsec;
if (difn<0) difn+=1000000000;
sleeptime.tv_nsec = sleeptime.tv_nsec-(difn*difnc)/1000; //difnc - adjustment
if (sleeptime.tv_nsec<0) {
sleeptime.tv_nsec+=1000000000;
sleeptime.tv_sec--;
}
However, different values of difnc work at different times of day, on different servers and so on. There needs to be some kind of automatic adjustment based on the operation of the program. The best I could figure out was to increment/decrement it every time the buffer is full or empty; however, this leads to slow cycles of "too fast" - "too slow". I tried to adjust the "difnc" value based on how full/empty the buffer is, but that too just leads to "slow"-"fast" cycles.
How can I properly automatically derive the "difnc" value or is there some other method of getting a more precise timing than with just the "clock_nanosleep" function but without busy waits (the server has other things to do)?
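One way to sidestep the correction factor altogether, offered here as a sketch under assumptions and not as a tested fix for this program, is to stop rebuilding the deadline from "curtime" after every send. If the absolute deadline is only ever advanced by "waittime", the time spent in processing and in sendto() shortens the next sleep automatically and the error cannot accumulate. The helper names deadline_add_usec and sleep_until are hypothetical, and the code assumes waittime is never negative.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by usec microseconds (usec >= 0 assumed),
 * keeping tv_nsec in the range [0, 1e9). */
static void deadline_add_usec(struct timespec *deadline, long usec)
{
    deadline->tv_sec  += usec / 1000000L;
    deadline->tv_nsec += (usec % 1000000L) * 1000L;
    if (deadline->tv_nsec >= NSEC_PER_SEC) {
        deadline->tv_nsec -= NSEC_PER_SEC;
        deadline->tv_sec++;
    }
}

/* Sleep until an absolute CLOCK_MONOTONIC deadline, retrying if a signal
 * interrupts the sleep. clock_nanosleep returns the error number itself. */
static int sleep_until(const struct timespec *deadline)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
    } while (rc == EINTR);
    return rc;
}
```

In the send loop this would mean calling clock_gettime(CLOCK_MONOTONIC, &sleeptime) once before the loop, and then on each iteration calling deadline_add_usec(&sleeptime, waittime), sleep_until(&sleeptime) and sendto(), without reading the clock again. A deadline that has already passed makes the sleep return immediately, so a late packet is followed by a shorter gap instead of a permanent delay.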

Related

Average from error prone measurement samples without buffering

I have a µC which measures the temperature of a sensor with an ADC. Due to various circumstances it can happen that the reading is 0 (-30°C) or an impossibly large value (500-1500°C). I can't fix the reasons why these readings are so bad (time-critical ISRs and sometimes bad wiring), so I have to fix it with a clever piece of code.
I've come up with this (code gets called OVERSAMPLENR-times in a ISR):
#define OVERSAMPLENR 16 //read value 16 times
#define TEMP_VALID_CHANGE 0.15 //15% change in reading is possible
//float raw_temp_bed_value = <sum of all readings>;
//ADC = <AVR ADC reading macro>;
if(temp_count > 1) { //temp_count = amount of samples read, gets increased elsewhere
float avgRaw = raw_temp_bed_value / temp_count;
float diff = (avgRaw > ADC ? avgRaw - ADC : ADC - avgRaw) / (avgRaw == 0 ? 1 : avgRaw); //pulled out to shorten the line for SO
if (diff > TEMP_VALID_CHANGE * ((OVERSAMPLENR - temp_count) / OVERSAMPLENR)) //subsequent readings have a smaller tolerance
raw_temp_bed_value += avgRaw;
else
raw_temp_bed_value += ADC;
} else {
raw_temp_bed_value = ADC;
}
Where raw_temp_bed_value is a static global and gets read and processed later, when the ISR got fired 16 times.
As you can see, I check if the difference between the current average and the new reading is less than 15%. If so I accept the reading; if not, I reject it and add the current average instead.
But this breaks horribly if the first reading is something impossible.
One solution I thought of is:
In the last line the raw_temp_bed_value is reset to the first ADC reading. It would be better to reset this to raw_temp_bed_value/OVERSAMPLENR, so I don't run into a "first reading error".
Do you have any better solutions? I thought of some solutions featuring a moving average and using the average of the moving average, but this would require additional arrays/RAM/cycles which we want to avoid.

I've often used something I call a rate of change on the samples. Use a variable that represents how many samples it takes to reach a certain value, like 20. Then keep adding your sample difference, divided by the rate of change, to a variable. You can still use a threshold to filter out unlikely values.
float RateOfChange = 20;
float PreviousAdcValue = 0;
float filtered = FILTER_PRESET;
while(1)
{
//isr gets adc value here
filtered = filtered + ((AdcValue - PreviousAdcValue)/RateOfChange);
PreviousAdcValue = AdcValue;
sleep();
}
Please note that this isn't exactly like a low pass filter, it responds quicker and the last value added has the most significance. But it will not change much if a single value shoots out too much, depending on the rate of change.
You can also preset the filtered value to something sensible. This prevents wild startup behavior.
It takes up to RateOfChange samples to reach a stable value. You may want to make sure the filtered value isn't used before that by using a counter to count the number of samples taken for example. If the counter is lower than RateOfChange, skip processing temperature control.
For a more advanced (temperature) control routine, I highly recommend looking into PID control loops. These add a plethora of functionality to get a fast, stable response, keep something at a certain temperature efficiently and keep oscillations to a minimum. I've used the one from the Marlin firmware in my own projects and it works quite well.
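For comparison, here is a minimal sketch of a closely related but different technique, the conventional exponential moving average with a clamped step. It is not the code above: it moves the filtered value toward each new sample by 1/divisor of the distance between the two, and limits how far a single reading can pull it. The struct ema_filter, the function ema_update and the max_step threshold are hypothetical names chosen for illustration. No sample buffer is needed, only the filter state.

```c
/* State for a clamped exponential moving average (all names hypothetical). */
typedef struct {
    float value;     /* current filtered estimate; preset to something sensible */
    float divisor;   /* plays the role of RateOfChange: larger = slower response */
    float max_step;  /* largest sample-to-estimate difference taken at face value */
} ema_filter;

/* Feed one raw sample into the filter and return the new estimate. */
static float ema_update(ema_filter *f, float sample)
{
    float delta = sample - f->value;

    /* Clamp implausible jumps so a single bad reading (0 or a huge value)
     * can shift the estimate by at most max_step / divisor. */
    if (delta > f->max_step)
        delta = f->max_step;
    else if (delta < -f->max_step)
        delta = -f->max_step;

    f->value += delta / f->divisor;
    return f->value;
}
```

With value preset to 100, divisor 20 and max_step 40, a reading of 120 moves the estimate to 101, while a wild reading of 1500 moves it by only 2. Clamping instead of rejecting also means a badly chosen preset still converges, just slowly.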

The meaning of period in ALSA

I'm using ALSA for an audio application on Linux. I found great docs that explain how to use it (1 and this one), although I have trouble understanding this part of the setup:
/* Set number of periods. Periods used to be called fragments. */
if (snd_pcm_hw_params_set_periods(pcm_handle, hwparams, periods, 0) < 0) {
fprintf(stderr, "Error setting periods.\n");
return(-1);
}
What does it mean to set a number of periods when I'm using PLAYBACK mode?
And:
/* Set buffer size (in frames). The resulting latency is given by */
/* latency = periodsize * periods / (rate * bytes_per_frame) */
if (snd_pcm_hw_params_set_buffer_size(pcm_handle, hwparams, (periodsize * periods)>>2) < 0) {
fprintf(stderr, "Error setting buffersize.\n");
return(-1);
}
The same question applies here to the latency: how should I understand it?

I assume you've read and understood this section of linux-journal. You may also find that this blog clarifies things with respect to period size selection (or fragment in the blog) in the context of ALSA. To quote:
You shouldn't misuse the fragments logic of sound devices. It's like
this:
The latency is defined by the buffer size.
The wakeup interval is defined by the fragment size.
The buffer fill level will oscillate between 'full buffer' and 'full
buffer minus 1x fragment size minus OS scheduling latency'. Setting
smaller fragment sizes will increase the CPU load and decrease battery
time since you force the CPU to wake up more often. OTOH it increases
drop out safety, since you fill up playback buffer earlier. Choosing
the fragment size is hence something which you should do balancing out
your needs between power consumption and drop-out safety. With modern
processors and a good OS scheduler like the Linux one setting the
fragment size to anything other than half the buffer size does not
make much sense.
...
(Oh, ALSA uses the term 'period' for what I call 'fragment'
above. It's synonymous)
So essentially, typically you would set period to 2 (as was done in the howto you referenced). Then periodsize * period is your total buffer size in bytes. Finally, the latency is the delay that is induced by the buffering of that many samples, and can be computed by dividing the buffer size by the rate at which samples are played back (ie. according to the formula latency = periodsize * periods / (rate * bytes_per_frame) in the code comments).
For example, the parameters from the howto:
period = 2
periodsize = 8192 bytes
rate = 44100Hz
16 bits stereo data (4 bytes per frame)
correspond to a total buffer size of period * periodsize = 2 * 8192 = 16384 bytes, and a latency of 16384 / (44100 * 4) ~ 0.093 seconds.
Note also that your hardware may have some size limitations for the supported period size (see this troubleshooting guide).
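The arithmetic above can be written down as a small sketch. The function names are hypothetical and nothing here calls ALSA; it only reproduces the formula from the code comment and the shift by 2 that converts bytes to frames for 4-byte frames.

```c
/* Total buffer size in bytes: one period's size in bytes times the period count. */
static unsigned long buffer_bytes(unsigned long period_bytes, unsigned periods)
{
    return period_bytes * periods;
}

/* The same buffer expressed in frames, the unit the howto passes to
 * snd_pcm_hw_params_set_buffer_size (its ">>2" assumes 4 bytes per frame). */
static unsigned long buffer_frames(unsigned long period_bytes, unsigned periods,
                                   unsigned bytes_per_frame)
{
    return buffer_bytes(period_bytes, periods) / bytes_per_frame;
}

/* Latency in seconds: latency = periodsize * periods / (rate * bytes_per_frame). */
static double buffer_latency_sec(unsigned long period_bytes, unsigned periods,
                                 unsigned rate_hz, unsigned bytes_per_frame)
{
    return (double)buffer_bytes(period_bytes, periods)
           / ((double)rate_hz * bytes_per_frame);
}
```

For the howto's parameters, buffer_frames(8192, 2, 4) gives 4096 frames and buffer_latency_sec(8192, 2, 44100, 4) gives roughly 0.093 seconds.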

When the application tries to write samples into the buffer, and the buffer is already full, the process goes to sleep. It gets woken up by the hardware through an interrupt; this interrupt is raised at the end of each period.
There should be at least two periods per buffer; otherwise, the buffer is already empty when a wakeup happens, which results in an underrun.
Increasing the number of periods (i.e., reducing the period size) increases the safety margin against underruns caused by scheduling or processing delays.
The latency is just proportional to the buffer size: when you completely fill the buffer, the last sample written is played by the hardware only after all the other samples have been played.

how to measure serial receive byte speed. eg bytes per second.

I'm receiving byte by byte via serial at baud rate of 115200. How to calculate bytes per sec im receiving in a c program?

There are only 3 ways to measure bytes actually received per second.
The first way is to keep track of how many bytes you receive in a fixed length of time. For example, each time you receive bytes you might do counter += number_of_bytes, and then every 5 seconds you might do rate = counter/5; counter = 0;.
The second way is to keep track of how much time passed to receive a fixed number of bytes. For example, every time you receive one byte you might do temp = now(); rate = 1/(temp - previous); previous = temp;.
The third way is to combine both of the above. For example, each time you receive bytes you might do temp = now(); rate = number_of_bytes/(temp - previous); previous = temp;.
For all of the above, you end up with individual samples and not an average. To convert the samples into an average you'd need to do something like average = sum_of_samples / number_of_samples. The best way to do this (e.g. if you want nice/smooth looking graphs) would be to store a lot of samples; where you'd replace the oldest sample with a new sample and recalculate the average.
For example:
double sampleData[1024];
int nextSlot = 0;
double average;
void addSample(double value) {
double sum = 0;
sampleData[nextSlot] = value;
nextSlot++;
if(nextSlot >= 1024) nextSlot = 0;
for(int i = 0; i < 1024; i++) sum += sampleData[i];
average = sum/1024;
}
Of course the final thing (collecting the samples using one of the 3 methods, then finding the average) would need some fiddling to get the resolution how you want it.
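As a rough sketch of the first method (count bytes over a fixed length of time), the following keeps the timestamp as a parameter so the caller can use whatever clock the platform offers, for example clock_gettime(CLOCK_MONOTONIC) on POSIX. The struct rate_meter and the function rate_meter_add are hypothetical names, not part of any serial API.

```c
/* Byte-rate meter using fixed-length measurement windows (names hypothetical). */
typedef struct {
    double window_start;   /* time the current window began, in seconds */
    double window_len;     /* window length in seconds, e.g. 5.0 */
    unsigned long count;   /* bytes received in the current window */
    double rate;           /* bytes/second over the last completed window */
} rate_meter;

/* Record nbytes received at time `now` (seconds). Returns 1 when a window
 * has just completed and m->rate was updated, 0 otherwise. */
static int rate_meter_add(rate_meter *m, unsigned long nbytes, double now)
{
    double elapsed;

    m->count += nbytes;
    elapsed = now - m->window_start;
    if (elapsed < m->window_len)
        return 0;

    /* Divide by the time that actually passed, which may be slightly longer
     * than window_len if bytes arrive in bursts. */
    m->rate = (double)m->count / elapsed;
    m->count = 0;
    m->window_start = now;
    return 1;
}
```

Each completed window yields one sample, which can then be fed to addSample() above if a smoothed average is wanted.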

Assuming you have some fairly continuous input, just count the number of bytes you receive, and after some number of characters have been received, print out the elapsed time and the number of characters received over that time. You'll need a fairly good timestamp, or your error will probably be large. Which source is "best" depends on the system you are on and on how portable you want the code to be (serial comms tend not to be very portable anyway). Note that the standard clock() measures processor time rather than elapsed time, so on POSIX systems a monotonic source such as clock_gettime() is the safer choice. Each time you print, reset the count.

To correct some odd comments in this thread about the theoretical maximum:
Around the time that 14400 Baud modems came to the pre-web world, the measure of Baud changed from Baud (wiki it) to match emerging digital technologies such as ISDN 64kbit. At that time, Baud came to mean bits/second.
For serial data in the 8N1 format, a common shorthand notation, there are eight data bits, no parity bit, and one stop bit for every byte. Each byte is also preceded by a start bit, so it occupies 10 bit times on the wire.
So a theoretical maximum for 8N1 serial over 115200 Baud (bits/sec) = 115200/(1+8+1) = 11520 bytes/sec.
Similar (but not the same) to watching your download speeds, the rough ball-park way to work out bytes/sec from bits/sec, without a calculator, is to divide by 10.
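The divide-by-10 rule follows from counting every bit that goes on the wire for one byte, including the start bit that frames it. A minimal sketch, with hypothetical function names and assuming one bit per baud and back-to-back frames:

```c
/* Bits on the wire per byte: 1 start bit + data bits + parity bits + stop bits. */
static unsigned frame_bits(unsigned data_bits, unsigned parity_bits,
                           unsigned stop_bits)
{
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Upper bound on payload bytes per second for a given line rate. */
static unsigned long max_bytes_per_sec(unsigned long bits_per_sec,
                                       unsigned data_bits,
                                       unsigned parity_bits,
                                       unsigned stop_bits)
{
    return bits_per_sec / frame_bits(data_bits, parity_bits, stop_bits);
}
```

For 8N1 at 115200 bits/sec, frame_bits(8, 0, 1) is 10 and max_bytes_per_sec(115200, 8, 0, 1) is 11520. A measured rate below that simply means the sender leaves idle time between bytes.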

Baud rate is a measurement of how many times per second a signal is able to change. In each of those cycles, depending on the modulation you are using, you can send one or more bits (if you are using no modulation - bit rate is the same as baud rate).
Let's say you are using QPSK modulation, so you can transmit/receive 2 bits per baud. So, if you are receiving data at 115200 baud rate, 2 bits per symbol, you are receiving data with 115200 * 2 = 230400bps.

Using hardware timer in C

Okay, so I've got some C code to perform a mathematical operation which could, pretty much, take any length of time (depending on the operands supplied to it, of course). I was wondering if there is a way to register some kind of method which will be called every n seconds which can analyse the state of the operation, i.e. what iteration it is currently at, possibly using a hardware timer interrupt or something?
The reason I ask this is because I know the common way to implement this is to be keeping track of the current iteration in a variable; say, an integer called progress and have an IF statement like this in the code:
if ((progress % 10000) == 0)
printf("Currently at iteration %d\n", progress);
but I believe that a mod operation takes a relatively long time to execute, so the idea of having it inside a loop which will be ran many, many times scares me, from an optimisation point of view.
So I get the feeling that having an external way of signalling a progress print is nice and efficient. Are there any great ways to perform this, or is the simple 'mod check' the best (in terms of optimising)?

I'd go with the mod check, but maybe with subtractions instead :-)
icount = 0;
progress = 10000;
/* ... */
if (--progress == 0) {
progress = 10000;
printf("Currently at iteration %d0000\n", ++icount);
}
/* ... */

While mod operations are usually slow, the compiler should be able to optimize this, and the branch should be predicted really well, with a mis-prediction only once every 10'000 ifs, burning one mod operation and ~20 cycles (for the mis-prediction) on it, which is fine. So you are trying to optimize one mod operation every 10'000 iterations. Of course this assumes you are running it on a modern and typical CPU, and not some embedded system with unknown specs. This should even be faster than having a counter variable.
Suggestion: Test it with and without the timing code, and figure out a complex solution if there is really a problem.
Premature optimisation is the root of all evil. -Knuth

mod is about the same speed as division, on most CPU's these days that means about 5-10 cycles... in other words hardly anything, slower than multiply/add/subtract, but not enough to really worry about.
However, you are right to want to avoid sitting in a loop spinning if you're doing work in another thread or something like that. If you're on a unixish system there's timer_create(), or on Linux the much easier to use timerfd_create().
But for single threaded, just putting that if in is enough.

Use alarm or setitimer to raise SIGALRM signals at regular intervals.
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>
struct itimerval interval;
void handler( int x ) {
write( STDOUT_FILENO, ".", 1 ); /* Defined in POSIX, not in C */
}
int main() {
signal( SIGALRM, &handler );
interval.it_value.tv_sec = 5; /* display after 5 seconds */
interval.it_interval.tv_sec = 5; /* then display every 5 seconds */
setitimer( ITIMER_REAL, &interval, NULL );
/* do computations */
interval.it_value.tv_sec = 0; /* a zero it_value disarms the timer */
interval.it_interval.tv_sec = 0; /* don't display progress any more */
setitimer( ITIMER_REAL, &interval, NULL );
printf( "\n" ); /* done with the dots! */
}
Note, only a smattering of functions are OK to call inside handler. They are listed partway down this page. If you want to communicate anything for a fancier printout, do it through a sig_atomic_t variable.
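A minimal sketch of the sig_atomic_t approach, assuming the handler is installed for SIGALRM as in the example above: the handler only sets a flag, and the computation loop does the actual printing. The names progress_due, on_alarm and report_if_due are hypothetical.

```c
#include <signal.h>
#include <stdio.h>

/* Set by the signal handler, cleared by the main loop. */
static volatile sig_atomic_t progress_due = 0;

/* Signal handler: does nothing except set the flag, which is always safe. */
static void on_alarm(int signo)
{
    (void)signo;
    progress_due = 1;
}

/* Call once per iteration of the computation. Prints a progress line and
 * returns 1 if the timer fired since the last report, otherwise returns 0. */
static int report_if_due(long iteration)
{
    if (!progress_due)
        return 0;
    progress_due = 0;
    printf("Currently at iteration %ld\n", iteration);
    return 1;
}
```

The per-iteration cost is a single flag test, and printf is only ever called from normal program context, never from the handler.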

you could have a global variable for the iterations, which you could monitor from an external thread.
While () {
Print(iteration);
Sleep(1000);
}
You may need to watch out for data races though.

Is this a good implementation of a FPS independent game loop?

I currently have something close to the following implementation of a FPS independent game loop for physics based games. It works very well on just about every computer I have tested it on, keeping the game speed consistent when frame rate drops. However I am going to be porting to embedded devices which will likely struggle harder with video and I am wondering if it will still cut the mustard.
edits:
For this question assume that msec() returns the time in milliseconds for which the program has run. The implementation of msec() is different on different platforms. This loop is also run in different ways on different platforms.
#define MSECS_PER_STEP 20
int stepCount, stepSize; // these are not globals in the real source
void loop() {
int i,j;
int iterations =0;
static int accumulator; // the accumulator holds extra msecs
static int lastMsec;
int deltatime = msec() - lastMsec;
lastMsec = msec();
// deltatime should be the time since the last call to loop
if (deltatime != 0) {
// iterations determines the number of steps which are needed
iterations = deltatime/MSECS_PER_STEP;
// save any left over millisecs in the accumulator
accumulator += deltatime%MSECS_PER_STEP;
}
// when the accumulator has gained enough msecs for a step...
while (accumulator >= MSECS_PER_STEP) {
iterations++;
accumulator -= MSECS_PER_STEP;
}
handleInput(); // gathers user input from an event queue
for (j=0; j<iterations; j++) {
// here step count is a way of taking a more granular step
// without affecting the overall speed of the simulation (step size)
for (i=0; i<stepCount; i++) {
doStep(stepSize/(float) stepCount); // forwards the sim
}
}
}

I just have a few comments. The first is that you don't have enough comments. There are places where it's not clear what you are trying to do so it is difficult to say if there is a better way to do it, but I'll point those out as I come to them. First, though:
#define MSECS_PER_STEP 20
int stepCount, stepSize; // these are not globals in the real source
void loop() {
int i,j;
int iterations =0;
static int accumulator; // the accumulator holds extra msecs
static int lastMsec;
The static variables are not explicitly initialized. C guarantees that statics start out as 0, but an explicit initializer would make the intent clear. Also, rather than declaring them as static you might want to consider putting them in a structure that you pass into loop by reference.
int deltatime = msec() - lastMsec;
Since lastMsec wasn't initialized (and so is 0), this probably starts out as a big delta.
lastMsec = msec();
This line, just like the last line, calls msec. This is probably meant as "the current time", and these calls are close enough that the returned value is probably the same for both calls, which is probably also what you expected, but still, you call the function twice. You should change these lines to int now = msec(); int deltatime = now - lastMsec; lastMsec = now; to avoid calling this function twice. Current time getting functions often have much higher overhead than you think.
if (deltatime != 0) {
iterations = deltatime/MSECS_PER_STEP;
accumulator += deltatime%MSECS_PER_STEP;
}
You should have a comment here that says what this does, as well as a comment above that says what the variables were meant to mean.
while (accumulator >= MSECS_PER_STEP) {
iterations++;
accumulator -= MSECS_PER_STEP;
}
This loop needs a comment. It also needs to not be there. It appears that it could have been replaced with iterations += accumulator/MSECS_PER_STEP; accumulator %= MSECS_PER_STEP;. The division and modulus should run in shorter and more consistent time than the loop on any machine that has hardware division (which many do).
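Putting the suggestions so far together (one clock read per call, state in a structure passed by reference, division and modulus instead of the while loop), the step accounting could look roughly like this. It is a sketch, not the original code: the struct step_clock and the function steps_due are hypothetical names, and the current time is passed in so the function does not depend on how msec() is implemented.

```c
#define MSECS_PER_STEP 20

/* Timing state that the original keeps in static variables. */
typedef struct {
    int accumulator;  /* leftover milliseconds not yet turned into a step */
    int last_msec;    /* time of the previous call, in milliseconds */
} step_clock;

/* Given the current time in milliseconds, return how many fixed-size
 * simulation steps are due and keep the remainder for the next call. */
static int steps_due(step_clock *c, int now)
{
    int steps;

    c->accumulator += now - c->last_msec;  /* add time since the last call */
    c->last_msec = now;

    steps = c->accumulator / MSECS_PER_STEP;
    c->accumulator %= MSECS_PER_STEP;
    return steps;
}
```

The caller would initialize last_msec to the current time before the first call, which avoids the large initial delta mentioned above.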
handleInput(); // gathers user input from an event queue
for (j=0; j<iterations; j++) {
for (i=0; i<stepCount; i++) {
doStep(stepSize/(float) stepCount); // forwards the sim
}
}
Doing steps in a loop independent of input will have the effect of making the game unresponsive if it executes slowly and gets behind. It appears, at least, that if the game gets behind, all of the input will start to stack up and get executed together and all of the in-game time will pass in one chunk. This is a less than graceful way to fail.
Additionally, I can guess what the j loop (outer loop) means, but I am less clear on the inner loop. Also, what does the value passed to the doStep function mean?
}
This is the last curly brace. I think that it looks lonely.
I don't know what goes on as far as whatever calls your loop function, which may be out of your control, and that may dictate what this function does and how it looks, but if not I hope that you will reconsider the structure. I believe that a better way to do it would be to have a function that is called repeatedly but with only one event at the time (issued regularly at a relatively short period). These events can be either user input events or timer events. User input events just set things up to react upon the next timer event. (when you don't have any events to process you sleep)
You should always assume that each timer event is processed at the same period, even though there may be some drift here if the processing gets behind. The main oddity that you may notice here is that if the game gets behind on processing timer events and then catches up again, the time within the game may appear to slow down (below real time), then speed up (above real time), and then slow back down (to real time).
Ways to deal with this include only allowing one timer event to be in the event queue at one time, which would result in time appearing to slow down (below real time) and then speed back up (to real time) with no super speed interval.
Another way to do this, which is functionally similar to what you have, would be to have the last step of processing each timer event be to queue up the next timer event (note that no one else should send timer events {except for the first one} if this is the way you choose to implement the game). This would mean doing away with the regular time intervals between timer events and also restrict the ability for the program to sleep, since at the very least every time the event queue were inspected there would be a timer event to process.
