Duration for Sound on the 8051 - c

I am trying to create a short tune using timers on the 8051, sending a square wave at a specified frequency to generate each note.
However, with my current code all I get is one endless note that never stops playing. Any help figuring out how to stop the note and create a duration function would be greatly appreciated.
#include <reg932.h>

sbit speaker = P1^7;

void tone(unsigned char, unsigned char);

void main()
{
    P1M1 = 0;
    P1M2 = 0;
    tone(0xC8, 0xF3);
}

void tone(unsigned char highval, unsigned char lowval)
{
    TMOD = 0x01;
    TL0 = lowval;
    TH0 = highval;
    TR0 = 1;
    while (TF0 == 0);
    speaker = 0;
    TR0 = 0;
    TF0 = 0;
}

I haven't programmed 8051 devices in a long time, but here's what I'd do:
1.a. figure out if it's tone() that never exits
1.b. if it is, make sure that the while loop is indeed in there (see the disassembly of tone()); if it's not, the compiler optimizes the check out and it needs fixing (e.g. declaring TF0 as volatile)
1.c. see if the check is correct (the right bit in the right register, etc.)
2. write an assembly routine to waste N CPU clocks: use the slowest instruction (was MUL or DIV the slowest?) in a loop, or simply repeated M times, so you get roughly a 10 ms delay, then write a C function to call that routine as many times as necessary (e.g. 100 times for 1 second). You could use a timer here, but this may be the simplest; a sketch follows.
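For illustration, a minimal sketch of a tone-plus-duration routine, reusing the question's register names. It assumes, for arithmetic's sake, a classic 12 MHz, 12-clocks-per-cycle 8051 (one timer tick per microsecond); on the LPC932 the timer clock differs, so the reload values would need rescaling. The speaker-toggling and the extra duration parameter are my additions, untested on real hardware:

/* Sketch only: each Timer 0 overflow marks half a period of the square
   wave; toggling the speaker on every overflow and counting overflows
   bounds the note's duration instead of playing forever. */
void tone(unsigned char highval, unsigned char lowval, unsigned int half_periods)
{
    unsigned int i;

    TMOD = 0x01;                /* Timer 0, 16-bit mode 1 */
    for (i = 0; i < half_periods; i++)
    {
        TH0 = highval;          /* reload for one half-period */
        TL0 = lowval;
        TF0 = 0;
        TR0 = 1;                /* run until overflow */
        while (TF0 == 0);
        TR0 = 0;
        speaker = !speaker;     /* toggle: two overflows = one full cycle */
    }
    speaker = 0;                /* the note ends here */
}

Under the one-tick-per-microsecond assumption, a 440 Hz note has a half-period of about 1136 us, i.e. a reload of 65536 - 1136 = 64400 = 0xFB90, and one second of it is 880 half-periods, so the call would be tone(0xFB, 0x90, 880).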


Preventing torn reads with an HCS12 microcontroller

Summary
I'm trying to write an embedded application for an MC9S12VR microcontroller. This is a 16-bit microcontroller, but some of the values I deal with are 32 bits wide, and while debugging I've captured some anomalous values that seem to be due to torn reads.
I'm writing the firmware for this micro in C89 and running it through the Freescale HC12 compiler, and I'm wondering if anyone has any suggestions on how to prevent torn reads on this particular microcontroller, assuming that is indeed the cause.
Details
Part of my application involves driving a motor and estimating its position and speed based on pulses generated by an encoder (a pulse is generated on every full rotation of the motor).
For this to work, I need to configure one of the MCU timers so that I can track the time elapsed between pulses. However, the timer has a clock rate of 3 MHz (after prescaling) and the timer counter register is only 16-bit, so the counter overflows every ~22ms. To compensate, I set up an interrupt handler that fires on a timer counter overflow, and this increments an "overflow" variable by 1:
// TEMP
static volatile unsigned long _timerOverflowsNoReset;
// ...

#ifndef __INTELLISENSE__
__interrupt VectorNumber_Vtimovf
#endif
void timovf_isr(void)
{
    // Clear the interrupt.
    TFLG2_TOF = 1;

    // TEMP
    _timerOverflowsNoReset++;
    // ...
}
I can then work out the current time from this:
// TEMP
unsigned long MOTOR_GetCurrentTime(void)
{
    const unsigned long ticksPerCycle = 0xFFFF;
    const unsigned long ticksPerMicrosecond = 3; // 24 MHz / 8 (prescaler)
    const unsigned long ticks = _timerOverflowsNoReset * ticksPerCycle + TCNT;
    const unsigned long microseconds = ticks / ticksPerMicrosecond;
    return microseconds;
}
In main.c, I've temporarily written some debugging code that drives the motor in one direction and then takes "snapshots" of various data at regular intervals:
// Test
for (iter = 0; iter < 10; iter++)
{
    nextWait += SECONDS(secondsPerIteration);
    while ((_test2Snapshots[iter].elapsed = MOTOR_GetCurrentTime() - startTime) < nextWait);
    _test2Snapshots[iter].position = MOTOR_GetCount();
    _test2Snapshots[iter].phase = MOTOR_GetPhase();
    _test2Snapshots[iter].time = MOTOR_GetCurrentTime() - startTime;
    // ...
In this test I'm reading MOTOR_GetCurrentTime() in two places very close together in the code and assigning the results to fields of a globally available struct.
In almost every case, I find that the first value read is a few microseconds beyond the point where the while loop should terminate, and the second read is a few microseconds after that; this is expected. However, occasionally the first read is significantly higher than the loop's termination point, and the second read is then less than the first value (as well as less than the termination value).
One captured example follows; it took about 20 repeats of the test before I was able to reproduce it. In the code, <snapshot>.elapsed is written to before <snapshot>.time, so I expect it to have a slightly smaller value:
For snapshot[8], my application first reads 20010014 (over 10ms beyond where it should have terminated the busy-loop) and then reads 19988209. As I mentioned above, an overflow occurs every 22ms - specifically, a difference in _timerOverflowsNoReset of one unit will produce a difference of 65535 / 3 in the calculated microsecond value. If we account for this:
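20010014 - 19988209 = 21805 microseconds between the two reads; one spurious overflow is worth 65535 / 3 ≈ 21845 microseconds; the residual is 21845 - 21805 = 40 microseconds.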
A difference of 40 isn't far off the discrepancy I see between my other pairs of reads (~23/24), so my guess is that some kind of tear is going on involving an off-by-one read of _timerOverflowsNoReset. That is, while busy-looping, one call to MOTOR_GetCurrentTime() erroneously sees _timerOverflowsNoReset as one greater than it actually is, causing the loop to end early, and the next read after that sees the correct value again.
I have other problems with my application that I'm having trouble pinning down, and I'm hoping that if I resolve this, it might resolve these other problems as well if they share a similar cause.
Edit: Among other changes, I've changed _timerOverflowsNoReset and some other globals from 32-bit unsigned to 16-bit unsigned in the implementation I now have.
You can read this value TWICE:
unsigned long GetTmrOverflowNo()
{
    unsigned long ovfl1, ovfl2;

    do {
        ovfl1 = _timerOverflowsNoReset;
        ovfl2 = _timerOverflowsNoReset;
    } while (ovfl1 != ovfl2);

    return ovfl1;
}
unsigned long MOTOR_GetCurrentTime(void)
{
    const unsigned long ticksPerCycle = 0xFFFF;
    const unsigned long ticksPerMicrosecond = 3; // 24 MHz / 8 (prescaler)
    const unsigned long ticks = GetTmrOverflowNo() * ticksPerCycle + TCNT;
    const unsigned long microseconds = ticks / ticksPerMicrosecond;
    return microseconds;
}
If _timerOverflowsNoReset increments much more slowly than GetTmrOverflowNo() executes, the inner loop runs at most twice. In most cases ovfl1 and ovfl2 will be equal after the first pass through the while() loop.
Calculate the tick count, then check whether the overflow count changed while doing so, and if it did, repeat:
#define TCNT_BITS 16 // TCNT register width (no trailing ';' in a macro, or the shift below breaks)

uint32_t MOTOR_GetCurrentTicks(void)
{
    uint32_t ticks = 0;
    uint32_t overflow_count = 0;

    do
    {
        overflow_count = _timerOverflowsNoReset;
        ticks = (overflow_count << TCNT_BITS) | TCNT;
    } while (overflow_count != _timerOverflowsNoReset);

    return ticks;
}
The while loop will iterate either once or twice, no more.
Based on the answers #AlexeyEsaulenko and #jeb provided, I gained insight into the cause of this problem and how I could tackle it. As both answers were helpful, and the solution I currently have is a mixture of the two, I can't decide which answer to accept; instead I'll upvote both answers and keep this question open.
This is how I now implement MOTOR_GetCurrentTime:
unsigned long MOTOR_GetCurrentTime(void)
{
    const unsigned long ticksPerMicrosecond = 3; // 24 MHz / 8 (prescaler)

    unsigned int countA;
    unsigned int countB;
    unsigned int timerOverflowsA;
    unsigned int timerOverflowsB;
    unsigned long ticks;
    unsigned long microseconds;

    // Loop until TCNT and the timer overflow count can be reliably determined.
    do
    {
        timerOverflowsA = _timerOverflowsNoReset;
        countA = TCNT;
        timerOverflowsB = _timerOverflowsNoReset;
        countB = TCNT;
    } while (timerOverflowsA != timerOverflowsB || countA >= countB);

    ticks = ((unsigned long)timerOverflowsA << 16) + countA;
    microseconds = ticks / ticksPerMicrosecond;

    return microseconds;
}
This function might not be as efficient as other proposed answers, but it gives me confidence that it will avoid some of the pitfalls that have been brought to light. It works by repeatedly reading both the timer overflow count and TCNT register twice, and only exiting the loop when the following two conditions are satisfied:
the timer overflow count hasn't changed while reading TCNT for the first time in the loop
the second count is greater than the first count
This basically means that if MOTOR_GetCurrentTime is called around the time that a timer overflow occurs, we wait until we've safely moved on to the next cycle, indicated by the second TCNT read being greater than the first (e.g. 0x0001 > 0x0000).
This does mean that the function blocks until TCNT increments at least once, but since that occurs every 333 nanoseconds I don't see it being problematic.
I've tried running my test 20 times in a row and haven't noticed any tearing, so I believe this works. I'll continue to test and update this answer if I'm wrong and the issue persists.
Edit: As Vroomfondel points out in the comments below, the check I do involving countA and countB only incidentally works for me, and it can potentially cause the loop to repeat indefinitely if _timerOverflowsNoReset is read fast enough. I'll update this answer when I've come up with something to address this.
The atomic reads are not the main problem here.
The problem is that the overflow ISR and TCNT are highly related, and you get into trouble when you first read TCNT and then the overflow counter.
Three sample situations:
TCNT=0x0000, Overflow=0 --- okay
TCNT=0xFFFF, Overflow=1 --- fails
TCNT=0x0001, Overflow=1 --- okay again
You get the same problem when you change the order: first read the overflow counter, then TCNT.
You can solve it by reading the totalOverflows counter twice:
disable_ints();
uint16_t overflowsA = totalOverflows;
uint16_t cnt = TCNT;
uint16_t overflowsB = totalOverflows;
enable_ints();

uint32_t totalCnt = cnt;
if (overflowsA != overflowsB)
{
    if (cnt < 0x4000)
        totalCnt += 0x10000;
}
totalCnt += (uint32_t)overflowsA << 16;
If the totalOverflows counter changed while TCNT was being read, then it's necessary to check whether the value in cnt is already past the wrap (greater than 0 but below e.g. 0x4000) or whether cnt was sampled just before the overflow.
One technique that can be helpful is to maintain two or three values that, collectively, hold overlapping portions of a larger value.
If one knows that a value will be monotonically increasing, and that no more than 65,280 counts will ever elapse between calls to the "update timer" function, one could use something like:
// Note: Assuming a platform where 16-bit loads and stores are atomic
uint16_t volatile timerHi, timerMed, timerLow;

void updateTimer(void) // Must be the only thing that writes the timer variables!
{
    timerLow = HARDWARE_TIMER;
    timerMed += (uint8_t)((timerLow >> 8) - timerMed);
    timerHi  += (uint8_t)((timerMed >> 8) - timerHi);
}

uint32_t readTimer(void)
{
    uint16_t tempTimerHi  = timerHi;
    uint16_t tempTimerMed = timerMed;
    uint16_t tempTimerLow = timerLow;
    tempTimerMed += (uint8_t)((tempTimerLow >> 8) - tempTimerMed);
    tempTimerHi  += (uint8_t)((tempTimerMed >> 8) - tempTimerHi);
    return (((uint32_t)tempTimerHi) << 16) | tempTimerLow;
}
Note that readTimer reads timerHi before it reads timerLow. It's possible that updateTimer might update timerLow or timerMed between the time readTimer reads timerHi and the time it reads those other values, but if that occurs, it will notice that the lower part of timerHi needs to be incremented to match the upper part of the value that got updated later.
This approach can be cascaded to arbitrary length, and need not use a full 8 bits of overlap. Using 8 bits of overlap, however, makes it possible to form a 32-bit value by using the upper and lower values while simply ignoring the middle one. If less overlap were used, all three values would need to take part in the final computation.
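As a sanity check (not from the original answer), the scheme can be exercised on a desktop by standing a fake 16-bit counter in for the hardware and confirming that readTimer reconstructs the full 32-bit count, provided updateTimer runs at least once per 65,280 ticks. The routines are repeated verbatim (with the parenthesis fix) so the harness is self-contained; fakeClock and the step size are arbitrary test choices:

#include <stdint.h>
#include <stdio.h>

static uint32_t fakeClock;                   /* stands in for the real hardware */
#define HARDWARE_TIMER ((uint16_t)fakeClock) /* assumed 16-bit counter register */

uint16_t volatile timerHi, timerMed, timerLow;

void updateTimer(void)
{
    timerLow = HARDWARE_TIMER;
    timerMed += (uint8_t)((timerLow >> 8) - timerMed);
    timerHi  += (uint8_t)((timerMed >> 8) - timerHi);
}

uint32_t readTimer(void)
{
    uint16_t tempTimerHi  = timerHi;
    uint16_t tempTimerMed = timerMed;
    uint16_t tempTimerLow = timerLow;
    tempTimerMed += (uint8_t)((tempTimerLow >> 8) - tempTimerMed);
    tempTimerHi  += (uint8_t)((tempTimerMed >> 8) - tempTimerHi);
    return (((uint32_t)tempTimerHi) << 16) | tempTimerLow;
}

int main(void)
{
    /* Advance the fake clock in steps well under 65,280 ticks and
       verify the reconstruction after every update. */
    while (fakeClock < 0x01000000u)
    {
        fakeClock += 40000u;
        updateTimer();
        if (readTimer() != fakeClock)
        {
            printf("mismatch at %lu\n", (unsigned long)fakeClock);
            return 1;
        }
    }
    printf("all reads consistent\n");
    return 0;
}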
The problem is that the writes to _timerOverflowsNoReset aren't atomic and you don't protect them. This is a bug. Writing atomically from the ISR isn't very important, as the HCS12 blocks the background program during the interrupt, but reading atomically in the background program is absolutely necessary.
Also keep in mind that Codewarrior/HCS12 generates somewhat inefficient code for 32-bit arithmetic.
Here is how you can fix it:
Drop unsigned long for the shared variable. In fact you don't need a counter at all, given that your background program can service the variable within 22 ms of real time, which should be a very easy requirement to meet. Keep your 32-bit counter local and away from the ISR.
Ensure that reads of the shared variable are atomic. Disassemble! It must be a single MOV instruction or similar; otherwise you must implement semaphores.
Don't read any volatile variable inside complex expressions. That goes not only for the shared variable but also for TCNT. Your program as it stands tightly couples the speed of the slow 32-bit arithmetic to the timer, which is very bad: you won't be able to read TCNT reliably with any accuracy, and to make things worse you call this function from other complex code.
Your code should be changed to something like this:
static volatile bool overflow;

void timovf_isr(void)
{
    // Clear the interrupt.
    TFLG2_TOF = 1;

    // TEMP
    overflow = true;
    // ...
}

unsigned long MOTOR_GetCurrentTime(void)
{
    bool of = overflow;    // read this on a line of its own, ensure this is atomic!
    uint16_t tcnt = TCNT;  // read this on a line of its own
    overflow = false;      // ensure this is atomic too

    if (of)
    {
        _timerOverflowsNoReset++;
    }

    /* calculations here */

    return microseconds;
}
If you don't end up with atomic reads, you will have to implement semaphores, block the timer interrupt or write the reading code in inline assembler (my recommendation).
Overall, I would say that your design relying on TOF is somewhat questionable. I think it would be better to set up a dedicated timer channel and let it count a known time unit (10 ms?). Is there any reason why you can't use one of the 8 timer channels for this?
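For illustration only (not part of the original answer), such a channel might be set up roughly like this. The register and vector names (TIOS, TC0, TFLG1, TIE, VectorNumber_Vtimch0) follow the usual Freescale S12 derivative headers and should be treated as assumptions to check against the MC9S12VR reference manual; at a 3 MHz timer clock, 30000 ticks give a 10 ms period:

/* Sketch: 10 ms time base on output-compare channel 0 (assumed register names). */
#define TICKS_PER_10MS 30000u

static volatile unsigned int tenMsTicks; /* serviced by the background program */

void timer_channel0_init(void)
{
    TIOS |= 0x01;                 /* channel 0 as output compare */
    TC0 = TCNT + TICKS_PER_10MS;  /* first compare in 10 ms */
    TFLG1 = 0x01;                 /* clear any stale C0F flag */
    TIE |= 0x01;                  /* enable channel 0 interrupt */
}

#ifndef __INTELLISENSE__
__interrupt VectorNumber_Vtimch0
#endif
void timch0_isr(void)
{
    TFLG1 = 0x01;                 /* acknowledge C0F */
    TC0 += TICKS_PER_10MS;        /* schedule the next compare; no drift */
    tenMsTicks++;
}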
It all boils down to how often you read the timer and how long the longest interrupt sequence in your system can be (i.e. the maximum time the timer code can be stalled without making "substantial" progress).
Iff you test for time stamps more often than the cycle time of your hardware timer, AND those tests are guaranteed to start no more than one interval (in your case 22 ms) after the end of the previous test, all is well. If your code can be held up for so long that these preconditions don't hold, the following solution will not work; the question then, however, is whether the time information coming from such a system has any value at all.
The good thing is that you don't need an interrupt at all; any attempt to compensate for the inability of the system to satisfy two equally hard real-time problems (updating your overflow counter and delivering the hardware time) is either futile or ugly, and fails to meet the basic system properties anyway.
unsigned long MOTOR_GetCurrentTime(void)
{
    static uint16_t last;
    static uint16_t hi;
    volatile uint16_t now = TCNT;

    if (now < last)
    {
        hi++;
    }
    last = now;

    return now + (hi * 65536UL);
}
BTW: I return ticks, not microseconds. Don't mix concerns.
PS: the caveat is that such a function is not reentrant and is, in a sense, a true singleton.

Audio tone for Microcontroller p18f4520 using a for loop

I'm using C Programming to program an audio tone for the microcontroller P18F4520.
I am using a for loop and delays to do this. I have not learned any other way, and in any case using a for loop and delays to generate the tone is a requirement for this assignment. The speaker on the target board is on port RA4. This is what I have done so far.
#include <p18f4520.h>
#include <delays.h>

void tone(float, int);

void main()
{
    ADCON1 = 0x0F;
    TRISA = 0b11101111;

    /*tone(38.17, 262); //C (1)
    tone(34.01, 294);   //D (2)
    tone(30.3, 330);    //E (3)
    tone(28.57, 350);   //F (4)
    tone(25.51, 392);   //G (5)
    tone(24.04, 416);   //G^(6)
    tone(20.41, 490);   //B (7)
    tone(11.36, 880);   //A (8)*/
    tone(11.36, 880);   //A (8)
}

void tone(float n, int cycles)
{
    unsigned int i;
    for (i = 0; i < cycles; i++)
    {
        PORTAbits.RA4 = 0;
        Delay10TCYx(n);
        PORTAbits.RA4 = 1;
        Delay10TCYx(n);
    }
}
So as you can see, I have created a tone function where n sets the delay and cycles sets the number of iterations of the for loop. I am not sure whether my calculations are correct, but the code does produce a tone; I'm just not sure whether it is really an A tone or a G tone, etc.
Here is how I calculate it. First I find the frequency of the tone: A, for example, has a frequency of 440 Hz, so its period is 1/440 s. For a duty cycle I would like the output high for only half of it, which is 50%, so I divide the period by 2: (1/440 Hz)/2 = 0.001136 s, or 1.136 ms. Then I calculate the delay for one instruction cycle of the microcontroller, 4*(1/2 MHz), which is 2 µs. So one cycle delays 2 µs, a ratio of 2 µs : 1 cycle, and to cover 1.136 ms I need 1.136 ms / 2 µs = 568 cycles. At this point I asked around about what n should be in Delay10TCYx(n); what I was told is to just account for the factor of 10, so for tone A it becomes Delay10TCYx(11.36). As for cycles, I would like the tone to last 1 second, so 1/1.136 ms, which is 880. That's why tone A is tone(11.36, 880). It generates a tone, and I have worked out a range of tones, but I'm not really sure they are actually the tones C, D, E, F, G, G^, B, A.
So my questions are:
1. How do I properly calculate the delay and frequency for tone A?
2. For the 'cycles' argument of the for loop: given the answer to question 1, how many cycles should I use to vary how long tone A plays? Will more cycles give a longer tone A, and if so, how do I work out how long?
3. When I use a function to play the tone, it somehow generates a different tone compared to when I put the for loop directly in the main method. Why is this so?
4. Lastly, if I want to stop the code, how do I do it? I've tried using a for loop but it doesn't work.
A simple explanation would be great, as I am just a student working on a project to produce tones using a for loop and delays. I've searched elsewhere, where people use different things like WAV files, but I would simply like to know how to use a for loop and delays for audio tones.
Your help would be greatly appreciated.
First, you need to understand the general approach to generating an interrupt at an arbitrary time interval. This lets you make a specific action occur every x microseconds [1-2].
There are already existing projects that play a specific tone (in Hz) on a PIC, like what you are trying to do [3-4].
Next, you'll want to take an alternate approach to generating the tone. With your delay functions you are effectively using up the CPU for nothing, when it could be doing something else. You would be better off using timer interrupts directly, so you're not burning the CPU by idling.
Once you have this implemented, you just need to know the corresponding frequency for the note you are trying to generate, either by using a formula to compute the frequency of a specific musical note [5], or by using a look-up table [6].
[1] What is the procedure for PIC timer0 to generate a 1 ms or 1 sec interrupt?, Accessed 2014-07-05, <http://www.edaboard.com/thread52636.html>
[2] Introduction to PIC18's Timers - PIC Microcontroller Tutorial, Accessed 2014-07-05, <http://extremeelectronics.co.in/microchip-pic-tutorials/introduction-to-pic18s-timers-pic-microcontroller-tutorial/>
[3] Generate Ring Tones on your PIC16F87x Microcontroller, Accessed 2014-07-05, <http://retired.beyondlogic.org/pic/ringtones.htm>
[4] AN655 - D/A Conversion Using PWM and R-2R Ladders to Generate Sine and DTMF Waveforms, Accessed 2014-07-05, <http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1824&appnote=en011071>
[5] Equations for the Frequency Table, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/NoteFreqCalcs.html>
[6] Frequencies for equal-tempered scale, A4 = 440 Hz, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/notefreqs.html>
How do you compute the number of cycles for the delays to get a tone of 440 Hz? I will assume that your clock speed is 1/2 MHz, i.e. 500 kHz, as written in your question.
1) A 500 kHz clock speed corresponds to a tick every 2 µs. Since an instruction cycle is 4 clock ticks, a cycle lasts 8 µs.
2) A frequency of 440 Hz corresponds to a period of 2.27 ms, or 2270 µs, or 283 cycles.
3) The delay is called twice per period, so each delay should be about 141 cycles for A.
About your tone function... As you compile your code, you must be getting some kind of warning, something like warning: (42) implicit conversion of float to integer. The prototype of the delay function is void Delay10TCYx(unsigned char);: it expects an unsigned char, not a float, so you will not get any floating-point precision. You may try something like:
void tone(unsigned char n, unsigned int cycles)
{
    unsigned int i;
    for (i = 0; i < cycles; i++)
    {
        PORTAbits.RA4 = 0;
        Delay1TCYx(n);
        PORTAbits.RA4 = 1;
        Delay1TCYx(n);
    }
}
I changed to Delay1TCYx() for accuracy. Now a 1-second A tone would be tone(141, 440), and a 1-second 880 Hz tone would be tone(70, 880).
There is always a while(1) in all the examples about PICs... If you just need one beep at start-up, do something like:
void main()
{
    ADCON1 = 0x0F;
    TRISA = 0b11101111;

    tone(141, 440);
    tone(70, 880);
    tone(141, 440);

    while (1) {
    }
}
Regarding the change of tone when the loop is embedded in a function: keep in mind that every operation takes at least one cycle, and calling a function takes a few cycles. Maybe declaring static inline void tone(unsigned char, unsigned int) would be a good thing...
However, as signaled by #Dogbert, using delays.h is a good start for beginners, but do not get used to it! Microcontrollers have lots of features that avoid counting by hand and save time for useful computations.
timers: think of one as an alarm clock. The 18f4520 has 4 of them.
interrupts: the PIC stops the current operation, performs the code specified for the interrupt, clears the flag and comes back to its previous task. A timer can trigger an interrupt.
PWM, pulse-width modulation: the 18f4520 has 2 PWM modules. Basically, it generates your signal! (See the sketch below.)
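For illustration only (not part of the original answer), a rough CCP1/Timer2 PWM setup for a ~440 Hz, ~50% duty square wave, assuming Fosc = 2 MHz as in the question. Note that the CCP1 PWM output is on pin RC2 on this part, not RA4, and the values below are back-of-the-envelope numbers to check against the PIC18F4520 datasheet:

/* Sketch: hardware PWM at about 440 Hz with Fosc = 2 MHz (values unverified).
   PWM period = (PR2 + 1) * 4 * Tosc * prescale = (PR2 + 1) * 32 us here. */
void pwm440_init(void)
{
    TRISCbits.TRISC2 = 0;  /* CCP1 PWM output is on RC2 */
    PR2 = 70;              /* (70 + 1) * 32 us = 2272 us, about 440 Hz */
    CCPR1L = 35;           /* upper 8 bits of the ~50% duty value (142 >> 2) */
    CCP1CON = 0b00101100;  /* DC1B = 10 (low 2 duty bits), CCP1M = PWM mode */
    T2CON = 0b00000111;    /* Timer2 on, prescaler 16 */
}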

Performance of array of functions over if and switch statements

I am writing a very performance-critical part of the code, and I had this crazy idea of substituting the case statements (or if statements) with an array of function pointers.
Let me demonstrate; here goes the normal version:
while (statement)
{
    /* 'option' changes on every iteration */
    switch (option)
    {
        case 0: /* simple task */ break;
        case 1: /* simple task */ break;
        case 2: /* simple task */ break;
        case 3: /* simple task */ break;
    }
}
And here is the "callback function" version:
void task0(void) {
    /* simple task */
}
void task1(void) {
    /* simple task */
}
void task2(void) {
    /* simple task */
}
void task3(void) {
    /* simple task */
}

void (*task[4])(void);

task[0] = task0;
task[1] = task1;
task[2] = task2;
task[3] = task3;

while (statement)
{
    /* 'option' changes on every iteration */
    /* and now we call the function with the 'case' number */
    (*task[option])();
}
So which version will be faster? Is the overhead of the function call eliminating speed benefit over normal switch (or if) statement?
Of course the latter version is not as readable, but I am looking for all the speed I can get.
I am about to benchmark this once I get things set up, but if someone has an answer already, I won't bother.
I think that at the end of the day your switch statements will be the fastest, because function pointers have the "overhead" of the function lookup and the function call itself, while a switch compiles straight to a jump table. It of course depends on different things that only testing can give you an answer to. That's my two cents' worth.
The switch statement should be compiled into a branch table, which is essentially the same thing as your array of functions, if your compiler has at least basic optimization capability.
Which version will be faster depends. The naive implementation of switch is a huge if ... else if ... else if ... construction, meaning it takes on average O(n) time to execute, where n is the number of cases. Your jump table is O(1), so the more different cases there are, and the more the later cases are used, the more likely the jump table is to be better. For a small number of cases, or for switches where the first case is chosen more frequently than the others, the naive implementation is better. The matter is complicated by the fact that the compiler may choose to use a jump table for your switch anyway if it thinks that will be faster.
The only way to know which you should choose is to performance test your code.
First, I would randomly-pause it a few times, to make certain enough time is spent in this dispatching to even bother optimizing it.
Second, if it is, since each branch spends very few cycles, you want a jump table to get to the desired branch. The reason switch statements exist is to suggest to the compiler that it can generate one if the switch values are compact.
How long is the list of switch values? If it's short, an if-ladder could still be faster, especially if you put the most frequently used codes at the top. An alternative to an if-ladder (which I've never actually seen anyone use) is an if-tree, the code equivalent of a binary tree; a sketch follows.
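A minimal sketch of the if-tree idea for the four-case dispatch above (the task bodies are placeholders, as in the question): every case is reached in exactly two comparisons, where an if-ladder needs up to three.

/* If-tree dispatch over option 0..3: a balanced binary search in code. */
if (option < 2) {
    if (option == 0) {
        /* simple task 0 */
    } else {
        /* simple task 1 */
    }
} else {
    if (option == 2) {
        /* simple task 2 */
    } else {
        /* simple task 3 */
    }
}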
You probably don't want an array of function pointers. Yes, it's only an array reference to get the function pointer, but there are several instructions of overhead in calling a function, and it sounds like that could overwhelm the small amount of work being done inside each function.
In any case, looking at the assembly language, or single-stepping at the instruction level, will give you a good idea how efficient it's being.
A good compiler will compile a switch with cases in a small numerical range as a single conditional that checks whether the value is in that range (which can sometimes be optimized out), followed by a jump-table jump. This will almost surely be faster than a function call (direct or indirect) because:
A jump is a lot less expensive than a call (which must save call-clobbered registers, adjust the stack, etc.).
The code in the switch statement cases can make use of expression values already cached in registers in the caller.
It's possible that an extremely advanced compiler could determine that the call-via-function pointer only refers to one of a small set of static-linkage functions, and thereby optimize things heavily, maybe even eliminating the calls and replacing them by jumps. But I wouldn't count on it.
I arrived at this post recently since I was wondering the same thing, and I ended up taking the time to try it. It certainly depends greatly on what you're doing, but for my VM it was a decent speedup (15-25%), and it allowed me to simplify some code (which is probably where a lot of the speedup came from). As an example (code simplified for clarity), a "for" opcode was able to be implemented naturally using a C++ for loop:
void OpFor( Frame* frame, Instruction* &code )
{
    i32 start      = GET_OP_A(code);
    i32 stop_value = GET_OP_B(code);
    i32 step       = GET_OP_C(code);

    // instruction count (i.e. block size)
    u32 i_count = GET_OP_D(code);

    // pointer to end of block (NOP if it branches)
    Instruction* end = code + i_count;

    if( step > 0 )
    {
        for( u32 i = start; i < stop_value; i += step )
        {
            // rewind instruction pointer
            Instruction* cur = code;

            // execute code inside for loop
            while( cur != end )
            {
                cur->func(frame, cur);
                ++cur;
            }
        }
    }
    else
    {
        // same loop with <= for a non-positive step (elided)
    }
}

Using hardware timer in C

Okay, so I've got some C code that performs a mathematical operation which could take pretty much any length of time (depending on the operands supplied to it, of course). I was wondering if there is a way to register some kind of method that will be called every n seconds to analyse the state of the operation, i.e. what iteration it is currently at, possibly using a hardware timer interrupt or something?
The reason I ask is that I know the common way to implement this is to keep track of the current iteration in a variable, say an integer called progress, and have an if statement like this in the code:
if ((progress % 10000) == 0)
    printf("Currently at iteration %d\n", progress);
but I believe a mod operation takes a relatively long time to execute, so the idea of having it inside a loop that will run many, many times scares me, from an optimisation point of view.
So I get the feeling that having an external way of signalling a progress print is nice and efficient. Are there any great ways to perform this, or is the simple 'mod check' the best (in terms of optimising)?
I'd go with the mod check, but maybe with subtractions instead :-)
icount = 0;
progress = 10000;
/* ... */
if (--progress == 0) {
    progress = 10000;
    printf("Currently at iteration %d0000\n", ++icount);
}
/* ... */
While mod operations are usually slow, the compiler should be able to optimize and predict this really well, only mis-predicting once every 10,000 ifs and burning one mod operation plus ~20 cycles (for the mis-prediction) on it, which is fine. So you are trying to optimize away one mod operation every 10,000 iterations. Of course this assumes you are running on a modern, typical CPU and not some embedded system with unknown specs. This should even be faster than keeping a separate counter variable.
Suggestion: Test it with and without the timing code, and figure out a complex solution if there is really a problem.
Premature optimisation is the root of all evil. -Knuth
mod is about the same speed as division; on most CPUs these days that means about 5-10 cycles... in other words, hardly anything: slower than multiply/add/subtract, but not enough to really worry about.
However, you are right to want to avoid sitting in a loop spinning if you're doing the work in another thread or something like that. If you're on a Unix-ish system there's timer_create(), or on Linux the much easier to use timerfd_create(); a sketch of the latter follows.
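A minimal sketch of the timerfd_create() route, assuming Linux and POSIX threads; the shared progress counter, the work loop, and the five-second period are placeholders:

/* Sketch: a monitor thread wakes on a timerfd every 5 s and reports
   progress; the worker just keeps an atomic counter up to date. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/timerfd.h>

static atomic_long progress;

static void *monitor(void *arg)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    struct itimerspec its = { 0 };
    its.it_value.tv_sec = 5;     /* first report after 5 s */
    its.it_interval.tv_sec = 5;  /* then every 5 s */
    timerfd_settime(fd, 0, &its, NULL);

    (void)arg;
    for (;;) {
        uint64_t expirations;    /* read() blocks until the timer fires */
        if (read(fd, &expirations, sizeof expirations) != sizeof expirations)
            break;
        printf("Currently at iteration %ld\n", atomic_load(&progress));
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, monitor, NULL);

    for (long i = 0; i < 2000000000L; i++)  /* stand-in for the real work */
        atomic_store(&progress, i);

    return 0;
}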
But for single-threaded code, just putting that if in is enough.
Use alarm() or setitimer() to raise SIGALRM signals at regular intervals:
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/time.h>

struct itimerval interval;

void handler( int x ) {
    write( STDOUT_FILENO, ".", 1 ); /* Defined in POSIX, not in C */
}

int main() {
    signal( SIGALRM, &handler );

    interval.it_value.tv_sec = 5;    /* display after 5 seconds */
    interval.it_interval.tv_sec = 5; /* then display every 5 seconds */
    setitimer( ITIMER_REAL, &interval, NULL );

    /* do computations */

    interval.it_value.tv_sec = 0;    /* don't display progress any more */
    interval.it_interval.tv_sec = 0;
    setitimer( ITIMER_REAL, &interval, NULL );
    printf( "\n" ); /* done with the dots! */
}
Note: only a smattering of functions are OK to call inside a signal handler; POSIX lists the async-signal-safe ones. If you want to communicate anything for a fancier printout, do it through a sig_atomic_t variable (see the sketch below).
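A sketch of that pattern (not from the original answer; progress and LIMIT are hypothetical stand-ins for the computation's own state): the handler only sets a flag, and the loop does the printf itself, costing one cheap compare per iteration.

#include <stdio.h>
#include <signal.h>

volatile sig_atomic_t print_requested;

void handler( int x ) {
    print_requested = 1; /* the only thing the handler does: async-signal-safe */
}

/* ... inside the computation loop ... */
for (progress = 0; progress < LIMIT; progress++) {
    /* ... the actual work ... */
    if (print_requested) {
        print_requested = 0;
        printf("Currently at iteration %d\n", progress);
    }
}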
You could have a global variable for the iteration count, and monitor it from an external thread:

while (1) {
    printf("Currently at iteration %d\n", iteration);
    sleep(1); /* or Sleep(1000) on Windows */
}

You may need to watch out for data races, though.

Is this a good implementation of an FPS independent game loop?

I currently have something close to the following implementation of an FPS independent game loop for physics-based games. It works very well on just about every computer I have tested it on, keeping the game speed consistent when the frame rate drops. However, I am going to be porting to embedded devices which will likely struggle more with video, and I am wondering if it will still cut the mustard.
edits:
For this question, assume that msec() returns the time in milliseconds for which the program has run. The implementation of msec() differs between platforms, and this loop is also run in different ways on different platforms.
#define MSECS_PER_STEP 20

int stepCount, stepSize; // these are not globals in the real source

void loop() {
    int i, j;
    int iterations = 0;
    static int accumulator; // the accumulator holds extra msecs
    static int lastMsec;

    int deltatime = msec() - lastMsec;
    lastMsec = msec();
    // deltatime should be the time since the last call to loop

    if (deltatime != 0) {
        // iterations determines the number of steps which are needed
        iterations = deltatime / MSECS_PER_STEP;
        // save any left over millisecs in the accumulator
        accumulator += deltatime % MSECS_PER_STEP;
    }

    // when the accumulator has gained enough msecs for a step...
    while (accumulator >= MSECS_PER_STEP) {
        iterations++;
        accumulator -= MSECS_PER_STEP;
    }

    handleInput(); // gathers user input from an event queue

    for (j = 0; j < iterations; j++) {
        // here step count is a way of taking a more granular step
        // without affecting the overall speed of the simulation (step size)
        for (i = 0; i < stepCount; i++) {
            doStep(stepSize / (float)stepCount); // forwards the sim
        }
    }
}
I just have a few comments. The first is that you don't have enough comments: there are places where it's not clear what you are trying to do, which makes it difficult to say whether there is a better way to do it, but I'll point those out as I come to them. First, though:
#define MSECS_PER_STEP 20

int stepCount, stepSize; // these are not globals in the real source

void loop() {
    int i, j;
    int iterations = 0;
    static int accumulator; // the accumulator holds extra msecs
    static int lastMsec;
These are not initialized to anything. They probably turn up as 0, but you should have initialized them. Also, rather than declaring them static, you might want to consider putting them in a structure that you pass into loop by reference.
int deltatime = msec() - lastMsec;
Since lastMsec wasn't initialized (and is probably 0), this probably starts out as a big delta.
lastMsec = msec();
This line, just like the previous line, calls msec. This is probably meant as "the current time", and the two calls are close enough that the returned value is probably the same for both, which is probably also what you expected, but still, you call the function twice. You should change these lines to int now = msec(); int deltatime = now - lastMsec; lastMsec = now; to avoid calling the function twice. Functions that get the current time often have much higher overhead than you'd think.
if (deltatime != 0) {
    iterations = deltatime / MSECS_PER_STEP;
    accumulator += deltatime % MSECS_PER_STEP;
}
You should have a comment here that says what this does, as well as a comment above that says what the variables are meant to mean.
while (accumulator >= MSECS_PER_STEP) {
    iterations++;
    accumulator -= MSECS_PER_STEP;
}
This loop needs a comment. It also needs to not be there: it could be replaced with iterations += accumulator / MSECS_PER_STEP; accumulator %= MSECS_PER_STEP;. The division and modulus will run in shorter and more consistent time than the loop on any machine that has hardware division (which many do).
handleInput(); // gathers user input from an event queue

for (j = 0; j < iterations; j++) {
    for (i = 0; i < stepCount; i++) {
        doStep(stepSize / (float)stepCount); // forwards the sim
    }
}
Doing steps in a loop independent of input will make the game unresponsive if it executes slowly and gets behind. It appears, at least, that if the game gets behind, all of the input will stack up and get executed together, and all of the in-game time will pass in one chunk. This is a less than graceful way to fail.
Additionally, I can guess what the j loop (outer loop) means, but the inner loop I am less clear on. Also, what does the value passed to the doStep function mean?
}
This is the last curly brace. I think that it looks lonely.
I don't know what calls your loop function, which may be out of your control, and that may dictate what this function does and how it looks, but if not, I hope that you will reconsider the structure. I believe a better way to do it would be to have a function that is called repeatedly, but with only one event at a time (issued regularly at a relatively short period). These events can be either user input events or timer events; user input events just set things up to react upon the next timer event. (When you don't have any events to process, you sleep.)
You should always assume that each timer event is processed at the same period, even though there may be some drift here if the processing gets behind. The main oddity you may notice is that if the game gets behind on processing timer events and then catches up, time within the game may appear to slow down (below real time), then speed up (above real time), and then slow back down (to real time).
Ways to deal with this include only allowing one timer event to be in the event queue at a time, which would result in time appearing to slow down (below real time) and then speed back up (to real time), with no super-speed interval.
Another way to do this, which is functionally similar to what you have, would be to make the last step of processing each timer event be to queue up the next timer event (note that no one else should send timer events, except for the first one, if you implement the game this way). This would mean doing away with regular time intervals between timer events, and would also restrict the program's ability to sleep, since at the very least every time the event queue were inspected there would be a timer event to process. A sketch of this pattern follows.
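To make the shape of that last idea concrete, here is a rough sketch. The event API (queue_wait, queue_post) and the Event type are hypothetical stand-ins for whatever the platform provides; only doStep, stepSize and stepCount come from the question:

/* Sketch: one event at a time; each timer event's last act is to queue
   the next one, so in-game time advances one fixed step per timer event
   regardless of the wall clock. */
typedef enum { EV_INPUT, EV_TIMER } EventKind;
typedef struct { EventKind kind; /* input payload, etc. */ } Event;

Event queue_wait(void);    /* hypothetical: blocks until an event arrives */
void queue_post(Event e);  /* hypothetical: appends an event to the queue */

void run(void)
{
    Event tick = { EV_TIMER };
    queue_post(tick);  /* the one externally posted timer event */

    for (;;)
    {
        Event e = queue_wait();
        if (e.kind == EV_INPUT)
        {
            /* just record the input; it takes effect on the next timer event */
        }
        else
        {
            doStep(stepSize / (float)stepCount); /* one fixed simulation step */
            queue_post(tick); /* last step: queue the next timer event */
        }
    }
}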
