Increasing performance of 32bit math on 16bit processor - c

I am working on some firmware for an embedded device that uses a 16 bit PIC operating at 40 MIPS and programming in C. The system will control the position of two stepper motors and maintain the step position of each motor at all times. The max position of each motor is around 125000 steps so I cannot use a 16bit integer to keep track of the position. I must use a 32 bit unsigned integer (DWORD). The motor moves at 1000 steps per second and I have designed the firmware so that steps are processed in a Timer ISR. The timer ISR does the following:
1) compare the current position of one motor to the target position, if they are the same set the isMoving flag false and return. If they are different set the isMoving flag true.
2) If the target position is larger than the current position, move one step forward, then increment the current position.
3) If the target position is smaller than the current position, move one step backward, then decrement the current position.
Here is the code:
void _ISR _NOPSV _T4Interrupt(void)
{
static char StepperIndex1 = 'A';
if(Device1.statusStr.CurrentPosition == Device1.statusStr.TargetPosition)
{
Device1.statusStr.IsMoving = 0;
// Do Nothing
}
else if (Device1.statusStr.CurrentPosition > Device1.statusStr.TargetPosition)
{
switch (StepperIndex1) // MOVE OUT
{
case 'A':
SetMotor1PosB();
StepperIndex1 = 'B';
break;
case 'B':
SetMotor1PosC();
StepperIndex1 = 'C';
break;
case 'C':
SetMotor1PosD();
StepperIndex1 = 'D';
break;
case 'D':
default:
SetMotor1PosA();
StepperIndex1 = 'A';
break;
}
Device1.statusStr.CurrentPosition--;
Device1.statusStr.IsMoving = 1;
}
else
{
switch (StepperIndex1) // MOVE IN
{
case 'A':
SetMotor1PosD();
StepperIndex1 = 'D';
break;
case 'B':
SetMotor1PosA();
StepperIndex1 = 'A';
break;
case 'C':
SetMotor1PosB();
StepperIndex1 = 'B';
break;
case 'D':
default:
SetMotor1PosC();
StepperIndex1 = 'C';
break;
}
Device1.statusStr.CurrentPosition++;
Device1.statusStr.IsMoving = 1;
}
_T4IF = 0; // Clear the Timer 4 Interrupt Flag.
}
The target position is set in the main program loop when move requests are received. The SetMotorPos lines are just macros to turn on/off specific port pins.
My question is: Is there any way to improve the efficiency of this code? The code functions fine as is if the positions are 16bit integers but as 32bit integers there is too much processing. This device must communicate with a PC without hesitation and as written there is a noticeable performance hit. I really only need 18 bit math but I don't know of an easy way of doing that! Any constructive input/suggestions would be most appreciated.

Warning: all numbers are made up...
Supposing that the above ISR has about 200 (likely, fewer) instructions of compiled code and those include the instructions to save/restore the CPU registers before and after the ISR, each taking 5 clock cycles (likely, 1 to 3) and you call 2 of them 1000 times a second each, we end up with 2*1000*200*5 = 2 millions of clock cycles per second or 2 MIPS.
Do you actually consume the rest 38 MIPS elsewhere?
The only thing that may be important here and I can't see it, is what's done inside of the SetMotor*Pos*() functions. Do they do any complex calculations? Do they perform some slow communication with the motors, e.g. wait for them to respond to the commands sent to them?
At any rate, it's doubtful that such simple code would be noticeably slower when working with 32-bit integers than with 16-bit.
If your code is slow, find out where time is spent and how much, profile it. Generate a square pulse signal in the ISR (going to 1 when the ISR starts, going to 0 when the ISR is about to return) and measure its duration with an oscilloscope. Or do whatever is easier to find it out. Measure the time spent in all parts of the program, then optimize where really necessary, not where you have previously thought it would be.

The difference between 16 and 32 bits arithmetic shouldn't be that big, I think, since you use only increment and comparision. But maybe the problem is that each 32-bit arithmetic operation implies a function call (if the compiler isn't able/willing to do inlining of simpler operations).
One suggestion would be to do the arithmetic yourself, by breaking the Device1.statusStr.CurrentPosition in two, say, Device1.statusStr.CurrentPositionH and Device1.statusStr.CurrentPositionL. Then use some macros to do the operations, like:
#define INC(xH,xL) {xL++;if (xL == 0) xH++;}

I would get rid of the StepperIndex1 variable and instead use the two low-order bits of CurrentPosition to keep track of the current step index. Alternately, keep track of the current position in full rotations (rather than each step), so it can fit in a 16 bit variable. When moving, you only increment/decrement the position when moving to phase 'A'. Of course, this means you can only target each full rotation, rather than every step.

Sorry, but you are using bad program design.
Let's check the difference between 16 bit and 32 bit PIC24 or PIC33 asm code...
16 bit increment
inc PosInt16 ;one cycle
So 16 bit increment takes one cycle
32bit increment
clr Wd ;one cycle
inc low PosInt32 ;one cycle
addc high PosInt32, Wd ;one cycle
and 32 increment takes three cycles.
The total difference is 2 cycles or 50ns (nano seconds).
Simple calcolation will show you all. You have 1000 steps per second and 40Mips DSP so you have 40000 instructions per step at 1000 steps per second. More than enough!

When you change it from 16bit to 32bit do you change any of the compile flags to tell it to compile as a 32bit application instead.
have you tried compiling with the 32bit extensions but using only 16bit integers. do you still get such a performance drop?
It's likely that just by changing from 16bit to 32bit that some operations are compiled differently, perhaps do a Diff between the two sets of compiled ASM code and see what is actually different, is it lots or is it only a couple of lines.
Solutions would be maybe instead of using a 32bit integer, just use two 16bit integers,
when the valueA is int16.Max then set it to 0 and then increment valueB by 1 otherwise just incriment ValueA by 1, when value B is >= 3 you then check valueA >= 26696 (or something similar depending if you use unsigned or signed int16) and then you have your motor checking at 12500.

Related

Preventing torn reads with an HCS12 microcontroller

Summary
I'm trying to write an embedded application for an MC9S12VR microcontroller. This is a 16-bit microcontroller but some of the values I deal with are 32 bits wide and while debugging I've captured some anomalous values that seem to be due to torn reads.
I'm writing the firmware for this micro in C89 and running it through the Freescale HC12 compiler, and I'm wondering if anyone has any suggestions on how to prevent them on this particular microcontroller assuming that this is the case.
Details
Part of my application involves driving a motor and estimating its position and speed based on pulses generated by an encoder (a pulse is generated on every full rotation of the motor).
For this to work, I need to configure one of the MCU timers so that I can track the time elapsed between pulses. However, the timer has a clock rate of 3 MHz (after prescaling) and the timer counter register is only 16-bit, so the counter overflows every ~22ms. To compensate, I set up an interrupt handler that fires on a timer counter overflow, and this increments an "overflow" variable by 1:
// TEMP
static volatile unsigned long _timerOverflowsNoReset;
// ...
#ifndef __INTELLISENSE__
__interrupt VectorNumber_Vtimovf
#endif
void timovf_isr(void)
{
// Clear the interrupt.
TFLG2_TOF = 1;
// TEMP
_timerOverflowsNoReset++;
// ...
}
I can then work out the current time from this:
// TEMP
unsigned long MOTOR_GetCurrentTime(void)
{
const unsigned long ticksPerCycle = 0xFFFF;
const unsigned long ticksPerMicrosecond = 3; // 24 MHZ / 8 (prescaler)
const unsigned long ticks = _timerOverflowsNoReset * ticksPerCycle + TCNT;
const unsigned long microseconds = ticks / ticksPerMicrosecond;
return microseconds;
}
In main.c, I've temporarily written some debugging code that drives the motor in one direction and then takes "snapshots" of various data at regular intervals:
// Test
for (iter = 0; iter < 10; iter++)
{
nextWait += SECONDS(secondsPerIteration);
while ((_test2Snapshots[iter].elapsed = MOTOR_GetCurrentTime() - startTime) < nextWait);
_test2Snapshots[iter].position = MOTOR_GetCount();
_test2Snapshots[iter].phase = MOTOR_GetPhase();
_test2Snapshots[iter].time = MOTOR_GetCurrentTime() - startTime;
// ...
In this test I'm reading MOTOR_GetCurrentTime() in two places very close together in code and assign them to properties of a globally available struct.
In almost every case, I find that the first value read is a few microseconds beyond the point the while loop should terminate, and the second read is a few microseconds after that - this is expected. However, occasionally I find the first read is significantly higher than the point the while loop should terminate at, and then the second read is less than the first value (as well as the termination value).
The screenshot below gives an example of this. It took about 20 repeats of the test before I was able to reproduce it. In the code, <snapshot>.elapsed is written to before <snapshot>.time so I expect it to have a slightly smaller value:
For snapshot[8], my application first reads 20010014 (over 10ms beyond where it should have terminated the busy-loop) and then reads 19988209. As I mentioned above, an overflow occurs every 22ms - specifically, a difference in _timerOverflowsNoReset of one unit will produce a difference of 65535 / 3 in the calculated microsecond value. If we account for this:
A difference of 40 isn't that far off the discrepancy I see between my other pairs of reads (~23/24), so my guess is that there's some kind of tear going on involving an off-by-one read of _timerOverflowsNoReset. As in while busy-looping, it will perform one call to MOTOR_GetCurrentTime() that erroneously sees _timerOverflowsNoReset as one greater than it actually is, causing the loop to end early, and then on the next read after that it sees the correct value again.
I have other problems with my application that I'm having trouble pinning down, and I'm hoping that if I resolve this, it might resolve these other problems as well if they share a similar cause.
Edit: Among other changes, I've changed _timerOverflowsNoReset and some other globals from 32-bit unsigned to 16-bit unsigned in the implementation I now have.
You can read this value TWICE:
unsigned long GetTmrOverflowNo()
{
unsigned long ovfl1, ovfl2;
do {
ovfl1 = _timerOverflowsNoReset;
ovfl2 = _timerOverflowsNoReset;
} while (ovfl1 != ovfl2);
return ovfl1;
}
unsigned long MOTOR_GetCurrentTime(void)
{
const unsigned long ticksPerCycle = 0xFFFF;
const unsigned long ticksPerMicrosecond = 3; // 24 MHZ / 8 (prescaler)
const unsigned long ticks = GetTmrOverflowNo() * ticksPerCycle + TCNT;
const unsigned long microseconds = ticks / ticksPerMicrosecond;
return microseconds;
}
If _timerOverflowsNoReset increments much slower then execution of GetTmrOverflowNo(), in worst case inner loop runs only two times. In most cases ovfl1 and ovfl2 will be equal after first run of while() loop.
Calculate the tick count, then check if while doing that the overflow changed, and if so repeat;
#define TCNT_BITS 16 ; // TCNT register width
uint32_t MOTOR_GetCurrentTicks(void)
{
uint32_t ticks = 0 ;
uint32_t overflow_count = 0;
do
{
overflow_count = _timerOverflowsNoReset ;
ticks = (overflow_count << TCNT_BITS) | TCNT;
}
while( overflow_count != _timerOverflowsNoReset ) ;
return ticks ;
}
the while loop will iterate either once or twice no more.
Based on the answers #AlexeyEsaulenko and #jeb provided, I gained understanding into the cause of this problem and how I could tackle it. As both their answers were helpful and the solution I currently have is sort of a mixture of the two, I can't decide which of the two answers to accept, so instead I'll upvote both answers and keep this question open.
This is how I now implement MOTOR_GetCurrentTime:
unsigned long MOTOR_GetCurrentTime(void)
{
const unsigned long ticksPerMicrosecond = 3; // 24 MHZ / 8 (prescaler)
unsigned int countA;
unsigned int countB;
unsigned int timerOverflowsA;
unsigned int timerOverflowsB;
unsigned long ticks;
unsigned long microseconds;
// Loops until TCNT and the timer overflow count can be reliably determined.
do
{
timerOverflowsA = _timerOverflowsNoReset;
countA = TCNT;
timerOverflowsB = _timerOverflowsNoReset;
countB = TCNT;
} while (timerOverflowsA != timerOverflowsB || countA >= countB);
ticks = ((unsigned long)timerOverflowsA << 16) + countA;
microseconds = ticks / ticksPerMicrosecond;
return microseconds;
}
This function might not be as efficient as other proposed answers, but it gives me confidence that it will avoid some of the pitfalls that have been brought to light. It works by repeatedly reading both the timer overflow count and TCNT register twice, and only exiting the loop when the following two conditions are satisfied:
the timer overflow count hasn't changed while reading TCNT for the first time in the loop
the second count is greater than the first count
This basically means that if MOTOR_GetCurrentTime is called around the time that a timer overflow occurs, we wait until we've safely moved on to the next cycle, indicated by the second TCNT read being greater than the first (e.g. 0x0001 > 0x0000).
This does mean that the function blocks until TCNT increments at least once, but since that occurs every 333 nanoseconds I don't see it being problematic.
I've tried running my test 20 times in a row and haven't noticed any tearing, so I believe this works. I'll continue to test and update this answer if I'm wrong and the issue persists.
Edit: As Vroomfondel points out in the comments below, the check I do involving countA and countB also incidentally works for me and can potentially cause the loop to repeat indefinitely if _timerOverflowsNoReset is read fast enough. I'll update this answer when I've come up with something to address this.
The atomic reads are not the main problem here.
It's the problem that the overflow-ISR and TCNT are highly related.
And you get problems when you read first TCNT and then the overflow counter.
Three sample situations:
TCNT=0x0000, Overflow=0 --- okay
TCNT=0xFFFF, Overflow=1 --- fails
TCNT=0x0001, Overflow=1 --- okay again
You got the same problems, when you change the order to: First read overflow, then TCNT.
You could solve it with reading twice the totalOverflow counter.
disable_ints();
uint16_t overflowsA=totalOverflows;
uint16_t cnt = TCNT;
uint16_t overflowsB=totalOverflows;
enable_ints();
uint32_t totalCnt = cnt;
if ( overflowsA != overflowsB )
{
if (cnt < 0x4000)
totalCnt += 0x10000;
}
totalCnt += (uint32_t)overflowsA << 16;
If the totalOverflowCounter changed while reading the TCNT, then it's necessary to check if the value in tcnt is already greater 0 (but below ex. 0x4000) or if tcnt is just before the overflow.
One technique that can be helpful is to maintain two or three values that, collectively, hold overlapping portions of a larger value.
If one knows that a value will be monotonically increasing, and one will never go more than 65,280 counts between calls to "update timer" function, one could use something like:
// Note: Assuming a platform where 16-bit loads and stores are atomic
uint16_t volatile timerHi, timerMed, timerLow;
void updateTimer(void) // Must be only thing that writes timers!
{
timerLow = HARDWARE_TIMER;
timerMed += (uint8_t)((timerLow >> 8) - timerMed);
timerHi += (uint8_t)((timerMed >> 8) - timerHi);
}
uint32_t readTimer(void)
{
uint16_t tempTimerHi = timerHi;
uint16_t tempTimerMed = timerMed;
uint16_t tempTimerLow = timerLow;
tempTimerMed += (uint8_t)((tempTimerLow >> 8) - tempTimerMed);
tempTimerHi += (uint8_t)((tempTimerMed >> 8) - tempTimerHi);
return ((uint32_t)tempTimerHi) << 16) | tempTimerLow;
}
Note that readTimer reads timerHi before it reads timerLow. It's possible that updateTimer might update timerLow or timerMed between the time readTimer reads
timerHi and the time it reads those other values, but if that occurs, it will
notice that the lower part of timerHi needs to be incremented to match the upper
part of the value that got updated later.
This approach can be cascaded to arbitrary length, and need not use a full 8 bits
of overlap. Using 8 bits of overlap, however, makes it possible to form a 32-bit
value by using the upper and lower values while simply ignoring the middle one.
If less overlap were used, all three values would need to take part in the
final computation.
The problem is that the writes to _timerOverflowsNoReset isn't atomic and you don't protect them. This is a bug. Writing atomic from the ISR isn't very important, as the HCS12 blocks the background program during interrupt. But reading atomic in the background program is absolutely necessary.
Also, have in mind that Codewarrior/HCS12 generates somewhat ineffective code for 32 bit arithmetic.
Here is how you can fix it:
Drop unsigned long for the shared variable. In fact you don't need a counter at all, given that your background program can service the variable within 22ms real-time - should be very easy requirement. Keep your 32 bit counter local and away from the ISR.
Ensure that reads of the shared variable are atomic. Disassemble! It must be a single MOV instruction or similar; otherwise you must implement semaphores.
Don't read any volatile variable inside complex expressions. Not only the shared variable but also the TCNT. Your program as it stands has a tight coupling between the slow 32 bit arithmetic algorithm's speed and the timer, which is very bad. You won't be able to reliably read TCNT with any accuracy, and to make things worse you call this function from other complex code.
Your code should be changed to something like this:
static volatile bool overflow;
void timovf_isr(void)
{
// Clear the interrupt.
TFLG2_TOF = 1;
// TEMP
overflow = true;
// ...
}
unsigned long MOTOR_GetCurrentTime(void)
{
bool of = overflow; // read this on a line of its own, ensure this is atomic!
uint16_t tcnt = TCNT; // read this on a line of its own
overflow = false; // ensure this is atomic too
if(of)
{
_timerOverflowsNoReset++;
}
/* calculations here */
return microseconds;
}
If you don't end up with atomic reads, you will have to implement semaphores, block the timer interrupt or write the reading code in inline assembler (my recommendation).
Overall I would say that your design relying on TOF is somewhat questionable. I think it would be better to set up a dedicated timer channel and let it count up a known time unit (10ms?). Any reason why you can't use one of the 8 timer channels for this?
It all boils down to the question of how often you do read the timer and how long the maximum interrupt sequence will be in your system (i.e. the maximum time the timer code can be stopped without making "substantial" progress).
Iff you test for time stamps more often than the cycle time of your hardware timer AND those tests have the guarantee that the end of one test is no further apart from the start of its predecessor than one interval (in your case 22ms), all is well. In the case your code is held up for so long that these preconditions don't hold, the following solution will not work - the question then however is whether the time information coming from such a system has any value at all.
The good thing is that you don't need an interrupt at all - any try to compensate for the inability of the system to satisfy two equally hard RT problems - updating your overflow timer and delivering the hardware time is either futile or ugly plus not meeting the basic system properties.
unsigned long MOTOR_GetCurrentTime(void)
{
static uint16_t last;
static uint16_t hi;
volatile uint16_t now = TCNT;
if (now < last)
{
hi++;
}
last = now;
return now + (hi * 65536UL);
}
BTW: I return ticks, not microseconds. Don't mix concerns.
PS: the caveat is that such a function is not reentrant and in a sense a true singleton.

Average from error prone measurement samples without buffering

I got a µC which measures temperature with of a sensor with an ADC. Due to various circumstances it can happen, that the reading is 0 (-30°C) or a impossible large Value (500-1500°C). I can't fix the reasons why these readings are so bad (time critical ISRs and sometimes a bad wiring) so I have to fix it with a clever piece of code.
I've come up with this (code gets called OVERSAMPLENR-times in a ISR):
#define OVERSAMPLENR 16 //read value 16 times
#define TEMP_VALID_CHANGE 0.15 //15% change in reading is possible
//float raw_tem_bed_value = <sum of all readings>;
//ADC = <AVR ADC reading macro>;
if(temp_count > 1) { //temp_count = amount of samples read, gets increased elsewhere
float avgRaw = raw_temp_bed_value / temp_count;
float diff = (avgRaw > ADC ? avgRaw - ADC : ADC - avgRaw) / (avgRaw == 0 ? 1 : avgRaw); //pulled out to shorten the line for SO
if (diff > TEMP_VALID_CHANGE * ((OVERSAMPLENR - temp_count) / OVERSAMPLENR)) //subsequent readings have a smaller tollerance
raw_temp_bed_value += avgRaw;
else
raw_temp_bed_value += ADC;
} else {
raw_temp_bed_value = ADC;
}
Where raw_temp_bed_value is a static global and gets read and processed later, when the ISR got fired 16 times.
As you can see, I check if the difference between the current average and the new reading is less then 15%. If so I accept the reading, if not, I reject it and add the current average instead.
But this breaks horribly if the first reading is something impossible.
One solution I though of is:
In the last line the raw_temp_bed_value is reset to the first ADC reading. It would be better to reset this to raw_temp_bed_value/OVERSAMPLENR. So I don't run in a "first reading error".
Do you have any better solutions? I though of some solutions featuring a moving average and use the average of the moving average but this would require additional arrays/RAM/cycles which we want to prevent.
I've often used something what I call rate of change to the sampling. Use a variable that represents how many samples it takes to reach a certain value, like 20. Then keep adding your sample difference to a variable divided by the rate of change. You can still use a threshold to filter out unlikely values.
float RateOfChange = 20;
float PreviousAdcValue = 0;
float filtered = FILTER_PRESET;
while(1)
{
//isr gets adc value here
filtered = filtered + ((AdcValue - PreviousAdcValue)/RateOfChange);
PreviousAdcValue = AdcValue;
sleep();
}
Please note that this isn't exactly like a low pass filter, it responds quicker and the last value added has the most significance. But it will not change much if a single value shoots out too much, depending on the rate of change.
You can also preset the filtered value to something sensible. This prevents wild startup behavior.
It takes up to RateOfChange samples to reach a stable value. You may want to make sure the filtered value isn't used before that by using a counter to count the number of samples taken for example. If the counter is lower than RateOfChange, skip processing temperature control.
For a more advanced (temperature) control routine, I highly recommend looking into PID control loops. These add a plethora of functionality to get a fast, stable response and keep something at a certain temperature efficiently and keep oscillations to a minimum. I've used the one used in the Marlin firmware in my own projects and works quite well.

Moving Average for ADC

Hi All i am working on a project where I have to calculate the moving average of ADC readings. The data coming out from ADC represent an Sinusoidal wave.
This is the code I am using to get moving average of a given signal.
longNew = (8 bit data from ADC);
longNew = longNew << 8;
//Division
longNew = longNew >> 8; //255 Samples
longTemp = avgALong >> 8;
avgALong -= longTemp;// Old data
avgALong += longNew;// New Data
avgA = avgALong >> 8;//256 Point Average
Reference Image
Please refer this image for upper limit and lower limit relative to reference (or avgA)
Currently I am using a constant value to obtain the upper limit and lower limit of voltage for my application
which I am calculating as follows
upper_limit = avgA + Delta(x);
lower_limit = avgA - Delta(x);
In my case I am taking Delta(x) = 15.
I want to calculate this constant expression or Delta(x) based on signal strength.
The maximum voltage level of signal is 255 or 5Volt.
The minimum voltage level of signal varies frequently because of that a constant value is not useful for my application which determines the lower and upper limit.
Please help
Thank you
Now with the description of what's going on, I think you want three running averages:
The input signal. Lightly average it to help tamp down noise.
upper_limit When you determine local maximums, push them into this average.
lower_limit When you determine local minimums, push them into this average.
Your delta would be (upper_limit-lower_limit)/8 (or 4, or whatever). Your hysteresis points would be upper_limit - delta and lower_limit + delta.
Every time you transition to '1', push the current local minimum into the lower_limit moving average and then begin searching for a new local maximum. When you transition to '0', push the local maximum into the upper_limit moving average and begin searching for a new local minimum.
There is a problem if your signal strength is wildly varying (you could get to a point where your signal suddenly drops into the hysteresis band and you never get any more transitions). You could solve this a few ways:
Count how much time you spend in the hysteresis band and reset everything if you spend too much time.
Or
for each sample in the hysteresis band, bring upper_limit and lower_limit slightly closer together. Eventually they'd collapse to the point where you start detecting transitions again.
Take this with a grain of salt. If you're doing this for a school project, it almost certainly wont match whatever scholarly method your professor is looking for.

Audio tone for Microcontroller p18f4520 using a for loop

I'm using C Programming to program an audio tone for the microcontroller P18F4520.
I am using a for loop and delays to do this. I have not learned any other ways to do so and moreover it is a must for me to use a for loop and delay to generate an audio tone for the target board. The port for the speaker is at RA4. This is what I have done so far.
#include <p18f4520.h>
#include <delays.h>
void tone (float, int);
void main()
{
ADCON1 = 0x0F;
TRISA = 0b11101111;
/*tone(38.17, 262); //C (1)
tone(34.01, 294); //D (2)
tone(30.3, 330); //E (3)
tone(28.57, 350); //F (4)
tone(25.51, 392); //G (5)
tone(24.04, 416); //G^(6)
tone(20.41, 490); //B (7)
tone(11.36, 880); //A (8)*/
tone(11.36, 880); //A (8)
}
void tone(float n, int cycles)
{
unsigned int i;
for(i=0; i<cycles; i++)
{
PORTAbits.RA4 = 0;
Delay10TCYx(n);
PORTAbits.RA4 = 1;
Delay10TCYx(n);
}
}
So as you can see what I have done is that I have created a tone function whereby n is for the delay and cycles is for the number of cycles in the for loop. I am not that sure whether my calculations are correct but so far it is what I have done and it does produce a tone. I'm just not sure whether it is really a A tone or G tone etc. How I calculate is that firstly I will find out the frequency tone for example A tone has a frequency of 440Hz. Then I will find the period for it whereby it will be 1/440Hz. Then for a duty cycle, I would like the tone to beep only for half of it which is 50% so I will divide the period by 2 which is (1/440Hz)/2 = 0.001136s or 1.136ms. Then I will calculate delay for 1 cycle for the microcontroller 4*(1/2MHz) which is 2µs. So this means that for 1 cycle it will delay for 2µs, the ratio would be 2µs:1cyc. So in order to get the number of cycles for 1.136ms it will be 1.136ms:1.136ms/2µs which is 568 cycles. Currently at this part I have asked around what should be in n where n is in Delay10TCYx(n). What I have gotten is that just multiply 10 for 11.36 and for a tone A it will be Delay10TCYx(11.36). As for the cycles I would like to delay for 1 second so 1/1.136ms which is 880. That's why in my method for tone A it is tone(11.36, 880). It generates a tone and I have found out the range of tones but I'm not really sure if they are really tones C D E F G G^ B A.
So my questions are
1. How do I really calculate the delay and frequency for tone A?
2. for the state of the for loop for the 'cycles' is the number cycles but from the answer that I will get from question 1, how many cycles should I use in order to vary the period of time for tone A? More number of cycles will be longer periods of tone A? If so, how do I find out how long?
3. When I use a function to play the tone, it somehow generates a different tone compared to when I used the for loop directly in the main method. Why is this so?
4. Lastly, if I want to stop the code, how do I do it? I've tried using a for loop but it doesn't work.
A simple explanation would be great as I am just a student working on a project to produce tones using a for loop and delays. I've searched else where whereby people use different stuff like WAV or things like that but I would just simply like to know how to use a for loop and delay for audio tones.
Your help would be greatly appreciated.
First, you need to understand the general approach to how you generate an interrupt at an arbitrary time interval. This lets you know you are able to have a specific action occur every x microseconds, [1-2].
There are already existing projects that play a specific tone (in Hz) on a PIC like what you are trying to do, [3-4].
Next, you'll want to take an alternate approach to generating the tone. In the case of your delay functions, you are effectively using up the CPU for nothing, when it could be done something else. You would be better off using timer interrupts directly so you're not "burning the CPU by idling".
Once you have this implemented, you just need to know the corresponding frequency for the note you are trying to generate, either by using a formula to generate a frequency from a specific musical note [5], or using a look-up table [6].
What is the procedure for PIC timer0 to generate a 1 ms or 1 sec interrupt?, Accessed 2014-07-05, <http://www.edaboard.com/thread52636.html>
Introduction to PIC18′s Timers – PIC Microcontroller Tutorial, Accessed 2014-07-05, <http://extremeelectronics.co.in/microchip-pic-tutorials/introduction-to-pic18s-timers-pic-microcontroller-tutorial/>
Generate Ring Tones on your PIC16F87x Microcontroller, Accessed 2014-07-05, <http://retired.beyondlogic.org/pic/ringtones.htm>http://retired.beyondlogic.org/pic/ringtones.htm
AN655 - D/A Conversion Using PWM and R-2R Ladders to Generate Sine and DTMF Waveforms, Accessed 2014-07-05, <http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1824&appnote=en011071>
Equations for the Frequency Table, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/NoteFreqCalcs.html>
Frequencies for equal-tempered scale, A4 = 440 Hz, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/notefreqs.html>
How to compute the number of cycles for delays to get a tone of 440Hz ? I will assume that your clock speed is 1/2MHz or 500kHz, as written in your question.
1) A 500kHz clock speed corresponds to a tic every 2us. Since a cycle is 4 clock tics, a cycle lasts 8 us.
2) A frequency of 440Hz corresponds to a period of 2.27ms, or 2270us, or 283 cycles.
3) The delay is called twice per period, so the delays should be about 141 cycles for A.
About your tone function...As you compile your code, you must face some kind of warning, something like warning: (42) implicit conversion of float to integer... The prototype of the delay function is void Delay10TCYx(unsigned char); : it expects an unsigned char, not a float. You will not get any floating point precision. You may try something like :
void tone(unsigned char n, unsigned int cycles)
{
unsigned int i;
for(i=0; i<cycles; i++)
{
PORTAbits.RA4 = 0;
Delay1TCYx(n);
PORTAbits.RA4 = 1;
Delay1TCYx(n);
}
}
I changed for Delay1TCYx() for accuracy. Now, A 1 second A-tone would be tone(141,440). A 1 second 880Hz-tone would be tone(70,880).
There is always a while(1) is all examples about PIC...if you just need one beep at start, do something like :
void main()
{
ADCON1 = 0x0F;
TRISA = 0b11101111;
tone(141,440);
tone(70,880);
tone(141,440);
while(1){
}
}
Regarding the change of tone when embedded in a function, keep in mind that every operation takes at least one cycle. Calling a function may take a few cycles. Maybe declaring static inline void tone (unsigned char, int) would be a good thing...
However, as signaled by #Dogbert , using delays.h is a good start for beginners, but do not get used to it ! Microcontrollers have lots of features to avoid counting and to save some time for useful computations.
timers : think of it as an alarm clock. 18f4520 has 4 of them.
interruption : the PIC stops the current operation, performs the code specified for this interruption, erases the flag and comes back to its previous task. A timer can trigger an interruption.
PWM pulse wave modulation. 18f4520 has 2 of them. Basically, it generates your signal !

Finding position of '1's efficiently in an bit array

I'm wiring a program that tests a set of wires for open or short circuits. The program, which runs on an AVR, drives a test vector (a walking '1') onto the wires and receives the result back. It compares this resultant vector with the expected data which is already stored on an SD Card or external EEPROM.
Here's an example, assume we have a set of 8 wires all of which are straight through i.e. they have no junctions. So if we drive 0b00000010 we should receive 0b00000010.
Suppose we receive 0b11000010. This implies there is a short circuit between wire 7,8 and wire 2. I can detect which bits I'm interested in by 0b00000010 ^ 0b11000010 = 0b11000000. This tells me clearly wire 7 and 8 are at fault but how do I find the position of these '1's efficiently in an large bit-array. It's easy to do this for just 8 wires using bit masks but the system I'm developing must handle up to 300 wires (bits). Before I started using macros like the following and testing each bit in an array of 300*300-bits I wanted to ask here if there was a more elegant solution.
#define BITMASK(b) (1 << ((b) % 8))
#define BITSLOT(b) ((b / 8))
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITCLEAR(a,b) ((a)[BITSLOT(b)] &= ~BITMASK(b))
#define BITTEST(a,b) ((a)[BITSLOT(b)] & BITMASK(b))
#define BITNSLOTS(nb) ((nb + 8 - 1) / 8)
Just to further show how to detect an open circuit. Expected data: 0b00000010, received data: 0b00000000 (the wire isn't pulled high). 0b00000010 ^ 0b00000000 = 0b0b00000010 - wire 2 is open.
NOTE: I know testing 300 wires is not something the tiny RAM inside an AVR Mega 1281 can handle, that is why I'll split this into groups i.e. test 50 wires, compare, display result and then move forward.
Many architectures provide specific instructions for locating the first set bit in a word, or for counting the number of set bits. Compilers usually provide intrinsics for these operations, so that you don't have to write inline assembly. GCC, for example, provides __builtin_ffs, __builtin_ctz, __builtin_popcount, etc., each of which should map to the appropriate instruction on the target architecture, exploiting bit-level parallelism.
If the target architecture doesn't support these, an efficient software implementation is emitted by the compiler. The naive approach of testing the vector bit by bit in software is not very efficient.
If your compiler doesn't implement these, you can still code your own implementation using a de Bruijn sequence.
How often do you expect faults? If you don't expect them that often, then it seems pointless to optimize the "fault exists" case -- the only part that will really matter for speed is the "no fault" case.
To optimize the no-fault case, simply XOR the actual result with the expected result and a input ^ expected == 0 test to see if any bits are set.
You can use a similar strategy to optimize the "few faults" case, if you further expect the number of faults to typically be small when they do exist -- mask the input ^ expected value to get just the first 8 bits, just the second 8 bits, and so on, and compare each of those results to zero. Then, you just need to search for the set bits within the ones that are not equal to zero, which should narrow the search space to something that can be done pretty quickly.
You can use a lookup table. For example log-base-2 lookup table of 255 bytes can be used to find the most-significant 1-bit in a byte:
uint8_t bit1 = log2[bit_mask];
where log2 is defined as follows:
uint8_t const log2[] = {
0, /* not used log2[0] */
0, /* log2[0x01] */
1, 1 /* log2[0x02], log2[0x03] */
2, 2, 2, 2, /* log2[0x04],..,log2[0x07] */
3, 3, 3, 3, 3, 3, 3, 3, /* log2[0x08],..,log2[0x0F */
...
}
On most processors a lookup table like this will go to ROM. But AVR is a Harvard machine and to place data in code space (ROM) requires special non-standard extension, which depends on the compiler. For example the IAR AVR compiler would need use the extended keyword __flash. In WinAVR (GNU AVR) you would need to use the PROGMEM attribute, but it's more complex than that, because you would also need to use special macros to to read from the program space.
I think there is only one way to do this:
Create an array out "outdata". Each item of the array can for example correspond an 8-bit port register.
Send the outdata on the wires.
Read back this data as "indata".
Store the indata in an array mapped exactly as the outdata.
In a loop, XOR each byte of outdata with each byte of indata.
I would strongly recommend inline functions instead of those macros.
Why can't your MCU handle 300 wires?
300/8 = 37.5 bytes. Rounded to 38. It needs to be stored twice, outdata and indata, 38*2 = 76 bytes.
You can't spare 76 bytes of RAM?
I think you're missing the forest through the trees. Seems like a bed of nails test. First test some assumptions:
1) You know which pins should be live for each pin tested/energized.
2) you have a netlist translated for step 1 into a file on sd
If you operate on a byte level as well as bit, it simplifies the issue. If you energize a pin, there is an expected pattern out stored in your file. First find the mismatched bytes; identify mismatched pins in the byte; finally store the energized pin with the faulty pin numbers.
You don't need an array for searching, or results. general idea:
numwires=300;
numbytes=numwires/8 + (numwires%8)?1:0;
for(unsigned char currbyte=0; currbyte<numbytes; currbyte++)
{
unsigned char testbyte=inchar(baseaddr+currbyte)
unsigned char goodbyte=getgoodbyte(testpin,currbyte/*byte offset*/);
if( testbyte ^ goodbyte){
// have a mismatch report the pins
for(j=0, mask=0x01; mask<0x80;mask<<=1, j++){
if( (mask & testbyte) != (mask & goodbyte)) // for clarity
logbadpin(testpin, currbyte*8+j/*pin/wirevalue*/, mask & testbyte /*bad value*/);
}
}

Resources