floating point bug in embedded system - c

On a Rabbit microcontroller..
(1)
I am incrementing f1 once per second: I add one second, expressed in hours (about 0.000278), to the existing value and store the result in the same variable.
void main()
{
float f1;
int i;
f1 = 4096;
// Assume that I am simulating a one second through each iteration of the following loop
for(i = 0; i < 100; i++)
{
f1 += 0.000278; // f1 does not change from 4096
printf("\ni: %d f1: %.06f", i, f1);
}
}
(2)
My other question: when I store a 32-bit unsigned long value in a float variable and read it back, I do not get the value I stored. What am I doing wrong?
void main()
{
unsigned long L1;
int temp;
float f1;
L1 = 4000000000; // four billion
f1 = (float)L1;
// Now print both
// You see that L1: 4000000000 while f1: -4000000000.000000
printf("\nL1: %lu f1:%.6f", L1, f1);
}

The first problem is that single-precision (32-bit) binary floating point is good for only about 6 to 7 significant decimal figures. With a 24-bit significand, adjacent float values near 4096 are roughly 0.0005 apart, so an increment of 0.000278 is below the resolution available at that magnitude and cannot be accumulated accurately. Using double precision will improve the result, at some significant cost.
Floating point is usually unnecessary and inappropriate on such a target; it is very expensive on a processor without an FPU, especially an 8-bit one. Moreover, your literal approximation of one second in hours (1.0f/3600.0f hours) will introduce significant cumulative error in any case. You may be better off storing time in integer seconds and converting to hours where necessary for display or output.
The second problem is less clear, but seems likely to be an issue with the Rabbit compiler's implementation of floating point, or possibly of the %f format specifier in its printf() implementation. Check the ISO compliance statement in the compiler documentation; there may be restrictions, especially on floating point. Again, you may find that using a double resolves the problem, especially as that is strictly the type expected by the %f format specifier in an ISO-conforming implementation. As I said, you are probably best off avoiding floating point altogether on such a target.
Note that if you are using Rabbit's Dynamic C compiler, you should be clear that Dynamic C is not an ISO conforming C compiler. It is a proprietary C-like language, that is similar enough to C to cause a great deal of confusion! Specifically it does not support double precision (double) floating point.
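The integer-seconds suggestion for the first problem can be sketched as follows. This is a minimal illustration in standard C, not Dynamic C, so it may need adapting on the Rabbit; the names seconds_t, whole_hours, hour_thousandths and print_hours are hypothetical and chosen only for this example.

```c
#include <stdio.h>

/* Elapsed time kept as an exact integer count of seconds (hypothetical type name). */
typedef unsigned long seconds_t;

/* Whole hours contained in an elapsed time. */
unsigned long whole_hours(seconds_t s)
{
    return s / 3600UL;
}

/* Fractional part of the hour count, in thousandths of an hour (0..999). */
unsigned long hour_thousandths(seconds_t s)
{
    return (s % 3600UL) * 1000UL / 3600UL;
}

/* Print the time in hours with three decimals, using integer arithmetic only. */
void print_hours(seconds_t s)
{
    printf("%lu.%03lu h\n", whole_hours(s), hour_thousandths(s));
}
```

Incrementing a seconds_t counter once per second never loses a tick, whatever the magnitude of the total, and the conversion to hours happens only at the point of display.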

f1 += (1/3600); should be f1 += (1.0f/3600.0f);.
If you perform integer division then result will also be integer.

Related

Is it defined what will happen if you shift a float?

I am following This video tutorial to implement a raycaster. It contains this code:
if(ra > PI) { ry = (((int)py>>6)<<6)-0.0001; rx=(py-ry)*aTan+px; yo=-64; xo=-yo*aTan; }//looking up
I hope I have transcribed this correctly. In particular, my question is about casting py (it's declared as float) to an integer, shifting it back and forth, subtracting something, and then assigning the result to ry (also a float). This line of code is entered at time 7:24, where he also explains that he wants to
round the y position to the nearest 64th value
(I'm unsure if that means the nearest multiple of 64 or the nearest (1/64), but I know that the 6 in the source is derived from the number 64, being 2⁶)
For one thing, I think that it would be valid for the compiler to load (say) a 32-bit float into a machine register, and then shift that value down by six spaces, and then shift it back up by six spaces (these two operations could interfere with the mantissa, or the exponent, or maybe something else, or these two operations could be deleted by a peephole optimisation step.)
Also I think it would be valid for the compiler to make demons fly out of your nose when this statement is executed.
So my question is, is (((int)py>>6)<<6) defined in C when py is float?
is (((int)py>>6)<<6) defined in C when py is float?
It is certainly undefined behavior (UB) for many float. The cast to an int is UB for float with a whole number value outside the [INT_MIN ... INT_MAX] range.
So code is UB for about 38% of all typical float - the large valued ones, NaNs and infinities.
For typical float, a cast to int128_t is defined for nearly all float.
To get to OP's goal, code could use the below, which I believe to be well defined for all float.
If anything, use the below to assess the correctness of one's crafted code.
// round the y position to the nearest 64th value
float round_to_64th(float x) {
if (isfinite(x)) {
float ipart;
// The modf functions break the argument value into integral and fractional parts
float frac = modff(x, &ipart);
x = ipart + roundf(frac*64)/64;
}
return x;
}
"I'm unsure if that means the nearest multiple of 64 or the nearest (1/64)"
On review, OP's code is attempting to truncate down to a multiple of 64 (2⁶).
It is still UB for many float.
That code doesn't shift a float because the bitshift operators aren't defined for floating-point types. If you try it you will get a compiler error.
Notice that the code is (int)py >> 6, the float is cast to an int before the shift operation. The integer value is what is being shifted.
If your question is "what will happen if you shift a float?", the answer is it won't compile. Example on Compiler Explorer.
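Because the float is converted to int before any shifting happens, the conversion is the step that needs guarding. The sketch below is one conservative way to do that; the function names are hypothetical, and it assumes a two's-complement int so that -(float)INT_MIN is an exact power of two. It also replaces the shifts with a remainder, since the result of right-shifting a negative int is implementation-defined.

```c
#include <limits.h>
#include <stdbool.h>

/* Convert x to int only when the truncated value is certainly representable.
   Assumption: two's-complement int, so (float)INT_MIN is exact. */
bool float_to_int_checked(float x, int *out)
{
    const float lower = (float)INT_MIN;    /* exactly -2^(N-1) */
    const float upper = -(float)INT_MIN;   /* exactly  2^(N-1), one past INT_MAX */

    if (!(x >= lower && x < upper))        /* also rejects NaN and infinities */
        return false;
    *out = (int)x;                         /* truncates toward zero */
    return true;
}

/* Truncate y to an int, then round that int down to a multiple of 64. */
bool snap_down_to_64(float y, int *out)
{
    int i;
    if (!float_to_int_checked(y, &i))
        return false;

    int r = i % 64;                        /* remainder takes the sign of i (C99 and later) */
    if (r < 0)
        r += 64;
    *out = i - r;
    return true;
}
```

With this, snap_down_to_64(100.9f, &v) stores 64 in v, while an out-of-range input such as 3e9f is reported as a failure instead of invoking undefined behavior.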
One way to emulate the shift operators for floating-point values is to double or halve the value repeatedly:
Left shift:
ShiftFloat(py,6,1);
Right shift:
ShiftFloat(py,6,0);
float ShiftFloat(float x, int count, int ismultiplication)
{
    float value = x;
    // Doubling emulates a left shift, halving emulates a right shift
    const float factor = ismultiplication ? 2.0f : 0.5f;
    for (int i = 0; i < count; ++i)
    {
        value *= factor;
    }
    return value;
}

Dividing with/without using floats in C [closed]

Below is the main function I wrote in C (for a PIC18F8722 microcontroller), which attempts to drive two multiplexed 7-segment displays at a frequency set by the unsigned int function get_ADC_value(). The displays also show the current multiplexing frequency. This frequency is constrained by #define to lie between LAB_Fmin and LAB_Fmax and must scale as get_ADC_value() moves between 0 and 255.
This code does not work, however; I think there is an implicit conversion from int to float at freq =.
The challenge is to fix this error with floats and to find an alternative using only integer types (int, char...).
while (1) {
unsigned int x, y, z;
float freq, delay;
x = get_ADC_value();
y = x & 0b00001111;
z = (x & 0b11110000) >> 4 ;
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin))/ 255)*x ;
delay = 1/(freq*1000); // convert hZ to ms delay accurately
LATF = int_to_SSD(y);
LATH = 0b11111110; //enable 7seg U1
for (unsigned int i = 0; i<(delay) ; i++){
Delay10TCYx(250); //1ms delay
}
LATF = int_to_SSD(z);
LATH = 0b11111101; //enable 7seg U2
for (unsigned int j = 0; j<(delay) ; j++){
Delay10TCYx(250); //1ms delay
}
}
C is defined to divide ints using integer division, and only when there is a float does it "promote" other ints to floats first. Note that this even happens if it will be assigned to a float - if the right-hand side is all ints, then the division will all be integer, and only for the final assignment will C convert the int result to float.
So, with your line:
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin))/ 255)*x ;
it all depends on what LAB_Fmax and LAB_Fmin are. It doesn't matter what freq or x are, because the "damage" will already have been done due to the parentheses forcing the division to be first.
If those LAB_F variables are ints, the easiest way to use floating point division is to simply tell C that you want that by making the constant 255 a floating point number rather than an integer, by using a decimal point: 255. (or 255.0 to be less subtle).
If you want to use integer arithmetic only, then the usual suggestion is to do all of your multiplications before any divisions. Of course, this runs the risk of overflowing the intermediate result - to help that, you can use the type long. Define your LAB_F or x variables as long, and do the division last:
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin)) * x / 255);
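The difference between dividing first and dividing last can be seen in a small integer-only sketch. The function names and the sample limits of 10 and 100 are hypothetical, since the question does not show the values of LAB_Fmin and LAB_Fmax; the sketch also assumes fmax is not smaller than fmin.

```c
#include <stdint.h>

/* Divide first: (fmax - fmin) / 255 truncates, and is 0 whenever the span is below 255. */
uint16_t scale_divide_first(uint16_t fmin, uint16_t fmax, uint8_t adc)
{
    return (uint16_t)(fmin + ((fmax - fmin) / 255) * adc);
}

/* Multiply first in a 32-bit intermediate so the product cannot overflow, divide last. */
uint16_t scale_multiply_first(uint16_t fmin, uint16_t fmax, uint8_t adc)
{
    uint32_t span = (uint32_t)(fmax - fmin);
    return (uint16_t)(fmin + (span * adc) / 255u);
}
```

For limits of 10 and 100 and an ADC reading of 128, the divide-first version returns 10 (the whole span is lost), while the multiply-first version returns 55. The 32-bit intermediate is the price of avoiding overflow; on a small 8-bit MCU it may be worth narrowing if the span is known to be small.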
Code review:
unsigned int x, y, z; Avoid using raw integer types on embedded systems. Exact-width types from stdint.h should always be used, so you know exactly what size you use. If you don't have access to stdint.h then typedef those types yourself.
float freq, delay; Floating point numbers should generally be avoided on most embedded systems, particularly on 8-bit MCUs with no FPU. They will be implemented in software, which is incredibly slow and memory-consuming. There seems to be no reason for you to use floats in this program; you should be able to write this algorithm with uint16_t or smaller, unless you have extreme accuracy requirements.
x = get_ADC_value(); Since you only seem interested in 8 bits of the ADC read, why not use an 8 bit type?
Please note that binary number literals are not standard C prior to C23.
((LAB_Fmax) - (LAB_Fmin))/ 255 This looks fishy. First of all, are these integers or floats? What's their size? The answer to your question depends on that. By swapping the literal to 255.0f you can force a conversion to float. But are you sure the division should be by 255? And not 256?
i<(delay). You should always avoid using floating point expressions inside loop conditions, since they make the loop needlessly slow and can potentially lead to floating point inaccuracy bugs. Also, the parentheses serve no purpose.
Overall, your program suffers from "sloppy typing", meaning that the programmer has not given any thought to which types are used in each expression. Note that literals have types too. Implicit conversions might cause a lot of these expressions to be calculated on types that are too large, which is very bad news for the PIC. I'd recommend reading up on "balancing", aka the usual arithmetic conversions.
This "sloppy typing" will cause your program to get very bloated and slow, for nothing gained. You must keep in mind that PIC is perhaps the least code-efficient MCU still manufactured. When writing C code for any 8-bit MCU, you should avoid types larger than 8 bit. In particular, you should avoid 32 bit integers and floating point numbers like the plague.
Your program re-scales all data to types that ease the thinking for the programmer. This is a common design mistake - instead your program should use types that are easy to use for the processor. For example, instead of milliseconds, you could use timer ticks as the unit.
You are correct about integer division.
Change to
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin)) / 255.0)*x;   // 255.0 instead of 255
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin))/ 255)*x ;
This is indeed integer division (assuming LAB_Fmax and LAB_Fmin are integers): the quotient is truncated before anything is converted to float.
That is because 255 is an integer literal.
Change it to 255.0, a double literal, and the division is carried out in floating point.
If you want to be more explicit about the type, you can use a float literal such as 255.0f, or an explicit cast such as (float)255.
Your code could look like this then:
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin))/ 255.0)*x ;
Or this:
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin))/ (float)255)*x ;
Arithmetic on integer operands produces an integer result,
so you need to either write one of the literals as a double/float
freq = LAB_Fmin + (((LAB_Fmax) - (LAB_Fmin))/ 255.0)*x ;
or cast one of the operands to (float).
As many others have stated, the first option is the most common.

Different Answers by removing a printf statement

This is the link to the question on UVa online judge.
https://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=29&page=show_problem&problem=1078
My C code is
#include <stdio.h>
double avg(double * arr,int students)
{
int i;
double average=0;
for(i=0;i<students;i++){
average=average+(*(arr+i));
}
average=average/students;
int temp=average*100;
average=temp/100.0;
return average;
}
double mon(double * arr,int students,double average)
{
int i;
double count=0;
for(i=0;i<students;i++){
if(*(arr+i)<average){
double temp=average-*(arr+i);
int a=temp*100;
temp=a/100.0;
count=count+temp;
}
}
return count;
}
int main(void)
{
// your code goes here
int students;
scanf("%d",&students);
while(students!=0){
double arr[students];
int i;
for(i=0;i<students;i++){
scanf("%lf",&arr[i]);
}
double average=avg(arr,students);
//printf("%lf\n",average);
double money=mon(arr,students,average);
printf("$%.2lf\n",money);
scanf("%d",&students);
}
return 0;
}
One of the sample inputs and its expected output:
Input
3
0.01
0.03
0.03
0
Output
$0.01
My output is $0.00.
However if I uncomment the line printf("%lf",average);
The Output is as follows
0.02 //This is the average
$0.01
I am running the code on ideone.com
Please explain why this is happening.
I believe I've found the culprit and a reasonable explanation.
On x86 processors, the FPU operates internally with extended precision, which is an 80-bit format. All of the floating point instructions operate with this precision. If a double is actually required, the compiler will generate code to convert the extended precision value down to a double precision value. Crucially, the commented-out printf forces such a conversion because the FPU registers must be saved and restored across that function call, and they will be saved as doubles (note that avg and mon are both inlined so no save/restore happens there).
In fact, instead of printf we can use the line static double dummy = average; to force the double conversion to occur, which also causes the bug to disappear: http://ideone.com/a1wadn
Your value of average is close to but not exactly 0.02 because of floating-point inaccuracies. When I do all the calculations explicitly with long double, and I print out the value of average, it is the following:
long double: 0.01999999999999999999959342418532
double: 0.02000000000000000041633363423443
Now you can see the problem. When you add the printf, average is forced to a double which pushes it above 0.02. But, without the printf, average will be a little less than 0.02 in its native extended precision format.
When you do int a=temp*100; the bug appears. Without the conversion, this makes a = 1. With the conversion, this makes a = 2.
To fix this, simply use int a=round(temp*100); - all your weird errors should vanish.
Of note, this bug is extremely sensitive to changes in the code. Anything that causes the registers to be saved (such as a printf pretty much anywhere) will actually cause the bug to vanish. Hence, this is an extremely good example of a heisenbug: a bug that vanishes when you try to investigate it.
@nneonneo answered most of the issue well: slightly different compilations produce slightly different floating-point code, which yields nearly the same double result, except that one result is just below 2.0 and the other is at or just above 2.0.
I would like to add a note on the importance of not using conversion to int for floating-point rounding.
Code like double temp; ... int a=temp*100; accentuates this difference, giving a the value 1 or 2, because conversion to int is effectively "truncate toward zero": it drops the fraction.
Rather than round to near 0.01 with code like:
double temp;
...
int a = temp*100; // Problem is here
temp = a/100.0;
Do not use int at all. Use
double temp;
...
temp = round(temp*100.0)/100.0;
Not only does this provide more consistent answers (as temp is unlikely to have values near a half-cent), it also allows temp values outside the int range. With a 32-bit int, temp = 1e13; int a = temp*100; results in undefined behavior.
Do not use conversion to int to round floating-point numbers: use round()
roundf(), roundl(), floor(), ceil(), etc. may also be useful. @jeff
Note that dividing a double by an integer is not integer division: the int operand is converted to double first, so average/students is already a floating-point division.
An explicit cast of students to double makes that intent visible, but it does not change the result:
average=average/(double)students;
The conversions of temp*100 to int are the places where a value is actually truncated.

Precision loss / rounding difference when directly assigning double result to an int

Is there a reason why converting from a double to an int performs as expected in this case:
double value = 45.33;
double multResult = (double) value*100.0; // assign to double
int convert = multResult; // assign to int
printf("convert = %d\n", convert); // prints 4533 as expected
But not in this case:
double value = 45.33;
int multResultInt = (double) value*100.0; // assign directly to int
printf("multResultInt = %d\n", multResultInt); // prints 4532??
It seems to me there should be no difference. In the second case the result is still first stored as a double before being converted to an int unless I am not understanding some difference between casts and hard assignments.
There is indeed no difference between the two in principle, but compilers are allowed some freedom when it comes to floating-point computations. For example, compilers are free to use higher precision for intermediate results, but higher precision still means different precision, so the results may vary.
Some compilers provide switches to always drop extra precision and convert all intermediate results to the prescribed floating point numbers (say 64bit double-precision numbers). This will make the code slower, however.
Specifically, the number 45.33 cannot be represented exactly as a binary floating-point value (its binary expansion is periodic, so it would require an infinite number of bits). When you multiply this value by 100, you may not get an integer but something very close to one (just below or just above).
Conversion or casting to int is performed by truncation, so something very close to 4533 but below it becomes 4532, while anything at or just above it becomes 4533, no matter how tiny the difference.
To avoid having problems be sure to account for numeric accuracy problems. If you are doing a computation that depends on exact values of floating point numbers then you're using the wrong tool.
@6502 has given you the theory; here's how to look at things experimentally:
double v = 45.33;
int x = v * 100.0;
printf("x=%d v=%.20lf v100=%.20lf\n", x, v, v * 100.0 );
On my machine, this prints
x=4533 v=45.32999999999999829470 v100=4533.00000000000000000000
The value 45.33 does not have an exact representation when encoded as a 64-bit IEEE-754 floating point number. The actual value of v is slightly lower than the intended value due to the limited precision of the encoding.
So why does multiplying by 100.0 fix the problem on some machines? One possibility is that the multiplication is done with 80-bits of precision and then rounded to fit into a 64-bit result. The 80-bit number 4532.999... will round to 4533 when converted to 64-bits.
On your machine, the multiplication is evidently done with 64-bits of precision, and I would expect that v100 will print as 4532.999....
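Whether intermediate results are kept in a wider format, as both answers suggest, can be checked with the FLT_EVAL_METHOD macro that C99 defines in float.h. The sketch below only reports that setting; the function names are hypothetical, and the descriptions paraphrase the meanings the standard assigns to the values 0, 1 and 2.

```c
#include <float.h>
#include <stdio.h>

/* Describe a FLT_EVAL_METHOD value as defined by C99 <float.h>. */
const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "operations are evaluated in the precision of their own type";
    case 1:  return "float and double operations are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}

/* Report how this compiler evaluates floating-point expressions. */
void report_eval_method(void)
{
    int method = (int)FLT_EVAL_METHOD;
    printf("FLT_EVAL_METHOD = %d: %s\n", method, eval_method_name(method));
}
```

A value of 2 is what one would expect from a build that uses the 80-bit x87 registers for intermediates, while 0 is typical where double arithmetic is done directly in 64 bits; in either case, rounding explicitly before converting to int avoids depending on the setting.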

Why does GCC give an unexpected result when adding float values?

I'm using GCC to compile a program which adds floats, longs, ints and chars. When it runs, the result is bad. The following program unexpectedly prints the value of 34032.101562.
Recompiling with a Microsoft compiler gives the right result.
#include <stdio.h>
int main (void) {
const char val_c = 10;
const int val_i = 20;
const long val_l = 34000;
const float val_f = 2.1;
float result;
result = val_c + val_i + val_l + val_f;
printf("%f\n", result);
return 0;
}
What do you think the "right result" is? I'm guessing that you believe it is 34032.1. It isn't.
2.1 is not representable as a float, so val_f instead is initialized with the closest representable float value. In binary, 2.1 is:
10.000110011001100110011001100110011001100110011001...
a float has 24 binary digits, so the value of val_f in binary is:
10.0001100110011001100110
The expression result = val_c + val_i + val_l + val_f computes 34030 + val_f, which is evaluated in single precision and causes another rounding to occur.
  1000010011101110.0
+               10.0001100110011001100110
-----------------------------------------
  1000010011110000.0001100110011001100110
rounds to 24 digits:
-----------------------------------------
  1000010011110000.00011010
In decimal, this result is exactly 34032.1015625. Because the %f format prints 6 digits after the decimal point (unless specified otherwise), this is rounded again, and printf prints 34032.101562.
Now, why do you not get this result when you compile with MSVC? The C and C++ standards allow floating-point calculations to be carried out in a wider type if the compiler chooses to do so. MSVC does this with your calculation, which means that the result of 34030 + val_f is not rounded before being passed to printf. In that case, the exact floating-point value being printed is 34032.099999999991268850862979888916015625, which is rounded to 34032.1 by printf.
Why don't all compilers do what MSVC does? A few reasons. First, it's slower on some processors. Second, and more importantly, although it can give more accurate answers, the programmer cannot depend on that -- seemingly unrelated code changes can cause the answer to change in the presence of this behavior. Because of this, carrying extra precision often causes more problems than it solves.
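The single-precision rounding described in this answer can be reproduced with a short sketch. The function names are hypothetical; the volatile store is one way to make sure the sum is actually rounded to float, even on a compiler that evaluates the addition in a wider type.

```c
#include <stdio.h>

/* Add two floats and force the sum to be rounded to single precision. */
float add_as_float(float a, float b)
{
    volatile float sum = a + b;   /* the store discards any extra precision */
    return sum;
}

/* Print the stored value of 2.1f and the rounded sum with enough digits to see both roundings. */
void show_rounding(void)
{
    const float val_f = 2.1f;     /* closest float to 2.1, slightly below it */

    printf("val_f = %.21f\n", (double)val_f);
    printf("sum   = %.7f\n", (double)add_as_float(34030.0f, val_f));
}
```

The value stored for 2.1f is exactly 2.099999904632568359375, and the rounded sum is exactly 34032.1015625, matching the binary working shown in the answer.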
Google David Goldberg's paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
The float format has only about 6-7 digits of precision. Use %7.1f or some other reasonable format and you will like your results better.
I don't see any problem here. 2.1 has no exact representation in IEEE floating-point format, and as such, it is converting the entire answer to a floating-point number with around 6-7 (correct) sig-figs. If you need more precision, use a double.

Resources