I have a method that looks like this:
float * mutate(float* organism){
int i;
float sign = 1;
static float newOrg[INPUTS] = {0};
for (i = 0;i<INPUTS;i++){
if (rand() % 2 == 0) {
sign = 1;
} else {
sign = -1;
}
float temp = (organism[i] + sign);
printf("bf: %f af: %f diff: %f sign: %f sign2: %f temp: %f\n\n",
organism[i], (organism[i] + sign), (organism[i] + sign)-organism[i],
sign, sign+sign, temp);
newOrg[i] = organism[i] + sign;
}
return newOrg;
}
When sign is not 0, the first two "%f"s are the same and the third is 0. Putting the sum in a variable didn't help either. This is baffling me! I can post the full code if needed.
Output:
bf: 117810016.000000 af: 117810016.000000 diff: 0.000000 sign: 1.000000 sign2: 2.000000 temp: 117810016.000000
Finite precision of float.
A typical float can only represent about 2^32 different numbers. 117,810,016.0 and 1.0 are two of them; 117,810,017.0 is not. So the C sum 117810016.0 + 1.0 results in the "best" representable answer, 117810016.0.
Using a higher-precision type like double will often extend the range of exact +1 math, but even that will not be exact with large enough values (typically above about 9.0*10^15, or 2^53).
If the code is to keep using float, I suggest limiting organism[i] to the inclusive range of ±8,388,608.0 (2^23).
Perhaps the code can simply use an integer type such as long long for this task.
Given the alternating harmonic series 1 - 1/2 + 1/3 - 1/4... = ln(2), is it possible to get a value of 0.69314718056 using only float values and only basic operations (+,-,*,/)? Are there any algorithms that can increase the precision of this calculation without going to unreasonably high values of n (the current reasonable limit is 1e10)?
What I currently have: this nets me 8 correct digits -> 0.6931471825
EDIT
The goal is to compute the most precise summation value using only float datatypes
#include <math.h>
#include <stdio.h>
int main()
{
float sum = 0;
int n = 1e9;
double ans = log(2);
int i;
float r = 0;
for (i = n; i > 0; i--) {
r = i - (2*(i/2));
if(r == 0){
sum -= 1.0000000 / i;
}else{
sum += 1.0000000 / i;
}
}
printf("\n%.10f", sum);
printf("\n%.10f", ans);
return 0;
}
On systems where a float is a single-precision IEEE floating point number, it has 24 bits of precision, which is roughly 7 decimal digits of precision (log10(2^24) ≈ 7.2).
If you change
double ans = log(2);
to
float ans = log(2);
You'll see you already get the best answer possible.
0.6931471 82464599609375      From log(2), cast to float
0.6931471 82464599609375      From your algorithm
0.6931471 8055994530941723... Actual value
  \_____/
  7 digits
In fact, if you use %A instead of %f, you'll see you get the same answer to the bit.
0X1.62E43P-1 // From log(2), cast to float
0X1.62E43P-1 // From your algorithm
@ikegami already showed this answer in decimal and hex, but to make it even more clear, here are the numbers in binary.
ln(2) is actually:
0.1011000101110010000101111111011111010001110011111…
Rounded to 24 bits, that is:
0.101100010111001000011000
Converted back to decimal, that is:
0.693147182464599609375
...which is the number you got. You simply can't do any better than that, in the 24 bits of precision you've got available in a single-precision float.
I have a problem with floating-point rounding. I want to calculate with floating-point numbers and round the results to a given number N of decimals. In this example I want to round to 1 decimal place.
The calculation 37.1-28.75 results in the float 8.349998 (instead of 8.35), which printf rounds to 8.3 instead of 8.4 for 1 decimal place.
The mathematically exact result is 37.10-28.75=8.35000000, but due to floating-point imprecision it becomes 8.349998, which is then rounded to 8.3 instead of 8.4 when rounding to 1 decimal place.
Minimum reproducible example:
float a = 37.10;
float b = 28.75;
//a-b = 8.35 = 8.4
printf("%.1f\n", a - b); //outputs 8.3 instead of 8.4
Is it valid to add following to the result:
float result = a - b;
if (result > 0.0f)
{
result += powf(10, -nr_of_decimals - 1) / 2;
}
else
{
result -= powf(10, -nr_of_decimals - 1) / 2;
}
EDIT: corrected that I want 1 decimal place rounded output, not 2 decimal places
EDIT2: negative results are needed as well (28.75-37.1 = -8.4)
On my system I do actually get 8.35. It's possible that you have to set the rounding direction to "nearest" first, try this (compile with e.g. gcc ... -lm):
#include <fenv.h>
#include <stdio.h>
int main()
{
float a = 37.10;
float b = 28.75;
float res = a - b;
fesetround(FE_TONEAREST);
printf("%.2f\n", res);
}
Binary floating point is, after all, binary, and if you do care about the correct decimal rounding this much, then your choices would be:
decimal floating point, or
fixed point.
I'd say the solution is to use fixed point, especially if you're on embedded, and forget about everything else.
With
int32_t a = 3710;
int32_t b = 2875;
the result of
a - b
will exactly be
835
every time; and then you just need to have a simple fixed point printing routine for the desired precision, and check the following digit after the last digit to see if it needs to be rounded up.
If you want to round to 2 decimals, you can add 0.005 to the result and then offset it with floorf:
float f = 37.10f - 28.75f;
float r = floorf((f + 0.005f) * 100.f) / 100.f;
printf("%f\n", r);
The output is 8.350000
Why are you using floats instead of doubles?
Regarding your question:
Is it valid to add following to the result:
float result = a - b;
if (result > 0.0f)
{
result += powf(10, -nr_of_decimals - 1) / 2;
}
else
{
result -= powf(10, -nr_of_decimals - 1) / 2;
}
It doesn't seem so, on my computer I get 8.350498 instead of 8.350000.
After your edit:
Calculation 37.1-28.75 will result into floating point 8.349998, which will result printf rounding to 8.3 instead of 8.4.
Then
float r = roundf((f + (f < 0.f ? -0.05f : +0.05f)) * 10.f) / 10.f;
is what you are looking for.
I'm new to the C language. I've tried integer, float and double division in C as I normally do in Java, but when I execute 5.0/3, instead of 1.6666666666666667 I get 1.666667 for both double division and float division.
I ran the program in Visual Studio as I always do, and got the message "First number is 1, second one is 1.666667 and the last one is 1.666667." after executing:
#include <stdio.h>
int main()
{
int firstNumber = 5 / 3;
float secondNumber = 5.0f / 3.0f;
double thirdNumber = 5.0 / 3.0;
printf("First number is %d, second one is %f and the last one is %lf.", firstNumber, secondNumber, thirdNumber);
return 0;
}
Why am I getting the same result for 'secondNumber' and for 'thirdNumber'?
Typical float can represent about 2^32 different values.
Typical double can represent about 2^64 different values.
In both types, 5/3, the exact quotient of the division, is not in that set. Instead a nearby value (some binary fraction) is used.
float secondNumber = 5.0f / 3.0f; // 1.66666662693023681640625
double thirdNumber = 5.0 / 3.0; // 1.6666666666666667406815349750104360282421112060546875
With "%f", 6 places past the decimal point are printed. The printed text is a rounded one, and here both values round to the same text:
1.666667
To see more digits, use "%.10f", "%.20f", etc. @xing
printf("%.10f\n", secondNumber);
printf("%.10f\n", thirdNumber);
Output
1.6666666269
1.6666666667
I have seen this code:
(int)(num < 0 ? (num - 0.5) : (num + 0.5))
(How to round floating point numbers to the nearest integer in C?)
for rounding but I need to use float and precision for three digits after the point.
Examples:
254.450 should be rounded up to 255.
254.432 should be rounded down to 254
254.448 should be rounded down to 254
and so on.
Notice: This is what I mean by "3 digits" the bold digits after the dot.
I believe it should be faster than roundf(), because I do many hundreds of thousands of roundings in my calculation. Do you have any tips on how to do that? I tried to find the source of roundf but found nothing.
Note: I need it for RGB2HSV conversion function so I think 3 digits should be enough. I use positive numbers.
"it should be faster then roundf()" is only verifiable with profiling various approaches.
To round to 0 places (round to nearest whole number), use roundf()
float f;
float f_rounded0 = roundf(f);
To round to 3 places using float, use round()
The round functions round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction.
#include <math.h>
float f;
float f_rounded3 = round(f * 1000.0)/1000.0;
Code purposely uses the intermediate type of double; otherwise code could use float throughout, with reduced range:
float f_rounded3 = roundf(f * 1000.0f)/1000.0f;
If the code has trouble rounding 254.450 to 255.0 using roundf() or various tests, it is likely because the value is not 254.450 but a float close to it, like 254.4499969, which rounds to 254. Typical floating point uses a binary format, and 254.450 is not exactly representable in it.
You can use a two-step conversion, float -> string -> float, where the first conversion keeps 3 digits after the point:
sprintf(tmpStr, "%.3f", num);
This works for me:
#include <stdio.h>
int main(int ac, char**av)
{
float val = 254.449f;
float val2 = 254.450f;
int res = (int)(val < 0 ? (val - 0.55f) : (val + 0.55f));
int res2 = (int)(val2 < 0 ? (val2 - 0.55f) : (val2 + 0.55f));
printf("%f %d %d\n", val, res, res2);
return 0;
}
output : 254.449005 254 255
To increase the precision, just append more 5s to 0.55f, like 0.555f, 0.5555f, etc.
I wanted something like this:
float num = 254.454300;
float precision=10;
float p = 10*precision;
num = (int)(num * p + 0.5) / p ;
But the result will be inaccurate (with error) - my x86 machine gives me this result: 254.449997
If you want to move the rounding threshold from b=0.5 to b=0.45, note that for positive values the rounded result is round_0(x,b)=(int)( x+(1-b) ), so b=0.45 ⟹ round_0(x)=(int)(x+0.55); negative values can be handled by treating the sign separately. But remember that 254.45 does not exist as a binary floating-point value: the nearest float is 254.449997 and the nearest double is 254.449999999999989, so you may prefer to use b=0.4495.
If you have a float round_0(float) that rounds to zero digits (it can be like the one in your question), you can build one-, two-, ..., n-digit rounding on top of it, for example with this C/C++ macro: #define round_n(x,n) (round_0((x)*1e##n)/1e##n).
round_1( x , b ) = round_0( 10*x ,b)/10
round_2( x , b ) = round_0( 100*x ,b)/100
round_3( x , b ) = round_0( 1000*x ,b)/1000
round_n( x , b , n ) = round_0( (10^n)*x ,b)/(10^n)
But casting to int and then back to float is slower than rounding with floating-point operations alone. Provided the compiler does not simplify the add/subtract pair away (some compilers have a setting that allows this), you can do a faster zero-digit rounding that stays in the floating-point type like this:
inline float round_0( float x , float b=0.5f ){
// 3<<22 is 1.5*2^23; adding and then subtracting it discards the fraction bits
return (( x+(0.5f-b) )+(3<<22))-(3<<22) ; // or (( x+(0.5f-b) )-(3<<22))+(3<<22) ;
}
inline double round_0( double x , double b=0.5 ){
// 3LL<<51 is 1.5*2^52; a plain int shift 3<<51 would overflow
return (( x+(0.5-b) )+(double)(3LL<<51))-(double)(3LL<<51) ;
}
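When b=0.5 it rounds to the nearest integer (halfway cases go to the even integer) as long as x plus the constant stays in the range where adjacent values are exactly 1 apart, that is |x| < 2^22 (float) or |x| < 2^51 (double). But if the compiler keeps intermediates in x87 FPU registers (ten-byte floating point), the constant has to be scaled to the 64-bit significand instead, 3.0*(1ull<<62), which works for |x| < 2^62, and using long double can be faster.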
I just encountered a behaviour I don't understand in a C program that I'm using.
I guess it's due to floating-point numbers, maybe an int to float conversion, but I would still like someone to explain to me whether this is normal behaviour, and why.
Here is my C program :
#include <stdio.h>
#include <float.h>
int main ()
{
printf("FLT_MIN : %f\n", FLT_MIN);
printf("FLT_MAX : %f\n", FLT_MAX);
float valueFloat = 0.000000;
int valueInt = 0;
if (valueInt < FLT_MIN) {
printf("1- integer %d < FLT_MIN %f\n", valueInt, FLT_MIN);
}
if (valueFloat < FLT_MIN) {
printf("2- float %f < FLT_MIN %f\n", valueFloat, FLT_MIN);
}
if (0 < 0.000000) {
printf("3- 0 < 0.000000\n");
} else if (0 == 0.000000) {
printf("4- 0 == 0.000000\n");
} else {
printf("5- 0 > 0.000000\n");
}
if (valueInt < valueFloat) {
printf("6- %d < %f\n", valueInt, valueFloat);
} else if (valueInt == valueFloat) {
printf("7- %d == %f\n", valueInt, valueFloat);
} else {
printf("8- %d > %f\n", valueInt, valueFloat);
}
return 0;
}
And here is my command to compile and run it :
gcc float.c -o float ; ./float
Here is the output :
FLT_MIN : 0.000000
FLT_MAX : 340282346638528859811704183484516925440.000000
1- integer 0 < FLT_MIN 0.000000
2- float 0.000000 < FLT_MIN 0.000000
4- 0 == 0.000000
7- 0 == 0.000000
A C developer I know considers it normal that line "1-" is displayed, because of the loss of precision in the comparison. Let's accept that.
But then why doesn't line "3-" appear, since it's the same comparison?
Why does line "2-" appear, since I'm comparing the same numbers (or at least I hope so)?
And why do lines "4-" and "7-" appear? That seems to be different behaviour from line "1-".
Thanks for your help.
Your confusion is probably over the line:
printf("FLT_MIN : %f\n", FLT_MIN);
change it to:
printf("FLT_MIN : %g\n", FLT_MIN);
And you will see that FLT_MIN is actually NOT zero, but a tiny bit larger than zero.
FLT_MIN is not 0, it's just above 0; you just need to show more places to see that. FLT_MIN is the smallest positive normalized floating-point number that the computer can represent. Since floating-point values are almost always approximations, printf and friends round when printing, unless you ask for more precision:
printf("FLT_MIN : %.64f\n", FLT_MIN);
3 does not actually appear in your output because 0 is not less than 0
4 is comparing 0 with 0, the computer has no problem representing both of those (0 is a special case for floats) so they compare equal
7 is the same case as 4 just with intermediate assignments
This is correct behaviour. Under IEEE 754, zero is exactly representable as a float. Therefore it can be 'equal' to integer zero (although 'equivalent' would be a better term). FLT_MIN is the smallest positive normalized number that can be represented as a float. Even though the standard %f format specifier of printf() shows FLT_MIN as 0.000000, it is not zero. A literal 0.00... is interpreted by the compiler as a floating-point zero (of type double, since it has no suffix), which is not equal to FLT_MIN, even though the default six-decimal-place %f format prints them the same.