This question already has answers here:
Why can't decimal numbers be represented exactly in binary?
(22 answers)
Closed 8 years ago.
Why is it that when I run the C code
float x = 4.2;
int y = 0;
y = x*100;
printf("%i\n", y);
I get 419 back? Shouldn't it be 420?
This has me stumped.
To illustrate, look at the intermediate values:
#include <stdio.h>

int main(void)
{
    float x = 4.2;
    int y;
    printf("x = %f\n", x);
    printf("x * 100 = %f\n", x * 100);
    y = x * 100;   /* conversion to int discards the fraction */
    printf("y = %i\n", y);
    return 0;
}
x = 4.200000 // Original x
x * 100 = 419.999981 // Floating point multiplication precision
y = 419 // Assign to int truncates
Per @Lutzi's excellent suggestion, this is more clearly illustrated if we print the float values with more precision than they actually carry:
...
printf("x = %.20f\n", x);
printf("x * 100 = %.20f\n", x * 100);
...
And then you can see that the value assigned to x isn't perfectly precise to start with:
x = 4.19999980926513671875
x * 100 = 419.99998092651367187500
y = 419
A floating point number stores an approximation of the decimal value you write, not the exact value. Here the stored value is slightly below 4.2, so the product is slightly below 420, and converting it to an integer discards the fraction. You can see more information about the representation here.
This is an example representation of a single precision floating point number :
float isn't precise enough to store 4.2 exactly. If you print x with enough precision you'll see it come out as about 4.1999998. Multiplying by 100 yields about 419.99998, and the integer assignment truncates (discards the fraction). It should work if you make x a double.
4.2 is not in the finite number space of a float, so the system uses the closest possible approximation, which is slightly below 4.2. If you now multiply this by 100 (which is exactly representable as a float), you get 419.99something. Assigning this to the int y does not round but truncates, so printf() with %i prints 419.
Given the alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + ... = ln(2), is it possible to get a value of 0.69314718056 using only float values and only basic operations (+, -, *, /)? Are there any algorithms that can increase the precision of this calculation without going to unreasonably high values of n (the current reasonable limit is 1e10)?
What I currently have: this nets me 8 correct digits -> 0.6931471825
EDIT
The goal is to compute the most precise summation value using only float datatypes
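#include <math.h>
#include <stdio.h>

int main(void)
{
    float sum = 0;
    int n = 1e9;
    double ans = log(2);
    int i;
    float r = 0;
    /* Sum from the smallest term to the largest. */
    for (i = n; i > 0; i--) {
        r = i - (2*(i/2));   /* remainder of i / 2: 0 for even i */
        if(r == 0){
            sum -= 1.0000000 / i;
        }else{
            sum += 1.0000000 / i;
        }
    }
    printf("\n%.10f", sum);
    printf("\n%.10f", ans);
    return 0;
}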
On systems where a float is a single-precision IEEE floating point number, it has 24 bits of precision, which is roughly 7 (log10(2^24) ≈ 7.2) digits of decimal precision.
If you change
double ans = log(2);
to
float ans = log(2);
You'll see you already get the best answer possible.
0.6931471 82464599609375      From log(2), cast to float
0.6931471 82464599609375      From your algorithm
0.6931471 8055994530941723... Actual value
  \_____/
  7 digits
In fact, if you use %A instead of %f, you'll see you get the same answer to the bit.
0X1.62E43P-1 // From log(2), cast to float
0X1.62E43P-1 // From your algorithm
@ikegami already showed this answer in decimal and hex, but to make it even more clear, here are the numbers in binary.
ln(2) is actually:
0.1011000101110010000101111111011111010001110011111…
Rounded to 24 bits, that is:
0.101100010111001000011000
Converted back to decimal, that is:
0.693147182464599609375
...which is the number you got. You simply can't do any better than that, in the 24 bits of precision you've got available in a single-precision float.
This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 2 months ago.
I'm using the online compiler https://www.onlinegdb.com/ and in the following code when I multiply 2.1 with 100 the output becomes 209 instead of 210.
#include<stdio.h>
#include <stdint.h>
int main()
{
float x = 1.8;
x = x + 0.3;
int coefficient = 100;
printf("x: %2f\n", x);
uint16_t y = (uint16_t)(x * coefficient);
printf("y: %d\n", y);
return 0;
}
What am I doing wrong? And what should I do to obtain 210?
I tried all kinds of type casts, but it still doesn't work.
The following assumes the compiler uses IEEE-754 binary32 and binary64 for float and double, which is overwhelmingly common.
float x = 1.8;
Since 1.8 is a double constant, the compiler converts 1.8 to the nearest double value, 1.8000000000000000444089209850062616169452667236328125. Then, to assign it to the float x, it converts that to the nearest float value, 1.7999999523162841796875.
x = x + 0.3;
The compiler converts 0.3 to the nearest double value, 0.299999999999999988897769753748434595763683319091796875. Then it adds x and that value using double arithmetic, which produces 2.09999995231628400205181605997495353221893310546875.
Then, to assign that to x, it converts it to the nearest float value, 2.099999904632568359375.
uint16_t y = (uint16_t)(x * coefficient);
Since x is float and coefficient is int, the compiler converts the coefficient to float and performs the multiplication using float arithmetic. This produces 209.9999847412109375.
Then the conversion to uint16_t truncates the number, producing 209.
One way to get 210 instead is to use uint16_t y = lroundf(x * coefficient);. (lroundf is declared in <math.h>.) However, to determine what the right way is, you should explain what these numbers are and why you are doing this arithmetic with them.
Floating point numbers are not exact. When you add 1.8 + 0.3, the FPU may produce a result that differs slightly from the expected 2.1 (by a margin smaller than the float epsilon).
You can read more about floating-point representation on Wikipedia: https://en.wikipedia.org/wiki/Machine_epsilon
What happens in your case is:
(1.8 + 0.3) * 100 = 209.99999...
Then you truncate it to an integer, resulting in 209.
You might also find this question relevant: Why float.Epsilon and not zero?
A possible fix might be:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <math.h>   /* declares round() */

int main(void)
{
    float x = 1.8;
    x = x + 0.3;
    uint16_t coefficient = 100;
    printf("x: %2f\n", x);
    uint16_t y = round(x * coefficient);   /* round to nearest instead of truncating */
    printf("y: %" PRIu16 "\n", y);
    return 0;
}
I have a problem with floating point rounding. I want to calculate with floating point numbers and round the results to a given number N of decimals. In this example I want to round to 1 decimal place.
The calculation 37.1-28.75 results in the floating point value 8.349998 (instead of 8.35), so printf rounds it to 8.3 instead of 8.4 for 1 decimal place.
The actual mathematical result is 37.10-28.75=8.35000000, but due to floating point imprecision it becomes 8.349998, which is then rounded to 8.3 instead of 8.4 when using 1 decimal place.
Minimum reproducible example:
float a = 37.10;
float b = 28.75;
//a-b = 8.35 = 8.4
printf("%.1f\n", a - b); //outputs 8.3 instead of 8.4
Is it valid to add the following to the result?
float result = a - b;
if (result > 0.0f)
{
result += powf(10, -nr_of_decimals - 1) / 2;
}
else
{
result -= powf(10, -nr_of_decimals - 1) / 2;
}
EDIT: corrected that I want 1 decimal place rounded output, not 2 decimal places
EDIT2: negative results are needed as well (28.75-37.1 = -8.4)
On my system I do actually get 8.35. It's possible that you have to set the rounding direction to "nearest" first, try this (compile with e.g. gcc ... -lm):
#include <fenv.h>
#include <stdio.h>
int main()
{
float a = 37.10;
float b = 28.75;
float res = a - b;
fesetround(FE_TONEAREST);
printf("%.2f\n", res);
}
Binary floating point is, after all, binary, and if you do care about the correct decimal rounding this much, then your choices would be:
decimal floating point, or
fixed point.
I'd say the solution is to use fixed point, especially if you're on embedded, and forget about everything else.
With
int32_t a = 3710;
int32_t b = 2875;
the result of
a - b
will exactly be
835
every time; and then you just need to have a simple fixed point printing routine for the desired precision, and check the following digit after the last digit to see if it needs to be rounded up.
If you want to round to 2 decimals, you can add 0.005 to the result and then truncate it with floorf:
float f = 37.10f - 28.75f;
float r = floorf((f + 0.005f) * 100.f) / 100.f;
printf("%f\n", r);
The output is 8.350000
Why are you using floats instead of doubles?
Regarding your question:
Is it valid to add following to the result:
float result = a - b;
if (result > 0.0f)
{
result += powf(10, -nr_of_decimals - 1) / 2;
}
else
{
result -= powf(10, -nr_of_decimals - 1) / 2;
}
It doesn't seem so: on my computer I get 8.350498 instead of 8.350000.
After your edit:
Calculation 37.1-28.75 will result into floating point 8.349998, which will result printf rounding to 8.3 instead of 8.4.
Then
float r = roundf((f + (f < 0.f ? -0.05f : +0.05f)) * 10.f) / 10.f;
is what you are looking for.
I am trying to repeatedly multiply the fractional part of a double by 2, about 500 times. The number starts to lose precision as the loop goes on. Is there any trick to keep the repeated multiplication accurate?
double x = 0.3;
double binary = 2.0;
int i;
for (i = 0; i < 500; i++) {
    x = x * binary;
    printf("x equals to : %f\n", x);
    if (x >= 1.0)
        x = x - 1;   /* keep only the fractional part */
}
OK, after reading some of the things you posted, I am wondering how I could remove this unwanted error from my number to keep the multiplication stable. For instance, in my example the fractional parts should change like this: 0.3, 0.6, 0.2, 0.4, 0.8... Can we cut off the rest to keep these numbers?
With typical floating point (binary64), double x = 0.3; leaves x with a value more like 0.29999999999999998890..., so the code differs from the exact value from the beginning.
Scale x by 10 to stay with exact math, or use a decimal64 type.
#include <stdio.h>

int main(void) {
    double x = 3.0;   /* 0.3 scaled by 10, exactly representable */
    double binary = 2.0;
    printf("x equals to : %.20f\n", x);
    for (int i = 0; i < 500; i++) {
        x = x * binary;
        printf("x equals to : %.20f\n", x / 10);
        if (x >= 10.0)
            x = x - 10;
    }
    return 0;
}
In general, floating point math is not completely precise, as shown in the other answers and in many online resources. The problem is that certain numbers cannot be represented exactly in binary. 0.3 is such a number, but integers of moderate size can all be represented exactly. So you could change your program to this:
double x = 3.0;
double binary = 2.0;
int i;
for (i = 0; i < 500; i++) {
    x = x * binary;
    printf("x equals to : %f\n", x / 10.0);
    if (x >= 10.0)
        x = x - 10.0;
}
Although your program is doing some very unusual things, the main answer to your question is that that is how floating point numbers work. They are imprecise.
http://floating-point-gui.de/basic/
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Why can't I return a double from two ints being divided
This statement in C with gcc:
float result = 1 / 10;
Produces the result 0.
But if I define variables a and b with values 1 and 10 respectively and then do:
float result = a / b;
I get the expected answer of 0.1
What gives?
When the / operator is applied to two integers, it's an integer division. So, the result of 1 / 10 is 0.
When the / operator is applied to at least one floating-point operand, it's a floating-point division. The result will be 0.1 as you intend.
Example:
printf("%f\n", 1.0f / 10); /* output : 0.100000 (the 'f' means that 1.0 is a float, not a double)*/
printf("%d\n", 1 / 10); /* output : 0 */
Example with variables:
int a = 1, b = 10;
printf("%f\n", (float)a / b); /* output : 0.100000 */
That happens because 1 and 10 are integer constants, so the division is done using integer arithmetic.
If at least one of your variables a and b is a float, it will be done using floating-point arithmetic.
If you want to do it with number literals, use the notation to make at least one of them a float literal, for example:
float result = 1.0f / 10;
Or cast one of them to float, that would be a bit more elaborate:
float result = 1 / (float)10;
1 and 10 are both integers, so the division returns an integer. When you define a and b, you are presumably defining them as floats. If you use 1.0 and 10.0 it will return the correct result.
If you want a float result, just cast as follows.
float result = (float)a/b;