Representing floating point numbers in c [duplicate] - c

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 5 years ago.
I have read that floating-point numbers are stored in the IEEE 754 representation, and that an approximate value is sometimes displayed when the number cannot be represented exactly.
I have written the following code, in which I extract the fractional part and then multiply it by 10 nine times inside a loop. At the end of the loop the value (variable g) is 142000000.000000.
I multiplied it by 10 again outside the loop and got 1419999999.999999.
I then stored the value that was calculated inside the for loop explicitly in another variable (k), multiplied it by 10, and got 1420000000.000000.
Can you please tell me why there is a difference, and how the value is stored correctly in the second instance (in variable k)?
#include<stdio.h>
#include<math.h>
int main()
{
double f=3.142,g,i;
int j;
g=modf(f,&i);
printf("Inside loop");
for(j=1;j<=9;j++)
{
g = g * 10.0;
printf("\n%lf",g);
}
printf("\nLoop ends");
g = g * 10.0;
printf("\nThe value of g is %lf",g);
double k = 142000000.000000;
k = k * 10.0;
printf("\nThe value of k is %lf",k);
}
Output
Inside loop
1.420000
14.200000
142.000000
1420.000000
14200.000000
142000.000000
1420000.000000
14200000.000000
142000000.000000
Loop ends
The value of g is 1419999999.999999
The value of k is 1420000000.000000

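The initial value of f is 3.142. This value can't be represented exactly as a double, so f holds the nearest representable value instead, and the fractional part and every product computed from it carry that small error. Inside the loop the error is already there, but %lf prints only six decimal places, so g is rounded to 142000000.000000 on output even though it is not exactly that value. After one more multiplication the error is large enough to show up in the sixth decimal place.
In contrast, k is initialized from the literal 142000000.000000, which can be represented exactly as a double. Multiplying this value by 10 still gives a value that can be represented exactly.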

Related

C: is there anyway i can get the modulo operator to work on non integer values? [duplicate]

This question already has answers here:
How to use % operator for float values in c
(6 answers)
Floating Point Modulo Operation
(4 answers)
Closed 2 years ago.
I need to reset the value of a variable called theta back to 0 every time its value reaches or exceeds 2 PI. I was thinking of something along the lines of:
int n = 10;
float inc = 2*PI/n;
for(int i=0;i<10;i++)
theta = (theta + inc) % 2*PI;
Of course it won't work, because % doesn't work on floating-point operands in C. Is there an equivalent or better way to achieve what I'm trying to do here? All replies are welcome. Thanks
Use the standard fmod function. See https://en.cppreference.com/w/c/numeric/math/fmod or 7.12.10.1 in the C17 standard:
The fmod functions return the value x − ny, for some integer n such that, if y is nonzero, the result has the same sign as x and magnitude less than the magnitude of y.
So theta = fmod(theta, 2*PI) should be what you want, if I understand your question correctly.
If it really must be done on float instead of double, you can use fmodf instead.
Since division is really just repeated subtraction, you can get the remainder by checking whether the value is at least 2*PI and, if so, subtracting 2*PI from it.
int n = 10;
float inc = 2*PI/n;
for(int i=0;i<10;i++) {
theta += inc;
if (theta >= 2*PI) theta -= 2*PI;
}
Note that because the increment is less than the 2*PI limit, the "over" check only has to be done once per step. This is likely cheaper than the operations involved in a call to fmod. If the increment could be larger than 2*PI, you would need a while loop instead of the if, or you could just use fmod.

Log calls return NaN in C [duplicate]

This question already has answers here:
Why does dividing two int not yield the right value when assigned to double?
(10 answers)
Closed 6 years ago.
I have an array of double:
double theoretical_distribution[] = {1/21, 2/21, 3/21, 4/21, 5/21, 6/21};
And I am trying to compute its entropy as:
double entropy = 0;
for (int i = 0; i < sizeof(theoretical_distribution)/sizeof(*theoretical_distribution); i++) {
entropy -= (theoretical_distribution[i] * (log10(theoretical_distribution[i])/log10(arity)));
}
However, I am getting NaN. I have checked the part
(theoretical_distribution[i] * (log10(theoretical_distribution[i])/log10(arity)))
and found that it returns NaN itself, so I assume it's the culprit. However, all it's supposed to be is a simple base conversion of the log. Am I missing some detail about the maths of it?
Why is it evaluating to NaN?
You are passing 0 to the log10 function.
This is because your array theoretical_distribution is being populated with constant values that result from integer division, and every one of them has a denominator larger than the numerator, so each quotient is 0. log10(0) is negative infinity, and multiplying 0 by an infinity gives NaN.
You probably intended floating-point division, so make at least one of the numerator or denominator a floating constant, for example 1.0/21.

adding numbers in a for loop [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
I tried to sum some numbers in a for loop but it didn't go as I expected
float sum = 0;
int i;
printf("0.1+0.1=%f\n", 0.1 + 0.1);
for (i = 0; i<1000000; i++)
{
sum = sum + 0.1;
}
printf("the sum need to be 100000 \n");
printf("the real sum is:\n %f\n", sum);
system("PAUSE");
this program prints:
0.1+0.1=0.200000
the sum need to be 100000
the real sum is:
100958.343750
Press any key to continue . . .
Can you please explain this strange result?
The international standard for floating-point numbers has no exact representation for some decimal numbers, 0.1 among them.
http://en.wikipedia.org/wiki/IEEE_754
This is due to the way they are stored in memory as a binary mantissa and exponent.
https://en.wikipedia.org/wiki/Floating_point
This is also the reason why you should never compare two floating-point numbers for exact equality, even if they look "the same".
I still remember how surprised I was the first time some simple code comparing two float numbers didn't work :) This topic alone could fill a whole discussion of its own. It is well worth reading about anyway:
http://floating-point-gui.de/errors/comparison/
Floating-point numbers are stored in memory as x*2^y, where x has a fixed number of binary digits and y is an integer. Most decimal numbers therefore cannot be represented exactly; what gets stored is a number that is "close enough".
When you do this addition many times, the error just becomes more visible.
You can use the double type for better accuracy, although the result will still not be exact.

Equality test and accuracy of machine [duplicate]

This question already has answers here:
Floating point inaccuracy examples
(7 answers)
C++ floating point precision [duplicate]
(5 answers)
Closed 8 years ago.
I found this code snippet on Page 174, A Book on C -Al Kelley, Ira Pohl .
int main()
{
int cnt=0; double sum=0.0,x;
for( x=0.0 ;x!= 9.9 ;x+=0.1)
{
sum=sum +x;
printf("cnt = %5d\n",cnt++);
}
return 0;
}
and it became an infinite loop, as the book said it would. It didn't mention the precise reason, except to say that it had to do with the accuracy of the machine.
I modified the code to check whether
x == 9.9
would ever become true, i.e. whether x was attaining 9.9, by adding the following lines
diff=x-9.9;
printf("cnt =10%d \a x =%10.10lf dif=%10.10lf \n",++cnt,x,diff);
and I got the following lines among the output
cnt =1098 x =9.7000000000 dif=-0.2000000000
cnt =1099 x =9.8000000000 dif=-0.1000000000
cnt =10100 x =9.9000000000 dif=-0.0000000000
cnt =10101 x =10.0000000000 dif=0.1000000000
cnt =10102 x =10.1000000000 dif=0.2000000000
If x is attaining the value 9.9 exactly, why is it still an infinite loop?
You are simply printing the number with too few digits to notice that it isn't exact. Try something like this:
#include <stdio.h>
int main()
{
double d = 9.9;
if(d == 9.9)
{
printf("Equal!");
}
else
{
printf("Not equal! %.20f", d);
}
}
Output on my machine:
Not equal! 9.90000000000000035527
The book is likely trying to teach you never to use the == or != operators to compare floating-point variables. For the same reason, never use floating-point variables as loop counters.
The problem is that most floating-point implementations are based on IEEE 754. See http://en.wikipedia.org/wiki/IEEE_floating_point
The consequence is that numbers are built in base 2 (binary formats).
The number 9.9 can never be built exactly in base 2.
The "Numerical Computation Guide" by David Goldberg gives an exact statement about it:
Several different representations of real numbers have been proposed,
but by far the most widely used is the floating-point representation.
Floating-point representations have a base b (which is always assumed to
be even) and a precision p. If b = 10 and p = 3, then the number 0.1 is
represented as 1.00 × 10^-1. If b = 2 and p = 24, then the decimal
number 0.1 cannot be represented exactly, but is approximately
1.10011001100110011001101 × 2^-4.
As a rule of thumb, treat two floating-point numbers that were computed in different ways as never being exactly equal (unless one is a copy of the other).
Computers work in binary, and floating-point numbers are stored in base 2. Just like base 10, base 2 has numbers that it cannot write with a finite number of digits. For example, try to write the fraction 10/3 in base 10: you'll end up with infinitely many 3s. In binary you cannot even write 0.1 (decimal) exactly; you get the recurring pattern 0.0001100110011... (binary).
This video will do better to explain http://www.youtube.com/watch?v=PZRI1IfStY0

Adding two doubles gives weird rounding result in C [duplicate]

This question already has answers here:
Why adding these two double does not give correct answer? [duplicate]
(2 answers)
Closed 8 years ago.
I'm a bit of a C newbie, but this problem is really confusing me.
I have a double variable with the value 436553940.0000000000 (it was cast from an int) and another double variable with the value 0.095832496.
When I add them, my result should be 436553940.095832496 (ending in 96), but I get 436553940.095832467 (ending in 67).
Why does this happen, and how can I prevent it from happening?
The number you expect is simply not representable as a double. The value you receive is instead the closest representable approximation, as this Python session (which uses the same doubles) shows:
In [9]: 436553940.095832496
Out[9]: 436553940.09583247
In [18]: 436553940.095832496+2e-8
Out[18]: 436553940.09583247
In [19]: 436553940.095832496+3e-8
Out[19]: 436553940.0958325
In [20]: 436553940.095832496-2e-8
Out[20]: 436553940.09583247
In [21]: 436553940.095832496-3e-8
Out[21]: 436553940.0958324
You've just run out of significand bits.
Doubles are not able to represent every number. We can write some C++ code (which implements doubles in the same way) to show this.
#include <cstdio>
#include <cmath>
int main() {
double x = 436553940;
double y = 0.095832496;
double sum = x + y;
printf("prev: %50.50lf\n", std::nextafter(sum, 0));
printf("sum: %50.50lf\n", sum);
printf("next: %50.50lf\n", std::nextafter(sum, 500000000));
}
This code computes the sum of the two numbers you are talking about and stores it as sum. It then prints the nearest representable double below that number and the nearest one above it.
Here's the output:
[11:43am][wlynch#watermelon /tmp] ./foo
prev: 436553940.09583240747451782226562500000000000000000000000000
sum: 436553940.09583246707916259765625000000000000000000000000000
next: 436553940.09583252668380737304687500000000000000000000000000
So the calculation cannot produce 436553940.095832496, because that number is not a representable double. The IEEE 754 standard (and your compiler) defines rules for how the result is rounded to the nearest representable double.

Resources