Double comparison in C [duplicate] - c

This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Closed 7 years ago.
An array checker (of type double) is obtained from a gsl_vector in the following way:
for (i = 0; i < M; i++)
{
    checker[i] = (double) gsl_vector_get(check, i);
    printf(" %f", checker[i]);
}
After this operation, the output of the program shows that checker holds the elements [ 3.000000 -3.000000 -11.000000 -5.000000 ].
I am facing a weird problem now.
for (i = 0; i < M; i++)
{
    printf("checker: %f\n", checker[i]);
    if (checker[0] == 3.00)
    {
        printf("Inside If: %f\n", checker[i]);
    }
}
The above code outputs
checker: 3.000000
checker: -3.000000
checker: -11.000000
checker: -5.000000
As seen, the body of the if statement inside the for loop is never executed. What could be the problem?
Edit: The problem goes away when I copy [ 3.000000 -3.000000 -11.000000 -5.000000 ] into the checker array directly instead of using gsl_vector_get(check, i). The check vector comes from the dgemv function, where a matrix and a vector are multiplied.
Thanks

A floating-point number may be represented in the following form:
[sign] [mantissa] × 2^[exponent]
so rounding (relative) errors arise when there is not enough space in memory to store a value exactly.
From wiki:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives 6 to 9 significant decimal digits of precision (if a
decimal string with at most 6 significant digits is converted to IEEE
754 single precision and then converted back to the same number of
significant digits, the final string should match the original;
and if an IEEE 754 single-precision value is converted to a decimal string
with at least 9 significant digits and then converted back to single precision,
the final number must match the original).
Larger (more bits) floating point representations allow for greater precision.
Floating point math is not exact. Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. A must read:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
The IEEE standard divides exceptions into 5 classes: overflow, underflow, division by zero, invalid operation and inexact. There is a separate status flag for each class of exception. The meaning of the first three exceptions is self-evident. Invalid operation covers the situations listed in TABLE D-3, and any comparison that involves a NaN.

Related

Why does C print float values after the decimal point different from the input value? [duplicate]

This question already has answers here:
Why IEEE754 single-precision float has only 7 digit precision?
(2 answers)
Closed 1 year ago.
Why does C print float values after the decimal point different from the input value?
Following is the code.
CODE:
#include <stdio.h>

int main(void)
{
    float num = 2118850.132000;
    printf("num:%f\n", num);
    return 0;
}
OUTPUT:
num:2118850.250000
This should have printed 2118850.132000, but instead the digits after the decimal point change to .250000. Why is this happening?
Also, what can one do to avoid it?
Please guide me.
Your computer uses binary floating point internally. Type float has 24 bits of precision, which translates to approximately 7 decimal digits of precision.
Your number, 2118850.132, has 10 decimal digits of precision. So right away we can see that it probably won't be possible to represent this number exactly as a float.
Furthermore, due to the properties of binary numbers, no decimal fraction that ends in 1, 2, 3, 4, 6, 7, 8, or 9 (that is, numbers like 0.1 or 0.2 or 0.132) can be exactly represented in binary. So those numbers are always going to experience some conversion or roundoff error.
When you enter the number 2118850.132 as a float, it is converted internally into the binary fraction 1000000101010011000010.01. That's equivalent to the decimal fraction 2118850.25. So that's why the .132 seems to get converted to 0.25.
As I mentioned, float has only 24 bits of precision. You'll notice that 1000000101010011000010.01 is exactly 24 bits long. So we can't, for example, get closer to your original number by using something like 1000000101010011000010.001, which would be equivalent to 2118850.125, which would be closer to your 2118850.132. No, the next lower 24-bit fraction is 1000000101010011000010.00 which is equivalent to 2118850.00, and the next higher one is 1000000101010011000010.10 which is equivalent to 2118850.50, and both of those are farther away from your 2118850.132. So 2118850.25 is as close as you can get with a float.
If you used type double you could get closer. Type double has 53 bits of precision, which translates to approximately 16 decimal digits. But you still have the problem that .132 ends in 2 and so can never be exactly represented in binary. As type double, your number would be represented internally as the binary number 1000000101010011000010.0010000111001010110000001000010 (note 53 bits), which is equivalent to 2118850.132000000216066837310791015625, which is much closer to your 2118850.132, but is still not exact. (Also notice that 2118850.132000000216066837310791015625 begins to diverge from your 2118850.1320000000 after 16 digits.)
So how do you avoid this? At one level, you can't. It's a fundamental limitation of finite-precision floating-point numbers that they cannot represent all real numbers with perfect accuracy. Also, the fact that computers typically use binary floating-point internally means that they can almost never represent "exact-looking" decimal fractions like .132 exactly.
There are two things you can do:
If you need more than about 7 digits worth of precision, definitely use type double, don't try to use type float.
If you believe your data is accurate to three places past the decimal, print it out using %.3f. If you take 2118850.132 as a double, and printf it using %.3f, you'll get 2118850.132, like you want. (But if you printed it with %.12f, you'd get the misleading 2118850.132000000216.)
This will work if you use double instead of float:
#include <stdio.h>

int main(void)
{
    double num = 2118850.132000;
    printf("num:%f\n", num);
    return 0;
}

Unexpected behavior for floating point number in C programming [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 3 years ago.
I'm getting confused by floating-point numbers in C programming.
#include <stdio.h>

int main()
{
    float a = 255.167715;
    printf("%f", a);
    return 0;
}  // it prints 255.167709
Why does it produce this value? To be honest, can you tell me how this actually works in C programming?
In binary, 255.167715 is approximately 11111111.001010101110111101011110110010000000110001110…2. In your C implementation, most likely, the source code 255.167715 is converted to 11111111.0010101011101111010111101100100000001100011102, which is 255.16771499999998695784597657620906829833984375, because 255.167715 is a double constant, and that is the closest value representable in your implementation’s double type to the decimal number 255.167715, because the double type has only 53-bit significands. (A significand is the fraction portion of a floating-point number. There is also a sign and an exponent portion.)
Then, for float a = 255.167715;, this double value is converted to float. Since the float type has only 24-bit significands, the result is 11111111.00101010111011112, which is 255.1677093505859375 in decimal.
When you print this with the default formatting of %f, six digits after the decimal place are used, so it prints “255.167709”.
The short answer is that there is no such single-precision floating-point number as 255.167715. This is true for any computer using IEEE 754 floating-point formats; the situation is not unique to C. As a single-precision floating-point number, the closest value is 255.167709.
Single-precision floating point gives you the equivalent of 7 or so decimal digits of precision. And as you can see, the input you gave and the output you got agreed to seven places, namely 255.1677. Any digits past that are essentially meaningless.
For much more about the sometimes-surprising aspects of floating-point math, see "Is floating point math broken?".

Why floating point comparisons gives different outputs on different compiler? [duplicate]

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 7 years ago.
I was reading this. It contains following C program.
#include <stdio.h>

int main()
{
    float x = 0.1;
    if (x == 0.1)
        printf("IF");
    else if (x == 0.1f)
        printf("ELSE IF");
    else
        printf("ELSE");
}
The article says that
The output of the above program is "ELSE IF", which means the expression
"x == 0.1" returns false and the expression "x == 0.1f" returns true.
But I tried it on different compilers and am getting different outputs.
Here are the outputs on various IDEs:
1) Orwell Dev C++: ELSE
2) Code Blocks 13.12: ELSE IF, but it gives the following warning during compilation:
warning: comparing floating point with == or != is unsafe.
Why is this comparison unsafe?
3) Ideone.com: ELSE IF (see run: http://ideone.com/VOE3E0)
4) TDM GCC 32 bit: ELSE IF
5) MSVS 2010: ELSE IF, but compiles with a warning:
Warning 1 warning C4305: 'initializing' : truncation from 'double' to 'float'
What exactly is happening here? What's wrong with the program? Is this implementation-defined behavior?
Please help me.
A floating-point number may be represented in the following form:
[sign] [mantissa] × 2^[exponent]
so rounding (relative) errors arise when there is not enough space in memory to store a value exactly.
From wiki:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives 6 to 9 significant decimal digits of precision (if a
decimal string with at most 6 significant digits is converted to IEEE
754 single precision and then converted back to the same number of
significant digits, the final string should match the original;
and if an IEEE 754 single-precision value is converted to a decimal string
with at least 9 significant digits and then converted back to single precision,
the final number must match the original).
Larger (more bits) floating point representations allow for greater precision.
Floating point math is not exact. Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. A must read:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
The IEEE standard divides exceptions into 5 classes: overflow, underflow, division by zero, invalid operation and inexact. There is a separate status flag for each class of exception. The meaning of the first three exceptions is self-evident. Invalid operation covers the situations listed in TABLE D-3, and any comparison that involves a NaN.

float strange imprecision error in c [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
A strange thing happened to me today: when I compile and run this code, the output isn't what I expected. The code simply adds a floating-point value to each element of an array of float and then prints it.
The simple code:
#include <stdio.h>

int main()
{
    float r[10];
    int z;
    int i = 34;
    for (z = 0; z < 10; z++) {
        i = z * z * z;
        r[z] = i;
        r[z] = r[z] + 0.634;
        printf("%f\n", r[z]);
    }
}
the output:
0.634000
1.634000
8.634000
27.634001
64.634003
125.634003
216.634003
343.634003
512.633972
729.633972
Note that from 27 onward, digits appear after the .634 that should not be there. Does anyone know why this happens? Is it caused by floating-point approximation?
P.S. I have a 64-bit Debian Linux system.
Thanks all
A number may be represented in the following form:
[sign] [mantissa] × 2^[exponent]
so rounding (relative) errors arise when there is not enough space in memory to store a value exactly.
From wiki:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives 6 to 9 significant decimal digits of precision (if a
decimal string with at most 6 significant digits is converted to IEEE
754 single precision and then converted back to the same number of
significant digits, the final string should match the original;
and if an IEEE 754 single-precision value is converted to a decimal string
with at least 9 significant digits and then converted back to single precision,
the final number must match the original).
Edit (Edward's comment): Larger (more bits) floating point representations allow for greater precision.
Yes, this is a floating-point approximation error, also known as round-off error. Floating-point representation uses quantization to cover a large range of numbers: only discrete steps can be represented, and every in-between number is rounded to the nearest step. This causes an error whenever the desired number is not exactly one of those steps.
In addition to the other useful answers, it can be illustrative to print more digits than the default:
#include <stdio.h>

int main()
{
    float r[10];
    int z;
    int i = 34;
    for (z = 0; z < 10; z++) {
        i = z * z * z;
        r[z] = i;
        r[z] = r[z] + 0.634;
        printf("%.30f\n", r[z]);
    }
}
gives
0.634000003337860107421875000000
1.633999943733215332031250000000
8.633999824523925781250000000000
27.634000778198242187500000000000
64.634002685546875000000000000000
125.634002685546875000000000000000
216.634002685546875000000000000000
343.634002685546875000000000000000
512.633972167968750000000000000000
729.633972167968750000000000000000
In particular, note that 0.634 isn't actually "0.634", but instead is the closest number representable by a float.
"float" has only about six digit precision, so it isn't unexpected that you get errors that large.
If you used "double", you would have about 15 digits precision. You would have an error, but you would get for example 125.634000000000003 and not 125.634003.
So you will always get rounding errors and your results will not be quite what you expect, but by using double the effect will be minimal. Warning: If you do things like adding 125 + 0.634 and then subtract 125, the result will (most likely) not be 0.634. No matter whether you use float or double. But with double, the result will be very, very close to 0.634.
In principle, given the choice of float and double, you should never use float, unless you have a very, very good reason.

Number of significant digits for a floating point type

The description for type float in C mentions that the number of significant digits is 6. However,
float f = 12345.6;
and then printing it using printf() does not print 12345.6; it prints 12345.599609. So what does "6 significant digits" (or "15 in the case of a double") mean for a floating-point type?
6 significant digits means that the maximum relative error is approximately ±0.0001%. A single-precision float actually has about 7.2 digits of precision. This means the error is about ±12345.6/10^7 = 0.00123456, which is on the order of your error (0.000391).
According to the standard, not all decimal numbers can be stored exactly in memory. Depending on the size of the representation, the error can reach a certain maximum. For float this is 0.0001% (6 significant digits: 10^-6 = 10^-4 %).
In your case the error is (12345.6 - 12345.599609) / 12345.6 = 3.16e-08, far lower than the maximum error for floats.
What you're seeing is not really an issue with significant digits, but the fact that numbers on a computer are stored in binary, and there is no finite binary representation of 3/5 (= 0.6). In binary, 3/5 looks like 0.100110011001..., with the "1001" pattern repeating forever; any finite truncation of this sequence falls just short of 0.6, the way a truncated 0.59999... falls short of 0.6 in decimal. You actually get about three correct decimal places to the right of the decimal point before any error related to precision kicks in.
This is similar to how there is no finite base-10 representation of 1/3; we have 0.3333 repeating forever.
The problem here is that you cannot guarantee that a number can be stored exactly in a float. The number must be represented with a mantissa, base, and exponent, as IEEE 754 specifies. The number printf(...) shows you is the actual float value that was stored. You cannot guarantee a given number of significant digits in a float.