Why do floating point comparisons give different outputs on different compilers? [duplicate] - c

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 7 years ago.
I was reading this article. It contains the following C program.
#include <stdio.h>
int main()
{
    float x = 0.1;
    if (x == 0.1)
        printf("IF");
    else if (x == 0.1f)
        printf("ELSE IF");
    else
        printf("ELSE");
}
The article says:
The output of the above program is "ELSE IF", which means the expression "x
== 0.1" returns false and the expression "x == 0.1f" returns true.
But I tried it on different compilers and got different outputs.
Here are the outputs on various IDEs.
1) Orwell Dev C++: ELSE
2) Code Blocks 13.12: ELSE IF, but it gives the following warning during compilation:
warning: comparing floating point with == or != is unsafe
Why is this comparison unsafe?
3) Ideone.com: ELSE IF (see run: http://ideone.com/VOE3E0)
4) TDM GCC 32 bit: ELSE IF
5) MSVS 2010: ELSE IF, but it compiles with a warning:
Warning 1 warning C4305: 'initializing' : truncation from 'double' to 'float'
What exactly is happening here? What's wrong with the program? Is this implementation-defined behavior?
Please help me.

A floating point number may be represented in the following form:
[sign] [mantissa] * 2^[exponent]
So there will be rounding or relative errors when there is not enough space in memory to store a value exactly.
From Wikipedia:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives from 6 to 9 significant decimal digits of precision (if a
decimal string with at most 6 significant decimal digits is converted to IEEE
754 single precision and then converted back to the same number of
significant decimal digits, then the final string should match the original;
and if an IEEE 754 single precision value is converted to a decimal string
with at least 9 significant decimal digits and then converted back to single
precision, then the final number must match the original).
Larger (more bits) floating point representations allow for greater precision.
Floating point math is not exact. Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. A must read:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
The IEEE standard divides exceptions into 5 classes: overflow, underflow, division by zero, invalid operation and inexact. There is a separate status flag for each class of exception. The meaning of the first three exceptions is self-evident. Invalid operation covers the situations listed in TABLE D-3, and any comparison that involves a NaN.
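To see this concretely for the original question, here is a small sketch (plain standard C, nothing beyond printf) that prints the values actually being compared. On a typical IEEE 754 setup it is expected to print 0 for x == 0.1 and 1 for x == 0.1f, though, as the question shows, the first result can vary with the compiler (for example when intermediate values are kept at a precision other than float).
#include <stdio.h>

int main(void)
{
    float x = 0.1;   /* 0.1 is a double constant; the assignment rounds it to float */

    /* Show the exact values on each side of the comparisons. */
    printf("x (float, widened)   : %.20f\n", (double)x);
    printf("0.1 (double literal) : %.20f\n", 0.1);
    printf("0.1f (float literal) : %.20f\n", (double)0.1f);

    /* In x == 0.1, x is promoted to double, so a float-rounded 0.1 is
       compared with a double-rounded 0.1, and they usually differ. */
    printf("x == 0.1  -> %d\n", x == 0.1);
    printf("x == 0.1f -> %d\n", x == 0.1f);

    return 0;
}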

Related

Unexpected behavior for floating point number in C programming [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 3 years ago.
I'm getting confused with floating point numbers in C programming.
#include <stdio.h>
int main()
{
    float a = 255.167715;
    printf("%f", a);
    return 0;
} // it prints 255.167709
Why does it produce a value like this? To be honest, can you tell me how it actually works in C?
In binary, 255.167715 is approximately 11111111.001010101110111101011110110010000000110001110…₂. In your C implementation, most likely, the source constant 255.167715 is converted to 11111111.001010101110111101011110110010000000110001110₂, which is 255.16771499999998695784597657620906829833984375. That happens because 255.167715 is a double constant, and this is the closest value representable in your implementation's double type to the decimal number 255.167715, since the double type has only 53-bit significands. (A significand is the fraction portion of a floating-point number. There is also a sign and an exponent portion.)
Then, for float a = 255.167715;, this double value is converted to float. Since the float type has only 24-bit significands, the result is 11111111.0010101011101111₂, which is 255.1677093505859375 in decimal.
When you print this with the default formatting of %f, six digits after the decimal place are used, so it prints “255.167709”.
The short answer is that there is no such single-precision floating-point number as 255.167715. This is true for any computer using IEEE 754 floating-point formats; the situation is not unique to C. As a single-precision floating-point number, the closest value is 255.167709.
Single-precision floating point gives you the equivalent of 7 or so decimal digits of precision. And as you can see, the input you gave and the output you got agreed to seven places, namely 255.1677. Any digits past that are essentially meaningless.
For much more about the sometimes-surprising aspects of floating-point math, see "Is floating point math broken?".
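If it helps to see the rounding directly, here is a minimal sketch (only standard printf; the commented values are as computed on a typical IEEE 754 system) that prints both roundings with extra digits:
#include <stdio.h>

int main(void)
{
    float  a = 255.167715f;   /* decimal constant rounded to the nearest float  */
    double b = 255.167715;    /* the same constant rounded to the nearest double */

    printf("float : %.20f\n", a);   /* 255.16770935058593750000 */
    printf("double: %.20f\n", b);   /* approximately 255.16771499999998695785 */

    return 0;
}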

When does a float constant overflow if it is implicitly converted to int type

I have two code snippets that produce different results. I am using the TDM-GCC 4.9.2 compiler, 32-bit version
(the size of int is 4 bytes and the minimum value of float is -3.4e38).
Code 1:
int x;
x = 2.999999999999999;   // 15 nines after the decimal point
printf("%d", x);
Output:
2
Code 2:
int x;
x = 2.9999999999999999;  // 16 nines after the decimal point
printf("%d", x);
Output:
3
Why is the implicit conversion different in these cases?
Is it due to some overflow in the floating constant specified, and if so, how does it happen?
(Restricting this answer to IEEE754).
When you assign a constant to a floating point variable, the IEEE 754 standard requires the closest representable floating point number to be picked. Neither of the numbers you present can be represented exactly.
The nearest IEEE754 double precision floating point number to 2.999999999999999
is 2.99999999999999911182158029987476766109466552734375 whereas the nearest one to 2.9999999999999999 is 3.
Hence the output. Converting to an integral type truncates the value towards zero.
Using round is one way to obviate this effect.
Further reading: Is floating point math broken?
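A short sketch of the same effect, assuming IEEE 754 double precision (the commented values come from the roundings quoted above):
#include <stdio.h>

int main(void)
{
    /* The two constants round to different doubles: */
    printf("%.20f\n", 2.999999999999999);    /* 2.99999999999999911182... */
    printf("%.20f\n", 2.9999999999999999);   /* exactly 3 */

    int x = 2.999999999999999;    /* conversion to int truncates toward zero -> 2 */
    int y = 2.9999999999999999;   /* the double is exactly 3                 -> 3 */
    printf("%d %d\n", x, y);

    return 0;
}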

Double comparison in C [duplicate]

This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Closed 7 years ago.
A checker array (of type double) is obtained from a gsl_vector in the following way:
for (i = 0; i < M; i++)
{
    checker[i] = (double)gsl_vector_get(check, i);
    printf(" %f", checker[i]);
}
After the above operation, the checker array contains [ 3.000000 -3.000000 -11.000000 -5.000000 ] (taken from the output of the above program).
I am facing a weird problem now.
for (i = 0; i < M; i++)
{
    printf("checker: %f\n", checker[i]);
    if (checker[0] == 3.00)
    {
        printf("Inside If: %f\n", checker[i]);
    }
}
The above code outputs
checker: 3.000000
checker: -3.000000
checker: -11.000000
checker: -5.000000
As seen, the if branch inside the for loop is never taken. What could be the problem?
Edit: The problem goes away when I copy [ 3.000000 -3.000000 -11.000000 -5.000000 ] directly into the checker array instead of using gsl_vector_get(check, i). The check vector comes from the dgemv function, where a matrix and a vector are multiplied.
Thanks
A floating point number may be represented in the following form:
[sign] [mantissa] * 2^[exponent]
So there will be rounding or relative errors when there is not enough space in memory to store a value exactly.
From Wikipedia:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives from 6 to 9 significant decimal digits of precision (if a
decimal string with at most 6 significant decimal digits is converted to IEEE
754 single precision and then converted back to the same number of
significant decimal digits, then the final string should match the original;
and if an IEEE 754 single precision value is converted to a decimal string
with at least 9 significant decimal digits and then converted back to single
precision, then the final number must match the original).
Larger (more bits) floating point representations allow for greater precision.
Floating point math is not exact. Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. A must read:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
The IEEE standard divides exceptions into 5 classes: overflow, underflow, division by zero, invalid operation and inexact. There is a separate status flag for each class of exception. The meaning of the first three exceptions is self-evident. Invalid operation covers the situations listed in TABLE D-3, and any comparison that involves a NaN.
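For the checker case specifically, a tolerance comparison is the usual fix. Here is a minimal sketch, with hypothetical stand-in values instead of the real gsl_vector/dgemv output and an illustrative tolerance:
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical values standing in for the dgemv result; the real first
       element may differ from 3.0 only in the last few bits. */
    double checker[] = { 2.9999999999999996, -3.0, -11.0, -5.0 };
    double eps = 1e-9;   /* illustrative tolerance; choose it from your own error analysis */
    int i;

    for (i = 0; i < 4; i++)
    {
        printf("checker: %.17f\n", checker[i]);   /* shows the digits %f hides */

        if (fabs(checker[i] - 3.0) < eps)
            printf("Inside If: %f\n", checker[i]);
    }

    return 0;
}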

Strange float imprecision error in C [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
Today a strange thing happened to me: when I compile and execute this code, the output isn't what I expected. Here is the code, which simply adds floating point values to an array of float and then prints them out.
The simple code:
#include <stdio.h>

int main()
{
    float r[10];
    int z;
    int i = 34;
    for (z = 0; z < 10; z++)
    {
        i = z * z * z;
        r[z] = i;
        r[z] = r[z] + 0.634;
        printf("%f\n", r[z]);
    }
}
the output:
0.634000
1.634000
8.634000
27.634001
64.634003
125.634003
216.634003
343.634003
512.633972
729.633972
Note that from 27 onward, digits appear after the .634 that should not be there. Does anyone know why this happens? Is it caused by floating point approximation?
P.S. I have a 64-bit Debian Linux system.
thanks all
A floating point number may be represented in the following form:
[sign] [mantissa] * 2^[exponent]
So there will be rounding or relative errors when there is not enough space in memory to store a value exactly.
From Wikipedia:
Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)
This gives from 6 to 9 significant decimal digits of precision (if a
decimal string with at most 6 significant decimal digits is converted to IEEE
754 single precision and then converted back to the same number of
significant decimal digits, then the final string should match the original;
and if an IEEE 754 single precision value is converted to a decimal string
with at least 9 significant decimal digits and then converted back to single
precision, then the final number must match the original).
Edit (Edward's comment): Larger (more bits) floating point representations allow for greater precision.
Yes, this is a floating point approximation error, or round-off error. Floating point representation quantizes a large range of numbers: it can only represent discrete steps, and it rounds all in-between numbers to the nearest step. This causes an error whenever the wanted number is not one of those steps.
In addition to the other useful answers, it can be illustrative to print more digits than the default:
#include <stdio.h>

int main()
{
    float r[10];
    int z;
    int i = 34;
    for (z = 0; z < 10; z++)
    {
        i = z * z * z;
        r[z] = i;
        r[z] = r[z] + 0.634;
        printf("%.30f\n", r[z]);
    }
}
gives
0.634000003337860107421875000000
1.633999943733215332031250000000
8.633999824523925781250000000000
27.634000778198242187500000000000
64.634002685546875000000000000000
125.634002685546875000000000000000
216.634002685546875000000000000000
343.634002685546875000000000000000
512.633972167968750000000000000000
729.633972167968750000000000000000
In particular, note that 0.634 isn't actually "0.634", but instead is the closest number representable by a float.
"float" has only about six digit precision, so it isn't unexpected that you get errors that large.
If you used "double", you would have about 15 digits precision. You would have an error, but you would get for example 125.634000000000003 and not 125.634003.
So you will always get rounding errors and your results will not be quite what you expect, but by using double the effect will be minimal. Warning: If you do things like adding 125 + 0.634 and then subtract 125, the result will (most likely) not be 0.634. No matter whether you use float or double. But with double, the result will be very, very close to 0.634.
In principle, given the choice of float and double, you should never use float, unless you have a very, very good reason.
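As a rough illustration of that advice, here is the same loop sketched with double (the exact trailing digits depend on the platform, so the comments only describe the general picture):
#include <stdio.h>

int main(void)
{
    double r[10];
    int z;

    for (z = 0; z < 10; z++)
    {
        r[z] = (double)(z * z * z) + 0.634;
        /* %f now shows the expected x.634000; %.17g still reveals a tiny leftover error. */
        printf("%f    %.17g\n", r[z], r[z]);
    }

    /* The add-then-subtract caveat mentioned above: */
    printf("%.17g\n", (125.0 + 0.634) - 125.0);   /* very close to, but probably not exactly, 0.634 */

    return 0;
}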

recurring binary for decimal number [duplicate]

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 9 years ago.
float a;
a = 8.3;
if (a == 8.3)
    printf("1");
else
    printf("2");
Setting a to 8.3 and 8.4 and comparing with 8.3 and 8.4 respectively, the output is 2, but when comparing with 8.5 the output is 1. I found that this is related to the concept of a recurring binary representation, which takes 8 bytes. I want to know how to find out which numbers are recurring in binary. Kindly give some input.
Recurring numbers are not representable, hence floating point comparison will not work.
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Also, as noted in the second comment, the floating point literal 8.3 has type double while a has type float.
Comparing with epsilon – absolute error
Since floating point calculations involve a bit of uncertainty we can try to allow for this by seeing if two numbers are ‘close’ to each other. If you decide – based on error analysis, testing, or a wild guess – that the result should always be within 0.00001 of the expected result then you can change your comparison to this:
if (fabs(result - expectedResult) < 0.00001)
For example, 3/7 is a repeating binary fraction; its computed value in double precision is different from its value stored in single precision. Thus the comparison of 3/7 with its stored computed value fails.
For more please read - What Every Computer Scientist Should Know About Floating-Point Arithmetic
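The 3/7 example can be reproduced in a few lines (a sketch assuming IEEE 754 float and double):
#include <stdio.h>

int main(void)
{
    float  f = 3.0f / 7.0f;   /* 3/7 rounded to 24-bit precision */
    double d = 3.0 / 7.0;     /* 3/7 rounded to 53-bit precision */

    printf("float : %.20f\n", f);
    printf("double: %.20f\n", d);
    printf("f == d -> %d\n", f == d);   /* 0: the two roundings differ */

    return 0;
}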
You should not compare floating point numbers for equality using ==. Because of how floating point numbers are actually stored in memory it will give inaccurate results.
Use something like this to determine if your number a is close enough to the desired value:
if (fabs(a - 8.3) < 0.0000005)
There are two problems here.
First is that floating point literals like 8.3 have type double, while a has type float. Doubles and floats store values to different precisions, and for values that don't have an exact floating point representation (such as 8.3), the stored values are slightly different. Thus, the comparison fails.
You could fix this by writing the comparison as a==8.3f; the f suffix forces the literal to be a float instead of a double.
However, it's bad juju to compare floating point values directly; again, most values cannot be represented exactly, but only to an approximation. If a were the result of an expression involving multiple floating-point calculations, it may not be equivalent to 8.3f. Ideally, you should look at the difference between the two values, and if it's less than some threshold, then they are effectively equivalent:
if ( fabs( a - 8.3f) < EPSILON )
{
// a is "equal enough" to 8.3
}
The exact value of EPSILON depends on a number of factors, not least of which is the magnitude of the values being compared. You only have so many digits of precision, so if the values you're trying to compare are greater than 999999.0, then you can't test for differences within 0.000001 of each other.
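Putting the pieces above together, a runnable version of the epsilon comparison might look like this (EPSILON is an illustrative value, not a universal constant; remember to include math.h and link with -lm if needed):
#include <math.h>
#include <stdio.h>

#define EPSILON 0.000001f   /* illustrative tolerance; scale it to the magnitudes you compare */

int main(void)
{
    float a = 8.3;

    printf("a == 8.3  -> %d\n", a == 8.3);    /* float vs double literal: false here       */
    printf("a == 8.3f -> %d\n", a == 8.3f);   /* float vs float literal: true, but fragile */
    printf("|a - 8.3f| < EPSILON -> %d\n", fabs(a - 8.3f) < EPSILON);   /* the robust habit */

    return 0;
}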

Resources