Initializing floating point variable with large literal - c

#include <stdio.h>
int main(void) {
double x = 0.12345678901234567890123456789;
printf("%0.16f\n", x);
return 0;
};
In the code above I'm initializing x with literal that is too large to be represented by the IEEE 754 double. On my PC with gcc 4.9.2 it works well. The literal is rounded to the nearest value that fits into double. I'm wondering what happens behind the scene (on the compiler level) in this case? Does this behaviour depend on the platform? Is it legal?

When you write double x = 0.1;, the decimal number you have written is rounded to the nearest double. So what happens when you write 0.12345678901234567890123456789 is not fundamentally different.
The behavior is essentially implementation-defined, but most compilers will use the nearest representable double in place of the constant. The C standard specifies that it has to be either the double immediately above or the one immediately below.

Related

C - ceil/float rounding to int guarantees

I'm wondering if there are any circumstances where code like this will be incorrect due to floating point inaccuracies:
#include <math.h>
// other code ...
float f = /* random but not NAN or INF */;
int i = (int)floorf(f);
// OR
int i = (int)ceilf(f);
Are there any guarantees about these values? If I have a well-formed f (not NAN or INF) will i always be the integer that it rounds to, whichever way that is.
I can image a situation where (with a bad spec/implementation) the value you get is the value just below the true value rather than just above/equal but is actually closer. Then when you truncate it actually rounds down to the next lower value.
It doesn't seem possible to me given that integers can be exact values in ieee754 floating point but I don't know if float is guaranteed to be that standard
The C standard is sloppy in specifying floating-point behavior, so it is technically not completely specified that floorf(f) produces the correct floor of f or that ceilf(f) produces the correct ceiling of f.
Nonetheless, no C implementations I am aware of get this wrong.
If, instead of floorf(some variable), you have floorf(some expression), there are C implementations that may evaluate the expression in diverse ways that will not get the same result as if IEEE-754 arithmetic were used throughout.
If the C implementation defines __STDC_IEC_559__, it should evaluate the expressions using IEEE-754 arithmetic.
Nonetheless, int i = (int)floorf(f); is of course not guaranteed to set i to the floor of f if the floor of f is out of range of int.

Why does %f print large values when floating point constants are passed instead of variables?

In the given program why did I get different results for each of the printfs?
#include <stdio.h>
int main()
{
float c = 4.4e10;
printf("%f\n", c);
printf("%f\n", 4.4e10);
return 0;
}
And it shows the following output:
44000002048.000000
44000000000.000000
A float is a type that holds a 32-bit floating point number, while the constant 4.4e10 represents a double, which holds a 64-bit floating point number (i.e. a double-precision floating point number)
When you assign 4.4e10 to c, the value 4.4e10 cannot be represented precisely (a rounding error in a parameter called the mantissa), and the closest possible value (44000002048) is stored. When it is passed to printf, it is promoted back to double, including the rounding error.
In the second case, the value is a double the whole time, without narrowing and widening, and it happens to be the case that a double can represent the value exactly.
If this is undesirable behavior, you can declare c as a double for a bit more precision (but beware that you'll still hit precision limits eventually).
You're actually printing the values of two different types here.
In the first case you're assigning a value to a variable of type float. The precision of a float is roughly 6 or 7 decimal digits, so unless the value can be represented exactly you'll see the closest value that can be represented by that type.
In the second case you're passing the constant 4.4e10 which has type double. This type has around 16 decimal digits of precision, and the value is within that range, so the exact value is printed.

When does a float constant overflow if it is implicitly converted to int type

I have two code snippets and both produce different results. I am using TDM-GCC 4.9.2 compiler and my compiler is 32-bit version
( Size of int is 4 bytes and Minimum value in float is -3.4e38 )
Code 1:
int x;
x=2.999999999999999; // 15 '9s' after decimal point
printf("%d",x);
Output:
2
Code 2:
int x;
x=2.9999999999999999; // 16 '9s' after decimal point
printf("%d",x);
Output:
3
Why is the implicit conversion different in these cases?
Is it due to some overflow in the Real constant specified and if so how does it happen?
(Restricting this answer to IEEE754).
When you assign a constant to a floating point, the IEEE754 standard requires the closest possible floating point number to be picked. Both the numbers you present cannot be represented exactly.
The nearest IEEE754 double precision floating point number to 2.999999999999999
is 2.99999999999999911182158029987476766109466552734375 whereas the nearest one to 2.9999999999999999 is 3.
Hence the output. Converting to an integral type truncates the value towards zero.
Using round is one way to obviate this effect.
Further reading: Is floating point math broken?

Dodging the inaccuracy of a floating point number

I totally understand the problems associated with floating points, but I have seen a very interesting behavior that I can't explain.
float x = 1028.25478;
long int y = 102825478;
float z = y/(float)100000.0;
printf("x = %f ", x);
printf("z = %f",z);
The output is:
x = 1028.254761 z = 1028.254780
Now if floating numbers failed to represent that specific random value (1028.25478) when I assigned that to variable x. Why isn't it the same in case of variable z?
P.S. I'm using pellesC IDE to test the code (C11 compiler).
I am pretty sure that what happens here is that the latter floating point variable is elided and instead kept in a double-precision register; and then passed as is as an argument to printf. Then the compiler will believe that it is safe to pass this number at double precision after default argument promotions.
I managed to produce a similar result using GCC 7.2.0, with these switches:
-Wall -Werror -ffast-math -m32 -funsafe-math-optimizations -fexcess-precision=fast -O3
The output is
x = 1028.254761 z = 1028.254800
The number is slightly different there^.
The description for -fexcess-precision=fast says:
-fexcess-precision=style
This option allows further control over excess precision on
machines where floating-point operations occur in a format with
more precision or range than the IEEE standard and interchange
floating-point types. By default, -fexcess-precision=fast is in
effect; this means that operations may be carried out in a wider
precision than the types specified in the source if that would
result in faster code, and it is unpredictable when rounding to
the types specified in the source code takes place. When
compiling C, if -fexcess-precision=standard is specified then
excess precision follows the rules specified in ISO C99; in
particular, both casts and assignments cause values to be rounded
to their semantic types (whereas -ffloat-store only affects
assignments). This option [-fexcess-precision=standard] is enabled by default for C if a
strict conformance option such as -std=c99 is used. -ffast-math
enables -fexcess-precision=fast by default regardless of whether
a strict conformance option is used.
This behaviour isn't C11-compliant
Restricting this to IEEE754 strict floating point, the answers should be the same.
1028.25478 is actually 1028.2547607421875. That accounts for x.
In the evaluation of y / (float)100000.0;, y is converted to a float, by C's rules of argument promotion. The closest float to 102825478 is 102825480. IEEE754 requires the returning of the the best result of a division, which should be 1028.2547607421875 (the value of z): the closest number to 1028.25480.
So my answer is at odds with your observed behaviour. I put that down to your compiler not implementing floating point strictly; or perhaps not implementing IEEE754.
Code acts as if z was a double and y/(float)100000.0 is y/100000.0.
float x = 1028.25478;
long int y = 102825478;
double z = y/100000.0;
// output
x = 1028.254761 z = 1028.254780
An important consideration is FLT_EVAL_METHOD. This allows select floating point code to evaluate at higher precision.
#include <float.h>
#include <stdio.h>
printf("FLT_EVAL_METHOD %d\n", FLT_EVAL_METHOD);
Except for assignment and cast ..., the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of FLT_EVAL_METHOD.
-1 indeterminable;
0 evaluate all operations and constants just to the range and precision of the
type;
1 evaluate ... type float and double to the
range and precision of the double type, evaluate long double
... to the range and precision of the long double
type;
2 evaluate all ... to the range and precision of the
long double type.
Yet this does not apply as z with float z = y/(float)100000.0; should lose all higher precision on the assignment.
I agree with #Antti Haapala that code is using a speed optimization that has less adherence to the expected rules of floating point math.

In C, is specifying 2.0f the same as 2.000000f?

Are these lines the same?
float a = 2.0f;
and
float a = 2.000000f;
Yes, it is. No matter what representation you use, when the code is compiled, the number will be converted to a unique binary representation. There's only one way of representing 2 in the IEEE 754 binary32 standard used in modern computers to represent float numbers.
The only thing the C99 standard has to say on the matter is this (section 6.4.4.2):
For decimal floating constants ... the result is either
the nearest representable value, or the larger or smaller representable value immediately
adjacent to the nearest representable value, chosen in an implementation-defined manner.
That bit about "implementation-defined" means that technically an implementation could choose to do something different in each case. Although in practice, nothing weird is going to happen for a value like 2.
It's important to bear in mind that the C standards don't require IEEE-754.
Yes, they are the same.
Simple check:
http://codepad.org/FOQsufB4
int main() {
printf("%d",2.0f == 2.000000f);
}
^ Will output 1 (true)
Yes Sure it is the same extra zeros on the right are ignored just likes zeros on the left

Resources