I don't think most compilers care if, for instance, you don't append the f suffix when initializing a variable of float type. But just because I want to be as explicit and accurate as I can, I want to express the correct type.
What is the type of the result of two literal operands of different types, or does it depend on the circumstances? e.g.:
int i=1.0f/1;
float f=1.0f/1;
The compiler doesn't complain in either of these instances. Is that because of a tolerant view of literal types, or because the type of the result of the operation is always converted according to the context?
First, compilers do care about the f suffix. 1.0 is a double. 1.0f is a float. For instance:
printf("%.15f %.15f\n", 0.1f, 0.1);
produces 0.100000001490116 0.100000000000000 (note that the second number is not the real 0.1 any more than the first, it is just closer to it).
This matters not only to determine what numbers can and cannot be represented but also to determine the type of the operations this constant is involved in. For instance:
printf("%.15f %.15f\n", 1.0f/10.0f, 1.0/10.0);
produces the same output as before. What is important to notice here is that 1 and 10 are both exactly representable as a float as well as a double. What we are seeing is not rounding taking place at the literal level, but the type of the operation being decided by the types of the operands.
Compilers are not being lenient about your examples. They are strictly applying rules that you can find in the relevant standard, especially sections 6.3.1.4 and 6.3.1.5.
The first one divides a float by an int, which always results in a float (as does dividing an int by a float). This float result is then converted into an int to be stored in i.
The second one takes the float that resulted from the operation, and assigns it to a float variable, preserving the floatness.
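To make the conversion visible, here is a minimal sketch (the 0.5f value is my own variant; the question's 1.0f/1 happens to hide the truncation):
#include <stdio.h>

int main(void)
{
    int   i = 1.0f / 2; /* float / int -> float 0.5f, truncated to 0 when stored in i */
    float f = 1.0f / 2; /* the float result 0.5f is stored unchanged */
    printf("i = %d, f = %f\n", i, f); /* prints: i = 0, f = 0.500000 */
    return 0;
}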
There are two separate issues here: How literals with or without suffixes are interpreted by the compiler, and the result of operations performed on mixed numeric types.
The type of literals is important in other contexts, for example (assuming f is a float):
f = f + 1.0; // Cast `f` to double, do a double-precision add, and cast the result back.
f = f + 1.0f; // Single-precision addition.
Note that some compilers provide a relaxed mode, where both of them will generate the same code.
I totally understand the problems associated with floating points, but I have seen a very interesting behavior that I can't explain.
float x = 1028.25478;
long int y = 102825478;
float z = y/(float)100000.0;
printf("x = %f ", x);
printf("z = %f",z);
The output is:
x = 1028.254761 z = 1028.254780
Now, if floating-point numbers fail to represent that specific value (1028.25478) when I assign it to variable x, why isn't the same true for variable z?
P.S. I'm using pellesC IDE to test the code (C11 compiler).
I am pretty sure that what happens here is that the latter floating-point value is elided and instead kept in a double-precision register, and is then passed as-is as an argument to printf. The compiler then believes it is safe to pass this number at double precision after the default argument promotions.
I managed to produce a similar result using GCC 7.2.0, with these switches:
-Wall -Werror -ffast-math -m32 -funsafe-math-optimizations -fexcess-precision=fast -O3
The output is
x = 1028.254761 z = 1028.254800
Note that the number is slightly different in this case.
The description for -fexcess-precision=fast says:
-fexcess-precision=style
This option allows further control over excess precision on machines where floating-point operations occur in a format with more precision or range than the IEEE standard and interchange floating-point types. By default, -fexcess-precision=fast is in effect; this means that operations may be carried out in a wider precision than the types specified in the source if that would result in faster code, and it is unpredictable when rounding to the types specified in the source code takes place. When compiling C, if -fexcess-precision=standard is specified then excess precision follows the rules specified in ISO C99; in particular, both casts and assignments cause values to be rounded to their semantic types (whereas -ffloat-store only affects assignments). This option [-fexcess-precision=standard] is enabled by default for C if a strict conformance option such as -std=c99 is used. -ffast-math enables -fexcess-precision=fast by default regardless of whether a strict conformance option is used.
This behaviour isn't C11-compliant
Restricting this to strict IEEE 754 floating point, the results should be the same.
1028.25478 is actually 1028.2547607421875. That accounts for x.
In the evaluation of y / (float)100000.0, y is converted to a float by the usual arithmetic conversions. The closest float to 102825478 is 102825480. IEEE 754 requires a division to return the correctly rounded result, which here is 1028.2547607421875 (the value z should have): the float closest to 1028.2548.
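Under strict IEEE 754 single precision, both roundings described above can be checked directly (a small sketch):
#include <stdio.h>

int main(void)
{
    long int y = 102825478;
    float yf = y;             /* rounds to 102825480.0f (float spacing is 8 at this magnitude) */
    float z = yf / 100000.0f; /* correctly rounded quotient: 1028.2547607421875f */
    printf("%.1f\n", yf);     /* prints 102825480.0 */
    printf("%.13f\n", z);     /* prints 1028.2547607421875 */
    return 0;
}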
So my answer is at odds with your observed behaviour. I put that down to your compiler not implementing floating point strictly; or perhaps not implementing IEEE754.
The code acts as if z were a double and y/(float)100000.0 were y/100000.0.
float x = 1028.25478;
long int y = 102825478;
double z = y/100000.0;
// output
x = 1028.254761 z = 1028.254780
An important consideration is FLT_EVAL_METHOD. This allows select floating point code to evaluate at higher precision.
#include <float.h>
#include <stdio.h>
int main(void)
{
    printf("FLT_EVAL_METHOD %d\n", FLT_EVAL_METHOD);
}
Except for assignment and cast ..., the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of FLT_EVAL_METHOD.
-1 indeterminable;
0 evaluate all operations and constants just to the range and precision of the type;
1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;
2 evaluate all operations and constants to the range and precision of the long double type.
Yet this should not apply here: with float z = y/(float)100000.0;, z should lose all higher precision on the assignment.
I agree with @Antti Haapala that the code is using a speed optimization that adheres less strictly to the expected rules of floating-point math.
/**Program for internal typecasting of the compiler**/
#include<stdio.h>
int main(void)
{
float b = 0;
// The second operand is an integer value which gets added to the first operand,
// which is of float type. Will the second operand be converted to float?
b = (float)15/2 + 15/2;
printf("b is %f\n",b);
return 0;
}
OUTPUT : b is 14.500000
Yes, an integral value can be added to a float value.
For the basic math operations (+, -, *, /), when given one operand of type float and one of type int, the int is converted to float first.
So 15.0f + 2 will convert 2 to float (i.e. to 2.0f) and the result is 17.0f.
In your expression (float)15/2 + 15/2, since / has higher precedence than +, the effect will be the same as computing ((float)15/2) + (15/2).
(float)15/2 explicitly converts 15 to float and therefore implicitly converts 2 to float, yielding the final result of division as 7.5f.
However, 15/2 does an integer division, so produces the result 7 (there is no implicit conversion to float here).
Since (float)15/2 has been computed as a float, the value 7 is then converted to float before addition. The result will therefore be 14.5f.
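If the intent was to keep the fractional part of both terms, converting both divisions would give 15.0 (a sketch):
b = (float)15/2 + (float)15/2; /* 7.5f + 7.5f = 15.0f */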
Note: floating point types are also characterised by finite precision and rounding error that affects operations. I've ignored that in the above (and it is unlikely to have a notable effect with the particular example anyway).
Note 2: Old versions of C (before the C89/90 standard) actually converted float operands to double in expressions (and therefore had to convert values of type double back to float, when storing the result in a variable of type float). Thankfully the C89/90 standard fixed that.
Rule of thumb: When doing an arithmetic calculation between two different built-in types, the "smaller" type will be converted into the "larger" type.
double > float > long long (C99) > long > int > short > char.
b = (float)15/2 + 15/2;
Here the first part, (float)15/2 is equivalent to 15.0f / 2. Because an operation involving a "larger" type and a "smaller" type will yield a result in the "larger" type, (float)15/2 is 7.500000, or 7.5f.
When it comes to 15/2, since both operands are integers, the operation is done entirely at the integer level. The fractional part is therefore discarded, giving only 7 as the result.
So the expression is calculated into
b = 7.5f + 7;
No doubt you'll have 14.500000 as the final result, because it's exactly 14.5f.
b = (float)15/2 + 15/2;
The first one ((float)15/2) will work fine. The second one (15/2) is carried out entirely in integer arithmetic, so the fractional part is lost. Like so:
b = (float)15/2 + 15/2;
b = 7.500000f + 7
b = 14.500000
It's worth asking: if an integer value could not be added to a floating-point value, what would the symptom be?
Compiler issues error or warning message.
Something gets truncated; you don't get the result you want.
Undefined behavior: you might or might not get the result you want, and the compiler might or might not warn you about it.
But in fact none of these things happen. When you add an integer to a floating-point value, the compiler automatically converts the integer to a floating-point value so it can do the addition that way, and this is perfectly well defined. For example, if you have the code
double d = 7.5;
int i = 7;
double result = d + i;
the compiler interprets this just as if you had written
double result = d + (double)i;
And it works this way for just about all operations: the same logic is applied when you subtract, multiply, or divide a floating-point value and an integer.
And it works this way for just about all types. If you add a long int and an int, the plain int automatically gets converted to a long.
As a general rule (and I really can't think of too many exceptions), the compiler always wants to do arithmetic on two values of the same type. So whenever you have two values of different type, the compiler will just about always convert one of them for you. The full set of rules for how it does this are rather elaborate, but they're all supposed to make sense, and do what you want. The full set of rules is called the usual arithmetic conversions, and if you do a Google search on that phrase you'll find lots of explanations.
One case that does not necessarily do what you want is when the two variables are not different types. In particular, if the two variables are both integers, and the operation you're doing is division, the compiler doesn't have to convert anything: it divides the integer by the integer, discarding any remainder, and gives you an integer result. So if you have
int i = 1;
int j = 2;
int k = i / j;
then k ends up containing 0. And if you have
double d = i / j;
then d ends up containing 0 also, because the compiler follows exactly the same rules when performing the division; it doesn't "peek outside" to see that it's going to need a floating-point result.
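The usual fix, as a sketch, is to convert one operand explicitly so the division itself is carried out in floating point:
double d2 = (double)i / j; /* i converted explicitly, j implicitly; d2 ends up 0.5 */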
P.S. I said, "As a general rule, the compiler always wants to do arithmetic on two values of the same type", and I said I couldn't think of too many exceptions. But if you're curious, two exceptions are the << and >> operators. If you have x << y, where x is a long int and y is a plain int, the compiler does not have to convert y to a long int first.
Going on my understanding of these datatypes as primitives:
(int) char and (char) int are interpretations of data. (int) c gives the integer value of that character, and (char) 14 gives you back the character encoded by 14.
I've always understood this as being a "memory parse", such that it just takes the value at that position and then applies a type filter to it.
Given that floating-point numbers are stored as some version of scientific notation, what is stored in memory should be garbage when read as an integer. Looking into this utility http://www.h-schmidt.net/FloatConverter/IEEE754.html it appears that the whole-number portion is separated out.
However, since this is in the higher portion of memory, how does the int cast know to "reformat"? Does the compiler identify that it was a float and apply special handling, or what's going on?
Your understanding of casts is completely wrong. Casts are nothing but explicit requests for a value conversion from one type to another. They do not reinterpret the representation of one type as if it had a different type. The source code:
float f = 42.5;
int x;
x = (int)f;
simply instructs the compiler to produce code that truncates the floating point value of the expression f to an integer and store the result in the object x.
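By contrast, if you actually wanted the "memory parse" the question describes, i.e. reinterpreting the float's bits as an integer, you would have to ask for it explicitly, for instance with memcpy (a sketch; the 0x422A0000 value assumes IEEE 754 single precision and 32-bit int and float):
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 42.5f;
    int x;
    unsigned int bits;

    x = (int)f;                     /* value conversion: x is 42 */
    memcpy(&bits, &f, sizeof bits); /* bit reinterpretation of the same float */
    printf("%d 0x%08X\n", x, bits); /* prints: 42 0x422A0000 */
    return 0;
}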
I've always understood this as being a "memory parse", such that it just takes the value at that position and then applies a type filter to it.
That is an incorrect understanding.
The language specifies conversions between the fundamental arithmetic types. Look up "Usual Arithmetic Conversions" on the web; you will find a lot of links that describe them. For converting a floating-point type to an integral type, this is what the C99 Standard has to say:
6.3.1.4 Real floating and integer
1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
float f = 4.5;
int i = (int)f; // i is 4
f = -6.3;
i = (int)f; // i is -6
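Note the second sentence of the quoted paragraph: if the truncated value does not fit in the destination type, the behavior is undefined. A sketch:
float big = 1e20f;
int bad = (int)big; // undefined behavior: the integral part of 1e20 cannot be represented in an int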
There are numerous references on the subject (here or here). However, I still fail to understand why the following is not considered UB and properly reported by my favorite compiler (insert clang and/or gcc) with a neat warning:
// f1, f2 and epsilon are defined as double
if ( f1 / f2 <= epsilon )
As per C99:TC3, 5.2.4.2.2 §8, we have:
Except for assignment and cast (which remove all extra range and
precision), the values of operations with floating operands and values
subject to the usual arithmetic conversions and of floating constants
are evaluated to a format whose range and precision may be greater
than required by the type. [...]
With typical compilation, f1 / f2 would be read directly from the FPU. I've tried this here using gcc 5.2 with -m32, so f1 / f2 ends up (here) in an 80-bit floating-point register (just a guess, I don't have the exact spec at hand). There is no type promotion here (per the standard).
I've also tested clang 3.5; this compiler seems to cast the result of f1 / f2 back to a normal 64-bit floating-point representation (this is implementation-defined behavior, but for my question I prefer the default gcc behavior).
As per my understanding, the comparison will be done between a type whose size we don't know (i.e. a format whose range and precision may be greater) and epsilon, whose size is exactly 64 bits.
What I really find hard to understand is an equality comparison between a well-known C type (e.g. a 64-bit double) and something whose range and precision may be greater. I would have assumed that somewhere the standard would require some kind of promotion (e.g. mandate that epsilon be promoted to a wider floating-point type).
So the only legitimate syntaxes should instead be:
if ( (double)(f1 / f2) <= epsilon )
or
double res = f1 / f2;
if ( res <= epsilon )
As a side note, I would have expected the literature to document only the operator <; in my case:
if ( f1 / f2 < epsilon )
Since it is always possible to compare floating-point values of different sizes using operator <.
So in which cases would the first expression make sense? In other words, how could the standard define some kind of equality operator between two floating-point representations of different sizes?
EDIT: The whole confusion here was that I assumed it was possible to compare two floats of different sizes, which cannot possibly happen (thanks @DevSolar!).
<= is well-defined for all possible floating point values.
There is one exception, though: the case where at least one of the arguments is uninitialised. But that's more to do with reading an uninitialised variable being UB, not with <= itself.
I think you're confusing implementation-defined with undefined behavior. The C language doesn't mandate IEEE 754, so all floating point operations are essentially implementation-defined. But this is different from undefined behavior.
After a bit of chat, it became clear where the miscommunication came from.
The quoted part of the standard explicitly allows an implementation to use wider formats for floating operands in calculations. This includes, but is not limited to, using the long double format for double operands.
The standard section in question also does not call this "type promotion". It merely refers to a format being used.
So, f1 / f2 may be done in some arbitrary internal format, but without making the result any other type than double.
So when the result is compared (by either <= or the problematic ==) to epsilon, there is no promotion of epsilon (because the result of the division never got a different type), but by the same rule that allowed f1 / f2 to happen in some wider format, epsilon is allowed to be evaluated in that format as well. It is up to the implementation to do the right thing here.
The value of FLT_EVAL_METHOD might tell you what exactly an implementation is doing (if set to 0, 1, or 2 respectively), or it might have a negative value, which indicates "indeterminable" (-1) or "implementation-defined", which means "look it up in your compiler manual".
This gives an implementation "wiggle room" to do any kind of funny things with floating operands, as long as at least the range / precision of the actual type is preserved. (Some older FPUs had "wobbly" precisions, depending on the kind of floating operation performed. The quoted part of the standard caters for exactly that.)
In no case may any of this lead to undefined behaviour. Implementation-defined, yes. Undefined, no.
The only case where you would get undefined behavior is when a large floating point variable gets demoted to a smaller one which cannot represent the contents. I don't quite see how that applies in this case.
The text you quote is concerned with whether or not floats may be evaluated as doubles etc., as indicated by the text you unfortunately didn't include in the quote:
The use of evaluation formats is characterized by the
implementation-defined value of FLT_EVAL_METHOD:
-1 indeterminable;
0 evaluate all operations and constants just to the range and precision of the type;
1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;
2 evaluate all operations and constants to the range and precision of the long double type.
However, I don't believe this macro overwrites the behavior of the usual arithmetic conversions. The usual arithmetic conversions guarantee that you can never compare two float variables of different size. So I don't see how you could run into undefined behavior here. The only possible issue you would have is performance.
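For instance (a sketch): when a float operand meets a double operand, the float is converted to double first, so the comparison is always between two values of the same type:
#include <stdio.h>

int main(void)
{
    float f = 0.1f;
    double d = 0.1;
    /* f is converted to double before the comparison */
    printf("%d\n", f == d); /* prints 0: the float rounding of 0.1 differs from the double rounding */
    return 0;
}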
In theory, in case FLT_EVAL_METHOD == 2 then your operands could indeed get evaluated as type long double. But please note that if the compiler allows such implicit promotions to larger types, there will be a reason for it.
According to the text you cited, explicit casting will counter this compiler behavior.
In which case the code if ( (double)(f1 / f2) <= epsilon ) is nonsense. By the time you cast the result of f1 / f2 to double, the calculation has already been carried out in long double. The comparison result <= epsilon will, however, be carried out in double, since you forced this with the cast.
To avoid long double entirely, you would have to write the code as:
if ( (double)((double)f1 / (double)f2) <= epsilon )
or to increase readability, preferably:
double div = (double)f1 / (double)f2;
if( (double)div <= (double)epsilon )
But again, code like this only makes sense if you know that there will be implicit promotions, which you wish to avoid to increase performance. In practice, I doubt you'll ever run into that situation, as the compiler is most likely far more capable than the programmer of making such decisions.
Can I compare a floating-point number to an integer?
Will the float compare to integers in code?
float f; // f holds some predetermined floating-point value
if (f >= 100) { __asm__ reset ... etc }
Also, could I...
float f;
int x = 100;
x+=f;
I have to use the floating point value f received from an attitude reference system to adjust a position value x that controls a PWM signal to correct for attitude.
The first one will work fine. 100 will be converted to a float, and IEEE 754 can represent all integers exactly as floats, up to about 2^23.
The second one will also work, but the result will be converted back into an integer on the assignment, so you'll lose the fractional part (that's unavoidable if you're turning floats into integers).
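A minimal sketch of the second case (the values are my own):
#include <stdio.h>

int main(void)
{
    float f = 2.75f;
    int x = 100;
    x += f;            /* x is converted to float, 102.75f is computed, then truncated back */
    printf("%d\n", x); /* prints 102 */
    return 0;
}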
Since you've identified yourself as unfamiliar with the subtleties of floating point numbers, I'll refer you to this fine paper by David Goldberg: What Every Computer Scientist Should Know About Floating-Point Arithmetic (reprint at Sun).
After you've been scared by that, the reality is that most of the time floating point is a huge boon to getting calculations done. And modern compilers and languages (including C) handle conversions sensibly so that you don't have to worry about them. Unless you do.
The points raised about precision are certainly valid. An IEEE float effectively has only 24 bits of precision, which is less than a 32-bit integer. Use of double for intermediate calculations will push all rounding and precision loss out to the conversion back to float or int.
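A sketch of that pattern (the values are my own): promote once, do the intermediate arithmetic in double, and round a single time at the end.
float a = 16777215.0f;          /* near float's 24-bit precision limit */
double tmp = (double)a * 3 / 7; /* intermediates carried in double */
float r = (float)tmp;           /* one rounding, at the final conversion back to float */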
Mixed-mode arithmetic (arithmetic between operands of different types and/or sizes) is legal but fragile. The C standard defines rules for type promotion in order to convert the operands to a common representation. Automatic type promotion allows the compiler to do something sensible for mixed-mode operations, but "sensible" does not necessarily mean "correct."
To really know whether or not the behavior is correct you must first understand the rules for promotion and then understand the representation of the data types. In very general terms:
shorter types are converted to longer types (float to double, short to int, etc.)
integer types are converted to floating-point types
signed/unsigned conversions favor avoiding data loss (whether signed is converted to unsigned or vice-versa depends on the size of the respective types)
Whether code like x > y (where x and y have different types) is right or wrong depends on the values that x and y can take. In my experience it's common practice to prohibit (via the coding standard) implicit type conversions. The programmer must consider the context and explicitly perform any type conversions necessary.
Can you compare a float and an integer? Sure. But the problem you will run into is precision. On most C/C++ implementations, float and int have the same size (4 bytes) and wildly different precision levels. Neither type can hold all values of the other type. Since one type cannot be converted to the other without loss of precision, and the types cannot be compared natively, doing a comparison without considering another type will result in precision loss in some scenarios.
What you can do to avoid precision loss is to convert both types to a type which has enough precision to represent all values of float and int. On most systems, double will do just that. So the following usually does a non-lossy comparison
float f = getSomeFloat();
int i = getSomeInt();
if ( (double)i == (double)f ) {
...
}
For an assignment, the LHS defines the precision of the result.
So if your LHS is int and your RHS is float, this results in loss of precision.
Also take a look at the floating-point-related C FAQ.
Yes, you can compare them, you can do math on them without terribly much regard for which is which, in most cases. But only most. The big bugaboo is that you can check for f<i etc. but should not check for f==i. An integer and a float that 'should' be identical in value are not necessarily identical.
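Here is a sketch of how the == bugaboo can bite (it runs the other way too: distinct values can compare equal, because the int is rounded when converted to float; this assumes no excess precision, i.e. FLT_EVAL_METHOD == 0):
#include <stdio.h>

int main(void)
{
    int i = 16777217;       /* 2^24 + 1: not representable as a float */
    float f = 16777216.0f;  /* 2^24 */
    /* i is converted to float for the comparison and rounds to 16777216.0f */
    printf("%d\n", f == i); /* prints 1 even though the values differ */
    return 0;
}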
Yeah, it'll work fine. Specifically, the int will be converted to float for the purposes of the comparison. In the second one, the float will be implicitly converted back to int, but it should be fine otherwise.
Yes, and sometimes it'll do exactly what you expect.
As the others have pointed out, comparing, e.g., 1.0 == 1 will work out, because the integer 1 is type-cast to double (not float) before the comparison.
However, other comparisons may not.
About that: the notation 1.0 is of type double, so the comparison is made in double by the type promotion rules mentioned before. 1.f or 1.0f is of type float, and the comparison would then have been made in float. That would have worked as well, since, as we said, the first 2^23 integers are representable in a float.
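A sketch that makes the difference visible (2^24 + 1 is exactly representable as a double but not as a float; assumes FLT_EVAL_METHOD == 0):
#include <stdio.h>

int main(void)
{
    int i = 16777217;                 /* 2^24 + 1 */
    printf("%d\n", i == 16777216.0);  /* 0: the comparison is done exactly, in double */
    printf("%d\n", i == 16777216.0f); /* 1: i rounds to 16777216.0f when converted */
    return 0;
}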