Comparison of double and float - Implicit casting - c

Assume we have the variables double d and float f in the C programming language.
As far as I understand, the expression d == (float) d will not be true for all double values, since converting to float rounds the value to a representable float and hence loses precision.
On the other hand, f == (double) f should be true for all float values (except for NaN, because NaN != NaN), since we aren't losing anything (just extending the mantissa with zeros).
I have read that when comparing a float to a double, the float is implicitly converted to double: https://en.cppreference.com/w/c/language/conversion#Usual_arithmetic_conversions. Is this implicit conversion exact for all values (including infinity and NaN)?
I am aware that this is a pretty straightforward question; I have played with it for a while, but it would be great if someone could confirm this. The first part is already answered in other posts, but I haven't found answers for the second part of the question.

In a C implementation that conforms to the C standard, f == (double) f evaluates to true for all float values of f other than NaNs. (For a NaN, f == f is false.) This is true because, in f == (double) f, the left operand is a float, so it is automatically converted to double, and the expression is then equivalent to (double) f == (double) f, and so is inherently true.
The C standard allows implementations to evaluate floating-point expressions with more precision than the nominal types of the operands. However, excess precision would have no effect on cast operators (which are required to discard excess precision) or the == operator. So (double) f == (double) f is not affected by this, and its computed value is the same as its mathematical value.
You might be interested in the result of f == (float) (double) f. In this, since both operands of == have type float, there is no automatic conversion to double. You could ask whether the cast conversion to double introduces some change, and then converting back to float could produce a different value. It cannot.
To see that it cannot, consider if f is infinity. Then (double) f is infinity, and so is (float) (double) f, so the result is a comparison of infinity to infinity, which evaluates to true. (This also holds for negative infinity.) If f is not infinity or a NaN, it is a finite value.
Per C 2018 6.2.5 10, “The set of values of the type float is a subset of the set of values of the type double;…” Therefore, every value representable in float is representable in double, so the conversion to double does not change the value, and neither does the conversion back to float. Therefore, f == (float) (double) f evaluates to true for all float values of f other than NaN.
Note that while you cannot determine whether two NaNs are identical using ==, you could compare the bytes in their representations using memcmp. In this case, conversion to double and back to float is not required to preserve any information in the NaN object other than that it is a NaN; any payload information may be lost.

Yes. Casting to a greater precision will not produce incorrect values, and it is true that comparing a float to a double will implicitly cast the float into a double.

Related

Why is the statement "f == (float)(double)f;" wrong?

I recently attended a System Programming lecture, and my professor told me that f == (float)(double) f is wrong, which I cannot understand.
I know that a double loses data when converted to float, but I believe the loss happens only if the number stored in the double cannot be represented in float.
Shouldn't it be true, just as x == (int)(double)x; is true?
I'm so sorry that I didn't make my question clear.
The question is not about declaration, but about double type conversion.
I hope you don't lose your precious time because of my fault.
Assuming IEC 60559, the result of f == (float)(double) f depends on the type of f.
Further assuming f is a float, then there's nothing "wrong" about the expression - it will evaluate to true (unless f held NaN, in which case the expression will evaluate to false).
On the other hand, x == (int)(double)x (assuming x is an int) is (potentially) problematic, since a double-precision IEC 60559 floating-point value has only 53 bits for the significand [1], which cannot represent all possible values of an int if int uses more than 53 bits for its value on your platform (admittedly rare). So it will evaluate to true on platforms where ints are 32-bit (using 31 bits for the value), and might evaluate to false, depending on the value, on platforms where ints are 64-bit (using 63 bits for the value).
Relevant quotes from the C standard (6.3.1.4 and 6.3.1.5) :
When a value of integer type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged.
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
When a value of real floating type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged.
[1] A double-precision IEC 60559 floating-point value consists of 1 bit for the sign, 11 bits for the exponent, and 53 bits for the significand (of which 1 is implied and not stored), totaling 64 (stored) bits.
Taking the question as posed in the title literally,
Why is the statement “f == (float)(double)f;” wrong?
the statement is "wrong" not in any way related to the representation of floating point values but because it is trivially optimized away by any compiler and thus you might as well have saved the electrons used to store it. It is exactly equivalent to the statement
1;
or, if you like, to the statement (from the original question)
x == (int)(double)x;
(which has exactly the same effect as that in the title, regardless of the available precision of the types int, float, and double, i.e. none whatsoever).
Programming being somewhat concerned with precision you should perhaps take note of the difference between a statement and an expression. An expression has a value which might be true or false or something else, but when you add a semicolon (as you did in the question) it becomes a statement (as you called it in the question) and in the absence of side effects the compiler is free to throw it away.
NaNs are retained through float => double => float, but they do not equal themselves.
#include <math.h>
#include <stdio.h>
int main(void) {
float f = HUGE_VALF;
printf("%d\n", f == (float)(double) f);
f = NAN;
printf("%d\n", f == (float)(double) f);
printf("%d\n", f == f);
}
Prints
1
0
0

Can an int value be added to a float value?

/**Program for internal typecasting of the compiler**/
#include<stdio.h>
int main(void)
{
float b = 0;
// The Second operand is a integer value which gets added to first operand
// which is of float type. Will the second operand be typecasted to float?
b = (float)15/2 + 15/2;
printf("b is %f\n",b);
return 0;
}
OUTPUT : b is 14.500000
Yes, an integral value can be added to a float value.
When the basic math operations (+, -, *, /) are given one operand of type float and one of type int, the int is converted to float first.
So 15.0f + 2 will convert 2 to float (i.e. to 2.0f) and the result is 17.0f.
In your expression (float)15/2 + 15/2, since / has higher precedence than +, the effect will be the same as computing ((float)15/2) + (15/2).
(float)15/2 explicitly converts 15 to float and therefore implicitly converts 2 to float, yielding the final result of division as 7.5f.
However, 15/2 does an integer division, so produces the result 7 (there is no implicit conversion to float here).
Since (float)15/2 has been computed as a float, the value 7 is then converted to float before addition. The result will therefore be 14.5f.
Note: floating point types are also characterised by finite precision and rounding error that affects operations. I've ignored that in the above (and it is unlikely to have a notable effect with the particular example anyway).
Note 2: Old versions of C (before the C89/90 standard) actually converted float operands to double in expressions (and therefore had to convert values of type double back to float, when storing the result in a variable of type float). Thankfully the C89/90 standard fixed that.
Rule of thumb: When doing an arithmetic calculation between two different built-in types, the "smaller" type will be converted into the "larger" type.
double > float > long long (C99) > long > int > short > char.
b = (float)15/2 + 15/2;
Here the first part, (float)15/2 is equivalent to 15.0f / 2. Because an operation involving a "larger" type and a "smaller" type will yield a result in the "larger" type, (float)15/2 is 7.500000, or 7.5f.
When it comes to 15/2, since both operands are integers, the operation is an integer division. The fractional part is therefore discarded, and the result is the int 7.
So the expression is calculated into
b = 7.5f + 7;
No doubt you'll have 14.500000 as the final result, because it's exactly 14.5f.
b = (float)15/2 + 15/2;
The first one ((float)15/2) will work fine. The second one will also work, but it is an integer division, so the fractional part is discarded and you lose precision. Like:
b = (float)15/2 + 15/2;
b = 7.500000f + 7
b = 14.500000
It's worth asking: if an integer value could not be added to floating-point value, what would the symptom be?
Compiler issues error or warning message.
Something gets truncated; you don't get the result you want.
Undefined behavior: you might or might not get the result you want, and the compiler might or might not warn you about it.
But in fact none of these things happen. When you add an integer to a floating-point value, the compiler automatically converts the integer to a floating-point value so it can do the addition that way, and this is perfectly well defined. For example, if you have the code
double d = 7.5;
int i = 7;
double result = d + i;
the compiler interprets this just as if you had written
double result = d + (double)i;
And it works this way for just about all operations: the same logic is applied when you subtract, multiply, or divide a floating-point value and an integer.
And it works this way for just about all types. If you add a long int and an int, the plain int automatically gets converted to a long.
As a general rule (and I really can't think of too many exceptions), the compiler always wants to do arithmetic on two values of the same type. So whenever you have two values of different type, the compiler will just about always convert one of them for you. The full set of rules for how it does this are rather elaborate, but they're all supposed to make sense, and do what you want. The full set of rules is called the usual arithmetic conversions, and if you do a Google search on that phrase you'll find lots of explanations.
One case that does not necessarily do what you want is when the two variables are not different types. In particular, if the two variables are both integers, and the operation you're doing is division, the compiler doesn't have to convert anything: it divides the integer by the integer, discarding any remainder, and gives you an integer result. So if you have
int i = 1;
int j = 2;
int k = i / j;
then k ends up containing 0. And if you have
double d = i / j;
then d ends up containing 0 also, because the compiler follows exactly the same rules when performing the division; it doesn't "peek outside" to see that it's going to need a floating-point result.
P.S. I said, "As a general rule, the compiler always wants to do arithmetic on two values of the same type", and I said I couldn't think of too many exceptions. But if you're curious, two exceptions are the << and >> operators. If you have x << y, where x is a long int and y is a plain int, the compiler does not have to convert y to a long int first.

C don't get the right result

I need the result of this variable in a program, but I don't understand why I can't get the right result.
double r = pow((3/2), 2) * 0.0001;
printf("%f", r);
The problem is integer division, where the fractional part (remainder) is discarded
Try:
double r = pow((3.0/2.0), 2) * 0.0001;
The first argument of pow() is expected to be a double. Because the ratio 3/2 uses integer operands, the value passed as that argument is 1. With floating-point operands, the division retains the fractional part and yields 1.5, the value the function was meant to receive.
(3/2) involves two integers, so it's integer division, with the result 1. What you want is floating point (double) division, so coerce the division to use doubles by writing it as (3.0/2.0)

C Floating point zero comparison

Will the following code, with nothing in between the lines, always produce a value of true for the boolean b?
double d = 0.0;
bool b = (d == 0.0);
I'm using g++ version 4.8.1.
Assuming IEEE-754 (and probably most floating point representations), this is correct as 0.0 is representable exactly in all IEEE-754 formats.
Now if we take another literal that is not representable exactly in IEEE-754 binary formats, like 0.1:
double d = 0.1;
bool b = (d == 0.1);
This may leave the value false in b!
The implementation has the right to use for example a double precision for d and a greater precision for the comparison with the literal.
(C99, 5.2.4.2.2p8) "Except for assignment and cast (which remove all extra range and precision), the values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type."

Type of literal operands

I don't think most compilers care if, for instance, you don't append the f to a variable of float type. But just because I want to be as explicit and accurate as I can, I want to express the correct type.
What is the type of the result of two literal operands of different types, or does it depend on the circumstances? e.g.:
int i=1.0f/1;
float f=1.0f/1;
The compiler wouldn't complain in either of these instances. Is it because of its tolerant view of literal types, or because the type of the result of the operation is always converted according to the context?
First, compilers do care about the f suffix. 1.0 is a double. 1.0f is a float. For instance:
printf("%.15f %.15f\n", 0.1f, 0.1);
produces 0.100000001490116 0.100000000000000 (note that the second number is not the real 0.1 any more than the first, it is just closer to it).
This matters not only to determine what numbers can and cannot be represented but also to determine the type of the operations this constant is involved in. For instance:
printf("%.15f %.15f\n", 1.0f/10.0f, 1.0/10.0);
produces the same output as before. What is important to notice here is that 1 and 10 are both representable exactly as float as well as as double. What we are seeing is not rounding taking place at the literal level, but the type of the operation being decided from the type of the operands.
Compilers are not being lenient about your examples. They are strictly applying rules that you can find in the relevant standard, especially sections 6.3.1.4 and 6.3.1.5.
The first one divides a float by an int, which always results in a float (as does dividing an int by a float). This float result is then converted into an int to be stored in i.
The second one takes the float that resulted from the operation, and assigns it to a float variable, preserving the floatness.
There are two separate issues here: How literals with or without suffixes are interpreted by the compiler, and the result of operations performed on mixed numeric types.
The type of literals is important in other contexts, for example (assuming f is a float):
f = f + 1.0; // Convert `f` to double, do a double-precision add, and cast the result back.
f = f + 1.0f; // Single-precision addition.
Note that some compilers provide a relaxed mode, where both of them will generate the same code.
