Negative float values casting to unsigned int on Arm and intel - c

I get different results on Arm and Intel when I run the code below:
#include <stdio.h>
int main(void)
{
float test = -6.25f;
unsigned int result_1;
result_1= (unsigned int)test ;
printf("test result_1: %x \n", result_1);
return 0;
}
The output on Arm is 0, and
the output on Intel is 4294967291.
How can I force the Intel compiler to produce 0, as the Arm compiler does?

From https://en.cppreference.com/w/c/language/conversion :
Real floating-integer conversions
A finite value of any real floating type can be implicitly converted to any integer type. Except where covered by boolean conversion above, the rules are:
The fractional part is discarded (truncated towards zero).
If the resulting value can be represented by the target type, that value is used
otherwise, the behavior is undefined
Your code does:
float test = -6.25f;
(unsigned int)test;
The type unsigned int cannot represent the value -6. Only float values greater than -1 (and less than UINT_MAX + 1) can be converted to an unsigned type, so your code has undefined behavior.
How can I force the Intel compiler to produce 0, as the Arm compiler does?
Check if the value is less than 0.
unsigned int result_1 = test < 0 ? 0 : (unsigned int)test; // still undefined if test is too large or NaN
If your compiler follows the Annex F addition to the C language, then according to https://port70.net/~nsz/c/c11/n1570.html#F.4
[...] if the integral part of the floating value exceeds the range of the integer type, then the "invalid" floating-point exception is raised and the resulting value is unspecified [...]
Even in that case the resulting value is unspecified, so it may differ between compilers, as you are seeing.

When converting a float to unsigned, code is subject to:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined. C17dr § 6.3.1.4 1
To avoid undefined behavior (UB) for that conversion for out-of-range values, code needs to test the float value first. The typical valid range for conversion is -0.999... to 4,294,967,295.999...
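#include <limits.h>
// Exactly UINT_MAX + 1 as a float: UINT_MAX/2 + 1 is a power of two, so the product is exact
#define UINT_MAX_PLUS1_FLT ((UINT_MAX/2 + 1)*2.0f)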
unsigned float_to_unsigned(float f) {
// Exclude values that are too small, too big or NAN
if (f > -1.0f && f < UINT_MAX_PLUS1_FLT) {
return (unsigned) f;
}
return 0; // Adjust as desired
}

Related

If 'float'<= INT_MAX is true, then why (int)'float' may trigger undefined behavior?

Sample code (t0.c):
#include <stdio.h>
#include <limits.h>
#define F 2147483600.0f
int main(void)
{
printf("F %f\n", F);
printf("INT_MAX %d\n", INT_MAX);
printf("F <= INT_MAX %d\n", F <= INT_MAX);
if ( F <= INT_MAX )
{
printf("(int)F %d\n", (int)F);
}
return 0;
}
Invocations:
$ gcc t0.c && ./a.exe
F 2147483648.000000
INT_MAX 2147483647
F <= INT_MAX 1
(int)F 2147483647
$ clang t0.c && ./a.exe
F 2147483648.000000
INT_MAX 2147483647
F <= INT_MAX 1
(int)F 0
Questions:
If F is printed as 2147483648.000000, then why is F <= INT_MAX true?
What is the correct way to avoid UB here?
UPD. Solution:
if ( lrintf(F) <= INT_MAX )
{
printf("(int)F %d\n", (int)F);
}
UPD2. Better solution:
if ( F <= nextafter(((float)INT_MAX) + 1.0f, -INFINITY) )
{
printf("(int)F %d\n", (int)F);
}
You're comparing a value of type int with a value of type float. The operands of the <= operator need to first be converted to a common type to evaluate the comparison.
This falls under the usual arithmetic conversions. In this case, the value of type int is converted to type float. And because the value in question (2147483647) cannot be represented exactly as a float, it results in the closest representable value, in this case 2147483648. This matches what the constant represented by the macro F converts to, so the comparison is true.
Regarding the cast of F to type int, because the integer part of F is outside the range of an int, this triggers undefined behavior.
Section 6.3.1.4 of the C standard dictates how these conversions from integer to floating point and from floating point to integer are performed:
1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded
(i.e., the value is truncated toward zero). If the value of
the integral part cannot be represented by the integer type, the
behavior is undefined.
2 When a value of integer type is converted to a real floating type, if the value being converted can be represented
exactly in the new type, it is unchanged. If the value being
converted is in the range of values that can be represented
but cannot be represented exactly, the result is either the nearest
higher or nearest lower representable value, chosen in an
implementation-defined manner. If the value being converted is
outside the range of values that can be represented, the
behavior is undefined. Results of some implicit conversions may
be represented in greater range and precision than that
required by the new type (see 6.3.1.8 and 6.8.6.4)
And section 6.3.1.8p1 dictates how the usual arithmetic conversions are performed:
First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain,
to a type whose corresponding real type is long double.
Otherwise, if the corresponding real type of either operand
is double, the other operand is converted, without change of type
domain, to a type whose corresponding real type is double.
Otherwise, if the corresponding real type of either operand
is float, the other operand is converted, without change of type
domain, to a type whose corresponding real type is float.
As for how to avoid undefined behavior in this case, if the constant F has no suffix i.e. 2147483600.0 then it has type double. This type can represent exactly any 32 bit integer value, so the given value is not rounded and can be stored in an int.
The root cause of your problem is the implicit conversion of INT_MAX to a float value in the F <= INT_MAX comparison. The float data type does not have enough precision to store the value 2147483647 exactly, and (as it happens) the value 2147483648 is stored instead†.
The clang-cl compiler warns about this:
warning : implicit conversion from 'int' to 'float' changes value from
2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion]
And, you can confirm this yourself by adding the following line to your code:
printf("(float)IMAX %f\n", (float)INT_MAX);
That line displays (float)IMAX 2147483648.000000 on my system (Windows 10, 64-bit, clang-cl in Visual Studio 2019).
† The actual value stored in the float in such cases is implementation-defined, as pointed out in the excellent answer by dbush.
If F is printed as 2147483648.000000, then why is F <= INT_MAX true?
If 'float'<= INT_MAX is true, then why (int)'float' may trigger undefined behavior?
#define F 2147483600.0f ... if ( F <= INT_MAX ) is an insufficient test as it is imprecise. The conversion of INT_MAX to float typically suffers rounding.
What is the correct way to avoid UB here?
To test if a float is convertible to an int without a problem, first review the spec:
When a finite value of standard floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined. C17dr
This means float values in the range [-2,147,483,648.999... to 2,147,483,647.999...] are acceptable, or in exact math, the open interval (INT_MIN - 1, INT_MAX + 1). Note the [] versus ().
There is no need for wider types.
Code needs to compare the range, as float, precisely.
(INT_MAX/2 + 1) * 2.0f is exact as INT_MAX is a Mersenne Number. *1
// Form INT_MAX_PLUS1 as a float
#define F_INT_MAX_PLUS1 ((INT_MAX/2 + 1) * 2.0f)
// With 2's complement, INT_MIN is exact as a float.
if (some_float < F_INT_MAX_PLUS1 && (some_float - INT_MIN) > -1.0f) {
int i = (int) some_float; // no problem
} else {
puts("Conversion problem");
}
Tip: form the test like above to also catch some_float as not-a-number.
*1 some_int_MAX may be a problem with UINT128_MAX or more due to limited float exponent range.

cast float to unsigned int in C with gcc

I am using gcc to test some simple casts between float to unsigned int.
The following piece of code gives the result 0.
const float maxFloat = 4294967295.0;
unsigned int a = (unsigned int) maxFloat;
printf("%u\n", a);
0 is printed (which I believe is very strange).
On the other hand the following piece of code:
const float maxFloat = 4294967295.0;
unsigned int a = (unsigned int) (signed int) maxFloat;
printf("%u\n", a);
prints 2147483648, which I believe is the correct result.
Why do I get two different results?
If you first do this:
printf("%f\n", maxFloat);
The output you'll get is this:
4294967296.000000
Assuming a float is implemented as an IEEE754 single precision floating point type, the value 4294967295.0 cannot be represented exactly by this type because there aren't enough bits of precision. The closest value it can store is 4294967296.0.
Assuming an int (and likewise unsigned int) is 32 bits, the value 4294967296.0 is outside the range of both of these types. Converting a floating point type to an integer type when the value cannot be represented in the given integer type invokes undefined behavior.
This is detailed in section 6.3.1.4 of the C standard which dictates conversion from floating point types to integer types:
1 When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e.,
the value is truncated toward zero). If the value of the integral part
cannot be represented by the integer type, the behavior is undefined.61)
...
61) The remaindering operation performed when a value of integer type
is converted to unsigned type need not be performed when a value of
real floating type is converted to unsigned type. Thus, the range of
portable real floating values is (−1, Utype_MAX+1).
The footnote in the above passage is referencing section 6.3.1.3, which details integer to integer conversions:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new
type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
The behavior you see in the first code snippet is consistent with what an out-of-range conversion to an unsigned type would give if the value in question were an integer. However, because the value being converted has a floating point type, the behavior is undefined.
Just because one implementation does this doesn't mean that all will. In fact, gcc gives a different result if you change the optimization settings.
For example, on my machine using gcc 5.4.0, given this code:
float n = 4294967296;
printf("n=%f\n", n);
unsigned int a = (unsigned int) n;
int b = (signed int) n;
unsigned int c = (unsigned int) (signed int) n;
printf("a=%u\n", a);
printf("b=%d\n", b);
printf("c=%u\n", c);
I get the following results with -O0:
n=4294967296.000000
a=0
b=-2147483648
c=2147483648
And this with -O1:
n=4294967296.000000
a=4294967295
b=2147483647
c=2147483647
If on the other hand n is defined as long or long long, you would always get this output:
n=4294967296
a=0
b=0
c=0
The conversion to unsigned is well defined by the C standard as cited above, and the conversion to signed is implementation-defined, which gcc defines as follows:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object
of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N
to be within range of the type; no signal is raised.
Assuming IEEE 754 floating point numbers, the number 4294967295.0 can't be stored exactly in a float. It will be stored as 4294967296.0 instead (which is 2^32).
Further assuming your unsigned int has 32 value bits, this is just one too large to fit in an unsigned int, so the result of the conversion is undefined according to the C standard -- 0 is a "reasonable" outcome.
In your second case, you have undefined behavior as well, and I have no theory what's happening here on the representation level. Fact is, the number is much too large for a 32 bit signed int (still assuming this is what your machine uses).
From this remark in your question:
prints 2147483648, which I believe is the correct result.
I assume you wanted to see the representation of your float in memory. Casting will convert the value, so that's not the way to see the representation. The following code would do:
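#include <stdio.h>
int main(void) {
const float maxFloat = 4294967295.0;
// An explicit cast is needed: float * does not convert implicitly to unsigned char *
const unsigned char *floatBytes = (const unsigned char *)&maxFloat;
for (size_t i = 0; i < sizeof maxFloat; ++i)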
{
printf("0x%02x ", floatBytes[i]);
}
puts("");
}

What happens when casting floating point types to unsigned integer types when the value would overflow?

I'm wondering what happens when casting from a floating point type to an unsigned integer type in C when the value can't be accurately represented by the integer type in question. Take for instance
void func(void)
{
float a = 1E10;
unsigned b = a;
}
The value of b I get on my system (with unsigned on my system being able to represent values from 0 to 2^32-1) is 1410065408. This seems sensible to me because it's simply the lowest order bits of the result of the cast.
I believe the behavior of operations such as these is undefined by the standard. Am I wrong? What can I expect in practice if I do things like this?
Also, what happens with signed types? If b is of type int, I get -2147483648, which doesn't really make sense to me.
What happens when casting floating point types to unsigned integer types when the value would overflow (?)
undefined behavior (UB)
In addition to @user694733's fine answer: to prevent undefined behavior caused by an out-of-range float to unsigned conversion, code can first test the float value.
Yet testing for the range is tricky, for unsigned types and especially for signed types. The detail is that all conversions and constants prior to the integer conversion must be exact. FP math near the limits needs to be exact too.
Examples:
Conversion to a 32-bit unsigned is valid for the range -0.999... to 4294967295.999....
Conversion to a 32-bit 2's complement signed is valid for the range -2147483648.999... to 2147483647.999....
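// Code uses FP constants that are exact powers of 2 to ensure their exact encoding.
#include <limits.h>   // UINT_MAX, INT_MAX, INT_MIN
#include <math.h>     // fabsf
#include <stdbool.h>  // bool
// Form a FP constant that is exactly UINT_MAX + 1
#define FLT_UINT_MAX_P1 ((UINT_MAX/2 + 1)*2.0f)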
bool convert_float_to_unsigned(unsigned *u, float f) {
if (f > -1.0f && f < FLT_UINT_MAX_P1) {
*u = (unsigned) f;
return true;
}
return false; // out of range
}
#define FLT_INT_MAX_P1 ((INT_MAX/2 + 1)*2.0f)
bool convert_float_to_int(int *i, float f) {
#if INT_MIN == -INT_MAX
// Rare non 2's complement integer
if (fabsf(f) < FLT_INT_MAX_P1) {
*i = (int) f;
return true;
}
#else
// Do not use f + 1 > INT_MIN as it may incur rounding
// Do not use f > INT_MIN - 1.0f as it may incur rounding
// f - INT_MIN is expected to be exact for values near the limit
if (f - INT_MIN > -1 && f < FLT_INT_MAX_P1) {
*i = (int) f;
return true;
}
#endif
return false; // out of range
}
Pedantic code would take additional steps to cope with the rare FLT_RADIX 10.
FLT_EVAL_METHOD, which allows float math to be calculated at higher precision, may play a role, yet so far I do not see it negatively affecting the above solution.
In both cases the value is out of range, so it's undefined behaviour.
6.3.1.4 Real floating and integer
When a finite value of real floating type is converted to an integer type other than _Bool,
the fractional part is discarded (i.e., the value is truncated toward zero). If the value of
the integral part cannot be represented by the integer type, the behavior is undefined. 61)
61) The remaindering operation performed when a value of integer type is converted to unsigned type
need not be performed when a value of real floating type is converted to unsigned type. Thus, the
range of portable real floating values is (−1, Utype_MAX+1).
To make this well-defined code, you should check that the value is within the representable range before doing the conversion.

Problems casting a double into an unsigned char

Why does casting a double 728.3 to an unsigned char produce zero? 728 is 0x2D8, so shouldn't w be 0xD8 (216)?
int w = (unsigned char)728.3;
int x = (int)728.3;
int y = (int)(unsigned char)728.3;
int z = (unsigned char)(int)728.3;
printf( "%i %i %i %i", w, x, y, z );
// prints 0 728 0 216
From the C standard 6.3.1.4p1:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
So, unless you have >=10 bit unsigned char, your code invokes undefined behaviour.
Note that the cast explicitly tells the compiler you know what you are doing, thus suppresses a warning.
Supposing that unsigned char has 8 value bits, as is nearly (but not completely) certain for your implementation, the behavior of converting the double value 728.3 to type unsigned char is undefined, as specified by paragraph 6.3.1.4/1 of the standard:
When a finite value of real floating type is converted to an integer
type other than _Bool, the fractional part is discarded (i.e., the
value is truncated toward zero). If the value of the integral part
cannot be represented by the integer type, the behavior is undefined.
This applies to both your w and your y. It does not apply to your x, and the rules covering conversions between integer values (i.e. your z) are different.
Basically, then, there is no answer at the C level for why you see the specific results you do, nor for why I see different ones when I run your code. The behavior is undefined; I can be thankful that it did not turn out to be an outpouring of nasal demons.

storing a big float into an integer (cast and no cast)

What does the standard (are there differences in the standards?) say about assigning a float number out of the range of an integer to this integer?
So what should happen here, assuming a 16-bit short to keep the numbers small (USHRT_MAX == 65535)?
float f = 100000.0f;
short s = f;
s = (short) f;
unsigned short us = f;
us = (unsigned short) f;
This is undefined behaviour (with no diagnostic required). See C11 6.3.1.4 (earlier standards had similar text):
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
So, assuming your system has USHRT_MAX as 65535, short s = f; and all subsequent lines cause undefined behaviour.

Resources