Ignoring why I would want to do this, the 754 IEEE fp standard doesn't define the behavior for the following:
float h = NAN;
printf("%x %d\n", (int)h, (int)h);
Gives: 80000000 -2147483648
Basically, regardless of what value of NAN I give, it outputs 80000000 (hex) or -2147483648 (dec). Is there a reason for this and/or is this correct behavior? If so, how come?
The way I'm giving it different values of NaN are here:
How can I manually set the bit value of a float that equates to NaN?
So basically, are there cases where the payload of the NaN affects the output of the cast?
Thanks!
The result of a cast of a floating point number to an integer is undefined/unspecified for values not in the range of the integer variable (±1 for truncation).
Clause 6.3.1.4:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
If the implementation defines __STDC_IEC_559__, then for conversions from a floating-point type to an integer type other than _Bool:
if the floating value is infinite or NaN or if the integral part of the floating value exceeds the range of the integer type, then the "invalid" floating-point exception is raised and the resulting value is unspecified.
(Annex F [normative], point 4.)
If the implementation doesn't define __STDC_IEC_559__, then all bets are off.
There is a reason for this behavior, but it is not something you should usually rely on.
As you note, IEEE-754 does not specify what happens when you convert a floating-point NaN to an integer, except that it should raise an invalid operation exception, which your compiler probably ignores. The C standard says the behavior is undefined, which means not only do you not know what integer result you will get, you do not know what your program will do at all; the standard allows the program to abort or get crazy results or do anything. You probably executed this program on an Intel processor, and your compiler probably did the conversion using one of the built-in instructions. Intel specifies instruction behavior very carefully, and the behavior for converting a floating-point NaN to a 32-bit integer is to return 0x80000000, regardless of the payload of the NaN, which is what you observed.
Because Intel specifies the instruction behavior, you can rely on it if you know the instruction used. However, since the compiler does not provide such guarantees to you, you cannot rely on this instruction being used.
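If the converted value matters, the portable route is to test the float before casting instead of relying on what one instruction happens to return. A minimal sketch, assuming a two's-complement int; the helper name float_to_int_checked is hypothetical and not part of any standard library:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: converts f to int only when the truncated value is
   representable. Returns false for NaN, infinities, and out-of-range values,
   leaving *out untouched. */
bool float_to_int_checked(float f, int *out)
{
    /* (float)INT_MIN is exact because it is a power of two; its negation
       equals INT_MAX + 1. Assumes a two's-complement int. */
    const float lo = (float)INT_MIN;
    const float hi = -(float)INT_MIN;

    /* Every ordered comparison with NaN is false, so NaN fails this test. */
    if (f >= lo && f < hi) {
        *out = (int)f;   /* truncates toward zero; well defined here */
        return true;
    }
    return false;
}
```

The range test is slightly conservative on platforms with a narrow int, but it never lets an undefined conversion through.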
First, a NaN is any value that the IEEE standard does not consider a number.
So it can be several things. In the compiler I work with there are NAN and -NAN, so it's not about only one value.
Second, C provides isnan (in <math.h> since C99), and compilers often add their own variants, so the programmer doesn't have to deal with the bits himself. To summarize, I don't think peeking at the value makes any difference. You might peek at the value to see its IEEE construction, such as sign, mantissa and exponent, but, again, each compiler provides its own functions (or, better said, library) to deal with it.
I do have more to say about your testing, however.
float h = NAN;
printf("%x %d\n", (int)h, (int)h);
The cast you did truncates the float to convert it to an int. If you want the integer whose bits match the float's representation, do the following:
printf("%x %d\n", *(int *)&h, *(int *)&h);
That is, you take the address of the float, then refer to it as a pointer to int, and eventually take the int value. This way the bit representation is preserved.
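The pointer cast above conflicts with C's aliasing rules, so a compiler is allowed to miscompile it. A byte copy gives the same bits without that risk. A minimal sketch, assuming float and uint32_t have the same size (true for IEEE-754 binary32); float_bits is a name chosen for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of a float as a 32-bit unsigned integer.
   Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* byte copy, so no aliasing violation */
    return u;
}
```

The result can be printed with printf("%08" PRIx32 "\n", float_bits(h)); using PRIx32 from <inttypes.h>.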
Related
I'm wondering if there are any circumstances where code like this will be incorrect due to floating point inaccuracies:
#include <math.h>
// other code ...
float f = /* random but not NAN or INF */;
int i = (int)floorf(f);
// OR
int i = (int)ceilf(f);
Are there any guarantees about these values? If I have a well-formed f (not NAN or INF), will i always be the integer that it rounds to, whichever way that is?
I can imagine a situation where (with a bad spec/implementation) the value you get is just below the true value rather than just above/equal, but is actually closer. Then when you truncate, it rounds down to the next lower value.
It doesn't seem possible to me, given that integers can be exact values in IEEE 754 floating point, but I don't know if float is guaranteed to follow that standard.
The C standard is sloppy in specifying floating-point behavior, so it is technically not completely specified that floorf(f) produces the correct floor of f or that ceilf(f) produces the correct ceiling of f.
Nonetheless, no C implementations I am aware of get this wrong.
If, instead of floorf(some variable), you have floorf(some expression), there are C implementations that may evaluate the expression in diverse ways that will not get the same result as if IEEE-754 arithmetic were used throughout.
If the C implementation defines __STDC_IEC_559__, it should evaluate the expressions using IEEE-754 arithmetic.
Nonetheless, int i = (int)floorf(f); is of course not guaranteed to set i to the floor of f if the floor of f is out of range of int.
I am currently working on an embedded microcontroller and use a custom printf routine. The toolchain is the GCC Toolchain for the AVR32 architecture.
My problem is that the CPU enters an exception condition when vsnprintf or a similar function is called for the second time.
From support, I received the answer that:
We could not find any obvious reason for such behavior. However, creating a float overflow condition by writing byte by byte is not safe. We cannot ensure the value generated by this and it is recommended to check using “FLT_MAX”.
Now I am wondering: What are "illegal" float values? Shouldn't all bit combinations represent at least some value? If relevant: sizeof(float) is 4 bytes.
Summary
I suggest you print the bits of floating-point values as if they were a hexadecimal integer, as shown in code below, so that you can analyze those bits to see if they contain the values you are attempting to compute or have been modified improperly due to some bug.
Details
The AVR32CU Technical Reference Manual says “The floating point hardware conforms to the requirements of the C standard, which is based on the IEEE 754 floating point standard.” The latter clause is false; the C standard is not based on IEEE 754. The C standard does specify bindings to IEEE 754 (via the name IEC 60559) as an optional feature of C implementations. I will presume that the model of AVR32 CPU you are using conforms to IEEE 754 to some degree.
There are no “illegal” values in IEEE 754. There are values that do not represent numbers, and some of those values are intended to cause exceptions. Such a value is called a NaN (for “Not a Number”). There are quiet NaNs and signaling NaNs. Quiet NaNs are intended to pass through operations silently, producing a NaN result. E.g., 3 + NaN should produce NaN. Signaling NaNs are intended to cause exceptions, which may cause changes to program control (such as signals or program aborts).
The technical reference manual cited above also says “Signalling NaN are not provided, all NaN are non-signalling (quiet).”
A good vsnprintf routine should accept quiet NaN values for printing and should format them by producing a string such as “NaN”. When a signaling NaN is passed for formatting, I suppose it might be reasonable either to format it or to produce an exception.
I expect the message you received from support is suggesting that your software created some kind of NaN, and that vsnprintf cannot handle these. From the phrasing, I think their response is speculative.
If you are creating floating-point values by assembling bytes, then you may have created a NaN when you did not intend to, if there was some error in your software. I suggest that you debug this by using vsnprintf to print the bytes of the floating-point value instead of printing it with a floating-point format specifier.
If the GCC version you are using has the usual features of GCC, and the unsigned int in your implementation is 32 bits, you can format the bits of a 32-bit float value x as a hexadecimal value using:
vsnprintf(Buffer, BufferLength, "0x%x",
(union { float f; unsigned int u; }) {x} .u);
The second line uses a compound literal to put the value x into a union and reinterpret its bytes as an unsigned int. (This is a supported way in C to reinterpret the bytes of an object. Many people use pointer aliasing, which works in GCC if the appropriate flag is used, but it is not generally supported by the C standard. Another supported method is to copy the bytes, as with unsigned int u; memcpy(&u, &x, sizeof u);.)
Once you see what the bits in the float are, you can interpret them manually from information in the IEEE 754 standard or using an online analyzer. (Select the “hexadecimal” button to input hexadecimal values to be interpreted.)
In an IEEE-754 32-bit binary floating-point object, the value is a NaN if:
Bit 31 has any value. (It is the sign bit, irrelevant for recognizing a NaN.)
Bits 30 to 23 are all ones.
Bits 22 to 0 are not all zeros.
(If Bits 30 to 23 are all ones but bits 22 to 0 are all zeroes, the value is an infinity. This is not illegal but might also cause a low-quality vsnprintf to generate an exception.)
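The bit rules above translate directly into a small classifier that can be run on the hexadecimal value printed earlier. This is a sketch under the assumption of the IEEE-754 binary32 layout; the enum and function names are made up for illustration:

```c
#include <stdint.h>

enum float_class { FC_ORDINARY, FC_INFINITY, FC_NAN };

/* Classifies a raw 32-bit pattern according to the IEEE-754 binary32 layout. */
enum float_class classify_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* bits 30 to 23 */
    uint32_t fraction = bits & 0x7FFFFFu;       /* bits 22 to 0  */

    if (exponent != 0xFFu)
        return FC_ORDINARY;   /* zero, subnormal, or normal number */

    /* Exponent is all ones; the sign bit (bit 31) is ignored. */
    return fraction == 0 ? FC_INFINITY : FC_NAN;
}
```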
I haven't worked on AVR32, but the limits of float (single-precision arithmetic) are an important topic in numerical methods in general. The maximum finite float is:
FLT_MAX = 3.40282e+38
but float also has limited resolution. The closer to zero you are, the finer the spacing between adjacent representable values.
For example:
the spacing between adjacent floats in [1,2) is 1.19209e-07 (that is 2^-23), also known as macheps (machine epsilon, FLT_EPSILON from float.h)
the spacing in [2,4) is 2 * 1.19209e-07 = 2 * 2^-23
It also works in the other direction:
the spacing in [1/2,1) is 2^-24.
Why does this happen?
Think of a number as beforedot.afterdot.
The larger the number is, the more bits are needed for the beforedot part, leaving fewer for the afterdot part, and symmetrically for smaller numbers.
In conclusion:
Min positive normal float: FLT_MIN = 1.17549e-38,
Max for float: 3.40282e+38.
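The spacing figures can be checked by stepping to the adjacent float and subtracting. A minimal sketch, assuming IEEE-754 binary32 floats, where adding 1 to the bit pattern of a positive finite value below FLT_MAX gives the next representable value; gap_above is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Gap between x and the next larger float, for positive finite x below
   FLT_MAX. Assumes IEEE-754 binary32. */
float gap_above(float x)
{
    uint32_t bits;
    float next;

    memcpy(&bits, &x, sizeof bits);
    bits += 1;                          /* step to the adjacent float */
    memcpy(&next, &bits, sizeof next);
    return next - x;                    /* exact for neighbouring values */
}
```

Under these assumptions gap_above(1.0f) equals FLT_EPSILON, and the gap doubles each time x crosses a power of two.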
Are these lines the same?
float a = 2.0f;
and
float a = 2.000000f;
Yes, they are. No matter how you write the literal, when the code is compiled the number is converted to a single binary representation. There's only one way of representing 2 in the IEEE 754 binary32 format used in modern computers to represent float numbers.
The only thing the C99 standard has to say on the matter is this (section 6.4.4.2):
For decimal floating constants ... the result is either
the nearest representable value, or the larger or smaller representable value immediately
adjacent to the nearest representable value, chosen in an implementation-defined manner.
That bit about "implementation-defined" means that technically an implementation could choose to do something different in each case. Although in practice, nothing weird is going to happen for a value like 2.
It's important to bear in mind that the C standards don't require IEEE-754.
Yes, they are the same.
Simple check:
http://codepad.org/FOQsufB4
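#include <stdio.h>
int main(void) {
    /* Both literals denote the same float value, so the comparison is true. */
    printf("%d", 2.0f == 2.000000f);
    return 0;
}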
^ Will output 1 (true)
Yes, sure, it is the same. Extra zeros on the right are ignored, just like zeros on the left.
When a double has an 'exact' integer value, like so:
double x = 1.0;
double y = 123123;
double z = -4.000000;
Is it guaranteed that it will round properly to 1, 123123, and -4 when cast to an integer type via (int)x, (int)y, (int)z? (And not truncate to 0, 123122 or -5 because of floating point weirdness.) I ask because this page (which is about floating point in Lua, a language that only has doubles as its numeric type by default) talks about how integer operations with doubles are exact according to IEEE 754, but I'm not sure whether, when calling C functions with integer-type parameters, I need to worry about rounding doubles manually, or whether it is taken care of when the doubles have exact integer values.
Yes, if the integer value fits in an int.
A double could represent integer values that are out of range for your int type. For example, 123123.0 cannot be converted to an int if your int type has only 16 bits.
It's also not guaranteed that a double can represent every value a particular integer type can represent. An IEEE 754 double has a 53-bit significand (52 bits stored explicitly). If your long has 64 bits, then converting a very large long to double and back might not give the same value.
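The round-trip loss is easy to observe. A minimal sketch, assuming double is IEEE-754 binary64; the function name is hypothetical, and the magnitude limit exists only so that the conversion back to an integer can never itself be out of range:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when v survives a round trip through double unchanged.
   Values with |v| > 2^62 are rejected up front so that converting back to
   int64_t is always in range (an out-of-range conversion is undefined). */
bool survives_double_round_trip(int64_t v)
{
    const int64_t limit = INT64_C(1) << 62;
    if (v > limit || v < -limit)
        return false;               /* outside the range this sketch handles */

    double d = (double)v;           /* rounds if v needs more than 53 bits */
    return (int64_t)d == v;
}
```

With these assumptions, 2^53 survives the trip while 2^53 + 1 does not, because the latter needs 54 significant bits.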
As Daniel Fischer stated, if the value of the integer part of the double (in your case, the double exactly) is representable in the type you are converting to, the result is exact. If the value is out of range of the destination type, the behavior is undefined. “Undefined” means the standard allows any behavior: You might get the closest representable number, you might get zero, you might get an exception, or the computer might explode. (Note: While the C standard permits your computer to explode, or even to destroy the universe, it is likely the manufacturer’s specifications impose a stricter limit on the behavior.)
It will do it correctly if it really is a true integer, which you might be assured of in some contexts. But if the value is the result of previous floating-point calculations, you could not easily know that.
Why not explicitly calculate the value with the floor() function, as in long value = floor(x + 0.5)? Or, even better, use the modf() function to check for an integer value.
Yes, it will hold the exact value you give it, because you entered it in code. Calculations may sometimes yield 0.99999999999, for example, but that is due to error in calculating with doubles, not their storage capacity.
I have the following C program:
#include <stdio.h>
int main()
{
double x=0;
double y=0/x;
if (y==1)
printf("y=1\n");
else
printf("y=%f\n",y);
if (y!=1)
printf("y!=1\n");
else
printf("y=%f\n",y);
return 0;
}
The output I get is
y=nan
y!=1
But when I change the line
double x=0;
to
int x=0;
the output becomes
Floating point exception
Can anyone explain why?
You're causing the division 0/0 with integer arithmetic (which is invalid, and produces the exception you see). Regardless of the type of y, what's evaluated first is 0/x.
When x is declared to be a double, the zero is converted to a double as well, and the operation is performed using floating-point arithmetic.
When x is declared to be an int, you are dividing one int 0 by another, and the result is not valid.
Under IEEE 754, NaN is produced when an invalid operation is performed on floating-point numbers (e.g. 0/0, ∞×0, or sqrt(−1)).
There are actually two kinds of NaNs, signaling and quiet. Using a
signaling NaN in any arithmetic operation (including numerical
comparisons) will cause an "invalid" exception. Using a quiet NaN
merely causes the result to be NaN too.
The representation of NaNs specified by the standard has some
unspecified bits that could be used to encode the type of error; but
there is no standard for that encoding. In theory, signaling NaNs
could be used by a runtime system to extend the floating-point numbers
with other special values, without slowing down the computations with
ordinary values. Such extensions do not seem to be common, though.
Also, Wikipedia says this about integer division by zero:
Integer division by zero is usually handled differently from floating
point since there is no integer representation for the result. Some
processors generate an exception when an attempt is made to divide an
integer by zero, although others will simply continue and generate an
incorrect result for the division. The result depends on how division
is implemented, and can either be zero, or sometimes the largest
possible integer.
There's a special bit-pattern in IEEE 754 which indicates NaN as the result of invalid floating-point operations such as 0/0.
However there's no such representation when using integer arithmetic, so the system has to throw an exception instead of returning NaN.
Check the min and max values of an integer data type. You will see that an undefined or NaN result is not in its range.
And read "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
Integer division by 0 is illegal and is not handled. Float values, on the other hand, are handled in C using NaN. The following, however, would work.
int x=0;
double y = 0.0 / x;
If you divide an int by an int, you cannot divide by 0.
0/0 in doubles is NaN.
int x=0;
double y=0/x; // 0/0 evaluated as ints, and only after that converted to double. You can use
double z=0.0/x; // or
double t=0/(double)x; // to avoid the exception and get NaN
Floating point is inherently modeling the reals to limited precision. There are only a finite number of bit-patterns, but an infinite (continuous!) number of reals. It does its best of course, returning the closest representable real to the exact inputs it is given. Answers that are too small to be directly represented are instead represented by zero. Dividing by zero is an error in the real numbers. In floating point, however, because zero can arise from these very small answers, it can be useful to consider x/0.0 (for positive x) to be "positive infinity" or "too big to be represented". This is no longer useful for x = 0.0.
The best we could say is that dividing zero by zero is really "dividing something small that can't be told apart from zero by something small that can't be told apart from zero". What is the answer to this? Well, there is no answer for the exact case of 0/0, and there is no good way of treating it inexactly. It would depend on the relative magnitudes, and so the processor basically shrugs and says "I lost all precision -- any result I gave you would be misleading", by returning Not a Number.
In contrast, when doing an integer divide by zero, the divisor really can only mean precisely zero. There's no possible way to give a consistent meaning to it, so when your code asks for the answer, it really is doing something illegitimate.
(It's an integer division in the second case, but not the first because of the promotion rules of C. 0 can be taken as an integer literal, and as both sides are integers, the division is integer division. In the first case, the fact that x is a double causes the dividend to be promoted to double. If you replace the 0 by 0.0, it will be a floating-point division, no matter the type of x.)
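The two behaviors can be placed side by side. A minimal sketch, assuming IEEE-754 floating-point arithmetic; both helper names are hypothetical. The first refuses the integer divisions that C leaves undefined, and the second forces a floating-point division by converting the dividend first:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: integer division that refuses the two cases C leaves
   undefined, a zero divisor and INT_MIN / -1 (overflow). */
bool int_div_checked(int a, int b, int *quotient)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *quotient = a / b;
    return true;
}

/* Converting the dividend to double makes the division floating-point
   (b is converted too), so with IEEE-754 arithmetic 0/0 yields NaN
   instead of trapping. */
double div_as_double(int a, int b)
{
    return (double)a / b;
}
```

A NaN result from div_as_double can be detected with isnan from <math.h>, or by the fact that it compares unequal to itself.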