C - ceil/float rounding to int guarantees - c

I'm wondering if there are any circumstances where code like this will be incorrect due to floating point inaccuracies:
#include <math.h>
// other code ...
float f = /* random but not NAN or INF */;
int i = (int)floorf(f);
// OR
int i = (int)ceilf(f);
Are there any guarantees about these values? If I have a well-formed f (not NAN or INF) will i always be the integer that it rounds to, whichever way that is.
I can image a situation where (with a bad spec/implementation) the value you get is the value just below the true value rather than just above/equal but is actually closer. Then when you truncate it actually rounds down to the next lower value.
It doesn't seem possible to me given that integers can be exact values in ieee754 floating point but I don't know if float is guaranteed to be that standard

The C standard is sloppy in specifying floating-point behavior, so it is technically not completely specified that floorf(f) produces the correct floor of f or that ceilf(f) produces the correct ceiling of f.
Nonetheless, no C implementations I am aware of get this wrong.
If, instead of floorf(some variable), you have floorf(some expression), there are C implementations that may evaluate the expression in diverse ways that will not get the same result as if IEEE-754 arithmetic were used throughout.
If the C implementation defines __STDC_IEC_559__, it should evaluate the expressions using IEEE-754 arithmetic.
Nonetheless, int i = (int)floorf(f); is of course not guaranteed to set i to the floor of f if the floor of f is out of range of int.

Related

Facing 'invalid operands to binary expression ('float' and 'float')' while using C

While doing CS50 problem set 1 - Cash, I faced the following problem when I try to write my code. I have declared the variables to integer. Why is it still happening? Thanks a lot for the help.
"invalid operands to binary expression ('float' and 'float')"
#include <stdio.h>
#include <cs50.h>
#include <math.h>
int main(){
float owe_in_dollars;
float owe_in_cent;
int coin_count = 0;
do
{
owe_in_dollars = get_float("Change: ");
}while(owe_in_dollars<0);
owe_in_cent = (int)(owe_in_dollars*100);
if (owe_in_cent%(int)25 > 0){
coin_count++;
}
printf("%i", coin_count);
}
There are several issues with this code, but I think the particular problem which produces the compiler error is
if (owe_in_cent%(int)25 > 0){
owe_in_cent is a float. There is no reason for it to be floating point, since you have assigned it to an integer value. But you declared it float, so that's what it is. 25 is an int, so there's no point in casting it to an int, but with or without the cast, it will be converted to a float in order to do arithmetic with owe_in_cent, because all arithmetic operators require that there operands be of the same type. Search for "usual arithmetic conversions" for details, but the bottom line is that these automatic conversions are always integer → floating point, never floating point → integer.
Then the problem shows up, because the % operator requires its operands to be integers, not floating point. There is a math function which can compute a floating point modulus, but you really want integer arithmetic so your best bet is to make owe_in_cent an int rather than a float.
And actually, you really should get into the habit of using double for floating point values. float is very imprecise and, other than in video chips and embedded processors, there's no point in using so inexact a representation. It saves you nothing.
Finally, remember two important facts about floating point:
It cannot precisely represent fractions whose denominators are not powers of two. In other words, 5.25 has an exact representation, because .25 is one-quarter, which is a power of two, but 5.26 cannot be exactly represented and will end up being a number either slightly greater than or slightly less than 5.26. when you mulitiply that number by 100, you will end up with something which is slightly more or slightly less than 526.
Casting a floating point number to an integer just drops the fractional part, no matter how close to 1.0 it is. So, for example, (int)525.9997 is 525, not 526. You should be able to see the problem that could produce.
There is a library function called round which rounds a floating point number to the closest integer, which is probably what you wanted.

Is operator ≤ UB for floating point comparison? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
There are numerous reference on the subject (here or here). However I still fails to understand why the following is not considered UB and properly reported by my favorite compiler (insert clang and/or gcc) with a neat warning:
// f1, f2 and epsilon are defined as double
if ( f1 / f2 <= epsilon )
As per C99:TC3, 5.2.4.2.2 §8: we have:
Except for assignment and cast (which remove all extra range and
precision), the values of operations with floating operands and values
subject to the usual arithmetic conversions and of floating constants
are evaluated to a format whose range and precision may be greater
than required by the type. [...]
Using typical compilation f1 / f2 would be read directly from the FPU. I've tried here using gcc -m32, with gcc 5.2. So f1 / f2 is (over-here) on an 80 bits (just a guess dont have the exact spec here) floating point register. There is not type promotion here (per standard).
I've also tested clang 3.5, this compiler seems to cast the result of f1 / f2 back to a normal 64 bits floating point representation (this is an implementation defined behavior but for my question I prefer the default gcc behavior).
As per my understanding the comparison will be done in between a type for which we don't know the size (ie. format whose range and precision may be greater) and epsilon which size is exactly 64 bits.
What I really find hard to understand is equality comparison with a well known C types (eg. 64bits double) and something whose range and precision may be greater. I would have assumed that somewhere in the standard some kind of promotion would be required (eg. standard would mandates that epsilon would be promoted to a wider floating point type).
So the only legitimate syntaxes should instead be:
if ( (double)(f1 / f2) <= epsilon )
or
double res = f1 / f2;
if ( res <= epsilon )
As a side note, I would have expected the litterature to document only the operator <, in my case:
if ( f1 / f2 < epsilon )
Since it is always possible to compare floating point with different size using operator <.
So in which cases the first expression would make sense ? In other word, how could the standard defines some kind of equality operator in between two floating point representation with different size ?
EDIT: The whole confusion here, was that I assumed it was possible to compare two float of different size. Which cannot possibly happen. (thanks #DevSolar!).
<= is well-defined for all possible floating point values.
There is one exception though: the case when at least one of the arguments is uninitialised. But that's more to do with reading an uninitialised variable being UB; not the <= itself
I think you're confusing implementation-defined with undefined behavior. The C language doesn't mandate IEEE 754, so all floating point operations are essentially implementation-defined. But this is different from undefined behavior.
After a bit of chat, it became clear where the miscommunication came from.
The quoted part of the standard explicitly allows an implementation to use wider formats for floating operands in calculations. This includes, but is not limited to, using the long double format for double operands.
The standard section in question also does not call this "type promotion". It merely refers to a format being used.
So, f1 / f2 may be done in some arbitrary internal format, but without making the result any other type than double.
So when the result is compared (by either <= or the problematic ==) to epsilon, there is no promotion of epsilon (because the result of the division never got a different type), but by the same rule that allowed f1 / f2 to happen in some wider format, epsilon is allowed to be evaluated in that format as well. It is up to the implementation to do the right thing here.
The value of FLT_EVAL_METHOD might tell what exactly an implementation is doing exactly (if set to 0, 1, or 2 respectively), or it might have a negative value, which indicates "indeterminate" (-1) or "implementation-defined", which means "look it up in your compiler manual".
This gives an implementation "wiggle room" to do any kind of funny things with floating operands, as long as at least the range / precision of the actual type is preserved. (Some older FPUs had "wobbly" precisions, depending on the kind of floating operation performed. The quoted part of the standard caters for exactly that.)
In no case may any of this lead to undefined behaviour. Implementation-defined, yes. Undefined, no.
The only case where you would get undefined behavior is when a large floating point variable gets demoted to a smaller one which cannot represent the contents. I don't quite see how that applies in this case.
The text you quote is concerned about whether or not floats may be evaluated as doubles etc, as indicated by the text you unfortunately didn't include in the quote:
The use of evaluation formats is characterized by the
implementation-defined value of FLT_EVAL_METHOD:
-1 indeterminable;
0 evaluate all operations and constants just to the range and precision of the type;
1 evaluate operations and constants of type float and double to the range and precision of the double type, evaluate long double operations and constants to the range and precision of the long double type;
2 evaluate all operations and constants to the range and precision of the long double type.
However, I don't believe this macro overwrites the behavior of the usual arithmetic conversions. The usual arithmetic conversions guarantee that you can never compare two float variables of different size. So I don't see how you could run into undefined behavior here. The only possible issue you would have is performance.
In theory, in case FLT_EVAL_METHOD == 2 then your operands could indeed get evaluated as type long double. But please note that if the compiler allows such implicit promotions to larger types, there will be a reason for it.
According to the text you cited, explicit casting will counter this compiler behavior.
In which case the code if ( (double)(f1 / f2) <= epsilon ) is nonsense. By the time you cast the result of f1 / f2 to double, the calculation is already done and have been carried out on long double. The calculation of the result <= epsilon will however be carried out on double since you forced this with the cast.
To avoid long double entirely, you would have to write the code as:
if ( (double)((double)f1 / (double)f2) <= epsilon )
or to increase readability, preferably:
double div = (double)f1 / (double)f2;
if( (double)div <= (double)epsilon )
But again, code like this does only make sense if you know that there will be implicit promotions, which you wish to avoid to increase performance. In practice, I doubt you'll ever run into that situation, as the compiler is most likely far more capable than the programmer to make such decisions.

In C, is specifying 2.0f the same as 2.000000f?

Are these lines the same?
float a = 2.0f;
and
float a = 2.000000f;
Yes, it is. No matter what representation you use, when the code is compiled, the number will be converted to a unique binary representation. There's only one way of representing 2 in the IEEE 754 binary32 standard used in modern computers to represent float numbers.
The only thing the C99 standard has to say on the matter is this (section 6.4.4.2):
For decimal floating constants ... the result is either
the nearest representable value, or the larger or smaller representable value immediately
adjacent to the nearest representable value, chosen in an implementation-defined manner.
That bit about "implementation-defined" means that technically an implementation could choose to do something different in each case. Although in practice, nothing weird is going to happen for a value like 2.
It's important to bear in mind that the C standards don't require IEEE-754.
Yes, they are the same.
Simple check:
http://codepad.org/FOQsufB4
int main() {
printf("%d",2.0f == 2.000000f);
}
^ Will output 1 (true)
Yes Sure it is the same extra zeros on the right are ignored just likes zeros on the left

Problems casting NAN floats to int

Ignoring why I would want to do this, the 754 IEEE fp standard doesn't define the behavior for the following:
float h = NAN;
printf("%x %d\n", (int)h, (int)h);
Gives: 80000000 -2147483648
Basically, regardless of what value of NAN I give, it outputs 80000000 (hex) or -2147483648 (dec). Is there a reason for this and/or is this correct behavior? If so, how come?
The way I'm giving it different values of NaN are here:
How can I manually set the bit value of a float that equates to NaN?
So basically, are there cases where the payload of the NaN affects the output of the cast?
Thanks!
The result of a cast of a floating point number to an integer is undefined/unspecified for values not in the range of the integer variable (±1 for truncation).
Clause 6.3.1.4:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
If the implementation defines __STDC_IEC_559__, then for conversions from a floating-point type to an integer type other than _BOOL:
if the floating value is infinite or NaN or if the integral part of the floating value exceeds the range of the integer type, then the "invalid" floating-
point exception is raised and the resulting value is unspecified.
(Annex F [normative], point 4.)
If the implementation doesn't define __STDC_IEC_559__, then all bets are off.
There is a reason for this behavior, but it is not something you should usually rely on.
As you note, IEEE-754 does not specify what happens when you convert a floating-point NaN to an integer, except that it should raise an invalid operation exception, which your compiler probably ignores. The C standard says the behavior is undefined, which means not only do you not know what integer result you will get, you do not know what your program will do at all; the standard allows the program to abort or get crazy results or do anything. You probably executed this program on an Intel processor, and your compiler probably did the conversion using one of the built-in instructions. Intel specifies instruction behavior very carefully, and the behavior for converting a floating-point NaN to a 32-bit integer is to return 0x80000000, regardless of the payload of the NaN, which is what you observed.
Because Intel specifies the instruction behavior, you can rely on it if you know the instruction used. However, since the compiler does not provide such guarantees to you, you cannot rely on this instruction being used.
First, a NAN is everything not considered a float number according to the IEEE standard.
So it can be several things. In the compiler I work with there is NAN and -NAN, so it's not about only one value.
Second, every compiler has its isnan set of functions to test for this case, so the programmer doesn't have to deal with the bits himself. To summarize, I don't think peeking at the value makes any difference. You might peek the value to see its IEEE construction, like sign, mantissa and exponent, but, again, each compiler gives its own functions (or better say, library) to deal with it.
I do have more to say about your testing, however.
float h = NAN;
printf("%x %d\n", (int)h, (int)h);
The casting you did trucates the float for converting it to an int. If you want to get the
integer represented by the float, do the following
printf("%x %d\n", *(int *)&h, *(int *)&h);
That is, you take the address of the float, then refer to it as a pointer to int, and eventually take the int value. This way the bit representation is preserved.

Simple question about 'floating point exception' in C

I have the following C program:
#include <stdio.h>
int main()
{
double x=0;
double y=0/x;
if (y==1)
printf("y=1\n");
else
printf("y=%f\n",y);
if (y!=1)
printf("y!=1\n");
else
printf("y=%f\n",y);
return 0;
}
The output I get is
y=nan
y!=1
But when I change the line
double x=0;
to
int x=0;
the output becomes
Floating point exception
Can anyone explain why?
You're causing the division 0/0 with integer arithmetic (which is invalid, and produces the exception you see). Regardless of the type of y, what's evaluated first is 0/x.
When x is declared to be a double, the zero is converted to a double as well, and the operation is performed using floating-point arithmetic.
When x is declared to be an int, you are dividing one int 0 by another, and the result is not valid.
Because due to IEEE 754, NaN will be produced when conducting an illegal operation on floating point numbers (e.g. 0/0, ∞×0, or sqrt(−1)).
There are actually two kinds of NaNs, signaling and quiet. Using a
signaling NaN in any arithmetic operation (including numerical
comparisons) will cause an "invalid" exception. Using a quiet NaN
merely causes the result to be NaN too.
The representation of NaNs specified by the standard has some
unspecified bits that could be used to encode the type of error; but
there is no standard for that encoding. In theory, signaling NaNs
could be used by a runtime system to extend the floating-point numbers
with other special values, without slowing down the computations with
ordinary values. Such extensions do not seem to be common, though.
Also, Wikipedia says this about integer division by zero:
Integer division by zero is usually handled differently from floating
point since there is no integer representation for the result. Some
processors generate an exception when an attempt is made to divide an
integer by zero, although others will simply continue and generate an
incorrect result for the division. The result depends on how division
is implemented, and can either be zero, or sometimes the largest
possible integer.
There's a special bit-pattern in IEE754 which indicates NaN as the result of floating point division by zero errors.
However there's no such representation when using integer arithmetic, so the system has to throw an exception instead of returning NaN.
Check the min and max values of an integer data type. You will see that an undefined or nan result is not in it's range.
And read this what every computer scientist should know about floating point.
Integer division by 0 is illegal and is not handled. Float values on the other hand are handled in C using NaN. The following how ever would work.
int x=0;
double y = 0.0 / x;
If you divide int to int you can divide by 0.
0/0 in doubles is NaN.
int x=0;
double y=0/x; //0/0 as ints **after that** casted to double. You can use
double z=0.0/x; //or
double t=0/(double)x; // to avoid exception and get NaN
Floating point is inherently modeling the reals to limited precision. There are only a finite number of bit-patterns, but an infinite (continuous!) number of reals. It does its best of course, returning the closest representable real to the exact inputs it is given. Answers that are too small to be directly represented are instead represented by zero. Dividing by zero is an error in the real numbers. In floating point, however, because zero can arise from these very small answers, it can be useful to consider x/0.0 (for positive x) to be "positive infinity" or "too big to be represented". This is no longer useful for x = 0.0.
The best we could say is that dividing zero by zero is really "dividing something small that can't be told apart from zero by something small that can't be told apart from zero". What the answer to this? Well, there is no answer for the exact case of 0/0, and there is no good way of treating it inexactly. It would depend on the relative magnitudes, and so the processor basically shrugs and says "I lost all precision -- any result I gave you would be misleading", by returning Not a Number.
In contrast, when doing an integer divide by zero, the divisor really can only mean precisely zero. There's no possible way to give a consistent meaning to it, so when your code asks for the answer, it really is doing something illegitimate.
(It's an integer division in the second case, but not the first because of the promotion rules of C. 0 can be taken as an integer literal, and as both sides are integers, the division is integer division. In the first case, the fact that x is a double causes the dividend to be promoted to double. If you replace the 0 by 0.0, it will be a floating-point division, no matter the type of x.)

Resources