Comparing Floating Point Numbers [duplicate] - c

This question already has answers here:
What is the most effective way for float and double comparison?
(34 answers)
Closed 6 years ago.
Before you assume I'm asking the same question yet again, please read it carefully.
I'm working on a project where I have several functions that return double, and it may be possible that some of them are the same, which is a good thing in my project. If that is true, then I need a double comparison to see if they are equal.
I know that doing an equality comparison if( x == y ) is not a smart thing, and we don't need to discuss why, but we can check for < or >, which is what this question is about.
Does the language standard guarantee that the comparisons < and > are 100% reliable?
If so, then the following program can be used:
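#include <stdio.h>
/* Minimal stand-in for foo() so the example compiles. */
static void foo(double x, double y){ printf("equal: %f %f\n", x, y); }
int main(void){
double x = 3.14;
double y = 3.14;
if( x < y || x > y){
/* Are not equal */
}else{
/* Are equal, foo() will be called here */
foo(x, y);
}
return 0;
}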
Does foo(x, y); get executed? x and y should be equal here.
EDIT:
This question is not looking for a way to compare two doubles; it only asks whether I should or should not use < and > instead of ==.

I know that doing an equality comparison if( x == y ) is not a smart thing
This is simply not true. It may be the right thing to do or the wrong thing to do, depending on the particular problem.
if (x < y || x > y)
This is guaranteed to have exactly the same effect[1] as
if (x != y)
and the opposite effect of
if (x == y)
When one is wrong, the other is wrong too. When one is right, the other is right as well. Writing an equality condition with < and > symbols instead of == or != doesn't suddenly make it smarter.
[1] Except when one of the operands is a NaN: then x < y || x > y is false, while x != y is true.
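A minimal sketch of that NaN exception, assuming IEEE 754 doubles and the NAN macro from <math.h>; the helper names are hypothetical:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: "not equal" spelled with relational operators. */
bool differ_relational(double x, double y) {
    return x < y || x > y;
}

/* Hypothetical helper: "not equal" spelled with !=. */
bool differ_ne(double x, double y) {
    return x != y;
}
```

For ordinary values both helpers agree. With a NaN operand every relational comparison is false, so differ_relational(NAN, 1.0) is false while differ_ne(NAN, 1.0) is true.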

some of them are the same ... and if is true then I need a double comparison to see if are equal.
OP is asking about two different ways of testing FP equality and wondering whether they are functionally alike.
Aside from NaN, which C does not define well (but IEEE 754 does), both comparisons behave alike, yet both fall short of a canonical equivalence test.
Consider this double code:
if (a==b) {
double ai = 1.0/a;
double bi = 1.0/b;
printf("%d\n", ai == bi);
} else {
printf("%d\n", 1);
}
Is the result always "1"? Here is an exception:
Consider a=0.0; b=-0.0. They compare equal to each other, but their reciprocals typically do not: one is positive infinity, the other negative infinity.
The question comes down to how equal you need them to be. Is NaN important? Using memcmp(&a, &b, sizeof a) is certainly a strong test, and maybe too strong, since on select systems FP numbers can have the same non-zero value yet different encodings. Whether these differences, or just the exceptional case above, are important is for OP to decide.
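A small sketch of the ±0 exception and of the memcmp() test, assuming IEEE 754 doubles with C Annex F semantics for division by zero; the helper names are hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: equality as the == operator sees it. */
bool value_equal(double a, double b) {
    return a == b;
}

/* Hypothetical helper: identical object representation (bit pattern). */
bool bitwise_equal(double a, double b) {
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Hypothetical helper: do the reciprocals compare equal?
   Under Annex F, 1.0 / +0.0 is +inf and 1.0 / -0.0 is -inf. */
bool reciprocals_equal(double a, double b) {
    return 1.0 / a == 1.0 / b;
}
```

With a = 0.0 and b = -0.0, value_equal() is true while bitwise_equal() and reciprocals_equal() are both false.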
For testing whether 2 different codes/functions produce the same binary64 result, consider measuring their unit-in-the-last-place (ULP) difference, something like the following: compare ULP_diff() against 0, 1 or 2, depending on your error tolerance.
// not highly portable
#include <assert.h>
unsigned long long ULP(double x) {
union {
double d;
unsigned long long ull;
} u;
assert(sizeof(double) == sizeof(unsigned long long));
u.d = x;
if (u.ull & 0x8000000000000000) {
u.ull ^= 0x8000000000000000;
return 0x8000000000000000 - u.ull;
}
return u.ull + 0x8000000000000000;
}
unsigned long long ULP_diff(double x, double y) {
unsigned long long ullx = ULP(x); /* unsigned long may be only 32 bits */
unsigned long long ully = ULP(y);
if (x > y) return ullx - ully;
return ully - ullx;
}
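Because the union trick above is flagged as not highly portable, here is a sketch of the same ordering idea using memcpy() and uint64_t. It assumes double is IEEE 754 binary64 with the same byte order as uint64_t (true on mainstream platforms), and the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE 754 binary64 stored like a uint64_t. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Map a double to an unsigned key that sorts in the same order as the
   doubles themselves. Not meaningful for NaN inputs. */
static uint64_t ordered_key(double x) {
    const uint64_t sign = UINT64_C(1) << 63;
    uint64_t u;
    memcpy(&u, &x, sizeof u);        /* well-defined way to read the bits */
    if (u & sign) {
        return sign - (u ^ sign);    /* negative: larger magnitude, smaller key */
    }
    return u + sign;                 /* non-negative: placed above negatives */
}

/* Count of representable steps between x and y; +0.0 and -0.0 give 0. */
uint64_t ulp_distance(double x, double y) {
    uint64_t kx = ordered_key(x);
    uint64_t ky = ordered_key(y);
    return kx > ky ? kx - ky : ky - kx;
}
```

This mirrors the ULP() mapping above, so the result can be compared against 0, 1 or 2 in the same way; for example, 1.0 and 1.0 + 0x1p-52 are 1 step apart.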

If you want fractional number equality, you either have to use an epsilon comparison (i.e., check whether the numbers are close enough to one another within a specific threshold), or use fixed-point arithmetic very carefully to avoid rounding errors.
And yes, this same question has been asked more times than necessary:
Most effective way for float and double comparison
Floating point equality and tolerances
You need to do more reading into how comparisons work, and specifically why floating point equality doesn't work. It's not an issue with the equals operator itself, as you appear to think (For arithmetic types [when no special values like NaN are involved], !(x > y || y > x) will always be the same as x == y. In fact, most compilers will optimize x < y || x > y to x != y), but rather because rounding error is a basic part of floating point operation in the first place. x == y does indeed work for floating point types, and you can do it freely. It becomes an issue after you do any arithmetic operation and then want to compare them, because it's unpredictable what the rounding error will do.
So essentially, yes. Compare for equality all you want as long as you are not doing arithmetic with the doubles. If you are just using them as an index or something similar, there shouldn't be a problem, as long as you know you are assigning them the same value. Using boolean identities won't save you from the basic behavior of floating-point numbers.
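A small sketch of that distinction, assuming IEEE 754 binary64 doubles evaluated without extra precision (FLT_EVAL_METHOD == 0, as on x86-64 with SSE2); the helper names are hypothetical:

```c
#include <stdbool.h>

/* Two variables assigned the same literal hold the same double,
   so == is reliable here. */
bool same_literal_equal(void) {
    double x = 3.14;
    double y = 3.14;
    return x == y;
}

/* After arithmetic, each operation rounds, so a sum can miss a value
   that is mathematically equal to it. */
bool sum_equals(double a, double b, double expected) {
    return a + b == expected;
}
```

Under those assumptions, sum_equals(0.1, 0.2, 0.3) is false, while sum_equals(0.5, 0.25, 0.75) is true because all three values are exact binary fractions.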

First of all your conditional is a little off. To check for non equality you want
( x < y || y < x)
not
(x < y || y > x )
which checks the same thing twice, so the case x > y is never detected.
Ignoring that small issue:
Yes, < and > are 100% reliable in the sense that x < y || y < x always gives the same result as x != y. The only difference is the behavior with NaN. But it doesn't fix your problem.
Here is a really contrived example.
#include <stdio.h>
void foo(double x, double y){
printf( "OK a:%f, b:%f\n",x,y);
}
void bar(double x, double y){
printf( "BAD a:%f, b:%f\n",x,y);
}
int main(void){
double x = 3.14;
double y = 3.14;
if( x < y || y < x){
/* Are not Equal */
bar(x, y);
}else{
/* Are equal, foo() will be called here */
foo(x, y);
}
for( int i = 0; i < 1000; i++) {
y = y + 0.1;
}
x = x + 100;
if( x < y || y < x){
bar(x, y);
}else{
/* Are equal, foo() will be called here */
foo(x, y);
}
}
Here is your output (hint: it's BAD):
$ ./a.exe
OK a:3.140000, b:3.140000
BAD a:103.140000, b:103.140000
The best practice I know for double equality is to check their closeness within some epsilon:
double eps = 0.00000000001;
if( fabs( x - y ) < eps ) { /* fabs() from <math.h>; abs() works on int */
printf("EQUAL!");
}

#include <stdio.h>
int main(void){
double x = 3.14;
double y = 3.14;
if( x < y || x > y){
/* Are not Equal */
}else{
/* Are equal, foo() will be called here */
printf("yes");
}
}
prints yes

Related

Alternative to ceil() and floor() to get the closest integer values, above and below of a floating point value?

I'm looking for an alternative to the ceil() and floor() functions in C, because I am not allowed to use them in a project.
What I have built so far is a roundabout approach using the cast operator: I convert the floating-point value (in my case a double) to an int, and, since I need the closest integers above and below the given value to be double values as well, I convert back to double:
#include <stdio.h>
int main(void) {
double original = 124.576;
double floorint;
double ceilint;
int f;
int c;
f = (int)original; //Truncation to closest floor integer value
c = f + 1;
floorint = (double)f;
ceilint = (double)c;
printf("Original Value: %lf, Floor Int: %lf , Ceil Int: %lf", original, floorint, ceilint);
}
Output:
Original Value: 124.576000, Floor Int: 124.000000 , Ceil Int: 125.000000
For this example I would not normally need the ceil and floor integer values of c and f to be converted back to double, but I need them as double in my real program. Consider that a requirement for the task.
Although the output gives the desired values and seems right so far, I'm still concerned whether this method is really correct and appropriate; more precisely, whether it introduces any bad behavior or issues into the program, or costs performance compared to other alternatives, if there are any.
Do you know a better alternative? And if so, why this one should be better?
Thank you very much.
Do you know a better alternative? And if so, why this one should be better?
OP's code fails when:
original is already a whole number.
original is a negative like -1.5. Truncation is not floor there.
original is just outside int range.
original is not-a-number.
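The first two failure cases can be reproduced by wrapping the question's arithmetic in two hypothetical helpers (op_floor and op_ceil are illustration names, not part of the question). The out-of-range and NaN cases make the cast itself undefined behavior, so they are not exercised here:

```c
/* Question's floor: the cast truncates toward zero, which is not a floor
   for negative values. Only valid inside int range. */
double op_floor(double original) {
    int f = (int)original;
    return (double)f;
}

/* Question's ceil: always adds 1, even when original is already whole. */
double op_ceil(double original) {
    int f = (int)original;
    return (double)(f + 1);
}
```

With these, op_ceil(124.0) yields 125.0 instead of 124.0, and op_floor(-1.5) yields -1.0 instead of -2.0.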
Alternative construction
double my_ceil(double x)
Using the cast-to-an-integer trick is a problem when x is outside the integer range. So first check whether x is inside the range of a wide enough integer (one whose precision exceeds that of double). x values outside that range are already whole numbers. I recommend going for the widest integer type, (u)intmax_t.
Remember that a cast to an integer rounds toward 0; it is not a floor. So ceil() and floor() need different handling for negative and positive x. OP's code missed this.
I'd avoid if (x >= INTMAX_MAX) { as that involves (double) INTMAX_MAX, whose rounding, and therefore precise value, is "chosen in an implementation-defined manner". Instead, I'd compare against INTMAX_MAX_P1. some_integer_MAX is a Mersenne number, so INTMAX_MAX + 1 is a power of 2 and exactly representable as a double; with 2's complement, ...MIN is a negated power of 2.
#include <inttypes.h>
#define INTMAX_MAX_P1 ((INTMAX_MAX/2 + 1)*2.0)
double my_ceil(double x) {
if (x >= INTMAX_MAX_P1) {
return x;
}
if (x < INTMAX_MIN) {
return x;
}
intmax_t i = (intmax_t) x; // this rounds towards 0
if (x < 0 || x == i) return i; // truncation already rounds negative x up (also covers -1 < x < 0)
return i + 1.0;
}
As x may be a not-a-number, it is more useful to reverse the compare as relational compare of a NaN is false.
double my_ceil(double x) {
if (x >= INTMAX_MIN && x < INTMAX_MAX_P1) {
intmax_t i = (intmax_t) x; // this rounds towards 0
if (x < 0 || x == i) return i; // truncation already rounds negative x up (also covers -1 < x < 0)
return i + 1.0;
}
return x;
}
double my_floor(double x) {
if (x >= INTMAX_MIN && x < INTMAX_MAX_P1) {
intmax_t i = (intmax_t) x; // this rounds towards 0
if (x > 0 || x == i) return i; // truncation already rounds positive x down (also covers 0 < x < 1)
return i - 1.0;
}
return x;
}
You're missing an important step: you need to check if the number is already integral, so for ceil assuming non-negative numbers (generalisation is trivial), use something like
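#include <limits.h> /* LLONG_MAX */
/* Named my_ceil because the name ceil is reserved by the standard library.
   Assumes f >= 0. */
double my_ceil(double f){
if (f >= LLONG_MAX){
// f will be integral unless you have a really funky platform
return f;
} else {
long long i = f;
return 0.0 + i + (f != i); // to obviate potential long long overflow
}
}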
Another missing piece in the puzzle, which is covered by my enclosing if, is to check whether f is within the bounds of a long long. On common platforms, if f were outside the bounds of a long long, it would be integral anyway.
Note that for non-negative numbers floor is trivial, because conversion to long long always truncates toward zero.

Why does this code fail for these weird numbers?

I wrote a function to find the cube root of a number a using the Newton-Raphson method to find the root of the function f(x) = x^3 - a.
#include <stdio.h>
#include <math.h>
double cube_root(double a)
{
double x = a;
double y;
int equality = 0;
if(x == 0)
{
return(x);
}
else
{
while(equality == 0)
{
y = (2 * x * x * x + a) / (3 * x * x);
if(y == x)
{
equality = 1;
}
x = y;
}
return(x);
}
}
f(x) for a = 20 (blue) and a = -20 (red) http://graphsketch.com/?eqn1_color=1&eqn1_eqn=x*x*x%20-%2020&eqn2_color=2&eqn2_eqn=x*x*x%20%2B%2020&eqn3_color=3&eqn3_eqn=&eqn4_color=4&eqn4_eqn=&eqn5_color=5&eqn5_eqn=&eqn6_color=6&eqn6_eqn=&x_min=-8&x_max=8&y_min=-75&y_max=75&x_tick=1&y_tick=1&x_label_freq=5&y_label_freq=5&do_grid=0&bold_labeled_lines=0&line_width=4&image_w=850&image_h=525
The code seemed to be working well; for example, it calculates the cube root of 338947578237847893823789474.324623784 just fine, but weirdly it fails for some numbers, for example 4783748237482394. The code just seems to go into an infinite loop and must be manually terminated.
Can anyone explain why the code should fail on this number? I've included the graph to show that, using the starting value of a, this method should always keep providing closer and closer estimates until the two values are equal to working precision. So I don't really get what's special about this number.
Apart from posting an incorrect formula...
You are performing floating point arithmetic, and floating point arithmetic has rounding errors. Even with the rounding errors, you will get very very close to a cube root, but you won't get exactly there (usually cube roots are irrational, and floating point numbers are rational).
Once your x is very close to the cube root, when you calculate y, you should get the same result as x, but because of rounding errors, you may get something very close to x but slightly different instead. So x != y. Then you do the same calculation starting with y, and you may get x as the result. So your result will forever switch between two values.
You can do the same thing with three numbers x, y and z and quit when either z == y or z == x. This is much more likely to stop, and with a bit of mathematics you might even be able to prove that it will always stop.
Better to calculate the change in x, and determine whether that change is small enough so that the next step will not change x except for rounding errors.
shouldn't it be:
y = x - (2 * x * x * x + a) / (3 * x * x);
?

What operations and functions on +0.0 and -0.0 give different arithmetic results?

In C, when ±0.0 is supported, -0.0 or +0.0 assigned to a double typically makes no arithmetic difference. Although they have different bit patterns, they arithmetically compare as equal.
double zp = +0.0;
double zn = -0.0;
printf("0 == memcmp %d\n", 0 == memcmp(&zn, &zp, sizeof zp));// --> 0 == memcmp 0
printf("== %d\n", zn == zp); // --> == 1
Inspired by a comment from Pascal Cuoq, I am looking for a few more functions in standard C that provide arithmetically different results.
Note: Many functions, like sin(), return +0.0 from f(+0.0) and -0.0 from f(-0.0). But these do not provide different arithmetic results. Also the 2 results should not both be NaN.
There are a few standard operations and functions that form numerically different answers between f(+0.0) and f(-0.0).
Different rounding modes or other floating point implementations may give different results.
#include <math.h>
#include <stdio.h> /* sprintf(), printf() */
double inverse(double x) { return 1/x; }
double atan2m1(double y) { return atan2(y, -1.0); }
double sprintf_d(double x) {
char buf[20];
// sprintf(buf, "%+f", x); Changed to e
sprintf(buf, "%+e", x);
return buf[0]; // returns `+` or `-`
}
double copysign_1(double x) { return copysign(1.0, x); }
double signbit_d(double x) {
int sign = signbit(x); // my compiler returns 0 or INT_MIN
return sign;
}
double pow_m1(double x) { return pow(x, -1.0); }
void zero_test(const char *name, double (*f)(double)) {
double fzp = (f)(+0.0);
double fzn = (f)(-0.0);
int differ = fzp != fzn;
if (fzp != fzp && fzn != fzn) differ = 0; // if both NAN
printf("%-15s f(+0):%-+15e %s f(-0):%-+15e\n",
name, fzp, differ ? "!=" : "==", fzn);
}
void zero_tests(void) {
zero_test("1/x", inverse);
zero_test("atan2(x,-1)", atan2m1);
zero_test("printf(\"%+e\")", sprintf_d);
zero_test("copysign(x,1)", copysign_1);
zero_test("signbit()", signbit_d);
zero_test("pow(x,-odd)", pow_m1); // #Pascal Cuoq
zero_test("tgamma(x)", tgamma); // #vinc17 #Pascal Cuoq
}
Output:
1/x f(+0):+inf != f(-0):-inf
atan2(x,-1) f(+0):+3.141593e+00 != f(-0):-3.141593e+00
printf("%+e") f(+0):+4.300000e+01 != f(-0):+4.500000e+01
copysign(x,1) f(+0):+1.000000e+00 != f(-0):-1.000000e+00
signbit() f(+0):+0.000000e+00 != f(-0):-2.147484e+09
pow(x,-odd) f(+0):+inf != f(-0):-inf
tgamma(x) f(+0):+inf == f(-0):+inf
Notes:
tgamma(x) came up == on my gcc 4.8.2 machine, but correctly != on others.
rsqrt(), a.k.a. 1/sqrt(), is a possible future C standard function. It may or may not also work.
double zero = +0.0; memcmp(&zero, &x, sizeof x) can show that x has a different bit pattern than +0.0, but x could still be a +0.0. I think some FP formats have many bit patterns that are +0.0 and -0.0. TBD.
This is a self-answer as provided by https://stackoverflow.com/help/self-answer.
The IEEE 754-2008 function rsqrt (that will be in the future ISO C standard) returns ±∞ on ±0, which is quite surprising. And tgamma also returns ±∞ on ±0. With MPFR, mpfr_digamma returns the opposite of ±∞ on ±0.
I have been thinking about this method, but I can't test it before the weekend, so someone might experiment with it, if they like, or just tell me that it is nonsense:
Generate a -0.0f. It should be possible to generate it statically by assigning a tiny negative constant that underflows the float representation.
Assign this constant to a volatile double and back to float.
By changing the bit representation twice, I assume that the compiler-specific standard bit representation for -0.0f is now in the variable. The compiler can't outsmart me there, because a totally different value could be in the volatile variable between those two copies.
Compare the input to 0.0f to detect whether we have a 0.0f/-0.0f case.
If it is equal, assign the input to the volatile double variable, and then back to float.
I again assume that it now has the standard compiler representation for 0.0f.
Access the bit patterns through a union and compare them to decide whether it is -0.0f.
The code might be something like:
#include <float.h> /* DBL_MIN */
typedef union
{
float fvalue;
/* assuming int has at least the same number of bits as float */
unsigned int bitpat;
} tBitAccess;
float my_signf(float x)
{
/* assuming double has smaller min and
other bit representation than float */
volatile double refitbits;
tBitAccess tmp;
unsigned int pat0, patX;
if (x < 0.0f) return -1.0f;
if (x > 0.0f) return 1.0f;
refitbits = (double) (float) -DBL_MIN;
tmp.fvalue = (float) refitbits;
pat0 = tmp.bitpat;
refitbits = (double) x;
tmp.fvalue = (float) refitbits;
patX = tmp.bitpat;
return (patX == pat0)? -1.0f : 1.0f;
}
It is not a standard function, or an operator, but a function that should differentiate between signs of -0.0 and 0.0.
It is based (mainly) on the assumption that the compiler vendor does not use different bit patterns for -0.0f as a result of changing formats, even if the floating-point format would allow it; if this holds, it is independent of the chosen bit pattern.
For floating-point formats that have exactly one pattern for -0.0f, this function should safely do the trick without knowledge of the bit ordering in that pattern.
The other assumptions (about the size of the types and so on) can be handled with preprocessor checks on the float.h constants.
Edit: On second thought: if we can force the value that compares equal to 0.0 or -0.0 below the smallest representable denormal (subnormal) floating-point number or its negative counterpart, and there is no second pattern for -0.0f in the FP format, we could drop the cast to volatile double. (But maybe keep the float volatile, so that with denormals deactivated the compiler can't do any fancy trick to skip operations that further reduce the absolute value of things comparing equal to 0.0.)
The Code then might look like:
#include <float.h> /* DBL_MIN */
#include <math.h>  /* fabsf() */
typedef union
{
float fvalue;
/* assuming int has at least the same number of bits as float */
unsigned int bitpat;
} tBitAccess;
float my_signf(float x)
{
volatile tBitAccess tmp;
unsigned int pat0, patX;
if (x < 0.0f) return -1.0f;
if (x > 0.0f) return 1.0f;
tmp.fvalue = -DBL_MIN;
/* forcing something compares equal to 0.0f below smallest subnormal
- not sure if one abs()-factor is enough */
tmp.fvalue = tmp.fvalue * fabsf(tmp.fvalue);
pat0 = tmp.bitpat;
tmp.fvalue = x;
tmp.fvalue = tmp.fvalue * fabsf(tmp.fvalue);
patX = tmp.bitpat;
return (patX == pat0)? -1.0f : 1.0f;
}
This might not work with fancy rounding methods that prevent rounding from negative values toward -0.0.
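C99's signbit() macro, already used in the self-answer above, reads the sign bit directly, so a sketch like the following may be a simpler way to tell -0.0 from +0.0 than comparing bit patterns, assuming the implementation supports signed zeros. The helper names are hypothetical:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true only for -0.0. x == 0.0 matches both zeros;
   signbit() then tells them apart. */
bool is_negative_zero(double x) {
    return x == 0.0 && signbit(x);
}

/* Hypothetical float variant in the spirit of my_signf() above, for non-NaN
   x: -1 for negative values including -0.0f, +1 otherwise. */
float sign_with_zero(float x) {
    return signbit(x) ? -1.0f : 1.0f;
}
```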
Not exactly an answer to the question, but can be useful to know:
Just faced the case a - b = c => b = a - c, which fails to hold if a is 0.0 and b is -0.0. We have 0.0 - (-0.0) = 0.0 => b = 0.0 - 0.0 = 0.0. The sign is lost. The -0.0 is not recovered.
Taken from here.
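The sign loss described above can be checked with a tiny sketch, assuming IEEE 754 default round-to-nearest, under which 0.0 - (-0.0) and 0.0 - 0.0 both yield +0.0; recovers_sign is a hypothetical name:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: compute c = a - b, then b2 = a - c, and report
   whether b2 has the same sign bit as b. */
bool recovers_sign(double a, double b) {
    double c = a - b;
    double b2 = a - c;
    return !signbit(b) == !signbit(b2);
}
```

recovers_sign(0.0, -0.0) is false, while ordinary cases such as recovers_sign(5.0, -2.0) are true.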

Test the subtraction of multiple unsigned int

After a few unsuccessful searches, I still don't know if there's a way to subtract two unsigned ints (or more) and detect whether the result of this subtraction is negative (or not).
I've tried things like:
if(((int)x - (int)y) < 0)
But I don't think it's the best way.
Realize that what you intend by
unsigned int x;
unsigned int y;
if (x - y < 0)
is mathematically equivalent to:
unsigned int x;
unsigned int y;
if (y > x)
EDIT
There aren't many questions for which I can assert a definitive proof, but I can for this one. It's basic inequality algebra:
x - y < 0
add y to both sides:
x < y, which is the same as y > x.
You can do similarly with more variables, if you need:
x - y - z < 0 is equivalent to x < y + z, or y + z > x
see chux's comment to his own answer, though, for a valid warning about integer overflow when dealing with multiple values.
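Following that overflow warning, here is a sketch of the three-value case that never relies on a wrapped sum; diff3_is_negative is a hypothetical name:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: is x - y - z negative in mathematical terms?
   Equivalent to x < y + z, but y + z may wrap around, so detect that
   first: if the true sum exceeds UINT_MAX, it certainly exceeds x. */
bool diff3_is_negative(unsigned x, unsigned y, unsigned z) {
    if (y > UINT_MAX - z) {
        return true;            /* y + z would overflow, so y + z > x */
    }
    return x < y + z;
}
```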
Simply compare.
unsigned x, y, diff;
diff = x - y;
if (x < y) {
printf("Difference is negative and not representable as an unsigned.\n");
}
[Edit] OP changed from "2 unsigned int" to "multiple unsigned int".
Confident that N*(N-1)/2 compares would be needed if a wider integer type is not available for subtracting N unsigned values.
With N > 2, it is simplest, if available, to use wider integers, such as
long long diff;
// or
#include <stdint.h>
intmax_t diff;
Depending on your platform, though, these types may or may not be wider than unsigned. They are certainly not narrower.
Note: this issue similarly applies to multiple signed int too, though different compares are used there. But that is another question.
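A sketch of the wider-integer approach for N values, assuming intmax_t is wider than unsigned int (the #if below refuses to compile otherwise); multi_diff_is_negative is a hypothetical name:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if UINT_MAX >= INTMAX_MAX
#error "intmax_t is not wider than unsigned int on this platform"
#endif

/* Hypothetical helper: is first - rest[0] - ... - rest[n-1] negative?
   Exact as long as n * UINT_MAX fits in intmax_t (e.g. n < 2^31 with a
   64-bit intmax_t and a 32-bit unsigned). */
bool multi_diff_is_negative(unsigned first, const unsigned *rest, size_t n) {
    intmax_t diff = first;
    for (size_t i = 0; i < n; i++) {
        diff -= rest[i];        /* rest[i] converts to intmax_t: signed math */
    }
    return diff < 0;
}
```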

Can I replace the built-in pow function using this custom function?

I'm trying to write a power function in c without calling pow().
double power (double X, int Y)
{
int i;
double value = 1;
for (i = 0; i < Y; i++)
value *= X;
return value;
}
My question is, is there any reason you can see that this function would not work properly with any given test values? I am trying to cover all input possibilities.
-Thanks
This function is inadequate for several reasons:
It's buggy. Notice that value is declared as an int rather than a double, which means that if you try to compute power(1.5, 1), you'll get back 1 rather than 1.5. In fact, it will be wrong on almost all inputs.
It doesn't handle negative exponents. Try computing power(2, -1). The correct answer is 0.5, but your function (after fixing the int bug noted above) will return 1 rather than 0.5. You can fix this pretty easily (you could, for example, compute power(2, 1) and then take the reciprocal), but it's troublesome as currently written.
It's slow. Most exponentiation, when the power is an integer, is computed using an algorithm called exponentiation by squaring, which is considerably faster than your code. Exponentiation by squaring will do Θ(log Y) multiplications, compared to the Θ(Y) multiplications your code makes. It will take exponentially longer for your function to complete.
It doesn't handle fractional exponents. Try computing power(1.5, 1.5). You'll get the wrong answer because the exponent is an int, not a double. Correcting this isn't easy; search around on Stack Overflow for other questions on how to implement this properly.
It reinvents the wheel. At a fundamental level, you should ask yourself why you're rewriting a function provided to you by the language's math libraries. This can introduce bugs or inefficiencies into the program (see the earlier bullet points) and at the end of the day you haven't increased the functionality.
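The exponentiation by squaring mentioned in the "It's slow" point can be sketched as follows. It keeps the question's int exponent, handles negative exponents by taking the reciprocal, and is not claimed to match pow() bit for bit, since the sequence of roundings differs; power_by_squaring is a hypothetical name:

```c
/* Exponentiation by squaring: O(log |Y|) multiplications instead of O(|Y|). */
double power_by_squaring(double X, int Y) {
    /* Compute |Y| in unsigned arithmetic so Y == INT_MIN does not overflow. */
    unsigned long long n = Y < 0 ? 0ULL - (unsigned long long)Y
                                 : (unsigned long long)Y;
    double result = 1.0;
    double base = X;
    while (n > 0) {
        if (n & 1ULL) {
            result *= base;   /* this bit of the exponent contributes base */
        }
        base *= base;         /* base now holds X^(2^k) for the next bit */
        n >>= 1;
    }
    return Y < 0 ? 1.0 / result : result;
}
```

For example, power_by_squaring(2.0, 10) needs 4 loop iterations instead of 10 multiplications, and power_by_squaring(2.0, -1) returns 0.5.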
Hope this helps!
Your function should look like this; it will still run slower than pow(), which runs in O(log Y):
#include<math.h>
#define ABS(x) (((x) < 0) ? -(x) : (x)) /* fully parenthesized macro */
double power (double X, int Y)
{
int i;
double value = 1;
if (Y == 0)
{
return 1.0;
}
else if (X == 0)
{
return 0.0;
}
for (i = 0; i < ABS(Y); i++)
{
value *= X;
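/* NaN never compares equal to anything (not even NAN), so use the
   classification macros; isinf() also catches overflow of value.
   The old (value*X) < value checks wrongly fired for 0 < X < 1 and X < 0. */
if (isnan(value) || isinf(value))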
{
return NAN;
}
}
if (Y < 0) return (1.0/value);
else return value;
}
