Hello, I'm learning Objective-C and I was working through the classic Calculator example.
The problem is that I get a negative zero when I multiply zero by any negative number and put the result into a double!
To see what was going on, I played with the debugger and this is what I got:
(gdb) print -2*0
$1 = 0
(gdb) print (double) -2 * 0
$2 = -0
In the second case, when I cast to a double type, it turns into negative zero! How can I fix that in my application? I need to work with doubles.
How can I fix the result so I get a zero when the result should be zero?
I did a simple test:
#include <stdio.h>

int main(void) {
    double d = (double) -2.0 * 0;
    if (d < 0)
        printf("d is less than zero\n");
    if (d == 0)
        printf("d is equal to zero\n");
    if (d > 0)
        printf("d is greater than zero\n");
    printf("d is: %lf\n", d);
    return 0;
}
It outputs:
d is equal to zero
d is: -0.000000
So, to fix this, you can add a simple if-check to your application:
if (d == 0) d = 0;
There is a misunderstanding here about operator precedence:
(double) -2 * 0
is parsed as
((double)(-(2))) * 0
which is essentially the same as (-2.0) * 0.0.
The C Standard's informative Annex J lists as unspecified behavior: "Whether certain operators can generate negative zeros and whether a negative zero becomes a normal zero when stored in an object (6.2.6.2)."
Conversely, (double)(-2 * 0) should generate a positive zero 0.0 on most current platforms as the multiplication is performed using integer arithmetic. The C Standard does have support for architectures that distinguish positive and negative zero integers, but these are vanishingly rare nowadays.
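Here is a minimal sketch contrasting the two parses (the printed signs assume IEEE 754 doubles):

#include <stdio.h>

int main(void) {
    /* (double) binds tighter than *, so the cast applies to -2 only */
    double a = (double) -2 * 0;   /* (-2.0) * 0.0 -> -0.0 under IEEE 754 */
    double b = (double) (-2 * 0); /* integer multiply first -> +0.0      */
    printf("%f %f\n", a, b);      /* typically prints: -0.000000 0.000000 */
    return 0;
}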
If you want to force zeros to be positive, this simple fix should work:
if (d == 0) {
d = 0;
}
You could make the intent clearer with this:
if (d == -0.0) {
d = +0.0;
}
But the test will also succeed if d is a positive zero.
Chux has a simpler solution for IEC 60559-conforming environments:
d = d + 0.0; // turn -0.0 to +0.0
http://en.wikipedia.org/wiki/Signed_zero
The number 0 is usually encoded as +0, but can be represented by either +0 or −0
It shouldn't impact calculations or UI output.
How can I fix that in my application?
The code really is not broken, so nothing needs to be "fixed". @kennytm
How can I fix the result so I get a zero when the result should be zero?
To easily get rid of the - when the result is -0.0, add 0.0. Code following standard (IEC 60559 floating-point) rules will drop the - sign.
#include <stdio.h>
#include <math.h>

int main(void) {
    double nzero = -0.0;
    printf("%f\n", nzero);
    printf("%f\n", nzero + 0.0);
    printf("%f\n", fabs(nzero));     // side effect: this changes all negative values

    // pedantic code using <math.h>
    if (signbit(nzero)) nzero = 0.0; // side effect: this changes all negative values
    printf("%f\n", nzero);
    return 0;
}
Usual output:
-0.000000
0.000000
0.000000
0.000000
Yet for a general double x that may have any value, it is hard to beat the following (@Richard J. Ross III, @chqrlie). The x + 0.0 approach has the advantage that it likely does not introduce a branch, yet the following is clear:
if (x == 0.0) x = 0.0;
Note: fmax(-0.0, 0.0) may produce -0.0.
In my code (C, Intel MPI compiler) -0.0 and +0.0 are not the same.
As an example:
d = -0.0;
if (d < 0.0)
    // do something...
and it is doing this "something".
Also, adding -0.0 + 0.0 yields -0.0...
GCC was seemingly optimizing out the simple fix of negzero += 0.0 as noted above until I realized that -fno-signed-zeros was in place. Duh.
But in the process I did find that this will fix a signed zero, even when -fno-signed-zeros is set:

#include <float.h> // DBL_MIN
#include <math.h>  // signbit

if (negzero > -DBL_MIN && negzero < DBL_MIN && signbit(negzero))
    negzero = 0.0;

or as a macro:

#define NO_NEG_ZERO(a) ( (a) > -DBL_MIN && (a) < DBL_MIN && signbit(a) ? 0.0 : (a) )
negzero = NO_NEG_ZERO(negzero);
Note that the comparators are < and > (not <= or >=), so the value really is zero! (Or it is a subnormal number...but never mind the guy behind the curtain.)
Maybe this answer is slightly less correct in the sense that a value between DBL_MIN and -DBL_MIN will be converted to 0.0, in which case this isn't the way to go if you need to support subnormal numbers.
If you do need subnormal numbers (!) then perhaps you're the kind of person who plays with -fno-signed-zeros, too.
The lesson here for me and subnormal-numbers-guy is this: if you play outside of spec then expect out-of-spec results ;)
(Sorry, that was not PC. It could be subnormal-numbers-person...but I digress.)
Related
I have a short program that performs a numerical computation and obtains an incorrect NaN result when some specific conditions hold. I cannot see how this NaN result can arise. Note that I am not using compiler options that allow the reordering of arithmetic operations, such as -ffast-math.
Question: I am looking for an explanation of how the NaN result arises. Mathematically, there is nothing in the computation that leads to division by zero or similar. Am I missing something obvious?
Note that I am not asking how to fix the problem—that is easy. I am simply looking for an understanding of how the NaN appears.
Minimal example
Note that this example is very fragile and even minor modifications, such as adding printf() calls in the loop to observe values, will change the behaviour. This is why I was unable to minimize it further.
// prog.c
#include <stdio.h>
#include <math.h>

typedef long long myint;

void fun(const myint n, double *result) {
    double z = -1.0;
    double phi = 0.0;
    for (myint i = 0; i < n; i++) {
        double r = sqrt(1 - z*z);
        /* avoids division by zero when r == 0 */
        if (i != 0 && i != n-1) {
            phi += 1.0 / r;
        }
        double x = r*cos(phi);
        double y = r*sin(phi);
        result[i + n*0] = x;
        result[i + n*1] = y;
        result[i + n*2] = z;
        z += 2.0 / (n - 1);
    }
}

#define N 11

int main(void) {
    // perform computation
    double res[3*N];
    fun(N, res);

    // output result
    for (int i=0; i < N; i++) {
        printf("%g %g %g\n", res[i+N*0], res[i+N*1], res[i+N*2]);
    }

    return 0;
}
Compile with:
gcc -O3 -mfpmath=387 prog.c -o prog -lm
The last line of the output is:
nan nan 1
Instead of NaN, I expect a number close to zero.
Critical features of the example
The following must all hold for the NaN output to appear:
Compile with GCC on an x86 platform. I was able to reproduce with GCC 12.2.0 (from MacPorts) on macOS 10.14.6, as well as with GCC versions 9.3.0, 8.3.0 and 7.5.0 on Linux (openSUSE Leap 15.3).
I cannot reproduce it with GCC 10.2.0 or later on Linux, or GCC 11.3.0 on macOS.
Choose to use x87 instructions with -mfpmath=387, and an optimization level of -O2 or -O3.
myint must be a signed 64-bit type.
Thinking of result as an n-by-3 matrix, it must be stored in column-major order.
No printf() calls in the main loop of fun().
Without these features, I do get the expected output, i.e. something like 1.77993e-08 -1.12816e-08 1 or 0 0 1 as the last line.
Explanation of the program
Even though it doesn't really matter to the question, I give a short explanation of what the program does, to make it easier to follow. It computes the three-dimensional coordinates x, y, z of n points on the surface of a sphere in a specific arrangement. The z values go from -1 to 1 in equal increments; however, the last value won't be precisely 1 due to numerical round-off errors. The coordinates are written into an n-by-3 matrix, result, stored in column-major order. r and phi are polar coordinates in the (x, y) plane.
Note that when z is -1 or 1 then r becomes 0. This happens in the first and last iteration steps. This would lead to division by 0 in the 1.0 / r expression. However, 1.0 / r is excluded from the first and last iteration of the loop.
This is caused by the interplay of x87 80-bit internal precision, non-conforming behavior of GCC, and optimization decisions that differ between compiler versions.
x87 supports IEEE binary32 and binary64 only as storage formats, converting to/from its 80-bit representation on loads/stores. To make program behavior predictable, the C standard requires that extra precision be dropped on assignments, and allows checking the intermediate precision via the FLT_EVAL_METHOD macro. With -mfpmath=387, FLT_EVAL_METHOD is 2, so you know that intermediate precision corresponds to the long double type.
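If you want to inspect this on your target, a minimal sketch (note that FLT_EVAL_METHOD can also be -1, meaning indeterminable):

#include <float.h>
#include <stdio.h>

int main(void) {
    // 0: evaluate in the operand type; 1: in double; 2: in long double (x87 80-bit)
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}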
Unfortunately, GCC does not drop extra precision on assignments, unless you're requesting stricter conformance via -std=cNN (as opposed to -std=gnuNN), or explicitly passing -fexcess-precision=standard.
In your program, the z += 2.0 / (n - 1); statement should be computed by:
Computing 2.0 / (n - 1) in the intermediate 80-bit precision.
Adding to previous value of z (still in the 80-bit precision).
Rounding to the declared type of z (i.e. to binary64).
In the version that ends up with NaNs, GCC instead does the following:
Computes 2.0 / (n - 1) just once before the loop.
Rounds this fraction from binary80 to binary64 and stores on stack.
In the loop, it reloads this value from stack and adds to z.
This is non-conforming, because the 2.0 / (n - 1) undergoes rounding twice (first to binary80, then to binary64).
The above explains why you saw different results depending on compiler version and optimization level. However, in general you cannot expect your computation to never produce NaNs in the last iteration. When n - 1 is not a power of two, 2.0 / (n - 1) is not representable exactly and may be rounded up. In that case, z may grow a bit faster than the true sum -1.0 + 2.0 / (n - 1) * i, and may end up above 1.0 for i == n - 1, causing sqrt(1 - z*z) to produce a NaN due to a negative argument.
In fact, if you change #define N 11 to #define N 12 in your program, you will deterministically get a NaN both with 80-bit and 64-bit intermediate precision.
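A minimal sketch that isolates the overshoot (the z > 1.0 outcome follows the observation above and assumes IEEE 754 binary64 arithmetic):

#include <stdio.h>

int main(void) {
    // with n = 12, 2.0/(n-1) rounds up, so the running sum can exceed 1.0
    int n = 12;
    double z = -1.0;
    for (int i = 0; i < n - 1; i++)
        z += 2.0 / (n - 1);
    printf("z = %.17g, z > 1.0: %d\n", z, z > 1.0);
    return 0;
}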
... how the NaN result arises (?)
Even though better adherence to the C spec may apparently solve OP's immediate problem, I assert other prevention practices should be considered.
sqrt(1 - z*z) is a candidate NaN when |z| > 1.0.
The index-based test that prevents division by zero may not be enough, and can then lead to cos(INFINITY), another NaN possibility.
// /* avoids division by zero when r == 0 */
// if (i != 0 && i != n-1) {
// phi += 1.0 / r;
// }
To avoid these, 1) test directly and 2) use a more precise approach.
if (r) {
    phi += 1.0 / r;
}

// double r = sqrt(1 - z*z);
double rr = (1-z)*(1+z); // More precise than 1 - z*z
double r = rr < 0.0 ? 0.0 : sqrt(rr);
I am making this big program in C as part of my homework. My problem is that my program outputs x = -0.00 instead of x = 0.00. I have tried comparing like if (x == -0.00) x = fabs(x), but I've read that it won't work like that with doubles. So my question is: are there any other ways to check whether a double is equal to negative zero?
You can use the standard macro signbit(arg) from math.h. It will return nonzero value if arg is negative and 0 otherwise.
From the man page:
signbit() is a generic macro which can work on all real floating-
point types. It returns a nonzero value if the value of x has its
sign bit set.
This is not the same as x < 0.0, because IEEE 754 floating point
allows zero to be signed. The comparison -0.0 < 0.0 is false, but
signbit(-0.0) will return a nonzero value.
NaNs and infinities have a sign bit.
Also, from cppreference.com:
This macro detects the sign bit of zeroes, infinities, and NaNs. Along
with copysign, this macro is one of the only two portable ways to
examine the sign of a NaN.
Very few calculations actually give you a signed negative zero. What you're probably observing is a negative value close to zero that has been truncated by your formatting choice when outputting the value.
Note that -0.0 is defined to compare equal to 0.0, so a simple comparison to 0.0 is enough to verify that you have a zero of either sign.
If you want to convert an exact signed zero -0.0 to 0.0 then add 0.0 to it.
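For example (a minimal sketch, assuming IEC 60559 arithmetic in the default round-to-nearest mode):

#include <stdio.h>

int main(void) {
    double x = -0.0;
    x += 0.0;            // -0.0 + +0.0 is +0.0 under round-to-nearest
    printf("%.2f\n", x); // prints 0.00
    return 0;
}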
Most likely, your program has a small negative value, not zero, which printf formats as “-0.00”. To print such numbers as “0.00”, you can test how printf will format them and replace the undesired string with the desired string:
#include <stdio.h>
#include <string.h>
void PrintAdjusted(double x)
{
    char buffer[6];
    int result = snprintf(buffer, sizeof buffer, "%.2f", x);

    /* If snprintf produces a result other than "-0.00", including
       a result that does not fit in the buffer, use it.
       Otherwise, print "0.00".
    */
    if (sizeof buffer <= result || strcmp(buffer, "-0.00") != 0)
        printf("%.2f", x);
    else
        printf("0.00");
}
This is portable. Alternatives such as comparing the number to -0.005 have portability issues, due to implementation-dependent details in floating-point formats and rounding methods in printf.
If you truly do want to test whether a number x is −0, you can use:
#include <math.h>
…
signbit(x) && x == 0
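Wrapped as a helper function (the name is_negative_zero is mine, not a standard facility):

#include <math.h>
#include <stdbool.h>

// true only for an exact negative zero
static bool is_negative_zero(double x) {
    return x == 0.0 && signbit(x);
}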
There are two functions you need here.
First, the signbit function can tell you if the sign bit is set on a floating point number. Second, the fpclassify function will tell you if a floating point number is some form of 0.
For example:
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 0.0;
    double y = -0.0;
    double a = 3;
    double b = -2;

    printf("x=%f, y=%f\n", x, y);
    printf("x is zero: %d\n", (fpclassify(x) == FP_ZERO));
    printf("y is zero: %d\n", (fpclassify(y) == FP_ZERO));
    printf("a is zero: %d\n", (fpclassify(a) == FP_ZERO));
    printf("b is zero: %d\n", (fpclassify(b) == FP_ZERO));
    printf("x sign: %d\n", signbit(x));
    printf("y sign: %d\n", signbit(y));
    printf("a sign: %d\n", signbit(a));
    printf("b sign: %d\n", signbit(b));
    return 0;
}
Output:
x=0.000000, y=-0.000000
x is zero: 1
y is zero: 1
a is zero: 0
b is zero: 0
x sign: 0
y sign: 1
a sign: 0
b sign: 1
So to check if a value is negative zero, do the following:
if (fpclassify(x) == FP_ZERO) {
    if (signbit(x)) {
        printf("x is negative zero\n");
    } else {
        printf("x is positive zero\n");
    }
}
To always get the non-negative version, you don't need the comparison at all.
You can take the absolute value all of the time. If the value is non-negative, fabs should return the original value.
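A minimal sketch of that approach:

#include <math.h>

// fabs maps -0.0 to +0.0 and leaves non-negative values unchanged;
// genuinely negative values are flipped too, which is the intent here
double non_negative(double x) {
    return fabs(x);
}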
I have written code in C which rotates a point by an angle given in the form of a triple.
When I compile and run my test case it gives me the output -0, 7, whereas the same code in Python gives me the output 0, 7.
When I run the same code on online compiling platforms it gives me the correct output.
I am using Code::Blocks on Windows 10.
Is there something wrong with Code::Blocks? What should I do?
C code:
#include <stdio.h>
#include <math.h>

int main()
{
    double xp, yp, xq, yq, a, b, c;
    double t, xn, yn;
    int z;

    scanf("%d", &z);
    // printf("Enter coordinates of p \n");
    scanf("%lf%lf", &xp, &yp);
    // printf("\nEnter triple \n");
    scanf("%lf%lf%lf", &a, &b, &c);
    // printf("\nEnter coordinates of q \n");
    scanf("%lf%lf", &xq, &yq);

    t = asin(b/c);

    if (z == 0)
    {
        xn = xp*cos(t) - yp*sin(t) - xq*cos(t) + yq*sin(t) + xq;
        yn = xp*sin(t) + yp*cos(t) - xq*sin(t) - yq*cos(t) + yq;
    }
    else
    {
        xn = xp*cos(t) + yp*sin(t) - xq*cos(t) - yq*sin(t) + xq;
        yn = -xp*sin(t) + yp*cos(t) + xq*sin(t) - yq*cos(t) + yq;
    }

    printf("%lf %lf", xn, yn);
    return 0;
}
Output:
0
4 7
3 4 5
2 3
-0.000000 7.000000
Process returned 0 (0x0) execution time : 10.675 s
Press any key to continue.
https://stackoverflow.com/questions/34088742/what-is-the-purpose-of-having-both-positive-and-negative-zero-0-also-written
The most likely thing here is that you don't actually have a signed -0.0, but your formatting is presenting it to you that way.
You'll get a signed negative zero in floating point if one of your calculations yields a negative subnormal number that's rounded to zero.
If you do indeed have a pure signed zero, then one workaround is to clobber it with the ternary conditional operator, since printf does reserve the right to propagate the signed zero into the output: f == 0.0 ? 0.0 : f is one such scheme, or even the flashier but obfuscated f ? f : 0.0. The C standard defines -0.0 to compare equal to 0.0. Another way (acknowledging @EricPostpischil) is to add 0.0 to the value.
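A minimal sketch of both workarounds (assuming IEEE 754 doubles):

#include <stdio.h>

int main(void) {
    double f = -0.0;
    // both expressions print 0.000000 even though f is -0.0
    printf("%f %f\n", f == 0.0 ? 0.0 : f, f + 0.0);
    return 0;
}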
For floating point values there are two zeroes, 0.0 and -0.0. They compare as equal (e.g. -0.0 == 0.0 returns 1) but they are two distinct values. They exist for symmetry, because for any small value other than 0, the sign does make a mathematical difference, and for some edge cases they make a difference too. For example, 1.0/0.0 == INFINITY and 1.0/-0.0 == -INFINITY. (INFINITY, -INFINITY and NAN are also values that floating point variables can take.)
To make printf not print -0 for -0.0, or for any small negative value that would be truncated to -0, one way is to artificially set very small values to 0.0, for example:

if (fabs(x) < 1e-5) x = 0.0; // fabs from <math.h>; integer abs() would be wrong here
I have one double, and one int64_t. I want to know if they hold exactly the same value, and if converting one type into the other does not lose any information.
My current implementation is the following:
#include <stdint.h>
#include <math.h>

int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < INT64_MAX)
        && (round(d) == d)
        && (i == (int64_t)d);
}
My question is: is this implementation correct? And if not, what would be a correct answer? To be correct, it must leave no false positive, and no false negative.
Some sample inputs:
int64EqualsDouble(0, 0.0) should return 1
int64EqualsDouble(1, 1.0) should return 1
int64EqualsDouble(0x3FFFFFFFFFFFFFFF, (double)0x3FFFFFFFFFFFFFFF) should return 0, because 2^62 - 1 can be exactly represented with int64_t, but not with double.
int64EqualsDouble(0x4000000000000000, (double)0x4000000000000000) should return 1, because 2^62 can be exactly represented in both int64_t and double.
int64EqualsDouble(INT64_MAX, (double)INT64_MAX) should return 0, because INT64_MAX can not be exactly represented as a double
int64EqualsDouble(..., 1.0e100) should return 0, because 1.0e100 can not be exactly represented as an int64_t.
Yes, your solution works correctly because it was designed to do so: int64_t is represented in two's complement by definition (C99 7.18.1.1:1), and your code targets platforms that use something resembling binary IEEE 754 double-precision for the double type. It is basically the same as this one.
Under these conditions:
d < INT64_MAX is correct because it is equivalent to d < (double) INT64_MAX and in the conversion to double, the number INT64_MAX, equal to 0x7fffffffffffffff, rounds up. Thus you want d to be strictly less than the resulting double to avoid triggering UB when executing (int64_t)d.
On the other hand, INT64_MIN, being -0x8000000000000000, is exactly representable, meaning that a double that is equal to (double)INT64_MIN can be equal to some int64_t and should not be excluded (and such a double can be converted to int64_t without triggering undefined behavior)
It goes without saying that since we have specifically used the assumptions about 2's complement for integers and binary floating-point, the correctness of the code is not guaranteed by this reasoning on platforms that differ. Take a platform with binary 64-bit floating-point and a 64-bit 1's complement integer type T. On that platform T_MIN is -0x7fffffffffffffff. The conversion to double of that number rounds down, resulting in -0x1.0p63. On that platform, using your program as it is written, using -0x1.0p63 for d makes the first three conditions true, resulting in undefined behavior in (T)d, because overflow in the conversion from integer to floating-point is undefined behavior.
If you have access to full IEEE 754 features, there is a shorter solution:
#include <fenv.h>
…
#pragma STDC FENV_ACCESS ON
feclearexcept(FE_INEXACT), f == i && !fetestexcept(FE_INEXACT)
This solution takes advantage of the conversion from integer to floating-point setting the INEXACT flag iff the conversion is inexact (that is, if i is not representable exactly as a double).
The INEXACT flag remains unset and f is equal to (double)i if and only if f and i represent the same mathematical value in their respective types.
This approach requires the compiler to have been warned that the code accesses the FPU's state, normally with #pragma STDC FENV_ACCESS ON, but that is typically not supported and you have to use a compilation flag instead.
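Put together as a self-contained function, this might look as follows (a sketch; the name int64EqualsDoubleFenv is mine, and it assumes IEEE 754 and fenv support):

#include <fenv.h>
#include <stdint.h>

#pragma STDC FENV_ACCESS ON

int int64EqualsDoubleFenv(int64_t i, double d) {
    feclearexcept(FE_INEXACT);
    // the comparison converts i to double, raising FE_INEXACT iff the
    // conversion is inexact, i.e. iff i is not exactly representable
    return d == i && !fetestexcept(FE_INEXACT);
}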
OP's code has a dependency that can be avoided.
For a successful compare, d must be a whole number, and round(d) == d takes care of that. Even a NaN d would fail that.
d must be mathematically in the range [INT64_MIN ... INT64_MAX], and if the if conditions properly ensure that, then the final i == (int64_t)d completes the test.
So the question comes down to comparing INT64 limits with the double d.
Let us assume FLT_RADIX == 2, but not necessarily IEEE 754 binary64.
d >= INT64_MIN is not a problem: the magnitude of INT64_MIN is a power of 2, so it converts exactly to a double of the same value, and the >= is exact.
Code would like to perform the mathematical d <= INT64_MAX, but that may not work, and so is a problem. INT64_MAX is a "power of 2 minus 1" and may not convert exactly; whether it does depends on whether the precision of double exceeds 63 bits, rendering the compare unclear. A solution is to halve the comparison: d/2 suffers no precision loss, and INT64_MAX/2 + 1 converts exactly to a double power of 2:

d/2 < (INT64_MAX/2 + 1)
[Edit]
// or simply
d < ((double)(INT64_MAX/2 + 1))*2
Thus if code does not want to rely on double having less precision than uint64_t (something that likely applies with long double), a more portable solution would be:
int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < ((double)(INT64_MAX/2 + 1))*2) // i.e. d/2 < (INT64_MAX/2 + 1)
        && (round(d) == d)
        && (i == (int64_t)d);
}
Note: No rounding mode issues.
[Edit] Deeper limit explanation
Ensuring mathematically that INT64_MIN <= d <= INT64_MAX can be re-stated as INT64_MIN <= d < (INT64_MAX + 1), as we are dealing with whole numbers. Since the raw application of (double)(INT64_MAX + 1) in code is not an option (the integer addition overflows), an alternative is ((double)(INT64_MAX/2 + 1))*2. This can be extended for rare machines whose double uses a higher power-of-2 radix: ((double)(INT64_MAX/FLT_RADIX + 1))*FLT_RADIX. The comparison limits being exact powers of 2, conversion to double suffers no precision loss, and (lo_limit <= d) && (d < hi_limit) is exact regardless of the precision of the floating point. Note that a rare floating point with FLT_RADIX == 10 is still a problem.
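As a sketch, the radix-general upper bound could be written like this (hypothetical helper, mirroring the expression above):

#include <float.h>
#include <stdint.h>

// exact upper limit for the range check, even when FLT_RADIX > 2;
// both factors are exact powers of FLT_RADIX, so no precision is lost
static const double hi_limit = ((double)(INT64_MAX / FLT_RADIX + 1)) * FLT_RADIX;
// then test: (d >= INT64_MIN) && (d < hi_limit)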
In addition to Pascal Cuoq's elaborate answer, and given the extra context you give in comments, I would add a test for negative zeros. You should preserve negative zeros unless you have good reasons not to. You need a specific test to avoid converting them to (int64_t)0. With your current proposal, negative zeros will pass your test, get stored as int64_t and read back as positive zeros.
I am not sure what is the most efficient way to test them, maybe this:
int int64EqualsDouble(int64_t i, double d) {
    return (d >= INT64_MIN)
        && (d < INT64_MAX)
        && (round(d) == d)
        && (i == (int64_t)d)
        && (!signbit(d) || d != 0.0);
}
I edited a C program for my assignment; previously there was no typecast and the iteration stopped at i=1, but now with the typecast it stops at i=6.
Any ideas why? Thanks in advance!
#include <stdio.h>
#include <conio.h> // for getch(); non-standard, Windows-only

int main(void)
{
    int i = 0;
    double d = 0.0;

    while ( (i == (int) (d * 10)) && (i < 10) )
    {
        i = i + 1;
        d = (double) (d + 0.1);
        printf("%d %lf\n", i, d);
    }

    printf("%d %lf\n", i, d);
    getch();
    return 0;
}
Floating point arithmetic is inexact. The value 0.1 is not exactly representable in binary floating point. The recommended reading here is: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
At some point in the program, d becomes slightly less than i/10 due to rounding error, and so your loop terminates.
In addition to the other answers, I'd like to answer the question why the loop terminates earlier with the condition i == (d * 10) than with i == (int) (d * 10).
In the first case, the int value on the left side of == is converted to double, so the inequality happens when the accumulated error in d*10 is either positive or negative (e.g. 0.999999 or 1.000001).
In the 2nd case, the right side is truncated to int, so the inequality happens only when the error is negative (e.g. 5.999999). Therefore, the 1st version would fail earlier.
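A small sketch that makes the drift visible (exact values depend on the platform's binary64 rounding):

#include <stdio.h>

int main(void) {
    double d = 0.0;
    for (int i = 1; i <= 10; i++) {
        d += 0.1;
        // %.17g reveals the accumulated binary rounding error in d*10
        printf("i=%2d  d*10=%.17g  (int)(d*10)=%d\n", i, d * 10, (int)(d * 10));
    }
    return 0;
}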
As has been stated many times before, the reason this doesn't work is that binary floating point numbers cannot represent all decimal fractions exactly; it just isn't possible. To read more, check out this really great article:
What Every Programmer Should Know About Floating-Point Arithmetic
Now, on the more practical side of things, when using floating point and comparing it to another number, you should almost always round the value or use an epsilon value, like this:
if (fabs(doubleValue - intValue) < 0.00001) // 0.00001 is a margin of error for floating point arithmetic; fabs is from <math.h>
    // the two numbers are equal (or close to it)