cosf(M_PI_2) not returning zero - c

This started suddenly this morning. The original lines were:
float angle = (x+90)*(M_PI/180.0);
float xx = cosf(angle);
float yy = sinf(angle);
After setting a breakpoint and hovering the cursor, I get the correct answer of 1 for yy, but xx is NOT zero.
I tried cosf(M_PI_2) directly; still no luck. It was working fine until yesterday, and I did not change any compiler settings.
I am using the latest version of Xcode as of today's date.

The first thing to notice is that you're using floats. These are inherently inaccurate, and for most calculations give you only a close approximation of the mathematically-correct answer. Assuming that x in your code has value 0, angle will have a close approximation to π/2. xx will therefore have an approximation to cos(π/2). However, this is unlikely to be exactly zero due to approximation and rounding issues.
If you were able to change your code to use doubles rather than floats, you would likely get more accuracy, and an answer nearer zero. However, if it is important for your code to produce a value of exactly zero at this point, you're going to have to rethink how you're doing the calculations.
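For illustration, a minimal sketch comparing the two (it assumes a POSIX <math.h> that provides M_PI_2):
#include <math.h>
#include <stdio.h>

int main(void) {
    printf("float:  %g\n", cosf((float)M_PI_2)); /* about -4.37e-08 */
    printf("double: %g\n", cos(M_PI_2));         /* about 6.12e-17, much nearer zero */
    return 0;
}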
If this doesn't answer your particular problem, give us some more details and we'll have another think.

Contrary to what others have said, this is not an x87 co-processor issue. XCode uses SSE for floating-point computation on Intel by default (except for long double arithmetic).
The "problem" is: when you write cosf(M_PI_2), you are actually telling the XCode compiler (gcc or llvm-gcc or clang) to do the following:
1. Look up the expansion of M_PI_2 in <math.h>. Per the POSIX standard, it is a double precision literal that converts to the correctly rounded value of π/2.
2. Round the converted double precision value to single precision.
3. Call the math library function cosf on the single precision value.
Note that, throughout this process, you are not operating on the actual value of π/2. You are instead operating on that value rounded to a representable floating-point number. While cos(π/2) is exactly zero, you are not telling the compiler to do that computation. You are instead telling the compiler to do cos(π/2 + tiny), where tiny is the difference between the rounded value (float)M_PI_2 and the (unrepresentable) exact value of π/2. If cos is computed with no error at all, the result of cos(π/2 + tiny) is approximately -tiny. If it returned zero, that would be an error.
Edit: a step-by-step expansion of the computation on an Intel Mac with the current Xcode compiler:
M_PI_2 is defined to be
1.57079632679489661923132169163975144
but that's not actually a representable double precision number. When the compiler converts it to a double precision value it becomes exactly
1.5707963267948965579989817342720925807952880859375
This is the closest double-precision number to π/2, but it differs from the actual mathematical value of π/2 by about 6.12*10^(-17).
Step (2) rounds this number to single-precision, which changes the value to exactly
1.57079637050628662109375
This is approximately π/2 + 4.37*10^(-8). When we then compute cosf of this number, we get:
-0.00000004371138828673792886547744274139404296875
which is very nearly the exact value of cosine evaluated at that point:
-0.00000004371139000186241438857289400265215231661...
In fact, it is the correctly rounded result; there is no value that the computation could have returned that would be more accurate. The only error here is that the computation that you asked the compiler to perform is different from the computation that you thought you were asking it to do.
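If you want to reproduce these numbers yourself, here is a minimal sketch (again assuming a POSIX <math.h> that defines M_PI_2):
#include <math.h>
#include <stdio.h>

int main(void) {
    double d = M_PI_2;          /* step (1): correctly rounded double */
    float  f = (float)M_PI_2;   /* step (2): rounded again to single precision */
    printf("double M_PI_2: %.50f\n", d);
    printf("float  M_PI_2: %.50f\n", f);
    printf("cosf(M_PI_2):  %.50f\n", cosf(f)); /* step (3) */
    return 0;
}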

I suspect the answer is as near as damnit to 0 as not to be worth worrying about.
If I run the same thing, I get the answer "-4.3711388e-008", which can also be written as "-0.000000043711388". That is pretty damned close to 0. Definitely near enough not to worry about it being out at the 8th decimal place.
Edit: Further to what LiraLuna is saying, I wrote the following piece of x87 assembler under Visual Studio:
float fRes;
_asm
{
    fld1                ; st0 = 1
    fld1                ; st0 = 1, st1 = 1
    fadd st, st(1)      ; st0 = 2, st1 = 1
    fldpi               ; st0 = pi, st1 = 2, st2 = 1
    fdiv st, st(1)      ; st0 = pi/2
    fcos                ; st0 = cos(pi/2), in 80-bit extended precision
    fstp [fRes]         ; round to float, store, and pop
    fstp st(0)          ; pop the leftover 2 to rebalance the FPU stack
    fstp st(0)          ; pop the leftover 1
}
char str[16];
sprintf( str, "%f", fRes );
Basically this uses the x87's fcos instruction to take the cosine of pi/2. The value held in str is "0.000000".
This, however, is not actually what fcos returned. It ACTUALLY returned 6.1230318e-017. This implies that the error occurs at the 17th decimal place and, let's be honest, that's far less significant than the standard debug cosf above.
As SSE3 has no specific cosine instruction, I suspect (though I cannot confirm without seeing the generated assembler) that it is either using its own Taylor series expansion or using the fcos instruction anyway. Either way, you are still unlikely to get better precision than an error at the 17th decimal place, in my opinion.

The only thing I can think of is a malicious macro substitution, i.e. M_PI_2 is no longer 1.57079632679489661923.
Try calling cosf( 1.57079632679489661923 ) to test this.

The real thing you should be careful about is the sign of the cosine. Make sure it is what you expected. E.g. if you operate with angles between 0 and pi/2, make sure that what you use as PI_2 is less than the actual value of pi/2!
And the difference between 0.000001 and 0.0 is less than you think.

The reason
What you are experiencing is the infamous x87 math co-processor float truncation 'bug' - or rather - a feature. IEEE floats have an amazing range of numbers, but at a cost. They sacrifice precision for high range.
They are not as inaccurate as you think, though - this is a semi-myth generated by Intel's x87 chip design, which internally uses an 80-bit representation for floats - these have far superior precision, though they are a bit slower.
When you perform a float comparison, the x87 caches the float as an 80-bit float; then, when its stack is full, it saves the 32-bit representation to RAM, decreasing accuracy by a large degree.
The solution
x87 is old, really old. Its replacement is SSE. SSE computes 32-bit floats and 64-bit floats natively, leading to minimal precision loss in math. Please note that precision issues with floats still exist, but printf("%f\n", cosf(M_PI_2)); should print zero. Heck - even float comparison with SSE is accurate again! (unlike x87).
Since the latest Xcode actually uses GCC 4.2.1, use the compiler switches -msse3 -mfpmath=sse and see how you get a perfectly round 0.00000 (Note: if you get -0.00000, do not worry; it's perfectly fine and still equals 0.00000 under the IEEE spec - see the Wikipedia article on signed zero).
All Intel Macs are guaranteed to have SSE3 support (OSx86 Macs excluded; if you want to support those, use -msse2).

Related

Why didn't I get tan(PI/2) = infinity in C

When I calculate tan(PI/2) I get -22877332 in C, but tan(PI/2) should be infinity.
Google gives it as 3060023.30695. Why am I getting a different answer?
I tried it with the MinGW compiler; MinGW and Google give different answers.
typedef float float32;   /* assumed: the question does not show this typedef */

float32 Tan_f32 (float32 ValValue)
{
    float32 Result_Val;
    Result_Val = tanf(ValValue);
    return Result_Val;
}
The MinGW compiler gives -22877332, and Google gives 3060023.30695.
It is impossible to pass π/2 to tan or tanf because π is irrational, so any floating-point number, no matter how precise, will be at least slightly different from π/2. Therefore, tanf(ValValue) returns the tangent of some value close to π/2, and that tangent is large but not infinite.
In the common format used for float, IEEE-754 basic 32-bit binary floating-point, the closest representable number to π/2 is 1.57079637050628662109375. The tangent of that number is approximately −22877332.4289, and the closest value representable in float is −22877332, which is the result you got. So your tanf is giving you the best possible result for the input number you gave it.
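You can reproduce this yourself; a minimal sketch (the literal below is the closest float to π/2 quoted above):
#include <math.h>
#include <stdio.h>

int main(void) {
    float x = 1.57079637050628662109375f;  /* closest float to pi/2 */
    printf("x       = %.25f\n", x);
    printf("tanf(x) = %.10g\n", tanf(x));  /* about -22877332 */
    return 0;
}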
The C standard, or indeed the common but by no means ubiquitous IEEE 754 floating-point standard, gives no guarantee of the accuracy of tan (cf. sqrt). An implementation will make a compromise in getting a good result out in a reasonable number of clock cycles.
In particular, the behaviour of the trigonometric function near an asymptote is particularly unpredictable; and that's the case here.
Accepting that the fault is not due to your value of pi (worth a check although note that because pi is transcendental it can't be represented exactly in any floating point system), if you want a well-behaved tan function across the whole domain, you'll be better off using a third party mathematics library.
Finally, note that under IEEE 754 you might get more consistent behaviour around an asymptote if you let floating-point division deal with the pole, using the identity tan²(x) = 1/cos²(x) − 1:
double c = cos(x);
double t = sqrt(1.0 / (c * c) - 1.0);  /* |tan(x)| */
This might be more numerically stable, as IEEE 754 defines division by zero (it yields infinity).
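A sketch of that idea as a complete helper (the function name is mine; the square root discards the tangent's sign, so it is restored here from sin(x)*cos(x)):
#include <math.h>

/* tan(x) via the identity tan(x) = +/- sqrt(1/cos^2(x) - 1).
   When cos(x) rounds to 0, the division yields infinity under
   IEEE 754, which handles the pole gracefully. */
static double tan_via_cos(double x) {
    double c = cos(x);
    double t = sqrt(1.0 / (c * c) - 1.0);  /* |tan(x)| */
    return (sin(x) * c < 0.0) ? -t : t;    /* restore the sign */
}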

Float precision

Due to the limited precision of the microcontroller, I defined a symbol containing the ratio of two floating-point numbers, instead of writing the result directly:
#define INTERVAL (0.01F/0.499F)
instead of
#define INTERVAL 0.02004008016032064F
But the first solution adds another operation, "/". Reasoning about both optimization and correctness of the result, which is the better solution?
They are the same; your compiler will evaluate 0.01F/0.499F at compile time.
There is a slight mistake in your constant, though: 0.01F/0.499F is not exactly 0.02004008016032064F.
0.01F/0.499F is evaluated at compile time. The precision used at compile time depends on the compiler and likely exceeds the micro-controller's. Thus either approach will typically produce the same code.
In the unlikely case that the compiler's precision is about the same as the micro-controller's float (typical binary floating-point), the values 0.01F and 0.499F will not be exact but within 0.5 ULP (unit in the last place). The quotient 0.01F/0.499F will then be within about sqrt(2)*0.5 ULP, while using 0.02004008016032064F will be within 0.5 ULP. So in select situations, the constant will be better than the quotient.
In rarer circumstances, float will have more precision than 0.02004008016032064F provides, and the quotient would be better.
In the end, I recommend coding whatever values actually drive the equation; e.g. if 0.01 and 0.499 are the values of two resistors, use those two values.
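If you want to see what your particular compiler does with the two forms, here is a quick check (a sketch; the macro names are mine, and whether the comparison prints 1 or 0 depends on how the compiler folds the quotient):
#include <stdio.h>

#define INTERVAL_QUOTIENT (0.01F / 0.499F)      /* folded at compile time */
#define INTERVAL_LITERAL  0.02004008016032064F

int main(void) {
    printf("quotient: %.20f\n", (double)INTERVAL_QUOTIENT);
    printf("literal:  %.20f\n", (double)INTERVAL_LITERAL);
    printf("equal as float: %d\n", INTERVAL_QUOTIENT == INTERVAL_LITERAL);
    return 0;
}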

How unreliable are floating point values, operators and functions?

I don't want to introduce floating point where an inexact value would be a disaster, so I have a couple of questions about when you actually can use it safely.
Are floating-point values exact for integers, as long as you don't overflow the number of significant digits? Are these two tests always true:
double d = 2.0;
if (d + 3.0 == 5.0) ...
if (d * 3.0 == 6.0) ...
What math functions can you rely on? Are these tests always true:
#include <math.h>
double d = 100.0;
if (log10(d) == 2.0) ...
if (pow(d, 2.0) == 10000.0) ...
if (sqrt(d) == 10.0) ...
How about this:
int v = ...;
if (log2((double) v) > 16.0) ... /* gonna need more than 16 bits to store v */
if (log((double) v) / log(2.0) > 16.0) ... /* C89 */
I guess you can summarize this question as: 1) Can floating point types hold the exact value of all integers up to the number of their significant digits in float.h? 2) Do all floating point operators and functions guarantee that the result is the closest to the actual mathematical result?
I too find incorrect results distasteful.
On common hardware, you can rely on +, -, *, /, and sqrt working and delivering the correctly-rounded result. That is, they deliver the floating-point number closest to the sum, difference, product, quotient, or square root of their argument or arguments.
Some library functions, notably log2 and log10 and exp2 and exp10, traditionally have terrible implementations that are not even faithfully-rounded. Faithfully-rounded means that a function delivers one of the two floating-point numbers bracketing the exact result. Most modern pow implementations have similar issues. Lots of these functions will even blow exact cases like log10(10000) and pow(7, 2). Thus equality comparisons involving these functions, even in exact cases, are asking for trouble.
sin, cos, tan, atan, exp, and log have faithfully-rounded implementations on every platform I've recently encountered. In the bad old days, on processors using the x87 FPU to evaluate sin, cos, and tan, you would get horribly wrong outputs for largish inputs and you'd get the input back for larger inputs. CRlibm has correctly-rounded implementations; these are not mainstream because, I'm told, they've got rather nastier worst cases than the traditional faithfully-rounded implementations.
Things like copysign and nextafter and isfinite all work correctly. ceil and floor and rint and friends always deliver the exact result. fmod and friends do too. frexp and friends work. fmin and fmax work.
Someone thought it would be a brilliant idea to make fma(x,y,z) compute x*y+z by computing x*y rounded to a double, then adding z and rounding the result to a double. You can find this behaviour on modern platforms. It's stupid and I hate it.
I have no experience with the hyperbolic trig, gamma, or Bessel functions in my C library.
I should also mention that popular compilers targeting 32-bit x86 play by a different, broken, set of rules. Since the x87 is the only supported floating-point instruction set and all x87 arithmetic is done with an extended exponent, computations that would induce an underflow or overflow in double precision may fail to underflow or overflow. Furthermore, since the x87 also by default uses an extended significand, you may not get the results you're looking for. Worse still, compilers will sometimes spill intermediate results to variables of lower precision, so you can't even rely on your calculations with doubles being done in extended precision. (Java has a trick for doing 64-bit math with 80-bit registers, but it is quite expensive.)
I would recommend sticking to arithmetic on long doubles if you're targeting 32-bit x86. Compilers are supposed to set FLT_EVAL_METHOD to an appropriate value, but I do not know if this is done universally.
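If you want to check what evaluation method your compiler claims, a minimal sketch:
#include <float.h>
#include <stdio.h>

int main(void) {
    /* 0: operate in the operands' own types (typical with SSE);
       2: operate in long double (typical with x87) */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}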
Can floating point types hold the exact value of all integers up to the number of their significant digits in float.h?
Well, they can store the integers which fit in their mantissa (significand). So [-2^53, 2^53] for double. For more on this, see: Which is the first integer that an IEEE 754 float is incapable of representing exactly?
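A small demonstration of where that range ends for double (assuming IEEE-754 binary64):
#include <stdio.h>

int main(void) {
    double big = 9007199254740992.0;          /* 2^53, exactly representable */
    printf("2^53     -> %.1f\n", big);
    printf("2^53 + 1 -> %.1f\n", big + 1.0);  /* rounds back to 2^53 */
    printf("equal: %d\n", big == big + 1.0);  /* prints 1 */
    return 0;
}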
Do all floating point operators and functions guarantee that the result is the closest to the actual mathematical result?
They at least guarantee that the result is immediately on either side of the actual mathematical result. That is, you won't get a result which has a valid floating point value between itself and the "actual" result. But beware, because repeated operations may accumulate an error which seems counter to this, while it is not (because all intermediate values are subject to the same constraints, not just the inputs and output of a compound expression).
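The classic illustration of such accumulation (a minimal sketch, assuming IEEE-754 doubles):
#include <stdio.h>

int main(void) {
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;                  /* each addition rounds */
    printf("%d\n", sum == 1.0);      /* prints 0 */
    printf("%.20f\n", sum);          /* just below 1.0 */
    return 0;
}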

Different Truncation Results When Casting

I'm having some difficulty predicting how my C code will truncate results. Refer to the following:
float fa,fb,fc;
short ia,ib;
fa=160;
fb=0.9;
fc=fa*fb;
ia=(short)fc;
ib=(short)(fa*fb);
The results are ia=144, ib=143.
I can understand the reasoning for either result, but I don't understand why the two calculations are treated differently. Can anyone refer me to where this behaviour is defined or explain the difference?
Edit: the results are compiled with MS Visual C++ Express 2010 on Intel core i3-330m. I get the same results on gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) under Virtual Box on the same machine.
The compiler is allowed to use more precision for a subexpression like fa*fb than it uses when assigning to a float variable like fc. So it's the fc= part which is very slightly changing the result (and happening to then make a difference in the integer truncation).
aschepler explained the mechanics of what's going on well, but the fundamental problem with your code is using a value which does not exist as a float in code that depends upon the value of its approximation in an unstable way. If you want to multiply by 0.9 (the actual number 0.9=9/10, not the floating point value 0.9 or 0.9f) you should multiply by 9 then divide by 10, or forget about floating point types and use a decimal arithmetic library.
A cheap and dirty way around the problem, when the unstable points are isolated as in your example here, is to just add a value (typically 0.5) which you know will be larger than the error but smaller than the difference from the next integer before truncating.
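A sketch of that cheap workaround applied to the question's example (variable names follow the question; whether the un-nudged cast gives 143 or 144 depends on the evaluation precision, as aschepler explained):
#include <stdio.h>

int main(void) {
    float fa = 160.0f;
    float fb = 0.9f;                      /* actually about 0.899999976 */
    short ib = (short)(fa * fb);          /* 143 or 144, precision-dependent */
    short ir = (short)(fa * fb + 0.5f);   /* 144 either way: the 0.5 nudge
                                             absorbs the tiny error */
    printf("%d %d\n", ib, ir);
    return 0;
}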
This is compiler-dependent. On mine (gcc 4.4.3) it produces the same result for both expressions, namely 144, probably because the identical expression is optimized away.
Others have explained well what happened. In other words, I would say that the difference probably happens because your compiler internally promotes the floats to 80-bit FPU registers before performing the multiplication, then converts back to either float or short.
If my hypothesis is true, then if you write ib = (short)(float)(fa * fb); you should get the same result as when casting fc to short.

Floating-point precision when moving from i386 to x86_64

I have an application that was developed for Linux x86 32 bits. There are lots of floating-point operations and a lot of tests depending on the results. Now we are porting it to x86_64, but the test results are different in this architecture. We don't want to keep a separate set of results for each architecture.
According to the article An Introduction to GCC - for the GNU compilers gcc and g++, the problem is that GCC on x86_64 assumes fpmath=sse while x86 assumes fpmath=387. The 387 FPU uses 80-bit internal precision for all operations and only converts the result to a given floating-point type (float, double or long double), while SSE uses the type of the operands to determine its internal precision.
I can force -mfpmath=387 when compiling my own code and all my operations work correctly, but whenever I call some library function (sin, cos, atan2, etc.) the results are wrong again. I assume it's because libm was compiled without the fpmath override.
I tried to build libm myself (glibc) using 387 emulation, but it caused a lot of crashes all around (don't know if I did something wrong).
Is there a way to force all code in a process to use the 387 emulation in x86_64? Or maybe some library that returns the same values as libm does on both architectures? Any suggestions?
Regarding the question of "Do you need the 80 bit precision", I have to say that this is not a problem for an individual operation. In this simple case the difference is really small and makes no difference. When compounding a lot of operations, though, the error propagates and the difference in the final result is not so small any more and makes a difference. So I guess I need the 80 bit precision.
I'd say you need to fix your tests. You're generally setting yourself up for disappointment if you assume floating point math to be accurate. Instead of testing for exact equality, test whether it's close enough to the expected result. What you've found isn't a bug, after all, so if your tests report errors, the tests are wrong. ;)
As you've found out, every library you rely on is going to assume SSE floating point, so unless you plan to compile everything manually, now and forever, just so you can set the FP mode to x87, you're better off dealing with the problem now, and accepting that FP math is not 100% accurate and will not in general yield the same result on two different platforms. (I believe AMD CPUs yield slightly different results in x87 math as well.)
Do you absolutely need 80-bit precision? (If so, there obviously aren't many alternatives, other than to compile everything yourself to use 80-bit FP.)
Otherwise, adjust your tests to perform comparisons and equality tests within some small epsilon. If the difference is smaller than that epsilon, the values are considered equal.
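For example, a relative-epsilon comparison might look like this (a sketch; the helper name is mine, and the tolerance must be tuned to the scale of your data):
#include <math.h>
#include <stdbool.h>

/* True when a and b differ by at most eps relative to their magnitude. */
static bool nearly_equal(double a, double b, double eps) {
    return fabs(a - b) <= eps * fmax(fabs(a), fabs(b));
}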
80-bit precision is actually dangerous. The problem is that it is only preserved as long as the variable is stored in a CPU register. Whenever it is forced out to RAM, it is truncated to the type's precision. So a variable can actually change its value even though nothing happened to it in the code.
If you want long double precision, use long double for all of your floating point variables, rather than expecting float or double to have extra magic precision. This is really a no-brainer.
SSE floating point and 387 floating point use entirely different instructions, so there's no way to convince SSE fp instructions to use the 387. Probably the best way to deal with this is to resign your test suite to getting slightly different results, and not depend on results being the same to the last bit.
