Because of the limited precision of the microcontroller, I defined a symbol as the ratio of two floating-point numbers instead of writing the result directly.
#define INTERVAL (0.01F/0.499F)
instead of
#define INTERVAL 0.02004008016032064F
But the first solution adds another operation, "/". In terms of both optimization and a correct result, which is the better solution?
They are the same, your compiler will evaluate 0.01F/0.499F at compile-time.
There is a mistake in your constant value 0.01F/0.499F = 0.02004008016032064F.
0.01F/0.499F is evaluated at compile time. The precision used at compile time depends on the compiler and likely exceeds the micro-controller's. Thus either approach will typically provide the same code.
In the unlikely event that the compiler's precision is about the same as the micro-controller's float, and that float is typical binary floating-point, the values 0.01F and 0.499F will not be exact but each within 0.5 ULP (unit in the last place). The quotient 0.01F/0.499F will then be within about sqrt(2)*0.5 ULP. Using 0.02004008016032064F will be within 0.5 ULP. So in select situations, the constant will be better than the quotient.
In rarer circumstances, float has more precision than the digits written in 0.02004008016032064F, and the quotient would be better.
In the end, I recommend coding to whatever values drive the equation, e.g. if 0.01 and 0.499 are the values of two resistors, use those two values.
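To check how far apart the two spellings land on a given toolchain, one option is to count the representable floats between them. This is only a sketch: the macro names and the helper ulp_distance are made up for illustration, and it assumes float is a 32-bit IEEE 754 type and that both values are positive and finite.

```c
#include <stdint.h>
#include <string.h>

/* The two spellings from the question. Both are constant expressions,
   so a compiler is free to fold the division at translation time. */
#define INTERVAL_QUOTIENT (0.01F / 0.499F)
#define INTERVAL_LITERAL  0.02004008016032064F

/* Hypothetical helper: distance between two positive, finite floats,
   counted in representable values (ULPs).
   Assumption: float is a 32-bit IEEE 754 binary format. */
uint32_t ulp_distance(float a, float b)
{
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);   /* copy the bits; avoids aliasing problems */
    memcpy(&ib, &b, sizeof ib);
    return ia > ib ? ia - ib : ib - ia;
}
```

Calling ulp_distance(INTERVAL_QUOTIENT, INTERVAL_LITERAL) gives 0 when both forms produce the same float, and a small count otherwise.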
I am using the GMP and MPFR libraries to work with large numbers and I need to calculate the power of a number quickly. The result of the exponentiation will always be an integer, but the exponent may or may not be a floating-point number. The GMP library calculates powers very quickly but does not accept floating-point exponents (using the mpz_pow_ui function); the MPFR library accepts floating-point exponents but is extremely slow, as it requires high precision to calculate integers correctly (using the mpfr_pow function).
Is there any solution for this? How can GMP accept floating powers, or MPFR calculate whole numbers quickly (and correctly)?
// Ex:
mpz_pow_ui(mpz_power, base, 4790); // Fast
// power = 4790.60
mpfr_pow(mpfr_power, base, power, MPFR_RNDN); // Slow
The mpz_pow_ui function is fast since the computation can be done with multiplications. Note that mpfr_pow_ui with MPFR_RNDN should be faster if you don't need the exact integer result (which will be huge if the exponent is large), as the multiplications can be done with a smaller precision and rounded.
Unless the exponent is a not-too-large integer, mpfr_pow will be slower because it needs to compute a logarithm and an exponential, which are more complex than multiplications. I don't think that you can avoid it, even if you know that the result will be an integer. But if you know that the result will be an integer, you can compute it with an error less than 1/2. Thus, with an error analysis, you may be able to reduce the precision of the input and output variables, so that mpfr_pow will be faster.
Note: You are saying
the MPFR library accepts floating powers but is extremely slow as it requires high precision to calculate integers correctly (using the function mpfr_pow)
This is not true. MPFR will carefully choose the intermediate precisions to provide an answer with the required accuracy (though it may sometimes make wrong choices).
However, if the exact result is very close to a "machine number", there may be an issue because MPFR needs to return the sign of the error (faithful rounding with MPFR_RNDF instead of MPFR_RNDN could help, but it is not optimized yet for mpfr_pow): the Table Maker's Dilemma will occur, requiring internal computations in a higher precision. But if you have carefully chosen the precisions of the inputs, this is unlikely to occur in your case; indeed, reducing the precisions of the inputs will tend to add some error to the exact result, and this will move the exact result away from the expected integer (by exact result, I mean the exact result with the approximate inputs). You can also use some tricks. For instance, if you know that your integer is not a perfect square, then instead of computing x^y, you can compute x^(y/2) (which does not correspond to a machine number, and thus should not have an issue due to the Table Maker's Dilemma), then square the result.
It all depends on the order of magnitude of the numbers you are calculating with. The following assumes both the base and the exponent are positive non-complex numbers.
First note that exponentiation of an integer b by a decimal number like 4790.60 can be rewritten like this (I just can't believe I can't write LaTeX-style math equations on Stack Overflow):
b ^ 4790.60 = b ^ (4790 + 0.60) = (b ^ 4790) * (b ^ 0.60)
Then, the first term (b ^ 4790) can clearly be calculated with GMP and results in an integer value.
The second term has an exponent smaller than 1, so its value will be smaller than b. If b is not a huge integer (say, much smaller than 2^53, the largest integer up to which a double can represent every integer consecutively), then you can use the native double pow function to calculate it and safely round it to an integer as desired, and then multiply by the first term. If b is a huge integer, you have two options: convert it to a double if it is within the double range and then use the double pow function (and perhaps lose some precision in the resulting integer in exchange for speed), or use the mpfr_pow function to calculate this second term, noting that it will be a smaller number than b, so you can adjust the MPFR precision in that calculation according to the fractional exponent: if the exponent is close to zero use a small precision, and if it is close to 1 use a precision not much smaller than that of b.
In the end, all of it is a tradeoff between accuracy and speed, given the machine resource limits you are bound to.
I want to use the trigonometric functions of math.h to use 0,1,-1 as sign identifiers for a plane coordinate system.
cos(pi)*length = -1*length
sin(pi)*length = 0
But the functions in math.h require radian values while I have degree values. Although these can be converted in the formulas, I don't know if it will affect the accuracy of the answers, e.g. cos(pi) = -0.99999..
Will this affect the operations on my code?
EDIT: What can I do to get my desired results?
They probably will affect the accuracy by the precision to which you have pi - usually one unit in the last place. For cos and sin, |d f(x)/dx| is less than one so your value should be within the same error. For tan, |d f(x)/dx| is not bounded so a small input change can create a large change of output.
Whether such small changes will affect the operation of your code depends largely on whether your code assumes that the results are exact. If your code makes faulty assumptions about floating-point values, it will fail; if it allows some small tolerance on equality comparisons, it won't.
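A tolerance-based comparison of the kind described could look like the following. It is a minimal sketch: the names nearly_equal and sign_with_tolerance are made up here, and the tolerance is an absolute one that the caller has to choose to suit the magnitudes involved.

```c
#include <math.h>

/* Hypothetical helper: true when a and b differ by at most abs_tol. */
int nearly_equal(double a, double b, double abs_tol)
{
    return fabs(a - b) <= abs_tol;
}

/* Hypothetical helper: maps a value to the sign identifiers 1, 0 or -1,
   treating anything within abs_tol of zero as zero. */
int sign_with_tolerance(double value, double abs_tol)
{
    if (nearly_equal(value, 0.0, abs_tol))
        return 0;
    return value > 0.0 ? 1 : -1;
}
```

With a tolerance such as 1e-9, the sine of the double closest to pi is classified as 0 even though it is not exactly zero.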
Will this affect the operations on my code?
When conversion is exact between degree/radians, not a problem. Of course this only happens when the angle is +/- 0.0.
To use the trig functions with degree arguments and get exact results (or at least nearly as good as one can get), ensure the degree argument is reduced (mod 360) first and only then converted to radians. Further improvements can be had using trig identities: high precession and Sin and Cos
I don't want to introduce floating point when an inexact value would be a distaster, so I have a couple of questions about when you actually can use them safely.
Are they exact for integers as long as you don't overflow the number of significant digits? Are these two tests always true:
double d = 2.0;
if (d + 3.0 == 5.0) ...
if (d * 3.0 == 6.0) ...
What math functions can you rely on? Are these tests always true:
#include <math.h>
double d = 100.0;
if (log10(d) == 2.0) ...
if (pow(d, 2.0) == 10000.0) ...
if (sqrt(d) == 10.0) ...
How about this:
int v = ...;
if (log2((double) v) > 16.0) ... /* gonna need more than 16 bits to store v */
if (log((double) v) / log(2.0) > 16.0) ... /* C89 */
I guess you can summarize this question as: 1) Can floating point types hold the exact value of all integers up to the number of their significant digits in float.h? 2) Do all floating point operators and functions guarantee that the result is the closest to the actual mathematical result?
I too find incorrect results distasteful.
On common hardware, you can rely on +, -, *, /, and sqrt working and delivering the correctly-rounded result. That is, they deliver the floating-point number closest to the sum, difference, product, quotient, or square root of their argument or arguments.
Some library functions, notably log2 and log10 and exp2 and exp10, traditionally have terrible implementations that are not even faithfully-rounded. Faithfully-rounded means that a function delivers one of the two floating-point numbers bracketing the exact result. Most modern pow implementations have similar issues. Lots of these functions will even blow exact cases like log10(10000) and pow(7, 2). Thus equality comparisons involving these functions, even in exact cases, are asking for trouble.
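For the "how many bits does v need" test from the question, one way to sidestep log2 entirely is to stay in integer arithmetic. This is a different technique from the one in the question, and the name bits_needed is invented for the sketch.

```c
/* Hypothetical helper: number of bits needed to store v as an unsigned
   value. By convention here, 0 needs 1 bit. */
unsigned bits_needed(unsigned long v)
{
    unsigned bits = 1;
    while ((v >>= 1) != 0)
        ++bits;            /* one more bit for every remaining shift */
    return bits;
}
```

The comparison then becomes bits_needed(v) > 16, which involves no rounding at all.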
sin, cos, tan, atan, exp, and log have faithfully-rounded implementations on every platform I've recently encountered. In the bad old days, on processors using the x87 FPU to evaluate sin, cos, and tan, you would get horribly wrong outputs for largish inputs and you'd get the input back for larger inputs. CRlibm has correctly-rounded implementations; these are not mainstream because, I'm told, they've got rather nastier worst cases than the traditional faithfully-rounded implementations.
Things like copysign and nextafter and isfinite all work correctly. ceil and floor and rint and friends always deliver the exact result. fmod and friends do too. frexp and friends work. fmin and fmax work.
Someone thought it would be a brilliant idea to make fma(x,y,z) compute x*y+z by computing x*y rounded to a double, then adding z and rounding the result to a double. You can find this behaviour on modern platforms. It's stupid and I hate it.
I have no experience with the hyperbolic trig, gamma, or Bessel functions in my C library.
I should also mention that popular compilers targeting 32-bit x86 play by a different, broken, set of rules. Since the x87 is the only supported floating-point instruction set and all x87 arithmetic is done with an extended exponent, computations that would induce an underflow or overflow in double precision may fail to underflow or overflow. Furthermore, since the x87 also by default uses an extended significand, you may not get the results you're looking for. Worse still, compilers will sometimes spill intermediate results to variables of lower precision, so you can't even rely on your calculations with doubles being done in extended precision. (Java has a trick for doing 64-bit math with 80-bit registers, but it is quite expensive.)
I would recommend sticking to arithmetic on long doubles if you're targeting 32-bit x86. Compilers are supposed to set FLT_EVAL_METHOD to an appropriate value, but I do not know if this is done universally.
Can floating point types hold the exact value of all integers up to the number of their significant digits in float.h?
Well, they can store the integers which fit in their mantissa (significand). So [-2^53, 2^53] for double. For more on this, see: Which is the first integer that an IEEE 754 float is incapable of representing exactly?
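A quick way to see where the exact range ends is to convert an integer to double and back. The helper name below is made up for this sketch, and it assumes double is the IEEE 754 64-bit format with a 53-bit significand.

```c
/* Hypothetical helper. Returns 1 if n converts to double and back
   unchanged, 0 if the conversion rounded it, and -1 if n is outside the
   range this sketch checks (|n| >= 2^62), where converting back could
   overflow long long. */
int survives_double_round_trip(long long n)
{
    const long long limit = 1LL << 62;
    if (n >= limit || n <= -limit)
        return -1;
    double d = (double)n;        /* rounds to the nearest double */
    return (long long)d == n;
}
```

Under that assumption, 2^53 = 9007199254740992 survives the round trip, while 2^53 + 1 comes back as 2^53.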
Do all floating point operators and functions guarantee that the result is the closest to the actual mathematical result?
They at least guarantee that the result is immediately on either side of the actual mathematical result. That is, you won't get a result which has a valid floating point value between itself and the "actual" result. But beware, because repeated operations may accumulate an error which seems counter to this, while it is not (because all intermediate values are subject to the same constraints, not just the inputs and output of a compound expression).
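The accumulation effect shows up even in a very small loop. A minimal sketch, with repeated_sum as a made-up name and IEEE 754 double arithmetic assumed:

```c
/* Hypothetical helper: adds x to a running total n times.
   Each addition is rounded individually, so the errors can add up. */
double repeated_sum(double x, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += x;
    return total;
}
```

On such a platform, repeated_sum(0.1, 10) ends up slightly below 1.0 because 0.1 is not exactly representable, whereas repeated_sum(0.125, 8) is exactly 1.0 because every intermediate value is representable.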
Regarding minimising the error in floating-point operations, if I have an operation such as the following in C:
float a = 123.456;
float b = 456.789;
float r = 0.12345;
a = a - (r * b);
Will the result of the calculation change if I split the multiplication and subtraction steps out, i.e.:
float c = r * b;
a = a - c;
I am wondering whether a CPU would then treat these calculations differently and thereby the error may be smaller in one case?
If not, which I presume anyway, are there any good rules-of-thumb to mitigate against floating-point error? Can I massage data in a way that will help?
Please don't just say "use higher precision" - that's not what I'm after.
EDIT
For information about the data, in the general sense errors seem to be worse when the operation results in a very large number like 123456789. Small numbers, such as 1.23456789, seem to yield more accurate results after operations. Am I imagining this, or would scaling larger numbers help accuracy?
Note: this answer starts with a lengthy discussion of the distinction between a = a - (r * b); and float c = r * b; a = a - c; with a C99-compliant compiler. The part of the question about improving accuracy while avoiding extended precision is covered at the end.
Extended floating-point precision for intermediate results
If your C99 compiler defines FLT_EVAL_METHOD as 0, then the two computations can be expected to produce exactly the same result. If the compiler defines FLT_EVAL_METHOD to 1 or 2, then a = a - (r * b); will be more precise for some values of a, r and b, because all intermediate computations will be done at an extended precision (double for the value 1 and long double for the value 2).
The program cannot set FLT_EVAL_METHOD, but you can use commandline options to change the way your compiler computes with floating-point, and that will make it change its definition accordingly.
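To find out which case applies to a particular build, the program can at least read the macro from <float.h>. The two function names below are hypothetical, and the fallback for headers that do not define the macro is an assumption of this sketch.

```c
#include <float.h>

/* Hypothetical helper: reports the value the compiler chose. */
int current_eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;   /* assumption: pre-C99 headers, treat as indeterminable */
#endif
}

/* Returns 1 if float expressions such as a - (r * b) may be evaluated in a
   format wider than float, 0 if not, and -1 if that cannot be determined. */
int float_intermediates_widened(int eval_method)
{
    switch (eval_method) {
    case 0:  return 0;    /* each type evaluated in its own precision */
    case 1:  return 1;    /* float and double evaluated as double */
    case 2:  return 1;    /* everything evaluated as long double */
    default: return -1;   /* -1 or other implementation-defined values */
    }
}
```

Printing float_intermediates_widened(current_eval_method()) once at startup is a cheap way to notice when a change of compiler options has altered how intermediates are computed.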
Contraction of some intermediate results
Depending on whether you use #pragma STDC FP_CONTRACT in your program (spelled #pragma fp_contract by some compilers) and on your compiler's default value for this pragma, some compound floating-point expressions can be contracted into single instructions that behave as if the intermediate result was computed with infinite precision. This happens to be a possibility for your example when targeting a modern processor, as the fused-multiply-add instruction will compute a directly and as accurately as allowed by the floating-point type.
However, you should bear in mind that the contraction only takes place at the compiler's option, without any guarantees. The compiler uses the FMA instruction to optimize speed, not accuracy, so the transformation may not take place at lower optimization levels. Sometimes several transformations are possible (e.g. a * b + c * d can be computed either as fmaf(c, d, a*b) or as fmaf(a, b, c*d)) and the compiler may choose one or the other.
In short, the contraction of floating-point computations is not intended to help you achieve accuracy. You might as well make sure it is disabled if you like reproducible results.
However, in the particular case of the fused-multiply-add compound operation, you can use the C99 standard function fmaf() to tell the compiler to compute the multiplication and addition in a single step with a single rounding. If you do this, then the compiler will not be allowed to produce anything else than the best result for a.
float fmaf(float x, float y, float z);
DESCRIPTION
The fma() functions compute (x*y)+z, rounded as one ternary operation:
they compute the value (as if) to infinite precision and round once to
the result format, according to the current rounding mode.
Note that if the FMA instruction is not available, your compiler's implementation of the function fmaf() will at best just use higher precision, and if this happens on your compilation platform, you might just as well use the type double for the accumulator: it will be faster and more accurate than using fmaf(). In the worst case, a flawed implementation of fmaf() will be provided.
Improving accuracy while only using single-precision
Use Kahan summation if your computation involves a long chain of additions. Some accuracy can be gained by simply summing the r*b terms computed as single-precision products, assuming there are many of them. If you wish to gain more accuracy, you might want to compute r*b itself exactly as the sum of two single-precision numbers, but if you do this you might as well switch to double-single arithmetics entirely. Double-single arithmetics would be the same as the double-double technique succinctly described here, but with single-precision numbers instead.
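For reference, a single-precision Kahan summation might be sketched as follows, next to a naive loop for comparison. The function names are illustrative, and the sketch assumes the code is compiled without value-changing options such as -ffast-math, which can optimize the compensation away.

```c
#include <stddef.h>

/* Plain left-to-right summation, for comparison. */
float naive_sumf(const float *v, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];
    return sum;
}

/* Kahan (compensated) summation using only float variables. */
float kahan_sumf(const float *v, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;              /* running estimate of the lost low-order bits */
    for (size_t i = 0; i < n; ++i) {
        float y = v[i] - c;      /* apply the correction to the next term */
        float t = sum + y;       /* low-order bits of y may be lost here */
        c = (t - sum) - y;       /* recover what was lost (negated) */
        sum = t;
    }
    return sum;
}
```

For instance, summing 1.0f followed by four copies of 2^-25 leaves the naive loop at 1.0f, because each small term is below half an ULP of the running sum, while the compensated loop carries the lost bits forward and reaches 1 + 2^-23.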
This started suddenly this morning.
Original lines were this
float angle = (x+90)*(M_PI/180.0);
float xx = cosf(angle);
float yy = sinf(angle);
After setting a breakpoint and hovering the cursor over the variables, I get the correct answer for yy, which is 1, but xx is NOT zero.
I also tried cosf(M_PI_2); still no luck. It was working fine until yesterday, and I did not change any compiler settings.
I am using the latest version of Xcode as of today's date.
The first thing to notice is that you're using floats. These are inherently inaccurate, and for most calculations give you only a close approximation of the mathematically-correct answer. Assuming that x in your code has value 0, angle will have a close approximation to π/2. xx will therefore have an approximation to cos(π/2). However, this is unlikely to be exactly zero due to approximation and rounding issues.
If you were able to change your code to use doubles rather than floats you're likely to get more accuracy, and an answer nearer zero. However, if it is important for your code to produce a value of exactly zero at this point, you're going to have to rethink how you're doing the calculations.
If this doesn't answer your particular problem, give us some more details and we'll have another think.
Contrary to what others have said, this is not an x87 co-processor issue. XCode uses SSE for floating-point computation on Intel by default (except for long double arithmetic).
The "problem" is: when you write cosf(M_PI_2), you are actually telling the XCode compiler (gcc or llvm-gcc or clang) to do the following:
1. Look up the expansion of M_PI_2 in <math.h>. Per the POSIX standard, it is a double precision literal that converts to the correctly rounded value of π/2.
2. Round the converted double precision value to single precision.
3. Call the math library function cosf on the single precision value.
Note that, throughout this process, you are not operating on the actual value of π/2. You are instead operating on that value rounded to a representable floating-point number. While cos(π/2) is exactly zero, you are not telling the compiler to do that computation. You are instead telling the compiler to do cos(π/2 + tiny), where tiny is the difference between the rounded value (float)M_PI_2 and the (unrepresentable) exact value of π/2. If cos is computed with no error at all, the result of cos(π/2 + tiny) is approximately -tiny. If it returned zero, that would be an error.
edit: a step-by-step expansion of the computation on an Intel mac with the current XCode compiler:
M_PI_2 is defined to be
1.57079632679489661923132169163975144
but that's not actually a representable double precision number. When the compiler converts it to a double precision value it becomes exactly
1.5707963267948965579989817342720925807952880859375
This is the closest double-precision number to π/2, but it differs from the actual mathematical value of π/2 by about 6.12*10^(-17).
Step (2) rounds this number to single-precision, which changes the value to exactly
1.57079637050628662109375
Which is approximately π/2 + 4.37*10^(-8). When we compute cosf of this number then, we get:
-0.00000004371138828673792886547744274139404296875
which is very nearly the exact value of cosine evaluated at that point:
-0.00000004371139000186241438857289400265215231661...
In fact, it is the correctly rounded result; there is no value that the computation could have returned that would be more accurate. The only error here is that the computation that you asked the compiler to perform is different from the computation that you thought you were asking it to do.
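The same steps can be reproduced with a few lines of C. In this sketch the macro PI_2_LITERAL copies the M_PI_2 expansion quoted above (M_PI_2 itself is POSIX rather than ISO C), and the two function names are invented for illustration.

```c
#include <math.h>

/* Same digits as the M_PI_2 expansion; a double-precision literal. */
#define PI_2_LITERAL 1.57079632679489661923132169163975144

/* How far rounding to single precision moves the argument:
   (float)pi/2 minus (double)pi/2, computed in double. */
double rounding_offset(void)
{
    return (double)(float)PI_2_LITERAL - PI_2_LITERAL;
}

/* What cosf actually receives and returns in cosf(M_PI_2). */
float cosf_at_rounded_half_pi(void)
{
    return cosf((float)PI_2_LITERAL);
}
```

The offset comes out near 4.37e-8, and the cosine is a small negative number of about the same magnitude, consistent with cos(π/2 + tiny) being approximately -tiny.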
I suspect the answer is as near as damnit to 0 as not to be worth worrying about.
If I run the same thing through, I get the answer "-4.3711388e-008", which can also be written as "-0.000000043711388". That is pretty damned close to 0. Definitely near enough not to worry about it being out at the 8th decimal place.
Edit: Further to what LiraLuna is saying I wrote the following piece of x87 assembler under visual studio
float fRes;
_asm
{
fld1
fld1
fadd st, st(1)
fldpi
fdiv st, st(1)
fcos
fstp [fRes]
}
char str[16];
sprintf( str, "%f", fRes );
Basically this uses the x87's fcos instruction to compute the cosine of pi/2. The value held in str is "0.000000".
This, however, is not actually what fcos returned. It ACTUALLY returned 6.1230318e-017. This implies that the error occurs at the 17th decimal place and, let's be honest, that's far less significant than the standard debug cosf above.
As SSE3 has no specific cosine instruction, I suspect (though I cannot confirm without seeing the generated assembler) that it is either using its own Taylor series expansion or using the fcos instruction anyway. Either way you are still unlikely to get better precision than the error occurring at the 17th decimal place, in my opinion.
The only thing I can think of is a malicious macro substitution, i.e. M_PI_2 is no longer 1.57079632679489661923.
Try calling cosf( 1.57079632679489661923 ) to test this.
The real thing you should be careful about is the sign of the cosine. Make sure it is what you expect. E.g. if you operate with angles between 0 and pi/2, make sure that what you use as PI_2 is less than the actual value of pi/2!
And the difference between 0.000001 and 0.0 is less than you think.
The reason
What you are experiencing is the infamous x87 math co-processor float truncation 'bug', or rather, a feature. IEEE floats have an amazing range of numbers, but at a cost. They sacrifice precision for high range.
They are not as inaccurate as you think, though. This is a semi-myth generated by Intel's x87 chip design, which internally uses an 80-bit representation for floats; these have far superior precision, though they are a bit slower.
When you perform a float comparison, x87 caches the float as an 80-bit float, then when its stack is full, it saves the 32-bit representation in RAM, decreasing accuracy by a large degree.
The solution
x87 is old, really old. Its replacement is SSE. SSE computes 32-bit floats and 64-bit floats natively, leading to minimal precision loss in math. Please note that precision issues with floats still exist, but printf("%f\n", cosf(M_PI_2)); should be zero. Heck, even float comparison with SSE is accurate again (unlike x87)!
Since latest Xcode is actually GCC 4.2.1, use the compiler switch -msse3 -mfpmath=sse and see how you get a perfectly round 0.00000 (Note: if you get -0.00000, do not worry, it's perfectly fine and still equals 0.00000 under the IEEE spec (read more at this wikipedia article)).
All Intel macs are guaranteed to have SSE3 support (OSx86 Macs excluded, if you want to support those, use -msse2).