Is there a clever/efficient algorithm for determining the hypotenuse of an angle (i.e. sqrt(a² + b²)), using fixed point math on an embedded processor without hardware multiply?
If the result doesn't have to be particularly accurate, you can get a crude
approximation quite simply:
Take absolute values of a and b, and swap if necessary so that you have a <= b. Then:
h = ((sqrt(2) - 1) * a) + b
To see intuitively how this works, consider the way that a shallow angled line is plotted on a pixel display (e.g. using Bresenham's algorithm). It looks something like this:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | | | | | | | | | | | | | | |*|*|*| ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | | | | | | | | | | |*|*|*|*| | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | | | | | | |*|*|*|*| | | | | | | | a pixels
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | | | |*|*|*|*| | | | | | | | | | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|*|*|*|*| | | | | | | | | | | | | | | | v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
<-------------- b pixels ----------->
For each step in the b direction, the next pixel to be plotted is either immediately to the right, or one pixel up and to the right.
The ideal line from one end to the other can be approximated by the path which joins the centre of each pixel to the centre of the adjacent one. This is a series of a segments of length sqrt(2), and b-a segments of length 1 (taking a pixel to be the unit of measurement). Hence the above formula.
This clearly gives an accurate answer for a == 0 and a == b; but gives an over-estimate for values in between.
The error depends on the ratio b/a; the maximum error occurs when b = (1 + sqrt(2)) * a and turns out to be 2/sqrt(2+sqrt(2)), or about 8.24% over the true value. That's not great, but if it's good enough for your application, this method has the advantage of being simple and fast. (The multiplication by a constant can be written as a sequence of shifts and adds.)
For the record, here are a few more approximations, listed in roughly
increasing order of complexity and accuracy. All these assume 0 ≤ a ≤ b.
h = b + 0.337 * a // max error ≈ 5.5 %
h = max(b, 0.918 * (b + (a>>1))) // max error ≈ 2.6 %
h = b + 0.428 * a * a / b // max error ≈ 1.04 %
Edit: to answer Ecir Hana's question, here is how I derived these
approximations.
First step. Approximating a function of two variables can be a
complex problem. Thus I first transformed this into the problem of
approximating a function of one variable. This can be done by choosing
the longest side as a “scale” factor, as follows:
h = √(b2 + a2)
= b √(1 + (a/b)2)
= b f(a/b) where f(x) = √(1+x2)
Adding the constraint 0 ≤ a ≤ b means we are only concerned with
approximating f(x) in the interval [0, 1].
Below is the plot of f(x) in the relevant interval, together with the
approximation given by Matthew Slattery (namely (√2−1)x + 1).
Second step. Next step is to stare at this plot, while asking
yourself the question “how can I approximate this function cheaply?”.
Since the curve looks roughly parabolic, my first idea was to use a
quadratic function (third approximation). But since this is still
relatively expensive, I also looked at linear and piecewise linear
approximations. Here are my three solutions:
The numerical constants (0.337, 0.918 and 0.428) were initially free
parameters. The particular values were chosen in order to minimize the
maximum absolute error of the approximations. The minimization could
certainly be done by some algorithm, but I just did it “by hand”,
plotting the absolute error and tuning the constant until it is
minimized. In practice this works quite fast. Writing the code to
automate this would have taken longer.
Third step is to come back to the initial problem of approximating a
function of two variables:
h ≈ b (1 + 0.337 (a/b)) = b + 0.337 a
h ≈ b max(1, 0.918 (1 + (a/b)/2)) = max(b, 0.918 (b + a/2))
h ≈ b (1 + 0.428 (a/b)2) = b + 0.428 a2/b
Consider using CORDIC methods. Dr. Dobb's has an article and associated library source here. Square-root, multiply and divide are dealt with at the end of the article.
One possibility looks like this:
#include <math.h>
/* Iterations Accuracy
* 2 6.5 digits
* 3 20 digits
* 4 62 digits
* assuming a numeric type able to maintain that degree of accuracy in
* the individual operations.
*/
#define ITER 3
double dist(double P, double Q) {
/* A reasonably robust method of calculating `sqrt(P*P + Q*Q)'
*
* Transliterated from _More Programming Pearls, Confessions of a Coder_
* by Jon Bentley, pg. 156.
*/
double R;
int i;
P = fabs(P);
Q = fabs(Q);
if (P<Q) {
R = P;
P = Q;
Q = R;
}
/* The book has this as:
* if P = 0.0 return Q; # in AWK
* However, this makes no sense to me - we've just insured that P>=Q, so
* P==0 only if Q==0; OTOH, if Q==0, then distance == P...
*/
if ( Q == 0.0 )
return P;
for (i=0;i<ITER;i++) {
R = Q / P;
R = R * R;
R = R / (4.0 + R);
P = P + 2.0 * R * P;
Q = Q * R;
}
return P;
}
This still does a couple of divides and four multiples per iteration, but you rarely need more than three iterations (and two is often adequate) per input. At least with most processors I've seen, that'll generally be faster than the sqrt would be on its own.
For the moment it's written for doubles, but assuming you've implemented the basic operations, converting it to work with fixed point shouldn't be terribly difficult.
Some doubts have been raised by the comment about "reasonably robust". At least as originally written, this was basically a rather backhanded way of saying that "it may not be perfect, but it's still at least quite a bit better than a direct implementation of the Pythagorean theorem."
In particular, when you square each input, you need roughly twice as many bits to represent the squared result as you did to represent the input value. After you add (which needs only one extra bit) you take the square root, which gets you back to needing roughly the same number of bits as the inputs. Unless you have a type with substantially greater precision than the inputs, it's easy for this to produce really poor results.
This algorithm doesn't square either input directly. It is still possible for an intermediate result to underflow, but it's designed so that when it does so, the result still comes out as well as the format in use supports. Basically, the situation in which it happens is that you have an extremely acute triangle (e.g., something like 90 degrees, 0.000001 degrees, and 89.99999 degrees). If it's close enough to 90, 0, 90, we may not be able to represent the difference between the two longer sides, so it'll compute the hypotenuse as being the same length as the other long side.
By contrast, when the Pythagorean theorem fails, the result will often be a NaN (i.e., tells us nothing) or, depending on the floating point format in use, quite possibly something that looks like a reasonable answer, but is actually wildly incorrect.
You can start by reevaluating if you need the sqrt at all. Many times you are calculating the hypotenuse just to compare it to another value - if you square the value you're comparing against you can eliminate the square root altogether.
Unless you're doing this at >1kHz, multiply even on a MCU without hardware MUL isn't terrible. What's much worse is the sqrt. I would try to modify my application so it doesn't need to calculate it at all.
Standard libraries would probably be best if you actually need it, but you could look at using Newton's method as a possible alternative. It would require several multiply/divide cycles to perform, however.
AVR resources
Atmel App note AVR200: Multiply and Divide Routines (pdf)
This sqrt function on AVR Freaks forum
Another AVR Freaks post
Maybe you could use some of Elm Chans Assembler Libraries and adapt the ihypot-function to your ATtiny. You would need to replace the MUL and maybe (i haven't checked) some other instructions.
Related
I am going (1) to loop a regression over a certain criterion many times; and (2) to store a certain coefficient from each regression. Here is an example:
clear
sysuse auto.dta
local x = 2000
while `x' < 5000 {
xi: regress price mpg length gear_ratio i.foreign if weight < `x'
est sto model_`x'
local x = `x' + 100
}
est dir
I just care about one predictor, say mpg here. I want to extract coefficients of mpg from each result into one independent file (any file is OK, .dta would be great) to see if there is a trend as the threshold for weight increases. What I am doing now is to useestout to export the results, something like:
esttab * using test.rtf, replace se stats(r2_a N, labels(R-squared)) starl(* 0.10 ** 0.05 *** 0.01) nogap onecell title(regression tables)
estout will export everything and I need to edit the results. This works well for regressions with few predictors, but my real dataset has more than 30 variables and the regression will loop at least 100 times (I have a variable Distance with range from 0 to 30,000: it has the role of weight in the example). Therefore, it is really difficult for me to edit the results without making mistakes.
Is there any other efficient way to solve my problem? Since my case is not looping over a group variable, but over a certain criterion. the statsby function seems not working well here.
As #Todd has already suggested, you can just choose the particular results you care about and use postfile to store them as new variables in a new dataset. Note that a forval loop is more direct than your while code, while using xi: is superseded by factor variable notation in recent versions of Stata. (I have not changed that just in case you are using some older version.) Note evaluation of saved results such as _b[_cons] on the fly and the use of parentheses () to stop negative signs being evaluated. Some code examples elsewhere store results temporarily in local macros or scalars, which is quite unnecessary.
sysuse auto.dta, clear
tempname myresults
postfile `myresults' threshold intercept gradient se using myresults.dta
quietly forval x = 2000(200)4800 {
xi: regress price mpg length gear_ratio i.foreign if weight < `x'
post `myresults' (`x') (`=_b[_cons]') (`=_b[mpg]') (`=_se[mpg]')
}
postclose `myresults'
use myresults
list
+---------------------------------------------+
| thresh~d intercept gradient se |
|---------------------------------------------|
1. | 2000 -3699.55 -296.8218 215.0348 |
2. | 2200 -4175.722 -53.19774 54.51251 |
3. | 2400 -3918.388 -58.83933 42.19707 |
4. | 2600 -6143.622 -58.20153 38.28178 |
5. | 2800 -11159.67 -49.21381 44.82019 |
|---------------------------------------------|
6. | 3000 -6636.524 -51.28141 52.96473 |
7. | 3200 -7410.392 -58.14692 60.55182 |
8. | 3400 -2193.125 -57.89508 52.78178 |
9. | 3600 -1824.281 -103.4387 56.49762 |
10. | 3800 -1192.767 -110.9302 51.6335 |
|---------------------------------------------|
11. | 4000 5649.41 -173.9975 74.51212 |
12. | 4200 5784.363 -147.4454 71.89362 |
13. | 4400 6494.47 -93.81158 80.81586 |
14. | 4600 6494.47 -93.81158 80.81586 |
15. | 4800 5373.041 -95.25342 82.60246 |
+---------------------------------------------+
statsby (a command, not a function) is just not designed for this problem at all, so it is not a question of whether it works well.
I would suggest you look at help postfile for an example of how to aggregate the results. I agree that statsby may not be the best approach. Evaluating the interaction between mpg and weight on price may help address what would seem to be a classic question of interaction.
I have two 3D lines which lie on the same plane. line1 is defined by a point (x1, y1, z1) and its direction vector (a1, b1, c1) while line2 is defined by a point (x2, y2, z2) and its direction vector (a2, b2, c2). Then the parametric equations for both lines are
x = x1 + a1*t; x = x2 + a2*s;
y = y1 + b1*t; y = y2 + b2*s;
z = z1 + c1*t; z = z2 + c2*s;
If both direction vectors are nonzeros, we can find out the location of intersection node easily by equating the right-hand-side of the equations above and solving t and s from either two of the three. However, it's possible that a1 b1 c1 a2 b2 c2 are not all nonzero so that I can't solve those equations in the same way. My current thought is to deal with this issue case by case, like
case1: a1 = 0, others are nonzero
case2: a2 = 0, others are nonzero
case3: b1 = 0, others are nonzero
...
However, there are so many cases in total and the implementation would become messy. Is there any good ways to tackle this problem? Any reference? Thanks a lot!
It is much more practical to see this as a vector equation. A dot . is a scalar product and A,n,B,m are vectors describing the lines. Point A is on the first line of direction n. Directions are normalized : n.n=1 and m.m=1. The point of intersection C is such that :
C=A+nt=B+ms
where t and s are scalar parameters to be computed.
Therefore (.n) :
A.n+ t=B.n+m.n s
t= (B-A).n+m.n s
And (.m):
A.m+n.m t=B.m+ s
A.m+n.m (B-A).n+(m.n)^2 s=B.m+ s
n.m(B-A).n+(A-B).m=(1-(m.n)^2).s
Since n.n=m.m=1 and n and m are not aligned, (m.n)^2<1 :
s=[n.m(B-A).n+(A-B).m]/[1-(m.n)^2]
t= (B-A).n+m.n s
You can solve this as a linear system:
| 1 0 0 -a1 0 | | x | | x1 |
| 0 1 0 -b1 0 | | y | | y1 |
| 0 0 1 -c1 0 | | z | = | z1 |
| 1 0 0 0 -a2 | | s | | x2 |
| 0 1 0 0 -b2 | | t | | y2 |
| 0 0 1 0 -c2 | | z2 |
x y z is the intersection point, and s t are the coefficients of the vectors. This solves the same equation that #francis wrote, with the advantage that it also obtains the solution that minimizes the error in case your data are not perfect.
This equation is usually expressed as Ax=b, and can be solved by doing x = A^(-1) * b, where A^(-1) is the pseudo-inverse of A. All the linear algebra libraries implement some function to solve systems like this, so don't worry.
It might be vital to remember that calculations are never exact, and small deviations in your constants and calculations can make your lines not exactly intersect.
Therefore, let's solve a more general problem - find the values of t and s for which the distance between the corresponding points in the lines is minimal. This is clearly a task for calculus, and it's easy (because linear functions are the easiest ones in calculus).
So the points are
[xyz1]+[abc1]*t
and
[xyz2]+[abc2]*s
(here [xyz1] is a 3-vector [x1, y1, z1] and so on)
The (square of) the distance between them:
([abc1]*t - [abc2]*s + [xyz1]-[xyz2])^2
(here ^2 is a scalar product of a 3-vector with itself)
Let's find a derivative of this with respect to t:
[abc1] * ([abc1]*t - [abc2]*s + [xyz1]-[xyz2]) (multiplied by 2, but this doesn't matter)
(here the first * is a scalar product, and the other *s are regular multiplications between a vector and a number)
The derivative should be equal to zero at the minimum point:
[abc1] * ([abc1]*t - [abc2]*s + [xyz1]-[xyz2]) = 0
Let's use the derivative with respect to s too - we want it to be zero too.
[abc1]*[abc1]*t - [abc1]*[abc2]*s = -[abc1]*([xyz1]-[xyz2])
-[abc2]*[abc1]*t + [abc2]*[abc2]*s = [abc2]*([xyz1]-[xyz2])
From here, let's find t and s.
Then, let's find the two points that correspond to these t and s. If all calculations were ideal, these points would coincide. However, at this point you are practically guaranteed to get some small deviations, so take and of these points as your result (intersection of the two lines).
It might be better to take the average of these points, to make the result symmetrical.
For a tree layout that takes benefit of cache line prefetching (reading _next_ cacheline is cheap), I need to solve the address calculation in a really fast way. I was able to boil down the problem to:
newIndex = nowIndex + 1 + (localChildIndex*X)
x would be for example: X = 45 + 44 + 43 + 42 +40.
Note: 4 is the branching factor. In reality it will be 16, so a power of 2. This should be useful to use bitwise stuff?
It would be very bad if it would need a loop to calculate X (performancewise) and stuff like division/multiplication. This appeals to be an interesting problem which I wasn’t able to come up with some nice way of computing it.
Since its part of a tree traversal, 2 modes would be possible: absolute calculation, independent of prior calculations AND incremental calculation which starts with a high X being kept in a variable and then some minimal stuff done to it with every deeper level of the tree.
I hope I was able to make clear what the math should do. Not sure if there is a way to do this fast & without loop - but maybe somebody can come up with a really smart solution. I would like to thank everybody for their help - StackOverflow have been a great teacher to me in the past and I hope to be able to give back more in the future, as my knowledge increases.
I'll answer this in increasing complexity and generality.
If x is fixed to 16 then just use a constant value 1118481. Hooray! (Name it, using magical numbers is bad practice)
If you have a few cases known at compile time use a few constants or even defines, for example:
#define X_2 63
#define X_4 1365
#define X_8 37449
#define X_16 1118481
...
If you have several cases known at execution time initialize and use a lookup table indexed with the exponent.
int _X[MAX_EXPONENT]; // note: give it a more meaningful name :)
Initialize it and then just access with the known exponent of 2^exp at execution time.
newIndex = nowIndex + 1 + (localChildIndex*_X[exp]);
How are these values precalculated, or how to calculate them efficiently on the fly:
The sum X = x^n + x^(n - 1) + ... + x^1 + x^0 is a geometric serie and its finite sum is:
X = x^n + x^(n - 1) + ... + x^1 + x^0 = (1-x^(n + 1))/(1-x)
About the bitwise operations, as Oli Charlesworth has stated if x is a power of 2 (in binary 0..010..0) x^n is also a power of 2, and the sum of different powers of two is equivalent to the OR operation. Thus we could make an expression like:
Let exp be the exponent so that x = 2^exp. (For 16, exp = 4). Then,
X = x^5 + ... + x^1 + x^0
X = (2^exp)^5 + ... + (2^exp)^1 + 1
X = 2^(exp*5) + ... + 2^(exp*1) + 1
now using bitwise, 2^n = 1<<n
X = 1<<(exp*5) | ... | 1<<exp | 1
In C:
int X;
int exp = 4; //for x == 16
X = 1 << (exp*5) | 1 << (exp*4) | 1 << (exp*3) | 1 << (exp*2) | 1 << (exp*1) | 1;
And finally, I can't resist to say: if your expression were more complex and you had to evaluate an arbitrary polynomial a_n*x^n + ... + a_1*x^1 + a_0 in x, instead of implementing the obvious loop, a faster way to evaluate the polynomial is using the Horner's rule.
I've been looking for a while onto websearch, however, possibly or probably I am missing the right terminology.
I have arbitrary sized arrays of scalars ...
array = [n_0, n_1, n_2, ..., n_m]
I also have a function f->x->y, with 0<=x<=1, and y an interpolated value from array. Examples:
array = [1,2,9]
f(0) = 1
f(0.5) = 2
f(1) = 9
f(0.75) = 5.5
My problem is that I want to compute the average value for some interval r = [a..b], where a E [0..1] and b E [0..1], i.e. I want to generalize my interpolation function f->x->y to compute the average along r.
My mind boggles me slightly w.r.t. finding the right weighting. Imagine I want to compute f([0.2,0.8]):
array --> 1 | 2 | 9
[0..1] --> 0.00 0.25 0.50 0.75 1.00
[0.2,0.8] --> ^___________________^
The latter being the range of values I want to compute the average of.
Would it be mathematically correct to compute the average like this?: *
1 * (1-0.8) <- 0.2 'translated' to [0..0.25]
+ 2 * 1
avg = + 9 * 0.2 <- 0.8 'translated' to [0.75..1]
----------
1.4 <-- the sum of weights
This looks correct.
In your example, your interval's length is 0.6. In that interval, your number 2 is taking up (0.75-0.25)/0.6 = 0.5/0.6 = 10/12 of space. Your number 1 takes up (0.25-0.2)/0.6 = 0.05 = 1/12 of space, likewise your number 9.
This sums up to 10/12 + 1/12 + 1/12 = 1.
For better intuition, think about it like this: The problem is to determine how much space each array-element covers along an interval. The rest is just filling the machinery described in http://en.wikipedia.org/wiki/Weighted_average#Mathematical_definition .
This seems to be the #1 thing that is asked when dealing with Remainder/Mod, and I'm kind of hitting a wall with it. I'm teaching myself to program with a textbook and a chuck of C code.
Seeing as I don't really have an instructor to say, "No, no. It actually works like this", I thought I'd try my hand here. I haven't found a conclusive answer to the mathematical part of this, though.
So... I'm under the impression that this is a pretty rare occurrence, but I'd still like to know what it is that happens underneath the shiny compiling. Plus, this textbook would like for me to supply all values that are possible when using negative remainders, per the C89 Standard. Would it be much to ask if someone could check to see if this math is sound?
1) 9%4
9 - (2) * 4 = 1 //this is a value based on x - (x/y) * y
(2) * 4 + (1) = 9 //this is a check based on (x/y) * y + (x%y) = x
2) -9%4
9 - (2) * 4 = 1; 9 - (3) * 4 = -3 //these are the possible values
(2) * 4 + (1) = 9; (3) * 4 + (-3) = 9 //these are the checks
3) 9%-4
Same values as #2??
I tried computing with negatives in the expressions, and came up with ridiculous things such as 17 and -33. Are they 1 and -3 for #3 as well??
4) -9%-4
Same as #1??
In algebraic division, negative signs "cancel". Do they do the same here, or is there something else going on?
I think the thing that gets me confused the most is the negatives. The way I learned algebra in school (5-6 years ago), they are "attached" to their numbers. In programming, since they are unary operators, is that not so? Example: When filling in the value for x on #2, x = 9 instead of x = -9.
I sincerely appreciate any help.
Here you need the mathematical definition on remainder.
Given two integer numbers m, d, we say that r is the remainder of the division of m and d if r satisfies two conditions:
There exists another integer k such that m == k * d + r , and
0 <= r < d.
For positive numbers, in C, we have m % d == r and m / d == k, just by following the definition above.
From the definition, it can be obtainded that 3 % 2 == 1 and 3 / 2 == 1.
Other examples:
4 / 3 == 1 and 5 / 3 == 1, in despite of 5.0/3.0 == 1.6666 (which
would round to 2.0).
4 % 3 == 1 and 5 % 3 == 2.
You can trust also in the formula r = m - k * d, which in C is written as:
m % d == m - (m / d) * d
However, in the standard C, the integer division follows the rule: round to 0.
Thus, with negative operands C offer different results that the mathematical ones.
We would have:
(-4) / 3 == -1, (-4) % 3 == -1 (in C), but in plain maths: (-4) / 3 = -2, (-4) % 3 = 2.
In plain maths, the remainder is always nonnegative, and less than the abs(d).
In standard C, the remainder always has the sign of the first operand.
+-----------------------+
| m | d | / | % |
+-----+-----+-----+-----+
| 4 | 3 | 1 | 1 |
+-----+-----+-----+-----+
| -4 | 3 | -1 | -1 |
+-----+-----+-----+-----+
| 4 | -3 | -1 | 1 |
+-----+-----+-----+-----+
| -4 | -3 | 1 | -1 |
+-----------------------+
Remark: This description (in the negative case) is for standard C99/C11 only. You must be carefull with your compiler version, and do some tests.
Like Barmar's linked answer says modulus in a mathematical sense means that numbers are the same class for a ring (my algebra theory is a bit rusty so sorry the terms might be a bit loosely used:)).
So modulus 5 means that you have a ring of size 5. i.e. 0,1,2,3,4 when you add 1 to 4 you are back at zero. so -9,-4,1,6,11,16 are all the same modulo 5 because they are all equivalent. This is actually very important for various algebra theorems but for normal programmers it's pretty much useless.
Basically the standards were unspecified so the modulus returned for negative numbers just has to be one of those equivalent classes of numbers. It's not a remainder. Your best bet in situations like this is to operate on absolute values when doing modulo operators if you want basic integer division. If you are using more advanced techniques (like public key encryption) you'll probably need to brush up on your math a little more.
For now I'd say still with positive ints in this case and have fun programming something interesting.