Texas Instruments CLA float inverse trick - What is its purpose


I have this piece of code, written by somebody else, that runs on a TI TMS320 Control Law Accelerator (CLA), so it is optimized for size and speed.
In order to get 1/x, the code always does something like this:
float32 y = __meinvf32(x);
y = y * (2.0f - y*x);
y = y * (2.0f - y*x);
I found this thread that proposes something similar, but in my case, there is no clamping at the end.
Can somebody help me understand the intent behind this?

Isaac Newton figured this out: this is Newton's method (the Newton-Raphson iteration) for computing 1/x.
__meinvf32(x) gives an approximation of 1/x, say 1/x • (1+e), where e is some small relative error.
Let y = 1/x • (1+e). Then, when we calculate y • (2 − y•x), we have:
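$$
\begin{aligned}
y\,(2 - yx) &= \tfrac{1}{x}(1+e)\,\bigl(2 - \tfrac{1}{x}(1+e)\,x\bigr) \\
&= \tfrac{1}{x}(1+e)\,\bigl(2 - (1+e)\bigr) \\
&= \tfrac{1}{x}\bigl(2 + 2e - (1+e) - e(1+e)\bigr) \\
&= \tfrac{1}{x}\bigl(2 + 2e - 1 - e - e - e^2\bigr) \\
&= \tfrac{1}{x}\,(1 - e^2).
\end{aligned}
$$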
Since e is small, e² is even smaller. Thus, by calculating y • (2 − y•x) we get an estimate of 1/x that is closer than before; the relative error is only −e² instead of e, so each step roughly doubles the number of correct bits. Repeating this improves the estimate again (up to the limits of floating-point precision).
With some knowledge of bounds on the initial e, we can calculate how many repetitions are needed to get the estimate as close as desired to the correct result.

y = e + 1/x where e is some small error.
So (2.0 - y*x) is close to 1.0 and has the effect of reducing e on each pass.

Related

Check subset sum for special array equation

I was trying to solve the following problem.
We are given N and A[0], with
N <= 5000
A[0] <= 10^6, and A[0] is even.
If i is odd, then
A[i] >= 3 * A[i-1]
If i is even, then
A[i] = 2 * A[i-1] + 3 * A[i-2]
Elements at odd indices must be odd, and elements at even indices must be even.
We need to minimize the sum of the array.
We are then given Q numbers X, with
Q <= 1000
X <= 10^18
For each X, we need to determine whether some subset of the array sums to X.
What I have tried:
Creating the minimum-sum array is easy: just follow the equations and constraints.
The approach that I know for subset sum is dynamic programming, which has time complexity O(sum * size of the array), but since the sum can be as large as 10^18, that approach won't work.
Is there some equation or relation that I am missing?

We can solve it with a bit of math.
Let X_n be the sequence (the same one your A defines).
I assume X_0 is positive.
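Thus the sequence is strictly increasing, and the minimum sum is reached when X_{2n+1} = 3X_{2n}. (Note that this ignores the parity requirement from the question: X_{2n} is even, so 3X_{2n} is even too, and an odd X_{2n+1} would have to be at least 3X_{2n} + 1.)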
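We can compute the general terms X_{2n} and X_{2n+1}. Let

$$
v_0 = \begin{pmatrix} X_0 \\ X_1 \end{pmatrix}, \qquad
v_1 = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}.
$$

Since X_2 = 3X_0 + 2X_1, the relation between v_0 and v_1 is v_1 = M_a v_0 with

$$
M_a = \begin{pmatrix} 0 & 1 \\ 3 & 2 \end{pmatrix}.
$$

Since X_3 = 3X_2, the relation between v_1 and v_2 = (X_2, X_3)^T is v_2 = M_b v_1 with

$$
M_b = \begin{pmatrix} 0 & 1 \\ 0 & 3 \end{pmatrix}.
$$

Hence the relation between v_0 and v_2 is v_2 = M v_0 with

$$
M = M_b M_a = \begin{pmatrix} 3 & 2 \\ 9 & 6 \end{pmatrix},
$$

and, writing v_{2n} = (X_{2n}, X_{2n+1})^T, we deduce

$$
v_{2n} = M^n v_0.
$$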
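Following the classical diagonalization (M has eigenvalues 0 and 9, so M^2 = 9M and M^n = 9^{n-1} M for n ≥ 1), we get, for n ≥ 1:

$$
X_{2n} = \frac{9^n}{3} X_0 + 2 \cdot 9^{n-1} X_1, \qquad
X_{2n+1} = 9^n X_0 + 6 \cdot 9^{n-1} X_1.
$$

Recall that X_1 = 3X_0; thus

$$
X_{2n} = 9^n X_0, \qquad X_{2n+1} = 3 \cdot 9^n X_0,
$$

and these also hold for n = 0.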
Now divide the sum X we want to check by X_0 (if X_0 does not divide X, the answer is immediately no) and write X / X_0 in base 9. The digit in the 9^n place collects the contributions of X_{2n} = 9^n X_0 and X_{2n+1} = 3·9^n X_0:
in the 9^n place we can put a 1 or a 0 (that means we take or skip the 2n-th element of A);
we may also put a 3 in that place, which means we selected the (2n+1)-th element of the array, or a 4 if we take both.
Since a digit can never exceed 1 + 3 = 4, no carries occur, so we just have to decompose X / X_0 in base 9 and check whether all its digits are 0, 1, 3 or 4 (and that the leading digit does not fall outside the bounds of our array).

Variable elimination in Bayes Net on a node with single child

If we have a node X, that has a child Y in a Bayes net, why is it correct to express P(Y) as P(Y|X)P(X)? Does it then follow that X is a necessary condition for Y?

Bayes Networks
An edge from X to Y in a Bayes network means that Y is modeled as conditionally dependent on its parent X. If nodes are not connected by any path, they are conditionally independent.
Having a node X with a child Y means you need to learn:
Given X is True, what is the probability of Y being True?
Given X is False, what is the probability of Y being True?
More generally: If X can have n values and Y can have m values, then you have to learn n * (m - 1) values. The - 1 is there because the probabilities need to sum up to 1.
Example
Let's stick with the simple case where both variables are binary, and use the rain/sprinkler example from Wikipedia:
Say X is RAIN and Y is SPRINKLER. You want to express Y (SPRINKLER) in terms of X (RAIN).
The Bayes theorem states:
P(Y|X) = P(X|Y) * P(Y) / P(X)
<=> P(Y) = P(Y | X) * P(X) / P(X | Y)
Now we apply the Law of total probability for X. This means, for X we simply go through all possible values:
P(Y) = P(Y | X = true) * P(X = true) +
P(Y | X = false) * P(X = false)
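I guess this is what you are referring to: P(Y) = P(Y|X)P(X) is shorthand for this sum over all values of X, which follows from the law of total probability for any two variables. It does not mean that X is a necessary condition for Y; in this example, the sprinkler can be on when it is not raining.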
To continue our case, we now look up the values in the tables (X is RAIN, Y is SPRINKLER):
P(Y) = 0.01 * 0.2 + 0.4 * 0.8
= 0.322

How to find all possible options in C?

I'm trying to find an efficient algorithm in C that gives me all solutions of a given equation.
I have the equation AX + BY = M, where A, B and M are read from input (scanf).
For example, let's take 5X + 10Y = 45:
1st option: 5 * 9 + 10 * 0
2nd option: 5 * 7 + 10 * 1
n-th option: 5 * 1 + 10 * 4
I also need to count how many options exist.
Any tips or hints?
I forgot to say that X and Y are integers and >= 0, so there are not infinitely many options.

The question makes sense if you restrict to non-negative unknowns.
Rewrite the equation as
AX = M - BY.
There can be non-negative solutions only as long as the RHS is non-negative, i.e.
BY ≤ M,
or
Y ≤ M/B.
Then, for a given Y, there is a solution iff A divides M - BY, i.e.
A | (M - BY)
You can code this in Python (3) as
A, B, M = 5, 10, 45
for Y in range(M // B + 1):
    if (M - B * Y) % A == 0:
        X = (M - B * Y) // A
        print(X, Y)
The solutions (printed as X Y) are
9 0
7 1
5 2
3 3
1 4
The number of iterations is floor(M / B) + 1. If A > B, it is better to swap the roles of X and Y, so that the loop runs over the variable with the larger coefficient.

You can calculate every solution if you put some limit on your input values, for example by restricting X and Y to values from 0 to 9; that way you can use a for loop to calculate every solution.

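If X and Y may be any integers, the number of solutions is infinite:
find a first solution, e.g. X = 9, Y = 0;
then you can create another solution with
X' = X + 2*p
Y' = Y - p
for any p in Z, since 5(X + 2p) + 10(Y - p) = 5X + 10Y.
So, without the restriction to non-negative values, a program that lists all solutions would never terminate.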

How to generate a multiplicative space vector in Matlab?

I am trying to generate "automatically" a vector 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30 (in multiplicative space).
I know linspace and logspace functions, but I couldn't find any similar function for multiplicative space.
Is there any? Otherwise, how to generate a vector like the one I need?

An easy way with bsxfun, which also allows scaling toward smaller values:
x = [0.01,0.03,0.05] % initial vector, works for various lengths
n = 12; % times it should get multiplied in rising direction
m = 3; % times it should get multiplied in falling direction
Z = bsxfun( @times, x(:), 10.^(-m:n) )
Z = Z(:)
% if preferred, bulky one-liner:
% Z = reshape( bsxfun( @times, x(:), 10.^(-m:n) ) , 1 , [])
I assumed multiplication by powers of ten, e.g.:
10.^(0:n) = 1 10 100 1000 10000 100000 ....
But custom multiplier vectors Y are also possible:
Z = bsxfun( @times, x(:), Y(:)' );
Z = Z(:)

A function that might help you achieve this in a very easy and compact way is the Kronecker tensor product, kron.
You can use it to rewrite thewaywewalk's answer as:
v = [0.01;0.03;0.05]; % initial vector
emin = -3; % minimal exponent
emax = 12; % maximal exponent
Z = kron(10.^(emin:emax)',v(:))
which should give you the exact same result.

Not very efficient, but this will generate what you want. inputvec is your initial vector ([0.01 0.03] in this case), multiplier is 10, and n is the length of the required vector (8). n should be a multiple of nn (the length of the input vector).
function newvec=multispace(n,inputvec,multiplier)
    nn=length(inputvec);
    newvec=zeros(1,n);
    newvec(1:nn)=inputvec;
    for i=1:n/nn-1
        newvec(i*nn+1:(i+1)*nn)=(newvec((i-1)*nn+1:(i)*nn)).*multiplier;
    end
end

Fastest computation of sum x^5 + x^4 + x^3...+x^0 (Bitwise possible ?) with x=16

For a tree layout that takes advantage of cache-line prefetching (reading the next cache line is cheap), I need to do the address calculation really fast. I was able to boil the problem down to:
newIndex = nowIndex + 1 + (localChildIndex*X)
X would be, for example: X = 4^5 + 4^4 + 4^3 + 4^2 + 4^1 + 4^0.
Note: 4 is the branching factor. In reality it will be 16, so a power of 2. Could that make bitwise tricks useful?
Performance-wise, it would be very bad if computing X needed a loop or operations like division/multiplication. This appears to be an interesting problem, but I wasn't able to come up with a nice way of computing it.
Since it's part of a tree traversal, two modes would be possible: absolute calculation, independent of prior calculations, and incremental calculation, which starts with a high X kept in a variable and then does some minimal work on it at every deeper level of the tree.
I hope I was able to make clear what the math should do. I'm not sure if there is a way to do this fast and without a loop, but maybe somebody can come up with a really smart solution. I would like to thank everybody for their help. StackOverflow has been a great teacher to me in the past, and I hope to be able to give back more in the future as my knowledge increases.

I'll answer this in increasing complexity and generality.
If x is fixed to 16, then just use the constant value 1118481. Hooray! (Give it a name; using magic numbers is bad practice.)
If you have a few cases known at compile time, use a few constants or even defines, for example:
#define X_2 63
#define X_4 1365
#define X_8 37449
#define X_16 1118481
...
If you have several cases known only at execution time, initialize a lookup table and index it with the exponent.
int xTable[MAX_EXPONENT]; // note: give it a more meaningful name :)
Initialize it, then index it at execution time with the known exponent exp, where x = 2^exp:
newIndex = nowIndex + 1 + (localChildIndex*xTable[exp]);
How these values are precalculated, or how to calculate them efficiently on the fly:
The sum X = x^n + x^(n - 1) + ... + x^1 + x^0 is a geometric series, and its finite sum is:
X = x^n + x^(n - 1) + ... + x^1 + x^0 = (1-x^(n + 1))/(1-x)
About the bitwise operations: as Oli Charlesworth has stated, if x is a power of 2 (in binary 0..010..0), then x^n is also a power of 2, and the sum of distinct powers of two is equivalent to their bitwise OR. Thus we could write an expression like:
Let exp be the exponent so that x = 2^exp. (For 16, exp = 4). Then,
X = x^5 + ... + x^1 + x^0
X = (2^exp)^5 + ... + (2^exp)^1 + 1
X = 2^(exp*5) + ... + 2^(exp*1) + 1
Now, using bitwise operations, 2^n = 1<<n:
X = 1<<(exp*5) | ... | 1<<exp | 1
In C:
int X;
int exp = 4; //for x == 16
X = 1 << (exp*5) | 1 << (exp*4) | 1 << (exp*3) | 1 << (exp*2) | 1 << (exp*1) | 1;
And finally, I can't resist saying: if your expression were more complex and you had to evaluate an arbitrary polynomial a_n*x^n + ... + a_1*x^1 + a_0 in x, then instead of implementing the obvious loop over powers, a faster way to evaluate the polynomial is Horner's rule.
