Inverse matrix calculation in real time - c

I have been developing a C language control software working in real time. The software implements among others discrete state space observer of the controlled system. For implementation of the observer it is necessary to calculate inverse of the matrix with 4x4 dimensions. The inverse matrix calculation has to be done each 50 microseconds and it is worthwhile to say that during this time period also other pretty time consuming calculation will be done. So the inverse matrix calculation has to consume much less than 50 microseconds. It is also necessary to say that the DSP used does not have ALU with floating point operations support.
I have been looking for some efficient way how to do that. One idea which I have is to prepare general formula for calculation the determinant of the matrix 4x4 and general formula for calculation the adjoint matrix of the 4x4 matrix and then calculate the inverse matrix according to below given formula.
What do you think about this approach?

As I understand the consensus among those who study numerical linear algebra, the advice is to avoid computing matrix inverses unnecessarily. For example if the inverse of A appears in your controller only in expressions such as
z = inv(A)*y
then it is better (faster, more accurate) to solve for z the equation
A*z = y
than to compute inv(A) and then multiply y by inv(A).
A common method to solve such equations is to factorize A into simpler parts. For example if A is (strictly) positive definite then the cholesky factorization finds lower triangular matrix L so that
A = L*L'
Given that we can solve A*z=y for z via:
solve L*u = y for u
solve L'*z = u for z
and each of these is easy given the triangular nature of L
Another factorization (that again only applies to positive definite matrices) is the LDL which in your case may be easier as it does not involve square roots. It is described in the wiki article linked above.
More general factorizations include the LUD and QR These are more general in that they can be applied to any (invertible) matrix, but are somewhat slower than cholesky.
Such factorisations can also be used to compute inverses.
To be pedantic describing adj(A) in your post as the adjoint is, perhaps, a little old fashioned; I thing adjugate or adjunct is more modern. In any case adj(A) is not the transpose. Rather the (i,j) element of adj(A) is, up to a sign, the determinant of the matrix obtained from A by deleting the i'th row and j'th column. It is awkward to compute this efficiently.

Related

How to obtain the derivative of Rodrigues vector and perform update in nonlinear least square?

I am now interested in the bundle adjustment in SLAM, where the Rodrigues vectors $R$ of dimension 3 are used as part of variables. Assume, without loss of generality, we use Gauss-Newton method to solve it, then in each step we need to solve the following linear least square problem:
$$J(x_k)\Delta x = -F(x_k),$$
where $J$ is the Jacobi of $F$.
Here I am wondering how to calculate the derivative $\frac{\partial F}{\partial R}$. Is it just like the ordinary Jacobi in mathematic analysis? I have this wondering because when I look for papers, I find many other concepts like exponential map, quaternions, Lie group and Lie algebra. So I suspect if there is any misunderstanding.
This is not an answer, but is too long for a comment.
I think you need to give more information about how the Rodrigues vector appears in your F.
First off, is the vector assumed to be of unit length.? If so that presents some difficulties as now it doesn't have 3 independent components. If you know that the vector will lie in some region (eg that it's z component will always be positive), you can work round this.
If instead the vector is normalised before use, then while you could then compute the derivatives, the resulting Jacobian will be singular.
Another approach is to use the length of the vector as the angle through which you rotate. However this means you need a special case to get a rotation through 0, and the resulting function is not differentiable at 0. Of course if this can never occur, you may be ok.

Optimal combination of “baskets” to best approximate target basket?

Say I have some array of length n where arr[k] represents how much of object k I want. I also have some arbitrary number of arrays which I can sum integer multiples of in any combination - my goal being to minimise the sum of the absolute differences across each element.
So as a dumb example if my target was [2,1] and my options were A = [2,3] and B = [0,1], then I could take A - 2B and have a cost of 0
I’m wondering if there is an efficient algorithm for approximating something like this? It has a weird knapsack-y flavour to is it maybe just intractable for large n? It doesn’t seem very amenable to DP methods
This is the (NP-hard) closest vector problem. There's an algorithm due to Fincke and Pohst ("Improved methods for calculating vectors of short length in a lattice, including a complexity analysis"), but I haven't personally worked with it.

computing function of neighbors efficiently on lattice

I'm studying the Ising model, and I'm trying to efficiently compute a function H(σ) where σ is the current state of an LxL lattice (that is, σ_ij ∈ {+1, -1} for i,j ∈ {1,2,...,L}). To compute H for a particular σ, I need to perform the following calculation:
where ⟨i j⟩ indicates that sites σ_i and σ_j are nearest neighbors and (suppose) J is a constant.
A couple of questions:
Should I store my state σ as an LxL matrix or as an L2 list? Is one better than the other for memory accessing in RAM (which I guess depends on the way I'm accessing elements...)?
In either case, how can I best compute H?
Really I think this boils down to how can I access (and manipulate) the neighbors of every state most efficiently.
Some thoughts:
I see that if I loop through each element in the list or matrix that I'll be double counting, so is there a "best" way to return the unique neighbors?
Is there a better data structure that I'm not thinking of?
Your question is a bit broad and a bit confusing for me, so excuse me if my answer is not the one you are looking for, but I hope it will help (a bit).
An array is faster than a list when it comes to indexing. A matrix is a 2D array, like this for example (where N and M are both L for you):
That means that you first access a[i] and then a[i][j].
However, you can avoid this double access, by emulating a 2D array with a 1D array. In that case, if you want to access element a[i][j] in your matrix, you would now do, a[i * L + j].
That way you load once, but you multiply and add your variables, but this may still be faster in some cases.
Now as for the Nearest Neighbor question, it seems that you are using a square-lattice Ising model, which means that you are working in 2 dimensions.
A very efficient data structure for Nearest Neighbor Search in low dimensions is the kd-tree. The construction of that tree takes O(nlogn), where n is the size of your dataset.
Now you should think if it's worth it to build such a data structure.
PS: There is a plethora of libraries implementing the kd-tree, such as CGAL.
I encountered this problem during one of my school assignments and I think the solution depends on which programming language you are using.
In terms of efficiency, there is no better way than to write a for loop to sum neighbours(which are actually the set of 4 points{ (i+/-1,j+/-1)} for a given (i,j). However, when simd(sse etc) functions are available, you can re-express this as a convolution with a 2d kernel {0 1 0;1 0 1;0 1 0}. so if you use a numerical library which exploits simd functions you can obtain significant performance increase. You can see the example implementation of this here(https://github.com/zawlin/cs5340/blob/master/a1_code/denoiseIsingGibbs.py) .
Note that in this case, the performance improvement is huge because to evaluate it in python I need to write an expensive for loop.
In terms of work, there is in fact some waste as the unecessary multiplications and sum with zeros at corners and centers. So whether you can experience performance improvement depends quite a bit on your programming environment( if you are already in c/c++, it can be difficult and you need to use mkl etc to obtain good improvement)

How to remove apparent redundency in numpy vector operations?

New to python and not sure about efficiency issues here. For vectors x, y, and z that represent the coordinates of n particles I can do the following computation
import numpy as np
X=np.subtract.outer(x,x)
Y=np.subtract.outer(y,y)
Z=np.subtract.outer(z,z)
R=np.sqrt(X**2+Y**2+Z**2)
A=X/R
np.fill_diagonal(A,0)
a=np.sum(A,axis=0)
With this calculation there is about a factor of 2 in redundancy in so far as multiplications and divisions go as the diagonals are not needed and the lower diagonal is just the negative of the upper diagonal. I plan to use this kind of computation inside a function call that is used by odeint - i.e. it would be called a lot and the vectors will be large - as large as my computer will handle. To remove it, naively I would end up doing a for loop which presumably is a stupid thing to do. Can I get rid of this redundancy in a vectorized way or is it even worth the effort?
Update: Based on the suggestions below, the only way I could see to improve was
ut=np.triu_indices(n,1)
X=x[ut[0]]-x[ut[1]]
With similar expressions for Y and Z and using pdist to find R. This construction only calculates the upper triangular part. Looking at the source code for pdist I am not convinced it does anything particularly smart so I think my expression above would be equally good. The use of squareform only produces the symmetric form. For the antisymmetric may as well use
B=np.zeros((n,n),dtype=np.float64)
B(ut[0],ut[1])=A
B=B-B.T
This cannot be slower than square form because this is pretty much exactly what squareform does. Since the function is called often it would seem to me that ut should be made static along with storage for others (X,Y,Z,A,B). However being new to python I'm not sure how that is done.

complex conjugate transpose matlab to C

I am trying to convert the following matlab code to c
cX = (fft(inData, fftSize) / fftSize);
Power_X = (cX*cX')/50;
Questions:
why divide the results of fft (array of fftSize complex elements) by
fftSize?
I'm not sure at all how to convert the complex conjugate
transform to c, I just don't understand what that line does.
Peter
1. Why divide the results of fft (array of fftSize complex elements) by fftSize?
Because the "energy" (sum of squares) of the resultant fft grows as the number of points in the fft grows. Dividing by the number of points "N" normalizes it so that the sum of squares of the fft is equal to the sum of squares of the original signal.
2. I'm not sure at all how to convert the complex conjugate transform to c, I just don't understand what that line does.
That is what is actually calculating the sum of the squares. It is easy to verify cX*cX' = sum(abs(cX)^2), where cX' is the conjugate transpose.
Ideally a Discrete Fourier Transform (DFT) is purely a rotation, in that it returns the same vector in a different coordinate system (i.e., it describes the same signal in terms of frequencies instead of in terms of sound volumes at sampling times). However, the way the DFT is usually implemented as a Fast Fourier Transform (FFT), the values are added together in various ways that require multiplying by 1/N to keep the scale unchanged.
Often, these multiplications are omitted from the FFT to save computing time and because many applications are unconcerned with scale changes. The resulting FFT data still contains the desired data and relationships regardless of scale, so omitting the multiplications does not cause any problems. Additionally, the correcting multiplications can sometimes be combined with other operations in an application, so there is no point in performing them separately. (E.g., if an application performs an FFT, does some manipulations, and performs an inverse FFT, then the combined multiplications can be performed once during the process instead of once in the FFT and once in the inverse FFT.)
I am not familiar with Matlab syntax, but, if Stuart’s answer is correct that cX*cX' is computing the sum of the squares of the magnitudes of the values in the array, then I do not see the point of performing the FFT. You should be able to calculate the total energy in the same way directly from iData; the transform is just a coordinate transform that does not change energy, except for the scaling described above.

Resources