GSL eigenvalues order - c

I am using the functions gsl_eigen_nonsymm and/or gsl_eigen_symm from the GSL library to find the eigenvalues of an L x L matrix M[i][j], which is also a function of time t = 1, ..., N, so I have M[i][j][t]. To get the eigenvalues for every t, I allocate an L x L matrix E[i][j] = M[i][j][t] and diagonalize it for each t.
The problem is that the program returns the eigenvalues in a different order after some iterations. For example (L = 3), if at t = 0 I get eigen[t = 0] = {l1,l2,l3}(0), at t = 1 I may get eigen[t = 1] = {l3,l2,l1}(1), whereas I need to always have {l1,l2,l3}(t).
To be more concrete, consider the matrix M(t) = {{0,t,t},{t,0,2t},{t,t,0}}: its eigenvalues are always (approximately) l1 = -1.3 t, l2 = -t, l3 = 2.3 t. When I diagonalized it (with the code below), the eigenvalues came out swapped several times. Is there a way to prevent this? I can't just sort them by magnitude; I need them to always be in the same order (whatever it is) a priori. (The code below is just an example to illustrate my problem.)
EDIT: I can't just sort them because a priori I know neither their values nor whether they reliably keep a structure like l1 < l2 < l3 at every time, due to statistical fluctuations. That's why I wanted to know whether there is a way to make the algorithm always behave the same way, so that the order of the eigenvalues is always the same, or whether there is some trick to make that happen.
Just to be clearer, I'll re-describe the toy problem I presented here. We have a matrix that depends on time, and I (maybe naively) expected to just get lambda_1(t), ..., lambda_N(t). Instead, the algorithm often swaps the eigenvalues at different times: if at t = 1 I get (lambda_1, lambda_2, lambda_3)(1), at t = 2 I may get (lambda_2, lambda_1, lambda_3)(2). So if, for instance, I want to see how lambda_1 evolves in time, I can't, because the algorithm mixes the eigenvalues at different times. The program below is just an analytical toy example of my problem: the eigenvalues of the matrix below are l1 = -1.3 t, l2 = -t, l3 = 2.3 t, but the program may output (-1.3,-1,2.3)(1), (-2,-2.6,4.6)(2), etc. As stated above, I was wondering whether there is a way to make the program always order the eigenvalues in the same way regardless of their actual numerical values, so that I always get the (l1,l2,l3) combination. I hope it is clearer now; please tell me if it is not.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_eigen.h>
#include <gsl/gsl_sort_vector.h>
int main(void) {
    int L = 3, i, j, t;
    int N = 10;
    double M[L][L][N];   /* M[i][j][t-1] is the matrix at time t */
    gsl_matrix *E = gsl_matrix_alloc(L, L);
    gsl_vector_complex *eigen = gsl_vector_complex_alloc(L);
    gsl_eigen_nonsymm_workspace *w = gsl_eigen_nonsymm_alloc(L);
    for (t = 1; t <= N; t++) {
        M[0][0][t-1] = 0;
        M[0][1][t-1] = t;
        M[0][2][t-1] = t;
        M[1][0][t-1] = t;
        M[1][1][t-1] = 0;
        M[1][2][t-1] = 2.0 * t;
        M[2][1][t-1] = t;
        M[2][0][t-1] = t;
        M[2][2][t-1] = 0;
        for (i = 0; i < L; i++) {
            for (j = 0; j < L; j++) {
                gsl_matrix_set(E, i, j, M[i][j][t - 1]);
            }
        }
        gsl_eigen_nonsymm(E, eigen, w); /* eigenvalues of E (M at fixed t); E is overwritten */
        printf("#%d\n\n", t);
        for (i = 0; i < L; i++) {
            printf("%d\t%lf\n", i, GSL_REAL(gsl_vector_complex_get(eigen, i)));
        }
        printf("\n");
    }
    gsl_eigen_nonsymm_free(w);
    gsl_vector_complex_free(eigen);
    gsl_matrix_free(E);
    return 0;
}

Your question, as posed, does not make sense: eigenvalues do not have any inherent order. It sounds to me like you want to define the eigenvalues of M_t as something akin to L_1(M_t), ..., L_n(M_t) and then track how they change in time. Assuming the process driving M_t is continuous, your eigenvalues will be continuous too; in other words, they will not change significantly when you make small changes to M_t. So if you define an ordering by enforcing L_1 < L_2 < ... < L_n, this ordering will not change for small changes in t. When two eigenvalues cross, you'll need to decide how to assign them. If you have "random fluctuations" that are larger than the typical distance between your eigenvalues, this becomes essentially impossible.
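Enforcing L_1 < L_2 < ... < L_n at every t amounts to sorting each time step's eigenvalues. A minimal sketch with the C standard library's qsort, assuming the eigenvalues are real (as for gsl_eigen_symm, and for the toy matrix) and that their real parts have already been copied out of the gsl_vector_complex (e.g. with GSL_REAL(gsl_vector_complex_get(eigen, i)) as in the question) into a plain double array; the helper names are hypothetical:

```c
#include <stdlib.h>

/* qsort comparator: ascending order of doubles */
static int cmp_double_asc(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sort n real eigenvalues in place so that
 * eig[0] <= eig[1] <= ... <= eig[n-1]. */
void order_eigenvalues(double *eig, size_t n) {
    qsort(eig, n, sizeof *eig, cmp_double_asc);
}
```

For the toy matrix with t > 0 this always yields (l1, l2, l3); as noted above, the labelling only stays meaningful until two eigenvalues cross.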
Here's another way of tracking the eigenvalues, through their eigenvectors, which might work better. Suppose your eigenvectors are v_i, with components v_ij. First "normalize" your eigenvectors so that v_i1 is nonnegative, i.e. just flip the sign of each eigenvector as needed. This defines an ordering on your eigenvalues through an ordering on v_i1, the first component of each eigenvector. This way you can still keep track of eigenvalues that cross each other. However, if your eigenvectors cross on the first component, you're in trouble.
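A minimal sketch of the sign-normalization idea, assuming the eigenvectors are stored column by column in a row-major n x n array of doubles (this matches how gsl_eigen_symmv fills its evec matrix, as long as the matrix's tda equals n). The function name and the insertion sort are illustrative choices, not part of GSL:

```c
#include <stddef.h>

/* Hypothetical helper: order eigenpairs by the first eigenvector component.
 * vec is row-major n x n; column c holds eigenvector c, so component r of
 * eigenvector c is vec[r*n + c]. Each column is sign-flipped so that its
 * first component is >= 0; perm[0..n-1] then lists the column indices in
 * ascending order of that first component. */
void order_by_first_component(double *vec, size_t n, size_t *perm) {
    /* Flip signs so that the first component of every eigenvector is >= 0 */
    for (size_t c = 0; c < n; ++c) {
        if (vec[c] < 0) {
            for (size_t r = 0; r < n; ++r) vec[r * n + c] = -vec[r * n + c];
        }
    }
    /* Insertion sort of column indices by first component (n is small) */
    for (size_t c = 0; c < n; ++c) perm[c] = c;
    for (size_t a = 1; a < n; ++a) {
        size_t key = perm[a];
        size_t b = a;
        while (b > 0 && vec[perm[b - 1]] > vec[key]) {
            perm[b] = perm[b - 1];
            --b;
        }
        perm[b] = key;
    }
}
```

The eigenvalue belonging to column perm[k] is then labelled k at every time step.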

I don't think you can do what you want. As t changes, the output changes.
My original answer mentioned ordering on the pointers, but looking at the data structure, that won't help. After gsl_eigen_nonsymm returns, the real eigenvalues can also be read from the diagonal of E (the rest of E is overwritten). You can see them as follows.
gsl_eigen_nonsymm(E, eigen, w);
double *mdata = E->data;   /* row-major; tda == L == 3 here */
printf("mdata[%i] \t%lf\n", 0, mdata[0]);   /* E(0,0) */
printf("mdata[%i] \t%lf\n", 4, mdata[4]);   /* E(1,1) */
printf("mdata[%i] \t%lf\n", 8, mdata[8]);   /* E(2,2) */
The following code shows how the data in the eigenvalue vector is laid out...
double *data = eigen->data;   /* interleaved: real, imag, real, imag, ... (stride 1) */
printf("n \t%zu\n", eigen->size);
for(i = 0; i < L; i++) {
printf("%d \t%lf\n", i, GSL_REAL(gsl_vector_complex_get(eigen, i)));
printf("%d r \t%lf\n", i, data[2*i]);
printf("%d i \t%lf\n", i, data[2*i+1]);
}
When you see the order change, check this: if the order of the values in mdata changes AND the order in data changes, then the algorithm does not have a fixed order, i.e. you cannot do what you're asking. If the order does not change in mdata but does change in data, then you have a solution, but I really doubt that will be the case.

According to the docs, those functions return the eigenvalues unordered:
https://www.gnu.org/software/gsl/manual/html_node/Real-Symmetric-Matrices.html
This function computes the eigenvalues of the real symmetric matrix A. Additional workspace of the appropriate size must be provided in w. The diagonal and lower triangular part of A are destroyed during the computation, but the strict upper triangular part is not referenced. The eigenvalues are stored in the vector eval and are unordered.
Even the functions that do return ordered results simply sort by ascending/descending value or magnitude:
https://www.gnu.org/software/gsl/manual/html_node/Sorting-Eigenvalues-and-Eigenvectors.html
This function simultaneously sorts the eigenvalues stored in the vector eval and the corresponding real eigenvectors stored in the columns of the matrix evec into ascending or descending order according to the value of the parameter sort_type as shown above.
If you're looking for the time evolution of the eigenvalues, just keep doing what you have been doing and solve for the time-dependent representations, e.g.:
lambda_1(t).....lambda_N(t)
For your simple time-as-scalar example,
l1 = -1.3 t , l2 = -t , l3 = 2.3 t
You literally have a parameterization of all possible solutions, and because you've assigned them identifiers l_n, you don't run into the issue of degeneracy. Even if some M[i][j] are nonlinear functions of t, it shouldn't matter, because the eigenvalue problem itself is linear and the solutions are computed purely from the characteristic equation (which holds t constant while solving for lambda).
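For the toy matrix the characteristic equation factors exactly: M(t) = t * {{0,1,1},{1,0,2},{1,1,0}} gives mu^3 - 4 t^2 mu - 3 t^3 = (mu + t)(mu^2 - t mu - 3 t^2), so the labelled eigenvalues are l1 = t(1 - sqrt(13))/2, l2 = -t and l3 = t(1 + sqrt(13))/2. A small illustrative helper (the name is hypothetical) that returns them in that fixed order:

```c
#include <math.h>

/* Closed-form eigenvalues of M(t) = t * {{0,1,1},{1,0,2},{1,1,0}},
 * returned in a fixed, labelled order l[0] = l1, l[1] = l2, l[2] = l3. */
void toy_eigenvalues(double t, double l[3]) {
    double s = sqrt(13.0);
    l[0] = t * (1.0 - s) / 2.0;   /* l1, about -1.303 t */
    l[1] = -t;                    /* l2, exactly -t */
    l[2] = t * (1.0 + s) / 2.0;   /* l3, about  2.303 t */
}
```

At t = 2 this gives approximately (-2.6, -2, 4.6), the same values the question's program prints, but always in the same order.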

Related

Is there a point in transforming Sparse Matrix Multiplication into block form?

In an assignment for a parallel computing class, we have been asked to program sparse binary matrix-matrix multiplication (SpGEMM) in C. Julia has a relatively easy-to-follow implementation based on Gustavson's algorithm that works great.
The thing is, we also need to do the multiplication in block form, which I have already done, but I don't really see any speedup from it. From what I understand, you're supposed to use the result of A(i,k)*B(k,j), where (i,j) are coordinates in the block matrix, as a mask/filter for the next block multiplication in the sum C(i,j) = Σ( A(i,k)*B(k,j) ).
Julia's implementation, though, which I followed, already uses a dense boolean array when computing each row, acting as a "flag" for when not to add something again to the resulting matrix.
My question is: is there any merit in turning this into block matrix multiplication, or is there something I might be doing wrong myself?
Keep in mind that my C code currently runs in half the time Matlab takes to multiply a 5,000,000 x 5,000,000 sparse matrix. The blocked version, which I really tried to optimize and which also follows the Gustavson order, gets slower and slower the smaller the block size is set.
Here is my current code
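#include <stdbool.h>
#include <stdlib.h>
//C=D+(A*B) (basically OR); returns the number of nonzeros in C
//assumes MAX and quickSort(arr, lo, hi) are defined elsewhere
int SpGEMM_dor(int *Acol, int *Arow, int An,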
int *Bcol, int *Brow, int Bm,
int **Ccol, int *Crow, int *Csize,//output
int *Dcol, int *Drow)//previous
{
//printCSR(Arow,Acol,An,An,An);
int nnzcum=0;
bool *xb = calloc(An,sizeof(bool)); //boolean flag
for(int i=0; i<An; i++){
int nnzpv = nnzcum;//nnz of previous row;
Crow[i] = nnzcum;
if(nnzcum + An > *Csize){ //make sure there's enough space
*Csize += MAX(An, *Csize/4);
*Ccol = realloc(*Ccol,*Csize*sizeof(int));
}
//---OR---
//add previous row items in order to exist in the next block
for(int jj=Drow[i]; jj<Drow[i+1]; jj++){
int j = Dcol[jj];
xb[j] = true;
(*Ccol)[nnzcum] = j;
nnzcum++;
}
//--------
//add new row items
for(int jj=Arow[i]; jj<Arow[i+1]; jj++){
int j = Acol[jj];
for(int kp=Brow[j]; kp<Brow[j+1]; kp++){
int k = Bcol[kp];
if(!xb[k]){
xb[k] = true;
(*Ccol)[nnzcum] = k;
nnzcum++;
}
}
}
if(nnzcum > nnzpv){
quickSort(*Ccol,nnzpv,nnzcum-1);
for(int p=nnzpv; p<nnzcum; p++){
xb[ (*Ccol)[p] ] = false;
}
}
}
Crow[An] = nnzcum;
free(xb);
return Crow[An];
}
The code inside the ----OR---- section only runs in the block version, to add the previous block to the one currently being calculated. It basically computes C = D+(A*B). I've also tried calculating the next block and then merging the two sorted arrays of each row of the two CSR matrices, which seems to be slower. All matrices are in CSR format.
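The merge alternative mentioned above (OR-ing two sorted CSR rows) can be sketched as a classic two-pointer union. This is an illustration under the assumption that each row's column indices are sorted and duplicate-free; the function name is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: union of two sorted, duplicate-free column-index lists
 * a[0..na) and b[0..nb) into out (capacity >= na + nb). Returns the number of
 * entries written; out is sorted and duplicate-free, i.e. the boolean OR of
 * the two CSR rows. */
size_t merge_sorted_rows(const int *a, size_t na, const int *b, size_t nb, int *out) {
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j])      out[n++] = a[i++];
        else if (b[j] < a[i]) out[n++] = b[j++];
        else { out[n++] = a[i++]; ++j; }   /* same column in both rows: keep once */
    }
    while (i < na) out[n++] = a[i++];      /* copy whatever is left of a */
    while (j < nb) out[n++] = b[j++];      /* copy whatever is left of b */
    return n;
}
```

Unlike the flag-array approach in the question, this needs no clearing pass and no quickSort, but it only applies when both inputs are already sorted.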

Shuffle an array while making each index have the same probability to be in any index

I want to shuffle an array so that each index has the same probability of ending up at any other index (excluding itself).
I have this solution, only I find that the last 2 indexes will always be swapped with each other:
void Shuffle(int arr[], size_t n)
{
int newIndx = 0;
int i = 0;
for(; i > n - 2; ++i)
{
newIndx = rand() % (n - 1);
if (newIndx >= i)
{
++newIndx;
}
swap(i, newIndx, arr);
}
}
but in the end some indexes might go back to their original place again.
Any thoughts?
C lang.
A permutation (shuffle) where no element is in its original place is called a derangement.
Generating random derangements is harder than generating random permutations, but can be done in linear time and space. (Generating a random permutation can be done in linear time and constant space.) Here are two possible algorithms.
The simplest solution to understand is a rejection strategy: do a Fisher-Yates shuffle, but if the shuffle attempts to put an element at its original spot, restart the shuffle. [Note 1]
Since the probability that a random shuffle is a derangement is approximately 1/e, the expected number of shuffles performed is about e (that is, 2.71828…). But since unsuccessful shuffles are restarted as soon as the first fixed point is encountered, the total number of shuffle steps is less than e times the array size. For a detailed analysis, see this paper, which proves the expected number of random numbers needed by the algorithm to be around (e−1) times the number of elements.
In order to be able to do the check and restart, you need to keep an array of indices. The following little function produces a derangement of the indices from 0 to n-1; it is necessary to then apply the permutation to the original array.
/* n must be at least 2 for this to produce meaningful results */
void derange(size_t n, int ind[]) {
    for (size_t i = 0; i < n; ++i) ind[i] = (int)i;
    swap(ind, 0, randint(1, (int)n));
    for (size_t i = 1; i < n; ++i) {
        size_t r = randint((int)i, (int)n);
        swap(ind, i, r);
        /* Fixed point: restart the shuffle from position 1 (the loop's ++i) */
        if ((size_t)ind[i] == i) i = 0;
    }
}
Here are the two functions used by that code:
void swap(int arr[], size_t i, size_t j) {
int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
}
/* This is not the best possible implementation */
int randint(int low, int lim) {
return low + rand() % (lim - low);
}
The following function is based on the 2008 paper "Generating Random Derangements" by Conrado Martínez, Alois Panholzer and Helmut Prodinger, although I use a different mechanism to track cycles. Their algorithm uses a bit vector of size N but uses a rejection strategy in order to find an element which has not been marked. My algorithm uses an explicit vector of indices not yet operated on. The vector is also of size N, which is still O(N) space [Note 2]; since in practical applications, N will not be large, the difference is not IMHO significant. The benefit is that selecting the next element to use can be done with a single call to the random number generator. Again, this is not particularly significant since the expected number of rejections in the MP&P algorithm is very small. But it seems tidier to me.
The basis of the algorithms (both MP&P and mine) is the recursive procedure to produce a derangement. It is important to note that a derangement is necessarily the composition of some number of cycles where each cycle is of size greater than 1. (A cycle of size 1 is a fixed point.) Thus, a derangement of size N can be constructed from a smaller derangement using one of two mechanisms:
Produce a derangement of the N-1 elements other than element N, and add N to some cycle at any point in that cycle. To do so, randomly select any element j in the N-1 derangement and place N immediately after j in j's cycle. This alternative covers all possibilities where N is in a cycle of size greater than 2.
Produce a derangement of N-2 of the N-1 elements other than N, and add a cycle of size 2 consisting of N and the element not selected from the smaller derangement. This alternative covers all possibilities where N is in a cycle of size 2.
If $D_n$ is the number of derangements of size $n$, it is easy to see from the above recursion that:
$D_n = (n-1)(D_{n-1} + D_{n-2})$
The multiplier is n−1 in both cases: in the first alternative, it refers to the number of possible places N can be added, and in the second alternative to the number of possible ways to select n−2 elements of the recursive derangement.
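The recurrence is easy to check numerically. This sketch (the function name is hypothetical) tabulates $D_0, \dots, D_{n_{max}}$ exactly in 64-bit integers, starting from $D_0 = 1$ and $D_1 = 0$ as the derangement code further down does; the values stay within range up to n_max = 20:

```c
/* Fill D[0..n_max] with derangement counts ("subfactorials") using
 * D_n = (n-1)(D_{n-1} + D_{n-2}), D_0 = 1, D_1 = 0.
 * D must have room for n_max + 1 entries; keep n_max <= 20 to avoid overflow. */
void subfactorials(unsigned long long D[], int n_max) {
    D[0] = 1;
    if (n_max >= 1) D[1] = 0;
    for (int n = 2; n <= n_max; ++n)
        D[n] = (unsigned long long)(n - 1) * (D[n - 1] + D[n - 2]);
}
```

For example D_4 = 3 * (2 + 1) = 9 and D_5 = 4 * (9 + 2) = 44.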
Therefore, if we were to recursively produce a random derangement of size N, we would randomly select one of the N-1 previous elements, and then make a random boolean decision on whether to produce alternative 1 or alternative 2, weighted by the number of possible derangements in each case.
One advantage to this algorithm is that it can derange an arbitrary vector; there is no need to apply the permuted indices to the original vector as with the rejection algorithm.
As MP&P note, the recursive algorithm can just as easily be performed iteratively. This is quite clear in the case of alternative 2, since the new 2-cycle can be generated either before or after the recursion, so it might as well be done first and then the recursion is just a loop. But that is also true for alternative 1: we can make element N the successor in a cycle to a randomly-selected element j even before we know which cycle j will eventually be in. Looked at this way, the difference between the two alternatives reduces to whether or not element j is removed from future consideration or not.
As shown by the recursion, alternative 2 should be chosen with probability $(n-1)D_{n-2}/D_n$, which is how MP&P write their algorithm. I used the equivalent formula $D_{n-2}/(D_{n-1} + D_{n-2})$, mostly because my prototype used Python (for its built-in bignum support).
Without bignums, the number of derangements and hence the probabilities need to be approximated as double, which will create a slight bias and limit the size of the array to be deranged to about 170 elements. (long double would allow slightly more.) If that is too much of a limitation, you could implement the algorithm using some bignum library. For ease of implementation, I used the Posix drand48 function to produce random doubles in the range [0.0, 1.0). That's not a great random number function, but it's probably adequate to the purpose and is available in most standard C libraries.
Since no attempt is made to verify the uniqueness of the elements in the vector to be deranged, a vector with repeated elements may produce a derangement where one or more of these elements appear to be in the original place. (It's actually a different element with the same value.)
The code:
#define _XOPEN_SOURCE 700   /* for drand48 */
#include <stdbool.h>
#include <stdlib.h>
/* Deranges the vector `arr` (of length `n`) in place, to produce
* a permutation of the original vector where every element has
* been moved to a new position. Returns `true` unless the derangement
* failed because `n` was 1.
*/
bool derange(int arr[], size_t n) {
if (n < 2) return n != 1;
/* Compute derangement counts ("subfactorials") */
double subfact[n];
subfact[0] = 1;
subfact[1] = 0;
for (size_t i = 2; i < n; ++i)
subfact[i] = (i - 1) * (subfact[i - 2] + subfact[i - 1]);
/* The vector 'todo' is the stack of elements which have not yet
* been (fully) deranged; `u` is the count of elements in the stack
*/
size_t todo[n];
for (size_t i = 0; i < n; ++i) todo[i] = i;
size_t u = n;
/* While the stack is not empty, derange the element at the
* top of the stack with some element lower down in the stack
*/
while (u) {
size_t i = todo[--u]; /* Pop the stack */
size_t j = u * drand48(); /* Get a random stack index */
swap(arr, i, todo[j]); /* i will follow j in its cycle */
/* If we're generating a 2-cycle, remove the element at j */
if (drand48() * (subfact[u - 1] + subfact[u]) < subfact[u - 1])
todo[j] = todo[--u];
}
return true;
}
Notes
1. Many people get this wrong, particularly in social occasions such as "secret friend" selection (I believe this is sometimes called "the Santa game" in other parts of the world.) The incorrect algorithm is to just choose a different swap if the random shuffle produces a fixed point, unless the fixed point is at the very end, in which case the shuffle is restarted. This will produce a random derangement, but the selection is biased, particularly for small vectors. See this answer for an analysis of the bias.
2. Even if you don't use the RAM model where all integers are considered fixed size, the space used is still linear in the size of the input in bits, since N distinct input values must have at least N log N bits. Neither this algorithm nor MP&P makes any attempt to derange lists with repeated elements, which is a much harder problem.
Your algorithm is only almost correct (which in algorithmics means it produces unexpected results). Because of a few small errors scattered throughout, it will not produce the expected results.
First, rand() % N is not guaranteed to produce a uniform distribution unless N is a divisor of the number of possible values (RAND_MAX + 1). In any other case, you will get a slight bias. Anyway, my man page for rand describes it as a bad random number generator, so you should try to use random or, if available, arc4random_uniform.
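The usual fix for that modulo bias is rejection sampling: throw away raw rand() values that fall into the incomplete last bucket. A minimal sketch using only the C standard library (the name rand_below is hypothetical), assuming 1 <= n <= RAND_MAX:

```c
#include <stdlib.h>

/* Return a uniformly distributed integer in [0, n) built on rand().
 * Values >= limit are rejected so that every residue mod n is equally likely. */
int rand_below(int n) {
    unsigned long range = (unsigned long)RAND_MAX + 1;  /* number of rand() outcomes */
    unsigned long limit = range - range % (unsigned long)n;  /* largest multiple of n <= range */
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);
    return (int)(r % (unsigned long)n);
}
```

This only removes the bias of the reduction step; the quality of rand() itself is unchanged.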
But preventing an index from coming back to its original place is both uncommon and rather hard to achieve. The only way I can imagine is to keep an array of the numbers [0; n[ and swap it the same way as the real array, so that you always know the original index of each number.
The code could become:
void Shuffle(int arr[], size_t n)
{
int i, newIndx;
int *indexes = malloc(n * sizeof(int));
for (i=0; i<n; i++) indexes[i] = i;
for(i=0; i < n - 1; ++i) // beware of the inequality!
{
int i1;
// search if index i is in the [i; n[ current array:
for (i1=i; i1 < n; ++i1) {
if (indexes[i1] == i) { // move it to i position
if (i1 != i) { // nothing to do if already at i
swap(i, i1, arr);
swap(i, i1, indexes);
}
break;
}
}
i1 = (i1 == n) ? i : i+1; // we will start the search at i1
// to guarantee that no element keep its place
newIndx = i1 + arc4random_uniform(n - i1);
/* if arc4random is not available:
newIndx = i1 + (random() % (n - i1));
*/
swap(i, newIndx, arr);
swap(i, newIndx, indexes);
}
/* special case: the permutation of [0, n-1[ may have left the last element in place;
* we will exchange the last element with a random one
*/
if (indexes[n-1] == n-1) {
newIndx = arc4random_uniform(n-1);
swap(n-1, newIndx, arr);
swap(n-1, newIndx, indexes);
}
free(indexes); // don't forget to free what we have malloc'ed...
}
Beware: the algorithm should be correct, but the code has not been tested and can contain typos...

Generating a connected graph and checking if it has eulerian cycle

So, I wanted to have some fun with graphs and now it's driving me crazy.
First, I generate a connected graph with a given number of edges. This is the easy part, which became my curse. Basically, it works as intended, but the results I'm getting are quite bizarre (well, maybe they're not, and I'm the issue here). The algorithm for generating the graph is fairly simple.
I have two arrays, one of them is filled with numbers from 0 to n - 1, and the other is empty.
At the beginning, I shuffle the first one and move its last element to the empty one.
Then, in a loop, I create an edge between the last element of the first array and a random element of the second one, and then I again move the last element of the first array to the other one.
After that part is done, I create random edges between vertices until I have as many as I need. This is, again, very easy: I just pick two random numbers in the range 0 to n - 1 and, if there is no edge between these vertices, I create one.
This is the code:
void generate(int n, double d) {
initMatrix(n); // <- creates an adjacency matrix n x n, filled with 0s
int *array1 = malloc(n * sizeof(int));
int *array2 = malloc(n * sizeof(int));
int j = n - 1, k = 0;
for (int i = 0; i < n; ++i) {
array1[i] = i;
array2[i] = 0;
}
shuffle(array1, 0, n); // <- Fisher-Yates shuffle
array2[k++] = array1[j--];
int edges = d * n * (n - 1) * .5;
if (edges % 2) {
++edges;
}
while (j >= 0) {
int r = rand() % k;
createEdge(array1[j], array2[r]);
array2[k++] = array1[j--];
--edges;
}
free(array1);
free(array2);
while (edges) {
int a = rand() % n;
int b = rand() % n;
if (a == b || checkEdge(a, b)) {
continue;
}
createEdge(a, b);
--edges;
}
}
Now, if I print it out, it's a fine graph. Then I want to find a Hamiltonian cycle. This part works. Then I get to my bane: the Eulerian cycle. What's the problem?
Well, first I check whether all vertices have even degree. And they don't. Always. Every single time, unless I choose to generate a complete graph.
I now feel destroyed by my own code. Is something wrong? Or is it supposed to be like this? I knew that Eulerian circuits would be rare, but not that rare. Please, help.
Let's analyze the probability of having an Eulerian cycle, and for simplicity, let's do it over all graphs with n vertices, regardless of the number of edges.
Given a graph G of size n, choose an arbitrary vertex v. The probability of its degree being even is roughly 1/2 (assuming that for every u1, u2, P((v,u1) exists) = P((v,u2) exists)).
Now remove v from G, creating a new graph G' with n-1 vertices and without any of the edges connected to v.
Similarly, for any vertex v' in G': if (v,v') was an edge in G, we need d(v') to be odd; otherwise, we need d(v') to be even (both degrees measured in G'). Either way, the probability of this is still roughly 1/2 (independent of the degree of v).
....
In the ith round, let #(v) be the number of edges connected to v that were discarded on the way to the current graph. If #(v) is odd, the probability of v's current degree being odd is ~1/2; if #(v) is even, the probability of its current degree being even is also ~1/2. Either way, the probability stays ~1/2.
We can now see how it works and write a recurrence for the probability of the graph having an Eulerian cycle:
$P(n) \approx \tfrac{1}{2} P(n-1), \quad P(1) = 1$
This gives $P(n) \approx 2^{-(n-1)}$, i.e. on the order of $2^{-n}$, which is very unlikely for reasonable n.
Note that 1/2 is just a rough estimate (correct as n → ∞); the probability is in fact a bit higher, but it is still exponential in −n, which makes it very unlikely for reasonably sized graphs.
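The estimate can be checked exhaustively for small n. This sketch (the function name is hypothetical) enumerates every labelled simple graph on n vertices, i.e. all $2^{n(n-1)/2}$ edge subsets, each equally likely as when every edge is present with probability 1/2, and counts those in which every vertex has even degree. Connectivity is not checked, so this counts the even-degree condition only:

```c
/* Count labelled simple graphs on n vertices (1 <= n <= 7) in which every
 * vertex has even degree, by enumerating all 2^(n(n-1)/2) edge subsets. */
unsigned long count_even_graphs(int n) {
    if (n < 1 || n > 7) return 0;
    int m = n * (n - 1) / 2;              /* number of possible edges */
    int eu[21], ev[21];                   /* endpoints of each possible edge */
    int e = 0;
    for (int u = 0; u < n; ++u)
        for (int v = u + 1; v < n; ++v) { eu[e] = u; ev[e] = v; ++e; }
    unsigned long count = 0;
    for (unsigned long mask = 0; mask < (1UL << m); ++mask) {
        int deg[7] = {0};
        for (int k = 0; k < m; ++k)
            if (mask & (1UL << k)) { ++deg[eu[k]]; ++deg[ev[k]]; }
        int all_even = 1;
        for (int u = 0; u < n; ++u)
            if (deg[u] % 2) { all_even = 0; break; }
        count += all_even;
    }
    return count;
}
```

The counts come out as $2^{n(n-1)/2 - (n-1)}$ (2, 8, 64, 1024 for n = 3..6), so the fraction of even-degree graphs is exactly $2^{-(n-1)}$ in this model.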

C Code Wavelet Transform and Explanation

I am trying to implement a wavelet transform in C and I have never done it before. I have read a bit about wavelets, and I understand the 'growing subspaces' idea and how Mallat's one-sided filter bank is essentially the same idea.
However, I am stuck on how to actually implement Mallat's fast wavelet transform. This is what I understand so far:
The high pass filter, h(t), gives you the detail coefficients. For a given scale j, it is a reflected, dilated, and normed version of the mother wavelet W(t).
g(t) is then the low pass filter that makes up the difference. It is supposed to be the quadrature mirror of h(t).
To get the detail coefficients or the approximation coefficients for the jth level, you need to convolve your signal block with h(t) or g(t) respectively, and downsample the signal by 2^j (i.e. keep every 2^j-th value).
However these are my questions:
How can I find g(t) when I know h(t)?
How can I compute the inverse of this transform?
Do you have any C code that I can reference? (Yes, I found the one on Wikipedia but it doesn't help.)
What I would like some code to say is:
A. Here is the filter
B. Here is the transform (very explicitly)
C. Here is the inverse transform (again for dummies)
Thanks for your patience, but there doesn't seem to be a Step 1 - Step 2 - Step 3 guide out there with explicit examples (that aren't Haar, because all the coefficients are 1s and that makes things confusing).
The Mallat recipe for the FWT is really simple. If you look at the MATLAB code, e.g. the script by Jeffrey Kantor, all the steps are obvious.
In C it is a bit more work but that is mainly because you need to take care of your own declarations and allocations.
Firstly, about your summary:
usually the filter h is a lowpass filter, representing the scaling function (father)
likewise, g is usually the highpass filter representing the wavelet (mother)
you cannot perform a J-level decomposition in 1 filtering+downsampling step. At each level, you create an approximation signal c by filtering with h and downsampling, and a detail signal d by filtering with g and downsampling, and repeat this at the next level (using the current c)
About your questions:
for a filter h of an orthogonal wavelet basis, [h_1 h_2 .. h_m h_n], the QMF is [h_n -h_m .. h_2 -h_1], where n is an even number and m == n-1
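In 0-based C indexing the QMF rule above reads g[k] = (-1)^k h[f-1-k]. A minimal sketch (the function name is hypothetical), assuming h has an even number f of coefficients:

```c
#include <stddef.h>

/* Build the highpass (wavelet) filter g from an orthogonal lowpass filter h
 * of even length f: g[k] = (-1)^k * h[f-1-k], i.e. [h_n, -h_m, ..., h_2, -h_1]
 * in the 1-based notation used above. */
void qmf_from_lowpass(const double *h, double *g, size_t f) {
    for (size_t k = 0; k < f; ++k)
        g[k] = (k % 2 == 0 ? 1.0 : -1.0) * h[f - 1 - k];
}
```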
the inverse transform does the opposite of the FWT: at each level it upsamples the detail d and the approximation c, convolves d with g and c with h, and adds the signals together (see the corresponding MATLAB script).
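One level of the forward step and its inverse can be sketched as follows. Note the assumption: this sketch uses periodic (wrap-around) boundary handling instead of the zero padding used in the code in these answers, because with periodic boundaries an orthogonal filter bank reconstructs the signal exactly and the inverse is simply the transpose of the forward step. Function names are hypothetical; len must be even and at least f:

```c
#include <stddef.h>

/* One level of an orthogonal DWT with periodic boundaries.
 * x has even length len; c and d each receive len/2 coefficients. */
void dwt_level_periodic(const double *x, size_t len, const double *h,
                        const double *g, size_t f, double *c, double *d) {
    size_t half = len / 2;
    for (size_t j = 0; j < half; ++j) {
        c[j] = 0.0;
        d[j] = 0.0;
        for (size_t k = 0; k < f; ++k) {
            double v = x[(2 * j + k) % len];   /* wrap around at the end */
            c[j] += h[k] * v;                  /* lowpass + downsample by 2 */
            d[j] += g[k] * v;                  /* highpass + downsample by 2 */
        }
    }
}

/* Inverse of dwt_level_periodic: upsample c and d, filter with h and g,
 * and add the two signals together. */
void idwt_level_periodic(const double *c, const double *d, size_t len,
                         const double *h, const double *g, size_t f, double *x) {
    size_t half = len / 2;
    for (size_t i = 0; i < len; ++i) x[i] = 0.0;
    for (size_t j = 0; j < half; ++j)
        for (size_t k = 0; k < f; ++k)
            x[(2 * j + k) % len] += h[k] * c[j] + g[k] * d[j];
}
```

A multi-level inverse applies idwt_level_periodic from the coarsest level upwards, each time feeding the reconstructed approximation back in as c.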
Using this information, and given a signal x of len points of type double, a scaling filter h and a wavelet filter g of f coefficients each (also of type double), a decomposition level lev, and an output array y of len doubles, this piece of code implements the Mallat FWT:
double *t=calloc(len+f-1, sizeof(double));
memcpy(t, x, len*sizeof(double));
for (int i=0; i<lev; i++) {
memset(y, 0, len*sizeof(double));
int len2=len/2;
for (int j=0; j<len2; j++)
for (int k=0; k<f; k++) {
y[j] +=t[2*j+k]*h[k];
y[j+len2]+=t[2*j+k]*g[k];
}
len=len2;
memcpy(t, y, len*sizeof(double));
}
free(t);
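It uses one extra array: a 'workspace' t to copy the approximation c (the input signal x to start with) for the next iteration. After the loop, y holds the final approximation followed by the detail coefficients, coarsest level first.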
See this example C program, which you can compile with gcc -std=c99 -fpermissive main.cpp and run with ./a.out.
The inverse should also be something along these lines. Good luck!
The only thing that is missing is some padding for the filter operation.
The lines
y[j] +=t[2*j+k]*h[k];
y[j+len2]+=t[2*j+k]*g[k];
exceed the boundaries of the t array during the first iteration and exceed the approximation part of the array during the following iterations. One must add (f-1) elements at the beginning of the t array.
double *t=calloc(len+f-1, sizeof(double));
memcpy(&t[f-1], x, len*sizeof(double));   /* data starts after the f-1 zeros; &t[f] would write past the end */
for (int i=0; i<lev; i++) {
memset(t, 0, (f-1)*sizeof(double));
memset(y, 0, len*sizeof(double));
int len2=len/2;
for (int j=0; j<len2; j++)
for (int k=0; k<f; k++) {
y[j] +=t[2*j+k]*h[k];
y[j+len2]+=t[2*j+k]*g[k];
}
len=len2;
memcpy(&t[f-1], y, len*sizeof(double));
}
free(t);

What sort of indexing method can I use to store the distances between X^2 vectors in an array without redundancy?

I'm working on a demo that requires a lot of vector math, and in profiling, I've found that it spends the most time finding the distances between given vectors.
Right now, it loops through an array of X^2 vectors and finds the distance between every pair, meaning it runs the distance function X^4 times, even though (I think) there are only (X^2)/2 unique distances.
It works something like this: (pseudo c)
#define MATRIX_WIDTH 8
typedef float vec2_t[2];
vec2_t matrix[MATRIX_WIDTH * MATRIX_WIDTH];
...
for(int i = 0; i < MATRIX_WIDTH; i++)
{
for(int j = 0; j < MATRIX_WIDTH; j++)
{
float xd, yd;
float distance;
for(int k = 0; k < MATRIX_WIDTH; k++)
{
for(int l = 0; l < MATRIX_WIDTH; l++)
{
int index_a = (i * MATRIX_WIDTH) + j;
int index_b = (k * MATRIX_WIDTH) + l;
xd = matrix[index_a][0] - matrix[index_b][0];
yd = matrix[index_a][1] - matrix[index_b][1];
distance = sqrtf(powf(xd, 2) + powf(yd, 2));
}
}
// More code that uses the distances between each vector
}
}
What I'd like to do is create and populate an array of (X^2) / 2 distances without redundancy, then reference that array when I finally need it. However, I'm drawing a blank on how to index this array in a way that would work. A hash table would do it, but I think it's much too complicated and slow for a problem that seems like it could be solved by a clever indexing method.
EDIT: This is for a flocking simulation.
Performance ideas:
a) if possible, work with the squared distance to avoid the square root calculation
b) never use pow for constant integer powers; use xd*xd instead
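Both tips together, for the typical flocking question "is this neighbour within my radius?", can be sketched like this (the function name is hypothetical; vec2_t is the question's type):

```c
typedef float vec2_t[2];

/* Returns 1 if points a and b are within `radius` of each other.
 * Compares squared distances, so neither sqrtf nor powf is needed. */
int within_radius(const vec2_t a, const vec2_t b, float radius) {
    float xd = a[0] - b[0];
    float yd = a[1] - b[1];
    return xd * xd + yd * yd <= radius * radius;
}
```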
I would consider changing your algorithm: O(n^4) is really bad. When dealing with interactions in physics (also O(n^4) for distances in a 2D field), one would use spatial trees such as quadtrees and neglect particle interactions with low impact. But it will depend on what "more code that uses the distances..." really does.
Just did some calculations: the number of unique distances is $\tfrac{1}{2}n(n+1)$ with $n = w \cdot h$ (this counts the n zero distances of each vector to itself).
If you write down when unique distances occur, you will see that both inner loops can be shortened by starting them at i and j.
Additionally, if you only need to access those distances via the matrix index, you can set up a 4D distance matrix.
If memory is limited, we can save nearly 50%, as mentioned above, with a lookup function that accesses a triangular matrix, as Code-Guru said. We would probably precalculate the line index to avoid summing up on every access.
float distanceArray[(H*W+1)*H*W/2];
int lineIndices[H*W];   /* lineIndices[r] = r*(r+1)/2, precomputed once */
float searchDistance(int i, int j)
{
return i<j?distanceArray[i+lineIndices[j]]:distanceArray[j+lineIndices[i]];
}
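A self-contained sketch of how the triangle could be filled and queried. GRID_W and GRID_H stand in for the answer's W and H, and buildDistances is a hypothetical helper; following tip a), it stores squared distances, so apply sqrtf to a looked-up value only where the true distance is needed:

```c
#define GRID_W 4
#define GRID_H 4
#define GRID_N (GRID_W * GRID_H)   /* number of vectors, n = w*h */

typedef float vec2_t[2];

/* n(n+1)/2 entries: lower triangle including the zero diagonal */
static float distanceArray[GRID_N * (GRID_N + 1) / 2];
/* lineIndices[r] = r(r+1)/2 = offset of row r inside distanceArray */
static int lineIndices[GRID_N];

/* Fill the triangle once; only pairs (a, b) with b <= a are computed. */
void buildDistances(vec2_t pts[GRID_N]) {
    for (int r = 0; r < GRID_N; ++r) lineIndices[r] = r * (r + 1) / 2;
    for (int a = 0; a < GRID_N; ++a)
        for (int b = 0; b <= a; ++b) {
            float xd = pts[a][0] - pts[b][0];
            float yd = pts[a][1] - pts[b][1];
            distanceArray[lineIndices[a] + b] = xd * xd + yd * yd;  /* squared */
        }
}

/* Symmetric lookup: the smaller index selects the column, the larger the row. */
float searchDistance(int i, int j) {
    return i < j ? distanceArray[i + lineIndices[j]]
                 : distanceArray[j + lineIndices[i]];
}
```

Each unordered pair is computed once, and searchDistance(i, j) == searchDistance(j, i) by construction.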
