In GSL a real n * m matrix M is represented internally as an array of size n*m. To access the (i,j) element of M, GSL internally has to access location (i-1)*m + (j-1) of the array (row-major order), which involves an integer multiplication and additions.
In Numerical Recipes for C, they recommend the alternative method of declaring an array of n pointers, each pointing to an array of m numbers. Then to access the (i,j) element, one puts M[i-1][j-1]. They claim that this is more efficient because it avoids the integer multiplication. The downside is that one has to initialize each pointer separately.
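As an illustration, here is a minimal C sketch (mine, not taken from Numerical Recipes) of the scheme just described: an array of row pointers, each initialized separately, backed by one contiguous block. The helper name alloc_matrix and the parameters nrows/ncols (standing in for n and m) are assumptions for the example.
#include <stdlib.h>

/* Hypothetical helper: build a matrix as an array of row pointers into one
   contiguous block, so element (i,j) is read as M[i-1][j-1] with no explicit
   index multiplication at the call site. */
double **alloc_matrix(size_t nrows, size_t ncols)
{
    double *data = malloc(nrows * ncols * sizeof *data); /* contiguous storage */
    double **rows = malloc(nrows * sizeof *rows);        /* the row pointers   */
    if (!data || !rows) { free(data); free(rows); return NULL; }
    for (size_t i = 0; i < nrows; i++)
        rows[i] = data + i * ncols;                      /* each pointer set separately */
    return rows;
}

/* Usage: double **M = alloc_matrix(n, m); ... M[i-1][j-1] ...
   Free with free(M[0]); free(M); */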
I am wondering, what are the advantages/disadvantages of each approach?
In C:
#define n 2
#define m 3
int M[n*m];
is the same as
int M[n][m];
In C, matrices are stored in row-major order:
http://en.wikipedia.org/wiki/Row-major_order
In C,
M[1][2]
is the same as
*(&M[0][0] + 1*m + 2) // if M is declared as int M[n][m]
You could define M as an array of n pointers, but you still have to put the data somewhere and the best place is probably a 2D array. I would suggest:
int M[n][m];
int* Mrows[n] = {M[0], M[1]};
You can then do a direct offset into Mrows to get to the row you want. Then:
Mrows[1][2]
is the same as
*((*(Mrows + 1)) + 2)
It's more work for the programmer and probably only worth it if you want to go really fast. In that case you may want to look into further optimizations such as specific machine instructions. Also, depending on your algorithm, you may be able to get by with just + operations (for example, if you are iterating over the matrix).
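As a small illustration of that last point (a sketch of my own, not part of the original answer): when you traverse the whole matrix, a single moving pointer replaces the per-element index arithmetic entirely.
#include <stdio.h>

#define n 2
#define m 3

int main(void)
{
    int M[n][m] = { {1, 2, 3}, {4, 5, 6} };
    int sum = 0;

    /* One pointer walks all n*m elements; each access costs only an
       increment, not an i*m + j computation. */
    const int *p = &M[0][0];
    const int *end = p + n * m;
    for (; p != end; p++)
        sum += *p;

    printf("%d\n", sum); /* prints 21 */
    return 0;
}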
I implemented a finite differences algorithm to solve a PDE.
The grid is a structured 2D domain of size [Nx, Nz], solved Nt times.
I pre-allocate the object containing all solutions:
sol = zeros(Nx, Nz, Nt, 'single') ;
This very easily becomes too large and I get an 'out of memory' error.
Unfortunately sparse doesn't work for N-dimensional arrays.
For the sake of the question it's not important to know the values; it goes without saying that the RAM usage grows very quickly as the grid spacing decreases and the simulation time increases.
I am aware that I do not need to store each time instant for the purpose of advancing the solution; it would be sufficient to store just the previous two time steps. However, for post-processing reasons I need access to the solution at all time steps (or at least at a submultiple of the total number). It might help to specify that, even after the solution, the grid remains predominantly populated by zeros.
Am I fighting a lost battle, or is there a more efficient way to proceed (other types of objects, vectorization, ...)?
Thank you.
You could store the array in sparse, linear form; that is, a column vector with length equal to the product of dimensions:
sol = sparse([], [], [], Nx*Nz*Nt, 1); % sparse column vector containing zeros
Then, instead of indexing normally,
sol(x, z, t),
you need to translate the indices x, z, t into the corresponding linear index:
For scalar indices you use
sol(x + Nx*(z-1) + Nx*Nz*(t-1))
You can define a helper function for convenience:
ind = @(sol, x, z, t) sol(x + Nx*(z-1) + Nx*Nz*(t-1))
so the indexing becomes more readable:
ind(sol, x, z, t)
For general (array) indices you need to reshape the indices along different dimensions so that implicit expansion produces the appropriate linear index:
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1))
which of course could also be encapsulated into a function.
Check that the conversion to linear indexing works (general case, using a non-sparse array to compare with normal indexing):
Nx = 15; Nz = 18; Nt = 11;
sol = randi(9, Nx, Nz, Nt);
x = [5 6; 7 8]; z = 7; t = [4 9 1];
isequal(sol(x, z, t), ...
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1)))
gives
ans =
logical
1
You can create a cell array of sparse matrices to store the results. However, computations can be performed on full matrices if working with a full matrix is faster than working with a sparse one; then convert each full matrix to sparse and place it in the cell array.
I have the same problem described and solved in this thread, where Julia loops were considerably slowed down by repeated memory access through a suboptimal variable type. In my case, however, I have tensors (with more than two dimensions) stored as a multidimensional Array, which need to be summed over and over in a for loop, something like:
function randomTensor(M, N)
    T = fill(zeros(M, M, M, M), N)
    for i in 1:N
        Ti = rand(M, M, M, M)
        T[i] += Ti
    end
    return T
end
I'm new to Julia (in fact I'm using it just for a specific project that needs some Julia modules), so I have only a general understanding of variable types and so forth, and so far I have only been able to define my tensors as general Arrays. I would like a solution like the one proposed in the aforementioned link, where they use Matrix{Float64}[] to define variables, but apparently that works only for 1- or 2-dimensional arrays. I cannot use Tuples either, as I need to sum over the values. Any tip for a different implementation?
Thank you all in advance.
Edit: the sample code I posted is just an example with the same structure as my code; I'm not interested in generating and summing random numbers : )
In other words, my question is: is there a better variable type than Array{Float, N} for doing operations (e.g. tensor times scalar...) or one that has "better access" to memory?
It's quite unclear what the question is. Your function is equivalent to:
randomTensor(M,N) = [rand(M, M, M, M) for i in 1:N]
Are you saying that this is too slow? The majority of the time should be in creating the random numbers and allocating the memory for the output.
Depending on the size of M, you might notice some advantage from using .+= instead of +=. The reason is that x += y is exactly equivalent to x = x + y, including allocating a new array to hold the result. In contrast, x .+= y is in-place. You can see that for yourself in the following way:
julia> a = [1,2,3]
3-element Vector{Int64}:
1
2
3
julia> pointer(a)
Ptr{Int64} @0x00007fa007d2c980
julia> a += a
3-element Vector{Int64}:
2
4
6
julia> pointer(a)
Ptr{Int64} @0x00007fa02a06eac0 # different pointer means new memory was allocated
julia> a .+= a
3-element Vector{Int64}:
4
8
12
julia> pointer(a)
Ptr{Int64} @0x00007fa02a06eac0 # same pointer means memory re-use
Most implementations of binomial coefficient computation using dynamic programming make use of 2-dimensional arrays, as in these examples:
http://www.csl.mtu.edu/cs4321/www/Lectures/Lecture%2015%20-%20Dynamic%20Programming%20Binomial%20Coefficients.htm
http://www.geeksforgeeks.org/dynamic-programming-set-9-binomial-coefficient/
My question is, why not just compute it using a single dimensional array like this:
def C(n, r):
    memo = list()
    if r > int(n/2):
        r = n - r
    memo.append(1.0)
    for i in range(1, r+1):
        now = ((n-i+1) * memo[i-1]) / i
        memo.append(now)
    return memo[r]
Basically using the recursive formula:
C(n,r) = ((n-r+1)/r) * C(n,r-1)
This has O(r) complexity, while the 2-dimensional logic has O(nr) complexity.
Am I missing something here?
If you want all of the values, then the 2D logic is certainly more efficient. The 2D logic may be more efficient for some parameters on some hardware that, e.g., lacks hardware multiply and divide. You have to be careful about integer overflow when multiplying before dividing, whereas the integer addition in the 2D recurrence is always fine. Other than that, no, the 1D recurrence is better.
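To make the overflow caveat concrete, here is a minimal C sketch of the 1-D multiplicative scheme (the helper name binom is mine; it assumes 0 <= r <= n). The division is exact at every step because C(n,i-1)*(n-i+1) equals C(n,i)*i, but the intermediate product is exactly where overflow can bite.
#include <stdint.h>
#include <stdio.h>

/* C(n,r) via the recurrence C(n,i) = C(n,i-1) * (n-i+1) / i.
   The division is exact, but the product c * (n - i + 1) can overflow
   a 64-bit integer for large n even when the final result fits. */
uint64_t binom(unsigned n, unsigned r)
{
    if (r > n - r)
        r = n - r;                   /* symmetry: C(n,r) = C(n,n-r) */
    uint64_t c = 1;
    for (unsigned i = 1; i <= r; i++)
        c = c * (n - i + 1) / i;     /* multiply first, then divide exactly */
    return c;
}

int main(void)
{
    printf("%llu\n", (unsigned long long)binom(10, 3)); /* prints 120 */
    return 0;
}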
In Matlab I have an array v of length m, a matrix M of order n, and a function F that takes a single matrix as input and outputs a number. Starting from v I would like to apply the function to the whole array of matrices whose i-th element is the matrix M_i obtained by multiplying all the entries of M by v_i. The output would itself be an array of length n.
As far as I can see there are two ways of achieving this:
Looping over all i=1:n, computing F on each M_i and storing all the corresponding values in an array
Defining a 3-array structure that contains all the matrices M_i and correspondingly extending the function F to act on 3-arrays instead of matrices. However, this entails overloading some matrix operators and functions (transpose, exponential, logarithm, square root, inverse, etc.) to formally handle a 3-array.
I have done the simpler option 1. It takes a long time to execute. Option 2 promises to be faster. However, I am not sure whether this is the case, and I am not familiar with overloading operators in Matlab. In particular: how do I extend a matrix operator to a 3-array so that it performs the related function on all of its entries?
A for loop is probably no slower than vectorising this, especially for larger problems where memory starts to limit speed. Nevertheless, here are two ways of doing it:
M=rand(3,3,5) % I'm using a 3x3x5 matrix
v=1:5
F=@sum % This is the function
M2=bsxfun(@times,M,permute(v.',[2 3 1])) % Multiply the M(:,:,i) matrix by v(i)
R=arrayfun(@(t) F(M2(:,:,t)),(1:size(M,3)).','UniformOutput',false) % applies the function F to the resulting matrices
cell2mat(R) % Convert from cell array to matrix, since my F function returns row vectors
R2=zeros(size(M,3),size(M,1)); % Initialise R2
for t=1:size(M,3)
R2(t,:)=F(M(:,:,t)*v(t)); % Apply F to M(:,:,t)*v(t)
end
R2
You should do some testing to see which will be more efficient for your actual problem. The vectorised version should be faster for small problems, but use more memory, whereas the for loop will be slower for small problems but use less memory, and so could be faster on larger problems.
I came across this post, which reports the following interview question:
Given two arrays of numbers, find if each of the two arrays have the
same set of integers ? Suggest an algo which can run faster than NlogN
without extra space?
The best that I can think of is the following:
(a) sort each array, and then (b) move two pointers along the two arrays and check whether you find different values ... but step (a) already has NlogN complexity :(
(a) scan the shortest array and put its values into a map, and then (b) scan the second array and check whether you find a value that is not in the map ... here we have linear complexity, but I use extra space
... so, I can't think of a solution for this question.
Ideas?
Thank you for all the answers. I feel many of them are right, but I decided to choose ruslik's one, because it gives an interesting option that I did not think about.
You can try a probabilistic approach by choosing a commutative function for accumulation (e.g., addition or XOR) and a parametrized hash function.
unsigned addition(unsigned a, unsigned b);
unsigned hash(int n, int h_type);
unsigned hash_set(int* a, int num, int h_type){
    unsigned rez = 0;
    for (int i = 0; i < num; i++)
        rez = addition(rez, hash(a[i], h_type));
    return rez;
}
In this way the number of tries needed to push the false-positive probability below a certain threshold does not depend on the number of elements, so the whole check stays linear.
EDIT: In the general case the probability of the sets being the same is very small, so this O(n) check with several hash functions can be used for prefiltering: to decide as fast as possible whether the sets are surely different or whether there is a chance of them being equivalent, in which case a slow deterministic method should be used. The final average complexity will be O(n), but the worst-case scenario will have the complexity of the deterministic method.
You said "without extra space" in the question but I assume that you actually mean "with O(1) extra space".
Suppose that all the integers in the arrays are less than k. Then you can use in-place radix sort to sort each array in time O(n log k) with O(log k) extra space (for the stack, as pointed out by yi_H in comments), and compare the sorted arrays in time O(n log k). If k does not vary with n, then you're done.
I'll assume that the integers in question are of fixed size (eg. 32 bit).
Then, radix-quicksorting both arrays in place (aka "binary quicksort") is constant space and O(n).
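A minimal sketch of that idea for 32-bit unsigned keys (the function name and interface are mine, not from the answer): partition on the current bit and recurse with the next lower bit; the recursion depth is bounded by the word width, so the extra stack space does not grow with n.
#include <stdint.h>
#include <stddef.h>

/* In-place MSD binary radix sort ("binary quicksort"): put keys with a 0
   in the current bit before keys with a 1, then recurse on both halves
   with the next lower bit. */
static void binary_radix_sort(uint32_t *a, size_t lo, size_t hi, int bit)
{
    if (bit < 0 || hi - lo < 2)
        return;
    size_t i = lo, j = hi;                 /* sort the half-open range [lo, hi) */
    while (i < j) {
        if ((a[i] >> bit) & 1u) {          /* 1-bit: move to the upper part */
            uint32_t t = a[i]; a[i] = a[--j]; a[j] = t;
        } else {
            i++;
        }
    }
    binary_radix_sort(a, lo, i, bit - 1);  /* keys with a 0 in this bit */
    binary_radix_sort(a, i, hi, bit - 1);  /* keys with a 1 in this bit */
}

/* Usage: binary_radix_sort(arr, 0, len, 31); sort both arrays this way,
   then compare them element by element. */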
In the case of unbounded integers, I believe (but cannot prove, even though it is probably doable) that you cannot break the O(n k) barrier, where k is the number of digits of the greatest integer in either array.
Whether this is better than O(n log n) depends on how k is assumed to scale with n, and therefore depends on what the interviewer expects of you.
A special (and no harder) case is when one array holds 1, 2, ..., n. This was discussed many times:
How to tell if an array is a permutation in O(n)?
Algorithm to determine if array contains n...n+m?
mathoverflow
and despite many tries no deterministic solutions using O(1) space and O(n) time were shown. Either you can cheat the requirements in some way (reuse the input space, assume the integers are bounded) or use a probabilistic test.
Probably this is an open problem.
Here is a co-RP algorithm:
In linear time, iterate over the first array (A), building the polynomial
Pa = (A[0] - x)(A[1] - x)...(A[n-1] - x). Do the same for array B, naming this polynomial Pb.
We now want to answer the question "is Pa = Pb?" We can check this probabilistically as follows. Select a number r uniformly at random from the range [0...4n] and compute d = Pa(r) - Pb(r) in linear time. If d = 0, return true; otherwise return false.
Why is this valid? First of all, observe that if the two arrays contain the same elements, then Pa = Pb, so Pa(r) = Pb(r) for all r. With this in mind, we can easily see that this algorithm will never erroneously reject two identical arrays.
Now we must consider the case where the arrays are not identical. By the Schwartz-Zippel Lemma, P(Pa(r) - Pb(r) = 0 | Pa != Pb) < (n/4n). So the probability that we accept the two arrays as equivalent when they are not is < (1/4).
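A minimal sketch of the test (my own illustration; the specific prime and the modular evaluation are assumptions on my part, not stated in the answer, but reducing modulo a prime keeps the products inside 64-bit registers, in the spirit of the finite-field remark in the next answer):
#include <stdint.h>
#include <stdlib.h>

#define P 4294967291ULL   /* a prime just below 2^32, chosen for illustration */

/* Reduce a possibly negative value modulo P. */
static uint64_t mod_p(int64_t x)
{
    int64_t m = x % (int64_t)P;
    return (uint64_t)(m < 0 ? m + (int64_t)P : m);
}

/* Evaluate (a[0] - r)(a[1] - r)...(a[n-1] - r) modulo P. */
static uint64_t poly_eval(const int *a, size_t n, uint64_t r)
{
    uint64_t prod = 1;
    for (size_t i = 0; i < n; i++) {
        uint64_t term = (mod_p(a[i]) + P - r) % P;  /* (a[i] - r) mod P */
        prod = (prod * term) % P;                   /* P < 2^32, so this fits in 64 bits */
    }
    return prod;
}

/* Returns 0 only when the multisets surely differ; 1 means "probably equal",
   and the test can be repeated with fresh random points to shrink the error. */
int probably_same_multiset(const int *a, const int *b, size_t n)
{
    uint64_t r = ((uint64_t)rand() << 31 | (uint64_t)rand()) % P;  /* random evaluation point */
    return poly_eval(a, n, r) == poly_eval(b, n, r);
}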
The usual assumption for these kinds of problems is Theta(log n)-bit words, because that's the minimum needed to index the input.
sshannin's polynomial-evaluation answer works fine over finite fields, which sidesteps the difficulties with limited-precision registers. All we need is a prime of the appropriate size (easy to find under the same assumptions that support a lot of public-key crypto) or an irreducible polynomial in (Z/2)[x] of the appropriate degree (the difficulty here is multiplying polynomials quickly, but I think the algorithm would be o(n log n)).
If we can modify the input with the restriction that it must maintain the same set, then it's not too hard to find space for radix sort. Select the (n/log n)th element from each array and partition both arrays. Sort the size-(n/log n) pieces and compare them. Now use radix sort on the size-(n - n/log n) pieces. From the previously processed elements, we can obtain n/log n bits, where bit i is on if a[2*i] > a[2*i + 1] and off if a[2*i] < a[2*i + 1]. This is sufficient to support a radix sort with n/(log n)^2 buckets.
In the algebraic decision tree model, there are known Omega(NlogN) lower bounds for computing set intersection (irrespective of the space limits).
For instance, see here: http://compgeom.cs.uiuc.edu/~jeffe/teaching/497/06-algebraic-tree.pdf
So unless you do clever bit manipulations/hashing type approaches, you cannot do better than NlogN.
For instance, if you used only comparisons, you cannot do better than NlogN.
You can break the O(n*log(n)) barrier if you have some restrictions on the range of numbers. But it's not possible to do this if you cannot use any extra memory (you need really silly restrictions to be able to do that).
I would also like to note that even O(n log(n)) with sorting is not trivial under an O(1) space limit, as merge sort uses O(n) space and quicksort (which is not even strictly O(n log(n))) needs O(log(n)) space for the stack. You have to use heapsort or smoothsort.
Some companies like to ask questions which cannot be solved and I think it is a good practice, as a programmer you have to know both what's possible and how to code it and also know what are the limits so you don't waste your time on something that's not doable.
Check this question for a couple of good techniques to use:
Algorithm to tell if two arrays have identical members
For each integer i, check, by iterating over the arrays, that the number of occurrences of i is either zero in both arrays or nonzero in both.
Since the number of possible integer values is constant, the total runtime is O(n).
No, I wouldn't do this in practice.
I was just thinking whether there is a way to hash an accumulation of each array and compare the two results, assuming the hashing function doesn't produce collisions for two differing patterns.
Why not find the sum, product, and XOR of all the elements of one array and compare them with the corresponding values for the other array?
The XOR of the elements of both arrays may give zero if they are like
2,2,3,3
1,1,2,2
but what if you compare the XORs of the two arrays for equality?
Consider this:
10,3
12,5
Here the XOR of both arrays is the same: (10^3) = (12^5) = 9.
But their sums and products are different. I think two different sets of elements cannot have the same sum, product, and XOR!
This can be analysed by simple bit-value examination.
Is there anything wrong with this approach?
I'm not sure that I correctly understood the problem, but if you are interested in the integers that are in both arrays:
If N is much greater than 2^SizeOf(int) (the number of bits in an integer: 16, 32, 64), there is one solution:
a = Array(N); //length(a) = N;
b = Array(M); //length(b) = M;
//x86-64: integers are 64 bits wide.
//Handle the first 2^64 / 64 elements of a by brute force, since they will be
//overwritten by the bit set below.
for i := 0 to 2^64 / 64 - 1 do //very big, but CONST
    for k := 0 to M - 1 do
        if a[i] = b[k] then doSomething; //detected
//Reuse the front of a as a bit set and mark the values of the remaining elements.
for i := 2^64 / 64 to N - 1 do
    if not isSetBit(a[a[i] div 64], a[i] mod 64) then
        setBit(a[a[i] div 64], a[i] mod 64);
//Check every element of b against the bit set.
for i := 0 to M - 1 do
    if isSetBit(a[b[i] div 64], b[i] mod 64) then doSomething; //detected
O(N), without additional structures.
All I know is that comparison-based sorting cannot possibly be faster than O(NlogN), so we can eliminate most of the "common" comparison-based sorts. I was thinking of doing a bucket sort. Perhaps if this question was asked in an interview, the best response would first be to clarify what sort of data those integers represent. For example, if they represent a person's age, then we know that the range of values is limited, and we could use bucket sort at O(n). However, this will not be in place....
If the arrays have the same size, and there are guaranteed to be no duplicates, sum each of the arrays. If the sum of the values is different, then they contain different integers.
Edit: You can then sum the logs of the entries in the arrays. If that is also the same, then you have the same entries in the arrays.