I implemented a finite differences algorithm to solve a PDE.
The grid is a structured 2D domain of size [Nx, Nz], solved Nt times.
I pre-allocate the object containing all solutions:
sol = zeros(Nx, Nz, Nt, 'single') ;
This very easily becomes too large and I get an 'out of memory' error.
Unfortunately sparse doesn't work for N-dimensional arrays.
For the sake of the question the exact values are not important; it goes without saying that RAM usage grows rapidly as the grid spacing decreases and the simulation time increases.
I am aware that I do not need to store each time instant for the purpose of the advancement of the solution. It would be sufficient to just store the previous two time steps. However, for post-processing reasons I need to access the solution at all time-steps (or at least at a submultiple of the total number). It might help to specify that, even after the solution, the grid remains predominantly populated by zeros.
Am I fighting a lost battle or is there a more efficient way to proceed (other type of objects, vectorization...)?
Thank you.
You could store the array in sparse, linear form; that is, a column vector with length equal to the product of dimensions:
sol = sparse([], [], [], Nx*Nz*Nt, 1); % sparse column vector containing zeros
Then, instead of indexing normally,
sol(x, z, t),
you need to translate the indices x, z, t into the corresponding linear index:
For scalar indices you use
sol(x + Nx*(z-1) + Nx*Nz*(t-1))
You can define a helper function for convenience:
ind = @(sol, x, z, t) sol(x + Nx*(z-1) + Nx*Nz*(t-1))
so the indexing becomes more readable:
ind(sol, x, z, t)
For general (array) indices you need to reshape the indices along different dimensions so that implicit expansion produces the appropriate linear index:
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1))
which of course could also be encapsulated into a function.
Check that the conversion to linear indexing works (general case, using non-sparse array to compare with normal indexing):
Nx = 15; Nz = 18; Nt = 11;
sol = randi(9, Nx, Nz, Nt);
x = [5 6; 7 8]; z = 7; t = [4 9 1];
isequal(sol(x, z, t), ...
sol(reshape(x,[],1,1) + Nx*(reshape(z,1,[],1)-1) + Nx*Nz*(reshape(t,1,1,[])-1)))
gives
ans =
logical
1
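As a rough illustration of the same idea outside MATLAB, here is a minimal Python sketch that stores only the nonzero entries in a dictionary keyed by the 1-based, column-major linear index used above. The names linear_index and SparseSolution are hypothetical helpers introduced for this example, not part of any library.

```python
def linear_index(x, z, t, nx, nz):
    """Map 1-based (x, z, t) subscripts to a 1-based column-major linear index."""
    return x + nx * (z - 1) + nx * nz * (t - 1)


class SparseSolution:
    """Hypothetical container that keeps only nonzero entries, keyed by linear index."""

    def __init__(self, nx, nz, nt):
        self.nx, self.nz, self.nt = nx, nz, nt
        self.data = {}

    def set(self, x, z, t, value):
        key = linear_index(x, z, t, self.nx, self.nz)
        if value != 0:
            self.data[key] = value
        else:
            # Writing a zero removes the entry so storage stays sparse.
            self.data.pop(key, None)

    def get(self, x, z, t):
        # Entries that were never stored read back as zero.
        return self.data.get(linear_index(x, z, t, self.nx, self.nz), 0.0)


sol = SparseSolution(15, 18, 11)
sol.set(5, 7, 4, 2.5)
```

Memory use then scales with the number of nonzero entries rather than with Nx*Nz*Nt.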
You can create a cell array of sparse matrices to store the results. If working with a full matrix is faster than working with a sparse one, perform the computations on a full matrix, then convert it to a sparse matrix and place it in the cell.
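The pattern of computing on a dense frame and archiving a sparse copy per time step might look like the following Python sketch. The helpers to_sparse and to_dense and the toy update inside the loop are assumptions made for illustration; the real finite-difference update would replace the marked line.

```python
def to_sparse(frame):
    """Keep only the nonzero entries of a dense 2D frame as {(ix, iz): value}."""
    return {(ix, iz): v
            for ix, row in enumerate(frame)
            for iz, v in enumerate(row)
            if v != 0}


def to_dense(sparse_frame, nx, nz):
    """Rebuild a dense nx-by-nz frame from its sparse form for post-processing."""
    frame = [[0.0] * nz for _ in range(nx)]
    for (ix, iz), v in sparse_frame.items():
        frame[ix][iz] = v
    return frame


nx, nz, nt = 4, 5, 3
history = []                                  # one sparse frame per time step
frame = [[0.0] * nz for _ in range(nx)]       # dense working frame
for step in range(nt):
    frame[step][step] = 1.0 + step            # stand-in for the real update
    history.append(to_sparse(frame))
```

Only the working frame is ever held in dense form; any archived step can be expanded again with to_dense when needed.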
Suppose I have n sorted integer arrays (a_1, ..., a_n, there may be duplicated elements in a single array), and T is a threshold value between 0 and 1. I would like to find all pairs of arrays the similarity of which is larger than T. The similarity of array a_j w.r.t. array a_i is defined as follows:
sim(i, j) = intersection(i, j) / length(i)
where intersection(i, j) returns the number of elements shared in a_i and a_j, and length(i) returns the length of array a_i.
I can enumerate all pairs of arrays and compute the similarity value, but this takes too much time for a large n (say n=10^5). Is there any data structure, pruning strategy, or other techniques that can reduce the time cost of this procedure? I'm using Java so the technique should be easily applicable in Java.
There are (n^2 - n)/2 pairs of arrays. If n=10^5, then you have to compute the similarity of 5 billion pairs of arrays. That's going to take some time.
One potential optimization is to shortcut your evaluation of two arrays if it becomes clear that you won't reach T. For example, if T is 0.5 and you've examined more than half of a_i without finding any intersections, then it's clear that that pair of arrays won't meet the threshold. I don't expect this optimization to gain you much.
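A minimal Python sketch of that shortcut, assuming the shared elements are counted with a merge-style walk over the two sorted arrays (so duplicates are matched one-to-one, which is one possible reading of intersection(i, j)). The function name similar_enough is hypothetical; the same loop translates directly to Java.

```python
def similar_enough(a_i, a_j, threshold):
    """Return True if shared(a_i, a_j) / len(a_i) > threshold, for sorted lists."""
    needed = threshold * len(a_i)
    p = q = shared = 0
    while p < len(a_i) and q < len(a_j):
        # Best case from here: every remaining element of the shorter tail matches.
        if shared + min(len(a_i) - p, len(a_j) - q) <= needed:
            return False          # the threshold can no longer be exceeded
        if a_i[p] == a_j[q]:
            shared += 1
            p += 1
            q += 1
        elif a_i[p] < a_j[q]:
            p += 1
        else:
            q += 1
    return shared > needed
```

The early return only saves work on pairs that fail, so the total number of pairs examined is unchanged.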
It might be possible to make some inferences based on prior results. That is, if sim(1,2) = X and sim(1,3) < T, there's probably a value of X (likely would have to be very high) at which you can say definitively that sim(2,3) < T.
Suppose I have a vector v = {10,9,8} and a vector y = {10,5,7}. How can I write an expression that results in the vector x = {1,0,0}? In other words, set ones where elements match, and zeroes where they do not. How would one write this in a mathematical way, or by using functional language terms like filter, map or such?
Although the question might be considered off-topic, the Kronecker delta comes to mind. If n is a nonnegative integer and v, y are in R^n, one can define the desired vector as x := {x_1, ..., x_n}, where x_i = delta_{v_i, y_i} (that is, 1 if v_i = y_i and 0 otherwise) for each i in {1, ..., n}.
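In functional terms this is a map over the zipped pairs. A small Python sketch, where kronecker_delta is a helper name chosen for this example:

```python
def kronecker_delta(a, b):
    """Return 1 when the two arguments are equal, 0 otherwise."""
    return 1 if a == b else 0


v = [10, 9, 8]
y = [10, 5, 7]
# Pair up corresponding elements and apply the delta to each pair.
x = [kronecker_delta(a, b) for a, b in zip(v, y)]
```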
In Matlab I have an array v of length m, a matrix M of order n, and a function F that takes a single matrix as input and outputs a number. Starting from v, I would like to apply the function to the whole array of matrices whose i-th element is a matrix M_i obtained by multiplying all the entries of M by v_i. The output would itself be an array of length m.
As far as I can see there are two ways of achieving this:
Looping over all i=1:m, computing F on each M_i and storing the corresponding values in an array
Defining a 3-array structure that contains all the matrices M_i and correspondingly extending the function F as to act on 3-arrays instead of matrices. However this entails overloading some matrix operators and functions (transpose, exponential, logarithm, square root, inverse etc...) as to formally handle a 3-array.
I have done the simpler option 1. It takes a long time to execute. Option 2 promises to be faster. However, I am not sure if this is the case, and I am not familiar with overloading operators in Matlab. In particular: how do I extend a matrix operator to a 3-array in such a way that it performs the related function on all of its entries?
A for loop is probably no slower than vectorising this, especially for larger problems where memory starts to limit speed. Nevertheless, here are two ways of doing it:
M=rand(3,3,5) % I'm using a 3x3x5 matrix
v=1:5
F=@sum % This is the function
M2=bsxfun(@times,M,permute(v.',[2 3 1])) % Multiply the M(:,:,i) matrix by v(i)
R=arrayfun(@(t) F(M2(:,:,t)),(1:size(M,3)).','UniformOutput',false) % applies the function F to the resulting matrices
cell2mat(R) % Convert from cell array to matrix, since my F function returns row vectors
R2=zeros(size(M,3),size(M,1)); % Initialise R2
for t=1:size(M,3)
R2(t,:)=F(M(:,:,t)*v(t)); % Apply F to M(:,:,i)*v(i)
end
R2
You should do some testing to see which will be more efficient for your actual problem. The vectorised version should be faster for small problems, but use more memory, whereas the for loop will be slower for small problems but use less memory, and so could be faster on larger problems.
In GSL a real n * m matrix M is represented internally as an array of size n*m. To access the (i,j) element of M, internally GSL has to access the (i-1) * m + j - 1 location of the array, which involves integer multiplications and additions.
In Numerical Recipes for C, they recommend the alternative method of declaring an array of n pointers, each pointing to an array of m numbers. Then to access the (i,j) element, one puts M[i-1][j-1]. They claim that this is more efficient because it avoids the integer multiplication. The downside is that one has to initialize each pointer separately.
I am wondering, what are the advantages/disadvantages of each approach?
In C:
#define n 2
#define m 3
int M[n*m];
has the same memory layout as
int M[n][m];
In C, matrices are stored in row-major order:
http://en.wikipedia.org/wiki/Row-major_order
In C,
M[1][2]
is the same as
*(&M[0][0] + 1*m + 2) // if M is defined as M[n][m]
You could define M as an array of n pointers, but you still have to put the data somewhere and the best place is probably a 2D array. I would suggest:
int M[n][m];
int* Mrows[n] = {M[0], M[1]};
You can then do a direct offset into rows to get to the row you want. Then:
Mrows[1][2]
is the same as
*((*(Mrows + 1)) + 2)
It's more work for the programmer and probably only worth it if you want to go really fast. In that case you may want to look into more optimizations such as specific machine instructions. Also, depending on your algorithm, you may be able to just use + operations (for example, if you are iterating over the matrix).
I am looking for a fast algorithm:
I have an int array of size n. The goal is to find all triples x1, x2, x3 of different elements in the array such that x1+x2 = x3.
For example, if the int array of size 3 is [1, 2, 3], then there's only one possibility: 1+2 = 3 (1+2 and 2+1 count as the same).
I am thinking about implementing Pairs and Hashmaps to make the algorithm fast. (the fastest one I got now is still O(n^2))
Please share your idea for this problem, thank you
Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you're not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).
This is basically the 3SUM problem. When the elements can be unboundedly large, the best known algorithms are approximately O(n^2), but it has only been proved that the problem can't be solved faster than O(n lg n) for most models of computation.
If the integer elements lie in the range [u, v], you can do a slightly different version of this in O(n + (v-u) lg (v-u)) with an FFT. I'm going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.
The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence a, b, c with c - b = b - a, or equivalently, a + c = 2b.
Unfortunately, the last step of the transformation back isn't as fast as I'd like, but I'll talk about that when we get there.
Let's call your original array X, which contains integers x_1, ..., x_n. We want to find indices i, j, k such that x_i + x_j = x_k.
Find the minimum u and maximum v of X in O(n) time. Let u' be min(u, u*2) and v' be max(v, v*2).
Construct a binary array (bitstring) Z of length v' - u' + 1; Z[i] will be true if either X or its double [x_1*2, ..., x_n*2] contains u' + i. This is O(n) to initialize; just walk over each element of X and set the two corresponding elements of Z.
As we're building this array, we can save the indices of any duplicates we find into an auxiliary list Y. Once Z is complete, we just check for 2 * x_i for each x_i in Y. If any are present, we're done; otherwise the duplicates are irrelevant, and we can forget about Y. (The only situation slightly more complicated is if 0 is repeated; then we need three distinct copies of it to get a solution.)
Now, a solution to your problem, i.e. x_i + x_j = x_k, will appear in Z as three evenly-spaced ones, since some simple algebraic manipulations give us 2*x_j - x_k = x_k - 2*x_i. Note that the elements on the ends are our special doubled entries (from 2X) and the one in the middle is a regular entry (from X).
Consider Z as a representation of a polynomial p, where the coefficient for the term of degree i is Z[i]. If X is [1, 2, 3, 5], then Z is 1111110001 (because we have 1, 2, 3, 4, 5, 6, and 10); p is then 1 + x + x^2 + x^3 + x^4 + x^5 + x^9.
Now, remember from high school algebra that the coefficient of x^c in the product of two polynomials is the sum over all a, b with a + b = c of the first polynomial's coefficient for x^a times the second's coefficient for x^b. So, if we consider q = p^2, the coefficient of x^(2j) (for a j with Z[j] = 1) will be the sum over all i of Z[i] * Z[2*j - i]. But since Z is binary, that's exactly the number of triplets i,j,k which are evenly-spaced ones in Z. Note that (j, j, j) is always such a triplet, so we only care about ones with values > 1.
We can then use a Fast Fourier Transform to find p^2 in O(|Z| log |Z|) time, where |Z| is v' - u' + 1. We get out another array of coefficients; call it W.
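To make the construction concrete, here is a small Python sketch that builds Z for the example X = [1, 2, 3, 5] and squares the polynomial. It uses a naive quadratic self-convolution purely for readability; an FFT-based multiplication would produce the same W in O(|Z| log |Z|). The function names build_z and square_poly are made up for this illustration.

```python
def build_z(xs):
    """Return (Z, u') where Z[i] is 1 if u' + i occurs in X or in 2*X."""
    u, v = min(xs), max(xs)
    lo, hi = min(u, 2 * u), max(v, 2 * v)
    z = [0] * (hi - lo + 1)
    for x in xs:
        z[x - lo] = 1          # regular entry from X
        z[2 * x - lo] = 1      # doubled entry from 2*X
    return z, lo


def square_poly(coeffs):
    """Naive self-convolution: W[c] = sum of coeffs[a] * coeffs[b] over a + b = c."""
    w = [0] * (2 * len(coeffs) - 1)
    for a, ca in enumerate(coeffs):
        if ca:
            for b, cb in enumerate(coeffs):
                w[a + b] += ca * cb
    return w


Z, offset = build_z([1, 2, 3, 5])
W = square_poly(Z)
# W[2 * (x_k - offset)] counts the evenly spaced triples of ones centred on x_k.
```

For x_k = 3 the centre index is 2, and W[4] is greater than 1, which flags 3 as a candidate (here it corresponds to 1 + 2 = 3).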
Loop over each x_k in X. (Recall that our desired evenly-spaced ones are all centered on an element of X, not 2*X.) If the corresponding W for twice this element, i.e. W[2*(x_k - u')], is 1, we know it's not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)
Otherwise, it might be the center of a progression that we want (so we need to find i and j). But, unfortunately, it might also be the center of a progression that doesn't have our desired form. So we need to check. Loop over the other elements x_i of X, and check if there's a triple with 2*x_i, x_k, 2*x_j for some j (by checking Z[2*(x_k - x_i) - u']). If so, we have an answer; if we make it through all of X without a hit, then the FFT found only spurious answers, and we have to check another element of W.
This last step is therefore O(n * (1 + number of x_k with W[2*(x_k - u')] > 1 that aren't actually solutions)), which may be O(n^2), which is obviously not okay. There should be a way to avoid generating these spurious answers in the output W; if we knew that any appropriate W coefficient definitely had an answer, this last step would be O(n) and all would be well.
I think it's possible to use a somewhat different polynomial to do this, but I haven't gotten it to actually work. I'll think about it some more....
Partially based on this answer.
It has to be at least O(n^2), as there are n(n-1)/2 different sums possible to check against the other members. You have to compute all of those, because any pair summed may be any other member (start with one example and permute all the elements to convince yourself that all must be checked). Or look at the Fibonacci sequence for something concrete.
So calculating that and looking up members in a hash table gives amortised O(n^2). Or use an ordered tree if you need best worst-case.
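A minimal Python sketch of the hash-based approach, assuming "different elements" means different positions in the array and that x1+x2 and x2+x1 count once. The function name sum_triples is hypothetical.

```python
from collections import defaultdict


def sum_triples(values):
    """Return index triples (i, j, k) with i < j, k not in {i, j},
    and values[i] + values[j] == values[k]."""
    # Map each value to the list of positions where it occurs.
    positions = defaultdict(list)
    for idx, val in enumerate(values):
        positions[val].append(idx)

    triples = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            # Expected O(1) lookup of the sum; .get avoids inserting new keys.
            for k in positions.get(values[i] + values[j], ()):
                if k != i and k != j:
                    triples.append((i, j, k))
    return triples
```

For example, sum_triples([1, 2, 3]) returns [(0, 1, 2)]. The running time is O(n^2) plus the number of triples reported, which can be larger when values repeat.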
You essentially need to find all the different sums of value pairs, so I don't think you're going to do any better than O(n^2). But you can optimize by sorting the list and reducing duplicate values, then only pairing a value with anything equal or greater, and stopping when the sum exceeds the maximum value in the list.