efficient way to remember array index of two large arrays

I have two Fortran arrays, one 2-D and one 3-D, say a(nx,ny) and b(nx,ny,nz). In array a I need to find the points satisfying some condition, say values > 0. Then I need to locate the vectors in array b that have the same x and y indices as those satisfied points in a. What is the easiest and fastest way to do this? The two arrays are big, and I don't want to search element by element. I hope I have explained my problem clearly. Thanks!

I'm not sure that this is the best method, but here's what I would do:
Put a where clause inside a do loop over the z-values. You can first store a 2-D map of the valid indices in a logical array so that you don't recalculate the condition on every iteration:
program indices
  implicit none

  integer, parameter :: nx = 3000, ny = 400, nz = 500
  integer, dimension(nx, ny) :: a
  integer, dimension(nx, ny, nz) :: b
  logical, dimension(nx, ny) :: valid_points
  integer :: x, y, z

  ! fill a with some example data
  do y = 1, ny
    do x = 1, nx
      a(x, y) = x - y
    end do
  end do

  ! compute the 2-D mask once
  valid_points = (a > 0)

  ! reuse the mask for every z-slice
  do z = 1, nz
    where (valid_points)
      b(:, :, z) = z
    else where
      b(:, :, z) = 0
    end where
  end do
end program indices
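For readers more at home in NumPy, here is a sketch of the same masked-update idea (array sizes shrunk for illustration; `np.argwhere` additionally recovers the satisfied (x, y) index pairs):

```python
import numpy as np

nx, ny, nz = 30, 40, 5                 # illustrative sizes, much smaller than the real arrays
x = np.arange(nx)
y = np.arange(ny)
a = x[:, None] - y[None, :]            # a(x, y) = x - y, as in the Fortran example

valid = a > 0                          # 2-D logical mask, computed once
b = np.zeros((nx, ny, nz), dtype=int)
for z in range(nz):
    # same effect as the WHERE / ELSE WHERE block (z + 1 mirrors 1-based Fortran z)
    b[:, :, z] = np.where(valid, z + 1, 0)

# the satisfied (x, y) index pairs themselves, if they are needed explicitly
idx = np.argwhere(valid)
```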

Related

handle function Matlab

I'm starting to use functions handles in Matlab and I have a question,
what Matlab computes when I do:
y = (0:.1:1)';
fun = @(x) x(1) + x(2).^2 + exp(x(3)*y)
and what Matlab computes when I do:
fun = @(x) x + x.^2 + exp(x*y)
Because I'm evaluating the Jacobian of these functions (from this code) and it gives different results. I don't understand the difference between putting x(i) and only x.
Let's define a vector vec as vec = [1, 2, 3].
When you use this vec in your first function as results = fun(vec), the program takes only particular elements of the vector, meaning x(1) = vec(1), x(2) = vec(2) and x(3) = vec(3). The whole expression then looks like
results = vec(1) + vec(2).^2 + exp(vec(3)*y)
or better
results = 1 + 2^2 + exp(3*y)
However, when you use your second expression as results = fun(vec), it uses the entire vector vec in every term, like this
results = vec + vec.^2 + exp(vec*y)
or better
results = [1, 2, 3] + [1^2, 2^2, 3^2] + exp([1, 2, 3]*y)
You can also clearly see that in the first case I don't really need to care about matrix dimensions, and the final dimensions of the results variable are the same as the dimensions of your y variable. This is not the case in the second example, because you multiply the matrices vec and y, which (in this particular example) results in an error, as the vec variable has dimensions 1x3 and the y variable 11x1.
If you want to investigate this, I recommend you split this up into subexpressions and debug, e.g.
f1 = @(x) x(1);
f2 = @(x) x(2).^2;
f3 = @(x) exp(x(3)*y);
f = @(x) f1(x) + f2(x) + f3(x)
You can split it up even further if any subexpression is unclear.
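The same distinction can be seen outside Matlab; as a sketch in NumPy terms (values chosen to mirror the question's y and the vec above):

```python
import numpy as np

y = np.arange(0, 1.01, 0.1).reshape(-1, 1)   # 11x1 column, like (0:.1:1)'
vec = np.array([1.0, 2.0, 3.0])

# scalar indexing: every term is a scalar or scalar * y,
# so the result inherits y's shape, (11, 1)
f_indexed = vec[0] + vec[1]**2 + np.exp(vec[2] * y)

# whole-vector version: a 1x3 by 11x1 product is not conformable in Matlab;
# in NumPy, vec * y would instead broadcast to shape (11, 3),
# so the two formulas cannot give the same result
```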
The distinction is that one is a matrix-matrix multiplication (x * y; I'm assuming x is an array with 11 columns in order for the matrix multiplication to be consistent) and the other is a scalar-matrix multiplication (x(3) * y). The subscript operator (n) on any matrix extracts the n-th value from that matrix. For a scalar, the index can only be 1. For a 1-D array, it extracts the n-th element of the column/row vector. For a 2-D array, it's the n-th element when traversed column-wise.
Also, if you only require the first derivative, I suggest using complex-step differentiation. It provides reduced numerical error and is computationally efficient.
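For the curious, a minimal sketch of complex-step differentiation (the function and step size here are illustrative):

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    """First derivative of f at x via the complex step: f'(x) ~ Im(f(x + ih)) / h.

    Unlike finite differences, there is no subtractive cancellation,
    so h can be made tiny and the result is accurate to machine precision.
    """
    return np.imag(f(x + 1j * h)) / h

# example: d/dx [x^3 + exp(x)] at x = 1 is exactly 3 + e
d = complex_step_derivative(lambda x: x**3 + np.exp(x), 1.0)
```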

Vectorising (or speeding up) a double loop with summation over non-identical indices in R

I am trying to optimise code designed to compute double sums of products of the elements of two square matrices. Let's say we have two square matrices of size n, W and V. The object that needs to be computed is a vector B with elements

B_i = sum_{j != i} sum_{k != i, k != j} W_{ik} V_{jk}

In simple terms: compute element-by-element products of two different rows in two different matrices and take their sum, then take an extra sum over all rows of the second matrix (sans identical indices).
The problem is that the computational complexity of this task is seemingly O(n^3), because the length of the object we are creating, B, is n, and each element requires two summations. This is what I have come up with:
For given i and j (i != j), start with the inner sum over k. Sum over all k, then subtract the terms for k = i and k = j, and multiply by the indicator of j != i.
Since the restriction j != i has been taken care of in the inner sum, the outer sum is taken just over j = 1, ..., n.
If we denote P_{ij} = sum_k W_{ik} V_{jk}, then the two steps will look like

B_{ij} = (P_{ij} - W_{ii} V_{ji} - W_{ij} V_{jj}) * 1{j != i}

and

B_i = sum_j B_{ij}.
However, writing a loop turned out to be very inefficient. n = 100 works quickly (0.05 seconds), but for n = 500 (we are talking about real-world applications here) the average computation time is 3 seconds, and for n = 1000 it jumps to 22 seconds.
The inner loop over k can be easily replaced by a sum, but the outer one... In this question, the suggested solution is sapply, but it implies that the summation must be done over all elements.
This is the code I am trying to evaluate before the heat death of the Universe for large n.
set.seed(1)
N <- 500
x1 <- rnorm(N)
x2 <- rchisq(N, df=3)
bw1 <- bw.nrd(x1)
bw2 <- bw.nrd(x2)
w <- outer(x1, x1, function(x, y) dnorm((x-y)/bw1) )
w <- w/rowSums(w)
v <- outer(x2, x2, function(x, y) dnorm((x-y)/bw2) )
v <- v/rowSums(v)
Bij <- matrix(NA, ncol=N, nrow=N)
for (i in 1:N) { # Around 22 secs for N=1000
  for (j in 1:N) {
    Bij[i, j] <- (sum(w[i, ]*v[j, ]) - w[i, i]*v[j, i] - w[i, j]*v[j, j]) * (i != j)
  }
}
Bi <- rowSums(Bij)
How would an expert R programmer vectorise such kind of loops?
Update:
In fact, given your expression for B_{ij}, we may also do the following
diag(w) <- diag(v) <- 0
BBij <- tcrossprod(w, v)
diag(BBij) <- 0
range(rowSums(BBij) - Bi)
# [1] -2.220446e-16 0.000000e+00
range(BBij - Bij)
# [1] -6.938894e-18 5.204170e-18
Hence, while somewhat obvious, it may also be an interesting observation for your purposes that neither B_{ij} nor B_i depend on the diagonals of W and V.
Initial answer:
Since

B_i = sum_k W_{ik} V_{.k} - sum_k W_{ik} V_{ik},

where the diagonals of W and V can be set to zero and V_{.k} denotes the sum of the k-th column of V, we have
diag(w) <- diag(v) <- 0
A <- w * v
rowSums(sweep(w, 2, colSums(v), `*`)) - rowSums(A) + diag(A)
where
range(rowSums(sweep(w, 2, colSums(v), `*`)) - rowSums(A) + diag(A) - Bi)
# [1] -1.110223e-16 1.110223e-16
Without looking into the content of your matrices w and v, your double for-loop can be replaced with simple matrix operations, using one matrix multiplication (tcrossprod), transpose (t) and diagonal extraction:
Mat.ij <- tcrossprod(w, v) -
  matrix(rep(diag(w), times = N), nrow = N) * t(v) -
  w * matrix(rep(diag(v), each = N), nrow = N)
diag(Mat.ij) <- 0
all.equal(Bij, Mat.ij)
# [1] TRUE
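For what it's worth, the same identity transcribes directly to NumPy; this sketch (with small random matrices standing in for W and V) checks the matrix form against the double loop:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50                                   # illustrative size
w = rng.random((n, n))
v = rng.random((n, n))

# matrix form: B_ij = (W V^T)_ij - W_ii V_ji - W_ij V_jj, with zero diagonal
Bij = w @ v.T - np.diag(w)[:, None] * v.T - w * np.diag(v)[None, :]
np.fill_diagonal(Bij, 0.0)

# double loop, as in the original R code, for verification
Bij_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            Bij_loop[i, j] = w[i] @ v[j] - w[i, i] * v[j, i] - w[i, j] * v[j, j]
```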

How to obtain elements of an array close to another array in MATLAB?

There must be an easier way to do this; an optimization method is also welcome. I have an array Y and many parameters that have to be adjusted so that Y nears zero (= X), as given in the MWE. Is there a much better procedure to minimize this difference? This is just an example equation; there can be 6 coefficients to optimize.
x = zeros(10,1);
y = rand(10,1);
for a = 1:0.1:4
  for b = 2:0.1:5
    for c = 3:0.1:6
      z = (a * y .^ 3 + b * y + c) - x;
      % note: "-1 <= range(z) <= 1" does not chain in MATLAB; since
      % range(z) >= 0, the intended test reduces to:
      if range(z) <= 1
        a, b, c
        break
      end
    end
  end
end
I believe
p = polyfit(y,x,2);
is what you are looking for, where p will be an array of your [a, b, c] coefficients.
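A quick sketch of the same idea in NumPy, using synthetic data generated by a known quadratic so the recovered coefficients can be checked:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.random(10)
x = 2 * y**2 + 3 * y + 1          # synthetic data from a known quadratic

# recover the coefficients [a, b, c] of a*y^2 + b*y + c by least squares
p = np.polyfit(y, x, 2)
```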

Extract array dimensions in Julia

Given a vector A defined in Matlab by:
A = [ 0
0
1
0
0 ];
we can extract its dimensions using:
size(A);
Apparently, we can achieve the same things in Julia using:
size(A)
It's just that in Matlab we are able to extract the dimensions into separate variables, using:
[n, m] = size(A);
irrespective of whether A is one- or two-dimensional, while in Julia size(A) will return only one dimension if A has only one dimension.
How can I do the same thing as in Matlab in Julia, namely extract the dimensions of A into [n, m] even if A is a vector? Please take into account that the number of dimensions of A might vary, i.e. it could sometimes have 1 and sometimes 2 dimensions.
A = zeros(3,5)
sz = size(A)
returns a tuple (3,5). You can refer to specific elements like sz[1]. Alternatively,
m,n = size(A,1), size(A,2)
This works even if A is a column vector (i.e., one-dimensional), returning a value of 1 for n.
This will achieve what you're expecting:
n, m = size(A); # or
(n, m) = size(A);
If size(A) is a one-element tuple, m will never be assigned: n receives length(A) and the destructuring then throws a BoundsError. Just be sure to catch that error, otherwise your code may stop if running from a script.
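A similar shape-normalisation question comes up outside Julia too; as a sketch, here is one way to get an (n, m) pair for both 1-D and 2-D arrays in NumPy (the helper name `nm` is made up for illustration):

```python
import numpy as np

A1 = np.zeros(5)          # one-dimensional
A2 = np.zeros((3, 5))     # two-dimensional

def nm(a):
    """Return (n, m) for a 1-D or 2-D array, treating a 1-D array as a column."""
    return (a.shape[0], 1) if a.ndim == 1 else a.shape

n1, m1 = nm(A1)   # (5, 1)
n2, m2 = nm(A2)   # (3, 5)
```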

Data.Map vs. Data.Array for symmetric matrices?

Sorry for the vague question, but I hope for an experienced Haskeller this is a no-brainer.
I have to represent and manipulate symmetric matrices, so there are basically three different choices for the data type:
1. A complete matrix, storing both the (i,j) and (j,i) elements even though m(i,j) = m(j,i):
Data.Array (Int, Int) Int
2. A map, storing only elements (i,j) with i <= j (upper triangular matrix):
Data.Map (Int, Int) Int
3. A vector indexed by k, storing the upper triangular matrix given some vector order f(i,j) = k:
Data.Array Int Int
Many operations will be necessary on the matrices: updating single elements, querying rows and columns, etc. However, they will mainly act as containers; no linear algebra operations (inversion, det, etc.) will be required.
Which of the options would be fastest in general if the matrices are around 20x20? If I understand correctly, every update (with (//) in the case of an array) requires a full copy, so going from 20x20 = 400 elements to 20*21/2 = 210 elements in cases 2 or 3 would make a lot of sense, but access is slower for case 2, and case 3 needs a conversion at some point.
Are there any guidelines?
Btw: The 3rd option is not a really good one, as computing f^-1 requires square roots.
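To make the aside about option 3 concrete, here is a sketch (in Python for brevity) of the index map f(i, j) = i(i+1)/2 + j for i >= j and its inverse; the square root appears in the inverse:

```python
import math

def pack(i, j):
    """Linear index of (i, j) in a row-wise lower-triangular layout."""
    if i < j:
        i, j = j, i                      # exploit symmetry: (i, j) and (j, i) coincide
    return i * (i + 1) // 2 + j

def unpack(k):
    """Inverse of pack; this is where the square root comes in."""
    i = (math.isqrt(8 * k + 1) - 1) // 2
    return i, k - i * (i + 1) // 2
```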
You could try using Data.Array using a specialized Ix class that only generates the upper half of the matrix:
newtype Symmetric = Symmetric { pair :: (Int, Int) } deriving (Ord, Eq)

instance Ix Symmetric where
  range (Symmetric (x1, y1), Symmetric (x2, y2)) =
    map Symmetric [(x, y) | x <- range (x1, x2), y <- range (y1, y2), x >= y]
  inRange (lo, hi) i = x <= hix && x >= lox && y <= hiy && y >= loy && x >= y
    where
      (lox, loy) = pair lo
      (hix, hiy) = pair hi
      (x, y)     = pair i
  index (lo, hi) i
    | inRange (lo, hi) i = (x - loy) + (sum $ take (y - loy) [hix - lox, hix - lox - 1 ..])
    | otherwise          = error "Error in array index"
    where
      (lox, loy) = pair lo
      (hix, hiy) = pair hi
      (x, y)     = pair i

sym x y
  | x < y     = Symmetric (y, x)
  | otherwise = Symmetric (x, y)
*Main Data.Ix> let a = listArray (sym 0 0, sym 6 6) [0..]
*Main Data.Ix> a ! sym 3 2
14
*Main Data.Ix> a ! sym 2 3
14
*Main Data.Ix> a ! sym 2 2
13
*Main Data.Ix> length $ elems a
28
*Main Data.Ix> let b = listArray (sym 0 0, sym 19 19) [0..]
*Main Data.Ix> length $ elems b
210
There is a fourth option: use an array of decreasingly-large arrays. I would go with either option 1 (using a full array and just storing every element twice) or this last one. If you intend to be updating a lot of elements, I strongly recommend using a mutable array; IOArray and STArray are popular choices.
Unless this is for homework or something, you should also take a peek at Hackage. A quick look suggests the problem of manipulating matrices has been solved several times already.
