How to obtain elements of an array close to another array in MATLAB? - arrays

There must be an easier way to do this; an optimization method is also welcome. I have an array 'Y' and several parameters that have to be adjusted so that Y approaches zero (= 'X'), as shown in the MWE. Is there a better procedure to minimize this difference? This is just an example equation; there can be 6 coefficients to optimize.
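x = zeros(10,1);
y = rand(10,1);
for a = 1:0.1:4
    for b = 2:0.1:5
        for c = 3:0.1:6
            z = (a * y .^ 3 + b * y + c) - x;
            % MATLAB does not chain comparisons, so test each bound separately
            if -1 <= range(z) && range(z) <= 1
                a, b, c
            end
        end
    end
end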

I believe
p = polyfit(y,x,2);
is what you are looking for, where p will be an array of the [a, b, c] coefficients of the fitted quadratic a*y.^2 + b*y + c (highest power first).
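If the model really is a*y.^3 + b*y + c, as in the question's loop, the same least-squares idea can be applied with a custom set of basis columns instead of a full polynomial. The following is only a minimal sketch, written in Python with the standard library for illustration; the names `solve_linear` and `fit_model` are made up here and the data are synthetic.

```python
def solve_linear(a, b):
    """Solve a small dense system a * s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest pivot to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * s[c] for c in range(r + 1, n))
        s[r] = (m[r][n] - tail) / m[r][r]
    return s


def fit_model(y, x):
    """Least-squares fit of x ~ a*y**3 + b*y + c; returns [a, b, c]."""
    basis = [[yi ** 3, yi, 1.0] for yi in y]  # one row per sample
    k = 3
    # Normal equations: (B^T B) coef = B^T x
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    btx = [sum(row[i] * xi for row, xi in zip(basis, x)) for i in range(k)]
    return solve_linear(btb, btx)


# Synthetic example (assumed data, not from the question).
y = [0.1 * i for i in range(1, 11)]
x = [2.0 * yi ** 3 + 3.0 * yi + 4.0 for yi in y]
a, b, c = fit_model(y, x)
```

Note that with x identically zero, as in the MWE, the least-squares solution is simply a = b = c = 0, so a non-trivial target (or extra constraints on the coefficients) is needed for the fit to be informative.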


handle function Matlab

I'm starting to use function handles in MATLAB and I have a question: what does MATLAB compute when I do:
y = (0:.1:1)';
fun = @(x) x(1) + x(2).^2 + exp(x(3)*y)
and what does MATLAB compute when I do:
fun = @(x) x + x.^2 + exp(x*y)
I'm asking because I'm evaluating the Jacobian of these functions (from this code) and it gives different results. I don't understand the difference between writing x(i) and writing only x.
Let's define a vector vec as vec = [1, 2, 3].
When you use this vec in your first function as results = fun(vec), the program will take only the particular elements of the vector, meaning x(1) = vec(1), x(2) = vec(2) and x(3) = vec(3). The whole expression then will look as
results = vec(1) + vec(2).^2 + exp(vec(3)*y)
or better
results = 1 + 2^2 + exp(3*y)
However, when you use your second expression as results = fun(vec), it will use the entire vector vec in all the cases like this
results = vec + vec.^2 + exp(vec*y)
or better
results = [1, 2, 3] + [1^2, 2^2, 3^2] + exp([1, 2, 3]*y)
You can also clearly see that in the first case, I don't really need to care about matrix dimensions, and the final dimensions of the results variable are the same as the dimensions of your y variable. This is not the case in the second example, because you multiply the matrices vec and y, which (in this particular example) results in an error, as the vec variable has dimensions 1x3 and the y variable 11x1, so the inner dimensions do not match.
If you want to investigate this, I recommend you split this up into subexpressions and debug, e.g.
f1 = @(x) x(1);
f2 = @(x) x(2).^2;
f3 = @(x) exp(x(3)*y);
f = @(x) f1(x) + f2(x) + f3(x)
You can split it up even further if any subexpression is unclear.
The distinction is that one is an array-array multiplication (x * y; I'm assuming x is an array with 11 columns so that the matrix multiplication is consistent) and the other is a scalar-array multiplication (x(3) * y). The subscript operator (n) extracts the n-th value from any matrix. For a scalar, the index can only be 1. For a 1D array, it extracts the n-th element of the column/row vector. For a 2D array, it's the n-th element when traversed column-wise.
Also, if you only require the first derivative, I suggest using complex-step differentiation. It provides reduced numerical error and is computationally efficient.
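As a rough illustration of complex-step differentiation, the derivative of a real function can be approximated as Im(f(x + i*h)) / h with a very small h. The sketch below is in Python (standard library only) rather than MATLAB; the helper name `complex_step_derivative` and the test function `g` are assumptions made up for this example, and the approach only works when the function is analytic and built from operations that accept complex arguments.

```python
import cmath


def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) for real x as Im(f(x + i*h)) / h."""
    return f(complex(x, h)).imag / h


def g(t):
    # Similar to the third term above, exp(x(3)*y), with y fixed at 0.5.
    return cmath.exp(0.5 * t)


d = complex_step_derivative(g, 3.0)
# Analytic derivative for comparison: 0.5 * exp(0.5 * 3.0)
expected = 0.5 * cmath.exp(1.5).real
```

Because no difference of nearly equal numbers is formed, h can be made extremely small without the cancellation error that limits ordinary finite differences.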

Split, group and mean: computation with arrays

A is a given N x R x T array. I must split it horizontally into N sub-arrays of size L x M and then group each z together in an array K and take a mean.
For Example: A is the array rand(N,R,T)= rand( 16, 3 ,3); Now I am going to split it:
A=rand( 16, 3 ,3) : A(1,:,:), A(2,:,:), A(3,:,:), A(4,:,:), ... , A(16,:,:).
I have 16 slices.
B_1=A(1,:,:); B_2=A(2,:,:); B_3=A(3,:,:); ... ; B_16=A(16,:,:);
The next step is grouping every 3 slices together (for example).
Now I am going to create K_i as:
The average array is found as:
C_1=[B_1 + B_2 + B_3]/3
C_8= [ B_14 + B_15 + B_16] /3
I have implemented it as:
A_reshape = reshape(squeeze(A), size(A,2), size(A,3),2, []);
mean_of_all_slices = permute(mean(A_reshape , 3), [1 2 4 3]);
Question 1 I have checked by hand. It gives me a wrong result. How to fix it? [SOLVED]
EDIT 2 I need to simulate the following computation:
Take the product of each slice of the array K_i with another array P_p. That is:
for `K_1`, `P_1` is given: `B_1 * P_1`, `B_2 * P_1`, `B_3 * P_1`
for `K_8`, `P_8` is given: `B_14 * P_8`, `B_15 * P_8`, `B_16 * P_8`
I have solved!!!
Disclaimer: this answers a previous version of the question.
In cases such as this I would suggest relying on built-ins, which have a predictable behavior. In your case, this would be movmean (introduced in R2016a):
WIN_SZ = 2; % Window size for averaging
AVG_DIM = 1; % Dimension for averaging
tmp = movmean(A, WIN_SZ, AVG_DIM, 'Endpoints', 'discard');
C = tmp(1:WIN_SZ:end, :, :); % This only selects A1+A2, A3+A4 etc.
If your MATLAB is a bit older, this can also be done using convolution (convn, introduced before R2006):
WIN_SZ = 3;
tmp = convn(A, ones(WIN_SZ,1)./WIN_SZ, 'valid'); % Shorter than A in dim1 by (WIN_SZ-1)
C = tmp(1:WIN_SZ:end, :, :); % dim1 size is: ceil((size(A,1)-(WIN_SZ-1))/WIN_SZ)
BTW, the step where you create B from slices of A can be done using
B = num2cell(A,[2,3]); % yields a 16x1 cell array of 1x3x3 double arrays
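Taking a moving mean with discarded endpoints and then keeping every WIN_SZ-th result amounts to averaging non-overlapping groups of consecutive slices. A minimal sketch of that grouping, in Python with nested lists for illustration (the function name `block_mean` and the toy data are assumptions, not part of the original code):

```python
def block_mean(slices, k):
    """Average every k consecutive 2-D slices element-wise; leftover slices are dropped."""
    out = []
    for start in range(0, len(slices) - k + 1, k):
        group = slices[start:start + k]
        rows = len(group[0])
        cols = len(group[0][0])
        mean = [[sum(s[r][c] for s in group) / k for c in range(cols)]
                for r in range(rows)]
        out.append(mean)
    return out


# Six 2x2 slices; slice i is filled with the value i.
A = [[[float(i)] * 2 for _ in range(2)] for i in range(1, 7)]
C = block_mean(A, 3)  # two averaged slices: (1+2+3)/3 and (4+5+6)/3
```

With 16 slices and k = 3, this sketch would return 5 averaged slices and ignore the last slice, which matches the behaviour of discarding incomplete windows.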

Matlab parfor slice correctly

I have two nested loops which I want to parallelize.
r = zeros(1,m);
for i=1:n
    q = ones(1,m);
    for j=1:n
        q = q .* (xx-x(j))/(x(i)-x(j));
    end
    r = r + q;
end
In order to prepare this function for parallelization, I changed local variables to global ones.
r = ones(n,m);
for i=1:n
    for j=1:n
        r(i,:) = r(i,:) .* (xx-x(j))/(x(i)-x(j));
    end
end
r = sum(r,1);
Instead of transforming a whole vector at once, let's try it with only one scalar. Also use the simplest element of x which depends on i and j. I also removed the sum in the end. We can add it back later.
r = ones(n,1);
for i=1:n
    for j=1:n
        y = x(i)+x(j);
        r(i) = r(i) * y;
    end
end
The code above is the example function I want to parallelize.
The inner loop always needs to access the same vector r(i) for one iteration of the outer loop i. This access is a write operation (*=), but the order doesn't matter for this operation.
Since nested parfor loops are not allowed in Matlab, I tried to pack everything in one parfor loop.
r = ones(n,1);
parfor k=1:(n*n)
    %i = floor((k-1)/n)+1; % outer loop
    %j = mod(k-1,n)+1; % inner loop
    [j,i] = ind2sub([n,n],k);
    y = x(i)+x(j);
    r(i) = r(i) * y; % ERROR here
end
Since the indices are calculated, MATLAB still doesn't know how to slice it.
So, I decided to move the multiplication operation outside and use linear indices.
r = ones(n,n);
parfor k=1:(n*n)
    [j,i] = ind2sub([n,n],k);
    y = x(i)+x(j);
    r(k) = y;
end
r = prod(r,1);
r = squeeze(r); % remove singleton dimensions
While this does work for scalar values in the inner loop, it doesn't work for vectors in the inner loop since indices must be again calculated.
r = ones(n,n,m);
parfor k=1:(n*n)
    [j,i] = ind2sub([n,n],k);
    y = x(i)+x(j);
    r((k-1)*m+1:k*m) = y.*(1:m); % ERROR here
end
r = prod(r,1);
r = squeeze(r); % remove singleton dimensions
It does work, however, when I reshape the array.
r = ones(n*n,m);
parfor k=1:(n*n)
    [j,i] = ind2sub([n,n],k);
    y = x(i)+x(j);
    r(k,:) = y.*(1:m); % ERROR here
end
r = reshape(r,n,n,m);
r = prod(r,2);
r = squeeze(r); % remove singleton dimensions
This way, I can transform a vector xx to another vector r.
r = ones(n*n,m);
parfor k=1:(n*n)
    [j,i] = ind2sub([n,n],k);
    y = x(i)+x(j);
    r(k,:) = y.*xx; % ERROR here
end
r = reshape(r,n,n,m);
r = prod(r,2);
r = sum(r,1);
r = reshape(r,size(xx)); % reshape output vector to input vector
For my parallel solution, I need an n*n*m array instead of an n*m array, which seems quite inefficient.
Is there a better way of doing what I want?
What are the advantages of other ways (prettier code, less CPU, less RAM, ...)?
In trying to simplify the task and reduce it to a minimal working example, I omitted the check i~=j, although this results in an all-NaN result. Further, the nature of the code results in an all-1 result when this check is added. For the code to make sense, the factors are just weights for another vector z.
The more elaborate problem looks as follows:
r = zeros(1,m);
for i=1:n
    q = ones(1,m);
    for j=1:n
        if i~=j
            q = q .* (xx-x(j))/(x(i)-x(j));
        end
    end
    r = r + z(i) .* q;
end
This problem does not need a parallel for loop. One issue is that x(i)-x(j) is redundantly calculated many times, which is inefficient. The approach suggested here calculates every such number exactly once and vectorizes the operations for each element of xx. Since xx is by far the shortest vector, the computation is almost completely vectorized. Vectorizing the last loop as well would probably just amount to a hidden for loop; it would use much more memory and the code would be more complicated (3D matrices and so on).
I took the liberty of switching minus to plus in the denominator just for testing, since minus would generate NaN for all numbers. The last approach is faster, by about a factor of 10 for n=10000. I suggest you try a more elaborate benchmark.
function test()
% Initiate variables
% Alternative 1
r = zeros(1,m);
for i=1:n
    q = ones(1,m);
    for j=1:n
        q = q .* (xx-x(j))/(x(i)+x(j));
    end
    r = r + q;
end
% Alternative 2
xden = bsxfun(@plus, x, x.'); % Calculate denominator
xnom = repmat(x,n,1); % Calculate numerator
xfull = (xnom./xden).'; % Calculate right term on rhs.
for k = 1:m
    tmp = prod(xx(k)./xden - xfull); % Split in 2 calculations
    r2(k) = sum(tmp); % "r = r + xx(k)"
end
end
Just a note at the end. Alternative 2 is faster but it is also memory expensive, so in case of memory issues a loop is preferable. Further, there is no need for global variables for parallelization. If you find you need them, you should probably review your design (but if the code is short this is not critical, so you need not bother too much).
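The more elaborate loop in the question has the form of Lagrange interpolation, where the products of x(i)-x(j) depend only on the nodes and not on xx. One way to avoid recomputing them, sketched here in Python with the standard library for illustration (the function names `lagrange_weights` and `lagrange_eval` and the sample data are assumptions), is to compute one weight per node up front and reuse it for every evaluation point:

```python
def lagrange_weights(x):
    """Return w[i] = 1 / prod_{j != i} (x[i] - x[j]); depends only on the nodes."""
    w = []
    for i, xi in enumerate(x):
        d = 1.0
        for j, xj in enumerate(x):
            if j != i:
                d *= xi - xj
        w.append(1.0 / d)
    return w


def lagrange_eval(x, z, w, xx):
    """Evaluate sum_i z[i] * prod_{j != i} (xx - x[j]) / (x[i] - x[j])."""
    total = 0.0
    for i in range(len(x)):
        num = 1.0
        for j, xj in enumerate(x):
            if j != i:
                num *= xx - xj
        total += z[i] * w[i] * num
    return total


x = [0.0, 1.0, 2.0]
z = [1.0, 3.0, 7.0]       # samples of t**2 + t + 1 at the nodes
w = lagrange_weights(x)   # computed once, reused for every evaluation point
r = [lagrange_eval(x, z, w, t) for t in (0.5, 1.5, 3.0)]
```

This keeps memory at the order of n instead of n*n, at the cost of still looping over the nodes for each evaluation point.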

efficient way to remember array index of two large arrays

I have two Fortran arrays in 2 and 3 dimensions, say a(nx,ny) and b(nx,ny,nz). In array a, I need to find the points that satisfy a condition, say values > 0. Then I need to locate the vectors in array b that have the same x and y indices as those points in a. What is the easiest and fastest way to do it? The two arrays are big, and I don't want to search element by element. I hope I have explained my problem clearly. Thanks!
I'm not sure that this is the best method, but here's what I would do:
Put a where clause inside a do loop over the z-values. You can first get a 2D map of valid indices into a logical array if you don't want to recalculate the points every time:
program indices
implicit none
integer, parameter :: nx = 3000, ny = 400, nz = 500
integer, dimension(nx, ny) :: a
integer, dimension(nx, ny, nz) :: b
logical, dimension(nx, ny) :: valid_points
integer :: x, y, z
do y = 1, ny
do x = 1, nx
a(x, y) = x - y
end do
end do
valid_points = (a > 0)
do z = 1, nz
  where (valid_points)
    b(:, :, z) = z
  else where
    b(:, :, z) = 0
  end where
end do
end program indices
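If the goal is to pull out the z-vectors of b at the selected (x, y) positions rather than to overwrite b, the same idea is to scan a once and reuse the resulting positions. A minimal sketch in Python with nested lists, for illustration only (the function name `vectors_at_valid_points` and the small test arrays are assumptions; note that Python indices start at 0 while Fortran's start at 1 by default):

```python
def vectors_at_valid_points(a, b):
    """Return {(ix, iy): b[ix][iy]} for every point where a[ix][iy] > 0.

    a is a 2-D nested list; b is a 3-D nested list indexed as b[ix][iy][iz].
    """
    selected = {}
    for ix, row in enumerate(a):
        for iy, value in enumerate(row):
            if value > 0:
                selected[(ix, iy)] = b[ix][iy]
    return selected


nx, ny, nz = 3, 2, 4
a = [[ix - iy for iy in range(ny)] for ix in range(nx)]
b = [[[100 * ix + 10 * iy + iz for iz in range(nz)] for iy in range(ny)]
     for ix in range(nx)]
picked = vectors_at_valid_points(a, b)
```

Only a is scanned; b is touched just at the positions that passed the test.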

R: Aggregate on Group 1 and NOT Group 2

I am trying to create two data sets, one which summarizes data by 2 groups which I have done using the following code:
x = rnorm(1:100)
g1 = sample(LETTERS[1:3], 100, replace = TRUE)
g2 = sample(LETTERS[24:26], 100, replace = TRUE)
aggregate(x, list(g1, g2), mean)
The second needs to summarize the data by the first group and NOT the second group.
If we consider the possible pairs from the previous example:
A - X B - X C - X
A - Y B - Y C - Y
A - Z B - Z C - Z
The second dataset should summarize the data as the average of the outgroup.
A - not X
A - not Y
A - not Z etc.
Is there a way to manipulate aggregate functions in R to achieve this?
Or I also thought there could be dummy variable that could represent the data in this way, although I am unsure how it would look.
I have found this answer here:
R using aggregate to find a function (mean) for "all other"
I think this indicates that a dummy variable for each pairing is necessary. However if there is anyone who can offer a better or more efficient way that would be appreciated, as there are many pairings in the true data set.
Thanks in advance
First let us generate the data reproducibly (using set.seed):
# same as question but added set.seed for reproducibility
x = rnorm(1:100)
g1 = sample(LETTERS[1:3], 100, replace = TRUE)
g2 = sample(LETTERS[24:26], 100, replace = TRUE)
Now we have two solutions both of which use aggregate:
1) ave
# x equals the sums over the groups and n equals the counts
ag = cbind(aggregate(x, list(g1, g2), sum),
           n = aggregate(x, list(g1, g2), length)[, 3])
ave.not <- function(x, g) ave(x, g, FUN = sum) - x
transform(ag,
    x = NULL, # don't need x any more
    n = NULL, # don't need n any more
    mean = x/n,
    mean.not = ave.not(x, Group.1) / ave.not(n, Group.1)
)
This gives:
  Group.1 Group.2       mean     mean.not
1       A       X  0.3155084 -0.091898832
2       B       X -0.1789730  0.332544353
3       C       X  0.1976471  0.014282465
4       A       Y -0.3644116  0.236706489
5       B       Y  0.2452157  0.099240545
6       C       Y -0.1630036  0.179833987
7       A       Z  0.1579046 -0.009670734
8       B       Z  0.4392794  0.033121335
9       C       Z  0.1620209  0.033714943
To double check the first value under mean and under mean.not:
> mean(x[g1 == "A" & g2 == "X"])
[1] 0.3155084
> mean(x[g1 == "A" & g2 != "X"])
[1] -0.09189883
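The arithmetic behind mean.not is that the outgroup mean equals (sum over the whole first group minus the sum of the pair) divided by (count in the first group minus the count in the pair). A small sketch of that calculation in Python, standard library only, for illustration (the function name `mean_and_mean_not` and the toy data are assumptions, not the data above):

```python
from collections import defaultdict


def mean_and_mean_not(x, g1, g2):
    """For each (g1, g2) pair return (mean within pair, mean of same g1 but other g2)."""
    pair_sum = defaultdict(float)
    pair_n = defaultdict(int)
    group_sum = defaultdict(float)
    group_n = defaultdict(int)
    for xi, a, b in zip(x, g1, g2):
        pair_sum[(a, b)] += xi
        pair_n[(a, b)] += 1
        group_sum[a] += xi
        group_n[a] += 1
    result = {}
    for (a, b), s in pair_sum.items():
        n = pair_n[(a, b)]
        rest_n = group_n[a] - n
        # Outgroup: everything in group a except the pair (a, b); NaN if empty.
        mean_not = (group_sum[a] - s) / rest_n if rest_n else float("nan")
        result[(a, b)] = (s / n, mean_not)
    return result


x = [1.0, 2.0, 3.0, 4.0, 10.0]
g1 = ["A", "A", "A", "A", "B"]
g2 = ["X", "X", "Y", "Z", "X"]
res = mean_and_mean_not(x, g1, g2)
```

Each observation is visited once, so no dummy variable per pairing is needed.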
2) sapply Here is a second approach which gives the same answer:
ag <- aggregate(list(mean = x), list(g1, g2), mean)
f <- function(i) mean(x[g1 == ag$Group.1[i] & g2 != ag$Group.2[i]])
ag$mean.not = sapply(1:nrow(ag), f)
REVISED Revised based on comments by poster, added a second approach and also made some minor improvements.
