Summing a Fortran array with a mask

I have a Fortran array a(i,j). I want to sum it along dimension 2 (the j index), with a mask that excludes the element where j equals i,
i.e.,
a1=0
do j=1,n
if(j.ne.i) then
a1=a1+a(i,j)
endif
enddo
How can I do this with the intrinsic sum function in Fortran? I found the intrinsic to be much faster than the explicit loop.
I tried sum(a(i,:),j.ne.i), but this naturally gives an error. It would also be helpful if someone could suggest how to sum only the values of a(i,:) where abs(a(i,j)) is greater than, say, 0.01.

You can easily avoid any branching for the off-diagonal case. It should be much faster than creating any mask array and checking the mask. Branching (conditional jumps) is costly even when branch prediction can be very efficient.
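a1 = 0
do j = 1, n
  ! elements above the diagonal in column j
  do i = 1, j-1
    a1 = a1 + a(i,j)
  end do
  ! elements below the diagonal in column j
  do i = j+1, n
    a1 = a1 + a(i,j)
  end do
end do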
If you need your code to be fast and not short, you should test this kind of approach. In my tests it is much faster.

To answer your last question, you can use the WHERE construct to build a mask. For example,
logical :: y(3,3) = .false.
real x(3,3)
x = 1
x(1,1) = 0.1
x(2,2) = 0.1
x(3,3) = 0.1
print * , sum(x)
where(abs(x) > 0.25) y = .true.
print *, sum(x,y)
end
Whether this is better than nested do-loops is questionable.
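The masked sum in the example above can be checked with a small sketch. It is written in plain Python rather than Fortran purely for illustration; masked_sum and the flattened list x are hypothetical names that mirror the 3x3 example, not part of the original answer.

```python
def masked_sum(values, tol):
    """Sum only the entries whose magnitude exceeds tol."""
    return sum(v for v in values if abs(v) > tol)

# Flattened 3 x 3 array of ones with 0.1 on the diagonal,
# mirroring the x array of the Fortran example (assumed layout).
x = [0.1, 1.0, 1.0,
     1.0, 0.1, 1.0,
     1.0, 1.0, 0.1]

total = sum(x)                    # all nine entries
large_only = masked_sum(x, 0.25)  # the six off-diagonal ones
```

In Fortran the WHERE step can also be skipped by passing the logical expression directly, e.g. sum(x, mask=abs(x) > 0.25); for the row in the question that would presumably read sum(a(i,:), mask=abs(a(i,:)) > 0.01).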

I find that summing the whole array and then subtracting the sum of the diagonal elements can be 2x faster.
a1 = 0
do i = 1, n
  a1 = a1 + a(i,i)
end do
a1 = sum(a) - a1
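The identity behind this answer (off-diagonal sum = total sum minus diagonal sum) and the branch-free split used in the earlier answer can be compared in a minimal sketch. It is plain Python, chosen only as a neutral illustration; the function names and the 3x3 sample matrix are assumptions, not taken from the answers.

```python
def offdiag_sum_masked(a):
    """Sum a[i][j] over all i != j using an explicit test on the indices."""
    n = len(a)
    return sum(a[i][j] for i in range(n) for j in range(n) if i != j)

def offdiag_sum_subtract(a):
    """Sum everything, then subtract the diagonal entries."""
    total = sum(sum(row) for row in a)
    diagonal = sum(a[i][i] for i in range(len(a)))
    return total - diagonal

def row_sum_skip_diagonal(a, i):
    """Row i summed over j != i, written as two slices with no test per element."""
    return sum(a[i][:i]) + sum(a[i][i + 1:])

a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
```

With floating-point data the subtraction form can lose accuracy when the diagonal entries are much larger than the rest, so it is worth comparing both results on real data before relying on it.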

Related

How to optimize conditional statement in for loop over image?

I'm wondering if there's an index-based way of doing the following in Octave, as the code is iterative and thus really slow compared to working with indexing.
for i = [1:size(A, 1)]
for j = [1:size(A, 2)]
if (max(A(i, j, :)) == 0)
A(i, j, :) = B(i, j, :);
endif
endfor
endfor
A and B are two overlapping RGB images, and I want A(i,j) to take the value of B(i,j) if A(i,j) is 0 on all three channels. It is very slow in this form, but I'm not experienced enough with this language to vectorize it.
Your code can be vectorized as follows:
I = max(A,[],3) == 0;
I = repmat(I,1,1,3);
A(I) = B(I);
The first line is a direct copy of your max conditional statement within the loop, but vectorized across all of A. This returns a 2D array, which we cannot directly use to index into the 3D arrays A and B, so we apply repmat to replicate it along the 3rd dimension (the 3 here is the number of repetitions, we're assuming A and B are RGB images with 3 elements along the 3rd dimension). Finally, an indexed assignment copies the relevant values over from B to A.
To generalize this to any array size, replace the "3" in the repmat statement with size(A,3).
This does not add much, but it may give you a better understanding, so here is another solution.
% example data
A = randi( 255, [2,4,3] ); A(2,2,:) = [0,0,0];
B = randi( 255, [2,4,3] );
% Logical array with size [Dim1, Dim2], such that Dim3 is 'squashed' into a
% single logical value at each position, indicating whether the third dimension
% at that position does 'not' have 'any' true (i.e. nonzero) values.
I = ~any(A, 3);
% Use this to index A and B for assignment.
A([I,I,I]) = B([I,I,I])
This approach may be more efficient than the repmat one, which is a slightly more expensive operation, although it may be less obvious why it works. Understanding how it works teaches you something about MATLAB/Octave, though, so it's a nice learning point.
MATLAB and Octave store arrays in column-major order (as opposed to, say, NumPy in Python, which defaults to row-major order). This is also the reason that doing A(:) will return A as a vector, constructed on a column-by-column basis. It is also the reason that you can index a 3-dimensional array using a single index (called a "linear index"), which will correspond to the element you reach when you count that number of elements going down columns.
When performing logical indexing, matlab/octave effectively takes a logical vector, matches each linear index of that vector to the equivalent linear index of A and decides whether to return it or not, based on whether the boolean value of the logical index at that linear index is true or false. If you provide a logical array I that is of a smaller size than A, the indexing will simply stop at the last linear index of I. Specifically, note that the shape of I is irrelevant, since it will be interpreted in a linear indexing manner anyway.
In other words, logical indexing with I is the same as logical indexing with I(:), and logical indexing with [I,I,I] is the same as logical indexing with [ I(:); I(:); I(:) ].
And if I has the size of A(:,:,1), then [I,I,I] has the same number of elements as A, so that in a linear indexing sense it is a valid logical index, matching each linear index of [I,I,I] to the equivalent linear index of A.
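The linear-indexing argument can be traced by hand with a small sketch. It emulates column-major storage in plain Python (0-based indices, unlike MATLAB/Octave); linear_index, the 2x4 mask, and the array sizes are illustrative assumptions.

```python
def linear_index(i, j, k, rows, cols):
    """0-based column-major linear index of element (i, j, k)."""
    return i + rows * (j + cols * k)

rows, cols, pages = 2, 4, 3

# 2-D mask I (rows x cols), stored as a list of rows; True marks pixels to copy.
I = [[False, False, True, False],
     [False, True, False, False]]

# Column-major flattening of I, i.e. the order in which I(:) lists its elements.
flat_I = [I[i][j] for j in range(cols) for i in range(rows)]

# [I, I, I] concatenates horizontally, so flattened column-major
# it is simply I(:) repeated once per page.
flat_III = flat_I * pages

# Linear positions that the flat mask selects in the 3-D array.
selected = [p for p, flag in enumerate(flat_III) if flag]

# Positions expected from first principles: every page k of each masked pixel.
expected = sorted(linear_index(i, j, k, rows, cols)
                  for k in range(pages)
                  for j in range(cols)
                  for i in range(rows) if I[i][j])
```

Both lists coincide, which is why the shape of the logical index does not matter as long as its elements line up with the linear indices of the target array.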
The max() function can take a single matrix and return the maximum value along a dimension.
There is also the all() function, which tells you if all values along a dimension are nonzero, and the any() function, which tells you if any of the values along a dimension are nonzero:
A = reshape(1:75, 5, 5, 3)
A(2, 3, :) = 0;
B = ones(size(A)) * 1000
use_pixel_from_A = any(A, 3)
use_pixel_from_B = ~use_pixel_from_A
Now for each element of the 3rd axis, you know which pixels to take from A and which to take from B. Since our use_pixel... matrices contain 0 and 1, we can element-wise multiply them to A and B to filter out elements of A and B as required.
C = zeros(size(A));
for kk = 1:size(A, 3)
C(:, :, kk) = A(:, :, kk) .* use_pixel_from_A + B(:, :, kk) .* use_pixel_from_B
end

Trace of an array using intrinsic SUM function in FORTRAN

Is it possible to use the intrinsic SUM function to calculate the trace of an array (of rank > 1)?
Currently, I am using a do loop to calculate trace.
trace = 0.0d0
do i = 1, 10
trace = trace + a(i,i)
end do
TL;DR: Your method is fine, use that.
Slightly longer:
You can use a mask, but that is less readable, slower, and far more error prone:
sum(a, mask = &
reshape((/ (mod(i, size(a, 1)+1) == 1, i=1, size(a)) /), &
shape(a) ))
You can use an implied do loop to create a new temporary array of just the diagonal elements:
sum( (/ (a(i,i), i=1, size(a, 1)) /) )
Again, this is less efficient, as the program has to create a new array, and I don't think that it's more readable than your version.
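To see why the mod-based mask picks out the diagonal, here is a minimal sketch in plain Python (used only as a neutral illustration). It assumes a square n x n array stored flat in column-major order, as Fortran does; diagonal_mask, trace_with_mask, and trace_with_loop are hypothetical names.

```python
def diagonal_mask(n):
    """Flat mask for an n x n array: True where the 1-based linear
    index i satisfies i mod (n + 1) == 1, as in the reshape expression."""
    return [i % (n + 1) == 1 for i in range(1, n * n + 1)]

def trace_with_mask(flat, n):
    """Trace computed as a masked sum over the flat storage."""
    return sum(v for v, keep in zip(flat, diagonal_mask(n)) if keep)

def trace_with_loop(flat, n):
    """Trace computed by direct indexing: element (k, k), 0-based,
    sits at flat position k * n + k."""
    return sum(flat[k * n + k] for k in range(n))

n = 4
flat = list(range(1, n * n + 1))  # values 1..16, so value == 1-based index
```

The mask visits all n*n positions to keep only n of them, which illustrates why the plain loop over a(i,i) is the cheaper choice.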

a faster way to compute the error of a vector

For a given vector $(x_1,x_2,\ldots, x_n)$ I am trying to compute $\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|$.
I wrote the following code
error = 0;
for i = 1:n
for j = 1:n
error = error + norm(x(i)-x(j));
end
end
This code is not fast, especially when $n$ is large. I am aware that I am double counting actually... But how may I avoid it? How can I speed up my code?
Thank you!
You can do it with bsxfun, which is fast:
d = (abs(bsxfun(@minus, x, x.')));
result = sum(d(:));
Or alternatively use pdist with 'cityblock' distance (which for one-dimensional observations reduces to absolute difference). This computes each distance once, so you need to multiply the sum by 2:
result = 2*sum(pdist(x(:),'cityblock'));
How about a simple speed up?
for a=1:n
for b=a+1:n
error = error + 2*norm(x(a)-x(b))
end
end
For a scalar, norm just gives abs.
So,
error = sum(sum(abs( bsxfun(@minus, x, x') )));
will do the same thing.
Also check out pdist, which will do this for vectors, using vector norms, in an even faster way.
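The answers above can be cross-checked with a short sketch in plain Python (a neutral illustration, not MATLAB). The first two functions mirror the full double loop and the half loop multiplied by 2. The third is a different technique that none of the answers propose: after sorting, each element's contribution has a closed form, which avoids the n x n intermediate. All names and the sample vector are assumptions.

```python
def pairwise_full(x):
    """Double loop over all ordered pairs (i, j), as in the question."""
    return sum(abs(xi - xj) for xi in x for xj in x)

def pairwise_half(x):
    """Loop over i < j only and double the result."""
    n = len(x)
    return 2 * sum(abs(x[a] - x[b]) for a in range(n) for b in range(a + 1, n))

def pairwise_sorted(x):
    """Sort-based variant: in sorted order, element k (0-based) is added
    k times and subtracted n - 1 - k times over the pairs i < j."""
    s = sorted(x)
    n = len(s)
    return 2 * sum((2 * k - n + 1) * v for k, v in enumerate(s))

x = [3, 1, 4, 1, 5]
```

The sorted variant costs O(n log n) instead of O(n^2), which may matter for large n; it applies only to the scalar case discussed here.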

How to improve the execution time of this function?

Suppose that f(x,y) is a bivariate function as follows:
function [ f ] = f(x,y)
UN=@(g)1.6*(1-acos(g)/pi)-0.8;
f= 1+UN(cos(0.5*pi*x+y));
end
How to improve execution time for function F(N) with the following code:
function [VAL] = F(N)
x=0:4/N:4;
y=0:2*pi/1000:2*pi;
VAL=zeros(N+1,3);
for i = 1:N+1
val = zeros(1,N+1);
for j = 1:N+1
val(j) = trapz(y,f(0,y).*f(x(i),y).*f(x(j),y))/2/pi;
end
val = fftshift(fft(val))/N;
l = (length(val)+1)/2;
VAL(i,:)= val(l-1:l+1);
end
VAL = fftshift(fft(VAL,[],1),1)/N;
L = (size(VAL,1)+1)/2;
VAL = VAL(L-1:L+1,:);
end
Note that N=2^p where p>10, so please consider the memory limitations while optimizing the code using ndgrid, arrayfun, etc.
FYI: The code intends to find the central 3-by-3 submatrix of the fftn of
fun=@(a,b) trapz(y,f(0,y).*f(a,y).*f(b,y))/2/pi;
where a,b are in [0,4]. The key idea is that we can save memory using the code above, especially when N is very large. But the execution time is still an issue because of the nested loops. See the figure below for N=2^2:
This is not a full answer, but some possibly helpful hints:
0) The trivial: Are you sure you need numerics? Can't you do the computation analytically?
1) Do not use function handles:
function [ f ] = f(x,y)
f= 1+1.6*(1-acos(cos(0.5*pi*x+y))/pi)-0.8;
end
2) Simplify analytically: acos(cos(x)) is the same as abs(mod(x + pi, 2 * pi) - pi), which should compute slightly faster. Or, instead of sampling and then numerically integrating, first integrate analytically and sample the result.
3) The FFT is a very efficient algorithm to compute the full DFT, but you don't need the full DFT. Since you only want the central 3 x 3 coefficients, it might be more efficient to directly apply the DFT definition and evaluate the formula only for those coefficients that you want. That should be both fast and memory-efficient.
4) If you repeatedly do this computation, it might be helpful to precompute DFT coefficients. Here, dftmtx from the Signal Processing toolbox can assist.
5) To get rid of the loops, think about the problem not in the form of computation instructions, but a single matrix operation. If you consider your input N x N matrix as a vector with N² elements, and your output 3 x 3 matrix as a 9-element vector, then the whole operation you apply (numerical integration via trapz and DFT via fft) appears to be a simple linear transform, which it should be possible to express as an N² x 9 matrix.
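Hints 2 and 3 can be sketched as follows. This is plain Python used as a neutral illustration, not a drop-in replacement for the MATLAB code: acos_cos, dft_bin, central_bins, and the 8-sample test signal are all assumptions introduced here.

```python
import cmath
import math

def acos_cos(x):
    """Hint 2: acos(cos(x)) rewritten as abs(mod(x + pi, 2*pi) - pi)."""
    return abs((x + math.pi) % (2 * math.pi) - math.pi)

def dft_bin(samples, k):
    """Hint 3: a single DFT coefficient evaluated from the definition,
    sum over n of x[n] * exp(-2*pi*1j*k*n/N)."""
    N = len(samples)
    return sum(x * cmath.exp(-2j * cmath.pi * k * n / N)
               for n, x in enumerate(samples))

def central_bins(samples):
    """The three coefficients around DC (k = -1, 0, 1), i.e. the ones a
    centered l-1:l+1 slice of an fftshift-ed spectrum would pick."""
    return [dft_bin(samples, k) for k in (-1, 0, 1)]

# One full cosine period over 8 samples: the energy sits in bins k = -1 and k = +1.
samples = [math.cos(2 * math.pi * n / 8) for n in range(8)]
bins = central_bins(samples)
```

Evaluating three bins directly costs O(N) each, so for a 3-by-3 output this avoids computing and storing the full spectrum; whether it beats the FFT in practice should be measured.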

Indices of zero ranges in a zero-one matrix

I am using MATLAB for one of my projects and have been stuck at one point for some time now. I tried searching on Google, but without much success.
I have an array of 0s and 1s. Something like:
A = [0,0,0,1,1,1,1,1,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,0,0,0,0];
I want to extract an array of indices: [x_1, x_2, x_3, x_4, x_5, ..]
Such that x_1 is the index of start of first range of zeros. x_2 is the index of end of first range of zeros.
x_3 is the index of start of second range of zeros. x_4 is the index of end of second range of zeros.
For the above example:
x_1 = 1, x_2 = 3
x_3 = 9, x_4 = 10
and so on.
Of course, I can do it by writing a simple loop. I am wondering if there is a more elegant (vectorized) way to solve this problem. I was thinking about something like a prefix sum, but no luck as of now.
Thanks,
Anil.
The diff function is great for this sort of stuff and pretty quick.
temp = diff(A);
Starts = find([A(1) == 0, temp==-1]);
Ends = find([temp == 1,A(end)==0])
Edit: Fixed the error in the Ends calculation caught by gnovice.
Zeros not preceded by other zeros: A==0 & [true A(1:(end-1))~=0]
Zeros not followed by other zeros: A==0 & [A(2:end)~=0 true]
Use each of these plus find to get starts and ends of runs of zeros. Then, if you really want them in a single vector as you described, interleave them.
If you want to get your results in a single vector like you described above (i.e. x = [x_1 x_2 x_3 x_4 x_5 ...]), then you can perform a second-order difference using the function DIFF and find the points greater than 0:
x = find(diff([1 A 1],2) > 0);
EDIT:
The above will work for the case when there are at least 2 zeroes in every string of zeroes. If you will have single zeroes appearing in A, the above can be modified to handle them like so:
diffA = diff([1 A 1],2);
[~,x] = find([diffA > 0; diffA == 2]);
In this case, a single zero value will create repeated indices in x (i.e. if A starts with a single zero, then x(1) and x(2) will both be 1).
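The "zero not preceded by a zero" and "zero not followed by a zero" conditions from the answers can be checked against the example with a minimal sketch. It is plain Python used as a neutral illustration and returns 1-based indices to match the question; zero_runs is a hypothetical helper name.

```python
def zero_runs(a):
    """Return 1-based (start, end) pairs for each maximal run of zeros in a."""
    n = len(a)
    # A zero starts a run if it is the first element or follows a nonzero.
    starts = [k + 1 for k in range(n)
              if a[k] == 0 and (k == 0 or a[k - 1] != 0)]
    # A zero ends a run if it is the last element or precedes a nonzero.
    ends = [k + 1 for k in range(n)
            if a[k] == 0 and (k == n - 1 or a[k + 1] != 0)]
    return list(zip(starts, ends))

A = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

runs = zero_runs(A)
# Interleave into the single vector [x_1, x_2, x_3, x_4, ...] from the question.
interleaved = [idx for pair in runs for idx in pair]
```

A run of length one yields a pair with equal start and end, which matches the repeated indices described for the single-zero case.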
