Parallel vectorized reduction of subarrays in array without explicit iteration - arrays

I have a linear array arr which is is the combination of the elements of a number of smaller sub-arrays. The position of each sub-array in the large array is given by an offsets array.
For example
If arr = [1,2,4,3,6,4,8,9,12,3] and offsets = [0,2,4,9], then the sub-arrays are [1,2,4], [3,6] and [4,8,9,12,3].
Now, given the array arr and offsets, I would like to perform a parallel reduction on each subarray. Let's assume that we want to find the max() of each subarray. So, the result, in our example case, will be [4,6,12]. The important constraint in the case is that the operations should be strictly vectorized, and no explicit iteration in loops is allowed. For instance, in python, something like this isn't allowed:
import numpy
arr = numpy.array([1,2,4,3,6,4,8,9,12,3])
offsets = numpy.array([0,2,4,9])
max_array = numpy.empty(num_sub_arrays)
for i in range(num_sub_arrays): # Eliminate this loop!
max_array[i] = numpy.max(arr[offsets[i]:offsets[i+1]])
How can I go about achieving this? Also, please note that the code should be as general as possible, as I plan on implementing it across multiple hardware (CUDA, AVX etc).
Any implementable hints regarding it will be highly helpful!

Related

Pre-allocation of array of array

In julia, one can pre-allocate an array of a given type and dims with
A = Array{<type>}(undef,<dims>)
example for a 10x10 matrix of floats
A = Array{Float64,2}(undef,10,10)
However, for array of array pre-allocation, it does not seem to be possible to provide a pre-allocation for the underlying arrays.
For instance, if I want to initialize a vector of n matrices of complex floats I can only figure this syntax,
A = Vector{Array{ComplexF64,2}}(undef, n)
but how could I preallocate the size of each Array in the vector, except with a loop afterwards ? I tried e.g.
A = Vector{Array{ComplexF64,2}(undef,10,10)}(undef, n)
which obviously does not work.
Remember that "allocate" means "give me a contiguous chunk of memory, of size exactly blah". For an array of arrays, which is really a contiguous chunk of pointers to other contiguous chunks, this doesn't really make sense in general as a combined operation -- the latter chunks might just totally differ.
However, by stating your problem, you make clear that you actually have more structural information: you know that you have n 10x10 arrays. This really is a 3D array, conceptually:
A = Array{Float64}(undef, n, 10, 10)
At that point, you can just take slices, or better: views along the first axis, if you need an array of them:
[#view A[i, :, :] for i in axes(A, 1)]
This is a length n array of AbstractArrays that in all respects behave like the individual 10x10 arrays you wanted.
In the cases like you have described you need to use comprehension:
a = [Matrix{ComplexF64}(undef, 2,3) for _ in 1:4]
This allocates a Vector of Arrays. In Julia's comprehension you can iterate over more dimensions so higher dimensionality is also available.

Vectorizing access to a slice of a three-dimensional matrix in MATLAB

I have a three-dimensional matrix of these sizes, approximately
A = rand(20, 1000, 20);
where the first and third dimensions are always the same length. I want to zero the elements in a main diagonal slice. This does what I mean
for ii = 1:size(A, 1)
A(ii, :, ii) = 0;
end
Is there a vectorized or otherwise faster way to do this? This code runs about 100,000 times, with these approximate sizes, but not the exact same sizes each time.
You can use logical indexing for multible tailing dimensions while using subscript indexing for all previous dimensions individually. This way you can easily do the operation on an 1000 20 20 matrix. To apply this to your matrix, permute is required which might be slow:
n=size(A,3)
A=permute(A,[2,1,3]);
A(:,diag(true(n,1)))=0;
A=permute(A,[2,1,3]);
If it would be possible to permanently swap the dimensions of A in your code and avoid the permute, this would lead to the fastest solution.
Alternatively you can use repmat to expand the index to the dimensions of A
ix=repmat(reshape(diag(true(n,1)),n,1,n),[1,size(A,2),1])
A(ix)=0
For matrices of the same size you could keep ix. Not having access to MATLAB right now, I don't know which solution is faster.
You can use bsxfun to build a linear index of the elements to be zeroed:
ind = bsxfun(#plus, (0:size(A,2)-1).'*size(A,1), 1:size(A,1)*size(A,2)+1:numel(A) );
A(ind) = 0;

Algorithm complexity in 2D arrays

Let's suppose that I have an array M of n*m elements, so if I want to print its elements I can do something like:
for i=1 to m
for j=1 to n
print m[i,j]
next j
next i
I know that the print instruction is done in constant time, so in this case I would have an algorithm complexity of:
\sum_{i=1}^{m}\sum_{j=1}^{n}c=m.n.c
so I suppose is in the order of O(n)
But what happens if the array has the same number of rows and columns, I suppose the complexity is:
\sum_{i=1}^{n}\sum_{j=1}^{n}c=n.n.c
so it is of order O(n^{2})
are my assumptions correct?
I'm assuming that m and n are variables and not constants. In that case, the runtime of the algorithm should be O(mn), not O(n), since the runtime is directly proportional to the number of elements in the array. You derived this with a summation, but it might be easier to see by just looking at how much work is done per array element. Given this, you're correct that if m = n, the runtime is quadratic on n.
Hope this helps!

how to make matlab loop over 2d array faster

I have the above loop running on the above variables:
A is a 2d array of size mxn.
mask is a 1d logical array of size 1xn
results is a 1d array of size 1xn
B is a vector of the form mx1
C is a mxm matrix, m is the same as the above.
Edit: expanded foo(x) into the function.
here is the code:
temp = (B.'*C*B);
for k = 1:n
x = A(:,k);
if(mask(k) == 1)
result(k) = (B.'*C*x)^2 / (temp*(x.'*C*x)); %returns scalar
end
end
take note, I am already successfully using the above code as a parfor loop instead of for. I was hoping you would be able to suggest some way to use meshgrid or the sort to yield better performance improvement. I don't think I have RAM problems so a solution can also be expensive memory wise.
Many thanks.
try this:
result=(B.'*C*A).^2./diag(temp*(A.'*C*A))'.*mask;
This vectorization via matrix multiplication will also make sure that result is a 1xn vector. In the code you provided there can be a case where the last elements in mask are zeros, in this case your code will truncate result to a smaller length, whereas, in the answer it'll keep these elements zero.
If your foo admits matrix input, you could do:
result = zeros(1,n); % preallocate result with zeros
mask = logical(mask); % make mask logical type
result(mask) = foo(A(mask),:); % compute foo for all selected columns

How to "invert" an array in linear time functionally rather than procedurally?

Say I have an array of integers A such that A[i] = j, and I want to "invert it"; that is, to create another array of integers B such that B[j] = i.
This is trivial to do procedurally in linear time in any language; here's a Python example:
def invert_procedurally(A):
B = [None] * (max(A) + 1)
for i, j in enumerate(A):
B[j] = i
return B
However, is there any way to do this functionally (as in functional programming, using map, reduce, or functions like those) in linear time?
The code might look something like this:
def invert_functionally(A):
# We can't modify variables in FP; we can only return a value
return map(???, A) # What goes here?
If this is not possible, what is the best (most efficient) alternative when doing functional programming?
In this context are arrays mutable or immutable? Generally I'd expect the mutable case to be about as straightforward as your Python implementation, perhaps aside from a few wrinkles with types. I'll assume you're more interested in the immutable scenario.
This operation inverts the indices and elements, so it's also important to know something about what constitutes valid array indices and impose those same constraints on the elements. Haskell has a class for index constraints called Ix. Any Ix type is ordered and has a range implementation to make an ordered list of indices ranging from one specified index to another. I think this Haskell implementation does what you want.
import Data.Array.IArray
invertArray :: (Ix x) => Array x x -> Array x x
invertArray arr = listArray (low,high) newElems
where oldElems = elems arr
newElems = indices arr
low = minimum oldElems
high = maximum oldElems
Under the hood listArray uses zipWith and range to associate indices in the specified range to the listed elements. That part ought to be linear time, and so is the one-time operation of extracting elements and indices from an array.
Whenever the sets of the input arrays indices and elements differ some elements will be undefined, which for better or worse blow up faster than Python's None. I believe you could overcome the undefined issue by implementing new Ix a instances over the Maybe monad, for instance.
Quick side-note: check out the invPerm example in the Haskell 98 Library Report. It does something similar to invertArray, but assumes up front that input array's elements are a permutation of its indices.
A solution needing mapand 3 operations:
toTuples views an the array as a list of tuples (i,e) where i is the index and e the element in the array at that index.
fromTuples creates and loads an array from a list of tuples.
swap which takes a tuple (a,b) and returns (b,a)
Hence the solution would be (in Haskellish notation):
invert = fromTuples . map swap . toTuples

Resources