Fastest way to multiply arrays of matrices in Python (numpy)

I have two arrays of 2-by-2 complex matrices, and I was wondering what would be the fastest method of multiplying them. (I want to do matrix multiplication on the elements of the matrix arrays.) At present, I have
numpy.array(map(lambda i: numpy.dot(m1[i], m2[i]), range(l)))
But can one do better than this?
Thanks,
v923z

numpy.einsum is the optimal solution for this problem, and it is mentioned way down toward the bottom of DaveP's reference. The code is clean, very easy to understand, and an order of magnitude faster than looping through the array and doing the multiplication one by one. Here is some sample code:
import numpy
l = 100
m1 = numpy.random.rand(l,2,2)
m2 = numpy.random.rand(l,2,2)
m3 = numpy.array(map(lambda i: numpy.dot(m1[i], m2[i]), range(l)))
m3e = numpy.einsum('lij,ljk->lik', m1, m2)
%timeit numpy.array(map(lambda i: numpy.dot(m1[i], m2[i]), range(l)))
%timeit numpy.einsum('lij,ljk->lik', m1, m2)
print numpy.all(m3 == m3e)
Here are the return values when run in an ipython notebook:
1000 loops, best of 3: 479 µs per loop
10000 loops, best of 3: 48.9 µs per loop
True
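As a side note for newer NumPy (1.10+): numpy.matmul, and the @ operator on Python 3.5+, broadcasts matrix multiplication over the leading dimensions, so it handles this stacked case directly without a subscript string. A quick sketch:

```python
import numpy as np

l = 100
m1 = np.random.rand(l, 2, 2)
m2 = np.random.rand(l, 2, 2)

# matmul treats the last two axes as matrices and broadcasts over the rest
m3m = np.matmul(m1, m2)        # equivalently: m1 @ m2
m3e = np.einsum('lij,ljk->lik', m1, m2)

print(np.allclose(m3m, m3e))   # -> True
```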

I think the answer you are looking for is here. Unfortunately it is a rather messy solution involving reshaping.

If m1 and m2 are 1-dimensional arrays of 2x2 complex matrices, then they essentially have shape (l,2,2). So matrix multiplication on the last two axes is equivalent to summing the product of the last axis of m1 with the second-to-last axis of m2. That's exactly what np.dot does:
np.dot(m1,m2)
Or, since you have complex matrices, perhaps you want to take the complex conjugate of m1 first. In that case, use np.vdot.
PS. If m1 is a list of 2x2 complex matrices, then perhaps see if you can rearrange your code to make m1 an array of shape (l,2,2) from the outset.
If that is not possible, a list comprehension
[np.dot(m1[i],m2[i]) for i in range(l)]
will be faster than using map with lambda, but performing l np.dots is going to be slower than doing one np.dot on two arrays of shape (l,2,2) as suggested above.

If m1 and m2 are 1-dimensional arrays of 2x2 complex matrices, then they essentially have shape (l,2,2). So matrix multiplication on the last two axes is equivalent to summing the product of the last axis of m1 with the second-to-last axis of m2. That's exactly what np.dot does:
But that is not what np.dot does.
a = numpy.array([numpy.diag([1, 2]), numpy.diag([2, 3]), numpy.diag([3, 4])])
produces a (3,2,2) array of 2-by-2 matrices. However, numpy.dot(a,a) computes all nine pairwise products, and the result's shape is (3, 2, 3, 2). That is not what I need. What I need is an array holding numpy.dot(a[0],a[0]), numpy.dot(a[1],a[1]), numpy.dot(a[2],a[2]) ...
[np.dot(m1[i],m2[i]) for i in range(l)]
should work, but I haven't yet checked whether it is faster than the map with the lambda expression.
Cheers,
v923z
EDIT: the for loop and the map run at about the same speed. It is the casting to numpy.array that consumes a lot of time, but that would have to be done for both methods, so there is no gain here.

Maybe it is too old a question, but I was still searching for an answer.
I tried this code
import numpy as np
a = np.asarray(range(1048576), dtype='complex')
b = np.reshape(a // 1024, (1024, 1024))
b = b + 1J*b
%timeit c=np.dot(b,b)
%timeit d=np.einsum('ij, ki -> jk', b,b).T
The results are: for 'dot'
10 loops, best of 3: 174 ms per loop
for 'einsum'
1 loops, best of 3: 4.51 s per loop
I have checked that c and d are the same:
(c==d).all()
True
So 'dot' is still the winner; I am still searching for a better method, but with no success.
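One caveat about this benchmark: it times a single large 1024x1024 product, where np.dot calls an optimized BLAS routine while the plain einsum path does not. Newer NumPy (1.12+) accepts an optimize argument that lets einsum reroute such contractions through BLAS-backed code. A small sketch (smaller matrices and real dtype for brevity; names mine):

```python
import numpy as np

b = np.arange(65536, dtype=float).reshape(256, 256)

c = np.dot(b, b)
# optimize=True (NumPy >= 1.12) lets einsum dispatch large contractions
# to BLAS-backed routines where possible, closing much of the gap above
d = np.einsum('ij,jk->ik', b, b, optimize=True)

print(np.allclose(c, d))   # -> True
```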

Related

Broadcast function that changes dimension of the input array

Given some function f that accepts a 1D array and returns a 2D array, is it possible to apply it efficiently to each row of the NxM array A?
More specifically, I want to apply np.triu to each row of the NxM array A and then concatenate all the results. I can achieve this by
B = np.dstack(map(np.triu, A))
which gives an MxMxN array. However, this is not very efficient for large N. Unfortunately, the function np.apply_along_axis cannot be employed here because f changes the dimension.
Knowing the power of NumPy for efficient broadcasting, I am almost sure that there exists a better solution for my problem.
Here's a vectorized approach using broadcasting -
Bout = A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
Runtime test and output verification -
In [319]: A = np.random.randint(0,20,(400,100))
In [320]: %timeit np.dstack(map(np.triu, A))
10 loops, best of 3: 69.9 ms per loop
In [321]: %timeit A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
10 loops, best of 3: 24.8 ms per loop
In [322]: B = np.dstack(map(np.triu, A))
In [323]: Bout = A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
In [324]: np.allclose(B,Bout)
Out[324]: True
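A quick self-check that the broadcast expression matches the per-row np.triu stacking (note that on Python 3 the map call needs wrapping in list, or a comprehension as below):

```python
import numpy as np

A = np.random.randint(0, 20, (400, 100))

# loop version: np.triu of a 1-D row broadcasts it into an upper-triangular matrix
B = np.dstack([np.triu(row) for row in A])                    # shape (100, 100, 400)

# vectorized version: one multiply by an upper-triangular boolean mask
Bout = A.T * (np.tri(A.shape[1], dtype=bool).T[..., None])    # shape (100, 100, 400)

print(np.allclose(B, Bout))   # -> True
```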

Conditional Sum in Array

I have 2 arrays, A and B. I want to form a new array C with same dimension as B where each element will show SUM(A) for A > B
Below is my working code
A = [1:1:1000]
B=[1:1:100]
for n = 1:numel(B)
C(n) = sum(A(A>B(n)));
end
However, when A has millions of rows and B has thousands, and I have to do similar calculations for 20 array pairs, it takes an insane amount of time.
Is there any faster way?
For example, histcounts is pretty fast, but it counts, rather than summing.
Thanks
Depending on the size of your arrays (and your memory limitations), the following code might be slightly faster:
C = A*bsxfun(@gt,A',B);
Though it's vectorized, it seems to be bottlenecked (perhaps) by the allocation of memory. I'm looking to see if I can get a further speedup. Depending on your input vector size, I've seen up to a factor of 2 speedup for large vectors.
Here's a method that is a bit quicker, but I'm sure there is a better way to solve this problem.
a=sort(A); %// If A and B are already sorted then this isn't necessary!
b=sort(B);
c(numel(B))=0; %// Initialise c
s=cumsum(a,2,'reverse'); %// Get the partial sums of a
for n=1:numel(B)
%// Pull out the sum for elements in a larger than b(n)
c(n)=s(find(a>b(n),1,'first'));
end
According to some very rough tests, this seems to run a bit better than twice as fast as the original method.
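For readers coming from the NumPy questions above: the same sort + reverse-cumsum idea translates almost line for line, with np.searchsorted playing the role of find(a>b(n),1,'first') and removing the loop entirely (a sketch; variable names are mine):

```python
import numpy as np

A = np.arange(1, 1001)   # MATLAB's 1:1000
B = np.arange(1, 101)    # MATLAB's 1:100

a = np.sort(A)
# suffix sums: s[i] = sum of a[i:], the analogue of cumsum(a, 'reverse')
s = np.cumsum(a[::-1])[::-1]

# index of the first element of a strictly greater than each b
idx = np.searchsorted(a, B, side='right')

# where no element of a exceeds b, the sum is 0
C = np.where(idx < len(a), s[np.minimum(idx, len(a) - 1)], 0)

print(C[:3])   # -> [500499 500497 500494]
```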
You had the right idea with histcounts, as you are basically "accumulating" certain A elements based on binning. This binning operation could be done with histc. Listed in this post is a solution that starts off with similar steps as listed in @David's answer and then uses histc to bin and sum up selective elements from A to get the desired output, all in a vectorized manner. Here's the implementation -
%// Sort A and B and also get sorted B indices
sA = sort(A);
[sB,sortedB_idx] = sort(B);
[~,bin] = histc(sB,sA); %// Bin sorted B onto sorted A
C_out = zeros(1,numel(B)); %// Setup output array
%// Take care of the case when all elements in B are greater than A
if sA(1) > sB(end)
C_out(:) = sum(A);
end
%// Only do further processing if there is at least one element in B > any element in A
if any(bin)
csA = cumsum(sA,'reverse'); %// Reverse cumsum on sorted A
%// Get sum(A(A>B(n))) for every n, but for sorted versions
valid_mask = cummax(bin) - bin ==0;
valid_mask2 = bin(valid_mask)+1 <= numel(A);
valid_mask(1:numel(valid_mask2)) = valid_mask2;
C_out(valid_mask) = csA(bin(valid_mask)+1);
%// Rearrange C_out to get back in original unsorted version
[~,idx] = sort(sortedB_idx);
C_out = C_out(idx);
end
Also, when comparing the result from this method against the original for-loop version, remember that there will be slight variations in output. This vectorized solution uses cumsum, which computes a running summation, so comparatively small individual elements get added to very large cumulative sums, whereas the for-loop version sums only the selected elements. Floating-point precision issues therefore come up in the comparison.

Apply an R function over multiple arrays, returning an array of the same size

I have two arrays of 2x2 matrices, and I'd like to apply a function over each pair of 2x2 matrices. Here's a minimal example, multiplying each matrix in A by its corresponding matrix in B:
A <- array(1:20, c(5,2,2))
B <- array(1:20, c(5,2,2))
n <- nrow(A)
# Desired output: array with dimension 5x2x2 that contains
# the product of each pair of 2x2 matrices in A and B.
C <- aperm(sapply(1:n, function(i) A[i,,]%*%B[i,,], simplify="array"), c(3,1,2))
This takes two arrays, each with 5 2x2 matrices, and multiplies each pair of 2x2 matrices together, with the desired result in C.
My current code is this ugly last line, using sapply to loop through the first array dimension and pull out each 2x2 matrix separately from A and B. And then I need to permute the array dimensions with aperm() in order to have the same ordering as the original arrays (sapply(...,simplify="array") indexes each 2x2 matrix using the third dimension rather than the first one).
Is there a nicer way to do this? I hate that ugly function(i) in there, which is really just a way of faking a for loop. And the aperm() call makes this much less readable. What I have now works fine; I'm just searching for something that feels more like idiomatic R.
mapply() will take multiple lists or vectors, but it doesn't seem to work with arrays. aaply() from plyr is also close, but it doesn't take multiple inputs. The closest I've come is to use abind() with aaply() to pack A and B into one array and work with 2 matrices at once, but this doesn't quite work (it only gets the first two entries; somewhere my indexing is off):
aaply(.data=abind(A,B,along=0), 1, function(ab) ab[1,,]%*%ab[2,,])
And this isn't exactly cleaner or clearer anyway!
I've tried to make this a minimal example, but my real use case requires a more complicated function of the matrix pairs (and I'd also love to scale this up to more than two arrays), so I'm looking for something that will generalize and scale.
This is a working solution using abind and aaply:
D <- aaply(abind(A, B, along = 4), 1, function(x) x[,,1] %*% x[,,2])
Sometimes a for loop is the easiest to follow. It also generalizes and scales:
n <- nrow(A)
C <- A
for(i in 1:n) C[i,,] <- A[i,,] %*% B[i,,]
R's infrastructure for lists is much better (it seems) than for arrays, so I could also approach it by converting the arrays into lists of matrices like this:
A <- alply(A, 1, function(a) matrix(a, ncol=2, nrow=2))
B <- alply(B, 1, function(b) matrix(b, ncol=2, nrow=2))
mapply(function(a,b) a%*%b, A, B, SIMPLIFY=FALSE)
I think this is more straightforward than what I have above, but I'd still love to hear better ideas.

Vectors operations in matlab

How do I subtract each element of a row vector of size 1xN from a column vector of size Mx1, without using a loop, in MATLAB?
N = 1:100
M = ones(1000,1)
You can use bsxfun as suggested by Daniel
out = bsxfun(@minus, N, M);
but it might be more obvious to use meshgrid or ndgrid to get the matrix you want:
out = meshgrid(N-1,M);
(Note that this shortcut only works because M is all ones here, so every row of the result is simply N-1.)
These two functions internally use repmat, which is slower than bsxfun, so rather go for the first approach. And bsxfun is always the fastest solution anyway ;)

Cross products of elements of 3D array and matrix columns without loop in R

I'm working on a fishery stock assessment model and want to speed it up by removing a loop (actually two loops of the same form).
I have an array, A, dim(A)=[L,L,Y], and a matrix, M, dim(M)=[L,Y].
These are used to make a matrix, mat, dim(mat)=[L,Y], by calculating matrix products. My loop looks like:
for (i in 1:Y) {
  mat[, i] <- (A[, , i] %*% M[, i])[, 1]
}
Can anyone help me out? I really need a speed gain.
Also, (don't know if it'll make a difference but) each A[,,i] matrix is lower triangular.
I'm pretty sure this will give you the results you want. Since there is no reproducible example, I can't be absolutely sure. Had to trace some of the linear algebra logic to see what you are trying to accomplish.
library(plyr) # We need this to split the array into a list of 9 matrices
B = lapply(alply(A, 3), function(x) (x%*%M)) # Perform 9 linear algebra multiplications
sapply(1:9, function(i) (B[[i]])[,i]) # Extract the 9 columns you actually want.
I used the following test data:
A = array(rnorm(225), dim = c(5,5,9))
M = matrix(rnorm(45), nrow = 5, ncol = 9)
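For comparison with the NumPy threads above: the same loop collapses to a single einsum call, since the computation is mat[j,i] = sum_k A[j,k,i] * M[k,i] (a sketch; dimension labels and variable names are mine):

```python
import numpy as np

L, Y = 5, 9
A = np.random.randn(L, L, Y)
M = np.random.randn(L, Y)

# loop version, mirroring mat[,i] <- A[,,i] %*% M[,i]
mat_loop = np.stack([A[:, :, i] @ M[:, i] for i in range(Y)], axis=1)

# single call: mat[j,i] = sum_k A[j,k,i] * M[k,i]
mat = np.einsum('jki,ki->ji', A, M)

print(np.allclose(mat, mat_loop))   # -> True
```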
