Given some function f that accepts a 1D array and returns a 2D array, is it possible to apply it efficiently to each row of an NxM array A?
More specifically, I want to apply np.triu to each row of the NxM array A and then concatenate all the results. I can achieve this with
B = np.dstack(list(map(np.triu, A)))  # wrap in list(); newer NumPy rejects bare iterators here
which gives an MxMxN array. However, this is not very efficient for large N. Unfortunately, np.apply_along_axis cannot be employed here because f changes the dimensionality.
Knowing the power of NumPy for efficient broadcasting, I am almost sure that there exists a better solution for my problem.
Here's a vectorized approach using broadcasting -
Bout = A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
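To see why this is equivalent: A.T has shape (M, N), while the upper-triangular mask with a trailing singleton axis has shape (M, M, 1), so the product broadcasts to (M, M, N). A small sanity check, using the same sizes as the timing test below:
import numpy as np

N, M = 400, 100
A = np.random.randint(0, 20, (N, M))

mask = np.tri(M, dtype=bool).T            # (M, M) upper-triangular mask
Bout = A.T * mask[..., None]              # (M, N) * (M, M, 1) -> (M, M, N)

B = np.dstack(list(map(np.triu, A)))      # the original per-row approach
print(Bout.shape, np.allclose(B, Bout))   # (100, 100, 400) True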
Runtime test and output verification -
In [319]: A = np.random.randint(0,20,(400,100))
In [320]: %timeit np.dstack(map(np.triu, A))
10 loops, best of 3: 69.9 ms per loop
In [321]: %timeit A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
10 loops, best of 3: 24.8 ms per loop
In [322]: B = np.dstack(map(np.triu, A))
In [323]: Bout = A.T*(np.tri(A.shape[1],dtype=bool).T[...,None])
In [324]: np.allclose(B,Bout)
Out[324]: True
I have an n x p matrix that looks like this:
n <- 100
p <- 10
x <- matrix(sample(c(0,1), size = p*n, replace = TRUE), n, p)
I want to create an n x p x p array A whose kth item along the 1st dimension is a p x p diagonal matrix containing the elements of x[k,]. What is the most efficient way to do this in R? I'm looking for a way that uses outer (or some other vectorized approach) rather than one of the apply functions.
Solution using lapply:
A <- aperm(simplify2array(lapply(1:nrow(x), function(i) diag(x[i,]))), c(3,2,1))
I'm looking for something more efficient than this.
Thanks.
As a starting point, here is a humble for loop method with pre-allocation of the array.
# pre-allocate array of desired size
myArray <- array(0, dim=c(ncol(x), ncol(x), nrow(x)))
# fill in array
for(i in seq_len(nrow(x))) myArray[,,i] <- diag(x[i,])
It should run relatively fast. On my machine, for a 1000 X 100 matrix, the lapply method took 0.87 seconds, while the for loop (including the array pre-allocation) took 0.25 seconds to transform the matrix into your desired array. So the for loop was about 3.5 times faster.
Transpose your original matrix
Note also that row operations on R matrices tend to be slower than column operations, because matrices are stored in memory by column. If you transpose your matrix and perform the operation column-wise, the time to complete the operation on the 100 X 1000 matrix drops to 0.14 seconds, about half that of the first for loop and roughly six times faster than the lapply method.
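A minimal sketch of that transposed variant, reusing x from the question; each iteration now reads one column of the transposed matrix, which is the cache-friendly access pattern:
xt <- t(x)  # p x n: column i holds row i of x
myArray <- array(0, dim=c(ncol(x), ncol(x), nrow(x)))
for(i in seq_len(nrow(x))) myArray[,,i] <- diag(xt[,i])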
How do I subtract each element of a 1xN row vector from an Mx1 column vector without using a loop in MATLAB?
N = 1:100
M = ones(1000,1)
You can use bsxfun, as suggested by Daniel:
out = bsxfun(@minus, N, M);
but it might be more obvious to use meshgrid or ndgrid to get the matrix you want:
out = meshgrid(N-1, M);
(The N-1 works here because M is all ones, so N(j) - M(i) is simply N(j) - 1 in every row.)
Both of these functions internally use repmat, which is slower than bsxfun, so the first approach is preferable. And bsxfun is almost always the fastest solution anyway ;)
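For what it's worth, on MATLAB R2016b and newer, implicit expansion performs this broadcasting directly, with no helper function; a minimal sketch:
N = 1:100;           % 1x100 row vector
M = ones(1000, 1);   % 1000x1 column vector
out = N - M;         % implicit expansion: 1000x100, out(i,j) = N(j) - M(i)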
Does there exist a function similar to NumPy's * operator for two arrays, multiplying their elements element-wise and returning an array of the same type?
For example:
# Let's define:
a = [0,1,2,3]
b = [1,2,3,4]
d = [[1,2] , [3,4], [5,6]]
e = [3,4,5]
# I want:
a * 2 == [2*0, 1*2, 2*2, 2*3]
a * b == [0*1, 1*2, 2*3, 3*4]
d * e == [[1*3, 2*3], [3*4, 4*4], [5*5, 6*5]]
d * d == [[1*1, 2*2], [3*3, 4*4], [5*5, 6*6]]
Note that * is NOT regular matrix multiplication; it is element-wise multiplication.
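For reference, here is the NumPy behaviour being described. Note that for the third case NumPy itself needs an explicit extra axis on e, since d's rows and e have different lengths:
import numpy as np

a = np.array([0, 1, 2, 3])
b = np.array([1, 2, 3, 4])
d = np.array([[1, 2], [3, 4], [5, 6]])
e = np.array([3, 4, 5])

print(a * 2)           # [0 2 4 6]
print(a * b)           # [0 2 6 12]
print(d * e[:, None])  # e scales the rows: [[3 6], [12 16], [25 30]]
print(d * d)           # [[1 4], [9 16], [25 36]]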
My current best solution is to write some C code which does this and import a compiled DLL.
There must exist a better solution.
EDIT:
Using LabVIEW 2011 - Needs to be fast.
The first two multiplications can be done by using the 'multiply' primitive. Make sure the arrays in the second case are of the same length.
For the third multiplication you can use a for loop (with auto-indexing). This is needed because you have to tell LabVIEW which dimension to auto-index over.
The last multiplication can (again) be done using the multiply primitive.
My result is different (opposite) from the previous poster's. I generated a 4x1000 array of random numbers (magnitude 1000) which I multiplied by a 4x4 array of integers (1,2,3,4,...). I did this 100,000 times using the matrix multiplication VI and also using for loops to perform the operation on the arrays. I'm seeing times on the order of 0.328s for the matrix VIs and 0.051s for the for loops. Using a compiled DLL may be faster than LabVIEW, but this does not seem to be true for the built-in functions.
This is certainly not what I expected, but it is consistent over many cycles. The VI uses the standard execution thread. All data types are set before the timed operations - no coercion takes place in the loops. The operations are performed separately, staged by a flat sequence structure, as is the time measurement. Parallelism is turned off.
Searching around here, one finds many questions about how to convert cell arrays of doubles into one big matrix.
In my application I have a two-dimensional cell array (let's call it celldata, of size m times n) whose entries are all double matrices of the same size (let's say a times b).
I want to convert that data structure into one big 4D double array (m times n times a times b).
At the moment I do that by
reshape(cat(3,celldata{:}),m,n,a,b)
but maybe there are other methods doing that directly? Maybe with a call like
cat([3 4],celldata{:,:})
or similar.
I think
cell2mat(permute(celldata, [3 4 1 2]))
will do the trick. However,
%// create some bogus data
m = 1.1e2;
n = 1.2e2;
a = 1.3e2;
b = 1.4e2;
celldata = cellfun(@(~) randi(10, a, b, 'uint8'), cell(m,n), 'UniformOutput', false);
%// new method
tic
cell2mat(permute(celldata, [3 4 1 2]));
toc
%// your current method
tic
reshape(cat(3,celldata{:}),m,n,a,b);
toc
Results:
Elapsed time is 1.745495 seconds. % cell2mat/permute
Elapsed time is 0.305368 seconds. % reshape/cat
cell2mat is a MATLAB m-file (with unavoidable inefficiencies in its loop due to compatibility checks), while reshape and cat are built-ins. That is where the difference comes from.
I'd stick with your current method :)
Now, I'm asking why you'd want to do this conversion in the first place. Is it an indexing problem? Because
celldata{x,y}(w,z)
saves you from having to do the conversion at all; it gives you the same element you would address as
converted_celldata(x,y,w,z)
I don't see other reasons, because matrix/vector operations don't work on 4D arrays anyway...
I have two arrays of 2-by-2 complex matrices, and I was wondering what would be the fastest method of multiplying them. (I want to do matrix multiplication on the elements of the matrix arrays.) At present, I have
numpy.array(list(map(lambda i: numpy.dot(m1[i], m2[i]), range(l))))
But can one do better than this?
Thanks,
v923z
numpy.einsum is the optimal solution for this problem, and it is mentioned way down toward the bottom of DaveP's reference. The code is clean, very easy to understand, and an order of magnitude faster than looping through the array and doing the multiplication one by one. Here is some sample code:
import numpy
from numpy.random import rand

l = 100
m1 = rand(l, 2, 2)
m2 = rand(l, 2, 2)
m3 = numpy.array(list(map(lambda i: numpy.dot(m1[i], m2[i]), range(l))))
m3e = numpy.einsum('lij,ljk->lik', m1, m2)
%timeit numpy.array(list(map(lambda i: numpy.dot(m1[i], m2[i]), range(l))))
%timeit numpy.einsum('lij,ljk->lik', m1, m2)
print(numpy.all(m3 == m3e))
Here are the return values when run in an ipython notebook:
1000 loops, best of 3: 479 µs per loop
10000 loops, best of 3: 48.9 µs per loop
True
I think the answer you are looking for is here. Unfortunately it is a rather messy solution involving reshaping.
If m1 and m2 are 1-dimensional arrays of 2x2 complex matrices, then they essentially have shape (l,2,2). So matrix multiplication on the last two axes is equivalent to summing the product of the last axis of m1 with the second-to-last axis of m2. That's exactly what np.dot does:
np.dot(m1,m2)
Or, since you have complex matrices, perhaps you want to take the complex conjugate of m1 first. In that case, use np.vdot.
PS. If m1 is a list of 2x2 complex matrices, then perhaps see if you can rearrange your code to make m1 an array of shape (l,2,2) from the outset.
If that is not possible, a list comprehension
[np.dot(m1[i],m2[i]) for i in range(l)]
will be faster than using map with lambda, but performing l np.dots is going to be slower than doing one np.dot on two arrays of shape (l,2,2) as suggested above.
"If m1 and m2 are 1-dimensional arrays of 2x2 complex matrices, then they essentially have shape (l,2,2). So matrix multiplication on the last two axes is equivalent to summing the product of the last axis of m1 with the second-to-last axis of m2. That's exactly what np.dot does:"
But that is not what np.dot does.
a = numpy.array([numpy.diag([1, 2]), numpy.diag([2, 3]), numpy.diag([3, 4])])
produces a (3,2,2) array of 2-by-2 matrices. However, numpy.dot(a,a) computes all nine pairwise products, and the result's shape is (3, 2, 3, 2). That is not what I need. What I need is an array holding numpy.dot(a[0],a[0]), numpy.dot(a[1],a[1]), numpy.dot(a[2],a[2]) ...
[np.dot(m1[i],m2[i]) for i in range(l)]
should work, but I haven't yet checked whether it is faster than the mapping of the lambda expression.
Cheers,
v923z
EDIT: the for loop and the map run at about the same speed. It is the conversion to numpy.array that consumes a lot of time, but that has to be done for both methods, so there is no gain there.
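As an aside, on newer NumPy (1.10 and later) np.matmul, i.e. the @ operator, broadcasts the matrix product over leading axes, which is exactly this stacked multiplication; a minimal sketch:
import numpy as np

l = 100
m1 = np.random.rand(l, 2, 2)
m2 = np.random.rand(l, 2, 2)

m3 = m1 @ m2   # shape (l, 2, 2); m3[i] equals m1[i] @ m2[i]

ref = np.array([np.dot(m1[i], m2[i]) for i in range(l)])
print(np.allclose(m3, ref))   # True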
Maybe this is an old question, but I was still searching for an answer. I tried this code:
import numpy as np
a = np.arange(1048576)
b = np.reshape(a // 1024, (1024, 1024)).astype(complex)  # floor-divide while still integer; // is not defined for complex arrays
b = b + 1j*b
%timeit c=np.dot(b,b)
%timeit d=np.einsum('ij, ki -> jk', b,b).T
The results are as follows. For 'dot':
10 loops, best of 3: 174 ms per loop
and for 'einsum':
1 loops, best of 3: 4.51 s per loop
I have checked that c and d are the same:
(c==d).all()
True
So 'dot' is still the winner. I am still searching for a better method, but so far without success.
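A plausible explanation: a single large 2-D product in np.dot is handed off to BLAS, while plain np.einsum runs a generic contraction loop. On NumPy 1.12 and later, einsum also accepts an optimize flag that can route such contractions through BLAS internally; a short sketch, under that assumption:
import numpy as np

a = np.arange(1048576)
b = np.reshape(a // 1024, (1024, 1024)).astype(complex)
b = b + 1j*b

c = np.dot(b, b)                                  # BLAS-backed matrix product
d = np.einsum('ij,jk->ik', b, b)                  # generic einsum contraction
e = np.einsum('ij,jk->ik', b, b, optimize=True)   # may dispatch to BLAS
print(np.allclose(c, d) and np.allclose(c, e))    # True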