How to sort a matrix by the norm of its rows efficiently (using numpy.ndarrays)?
I want to sort the matrix A:
A = np.array( ( [ 10, 1, 6, 3 ],
                [ 1,12, 2, 4 ],
                [ 6, 2,14, 5 ],
                [ 3, 4, 5, 9 ] ) )
by the norm of its rows.
What I do now is create a list of the row norms, get the index list that sorts it, and reorder the matrix based on that index list. Is this the way to go?
indexlist = np.argsort( np.apply_along_axis( np.linalg.norm, 0, A))
#indexlist = array([3, 0, 1, 2])
Then my sorted matrix is
sortedA = A[indexlist]
and the symmetrically sorted matrix (rows and columns reordered together) would then be
sym_sortedA = A[indexlist][:,indexlist]
Yes, this is the most common way to do that. A bit shorter would be to use
indexlist = np.argsort(np.linalg.norm(A,axis=1))
You need to use axis=1 if you want to sort by rows, but since the matrix is symmetric that doesn't matter.
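Putting the shorter form together with the indexing from the question (a minimal sketch, reusing the A defined above):
import numpy as np

indexlist = np.argsort(np.linalg.norm(A, axis=1))  # array([3, 0, 1, 2])
sortedA = A[indexlist]                             # rows sorted by norm
sym_sortedA = A[indexlist][:, indexlist]           # rows and columns reordered together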
Related
Assume that we are given two vectors:
A = (a₁, a₂, ..., aₘ) and B = (b₁, b₂, ..., bₘ)
and we need to do something for all the vectors between these two.
For example, for A = (1,1,0) and B = (1,2,2), all the vectors between A and B are: {(1,1,1), (1,1,2), (1,2,0), (1,2,1)}.
An obvious way to generate such vectors is to use m nested for loops, but that is probably not the best approach. I would like to know if someone has a better idea.
Here's a MATLAB method. It returns a matrix where each row is one of the vectors of the result.
% Data
A = [0, 0, 1, 3, 5, 2]
B = [4, 8, 5, 7, 9, 6]
% Preallocate
b = cell(1,numel(A));
vec = cell(1,numel(A));
% Make a vector of values of each element of the result
for i = 1:numel(A)
    vec{i} = A(i):B(i);
end
% Get all combinations using ndgrid
[b{:}] = ndgrid(vec{:});
b=cat(ndims(b{1})+1,b{:});
% Reshape the numel(A)+1 dimensional array into a 2D array
res = reshape(b,numel(b)/length(A),length(A));
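For comparison, a rough Python/NumPy sketch of the same idea (not part of the original answer; it builds one range per element and takes their Cartesian product):
import numpy as np
from itertools import product

A = [0, 0, 1, 3, 5, 2]
B = [4, 8, 5, 7, 9, 6]

# One range of values per element, then every combination of those ranges
ranges = [range(a, b + 1) for a, b in zip(A, B)]
res = np.array(list(product(*ranges)))  # one vector per row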
I am trying to efficiently index a 2D array in Python and have the problem that it is really slow.
This is what I tried (simplified example):
xSize = veryBigNumber
ySize = veryBigNumber
a = np.ones((xSize,ySize))
N = veryBigNumber
const = 1
for t in range(N):
    for i in range(xSize):
        for j in range(ySize):
            a[i,j] *= f(i,j)*const # f(i,j) is an arbitrary function of i and j.
Now I would like to substitute the nested loop by something more efficient. How do I do this?
Your 2D array could be produced using the following addition:
np.arange(200)[:,np.newaxis] + np.arange(200)
This type of vectorised operation is likely to be very fast:
>>> %timeit np.arange(200)[:,np.newaxis] + np.arange(200)
1000 loops, best of 3: 178 µs per loop
This method is not limited to addition. We can use the two arrays in the above operation as the arguments of any universal function (commonly abbreviated to ufunc).
For example:
>>> np.multiply(np.arange(5)[:,np.newaxis], np.arange(5))
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16]])
NumPy has built-in ufuncs for all the basic arithmetic operations and some more interesting ones too. If you need a more exotic function, NumPy allows you to make your own ufunc.
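For example, here is a minimal sketch of building your own ufunc with np.frompyfunc (the scalar function f below is just a made-up placeholder):
import numpy as np

def f(i, j):                 # hypothetical scalar function
    return i * j + 1

uf = np.frompyfunc(f, 2, 1)  # wrap f as a ufunc with 2 inputs, 1 output
result = uf(np.arange(5)[:, np.newaxis], np.arange(5)).astype(float)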
Edit: To quickly explain the broadcasting happening in this method, you can think of it like this...
np.arange(5) produces a 1D array which looks like this:
array([0, 1, 2, 3, 4])
The code np.arange(5)[:,np.newaxis] adds a second dimension (columns) to the range, producing this 2D array:
array([[0],
       [1],
       [2],
       [3],
       [4]])
To create the final 5x5 array using np.multiply (although we could use any ufunc or binary arithmetic operation), NumPy takes the 0 in the second (column) array and multiplies it with each element in the first array, making a row like this:
[ 0, 0, 0, 0, 0]
It then takes the second element in the second array, 1, and multiplies it with the first array, producing this row:
[ 0, 1, 2, 3, 4]
This continues until we have the final 5x5 matrix.
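Equivalently, because broadcasting also applies to the ordinary arithmetic operators, np.arange(5)[:, np.newaxis] * np.arange(5) produces the same 5x5 array as the np.multiply call above.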
You could use the indices routine:
b=np.indices(a.shape)
a=b[0]+b[1]
Timings:
%%timeit
...: b=np.indices(a.shape)
...: c=b[0]+b[1]
1000 loops, best of 3: 370 µs per loop
%%timeit
for i in range(200):
    for j in range(200):
        a[i,j] = i + j
100 loops, best of 3: 10.4 ms per loop
Since your output matrix a is the element-wise N-th power of a matrix F with elements f_ij = f(i,j) * const, your code can be simplified to
F = np.empty((xSize, ySize))
for i in range(xSize):
    for j in range(ySize):
        F[i,j] = f(i,j) * const
a = F ** N
For even more speed you can replace the creation of the F matrix with something more efficient, provided the function f(i,j) is vectorized (indexing='ij' keeps F[i,j] aligned with f(i,j)):
xmap, ymap = np.meshgrid(range(xSize), range(ySize), indexing='ij')
F = f(xmap, ymap) * const
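To make that concrete, here is a small self-contained sketch, assuming a toy vectorized function f(i, j) = i + j + 1 and small sizes (both made up for illustration):
import numpy as np

xSize, ySize, N, const = 4, 5, 3, 1

def f(i, j):                 # toy vectorized function, works on whole arrays
    return i + j + 1

xmap, ymap = np.meshgrid(range(xSize), range(ySize), indexing='ij')
F = f(xmap, ymap) * const    # F[i, j] == f(i, j) * const, shape (xSize, ySize)
a = F ** N                   # element-wise N-th power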
I'm building a decision tree algorithm. Sorting is very expensive in this algorithm because for every split I need to sort each column. So at the beginning, even before tree construction, I presort the variables: I create a matrix in which, for each column, I save its ranking. Then when I want to sort a variable at some split I don't actually sort it but use the presorted ranking array. The problem is that I don't know how to do this in a space-efficient manner.
A naive solution is below. This is only for 1 variable (v) and 1 split (split_ind).
import numpy as np
v = np.array([60,70,50,10,20,0,90,80,30,40])
sortperm = v.argsort() #1 sortperm = array([5, 3, 4, 8, 9, 2, 0, 1, 7, 6])
rankperm = sortperm.argsort() #2 rankperm = array([6, 7, 5, 1, 2, 0, 9, 8, 3, 4])
split_ind = np.array([3,6,4,8,9]) # this is my split (random)
# split v and sortperm
v_split = v[split_ind] # v_split = array([10, 90, 20, 30, 40])
rankperm_split = rankperm[split_ind] # rankperm_split = array([1, 9, 2, 3, 4])
vsorted_dummy = np.ones(10)*-1 #3 allocate "empty" array[N]
vsorted_dummy[rankperm_split] = v_split
vsorted = vsorted_dummy[vsorted_dummy!=-1] # vsorted = array([ 10., 20., 30., 40., 90.])
Basically I have 2 questions:
Is double sorting necessary to create the ranking array? (#1 and #2)
In line #3 I'm allocating array[N]. This is very inefficient in terms of space because even if the split size n << N I have to allocate the whole array. The problem here is how to calculate rankperm_split. In the example rankperm_split = [1,9,2,3,4], while it should really be [1,5,2,3,4]. This problem can be reformulated as follows: I want to create a "dense" integer array that has a maximum gap of 1 and keeps the ranking of the array intact.
UPDATE
I think the second point is the key here. The problem can be redefined as follows:
A[N] - array of size N
B[N] - array of size N
I want to transform array A to array B so that:
The ranking of the elements stays the same (for each pair i,j, if A[i] < A[j] then B[i] < B[j]).
Array B contains only the elements from 1 to N, each of them unique.
A few examples of this transformation:
[3,4,5] => [1,2,3]
[30,40,50] => [1,2,3]
[30,50,40] => [1,3,2]
[3,4,50] => [1,2,3]
A naive implementation (with sorting) can be defined like this (in Python)
def remap(a):
    a_ = sorted(a)
    b = [a_.index(e)+1 for e in a]
    return b
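For reference, the same remap can also be written with the double argsort already used in #1 and #2 above; a NumPy sketch (this only addresses the ranking transformation, not the space concern):
import numpy as np

def remap(a):
    # argsort gives the sorting permutation; a second argsort turns it
    # into ranks, and +1 shifts them to the range 1..N
    a = np.asarray(a)
    return a.argsort().argsort() + 1

remap([1, 9, 2, 3, 4])  # array([1, 5, 2, 3, 4])
remap([30, 50, 40])     # array([1, 3, 2])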
In Python/NumPy it is possible to use arrays of indices, as in this example (taken from the tutorial):
data = array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
i = array([[0, 1],   # indices for the first dim of data
           [1, 2]])
j = array([[2, 1],   # indices for the second dim
           [3, 3]])
Now, the invocation
data[i,j]
returns the array
array([[ 2,  5],
       [ 7, 11]])
How can I get the same in Matlab?
I think you will have to use linear indexing, which you can get from the sub2ind function like this:
ind = sub2ind(size(data), I,J)
example:
data = [ 0, 1,  2,  3
         4, 5,  6,  7
         8, 9, 10, 11]
i = [0, 1;
     1, 2];
j = [2, 1;
     3, 3]
ind = sub2ind(size(data), i+1,j+1);
data(ind)
ans =
2 5
7 11
Notice that I used i+1 and j+1; this is because, unlike Python, which starts indexing at 0, Matlab starts indexing at 1.
If I have several sets of numbers (just a 2D array where each row is a set):
[ 1, 3, -1, -1]
[ 2, 4, -1, -1]
[ 7, 8, 9, 10]
What would be an algorithm to create a list of sums (ignoring the -1's)? The result for the above would be:
1+2+7,
1+2+8,
1+2+9,
1+2+10,
1+4+7,
1+4+8,
1+4+9,
1+4+10,
3+2+7,
3+2+8,
3+2+9,
3+2+10,
3+4+7,
3+4+8,
3+4+9,
3+4+10
For each number in the first list, generate all sums starting with that number, followed by all sums recursively generated by applying the same method to all but the first list. When no lists are left, that is the base case.
Pseudo-code:
function find_sums(lists):
    if lists is empty:
        return [""]                      # base case: a single empty sum
    sums = []
    for n in lists[0]:
        if n != -1:
            for rest in find_sums(lists from index 1 onwards):
                if rest is "":
                    sums.append(n)
                else:
                    sums.append(n + "+" + rest)
    return sums
This is called the Cartesian product.
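A minimal Python sketch of that Cartesian-product approach (not part of the original answer), filtering out the -1 entries first:
from itertools import product

rows = [[1, 3, -1, -1],
        [2, 4, -1, -1],
        [7, 8, 9, 10]]

# Keep only the real entries of each row, then sum every combination
choices = [[x for x in row if x != -1] for row in rows]
sums = [sum(combo) for combo in product(*choices)]  # 16 sums for this input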