Suppose I have an empty m-by-n-by-p cell array called "cellPoints", and I also have a D-by-3 array called "cellIdx" where each row contains subscripts into "cellPoints". Now I want to fill "cellPoints" so that cellPoints{x, y, z} contains an array of the row numbers of "cellIdx" whose rows equal [x, y, z].
A naive implementation could be
for i = 1:size(cellIdx, 1)
    cellPoints{cellIdx(i, 1), cellIdx(i, 2), cellIdx(i, 3)} = ...
        [cellPoints{cellIdx(i, 1), cellIdx(i, 2), cellIdx(i, 3)}; i];
end
As an example, suppose
cellPoints = cell(10, 10, 10); % user defined, cannot change
cellIdx = [1, 3, 2;
           3, 2, 1;
           1, 3, 2;
           1, 4, 2]
Then
cellPoints{1, 3, 2} = [1;3];
cellPoints{3, 2, 1} = [2];
cellPoints{1, 4, 2} = [4];
and all other cells of cellPoints should be empty.
Since cellIdx is a large matrix and this loop is clearly inefficient, is there a better implementation?
I've tried using unique(cellIdx, 'rows') to find unique rows in cellIdx, and then writing a for-loop to compute cellPoints, but it's even slower than above.
See if this is faster:
cellPoints = cell(10,10,10); % initialize to proper size
[~, jj, kk] = unique(cellIdx, 'rows', 'stable');
sz = size(cellPoints);
sz = [1 sz(1:end-1)];
csz = cumprod(sz).'; % will be used to build linear index
ind = 1+(cellIdx(jj,:)-1)*csz; % linear index to fill cellPoints
cellPoints(ind) = accumarray(kk, 1:numel(kk), [], @(x) {sort(x)});
Or remove sort from the last line if order within each cell is not important.
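In effect, csz plays the role of sub2ind here: ind converts each unique subscript triple into a linear index into cellPoints, and accumarray collects, for each unique triple (labelled by kk), the row numbers 1:numel(kk) that map to it.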
I have two numpy arrays, arr1 of shape (~140000, 3) and arr2 of shape (~450000, 10). The first 3 elements of each row, in both arrays, are coordinates (z, y, x). I want to find the rows of arr2 that have the same coordinates as rows of arr1 (which can be considered a subgroup of arr2).
for example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr = []
for i in arr1:
    for j in arr2:
        if i[0]==j[0] and i[1]==j[1] and i[2]==j[2]:
            list_arr.append(j)
I also tried to create (after the 1st loop) a subarray of arr2, filtering it on the value of i[0] (arr2_filt = [el for el in arr2 if el[0] == i[0]]). This speeds the operation up a bit, but it still remains really slow.
Can you help me with this?
Approach #1
Here's a vectorized one with views -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
a,b = view1D(arr1,arr2[:,:3])
out = arr2[np.in1d(b,a)]
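Each row is reinterpreted as a single np.void scalar, so np.in1d can compare whole rows at once. As a usage sketch on the question's small example (with the elided rows of arr2 dropped):
import numpy as np
arr1 = np.array([[1,2,3],[1,2,5],[1,7,8],[5,6,7]])
arr2 = np.array([[1,2,3,7,66,4,3,44,8,9],
                 [1,3,9,6,7,8,3,4,5,2],
                 [1,5,8,68,7,8,13,4,53,2],
                 [5,6,7,6,67,8,63,4,5,20]])
a, b = view1D(arr1, arr2[:, :3])  # view1D as defined above
out = arr2[np.in1d(b, a)]         # the rows starting with [1,2,3] and [5,6,7]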
Approach #2
Another with dimensionality-reduction for ints -
d = np.maximum(arr2[:,:3].max(0), arr1.max(0)) + 1 # per-axis extents; the +1 avoids collisions if coordinates can be 0
s = np.r_[1, d[:-1].cumprod()] # per-axis "strides"
a, b = arr1.dot(s), arr2[:,:3].dot(s) # collapse each triple to one integer
out = arr2[np.in1d(b, a)]
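This collapses each coordinate triple into a single integer, much like np.ravel_multi_index, so the whole-row comparison reduces to a 1D integer membership test.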
Improvement #1
We could use np.searchsorted to replace np.in1d for both of the approaches listed earlier -
unq_a = np.unique(a) # sorted unique keys from arr1
idx = np.searchsorted(unq_a, b)
idx[idx == len(unq_a)] = 0 # clip out-of-range insertion points to a valid index
out = arr2[unq_a[idx] == b] # keep rows whose key actually occurs in a
Improvement #2
For the last improvement, which used np.searchsorted together with np.unique, we could use argsort instead -
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
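Using the sorter argument lets np.searchsorted work directly off a's sort order, skipping the deduplication pass that np.unique performs; the final comparison again filters out keys of b that are not actually present in a.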
You can do it with the help of sets:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# keep only the first 3 columns of each row of arr2
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
np.array([x for x in aset & bset])
Output
array([[7, 8, 9],
[1, 2, 3]])
Edit
Use a list comprehension:
arr_list = arr.tolist() # compare as lists: `in` on a 2D numpy array does not test whole rows
l = [list(i) for i in arr2 if i[:3].tolist() in arr_list]
print(l)
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]
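Note that the set intersection above returns only the matching coordinate triples, while the list comprehension returns the full matching rows of arr2.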
For integers, Divakar already gave an excellent answer. If you want to compare floats, you have to consider e.g. the following:
>>> 1. + 1e-15 == 1.
False
>>> 1. + 1e-16 == 1.
True
If this behaviour could lead to problems in your code, I would recommend performing a nearest neighbour search, and probably checking whether the distances are within a specified threshold.
import numpy as np
from scipy import spatial
def get_indices_of_nearest_neighbours(arr1, arr2):
    tree = spatial.cKDTree(arr2[:,0:3])
    # You can check here if the distance is small enough and otherwise raise an error
    dist, ind = tree.query(arr1, k=1)
    return ind
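For instance, a minimal sketch of that distance check (tol is an assumed tolerance, not part of the original answer):
import numpy as np
from scipy import spatial
def get_matching_rows_within_tol(arr1, arr2, tol=1e-8):
    # build the tree on the coordinate columns of arr2
    tree = spatial.cKDTree(arr2[:, 0:3])
    dist, ind = tree.query(arr1, k=1)
    # keep only the neighbours whose distance is within the tolerance
    return arr2[ind[dist <= tol]]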
I need to populate a vector with elements of another, smaller vector. So say the vector I need to populate is of length ten and is currently all zeros, i.e.
vector = [0,0,0,0,0,0,0,0,0,0]
Now suppose I have already defined a vector
p = [1, 2, 3, 4, 5]
How could I populate "vector" with the array "p" so that the result is [1, 2, 3, 4, 5, 0, 0, 0, 0, 0]? Bear in mind, I want the other positions in "vector" to remain unchanged. I have already tried using repmat(p, length(p)), but that ends up giving me something of the form [1,2,3,4,5,1,2,3,4,5]. Thanks!
Try a combination of vector slicing and concatenation:
vector = cat(2, p, vector(6:end))
This is faster:
vector(1:5) = p
More generally,
vector(1:numel(p)) = p
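This writes p into the first numel(p) positions in place and leaves the rest of vector untouched, with no concatenation needed.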
I have an array
A = [3, 4; 5, 6; 4, 1];
Is there a way I could convert all coordinate pairs of the array into linear indices such that:
A = [1, 2, 3]'
whereby (3,4), (5,6), and (4,1) are represented by 1, 2, and 3, respectively.
Many thanks!
The reason I need this is that I have to loop through array A and use each coordinate pair (3,4), (5,6), and (4,1) in turn. I will feed each of these pairs into a function to make another computation. See the pseudo code below:
for ii = 1:length(A)
    [x, y] = function_obtain_coord_pairs(A);
    B = function_obtain_fit(x, y, I);
end
whereby, at ii = 1, x=3 and y=4. The next iteration takes the pair x=5, y=6, etc.
Basically what will happen is that my kx2 array will be converted to a kx1 array. Thanks for your help.
Adapting your code, what you want was suggested by @Ander in the comments...
Your code
for ii = 1:length(A)
    [x, y] = function_obtain_coord_pairs(A);
    B = function_obtain_fit(x, y, I);
end
Adapted code
for ii = 1:size(A,1)
    x = A(ii, 1);
    y = A(ii, 2);
    B = function_obtain_fit(x, y, I); % is I here supposed to be ii? I is not defined...
end
Your unfamiliarity with indexing makes me think your function_obtain_fit function could probably be vectorised to accept the entire matrix A, but that's a matter for another day!
For instance, you really don't need to define x or y at all...
Better code
for ii = 1:size(A,1)
    B = function_obtain_fit(A(ii, 1), A(ii, 2), I);
end
Here is a corrected version of your code:
A = [3, 4; 5, 6; 4, 1];
for k = A.'
    B = function_obtain_fit(k(1), k(2), I)
end
By iterating directly over A you iterate over the columns of A. Because you want to iterate over the rows, we take A.' instead. So if we just display k:
for k = A.'
    k
end
the output is:
k =
3
4
k =
5
6
k =
4
1
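Each k is one column of A.', i.e. one row of A delivered as a 2-by-1 column vector, which is why k(1) and k(2) are exactly the coordinate pair of that row.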
After searching, I've found no native way or existing solution to efficiently change the position of an element in a numpy array, which seems to me a quite natural operation. For example, if I want to move the element at index 3 to index 1, it should work like this:
x = np.array([1,2,3,4,5])
f*(x, 3, 1)
print(x)
array([1,4,2,3,5])
I'm looking for an f* function here. This is different from rolling all elements; also, for moves in a big array, I want to avoid the extra copying that insert and delete operations would incur.
Not sure about the efficiency, but here's an approach using masking -
def change_pos(in_arr, pick_idx, put_idx):
    range_arr = np.arange(in_arr.size)
    tmp = in_arr[pick_idx]
    # the right-hand side is a fancy-indexed copy, so this shifts the block
    # between the two indices by one position, in either direction
    in_arr[range_arr != put_idx] = in_arr[range_arr != pick_idx]
    in_arr[put_idx] = tmp
This would support both forward and backward movement.
Sample runs
1) Element moving backward -
In [542]: in_arr
Out[542]: array([4, 9, 3, 6, 8, 0, 2, 1])
In [543]: change_pos(in_arr, 6, 1) # move the 2 at index 6 back to index 1
In [544]: in_arr
Out[544]: array([4, 2, 9, 3, 6, 8, 0, 1])
2) Element moving forward -
In [546]: in_arr
Out[546]: array([4, 9, 3, 6, 8, 0, 2, 1])
In [547]: change_pos(in_arr, 1, 6) # move the 9 at index 1 forward to index 6
In [548]: in_arr
Out[548]: array([4, 3, 6, 8, 0, 2, 9, 1])
With the small example, this wholesale copy tests faster than @Divakar's masked in-place copy:
def foo4(arr, i, j):
    # backward move (j < i): build the full permutation index and copy wholesale
    L = arr.shape[0]
    idx = np.concatenate((np.arange(j), [i], np.arange(j, i), np.arange(i+1, L)))
    return arr[idx]
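Since idx is a full permutation of 0..L-1, arr[idx] allocates a new array rather than moving elements in place.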
I didn't try to make it work for forward moves. An analogous in-place function runs at about the same speed as Divakar's.
def foo2(arr, i, j):
    # in-place backward move: overwrite arr[j:i+1] with [arr[i], arr[j:i]]
    L = arr.shape[0]
    tgt = np.arange(j, i+1)
    src = np.concatenate([[i], np.arange(j, i)])
    arr[tgt] = arr[src]
But timings could well be different if the array was much bigger and the swap involved a small block in the middle.
Since the data for an array is stored in a contiguous block of memory, elements cannot change place without some sort of copy. You'd have to implement lists as a linked list to get a no-copy form of movement.
It just occurred to me that there are masked copyto and place functions that might make this sort of copy/movement faster, but I haven't worked with those much.
https://stackoverflow.com/a/40228699/901925
================
np.roll does essentially this (e.g. np.roll(a, -2) for a 5-element a):
idx = np.concatenate((np.arange(2,5), np.arange(2)))
# array([2, 3, 4, 0, 1])
np.take(a, idx) # or a[idx]
In the past I have found simple numpy slice indexing, i.e. a[:-1] = a[1:], to be faster than most alternatives (including np.roll()). Comparing the two other answers with an 'in place' shift, I get:
for a shift from index 40000 to 100:
1.015 ms divakar
1.078 ms hpaulj
29.7 µs in-place shift (34x faster)
for a shift from index 40000 to 39900:
0.975 ms divakar
0.985 ms hpaulj
3.47 µs in-place shift (290x faster)
timing comparison using:
import timeit
init = '''
import numpy as np
def divakar(in_arr, pick_idx, put_idx):
    range_arr = np.arange(in_arr.size)
    tmp = in_arr[pick_idx]
    in_arr[range_arr != put_idx] = in_arr[range_arr != pick_idx]
    in_arr[put_idx] = tmp
def hpaulj(arr, fr, to):
    L = arr.shape[0]
    idx = np.concatenate((np.arange(to), [fr], np.arange(to, fr), np.arange(fr+1, L)))
    return arr[idx]
def paddyg(arr, fr, to):
    if fr >= arr.size or to >= arr.size:
        return None
    tmp = arr[fr].copy()
    if fr > to:
        arr[to+1:fr+1] = arr[to:fr]
    else:
        arr[fr:to] = arr[fr+1:to+1]
    arr[to] = tmp
    return arr
a = np.random.randint(0, 1000, (100000))
'''
fns = ['''
divakar(a, 40000, 100)
''', '''
hpaulj(a, 40000, 100)
''', '''
paddyg(a, 40000, 100)
''']
for f in fns:
    print(timeit.timeit(f, setup=init, number=1000))
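The in-place shift only touches the elements between the two indices, which is why its advantage grows when the moved block is short (the 40000 to 39900 case).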
Hey, having a wee bit of trouble. I'm trying to assign variable-length 1D arrays to different rows of an array, e.g.
a(1) = [1, 0.13,0.52,0.3];
a(2) = [1, 0, .268];
However, I get the error:
??? In an assignment A(I) = B, the number of elements in B and
I must be the same.
Error in ==> lab2 at 15
a(1) = [1, 0.13,0.52,0.3];
I presume this means that it's expecting a scalar value instead of an array. Does anybody know how to assign the array to this value?
I'd rather not define it directly as a 2D array, as it is for storing solutions to different problems in a loop.
Edit: Got it!
a(1,1:4) = [1, 0.13,0.52,0.3];
a(2,1:3) = [1, 0, .268];
What you probably wanted to write was
a(1,:) = [1, 0.13,0.52,0.3];
a(2,:) = [1, 0, .268];
i.e. the first row is [1, 0.13, 0.52, 0.3] and the second row is [1, 0, .268]. This is not possible, because what would be the value of a(2,4)?
There are two ways to fix the problem.
(1) Use cell arrays
a{1} = [1, 0.13,0.52,0.3];
a{2} = [1, 0, .268];
(2) If you know the maximum possible number of columns your solutions will have, you can preallocate your array and write in the results like so (if you don't preallocate, you'll get zero-padding, and you also risk slowing down your loop a lot if there are many iterations, because the array will have to be recreated at every iteration):
a = NaN(nIterations, maxNumCols); % this fills the array with NaNs
tmp = [1, 0.13, 0.52, 0.3];
a(1, 1:length(tmp)) = tmp;
tmp = [1, 0, .268];
a(2, 1:length(tmp)) = tmp;