In-place changing of an element's position in a NumPy array by shifting others forward

After searching I found no native way or existing solution to efficiently change the position of an element in a NumPy array, which seems to me a quite natural operation. For example, if I want to move the element at index 3 to index 1, it should work like this:
x = np.array([1,2,3,4,5])
f*(x, 3, 1)
print(x)
array([1,4,2,3,5])
I'm looking for an f* function here. This is different from rolling every element; also, for moves in a big array, I want to avoid the wholesale copying that an insert-plus-delete implementation would entail.

Not sure about the efficiency, but here's an approach using masking -
import numpy as np

def change_pos(in_arr, pick_idx, put_idx):
    range_arr = np.arange(in_arr.size)
    tmp = in_arr[pick_idx]
    in_arr[range_arr != put_idx] = in_arr[range_arr != pick_idx]
    in_arr[put_idx] = tmp
This would support both forward and backward movement.
Sample runs:
1) Element moving backward -
In [542]: in_arr
Out[542]: array([4, 9, 3, 6, 8, 0, 2, 1])
                                   *
In [543]: change_pos(in_arr, 6, 1)

In [544]: in_arr
Out[544]: array([4, 2, 9, 3, 6, 8, 0, 1])
                    ^
2) Element moving forward -
In [546]: in_arr
Out[546]: array([4, 9, 3, 6, 8, 0, 2, 1])
                    *
In [547]: change_pos(in_arr, 1, 6)

In [548]: in_arr
Out[548]: array([4, 3, 6, 8, 0, 2, 9, 1])
                                   ^

With the small example, this wholesale copy tests faster than @Divakar's masked in-place copy:
def foo4(arr, i, j):
    L = arr.shape[0]
    idx = np.concatenate((np.arange(j), [i], np.arange(j, i), np.arange(i+1, L)))
    return arr[idx]
I didn't try to make it work for forward moves. An analogous in-place function runs at about the same speed as Divakar's.
def foo2(arr, i, j):
    L = arr.shape[0]
    tgt = np.arange(j, i+1)
    src = np.concatenate([[i], np.arange(j, i)])
    arr[tgt] = arr[src]
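As for forward moves, a symmetric in-place variant would look like this (my sketch, not from the original answer; arr[src] on the right-hand side is a fancy-indexed copy, so the overlapping write is safe):

def foo2_forward(arr, i, j):
    # move the element at index i forward to index j (i < j), in place
    tgt = np.arange(i, j+1)
    src = np.concatenate([np.arange(i+1, j+1), [i]])
    arr[tgt] = arr[src]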
But timings could well be different if the array was much bigger and the swap involved a small block in the middle.
Since the data for an array is stored in a contiguous block of memory, elements cannot change place without some sort of copy. You'd have to implement the list as a linked list to get a no-copy form of movement.
It just occurred to me that there are masked copyto and place functions (np.copyto, np.place) that might make this sort of copy/movement faster, but I haven't worked with those much.
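For illustration, here is a minimal sketch (mine, not from the answer above) of the same backward move written with np.copyto; the destination slice is a view, so the write happens in place, and modern NumPy detects the overlap between source and destination and buffers as needed:

import numpy as np

def move_copyto(arr, pick_idx, put_idx):
    # assumes a backward move, i.e. put_idx < pick_idx
    tmp = arr[pick_idx]
    # shift the block one slot to the right, writing through a view
    np.copyto(arr[put_idx + 1:pick_idx + 1], arr[put_idx:pick_idx])
    arr[put_idx] = tmp

a = np.array([4, 9, 3, 6, 8, 0, 2, 1])
move_copyto(a, 6, 1)
print(a)  # [4 2 9 3 6 8 0 1]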
https://stackoverflow.com/a/40228699/901925
For reference, np.roll effectively does:
idx = np.concatenate((np.arange(2,5),np.arange(2)))
# array([2, 3, 4, 0, 1])
np.take(a, idx) # or a[idx]

In the past I have found simple NumPy slice assignment, i.e. a[:-1] = a[1:], to be faster than most alternatives (including np.roll()). Comparing the two other answers with an 'in place' shift I get:
for a shift from index 40000 to index 100:
    1.015 ms   divakar
    1.078 ms   hpaulj
    29.7 µs    in-place shift (34x faster)
for a shift from index 40000 to index 39900:
    0.975 ms   divakar
    0.985 ms   hpaulj
    3.47 µs    in-place shift (290x faster)
timing comparison using:
import timeit

init = '''
import numpy as np

def divakar(in_arr, pick_idx, put_idx):
    range_arr = np.arange(in_arr.size)
    tmp = in_arr[pick_idx]
    in_arr[range_arr != put_idx] = in_arr[range_arr != pick_idx]
    in_arr[put_idx] = tmp

def hpaulj(arr, fr, to):
    L = arr.shape[0]
    idx = np.concatenate((np.arange(to), [fr], np.arange(to, fr), np.arange(fr+1, L)))
    return arr[idx]

def paddyg(arr, fr, to):
    if fr >= arr.size or to >= arr.size:
        return None
    tmp = arr[fr].copy()
    if fr > to:
        arr[to+1:fr+1] = arr[to:fr]
    else:
        arr[fr:to] = arr[fr+1:to+1]
    arr[to] = tmp
    return arr

a = np.random.randint(0, 1000, (100000))
'''
fns = ['''
divakar(a, 40000, 100)
''', '''
hpaulj(a, 40000, 100)
''', '''
paddyg(a, 40000, 100)
''']
for f in fns:
    print(timeit.timeit(f, setup=init, number=1000))

Related

Difference between each value in one array and the largest value in another large array that does not exceed it

NumPy experts,
I'm using NumPy. I want to compare two arrays: for each value in B, find the largest value in A that does not exceed it, and compute the difference between the two.
For example,
A = np.array([3, 5, 7, 12, 13, 18])
B = np.array([4, 7, 17, 20])
I want [1, 0, 4, 2] (4-3, 7-7, 17-13, 20-18) in this case.
The problem is that the A and B arrays are so large that it would take a very long time to do this by simple means. I could try to divide them into chunks, but I wonder if there is a simple NumPy function to solve this problem.
Or can I use numba?
For your information, this is my current, very naive code.
delta = np.zeros_like(B)
for i in range(len(B)):
    index_A = (A <= B[i]).argmin() - 1
    delta[i] = B[i] - A[index_A]
I agree with @tarlen555 that the problem is mostly the for-loop. I guess this one is already much faster:
diff = B-A[:,np.newaxis]
diff[diff<0] = max(A.max(), B.max())
diff.min(axis=0)
In the second line, I wanted to fill all entries with negative values with something ridiculously large. Since your numbers are integers, np.inf doesn't work, but something like it would be more elegant.
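As a side note (my own addition, not from the answer): if A is sorted, as in the example, np.searchsorted locates the largest element of A not exceeding each element of B directly, without building the full (len(A), len(B)) difference matrix:

import numpy as np

A = np.array([3, 5, 7, 12, 13, 18])
B = np.array([4, 7, 17, 20])
idx = np.searchsorted(A, B, side='right') - 1  # index of largest A[i] <= B[j]
delta = B - A[idx]  # caveat: idx is -1 (wraps around) when B[j] < A[0]
print(delta)  # [1 0 4 2]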
EDIT:
Another way:
from scipy.spatial import cKDTree

tree = cKDTree(A.reshape(-1, 1))
k = 2
large_value = max(A.max(), B.max())
while True:
    indices = tree.query(B.reshape(-1, 1), k=k)[1]
    diff = B[:, np.newaxis] - A[indices]
    if np.all(diff.max(axis=-1) >= 0):
        break
    k += 1
diff[diff < 0] = large_value
diff.min(axis=1)
This solution could be more memory-efficient but frankly I'm not sure how much more.

Find an algorithm to sort an array given its status after sorting

Let A be an array with n elements. A is not sorted; nonetheless, after sorting the array, the difference between any two adjacent elements would be either k1, k2 or k3.
It should be noted that k1, k2 and k3 are not given, and all of them are natural numbers!
For example, given the array:
A = { 25, 7, 5, 9, 32, 23, 14, 21}
After sorting the array, we would get -
A = { 5, 7, 9, 14, 21, 23, 25, 32}
The difference between the first pair (5, 7) is 2, so k1=2; the difference between the third pair (9, 14) is 5, so k2=5; and the difference between the fourth pair (14, 21) is 7, so k3=7. The difference between every other adjacent pair is also 2, 5 or 7.
The algorithm for sorting the array should be as efficient as possible (obviously below O(n log n)).
I managed to answer a similar question where the difference between any two adjacent elements was either k, 2k or 3k, where k is real. But I couldn't find an appropriate algorithm following a similar method, by finding k, dividing by it and doing bucket sort.
By finding the minimum and the second minimum we can find one of the k's. But a k could be as large as n², so finding the maximum does not help either... I am really lost!
Disclaimer: This question has been asked before, but no answer was given to the problem, and the question was not easy to understand.
Here is an O(n) solution that only looks inefficient.
The idea is simple. Given the minimum element and a list of values for k, you construct the biggest sorted set you can with the values of k that you have already found, find the smallest value missing from that set, and from it derive a new value of k. If there are K values of k, this operation is O((1+K) * n).
Repeating this K times is therefore O((1+K)^2 * n).
In our case K is constant, so we get O(n).
Here it is in Python.
def special_sort(array):
    # special cases first.
    if 0 == len(array):
        return array
    elif 1 == len(array):
        return array
    elif 2 == len(array):
        return [min(array), max(array)]
    min_value = min(array)
    values_of_k = []
    leftovers = array
    while len(leftovers):
        values_of_k = sorted(values_of_k)
        values = set(array)
        sorted_array = [min_value]
        values.remove(min_value)
        found = True
        while found:
            found = False
            for k in values_of_k:
                if sorted_array[-1] + k in values:
                    found = True
                    sorted_array.append(sorted_array[-1] + k)
                    values.remove(sorted_array[-1])
                    break
        leftovers = list(values)
        if 0 == len(leftovers):
            return sorted_array
        else:
            first_missing = min(leftovers)
            # Find the last element of the sorted array less than that.
            i = -1
            while i+1 < len(sorted_array) and sorted_array[i+1] < first_missing:
                i = i+1
            values_of_k.append(first_missing - sorted_array[i])

print(special_sort([25, 7, 5, 9, 32, 23, 14, 21]))
# -> [5, 7, 9, 14, 21, 23, 25, 32]

Find common elements in subarrays of arrays

I have two numpy arrays of shapes arr1 = (~140000, 3) and arr2 = (~450000, 10). The first 3 elements of each row, in both arrays, are coordinates (z, y, x). I want to find the rows of arr2 that have the same coordinates as rows of arr1 (which can be considered a subgroup of arr2).
for example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr = []
for i in arr1:
    for j in arr2:
        if i[0]==j[0] and i[1]==j[1] and i[2]==j[2]:
            list_arr.append(j)
I also tried to create (after the 1st loop) a subarray of arr2, filtering it on the value of i[0] (arr2_filt = [el for el in arr2 if el[0]==i[0]]). This speeds the operation up a bit, but it still remains really slow.
Can you help me with this?
Approach #1
Here's a vectorized one with views -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b):  # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

a, b = view1D(arr1, arr2[:,:3])
out = arr2[np.in1d(b, a)]
Approach #2
Another with dimensionality-reduction for ints -
d = np.maximum(arr2[:,:3].max(0),arr1.max(0))
s = np.r_[1,d[:-1].cumprod()]
a,b = arr1.dot(s),arr2[:,:3].dot(s)
out = arr2[np.in1d(b,a)]
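As a quick sanity check (my own demo, not part of the answer), Approach #2 on the question's small example picks out the two matching rows:

import numpy as np

arr1 = np.array([[1,2,3],[1,2,5],[1,7,8],[5,6,7]])
arr2 = np.array([[1,2,3,7,66,4,3,44,8,9],
                 [1,3,9,6,7,8,3,4,5,2],
                 [1,5,8,68,7,8,13,4,53,2],
                 [5,6,7,6,67,8,63,4,5,20]])

d = np.maximum(arr2[:,:3].max(0), arr1.max(0))
s = np.r_[1, d[:-1].cumprod()]
a, b = arr1.dot(s), arr2[:,:3].dot(s)
print(arr2[np.in1d(b, a)])  # prints the rows starting with (1,2,3) and (5,6,7)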
Improvement #1
We could use np.searchsorted to replace np.in1d for both of the approaches listed earlier -
unq_a = np.unique(a)
idx = np.searchsorted(unq_a, b)
idx[idx == len(unq_a)] = 0  # out-of-range positions can't be matches
out = arr2[unq_a[idx] == b]
Improvement #2
For the last improvement on using np.searchsorted that also uses np.unique, we could use argsort instead -
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
You can do it with the help of set:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# create array from arr2 with only first 3 columns
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
np.array([x for x in aset & bset])
Output
array([[7, 8, 9],
       [1, 2, 3]])
Edit
Use list comprehension
l = [list(i) for i in arr2 if i[:3] in arr]
print(l)
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]
For integers Divakar already gave an excellent answer. If you want to compare floats you have to consider e.g. the following:
>>> 1. + 1e-15 == 1.
False
>>> 1. + 1e-16 == 1.
True
If this behaviour could lead to problems in your code, I would recommend performing a nearest-neighbour search, and possibly checking that the distances are within a specified threshold.
import numpy as np
from scipy import spatial

def get_indices_of_nearest_neighbours(arr1, arr2):
    tree = spatial.cKDTree(arr2[:, 0:3])
    # You can check here if the distance is small enough and otherwise raise an error
    dist, ind = tree.query(arr1, k=1)
    return ind
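Building on that, a minimal sketch of the threshold check (my own; tol is an assumed tolerance parameter, not part of the original answer):

import numpy as np
from scipy import spatial

def get_matching_rows(arr1, arr2, tol=1e-9):
    tree = spatial.cKDTree(arr2[:, 0:3])
    dist, ind = tree.query(arr1, k=1)
    # keep only neighbours that are (numerically) exact coordinate matches
    return arr2[ind[dist <= tol]]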

Algorithm of Minimum steps to transform a list to the desired array. (Using InsertAt and DeleteAt Only)

Situation
To begin with, you have an array/list A, and you want to transform it into a given expected array/list B. The only actions you can apply to the array are InsertAt and DeleteAt, which insert and delete an element at a certain index of the list.
Note: array B is always sorted, while array A may not be.
For instance, you have an array A of [1, 4, 3, 6, 7]
and you want it to become [2, 3, 4, 5, 6, 6, 7, 8]
One way of doing that is let A undergo following actions:
deleteAt(0); // which will delete element 1, arrayA now [4, 3, 6, 7]
deleteAt(0); // delete element 4 which now at index 0
// array A now [3, 6, 7]
insertAt(0, 2); // Insert value to at index 0 of array A
// array A now [2, 3, 6, 7]
insertAt(2, 4); // array now [2, 3, 4, 6, 7]
insertAt(3, 5); // array Now [2, 3, 4, 5, 6, 7]
insertAt(5, 6); // array Now [2, 3, 4, 5, 6, 6, 7]
insertAt(7, 8); // array Now [2, 3, 4, 5, 6, 6, 7, 8]
In the above example, 7 operations were done on array A to transform it into the array we wanted.
Hence, how do we find the steps to transform A into B, as well as the minimum number of steps? Thanks!
By the way, the solution of deleting every element of A and then adding every element of B to A is only optimal when A and B have nothing in common.
My thoughts
What I have done so far:
1. Compare array A with array B, then delete all the elements in A that can't be found in B.
2. Find the longest increasing subsequence of the common elements of A and B.
3. Delete all elements that are not in the longest increasing subsequence.
4. Compare what is left with B, then add elements accordingly.
However, I'm struggling to implement that...
Change log
Fixed a typo: element 7 was missing; the least number of operations is 7.
Added MY THOUGHTS section.
There was an answer that elaborated on the Levenshtein distance (AKA minimum edit distance); somehow it disappeared. I found it really useful after reading git's levenshtein.c file, and it seems to be a faster algorithm than what I already have. However, I'm not sure whether that algorithm will give me the detailed steps, or only the minimum number of steps.
I have a Python program that seems to work, but it is not very short:
__version__ = '0.2.0'

class Impossible(RuntimeError): pass

deleteAt = 'deleteAt'
insertAt = 'insertAt'
incOffset = 'incOffset'

def remove_all(size):
    return [(deleteAt, i, None) for i in range(size-1, -1, -1)]

def index_not(seq, val):
    for i, x in enumerate(seq):
        if x != val:
            return i
    return len(seq)

def cnt_checked(func):
    """Helper function to check some function's contract"""
    from functools import wraps
    @wraps(func)
    def wrapper(src, dst, maxsteps):
        nsteps, steps = func(src, dst, maxsteps)
        if nsteps > maxsteps:
            raise RuntimeError(('cnt_checked() failed', maxsteps, nsteps))
        return nsteps, steps
    return wrapper

@cnt_checked
def strategy_1(src, dst, maxsteps):
    # get dst's first value from src
    val = dst[0]
    try:
        index = src.index(val)
    except ValueError:
        raise Impossible
    # remove all items in src before val's first occurrence
    left_steps = remove_all(index)
    src = src[index:]
    n = min(index_not(src, val), index_not(dst, val))
    score = len(left_steps)
    assert n > 0
    left_steps.append([incOffset, n, None])
    right_steps = [[incOffset, -n, None]]
    nsteps, steps = rec_find_path(src[n:], dst[n:], maxsteps - score)
    return (score + nsteps, (left_steps + steps + right_steps))

@cnt_checked
def strategy_2(src, dst, maxsteps):
    # do not get dst's first value from src
    val = dst[0]
    left_steps = []
    src = list(src)
    for i in range(len(src)-1, -1, -1):
        if src[i] == val:
            left_steps.append((deleteAt, i, None))
            del src[i]
    n = index_not(dst, val)
    right_steps = [(insertAt, 0, val) for i in range(n)]
    dst = dst[n:]
    score = len(left_steps) + len(right_steps)
    nsteps, steps = rec_find_path(src, dst, maxsteps - score)
    return (score + nsteps, (left_steps + steps + right_steps))

@cnt_checked
def rec_find_path(src, dst, maxsteps):
    if maxsteps <= 0:
        if (maxsteps == 0) and (src == dst):
            return (0, [])
        else:
            raise Impossible
    # if destination is empty, clear source
    if not dst:
        if len(src) > maxsteps:
            raise Impossible
        steps = remove_all(len(src))
        return (len(steps), steps)
    found = False
    try:
        nsteps_1, steps_1 = strategy_1(src, dst, maxsteps)
    except Impossible:
        pass
    else:
        found = True
        maxsteps = nsteps_1 - 1
    try:
        nsteps_2, steps_2 = strategy_2(src, dst, maxsteps)
    except Impossible:
        if found:
            return (nsteps_1, steps_1)
        else:
            raise
    else:
        return (nsteps_2, steps_2)

def find_path(A, B):
    assert B == list(sorted(B))
    maxsteps = len(A) + len(B)
    nsteps, steps = rec_find_path(A, B, maxsteps)
    result = []
    offset = 0
    for a, b, c in steps:
        if a == incOffset:
            offset += b
        else:
            result.append((a, b + offset, c))
    return result

def check(start, target, ops):
    """Helper function to check correctness of solution"""
    L = list(start)
    for a, b, c in ops:
        print(L)
        if a == insertAt:
            L.insert(b, c)
        elif a == deleteAt:
            del L[b]
        else:
            raise RuntimeError(('Unexpected op:', a))
    print(L)
    if L != target:
        raise RuntimeError(('Result check failed, expected', target, 'got:', L))

start = [1, 4, 3, 6, 7]
target = [2, 3, 4, 5, 6, 6, 7, 8]
ops = find_path(start, target)
print(ops)
check(start, target, ops)
After some tests with this code, it is now obvious that the result is a two-phase operation. There is a first phase where items are deleted from the initial list, all but an increasing sequence of items that all belong to the target list (with repetition). Missing items are then added to the list until the target list is built.
The temporary conclusion is that if we find an algorithm to determine the longest subsequence of items from the target list initially present in the first list, in the same order but not necessarily contiguous, then it gives the shortest path. This is a new and potentially simpler problem. This is probably what you meant above, but it is much clearer from the program's output.
It seems almost obvious that this problem can be reduced to the problem of the longest increasing subsequence.
We can prove quite easily that the problem reduces to the longest non-decreasing subsequence: if there were a collection of elements in A that did not merit deletion and was greater in number than the longest non-decreasing subsequence in A of elements also in B, then all the elements of this collection would have to exist in the same order in B, which would make it a longer non-decreasing subsequence and contradict our assertion. Additionally, any smaller collection in A that does not merit deletion necessarily exists in B in the same order and is therefore part of the longest non-decreasing subsequence in A of elements also in B.
The algorithm is then reduced to the longest increasing subsequence problem, O(n log n + m):
(1) Find the longest non-decreasing subsequence of elements in A that have at least the same count in B.
(2) All other items in A are to be deleted and the missing elements from B added.
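To make the reduction concrete, here is a short sketch (my own, not from the answers above). The minimum number of operations is len(A) + len(B) - 2*LCS(A, B), and because B is sorted, the LCS can be computed via the classic reduction to the longest increasing subsequence: replace each element of A by its positions in B listed in decreasing order, then take the length of an LIS over that sequence.

import bisect
from collections import defaultdict

def min_steps(A, B):
    # positions of each value in (sorted) B
    pos = defaultdict(list)
    for j, x in enumerate(B):
        pos[x].append(j)
    # LCS(A, B) as an LIS over B-positions, decreasing order per element
    seq = []
    for x in A:
        seq.extend(reversed(pos[x]))
    tails = []  # patience sorting, strictly increasing
    for p in seq:
        i = bisect.bisect_left(tails, p)
        if i == len(tails):
            tails.append(p)
        else:
            tails[i] = p
    lcs = len(tails)
    return (len(A) - lcs) + (len(B) - lcs)

print(min_steps([1, 4, 3, 6, 7], [2, 3, 4, 5, 6, 6, 7, 8]))  # -> 7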

indexing rows in matrix using matlab

Suppose I have an empty m-by-n-by-p cell array called "cellPoints", and a D-by-3 array called "cellIdx" where each row i contains subscripts into "cellPoints". Now I want to fill "cellPoints" so that cellPoints{x, y, z} contains the array of the row numbers in "cellIdx" whose subscripts are (x, y, z).
A naive implementation could be
for i = 1:size(cellIdx, 1)
    cellPoints{cellIdx(i, 1), cellIdx(i, 2), cellIdx(i, 3)} = ...
        [cellPoints{cellIdx(i, 1), cellIdx(i, 2), cellIdx(i, 3)}; i];
end
As an example, suppose
cellPoints = cell(10, 10, 10); % user defined, cannot change
cellIdx = [1, 3, 2;
           3, 2, 1;
           1, 3, 2;
           1, 4, 2]
Then
cellPoints{1, 3, 2} = [1;3];
cellPoints{3, 2, 1} = [2];
cellPoints{1, 4, 2} = [4];
and other indices of cellPoints should be empty
Since cellIdx is a large matrix and this is clearly inefficient, is there any better implementation?
I've tried using unique(cellIdx, 'rows') to find the unique rows of cellIdx and then writing a for-loop to compute cellPoints, but it was even slower than the above.
See if this is faster:
cellPoints = cell(10, 10, 10); % initialize to proper size
[~, jj, kk] = unique(cellIdx, 'rows', 'stable');
sz = size(cellPoints);
sz = [1 sz(1:end-1)];
csz = cumprod(sz).'; % will be used to build linear index
ind = 1 + (cellIdx(jj,:) - 1)*csz; % linear index to fill cellPoints
cellPoints(ind) = accumarray(kk, 1:numel(kk), [], @(x) {sort(x)});
Or remove sort from the last line if the order within each cell is not important.
