Consider two given arrays (in this sample, both are based on n = 5):
Given: array m has shape (n, 2n). When n = 5, each row of m holds a random arrangement of the integers 0, 0, 1, 1, 2, 2, 3, 3, 4, 4.
import numpy as np
m = np.array([[4, 2, 2, 3, 0, 1, 3, 1, 0, 4],
              [2, 4, 0, 4, 3, 2, 0, 1, 1, 3],
              [0, 2, 3, 1, 3, 4, 2, 1, 4, 0],
              [2, 1, 2, 4, 3, 0, 0, 4, 3, 1],
              [2, 0, 1, 0, 3, 4, 4, 3, 2, 1]])
Given: array t has shape (n^2, 4). When n = 5, the first two columns (m_row, val) hold all 25 ordered pairs of the integers 0 to 4 (the Cartesian product).
The 1st column refers to rows of array m. The 2nd column refers to values in array m.
For now, the last two columns hold the dummy value 99, which will be replaced.
t = np.array([[0, 0, 99, 99],
[0, 1, 99, 99],
[0, 2, 99, 99],
[0, 3, 99, 99],
[0, 4, 99, 99],
[1, 0, 99, 99],
[1, 1, 99, 99],
[1, 2, 99, 99],
[1, 3, 99, 99],
[1, 4, 99, 99],
[2, 0, 99, 99],
[2, 1, 99, 99],
[2, 2, 99, 99],
[2, 3, 99, 99],
[2, 4, 99, 99],
[3, 0, 99, 99],
[3, 1, 99, 99],
[3, 2, 99, 99],
[3, 3, 99, 99],
[3, 4, 99, 99],
[4, 0, 99, 99],
[4, 1, 99, 99],
[4, 2, 99, 99],
[4, 3, 99, 99],
[4, 4, 99, 99]])
PROBLEM: I want to replace the dummy values in the last two columns of t, as follows:
Let's consider t row [1, 3, 99, 99]. So from m's row=1, I determine the indices of the two columns that hold value 3. These are columns (4,9), so the t row is updated to [1, 3, 4, 9].
In the same way, t row [4, 2, 99, 99] becomes [4, 2, 0, 8].
I currently do this by looping through each column i of array m, looking for the two instances where m[m_row, i] == val, then updating array t. (Slow!)
Is there a way to speed up this process, perhaps using vectorization or broadcasting?
Use the following code:
import itertools
# First 2 columns
t = np.array(list(itertools.product(range(m.shape[0]), repeat=2)))
# Add columns - indices of "wanted" elements
t = np.hstack((t, np.apply_along_axis(
    lambda row, arr: np.nonzero(arr[row[0]] == row[1])[0], 1, t, m)))
The result, for your data sample (m array), is:
array([[0, 0, 4, 8],
[0, 1, 5, 7],
[0, 2, 1, 2],
[0, 3, 3, 6],
[0, 4, 0, 9],
[1, 0, 2, 6],
[1, 1, 7, 8],
[1, 2, 0, 5],
[1, 3, 4, 9],
[1, 4, 1, 3],
[2, 0, 0, 9],
[2, 1, 3, 7],
[2, 2, 1, 6],
[2, 3, 2, 4],
[2, 4, 5, 8],
[3, 0, 5, 6],
[3, 1, 1, 9],
[3, 2, 0, 2],
[3, 3, 4, 8],
[3, 4, 3, 7],
[4, 0, 1, 3],
[4, 1, 2, 9],
[4, 2, 0, 8],
[4, 3, 4, 7],
[4, 4, 5, 6]], dtype=int64)
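Note that np.apply_along_axis still invokes the lambda once per row of t at Python speed, so it tidies the loop more than it removes it. Since each row of m holds every value exactly twice, a stable argsort puts the two column indices of value k at columns 2k and 2k+1, which gives a fully vectorized alternative; a minimal sketch under that assumption (it produces the same result for the sample m):
# stable sort keeps the two occurrences of each value in left-to-right order
idx = np.argsort(m, axis=1, kind='stable')       # shape (n, 2n)
t = np.hstack((t[:, :2], idx.reshape(-1, 2)))    # same (m_row, val) row order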
Edit
The above code relies on the fact that each row in m contains exactly 2 "wanted" values.
To make the code robust to the case where some row contains either too many or too few "wanted" values (even none):
Define a function returning indices of "wanted" elements as:
def inds(row, arr):
    ind = np.nonzero(arr[row[0]] == row[1])[0]
    return np.pad(ind, (0, 2), constant_values=99)[0:2]
Change the second instruction to:
t = np.hstack((t, np.apply_along_axis(inds, 1, t, m)))
To test this variant, change the first line of m to:
[4, 2, 2, 3, 5, 5, 3, 1, 5, 4]
i.e. a row that:
does not contain any 0s,
contains only a single 1.
Then the initial part of the result is:
array([[ 0, 0, 99, 99],
[ 0, 1, 7, 99],
so that the missing indices in the result are filled with 99.
Looking for some help with numpy and building a 3d array from multiple 2d arrays. I want to make a loop such that on every iteration I make a new 2d array and add it as a new slice to an existing 3d array. Here's my code sample.
import numpy as np
a = np.random.randint(0, 9, size=(10, 10))  # make random 10x10 matrix
b = a                                       # save copy
a = np.random.randint(0, 9, size=(10, 10))  # make another random 10x10 matrix
a.shape
(10, 10)   # verify it's 10x10
b.shape
(10, 10)   # verify it's 10x10
b = np.array([b, a])  # combine two 2d matrices into one 3d matrix
b.shape
(2, 10, 10)   # verify it's a 3d matrix with two planes
a = np.random.randint(0, 9, size=(10, 10))  # make new random 10x10 matrix
b = np.array([b, a])  # add new 2d plane to the 3d matrix
b.shape
(2,)   # should be (3, 10, 10)
Can anyone see what I'm doing wrong?
When you combine two arrays using np.array([...]), they have to be the same shape. Here b has shape (2, 10, 10) while a has shape (10, 10), so NumPy cannot stack them into a regular array and instead treats them as opaque objects. There should have been a warning when you ran the last b = np.array([b, a]):
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Instead, use np.stack
b = np.stack([*b, a])
*b basically expands the children of b, so the above is equivalent to b = np.stack([b[0], b[1], a])
Or you can use np.vstack (vertical stack):
b = np.vstack([b, a[None]])
a[None] basically wraps a in another array. a.shape == (10, 10), a[None].shape == (1, 10, 10)
Both of the above produce the following:
>>> b.shape
(3, 10, 10)
>>> b
array([[[3, 8, 0, 2, 8, 0, 0, 5, 7, 7],
[0, 5, 2, 8, 8, 2, 1, 4, 5, 8],
[3, 2, 2, 4, 1, 8, 2, 0, 7, 5],
[5, 6, 5, 0, 8, 7, 4, 0, 4, 6],
[6, 2, 3, 7, 4, 3, 6, 6, 4, 8],
[2, 5, 1, 7, 1, 3, 0, 6, 0, 5],
[3, 4, 0, 7, 3, 4, 5, 0, 7, 4],
[0, 7, 2, 8, 7, 7, 4, 3, 2, 6],
[4, 6, 2, 5, 5, 8, 5, 8, 0, 8],
[3, 4, 1, 0, 3, 7, 0, 6, 7, 3]],
[[4, 0, 6, 2, 4, 4, 7, 0, 7, 2],
[5, 8, 5, 8, 2, 8, 3, 7, 4, 6],
[2, 1, 2, 0, 4, 5, 6, 3, 0, 0],
[8, 7, 3, 0, 8, 8, 0, 4, 1, 4],
[0, 2, 5, 7, 5, 3, 0, 5, 1, 7],
[1, 5, 8, 0, 2, 6, 5, 0, 3, 2],
[4, 4, 4, 3, 3, 8, 6, 6, 5, 5],
[5, 3, 6, 8, 0, 3, 0, 8, 8, 3],
[4, 2, 6, 6, 6, 2, 0, 0, 6, 2],
[7, 3, 8, 0, 7, 1, 1, 8, 6, 2]],
[[6, 6, 1, 1, 6, 4, 6, 2, 6, 7],
[0, 5, 6, 7, 5, 0, 0, 5, 8, 2],
[6, 6, 1, 5, 2, 3, 2, 3, 3, 2],
[0, 3, 7, 6, 4, 5, 3, 1, 7, 2],
[7, 6, 3, 0, 1, 7, 8, 3, 8, 5],
[3, 1, 8, 6, 1, 5, 0, 8, 6, 1],
[1, 4, 8, 1, 7, 0, 1, 1, 5, 3],
[2, 1, 4, 8, 2, 3, 1, 6, 8, 7],
[8, 1, 1, 0, 6, 1, 0, 6, 1, 6],
[1, 8, 4, 7, 7, 5, 0, 3, 8, 6]]])
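More generally, if you're building many planes in a loop, restacking on every iteration copies the whole array each time. A common pattern (a sketch, not the only way) is to accumulate the 2d slices in a plain list and stack once at the end:
planes = []   # collect 2d slices here
for _ in range(3):
    planes.append(np.random.randint(0, 9, size=(10, 10)))
b = np.stack(planes)   # one stack at the end
b.shape   # (3, 10, 10)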
Consider the following code fragment:
import numpy as np
mask = np.array([True, True, False, True, True, False])
val = np.array([9, 3])
arr = np.random.randint(1, 10, size=(5, len(mask)))  # high bound is exclusive
As expected, we get an array of random integers, 1 to 9, with 5 rows and 6 columns as below. The val array has not been used yet.
[[2, 7, 6, 9, 7, 5],
[7, 2, 9, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 4, 4, 5, 2],
[7, 7, 9, 6, 9, 8]]
Now I'll introduce val = [9, 3].
Where mask = True, I want the row element to be taken randomly from 1 to 9.
Where mask = False, I want the row element to be taken randomly from 1 to 3.
How can this be done efficiently? A sample output is shown below.
[[2, 7, 2, 9, 7, 1],
[7, 2, 1, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 1, 4, 5, 2],
[7, 7, 2, 6, 9, 1]]
One idea is to sample uniformly between 0 and 1, multiply by 9 or 3 depending on the mask, and finally add 1 to shift the sample.
rand = np.random.rand(5, len(mask))
is3 = (1 - mask).astype(int)   # 0 where mask is True (bound 9), 1 where False (bound 3)
# out is random from 0-8 or 0-2 depending on `is3`
out = (rand * val[is3]).astype(int)
# shift out by 1:
out = out + 1
Output:
array([[4, 9, 3, 6, 2, 1],
[1, 8, 2, 7, 1, 3],
[8, 2, 1, 2, 3, 2],
[4, 3, 2, 2, 3, 2],
[5, 8, 1, 5, 6, 1]])
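An alternative sketch: np.random.randint accepts array-valued bounds (NumPy 1.11+), so the per-column upper bound can be picked with np.where and the sampling done in one call (high is exclusive, hence the + 1):
high = np.where(mask, val[0], val[1]) + 1   # [10, 10, 4, 10, 10, 4]
out = np.random.randint(1, high, size=(5, len(mask)))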
Consider the small sample of a 6-column integer array:
import numpy as np
J = np.array([[1, 3, 1, 3, 2, 5],
              [2, 6, 3, 4, 2, 6],
              [1, 7, 2, 5, 2, 5],
              [4, 2, 8, 3, 8, 2],
              [0, 3, 0, 3, 0, 3],
              [2, 2, 3, 3, 2, 3],
              [4, 3, 4, 3, 3, 4]])
I want to remove, from J:
a) all rows where the first and second PAIRS of elements are exact matches
(this removes rows like [1,3, 1,3, 2,5])
b) all rows where the second and third PAIRS of elements are exact matches
(this removes rows like [1,7, 2,5, 2,5])
Matches between any other pairs are OK.
I have a solution, below, but it is handled in two steps. If there is a more direct, cleaner, or more readily extendable approach, I'd be very interested.
K = J[~(np.logical_and(J[:,0] == J[:,2], J[:,1] == J[:,3]))]
L = K[~(np.logical_and(K[:,2] == K[:,4], K[:,3] == K[:,5]))]
K removes the 1st, 5th, and 7th rows from J, leaving
K = [[2, 6, 3, 4, 2, 6],
     [1, 7, 2, 5, 2, 5],
     [4, 2, 8, 3, 8, 2],
     [2, 2, 3, 3, 2, 3]]
L removes the 2nd row from K, giving the final outcome.
L = [[2, 6, 3, 4, 2, 6],
     [4, 2, 8, 3, 8, 2],
     [2, 2, 3, 3, 2, 3]]
I'm hoping for an efficient solution because, learning from this problem, I need to extend these ideas to 8-column arrays where
I eliminate rows having exact matches between the 1st and 2nd PAIRS, the 2nd and 3rd PAIRS, and the 3rd and 4th PAIRS.
Since we are checking adjacent pairs for equality, differencing on 3D-reshaped data would be one way to do it for a cleaner vectorized solution -
# a is input array
In [117]: b = a.reshape(a.shape[0],-1,2)
In [118]: a[~(np.diff(b,axis=1)==0).all(2).any(1)]
Out[118]:
array([[2, 6, 3, 4, 2, 6],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])
If you are going for performance, skip the differencing and go for sliced equality -
In [142]: a[~(b[:,:-1] == b[:,1:]).all(2).any(1)]
Out[142]:
array([[2, 6, 3, 4, 2, 6],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])
Generic no. of cols
Extends just as well to a generic number of columns -
In [156]: a
Out[156]:
array([[1, 3, 1, 3, 2, 5, 1, 3, 1, 3, 2, 5],
[2, 6, 3, 4, 2, 6, 2, 6, 3, 4, 2, 6],
[1, 7, 2, 5, 2, 5, 1, 7, 2, 5, 2, 5],
[4, 2, 8, 3, 8, 2, 4, 2, 8, 3, 8, 2],
[0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3],
[2, 2, 3, 3, 2, 3, 2, 2, 3, 3, 2, 3],
[4, 3, 4, 3, 3, 4, 4, 3, 4, 3, 3, 4]])
In [158]: b = a.reshape(a.shape[0],-1,2)
In [159]: a[~(b[:,:-1] == b[:,1:]).all(2).any(1)]
Out[159]:
array([[4, 2, 8, 3, 8, 2, 4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3, 2, 2, 3, 3, 2, 3]])
Of course, we are assuming the number of cols allows pairing.
What you have is quite reasonable. Here's what I would write:
def eliminate_pairs(x: np.ndarray) -> np.ndarray:
    first_second = (x[:, 0] == x[:, 2]) & (x[:, 1] == x[:, 3])
    second_third = (x[:, 2] == x[:, 4]) & (x[:, 3] == x[:, 5])
    return x[~(first_second | second_third)]
You could also apply De Morgan's laws and eliminate an extra not operation, but that's less important than clarity.
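For illustration, that De Morgan rewrite would look something like the sketch below; keep whichever version reads more clearly:
def keep_pairs(x: np.ndarray) -> np.ndarray:
    # a row survives iff each adjacent pair differs somewhere
    first_second_differ = (x[:, 0] != x[:, 2]) | (x[:, 1] != x[:, 3])
    second_third_differ = (x[:, 2] != x[:, 4]) | (x[:, 3] != x[:, 5])
    return x[first_second_differ & second_third_differ]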
Let's try a loop:
mask = False
for i in range(0, 3, 2):
    mask = (J[:, i:i+2] == J[:, i+2:i+4]).all(1) | mask
J[~mask]
Output:
array([[2, 6, 3, 4, 2, 6],
[4, 2, 8, 3, 8, 2],
[2, 2, 3, 3, 2, 3]])
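For the 8-column case mentioned in the question, the same loop generalizes by letting the start column run over every adjacent pair boundary; a minimal sketch, assuming an even number of columns:
mask = False
for i in range(0, J.shape[1] - 2, 2):   # compare pair k with pair k+1
    mask = (J[:, i:i+2] == J[:, i+2:i+4]).all(1) | mask
result = J[~mask]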
Consider the numpy array below. I'm hoping to find a fast way to remove rows not having 4 distinct values.
import numpy as np
D = np.array([[2, 3, 6, 7],
[2, 4, 3, 4],
[4, 9, 0, 1],
[5, 5, 2, 5],
[7, 5, 4, 8],
[7, 5, 4, 7]])
In the small sample array shown, the output should be:
D = np.array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Here's one way -
In [94]: s = np.sort(D,axis=1)
In [95]: D[(s[:,:-1] == s[:,1:]).sum(1) ==0]
Out[95]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Alternatively -
In [107]: D[~(s[:,:-1] == s[:,1:]).any(1)]
Out[107]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Or -
In [112]: D[(s[:,:-1] != s[:,1:]).all(1)]
Out[112]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
With pandas -
In [121]: import pandas as pd
In [122]: D[pd.DataFrame(D).nunique(1)==4]
Out[122]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
A working answer with np.unique.
I found no way to use the axis keyword in np.unique to get rid of the list comprehension, perhaps someone can help?
D[np.array([np.max(np.unique(row, return_counts=True)[-1]) for row in D]) == 1]
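Building on the sort-based idea above, the number of distinct values per row can also be computed directly, with no pandas and no Python loop; a small sketch:
s = np.sort(D, axis=1)
n_distinct = 1 + (s[:, 1:] != s[:, :-1]).sum(1)   # distinct values per row
D[n_distinct == D.shape[1]]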
I have a numpy array like this with shape (6, 2, 4):
x = np.array([[[0, 3, 2, 0],
[1, 3, 1, 1]],
[[3, 2, 3, 3],
[0, 3, 2, 0]],
[[1, 0, 3, 1],
[3, 2, 3, 3]],
[[0, 3, 2, 0],
[1, 3, 2, 2]],
[[3, 0, 3, 1],
[1, 0, 1, 1]],
[[1, 3, 1, 1],
[3, 1, 3, 3]]])
And I have choices array like this:
choices = np.array([[1, 1, 1, 1],
[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 0, 0, 0],
[1, 0, 1, 1],
[0, 0, 0, 1]])
How can I use choices array to index only the middle dimension with size 2 and get a new numpy array with shape (6, 4) in the most efficient way possible?
The result would be this:
[[1 3 1 1]
[3 3 2 3]
[3 2 3 3]
[1 3 2 0]
[1 0 1 1]
[1 3 1 3]]
I've tried to do it by x[:, choices, :] but this doesn't return what I want. I also tried to do x.take(choices, axis=1) but no luck.
Use np.take_along_axis to index along the second axis -
In [16]: np.take_along_axis(x,choices[:,None],axis=1)[:,0]
Out[16]:
array([[1, 3, 1, 1],
[3, 3, 2, 3],
[3, 2, 3, 3],
[1, 3, 2, 0],
[1, 0, 1, 1],
[1, 3, 1, 3]])
Or with explicit integer-array indexing, where np.arange(m)[:,None] (shape (6,1)), choices (shape (6,4)) and np.arange(n) (shape (4,)) broadcast together to shape (6,4), selecting x[i, choices[i,j], j] for every (i, j) -
In [22]: m,n = choices.shape
In [23]: x[np.arange(m)[:,None],choices,np.arange(n)]
Out[23]:
array([[1, 3, 1, 1],
[3, 3, 2, 3],
[3, 2, 3, 3],
[1, 3, 2, 0],
[1, 0, 1, 1],
[1, 3, 1, 3]])
As I recently had this issue and found @Divakar's answer useful, but still wanted a general function for that (independent of the number of dims etc.), here it is:
def take_indices_along_axis(array, choices, choice_axis):
    """
    array is N dim
    choices are integers of N-1 dim
    with values between 0 and array.shape[choice_axis] - 1
    choice_axis is the axis along which you want to take indices
    """
    nb_dims = len(array.shape)
    list_indices = []
    for this_axis, this_axis_size in enumerate(array.shape):
        if this_axis == choice_axis:
            # this is the axis along which we want to choose
            list_indices.append(choices)
            continue
        # else, we want arange(this_axis), but reshaped to match the purpose
        this_indices = np.arange(this_axis_size)
        reshape_target = [1 for _ in range(nb_dims)]
        reshape_target[this_axis] = this_axis_size  # replace the corresponding axis with the right range
        del reshape_target[choice_axis]             # remove the choice_axis
        list_indices.append(
            this_indices.reshape(tuple(reshape_target))
        )
    tuple_indices = tuple(list_indices)
    return array[tuple_indices]
# test it !
array = np.random.random(size=(10, 10, 10, 10))
choices = np.random.randint(10, size=(10, 10, 10))
assert take_indices_along_axis(array, choices, choice_axis=0)[5, 5, 5] == array[choices[5, 5, 5], 5, 5, 5]
assert take_indices_along_axis(array, choices, choice_axis=2)[5, 5, 5] == array[5, 5, choices[5, 5, 5], 5]
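For what it's worth, np.take_along_axis can express the same operation: expand choices to the array's ndim on the choice axis, then squeeze that axis away afterwards. A quick sketch for choice_axis=2, reusing the test arrays above:
out = np.take_along_axis(array, np.expand_dims(choices, 2), axis=2).squeeze(2)
assert (out == take_indices_along_axis(array, choices, choice_axis=2)).all()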