NumPy - Indexing one dimension of a multidimensional array

I have a NumPy array like this with shape (6, 2, 4):
x = np.array([[[0, 3, 2, 0],
[1, 3, 1, 1]],
[[3, 2, 3, 3],
[0, 3, 2, 0]],
[[1, 0, 3, 1],
[3, 2, 3, 3]],
[[0, 3, 2, 0],
[1, 3, 2, 2]],
[[3, 0, 3, 1],
[1, 0, 1, 1]],
[[1, 3, 1, 1],
[3, 1, 3, 3]]])
And I have a choices array like this:
choices = np.array([[1, 1, 1, 1],
[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 0, 0, 0],
[1, 0, 1, 1],
[0, 0, 0, 1]])
How can I use the choices array to index only the middle dimension (of size 2) and get a new NumPy array with shape (6, 4) in the most efficient way possible?
The result would be this:
[[1 3 1 1]
[3 3 2 3]
[3 2 3 3]
[1 3 2 0]
[1 0 1 1]
[1 3 1 3]]
I've tried to do it by x[:, choices, :] but this doesn't return what I want. I also tried to do x.take(choices, axis=1) but no luck.

Use np.take_along_axis to index along the second axis -
In [16]: np.take_along_axis(x,choices[:,None],axis=1)[:,0]
Out[16]:
array([[1, 3, 1, 1],
[3, 3, 2, 3],
[3, 2, 3, 3],
[1, 3, 2, 0],
[1, 0, 1, 1],
[1, 3, 1, 3]])
Or with explicit integer-array indexing -
In [22]: m,n = choices.shape
In [23]: x[np.arange(m)[:,None],choices,np.arange(n)]
Out[23]:
array([[1, 3, 1, 1],
[3, 3, 2, 3],
[3, 2, 3, 3],
[1, 3, 2, 0],
[1, 0, 1, 1],
[1, 3, 1, 3]])
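To see why the attempt from the question does not work, it helps to compare shapes. The snippet below is only a sketch (the names attempt, picked and expected are chosen for illustration): indexing one axis with a 2-D integer array replaces that axis with both axes of the index array, whereas np.take_along_axis pairs each index with its own position.

```python
import numpy as np

x = np.array([[[0, 3, 2, 0], [1, 3, 1, 1]],
              [[3, 2, 3, 3], [0, 3, 2, 0]],
              [[1, 0, 3, 1], [3, 2, 3, 3]],
              [[0, 3, 2, 0], [1, 3, 2, 2]],
              [[3, 0, 3, 1], [1, 0, 1, 1]],
              [[1, 3, 1, 1], [3, 1, 3, 3]]])
choices = np.array([[1, 1, 1, 1],
                    [0, 1, 1, 0],
                    [1, 1, 1, 1],
                    [1, 0, 0, 0],
                    [1, 0, 1, 1],
                    [0, 0, 0, 1]])

# Indexing axis 1 with a (6, 4) array replaces that axis by a (6, 4) block.
attempt = x[:, choices, :]
print(attempt.shape)   # (6, 6, 4, 4)

# take_along_axis needs indices with the same number of dimensions as x,
# so a length-1 axis is inserted at position 1 and dropped again afterwards.
picked = np.take_along_axis(x, choices[:, None, :], axis=1)[:, 0, :]
print(picked.shape)    # (6, 4)

# Element-by-element definition of the wanted result, used as a cross-check.
expected = np.array([[x[i, choices[i, k], k] for k in range(4)]
                     for i in range(6)])
print(np.array_equal(picked, expected))  # True
```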

I recently had this issue and found @divakar's answer useful, but I still wanted a general function for it (independent of the number of dimensions etc.). Here it is:
import numpy as np
def take_indices_along_axis(array, choices, choice_axis):
    """
    array is N-dimensional
    choices is an integer array with N-1 dimensions,
    with values between 0 and array.shape[choice_axis] - 1
    choice_axis is the axis along which you want to take indices
    """
    nb_dims = len(array.shape)
    list_indices = []
    for this_axis, this_axis_size in enumerate(array.shape):
        if this_axis == choice_axis:
            # this is the axis along which we want to choose
            list_indices.append(choices)
            continue
        # otherwise we want arange(this_axis_size), reshaped so that it broadcasts against choices
        this_indices = np.arange(this_axis_size)
        reshape_target = [1 for _ in range(nb_dims)]
        reshape_target[this_axis] = this_axis_size  # replace the corresponding axis with the right range
        del reshape_target[choice_axis]  # remove the choice_axis
        list_indices.append(
            this_indices.reshape(tuple(reshape_target))
        )
    tuple_indices = tuple(list_indices)
    return array[tuple_indices]
# test it !
array = np.random.random(size=(10, 10, 10, 10))
choices = np.random.randint(10, size=(10, 10, 10))
assert take_indices_along_axis(array, choices, choice_axis=0)[5, 5, 5] == array[choices[5, 5, 5], 5, 5, 5]
assert take_indices_along_axis(array, choices, choice_axis=2)[5, 5, 5] == array[5, 5, choices[5, 5, 5], 5]
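For comparison, the same N-dimensional selection can probably be written with np.expand_dims and np.take_along_axis. This is a sketch rather than a drop-in replacement; the function name take_along_choice_axis is made up for this example.

```python
import numpy as np

def take_along_choice_axis(array, choices, choice_axis):
    # Hypothetical helper: choices has one dimension fewer than array.
    # Insert a length-1 axis so both have the same number of dimensions.
    expanded = np.expand_dims(choices, axis=choice_axis)
    picked = np.take_along_axis(array, expanded, axis=choice_axis)
    # Drop the length-1 axis again to get an (N-1)-dimensional result.
    return np.squeeze(picked, axis=choice_axis)

rng = np.random.default_rng(0)
array = rng.random(size=(3, 4, 5, 6))
choices = rng.integers(5, size=(3, 4, 6))   # values in 0..4, for axis 2
result = take_along_choice_axis(array, choices, choice_axis=2)
print(result.shape)                                          # (3, 4, 6)
print(result[1, 2, 3] == array[1, 2, choices[1, 2, 3], 3])   # True
```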

Related

Can this loopy array process be sped up?

Consider two given arrays: (in this sample, these arrays are based on n=5)
Given: array m has shape (n, 2n). When n = 5, each row of m holds a random arrangement of integers 0,0,1,1,2,2,3,3,4,4.
import numpy as np
m= np.array([[4, 2, 2, 3, 0, 1, 3, 1, 0, 4],
[2, 4, 0, 4, 3, 2, 0, 1, 1, 3],
[0, 2, 3, 1, 3, 4, 2, 1, 4, 0],
[2, 1, 2, 4, 3, 0, 0, 4, 3, 1],
[2, 0, 1, 0, 3, 4, 4, 3, 2, 1]])
Given: array t has shape (n^2, 4). When n = 5, the first two columns (m_row, val) hold all 25 ordered pairs of the values 0 to 4.
The 1st column refers to rows of array m. The 2nd column refers to values in array m.
For now, the last two columns hold dummy value 99 that will be replaced.
t = np.array([[0, 0, 99, 99],
[0, 1, 99, 99],
[0, 2, 99, 99],
[0, 3, 99, 99],
[0, 4, 99, 99],
[1, 0, 99, 99],
[1, 1, 99, 99],
[1, 2, 99, 99],
[1, 3, 99, 99],
[1, 4, 99, 99],
[2, 0, 99, 99],
[2, 1, 99, 99],
[2, 2, 99, 99],
[2, 3, 99, 99],
[2, 4, 99, 99],
[3, 0, 99, 99],
[3, 1, 99, 99],
[3, 2, 99, 99],
[3, 3, 99, 99],
[3, 4, 99, 99],
[4, 0, 99, 99],
[4, 1, 99, 99],
[4, 2, 99, 99],
[4, 3, 99, 99],
[4, 4, 99, 99]])
PROBLEM: I want to replace the dummy values in the last two columns of t, as follows:
Let's consider t row [1, 3, 99, 99]. So from m's row=1, I determine the indices of the two columns that hold value 3. These are columns (4,9), so the t row is updated to [1, 3, 4, 9].
In the same way, t row [4, 2, 99, 99] becomes [4, 2, 0, 8].
I currently do this by looping through each column i of array m, looking for the two instances where m[m_row, i] = val, then updating array t. (slow!)
Is there a way to speed up this process, perhaps using vectorization or broadcasting?
Use the following code:
import itertools
# First 2 columns
t = np.array(list(itertools.product(range(m.shape[0]), repeat=2)))
# Add columns - indices of "wanted" elements
t = np.hstack((t, np.apply_along_axis(
    lambda row, arr: np.nonzero(arr[row[0]] == row[1])[0], 1, t, m)))
The result, for your data sample (m array), is:
array([[0, 0, 4, 8],
[0, 1, 5, 7],
[0, 2, 1, 2],
[0, 3, 3, 6],
[0, 4, 0, 9],
[1, 0, 2, 6],
[1, 1, 7, 8],
[1, 2, 0, 5],
[1, 3, 4, 9],
[1, 4, 1, 3],
[2, 0, 0, 9],
[2, 1, 3, 7],
[2, 2, 1, 6],
[2, 3, 2, 4],
[2, 4, 5, 8],
[3, 0, 5, 6],
[3, 1, 1, 9],
[3, 2, 0, 2],
[3, 3, 4, 8],
[3, 4, 3, 7],
[4, 0, 1, 3],
[4, 1, 2, 9],
[4, 2, 0, 8],
[4, 3, 4, 7],
[4, 4, 5, 6]], dtype=int64)
Edit
The above code relies on the fact that each row of m contains
exactly two occurrences of each "wanted" value.
To make the code robust when some row contains too many or too few
occurrences of a "wanted" value (or none at all),
define a function returning the indices of the "wanted" elements as:
def inds(row, arr):
    ind = np.nonzero(arr[row[0]] == row[1])[0]
    return np.pad(ind, (0,2), constant_values=99)[0:2]
Change the second instruction to:
t = np.hstack((t, np.apply_along_axis(inds, 1, t, m)))
To test this variant, change the first row of m to:
[4, 2, 2, 3, 5, 5, 3, 1, 5, 4]
i.e. a row that contains no 0 and only a single 1.
Then the initial part of the result is:
array([[ 0, 0, 99, 99],
[ 0, 1, 7, 99],
so that the missing indices in the result are filled with 99.
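Note that np.apply_along_axis still calls the Python function once per row of t. If every value really does occur exactly twice in each row of m (the original assumption), a stable argsort gives the column indices without a Python-level loop. This is only a sketch under that assumption; the names cols, rows and vals are not from the code above.

```python
import numpy as np

m = np.array([[4, 2, 2, 3, 0, 1, 3, 1, 0, 4],
              [2, 4, 0, 4, 3, 2, 0, 1, 1, 3],
              [0, 2, 3, 1, 3, 4, 2, 1, 4, 0],
              [2, 1, 2, 4, 3, 0, 0, 4, 3, 1],
              [2, 0, 1, 0, 3, 4, 4, 3, 2, 1]])
n = m.shape[0]

# A stable argsort lists, for each row, the column indices ordered by value:
# first the two columns holding 0, then the two holding 1, and so on.
# Assumption: every value 0..n-1 occurs exactly twice in every row.
cols = np.argsort(m, axis=1, kind="stable").reshape(n, n, 2)

rows = np.repeat(np.arange(n), n)   # 0,0,0,0,0,1,1,...
vals = np.tile(np.arange(n), n)     # 0,1,2,3,4,0,1,...
t = np.column_stack((rows, vals, cols.reshape(-1, 2)))

print(t[8])    # [1 3 4 9]
print(t[22])   # [4 2 0 8]
```

With rows that break the two-occurrences assumption this reshape gives wrong pairs, so the padded variant is still the one to use there.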

np array rows with unique elements

Consider the NumPy array below. I'm hoping to find a fast way to remove rows that do not have 4 distinct values.
import numpy as np
D = np.array([[2, 3, 6, 7],
[2, 4, 3, 4],
[4, 9, 0, 1],
[5, 5, 2, 5],
[7, 5, 4, 8],
[7, 5, 4, 7]])
For the small sample array shown, the output should be:
D = np.array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Here's one way -
In [94]: s = np.sort(D,axis=1)
In [95]: D[(s[:,:-1] == s[:,1:]).sum(1) ==0]
Out[95]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Alternatively -
In [107]: D[~(s[:,:-1] == s[:,1:]).any(1)]
Out[107]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
Or -
In [112]: D[(s[:,:-1] != s[:,1:]).all(1)]
Out[112]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
With pandas -
In [121]: import pandas as pd
In [122]: D[pd.DataFrame(D).nunique(1)==4]
Out[122]:
array([[2, 3, 6, 7],
[4, 9, 0, 1],
[7, 5, 4, 8]])
A working answer with np.unique
I found no way to use the axis keyword in np.unique to get rid of the list comprehension; perhaps someone can help?
D[np.array([np.max(np.unique(_,return_counts=True)[-1]) for _ in D])==1]
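One possible way to avoid the list comprehension is to compare every element of a row with every other element of that row via broadcasting. This is a sketch, assuming the rows are short enough that the temporary (rows, cols, cols) boolean array is affordable; the names eq and all_distinct are just for illustration.

```python
import numpy as np

D = np.array([[2, 3, 6, 7],
              [2, 4, 3, 4],
              [4, 9, 0, 1],
              [5, 5, 2, 5],
              [7, 5, 4, 8],
              [7, 5, 4, 7]])

# eq[r, i, j] is True where D[r, i] == D[r, j].
eq = D[:, :, None] == D[:, None, :]
# Every element equals itself, so a row with all-distinct values has exactly
# D.shape[1] matches (the diagonal); any duplicate adds more.
all_distinct = eq.sum(axis=(1, 2)) == D.shape[1]
print(D[all_distinct])
```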

Creating an array of 3D vectors where each element of each vector is from a given range

I am trying to build an array of 3D vectors in which each component of each vector is drawn from a given range. What I mean is:
array = [v_1, v_2, v_3,....]
v_j = [x_1, x_2, x_3] with x_i in [a, b].
The important thing for me is, that I want to have all possible combinations.
So for example let a = 1, b = 10. Then it should be something like:
v_1 = [1, 1, 1], v_2 = [1, 1, 2],...v_10 = [1, 1, 10]
and then the next one should be:
v_11 = [1, 2, 1], v_12 = [1, 2, 2]....
I tried it by using linspace but I just get the vectors where each element is equal i.e.
v_1 = [1, 1, 1], v_2 = [2, 2, 2]....
Is there an easy way to do this, or do I have to write a lot of loops?
My linspace example was:
ffac = np.linspace(-1E-3, 1E-3, 100, endpoint=True)
for i in range(100):
    eps = np.ones(shape=[100, ]) * ffac[i]
With a and b, we can make np.arange(a, b+1), and then use np.meshgrid:
xij = np.arange(a, b+1)
np.transpose(np.meshgrid(xij, xij, xij), (2,1,3,0))
For a=1 and b=2, we obtain:
>>> np.transpose(np.meshgrid(xij, xij, xij), (2,1,3,0))
array([[[[1, 1, 1],
[1, 1, 2]],
[[1, 2, 1],
[1, 2, 2]]],
[[[2, 1, 1],
[2, 1, 2]],
[[2, 2, 1],
[2, 2, 2]]]])
For a vector of n options, the result is thus an n×n×n×3 array.
Or if you want to flatten it:
>>> np.transpose(np.meshgrid(xij, xij, xij), (2,1,3,0)).reshape(-1, 3)
array([[1, 1, 1],
[1, 1, 2],
[1, 2, 1],
[1, 2, 2],
[2, 1, 1],
[2, 1, 2],
[2, 2, 1],
[2, 2, 2]])
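The same flattened list can also be built with indexing="ij" and np.stack, which avoids working out the transpose order by hand. A sketch, cross-checked against itertools.product; a and b are the bounds from the question, and grid, vectors and reference are names chosen here. It should work the same way for non-integer values such as an np.linspace array.

```python
import itertools
import numpy as np

a, b = 1, 3
xij = np.arange(a, b + 1)

# With indexing="ij", the k-th output array varies along axis k,
# so stacking on the last axis gives grid[i, j, k] == [xij[i], xij[j], xij[k]].
grid = np.stack(np.meshgrid(xij, xij, xij, indexing="ij"), axis=-1)
vectors = grid.reshape(-1, 3)
print(vectors.shape)   # (27, 3)
print(vectors[:4])

# Same rows in the same order, built with pure Python.
reference = np.array(list(itertools.product(xij, repeat=3)))
print(np.array_equal(vectors, reference))  # True
```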

How to get all sub matrices of 2D array without numpy?

I need to get all submatrices of the 2D array and to do the manipulation for each submatrix. So I created example matrix:
M3 = [list(range(5)) for i in range(6)]
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]
I need to capture 3 rows and 3 columns and then shift this "window" till I get all submatrices. The first submatrix would be:
[[0, 1, 2],
[0, 1, 2],
[0, 1, 2]]
and the last one is:
[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]]
For this matrix I need 12 submatrices. However, I get more than that with the code I wrote to solve the problem:
for j in range(len(M3[0])-3):
    for i in range(len(M3)-3):
        for row in M3[0+j:3+j]:
            X_i_j = [row[0+i:3+i] for row in M3[0+j:3+j]]
            print(X_i_j)
I get 18 instead of 12 (with two duplicates of each submatrix):
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
[[0, 1, 2], [0, 1, 2], [0, 1, 2]]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
...
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
And with this sample of code I get 6 submatrices with 1 duplicate for each:
for i in range(len(M3)-3):
    for j in range(len(M3[0])-3):
        X_i_j = [row[0+i:3+i] for row in M3[0+j:3+j]]
        print(X_i_j)
I do not see what is wrong with it and why I get the duplicates. How can I get all sub matrices of 2D array without numpy for this case?
Your code works once the loop ranges are fixed (the row and column bounds were swapped, and the offset should be 2, not 3):
for j in range(len(M3)-2):
    for i in range(len(M3[0])-2):
        X_i_j = [row[0+i:3+i] for row in M3[0+j:3+j]]
        print('=======')
        for x in X_i_j:
            print(x)
I would solve it slightly differently:
one function to read a given number of rows,
then one function to read a given number of columns from those rows, which gives you the submatrix.
This works for any 2D array and any submatrix size.
Sample:
def read_y_rows(array, rows, offset):
    return array[offset:rows + offset]
def read_x_cols(array, cols, offset):
    return list(row[offset:cols + offset] for row in array)
def get_sub_arrays(array, x_dim_cols, y_dim_rows):
    """
    get 2D sub arrays by x_dim columns and y_dim rows
    from 2D array (list of lists)
    """
    result = []
    for start_row in range(len(array) - y_dim_rows + 1):
        y_rows = read_y_rows(array, y_dim_rows, start_row)
        for start_col in range(len(max(array, key=len)) - x_dim_cols + 1):
            x_columns = read_x_cols(y_rows, x_dim_cols, start_col)
            result.append(x_columns)
    return result
To use it, you could do:
M3 = [list(range(5)) for i in range(6)]
sub_arrays = get_sub_arrays(M3, 3, 3) ## this would also work for 2x2 arrays
sub_arrays is again a list of lists, containing all the subarrays found. You could print them like this:
for sub_array in sub_arrays:
    print()
    for row in sub_array:
        print(row)
I know it is a lot more code than above, just wanted to share this code.
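The number of window positions along each dimension is size - window + 1, which is why a bound like range(len(M3)-3) stops one step early. A compact sketch of the same idea as a single comprehension (the name sliding_windows is made up here, and equal-length rows are assumed):

```python
def sliding_windows(matrix, height, width):
    """Return every height x width submatrix of a list-of-lists matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])  # assumes equal-length rows
    return [
        [row[c:c + width] for row in matrix[r:r + height]]
        for r in range(n_rows - height + 1)   # top row of the window
        for c in range(n_cols - width + 1)    # left column of the window
    ]

M3 = [list(range(5)) for _ in range(6)]
windows = sliding_windows(M3, 3, 3)
print(len(windows))   # 12 = (6 - 3 + 1) * (5 - 3 + 1)
print(windows[0])     # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(windows[-1])    # [[2, 3, 4], [2, 3, 4], [2, 3, 4]]
```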

a repeated permutation with limitations

I am trying to generate all possible combinations of certain values in an array of 15 which add up to 50.
$a = [3, 4, 1, 2, 5]
print $a.repeated_permutation(15).to_a
In this case,
[2,2,2,2,4,4,4,4,4,4,4,4,4,3,3]
[2,2,2,4,2,4,4,4,4,4,4,4,4,3,3]
[2,2,4,2,2,4,4,4,4,4,4,4,4,3,3]
are all possible answers.
After some investigation I realize the code to do this is a bit over my head, but I will leave the question up if it might help someone else.
For some reference as to what I am working on, Project Euler, problem 114. It's pretty difficult, and so I am attempting to solve only a single case where my 50-space-long grid is filled only with 3-unit-long blocks. The blocks must be separated by at least one blank, so I am counting the blocks as 4. This (with some tweaking, which I have left out as this is confusing enough already) allows for twelve blocks plus three single blanks, or a maximum of fifteen elements.
Approach
I think recursion is the way to go here, where your recursive method looks like this:
def recurse(n,t)
where
n is the number of elements required; and
t is the required total.
If we let @arr be the array of integers you are given, recurse(n,t) returns an array of all permutations of n elements from @arr that sum to t.
Assumption
I have assumed that the elements of @arr are non-negative integers, sorted by size, but the method can be easily modified if it includes negative integers (though performance will suffer). Without loss of generality, we can assume the elements of @arr are unique, sorted by increasing magnitude.
Code
def recurse(n,t)
  if n == 1
    @arr.include?(t) ? [[t]] : nil
  else
    @arr.each_with_object([]) do |i,a|
      # elements of @arr are non-decreasing, so stop here and return what was collected
      break a if i > t
      if (ret = recurse(n-1,t-i))
        ret.each { |b| a << [i,*b] }
      end
    end
  end
end
Examples
@arr = [3, 4, 1, 2, 5].sort
#=> [1, 2, 3, 4, 5]
recurse(1,4)
#=> [[4]]
recurse(2,6)
#=> [[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]]
recurse(3,10)
#=> [[1, 4, 5], [1, 5, 4], [2, 3, 5], [2, 4, 4], [2, 5, 3],
# [3, 2, 5], [3, 3, 4], [3, 4, 3], [3, 5, 2], [4, 1, 5],
# [4, 2, 4], [4, 3, 3], [4, 4, 2], [4, 5, 1], [5, 1, 4],
# [5, 2, 3], [5, 3, 2], [5, 4, 1]]
recurse(3,50)
#=> []
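For readers more comfortable with Python, here is a rough transliteration of the same recursion. It is a sketch, not the Ruby code above: it takes the sorted values as a parameter instead of reading them from an instance variable, and it returns an empty list instead of nil when nothing matches. It assumes non-negative values, as stated under Assumption.

```python
def recurse(values, n, t):
    """All length-n sequences drawn from sorted `values` that sum to t."""
    if n == 1:
        return [[t]] if t in values else []
    found = []
    for v in values:
        if v > t:          # values are sorted, so no later value can fit
            break
        for tail in recurse(values, n - 1, t - v):
            found.append([v] + tail)
    return found

values = sorted([3, 4, 1, 2, 5])
print(recurse(values, 2, 6))        # [[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]]
print(len(recurse(values, 3, 10)))  # 18
print(recurse(values, 3, 50))       # []
```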
Improvement
We can do better, however, by first computing all combinations, and then computing the permutations of each of those combinations.
def combo_recurse(n,t,last=0)
  ndx = @arr.index { |i| i >= last }
  return nil if ndx.nil?
  arr_above = @arr[ndx..-1]
  if n == 1
    arr_above.include?(t) ? [[t]] : nil
  else
    arr_above.each_with_object([]) do |i,a|
      # elements of @arr are non-decreasing, so stop here and return what was collected
      break a if i > t
      if (ret = combo_recurse(n-1,t-i,i))
        ret.each { |b| a << [i,*b] }
      end
    end
  end
end
combo_recurse(1,4)
#=> [[4]]
combo_recurse(2,6)
#=> [[1, 5], [2, 4], [3, 3]]
combo_recurse(3,10)
#=> [[1, 4, 5], [2, 3, 5], [2, 4, 4], [3, 3, 4]]
combo_recurse(3,50)
#=> []
combo_recurse(15,50).size
#=> 132
combo_recurse(15,50).first(5)
#=> [[1, 1, 1, 1, 1, 1, 4, 5, 5, 5, 5, 5, 5, 5, 5],
# [1, 1, 1, 1, 1, 2, 3, 5, 5, 5, 5, 5, 5, 5, 5],
# [1, 1, 1, 1, 1, 2, 4, 4, 5, 5, 5, 5, 5, 5, 5],
# [1, 1, 1, 1, 1, 3, 3, 4, 5, 5, 5, 5, 5, 5, 5],
# [1, 1, 1, 1, 1, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
We can then compute the permutations from the combinations:
combo_recurse(2,6).flat_map { |a| a.permutation(a.size).to_a }.uniq
#=> [[1, 5], [5, 1], [2, 4], [4, 2], [3, 3]]
combo_recurse(3,10).flat_map { |a| a.permutation(a.size).to_a }.uniq
#=> [[1, 4, 5], [1, 5, 4], [4, 1, 5], [4, 5, 1], [5, 1, 4],
# [5, 4, 1], [2, 3, 5], [2, 5, 3], [3, 2, 5], [3, 5, 2],
# [5, 2, 3], [5, 3, 2], [2, 4, 4], [4, 2, 4], [4, 4, 2],
# [3, 3, 4], [3, 4, 3], [4, 3, 3]]
We can put an upper bound on the number of permutations for (15,50) (it will be too high because uniq is not applied):
def factorial(n)
  (1..n).reduce :*
end
combo_recurse(15,50).reduce(0) { |t,a| t + factorial(a.size) }
  #=> 172613016576000
That is, the bound is about 1.7 × 10^14 arrays. What platform will you be running this on?
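If only the number of arrays is needed, it can be counted without generating them. Below is a sketch in Python using memoised recursion over (remaining length, remaining total); count_sequences and VALUES are names chosen here, and the two small cases match the sizes of the recurse(2,6) and recurse(3,10) results listed earlier.

```python
from functools import lru_cache

VALUES = (1, 2, 3, 4, 5)

@lru_cache(maxsize=None)
def count_sequences(n, t):
    """Number of length-n sequences over VALUES that sum to t."""
    if n == 0:
        return 1 if t == 0 else 0
    if t < n * min(VALUES) or t > n * max(VALUES):
        return 0   # the total is out of reach with n elements
    # Choose the first element, then count the ways to fill the rest.
    return sum(count_sequences(n - 1, t - v) for v in VALUES)

print(count_sequences(2, 6))    # 5
print(count_sequences(3, 10))   # 18
# There are only 5**15 sequences of length 15 in total, so this holds:
print(count_sequences(15, 50) <= 5 ** 15)   # True
```

Because each (n, t) pair is computed once, this needs only a few hundred function evaluations for (15, 50), whatever the size of the count itself.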

Resources