slice 2D numpy array based on condition

I have a numpy array:
import numpy as np
a = np.array([
[999, 999, 999, 999, 999, 999, 999, 999, 999, 999],
[999, 999, 999, 1, 2, 3, 4, 999, 999, 999],
[999, 999, 999, 5, 6, 7, 8, 999, 999, 999],
[999, 999, 999, 9, 10, 11, 12, 999, 999, 999],
[999, 999, 999, 999, 999, 999, 999, 999, 999, 999]])
How can I return the filtered values, i.e. only those different from 999, using numpy slicing? My attempt:
filtered = np.where(a != 999)
In [5]: filtered
Out[5]:
(array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]),
array([3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6]))
Desired output:
output = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])

You can do the following:
>>> mask = (a!=999)
>>> dim1 = np.any(mask, axis=1).sum()
>>> a[mask].reshape(dim1, -1)
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
This of course assumes that the values different from 999 form a single contiguous rectangular box in the whole array.
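If you would rather not rely on reshape, a minimal sketch under the same assumption (one rectangular block of non-999 values) is to find the bounding rows and columns of the mask and take an ordinary slice. The names rows, cols and box are only illustrative:

```python
import numpy as np

# Same array as in the question, built programmatically
a = np.full((5, 10), 999)
a[1:4, 3:7] = np.arange(1, 13).reshape(3, 4)

mask = a != 999
rows = np.flatnonzero(mask.any(axis=1))  # indices of rows holding a non-999 value
cols = np.flatnonzero(mask.any(axis=0))  # indices of columns holding a non-999 value

# Plain slicing over the bounding box of the mask
box = a[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
print(box)
```

If nothing differs from 999, rows and cols are empty and rows[0] raises an IndexError, so that case would need a separate check.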

Yours is a special case, because the subarray is rectangular. You can get the flat values using fancy indexing:
>>> a[filtered]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
And if you know the shape already, you can reshape that:
>>> a[filtered].reshape(3,4)
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
However, in the general case there is no guarantee that the filtered data will form a rectangular array. Consider, for example, what the output array should look like if the input array had a[0,0] == 13.
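To make the a[0,0] == 13 case concrete, here is a small sketch (the variable names values and reshaped_ok are mine): the boolean index now yields 13 values, and those cannot be reshaped to 3x4.

```python
import numpy as np

a = np.full((5, 10), 999)
a[1:4, 3:7] = np.arange(1, 13).reshape(3, 4)
a[0, 0] = 13  # one extra value outside the rectangular block

values = a[a != 999]  # 1-D array, now with 13 elements
try:
    values.reshape(3, 4)
    reshaped_ok = True
except ValueError:
    reshaped_ok = False  # 13 elements cannot fill a 3x4 array

print(values.size, reshaped_ok)
```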

You can also do this: create a 2D boolean mask from the condition, cast it to int or float (depending on the array), and multiply it with the original array. Note that this sets the non-matching elements to 0 rather than removing them, so the array keeps its shape.
In [8]: arr
Out[8]:
array([[ 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10.]])
In [9]: arr*(arr % 2 == 0).astype(int)  # np.int was removed in NumPy 1.24; use the builtin int
Out[9]:
array([[ 0., 2., 0., 4., 0.],
[ 6., 0., 8., 0., 10.]])
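The same zero-filling can probably be written more directly with the three-argument form of np.where, which picks from arr where the condition holds and uses 0 elsewhere. A short sketch, rebuilding the arr shown above:

```python
import numpy as np

arr = np.arange(1.0, 11.0).reshape(2, 5)  # [[1..5], [6..10]] as floats

# Keep even values, replace everything else with 0
masked = np.where(arr % 2 == 0, arr, 0)
print(masked)
```

Unlike the multiplication trick, this also lets you choose a fill value other than 0.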

Related

Sort the rows of the array by the value of the element of the main diagonal in each of the rows (in the initial array)

[[3, 2, 7, 1, 3, 7, 2, 6, 4, 8],
[5, 3, 7, 1, 1, 1, 6, 4, 6, 7],
[1, 9, 7, 8, 2, 1, 3, 7, 9, 8],
[1, 7, 3, 7, 6, 6, 6, 8, 4, 8],
[4, 2, 3, 2, 2, 3, 2, 4, 7, 6]]
Given this array, what should the result look like?
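One possible reading, sketched here under the assumption that "the element of the main diagonal" of row i means a[i, i] and that ties should keep their original order: take the diagonal, argsort it, and use the result to reorder the rows. The names diag, order and result are illustrative.

```python
import numpy as np

a = np.array([
    [3, 2, 7, 1, 3, 7, 2, 6, 4, 8],
    [5, 3, 7, 1, 1, 1, 6, 4, 6, 7],
    [1, 9, 7, 8, 2, 1, 3, 7, 9, 8],
    [1, 7, 3, 7, 6, 6, 6, 8, 4, 8],
    [4, 2, 3, 2, 2, 3, 2, 4, 7, 6]])

diag = np.diagonal(a)                     # a[i, i] for each row i
order = np.argsort(diag, kind="stable")   # stable sort: tied rows keep their order
result = a[order]                         # rows reordered by their diagonal element
print(result)
```

For this array the diagonal is [3, 3, 7, 7, 2], so the last row moves to the top and the other rows keep their relative order.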

numpy arrays: building a 3d array by adding 2d slices one at a time

Looking for some help with numpy and building a 3d array from multiple 2d arrays. I want to write a loop such that on every iteration I create a new 2d array and add it as a new slice to an existing 3d array. Here's my code sample.
import numpy as np
import random
import array
a = np.random.randint(0, 9, size=(10, 10))  # make random 10x10 matrix
b = a                                       # save copy
a = np.random.randint(0, 9, size=(10, 10))  # make random 10x10 matrix
a.shape
# (10, 10)      verify it's 10x10
b.shape
# (10, 10)      verify it's 10x10
b = np.array([b, a])                        # convert two 2d matrices into one 3d matrix
b.shape
# (2, 10, 10)   verify it's a 3d matrix with two planes
a = np.random.randint(0, 9, size=(10, 10))  # make new random 10x10 matrix
b = np.array([b, a])                        # add new 2d plane to the 3d matrix
b.shape
# (2,)          should be (3, 10, 10)
Can anyone see what I'm doing wrong?

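When you combine arrays using np.array([...]), they have to be the same shape. If they aren't, numpy does not build a higher-dimensional numeric array; it builds a 1-D array of dtype object whose elements are the original arrays, which is why you get shape (2,). There should have been a warning when you ran the last b = np.array([b, a]) (NumPy 1.24 and later raise a ValueError here instead):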
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
Instead, use np.stack:
b = np.stack([*b, a])
*b unpacks the 2D planes of b, so the above is equivalent to b = np.stack([b[0], b[1], a]).
Or you can use np.vstack (vertical stack):
b = np.vstack([b, a[None]])
a[None] adds a new leading axis of length 1 to a: a.shape == (10, 10), a[None].shape == (1, 10, 10)
Both of the above produce the following:
>>> b.shape
(3, 10, 10)
>>> b
array([[[3, 8, 0, 2, 8, 0, 0, 5, 7, 7],
[0, 5, 2, 8, 8, 2, 1, 4, 5, 8],
[3, 2, 2, 4, 1, 8, 2, 0, 7, 5],
[5, 6, 5, 0, 8, 7, 4, 0, 4, 6],
[6, 2, 3, 7, 4, 3, 6, 6, 4, 8],
[2, 5, 1, 7, 1, 3, 0, 6, 0, 5],
[3, 4, 0, 7, 3, 4, 5, 0, 7, 4],
[0, 7, 2, 8, 7, 7, 4, 3, 2, 6],
[4, 6, 2, 5, 5, 8, 5, 8, 0, 8],
[3, 4, 1, 0, 3, 7, 0, 6, 7, 3]],
[[4, 0, 6, 2, 4, 4, 7, 0, 7, 2],
[5, 8, 5, 8, 2, 8, 3, 7, 4, 6],
[2, 1, 2, 0, 4, 5, 6, 3, 0, 0],
[8, 7, 3, 0, 8, 8, 0, 4, 1, 4],
[0, 2, 5, 7, 5, 3, 0, 5, 1, 7],
[1, 5, 8, 0, 2, 6, 5, 0, 3, 2],
[4, 4, 4, 3, 3, 8, 6, 6, 5, 5],
[5, 3, 6, 8, 0, 3, 0, 8, 8, 3],
[4, 2, 6, 6, 6, 2, 0, 0, 6, 2],
[7, 3, 8, 0, 7, 1, 1, 8, 6, 2]],
[[6, 6, 1, 1, 6, 4, 6, 2, 6, 7],
[0, 5, 6, 7, 5, 0, 0, 5, 8, 2],
[6, 6, 1, 5, 2, 3, 2, 3, 3, 2],
[0, 3, 7, 6, 4, 5, 3, 1, 7, 2],
[7, 6, 3, 0, 1, 7, 8, 3, 8, 5],
[3, 1, 8, 6, 1, 5, 0, 8, 6, 1],
[1, 4, 8, 1, 7, 0, 1, 1, 5, 3],
[2, 1, 4, 8, 2, 3, 1, 6, 8, 7],
[8, 1, 1, 0, 6, 1, 0, 6, 1, 6],
[1, 8, 4, 7, 7, 5, 0, 3, 8, 6]]])
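Since the question is about a loop, a common pattern (sketched here, not the only option) is to collect the 2D planes in a Python list and call np.stack once at the end, which avoids re-copying the growing 3D array on every iteration. The seed, the loop count of 3 and the names planes, extra and grown are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, only for reproducibility

planes = []
for _ in range(3):
    # Each iteration produces one new 10x10 plane
    planes.append(rng.integers(0, 9, size=(10, 10)))

b = np.stack(planes)  # one copy at the end -> shape (3, 10, 10)

# Appending a single plane to an existing 3D array is also possible:
extra = rng.integers(0, 9, size=(10, 10))
grown = np.concatenate([b, extra[None]], axis=0)  # shape (4, 10, 10)
print(b.shape, grown.shape)
```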

array rows where the random-integer elements may have different ranges

Consider the following code fragment:
import numpy as np
mask = np.array([True, True, False, True, True, False])
val = np.array([9, 3])
arr = np.random.randint(1, 10, size = (5,len(mask)))  # high is exclusive, so 10 gives values 1 to 9
As expected, we get an array of random integers, 1 to 9, with 5 rows and 6 columns as below. The val array has not been used yet.
[[2, 7, 6, 9, 7, 5],
[7, 2, 9, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 4, 4, 5, 2],
[7, 7, 9, 6, 9, 8]]
Now I'll introduce val = [9, 3].
Where mask = True, I want the row element to be taken randomly from 1 to 9.
Where mask = False, I want the row element to be taken randomly from 1 to 3.
How can this be done efficiently? A sample output is shown below.
[[2, 7, 2, 9, 7, 1],
[7, 2, 1, 7, 8, 3],
[9, 1, 3, 5, 7, 3],
[5, 7, 1, 4, 5, 2],
[7, 7, 2, 6, 9, 1]]

One idea is to sample uniformly between 0 and 1, multiply by 9 or 3 depending on mask, truncate to an integer, and finally add 1 to shift the range so that it starts at 1.
rand = np.random.rand(5,len(mask))
is3 = (1-mask).astype(int)
# out is random from 0-8 or 0-2 depending on `is3`
out = (rand*val[is3]).astype(int)
# move out by `1`:
out = (out + 1)
Output:
array([[4, 9, 3, 6, 2, 1],
[1, 8, 2, 7, 1, 3],
[8, 2, 1, 2, 3, 2],
[4, 3, 2, 2, 3, 2],
[5, 8, 1, 5, 6, 1]])
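An alternative sketch, assuming the newer Generator API is available: Generator.integers accepts an array for high and broadcasts it against size, so each column can get its own upper bound directly. The name high and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily

mask = np.array([True, True, False, True, True, False])
val = np.array([9, 3])

# Per-column inclusive upper bound: 9 where mask is True, 3 where it is False
high = np.where(mask, val[0], val[1])

# high is exclusive in integers(), hence the + 1; it broadcasts across the 5 rows
out = rng.integers(1, high + 1, size=(5, mask.size))
print(out)
```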

Looping through a collection and deleting things on the way

I want to go through a collection and find the first pair of matching elements, but my current approach is having trouble with the indexing going out of bounds all the time.
Here's a simplified MWE:
function processstuff(stuff)
    for pointer1 in 1:length(stuff)
        for pointer2 in pointer1:length(stuff)
            println("$(stuff)")
            pointer1 == pointer2 && continue
            if stuff[pointer1] == stuff[pointer2]
                # items match, remove them
                deleteat!(stuff, pointer1)
                deleteat!(stuff, pointer2)
            end
        end
    end
end
processstuff(collect(rand(1:5, 20)))
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[1, 4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 2, 1, 2, 4, 3, 2, 1, 1]
[4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 3, 2, 1, 1]
[4, 3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 3, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
[3, 3, 2, 4, 5, 2, 2, 2, 3, 1, 1, 2, 4, 2, 1, 1]
ERROR: LoadError: BoundsError: attempt to access 16-element Array{Int64,1} at index [17]
(Obviously this example is just comparing two numbers, the real comparison isn't.)
The idea of updating the collection of stuff by removing both elements that have been processed looks like it works, because I think Julia updates the iteration thing each time through. But only for a while...?

You can use the following approach (assuming you want to remove pairs):
function processstuff!(stuff)
    pointer1 = 1
    while pointer1 < length(stuff)
        for pointer2 in pointer1+1:length(stuff)
            if stuff[pointer1] == stuff[pointer2]
                deleteat!(stuff, (pointer1, pointer2))
                pointer1 -= 1 # correct pointer location as we later add 1 to it
                break
            end
        end
        pointer1 += 1
    end
end
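In your code there were several problems:
You called deleteat! twice. After deleteat!(stuff, pointer1) every later element shifts one position to the left, so pointer2 no longer points at the matching element and can run past the end.
Your inner loop kept going after a match, so it could try to delete at pointer1 several times.
The ranges 1:length(stuff) are computed once when each for loop starts, so they do not shrink as elements are deleted. That is why I use while in the outer loop, which re-checks length(stuff) on every iteration, and break out of the inner loop after each deletion.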
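For readers coming from the numpy questions on this page, the same while-loop idea can be sketched in Python. This is a rough translation, not the Julia code itself; the function name remove_matching_pairs is hypothetical and indices are zero-based.

```python
def remove_matching_pairs(stuff):
    """Remove pairs of equal elements in place, scanning left to right."""
    i = 0
    # len(stuff) is re-evaluated on every pass, so the bound shrinks with the list
    while i < len(stuff) - 1:
        for j in range(i + 1, len(stuff)):
            if stuff[i] == stuff[j]:
                del stuff[j]  # delete the higher index first so i stays valid
                del stuff[i]
                break         # the inner range is stale after a deletion
        else:
            i += 1            # no partner found: move on to the next element
    return stuff


print(remove_matching_pairs([1, 4, 3, 3, 2, 4, 5]))
```

Deleting the higher index first plays the role of Julia's deleteat!(stuff, (pointer1, pointer2)), which removes both positions in one call.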

Indexing highest value of numpy matrix

I have a numpy array of shape (4, 7) like this:
array([[ 1, 4, 5, 7, 8, 6, 7],
[ 2, 23, 2, 4, 8, 94, 2],
[ 1, 5, 6, 7, 10, 15, 20],
[ 3, 9, 2, 7, 6, 5, 4]])
I would like to get the index of the highest element, i.e. 94, as a row and column, here row 1, column 5 (counting from zero). Thus the output should be a numpy array ([1,5]) (matlab-style).

You get the index of the maximum element using arr.argmax(), but to get the actual row and column you must use np.unravel_index as below:
import numpy as np
arr = np.array([[ 1, 4, 5, 7, 8, 6, 7],
[ 2, 23, 2, 4, 8, 94, 2],
[ 1, 5, 6, 7, 10, 15, 20],
[ 3, 9, 2, 7, 6, 5, 4]])
maximum = np.unravel_index(arr.argmax(), arr.shape)
print(maximum)
# (1, 5)
You have to use np.unravel_index because arr.argmax() by default returns the index into the flattened array (which in your case is 12, i.e. 1 * 7 + 5).
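Since the question asks for a numpy array rather than a tuple, a small sketch of the conversion may help (the names flat_index and idx are mine). It also shows the flat index mentioned above:

```python
import numpy as np

arr = np.array([[1, 4, 5, 7, 8, 6, 7],
                [2, 23, 2, 4, 8, 94, 2],
                [1, 5, 6, 7, 10, 15, 20],
                [3, 9, 2, 7, 6, 5, 4]])

flat_index = arr.argmax()                               # index into the flattened array
idx = np.array(np.unravel_index(flat_index, arr.shape)) # (row, column) as a numpy array

print(flat_index, idx, arr[tuple(idx)])
```

If the maximum occurs more than once, argmax reports only the first occurrence in row-major order.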