I have two boolean NumPy arrays of the same shape, like:
a=[[True,True,False,False]]
b=[[True,False,True,False]]
How can I get an array c where 1 indicates that only a is True, 2 indicates that only b is True, 0 indicates that both are False, and nan indicates that both are True? In this case the result should be [[nan, 1, 2, 0]].
You could use np.select:
In [20]: a = np.array([True,True,False,False])
In [21]: b = np.array([True,False,True,False])
In [23]: np.select([a&~b, b&~a, a&b], [1, 2, np.nan], default=0)
Out[23]: array([ nan, 1., 2., 0.])
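The three conditions above are mutually exclusive, so their order does not matter there. Because np.select takes, for each element, the choice paired with the first condition that is True, the same result can be written with simpler conditions by testing the "both True" case first. A sketch, applied here to the 2D arrays from the question:

```python
import numpy as np

a = np.array([[True, True, False, False]])
b = np.array([[True, False, True, False]])

# np.select uses the FIRST condition that is True for each element, so
# checking "both True" first lets the later conditions be the plain
# arrays a and b instead of a & ~b and b & ~a.
c = np.select([a & b, a, b], [np.nan, 1, 2], default=0)
print(c)
```

np.select broadcasts the scalar choices, so this works unchanged for 1D or 2D inputs.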
You could use np.where -
np.where(a*b,np.nan,(2*b + a))
Sample run -
In [60]: a
Out[60]: array([[ True, True, False, False]], dtype=bool)
In [61]: b
Out[61]: array([[ True, False, True, False]], dtype=bool)
In [62]: np.where(a*b,np.nan,(2*b + a))
Out[62]: array([[ nan, 1., 2., 0.]])
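To see why 2*b + a gives the right codes: booleans behave as 0 and 1 in arithmetic, so the expression packs each (a, b) pair into a single integer, and np.where only has to replace the "both True" code. This sketch uses a & b in place of a*b; for boolean arrays the two produce the same mask.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# a only -> 1, b only -> 2, neither -> 0, both -> 3
code = 2 * b + a
print(code)  # [3 1 2 0]

# Replace the "both True" positions (code 3) with NaN; the result is
# float because NaN cannot be stored in an integer array.
c = np.where(a & b, np.nan, code)
```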
Related
I have a dataset of 2D audio data. These audio fragments differ in length, hence I'm using Awkward Array. Through a Boolean mask, I want to only return the parts containing speech.
Table mask attempt
import numpy as np
import awkward as aw
awk = aw.fromiter([{"ch0": np.array([0, 1, 2]), "ch1": np.array([3, 4, 5])},
{"ch0": np.array([6, .7]), "ch1": np.array([8, 9])}])
# [{'ch0': [0.0, 1.0, 2.0], 'ch1': [3, 4, 5]},
# {'ch0': [6.0, 0.7], 'ch1': [8, 9]}]
awk_mask = aw.fromiter([{"op": np.array([False, True, False]), "cl": np.array([True, True, False])},
{"op": np.array([True, True]), "cl": np.array([True, False])}])
# [{'cl': [True, True, False], 'op': [False, True, False]},
# {'cl': [True, False], 'op': [True, True]}]
awk[awk_mask]
# TypeError: cannot interpret dtype [('cl', 'O'), ('op', 'O')] as a fancy index or mask
It seems that a Table cannot be used for fancy indexing.
Array mask attempts
Numpy equivalent
nparr = np.arange(0,6).reshape((2, -1))
# array([[0, 1, 2],
# [3, 4, 5]])
npmask = np.array([True, False, True])
nparr[:, npmask]
# array([[0, 2],
# [3, 5]])
Table version attempt; failed
awk[:, npmask]
# NotImplementedError: multidimensional index through a Table (TODO: needed for [0, n) -> [0, m) -> "table" -> ...)
It seems multidimensional selection through a Table is not implemented yet.
JaggedArray - Numpy mask version; works
jarr = aw.fromiter(nparr)
# <JaggedArray [[0 1 2] [3 4 5]] at 0x..>
jarr[:, npmask]
# array([[0, 2],
# [3, 5]])
JaggedArray - JaggedArray mask version; works
jmask = aw.fromiter(npmask)
# array([ True, False, True])
jarr[:, jmask]
# array([[0, 2],
# [3, 5]])
Questions
How to do efficient boolean mask selection with Table or with named dimensions (like xarray)?
Will multidimensional selection in Table be implemented in awkward-array, or only in awkward-1.0?
Library versions
import numpy as np
import pandas as pd
import awkward as aw
print("numpy version : ", np.__version__) # numpy version : 1.17.3
print("pandas version : ", pd.__version__) # pandas version : 0.25.3
print("awkward version : ", aw.__version__) # awkward version : 0.12.14
This does not use named array dimensions, but with JaggedArrays alone, masked selection is possible:
jarr_2d = aw.fromiter([[np.array([0, 1, 2]), np.array([3, 4, 5])],
[np.array([6, 7]), np.array([8, 9])]])
# <JaggedArray [[[0 1 2] [3 4 5]] [[6 7] [8 9]]] at 0x7fc9c7c4e750>
jarr_2d_mask = aw.fromiter([[np.array([False, True, False]), np.array([True, True, False])],
[np.array([True, True]), np.array([True, False])]])
# <JaggedArray [[[False True False] [True True False]] [[True True] [True False]]] at 0x7fc9c7c1e590>
jarr_2d[jarr_2d_mask]
# <JaggedArray [[[1] [3 4]] [[6 7] [8]]] at 0x7fc9c7c5b690>
I am not sure whether this code is efficient, especially compared to fancy indexing with plain NumPy arrays.
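To get a feel for why masking a jagged array can be cheap, here is a NumPy-only sketch of the idea. It assumes a layout of one flat content buffer plus row offsets (similar in spirit to how awkward-array stores jagged data, but this is not awkward's code), and the helper `mask_jagged` is hypothetical, not part of any library. The point is that the mask is applied to all rows in one vectorized pass, and only the row boundaries need recomputing.

```python
import numpy as np

def mask_jagged(content, offsets, mask_content):
    """Hypothetical helper: apply a jagged boolean mask.

    Row i of the jagged array is content[offsets[i]:offsets[i + 1]];
    mask_content is a flat boolean buffer with the same layout.
    """
    kept = content[mask_content]  # one vectorized pass over all rows at once
    # cum[k] = number of True values in mask_content[:k]
    cum = np.concatenate([[0], np.cumsum(mask_content)])
    new_offsets = cum[offsets]    # row boundaries for the kept values
    return kept, new_offsets

# Innermost rows of jarr_2d above, flattened: [0 1 2] [3 4 5] [6 7] [8 9]
content = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
offsets = np.array([0, 3, 6, 8, 10])
mask = np.array([False, True, False, True, True, False, True, True, True, False])

kept, new_offsets = mask_jagged(content, offsets, mask)
rows = [kept[s:e].tolist() for s, e in zip(new_offsets[:-1], new_offsets[1:])]
print(rows)  # [[1], [3, 4], [6, 7], [8]]
```

Building offsets from a cumulative sum (rather than per-row counting) also handles empty rows without special cases.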
I have an array of coordinates:
>>> b
array([[11, 1],
[45, 10],
[-4, 5],
[ 8, 9]])
I want to check whether each x value is between 4 and 15 and each y value is between 1 and 7. If a pair of coordinates qualifies, the result should contain True, otherwise False. For this input that should give me
array([True, False, False, False])
I am aware I can do this using a list comprehension, but is there a faster/neater way to do it?
((b >= [4, 1]) & (b <= [15, 7])).all(axis=1)
Out: array([ True, False, False, False])
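The one-liner relies on broadcasting: comparing the (4, 2) array against a length-2 list applies the first bound to the x column and the second to the y column of every row. Spelled out step by step as a sketch:

```python
import numpy as np

b = np.array([[11, 1], [45, 10], [-4, 5], [8, 9]])

# Per-column bounds: column 0 is x, column 1 is y.
lower = np.array([4, 1])
upper = np.array([15, 7])

# (4, 2) compared with (2,) broadcasts the bounds across every row,
# giving a (4, 2) boolean array with one result per coordinate.
per_coord = (b >= lower) & (b <= upper)

# A point qualifies only if both its x AND its y pass.
inside = per_coord.all(axis=1)
```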
I don't understand one example in this numpy tutorial.
a = np.arange(12).reshape(3,4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])
Why does a[b1,b2] return array([4, 10])? Shouldn't it return array([[4, 6], [8, 10]])?
Any detailed explanation is appreciated!
When you index an array with multiple arrays, NumPy indexes with pairs of elements taken from the indexing arrays (boolean arrays are first converted to the positions of their True values):
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b1
array([False, True, True], dtype=bool)
>>> b2
array([ True, False, True, False], dtype=bool)
>>> a[b1, b2]
array([ 4, 10])
Notice that this is equivalent to:
>>> a[(1, 2), (0, 2)]
array([ 4, 10])
which selects the elements at a[1, 0] and a[2, 2]:
>>> a[1, 0]
4
>>> a[2, 2]
10
Because of this pairwise behavior, you cannot in general index with arrays of different lengths (they have to be able to broadcast together). So this example is something of an accident: both indexing arrays happen to have exactly two True values. If one had three True values, for example, you'd get an error:
>>> b3 = np.array([True, True, True, False])
>>> a[b1, b3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
So this is specifically letting you know that the indexing arrays must be able to be broadcast together (so that it can chip off indices together in a smart way; e.g. if one indexing array just had a single value, that would be repeated with each value from the other indexing array).
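A short sketch of that broadcasting rule with integer index arrays: a length-1 index array is repeated against every entry of the other one, while two arrays of incompatible lengths raise the same IndexError as above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Shapes (1,) and (3,) broadcast to (3,): row index 1 is paired with
# each column index in turn -> a[1, 0], a[1, 2], a[1, 3].
picked = a[[1], [0, 2, 3]]

# Shapes (2,) and (3,) cannot broadcast together.
try:
    a[[1, 2], [0, 1, 2]]
except IndexError as err:
    message = str(err)
```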
To get the results you expect, you could index the result separately:
>>> a[b1][:, b2]
array([[ 4, 6],
[ 8, 10]])
Alternatively, you could combine your index arrays into a single 2D mask with the same shape as a, but note that the result will then be a 1D array (since any number of elements could be pulled out, and they need not form a rectangle):
>>> a[np.outer(b1, b2)]
array([ 4, 6, 8, 10])
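NumPy also provides np.ix_, which builds the "cross-product" (outer) selection directly and accepts boolean masks for each dimension. The sketch below contrasts it with the pairwise behaviour of a[b1, b2]:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Pairwise selection (what a[b1, b2] does): True positions are zipped.
rows, = np.nonzero(b1)   # positions 1, 2
cols, = np.nonzero(b2)   # positions 0, 2
pairwise = a[rows, cols]  # a[1, 0] and a[2, 2]

# Outer selection: np.ix_ turns the masks into index arrays of shapes
# (2, 1) and (1, 2), which broadcast to a (2, 2) block.
block = a[np.ix_(b1, b2)]
```

Unlike the 2D-mask approach, np.ix_ keeps the result two-dimensional.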
The indices of the True values in the first array are
>>> i = np.where(b1)[0]
>>> i
array([1, 2])
For the second array they are
>>> j = np.where(b2)[0]
>>> j
array([0, 2])
Using these index arrays together pairs them up element by element,
>>> a[i, j]
array([ 4, 10])
Another way to apply a general 2D boolean mask to a 2D NumPy array is element-wise multiplication:
import numpy as np
n = 100
mask = np.identity(n)
data = np.random.rand(n,n)
data_masked = data * mask
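In this random example, only the elements on the diagonal are kept, although the mask could be any n by n matrix. Note that this keeps the full (n, n) shape and sets the masked-out elements to 0 rather than removing them.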
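Multiplying by the mask cannot distinguish a masked-out element from a genuine 0 in the data. A sketch comparing it with two other common options, boolean indexing (drops the unmasked elements) and np.where with NaN as the fill value (keeps the shape, marks unmasked elements explicitly); the small size and fixed seed are only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
data = rng.random((n, n))
mask = np.eye(n, dtype=bool)  # keep the diagonal

zeroed = data * mask                    # shape (n, n), unmasked entries become 0.0
selected = data[mask]                   # shape (n,), only the kept values
filled = np.where(mask, data, np.nan)   # shape (n, n), unmasked entries become NaN
```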
In numpy, if I have a boolean array, I can use it to select elements of another array:
>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> idx = np.array([True, False, True])
>>> x[idx]
array([1, 3])
I need to do this in theano. This is what I tried, but I got an unexpected result.
>>> from theano import tensor as T
>>> x = T.vector()
>>> idx = T.ivector()
>>> y = x[idx]
>>> y.eval({x: np.array([1,2,3]), idx: np.array([True, False, True])})
array([ 2., 1., 2.])
Can someone explain the theano result and suggest how to get the numpy result? I need to know how to do this in order to properly instantiate a 'givens' argument in a theano function declaration. Thanks in advance.
This is not supported in theano:
We do not support boolean masks, as Theano does not have a boolean type (we use int8 for the output of logic operators).
Theano indexing with a “mask” (incorrect approach):
>>> t = theano.tensor.arange(9).reshape((3,3))
>>> t[t > 4].eval() # an array with shape (3, 3, 3)
...
Getting a Theano result like NumPy:
>>> t[(t > 4).nonzero()].eval()
array([5, 6, 7, 8])
So in your case you need y = x[idx.nonzero()].
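The unexpected [2., 1., 2.] follows from integer indexing: declaring idx as T.ivector turns the mask into the positions 1, 0, 1. Plain NumPy follows the same integer-indexing rules, so both the surprise and the nonzero() fix can be reproduced there (this sketch runs NumPy, not Theano):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
mask = np.array([True, False, True])

# What the int vector effectively did: True/False became 1/0, which
# are then used as positions -> x[1], x[0], x[1].
as_ints = mask.astype(np.int32)
print(x[as_ints])          # [2. 1. 2.]

# The nonzero() workaround: convert the mask to the positions of its
# True entries first, then index with those positions.
print(x[mask.nonzero()])   # [1. 3.]
```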
For example:
scala> val my_array = Array(4,5,Double.NaN,6,5,6, Double.NaN)
my_array: Array[Double] = Array(4.0, 5.0, NaN, 6.0, 5.0, 6.0, NaN)
scala> my_array.count(_ == Double.NaN)
res13: Int = 0
I understand that two Double.NaN are not equal to each other
scala> Double.NaN == Double.NaN
res14: Boolean = false
and therefore, I get the result that I get, but I can't find a function that would tell me the number of Double.NaNs, what am I missing?
In Python the behaviour would look like this:
In [43]: import numpy as np
In [44]: a = np.array([5,np.nan,5,7,4,np.nan])
In [45]: np.isnan(a)
Out[45]: array([False, True, False, False, False, True], dtype=bool)
In [46]: np.isnan(a).sum()
Out[46]: 2
isNaN does the job:
scala> val array = Array(4,5,Double.NaN,6,5,6, Double.NaN)
array: Array[Double] = Array(4.0, 5.0, NaN, 6.0, 5.0, 6.0, NaN)
scala> array.count(_.isNaN)
res0: Int = 2
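The same pitfall and fix in plain Python (no NumPy), as a sketch for comparison with the Scala version: counting by equality finds nothing, because NaN compares unequal to every value including itself, while a dedicated NaN test works.

```python
import math

values = [4, 5, float("nan"), 6, 5, 6, float("nan")]

# Equality-based counting: NaN == NaN is False, so this counts nothing.
by_equality = sum(1 for v in values if v == float("nan"))

# A dedicated NaN test, analogous to _.isNaN in Scala.
by_isnan = sum(1 for v in values if math.isnan(v))
```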