It seems I just cannot solve this in NumPy: I have a matrix with an arbitrary number of dimensions, ordered in an arbitrary way. Inside this matrix there is always one dimension I am interested in (as I said, the position of this dimension is not always the same). Now, I want to find the first nonzero value along this dimension. In fact, I need the index of that value to perform some operations on the value itself.
An example: if my matrix a is n x m x p and the dimension I am interested in is number 1, I would do something like:
for ii in xrange(a.shape[0]):
    for kk in xrange(a.shape[2]):
        myview = np.squeeze(a[ii, :, kk])
        firsti = np.nonzero(myview)[0][0]
        myview[firsti] = dostuff
Apart from performance considerations, I really do not know how to do this when the array has a different number of dimensions and the dimension I am interested in sits at an arbitrary position.
You can abuse np.argmax for your purpose. Here, you can specify the axis which you are interested in, where 0 is along columns, 1 is along rows, and so on. You just need an array which contains the same value for all elements that are not zero. You can achieve that by doing a != 0, as this will contain False (meaning 0) for all zero-elements and True (meaning 1) for all non-zero-elements. Now np.argmax(a != 0, axis=1) would give you the first non-zero element along the 1 axis.
For example:
import numpy as np
a = np.array([[0, 1, 4],[1, 0, 2],[0, 0, 1]])
# a = [[0, 1, 4],
# [1, 0, 2],
# [0, 0, 1]]
print(np.argmax(a!=0, axis=0))
# >>> array([1, 0, 0]) -> along columns
print(np.argmax(a!=0, axis=1))
# >>> array([1, 0, 2]) -> along rows
This also works in higher dimensions, although the output is less instructive because it is harder to visualize.
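To tie this back to the original question (arbitrary axis, and modifying the values at the first nonzero positions), here is a minimal sketch; the use of np.put_along_axis and the -1 replacement value are illustrative assumptions, not part of the original answer:
import numpy as np

a = np.random.randint(0, 3, size=(4, 5, 6)).astype(float)
axis = 1  # the dimension of interest

# Index of the first nonzero entry along `axis` for every other position.
# Caveat: if a slice is all zeros, argmax returns 0 for it as well.
first_idx = np.argmax(a != 0, axis=axis)

# Modify the values at those positions in place (here: set them to -1).
np.put_along_axis(a, np.expand_dims(first_idx, axis), -1, axis=axis)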
Related
mean = [0, 0]
cov = [[1, 0], [0, 100]]
gg = np.random.multivariate_normal(mean, cov, size = [5, 12])
I get an array which has 2 columns and 12 rows. I want to take the first column, which includes all 12 rows, and convert it into a column. What is the appropriate method for slicing, and how can one convert the result into columns? To be precise, looking at the second screenshot, one should take all of column 0 and convert it in the normal way, from left to right.
The result should look like the first screenshot.
The problem is that your array gg is not two- but three-dimensional. So, what you need is in fact the first column of each stacked 2D array. Here is an example:
import numpy as np
x = np.random.randint(0, 10, (3, 4, 5))
x[:, :, 0].flatten()
The colon in slicing means "all values in this dimension". So, x[:, :, 0] means "all values in the first dimension and all values in the second dimension, with the third dimension fixed at index 0". This results in a two-dimensional array, which you then have to flatten.
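Applied to the gg array from the question (shape (5, 12, 2)), a quick sketch of the same idea; the final reshape to a column vector is an assumption about the desired output format:
import numpy as np

mean = [0, 0]
cov = [[1, 0], [0, 100]]
gg = np.random.multivariate_normal(mean, cov, size=[5, 12])  # shape (5, 12, 2)

first_column = gg[:, :, 0].flatten()                 # shape (60,)
first_column_as_column = gg[:, :, 0].reshape(-1, 1)  # shape (60, 1)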
I have a simple question about the .shape function, which confused me a lot.
a = np.array([1, 2, 3]) # Create a rank 1 array
print(type(a)) # Prints "<class 'numpy.ndarray'>"
print(a.shape) # Prints "(3,)"
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array
print(b.shape) # Prints "(2, 3)"
What does .shape exactly do? Does it count how many rows and how many columns there are?
In that case, shouldn't a.shape be (1, 3), i.e. one row and three columns?
yourarray.shape, np.shape(), or np.ma.shape() returns the shape of your ndarray as a tuple. You can get the number of dimensions of your array using yourarray.ndim or np.ndim() (i.e. it gives the n of the ndarray, since all arrays in NumPy are just n-dimensional arrays, called ndarrays for short).
For a 1D array, the shape would be (n,) where n is the number of elements in your array.
For a 2D array, the shape would be (n,m) where n is the number of rows and m is the number of columns in your array.
Please note that in the 1D case the shape is simply (n,), not (1, n) or (n, 1) as you suggested for row and column vectors respectively.
This follows the convention that:
For a 1D array, the shape tuple has only 1 element (i.e. (n,))
For a 2D array, the shape tuple has only 2 elements (i.e. (n, m))
For a 3D array, the shape tuple has only 3 elements (i.e. (n, m, k))
For a 4D array, the shape tuple has only 4 elements (i.e. (n, m, k, j))
and so on.
Also, the example below shows how np.shape() or np.ma.shape() behaves with 1D arrays and scalars:
# sample array
In [10]: u = np.arange(10)
# get its shape
In [11]: np.shape(u) # u.shape
Out[11]: (10,)
# get array dimension using `np.ndim`
In [12]: np.ndim(u)
Out[12]: 1
In [13]: np.shape(np.mean(u))
Out[13]: () # empty tuple (to indicate that a scalar is a 0D array).
# check using `numpy.ndim`
In [14]: np.ndim(np.mean(u))
Out[14]: 0
P.S.: So, the shape tuple is consistent with our understanding of dimensions of space, at least mathematically.
Unlike its most popular commercial competitor, numpy has pretty much from the outset been about "arbitrary-dimensional" arrays; that's why the core class is called ndarray. You can check the dimensionality of a numpy array using the .ndim property. The .shape property is a tuple of length .ndim containing the length of each dimension. Currently, numpy can handle up to 32 dimensions:
a = np.ones(32*(1,))
a
# array([[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ 1.]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]])
a.shape
# (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
a.ndim
# 32
If a numpy array happens to be 2d like your second example, then it's appropriate to think about it in terms of rows and columns. But a 1d array in numpy is truly 1d, no rows or columns.
If you want something like a row or column vector you can achieve this by creating a 2d array with one of its dimensions equal to 1.
a = np.array([[1,2,3]]) # a 'row vector'
b = np.array([[1],[2],[3]]) # a 'column vector'
# or if you don't want to type so many brackets:
b = np.array([[1,2,3]]).T
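Another common way to get the same row and column vectors (not part of the original answer, but standard NumPy) is to add an axis to a 1D array:
import numpy as np

v = np.array([1, 2, 3])   # shape (3,)
row = v[np.newaxis, :]    # shape (1, 3), a 'row vector'
col = v[:, np.newaxis]    # shape (3, 1), a 'column vector'
# equivalently: v.reshape(1, -1) and v.reshape(-1, 1)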
.shape gives the shape of your array as a tuple: the number of elements along each dimension, e.g. the number of rows and the number of columns for a 2D array.
For example:
1D array:
d = np.array([1, 2, 3, 4])
print(d.shape)
Output: (4,)
i.e. the number 4 is the number of elements in the 1D array.
2D array:
e = np.array([[1, 2, 3], [4, 5, 6]])
print(e.shape)
Output: (2, 3), i.e. the number of rows and the number of columns.
The number of entries in the shape tuple grows with the number of dimensions of the array.
Say I have an array of N integers set to the value '0', and I want to pick a random element of that array that has the value '0' and set it to the value '1'.
How do I do this efficiently?
I came up with 2 solutions, but they look quite inefficient.
First solution
int array[N]  // init to 0s
int n         // number of 1s we want to add to the array
int i = 0
while i < n
    int a = random(0, N)
    if array[a] == 0
        array[a] = 1
        i++
    end if
end while
It would be extremely inefficient for large arrays because of the probability of collisions.
The second solution involves a list containing all the remaining 0 positions: we choose a random number between 0 and the number of remaining 0s, and use it to look up the corresponding array index in that list.
It's a lot more reliable than the first solution, since the number of operations is bounded, but it still has a worst-case complexity of N² if we want to fill the array completely.
Your second solution is actually a good start. I assume that it involves rebuilding the list of positions after every change, which makes it O(N²) if you want to fill the whole array. However, you don't need to rebuild the list every time. Since you want to fill the array anyway, you can just use a random order and choose the remaining positions accordingly.
As an example, take the following array (size 7 and not initially full of zeroes): [0, 0, 1, 0, 1, 1, 0]
Once you have built the list of zeros positions, here [0, 1, 3, 6], just shuffle it to get a random ordering. Then fill in the array in the order given by the positions.
For example, if the shuffle gives [3, 1, 6, 0], then you can fill the array like so:
[0, 0, 1, 0, 1, 1, 0] <- initial configuration
[0, 0, 1, 1, 1, 1, 0] <- First, position 3
[0, 1, 1, 1, 1, 1, 0] <- Second, position 1
[0, 1, 1, 1, 1, 1, 1] <- etc.
[1, 1, 1, 1, 1, 1, 1]
If the array is initially filled with zeros, then it's even easier. Your initial list is the list of integers from 0 to N (size of the array). Shuffle it and apply the same process.
If you do not want to fill the whole array, you still need to build the whole list, but you can truncate it after shuffling it (which just means to stop filling the array after some point).
Of course, this solution requires that the array does not change between each step.
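A minimal sketch of this approach in Python (the discussion itself is language-agnostic, so the use of random.shuffle here is just an illustrative choice):
import random

array = [0, 0, 1, 0, 1, 1, 0]
n = 3  # number of additional 1s we want to place

# Build the list of zero positions once, then shuffle it.
zero_positions = [i for i, v in enumerate(array) if v == 0]
random.shuffle(zero_positions)

# Fill in shuffled order; truncate the list if you don't want to fill everything.
for pos in zero_positions[:n]:
    array[pos] = 1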
You can fill the array with n ones and N-n zeros and then shuffle it randomly.
A Fisher-Yates shuffle has linear complexity:
for i from N−1 downto 1 do
    j ← random integer such that 0 ≤ j ≤ i
    exchange a[j] and a[i]
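A concrete sketch of this (Python chosen just for consistency with the rest of the thread; the values of N and n are made up):
import random

N, n = 10, 4
a = [1] * n + [0] * (N - n)  # n ones followed by N - n zeros

# Fisher-Yates shuffle: O(N)
for i in range(N - 1, 0, -1):
    j = random.randint(0, i)  # random integer with 0 <= j <= i
    a[i], a[j] = a[j], a[i]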
I have a tensor of lengths in tensorflow, let's say it looks like this:
[4, 3, 5, 2]
I wish to create a mask of 1s and 0s whose number of 1s corresponds to the entries of this tensor, padded with 0s to a total length of 8. I.e. I want to create this tensor:
[[1,1,1,1,0,0,0,0],
[1,1,1,0,0,0,0,0],
[1,1,1,1,1,0,0,0],
[1,1,0,0,0,0,0,0]
]
How might I do this?
This can now be achieved with tf.sequence_mask; see the TensorFlow documentation for details.
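A short sketch of what that looks like for the lengths in the question (the int32 dtype is an assumption to get a 0/1 mask; by default tf.sequence_mask returns booleans):
import tensorflow as tf

lengths = tf.constant([4, 3, 5, 2])
mask = tf.sequence_mask(lengths, maxlen=8, dtype=tf.int32)
# [[1 1 1 1 0 0 0 0]
#  [1 1 1 0 0 0 0 0]
#  [1 1 1 1 1 0 0 0]
#  [1 1 0 0 0 0 0 0]]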
This can be achieved using a variety of TensorFlow transformations:
lengths = [4, 3, 5, 2]

# Make a 4 x 1 column where each row contains one length.
lengths_transposed = tf.expand_dims(lengths, 1)

# Make a 1 x 8 row containing [0, 1, ..., 7]
# (named range_ to avoid shadowing the Python builtin `range`).
range_ = tf.range(0, 8, 1)
range_row = tf.expand_dims(range_, 0)

# Broadcasting the comparison produces the 4 x 8 boolean mask.
mask = tf.less(range_row, lengths_transposed)

# Use the select operation to pick 1 or 0 for each value.
# (tf.select was renamed to tf.where in later TensorFlow versions.)
result = tf.select(mask, tf.ones([4, 8]), tf.zeros([4, 8]))
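For reference, a sketch of the same approach with current API names (tf.where instead of tf.select), assuming TensorFlow 2.x eager execution:
import tensorflow as tf

lengths = tf.constant([4, 3, 5, 2])
lengths_transposed = tf.expand_dims(lengths, 1)  # shape (4, 1)
range_row = tf.expand_dims(tf.range(0, 8), 0)    # shape (1, 8)
mask = tf.less(range_row, lengths_transposed)    # shape (4, 8), bool
result = tf.where(mask, tf.ones([4, 8]), tf.zeros([4, 8]))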
I've got a slightly shorter version than the previous answer; I'm not sure whether it is more efficient or not:
def mask(self, seq_length, max_seq_length):
return tf.map_fn(
lambda x: tf.pad(tf.ones([x], dtype=tf.int32), [[0, max_seq_length - x]]),
seq_length)
I am trying to efficiently index a 2D array in Python and have the problem that it is really slow.
This is what I tried (simplified example):
xSize = veryBigNumber
ySize = veryBigNumber
a = np.ones((xSize,ySize))
N = veryBigNumber
const = 1
for t in range(N):
    for i in range(xSize):
        for j in range(ySize):
            a[i,j] *= f(i,j)*const  # f(i,j) is an arbitrary function of i and j.
Now I would like to substitute the nested loop by something more efficient. How do I do this?
Your 2D array could be produced using the following addition:
np.arange(200)[:,np.newaxis] + np.arange(200)
This type of vectorised operation is likely to be very fast:
>>> %timeit np.arange(200)[:,np.newaxis] + np.arange(200)
1000 loops, best of 3: 178 µs per loop
This method is not limited to addition. We can use the two arrays in the above operation as the arguments of any universal function (commonly abbreviated to ufunc).
For example:
>>> np.multiply(np.arange(5)[:,np.newaxis], np.arange(5))
array([[ 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4],
[ 0, 2, 4, 6, 8],
[ 0, 3, 6, 9, 12],
[ 0, 4, 8, 12, 16]])
NumPy has built-in ufuncs for all the basic arithmetic operations, and some more interesting ones too. If you need a more exotic function, NumPy allows you to make your own ufunc.
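For instance, a minimal sketch of wrapping an arbitrary Python function as a ufunc with np.frompyfunc (the function f below is made up for illustration; np.vectorize is a similar alternative):
import numpy as np

def f(i, j):
    # some arbitrary scalar function of the indices
    return i * j + 1

f_ufunc = np.frompyfunc(f, 2, 1)  # 2 inputs, 1 output

# Broadcasts just like a built-in ufunc; the result is an object array,
# so convert it to float afterwards.
result = f_ufunc(np.arange(5)[:, np.newaxis], np.arange(5)).astype(float)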
Edit: To quickly explain the broadcasting happening in this method, you can think of it like this...
np.arange(5) produces 1D array which looks like this:
array([0, 1, 2, 3, 4])
The code np.arange(5)[:,np.newaxis] adds a second dimension (columns) to the range, producing this 2D array:
array([[0],
[1],
[2],
[3],
[4]])
To create the final 5x5 array using np.multiply (although we could use any ufunc or binary arithmetic operation), NumPy takes the 0 in the first (column) array and multiplies it with each element of the second (row) array, making a row like this:
[ 0, 0, 0, 0, 0]
It then takes the second element of the column array, 1, and multiplies it with the row array, producing this row:
[ 0, 1, 2, 3, 4]
This continues until we have the final 5x5 matrix.
You could use the indices routine:
b=np.indices(a.shape)
a=b[0]+b[1]
Timings:
%%timeit
...: b=np.indices(a.shape)
...: c=b[0]+b[1]
1000 loops, best of 3: 370 µs per loop
%%timeit
for i in range(200):
for j in range(200):
a[i,j] = i + j
100 loops, best of 3: 10.4 ms per loop
Since your output matrix a is the element-wise N-th power of a matrix F with elements F_ij = f(i,j) * const, your code simplifies to:
F = np.empty((xSize, ySize))
for i in range(xSize):
    for j in range(ySize):
        F[i,j] = f(i,j) * const
a = F ** N
For even more speed you can replace the creation of the F matrix with something more efficient, given that the function f(i,j) is vectorized:
# indexing='ij' keeps the result shaped (xSize, ySize) with F[i, j] = f(i, j) * const
xmap, ymap = np.meshgrid(np.arange(xSize), np.arange(ySize), indexing='ij')
F = f(xmap, ymap) * const
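A quick end-to-end sketch with a concrete, made-up f (here f(i, j) = i + j) to check the vectorized version against the original nested loops:
import numpy as np

xSize, ySize, N, const = 4, 5, 3, 1

def f(i, j):
    return i + j  # hypothetical example function

# Vectorized version
xmap, ymap = np.meshgrid(np.arange(xSize), np.arange(ySize), indexing='ij')
a_fast = (f(xmap, ymap) * const) ** N

# Original nested loops
a_slow = np.ones((xSize, ySize))
for t in range(N):
    for i in range(xSize):
        for j in range(ySize):
            a_slow[i, j] *= f(i, j) * const

assert np.allclose(a_fast, a_slow)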