what does numpy ndarray shape do?

what does numpy ndarray shape do? - arrays

I have a simple question about the .shape function, which confused me a lot.
a = np.array([1, 2, 3]) # Create a rank 1 array
print(type(a)) # Prints "<class 'numpy.ndarray'>"
print(a.shape) # Prints "(3,)"
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array
print(b.shape) # Prints "(2, 3)"
What did the .shape exactly do? count how many rows, how many columns,
then the a.shape suppose to be, (1,3), one row three columns, right?

yourarray.shape or np.shape() or np.ma.shape() returns the shape of your ndarray as a tuple; And you can get the (number of) dimensions of your array using yourarray.ndim or np.ndim(). (i.e. it gives the n of the ndarray since all arrays in NumPy are just n-dimensional arrays (shortly called as ndarrays))
For a 1D array, the shape would be (n,) where n is the number of elements in your array.
For a 2D array, the shape would be (n,m) where n is the number of rows and m is the number of columns in your array.
Please note that in 1D case, the shape would simply be (n, ) instead of what you said as either (1, n) or (n, 1) for row and column vectors respectively.
This is to follow the convention that:
For 1D array, return a shape tuple with only 1 element (i.e. (n,))
For 2D array, return a shape tuple with only 2 elements (i.e. (n,m))
For 3D array, return a shape tuple with only 3 elements (i.e. (n,m,k))
For 4D array, return a shape tuple with only 4 elements (i.e. (n,m,k,j))
and so on.
Also, please see the example below to see how np.shape() or np.ma.shape() behaves with 1D arrays and scalars:
# sample array
In [10]: u = np.arange(10)
# get its shape
In [11]: np.shape(u) # u.shape
Out[11]: (10,)
# get array dimension using `np.ndim`
In [12]: np.ndim(u)
Out[12]: 1
In [13]: np.shape(np.mean(u))
Out[13]: () # empty tuple (to indicate that a scalar is a 0D array).
# check using `numpy.ndim`
In [14]: np.ndim(np.mean(u))
Out[14]: 0
P.S.: So, the shape tuple is consistent with our understanding of dimensions of space, at least mathematically.

Unlike it's most popular commercial competitor, numpy pretty much from the outset is about "arbitrary-dimensional" arrays, that's why the core class is called ndarray. You can check the dimensionality of a numpy array using the .ndim property. The .shape property is a tuple of length .ndim containing the length of each dimensions. Currently, numpy can handle up to 32 dimensions:
a = np.ones(32*(1,))
a
# array([[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ 1.]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]])
a.shape
# (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
a.ndim
# 32
If a numpy array happens to be 2d like your second example, then it's appropriate to think about it in terms of rows and columns. But a 1d array in numpy is truly 1d, no rows or columns.
If you want something like a row or column vector you can achieve this by creating a 2d array with one of its dimensions equal to 1.
a = np.array([[1,2,3]]) # a 'row vector'
b = np.array([[1],[2],[3]]) # a 'column vector'
# or if you don't want to type so many brackets:
b = np.array([[1,2,3]]).T

array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

.shape() gives the actual shape of your array in terms of no of elements in it, No of rows/No of Columns.
The answer you get is in the form of tuples.
For Example:
1D ARRAY:
d=np.array([1,2,3,4])
print(d)
(1,)
Output: (4,)
ie the number4 denotes the no of elements in the 1D Array.
2D Array:
e=np.array([[1,2,3],[4,5,6]])
print(e)
(2,3)
Output: (2,3) ie the number of rows and the number of columns.
The number of elements in the final output will depend on the number of rows in the Array....it goes on increasing gradually.

Related

How to understand the index of np.array

I am learning python numpy.array and am confused about how the index works. Let's see I have the following 3x4 2D array:
A = np.array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9,10,11,12]])
If I want to extract the 1 from this array, I need to input the index of that number, which is A[0,0]
Out of curiosity I also tried the following
B = A[[0,0]]
C = A[[0],[0]]
B turns out to be a 2x4 2D array:
array([[1, 2, 3, 4],
[1, 2, 3, 4]])
C turns out to be a 1D array of 1 element:
array([1])
I am wondering how indexing of B and C works and why I obtain those arrays?

In B, you are only giving one index for a 2 dimensional array which is [0,0]. So it will return the element in the first dimension of the index given (0 and 0 here).
So, for the first index (which is 0) it will return the first element in the first dimension which is [1,2,3,4] and it will go for the next index given which is again 0, so it will print two [1,2,3,4] as you have got.
Next in the C, you have given 2 indices for a 2 dimensional array which are [0] and [0]. So it will go through the first dimension for the index 0 which is [1,2,3,4] and in that element it will return the 0th position which is [1] as you have got.
For better understanding, let's see another case A[[0,1],2].
Here, we have given 2 indices for a 2 dimensional array which are [0,1] and 2. So, we get the elements which are in the index [0,2] and next with [1,2].Th output will be [3,7].
The thing is it will iterate through all possible combinations of indices given and return those values in those indices.

array in array to array in numpy

Dear friends in stack overflow,
I have trouble calculation with Numpy and Sympy. A is defined by
import numpy as np
import sympy as sym
sym.var('x y')
f = sym.Matrix([0,x,y])
func = sym.lambdify( (x,y), f, "numpy")
X=np.array([1,2,3])
Y=np.array((1,2,3])
A = func(X,Y).
Here, X and Y are just examples. In general, X and Y are one dimensional array in numpy, and they have the same length. Then, A’s output is
array([[0],
[array([1, 2, 3])],
[array([1, 2, 3])]], dtype=object).
But, I'd like to get this as
np.array([[0,0,0],[1,2,3],[1,2,3]]).
If we call this B, How do you convert A to B automatically. B’s first column is filled by 0, and it has the same length with X and Y.
Do you have any ideas?

First let's make sure we understand what is happening:
In [52]: x, y = symbols('x y')
In [54]: f = Matrix([0,x,y])
...: func = lambdify( (x,y), f, "numpy")
In [55]: f
Out[55]:
⎡0⎤
⎢ ⎥
⎢x⎥
⎢ ⎥
⎣y⎦
In [56]: print(func.__doc__)
Created with lambdify. Signature:
func(x, y)
Expression:
Matrix([[0], [x], [y]])
Source code:
def _lambdifygenerated(x, y):
return (array([[0], [x], [y]]))
See how the numpy function looks just like the sympy, replacing sym.Matrix with np.array. lambdify just does a lexographic translation; it does not have a deep knowledge of the differences between the languages.
With scalars the func runs as expected:
In [57]: func(1,2)
Out[57]:
array([[0],
[1],
[2]])
With arrays the results is this ragged array (new enough numpy adds this warning:
In [59]: func(np.array([1,2,3]),np.array([1,2,3]))
<lambdifygenerated-2>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return (array([[0], [x], [y]]))
Out[59]:
array([[0],
[array([1, 2, 3])],
[array([1, 2, 3])]], dtype=object)
If you don't know numpy, sympy is not a short cut to filling in your knowledge gaps.
The simplest fix is to replace original 0 with another symbol.
Even in sympy, the 0 is not expanded:
In [65]: f.subs({x:Matrix([[1,2,3]]), y:Matrix([[4,5,6]])})
Out[65]:
⎡ 0 ⎤
⎢ ⎥
⎢[1 2 3]⎥
⎢ ⎥
⎣[4 5 6]⎦
In [74]: Matrix([[0,0,0],[1,2,3],[4,5,6]])
Out[74]:
⎡0 0 0⎤
⎢ ⎥
⎢1 2 3⎥
⎢ ⎥
⎣4 5 6⎦
In [75]: Matrix([[0],[1,2,3],[4,5,6]])
...
ValueError: mismatched dimensions
To make the desired array in numpy we have to do something like:
In [71]: arr = np.zeros((3,3), int)
In [72]: arr[1:,:] = [[1,2,3],[4,5,6]]
In [73]: arr
Out[73]:
array([[0, 0, 0],
[1, 2, 3],
[4, 5, 6]])
That is, initial the array and fill selected rows. There isn't simple expression that will do the desired 'automaticlly fill the first row with 0', much less something that can be naively translated from sympy.

Slicing numpy arrays

mean = [0, 0]
cov = [[1, 0], [0, 100]]
gg = np.random.multivariate_normal(mean, cov, size = [5, 12])
I get an array which has 2 columns and 12 rows, i want to take the first column which will include all 12 rows and convert them to columns. What is the appropriate method for sclicing and how can one convet the result to columns? To be precise, looking at the screen (the second one) one should take all 0 column columns and convert them in a normal way from the left to the right
the results should be like this (the first screen)

The problem is that your array gg is not two- but three-dimensional. So, what you need is in fact the first column of each stacked 2D array. Here is an example:
import numpy as np
x = np.random.randint(0, 10, (3, 4, 5))
x[:, :, 0].flatten()
The colon in slicing means "all values in this dimension". So, x[:, :, 0] means "all values in the the first dimension and all values in the second dimension and with third dimension fixed on index 0". This results in a two-dimensional array, which you have to flatten additionally.

Answering Queries on List of Subarrays

Given an array A of size N, we construct a list containing all possible subarrays of A in descending order.
Two subarrays B and C are compare by padding zeroes until both are of size N. Then, we compare the two subarrays element by element and return as soon as a point of difference is observed.
We are given multiple queries where given x we have to find the maximum element in the xth subarray sorted according to the order given above.
For example, if the array A is [3, 1, 2, 4]; then the sorted subarrays will be:
[4]
[3, 1, 2, 4]
[3, 1, 2]
[3, 1]
[3]
[2, 4]
[2]
[1, 2, 4]
[1, 2]
[1]
A query where x = 3 corresponds to finding the maximum element in the subarray [3, 1, 2]; so here the answer would be 3.
Since the number of queries are large (of the order of 10^5) and the number of elements in the array can also be large (of the order of 10^5), we would need to do some preprocessing to answer each query in O(1) or O(log N) or O(sqrt N) time. I can't seem to figure out how to do this. I have solved it for when the array contains unique elements, however how could we do this for when the array contains repetitions? Is there any data structure which could help in storing the required information?

Build suffix array in back order for this array (consider it like string)
For every entry store it's length and cumulative count (sum of lengths from the beginning of suffix array)
For query find needed index by binary search for cumulative counts, and get needed prefix of found suffix
For your examples suffixes with cumul.counts are
4 (0)
3124 (1)
34 (5)
124 (7)
query 3 finds entry 3124 (1<=3<5), and gets 3-1=2-nd (by length) prefix = 312

Numpy: finding nonzero values along arbitrary dimension

It seems I just cannot solve this in Numpy: I have a matrix, with an arbitrary number of dimensions, ordered in an arbitrary way. Inside this matrix, there is always one dimension I am interested in (as I said, the position of this dimension is not always the same). Now, I want to find the first nonzero value along this dimension. In fact, I need the index of that value to perform some operations on the value itself.
An example: if my matrix a is n x m x p and the dimension I am interested in is number 1, I would do something like:
for ii in xrange(a.shape[0]):
for kk in xrange(a.shape[2]):
myview = np.squeeze(a[ii, :, kk])
firsti = np.nonzero(myview)[0][0]
myview[firsti] = dostuff
Apart from performance considerations, I really do not know how to do this having different number of dimensions, and having the dimension I am interested in an arbitrary position.

You can abuse np.argmax for your purpose. Here, you can specify the axis which you are interested in, where 0 is along columns, 1 is along rows, and so on. You just need an array which contains the same value for all elements that are not zero. You can achieve that by doing a != 0, as this will contain False (meaning 0) for all zero-elements and True (meaning 1) for all non-zero-elements. Now np.argmax(a != 0, axis=1) would give you the first non-zero element along the 1 axis.
For example:
import numpy as np
a = np.array([[0, 1, 4],[1, 0, 2],[0, 0, 1]])
# a = [[0, 1, 4],
# [1, 0, 2],
# [0, 0, 1]]
print(np.argmax(a!=0, axis=0))
# >>> array([1, 0, 0]) -> along columns
print(np.argmax(a!=0, axis=1))
# >>> array([1, 0, 2]) -> along rows
This will also work for higher dimension, but the output is less instructive, as it is hard to imagine.