Slicing numpy arrays - arrays

mean = [0, 0]
cov = [[1, 0], [0, 100]]
gg = np.random.multivariate_normal(mean, cov, size = [5, 12])
I get an array which appears to have 2 columns and 12 rows. I want to take the first column, which includes all 12 rows, and convert those values to columns. What is the appropriate method for slicing, and how can one convert the result to columns? To be precise, looking at the second screenshot, one should take every column 0 and lay its values out in order from left to right.
The result should look like the first screenshot.

The problem is that your array gg is not two- but three-dimensional. So, what you need is in fact the first column of each stacked 2D array. Here is an example:
import numpy as np
x = np.random.randint(0, 10, (3, 4, 5))
x[:, :, 0].flatten()
The colon in slicing means "all values in this dimension". So, x[:, :, 0] means "all values in the first dimension, all values in the second dimension, and the third dimension fixed at index 0". This results in a two-dimensional array, which you then have to flatten.
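Applied to the array from the question, the same idea can be sketched as follows. The seed and the variable names first, flat and as_row are only illustrative assumptions; the shapes are what matter.

```python
import numpy as np

mean = [0, 0]
cov = [[1, 0], [0, 100]]

# Seed chosen only so the sketch is reproducible.
rng = np.random.default_rng(0)
gg = rng.multivariate_normal(mean, cov, size=(5, 12))
print(gg.shape)               # (5, 12, 2): five stacked 12x2 arrays

first = gg[:, :, 0]           # first column of every stacked 2D array
print(first.shape)            # (5, 12)

flat = first.flatten()        # all 60 values in one 1D array
as_row = flat.reshape(1, -1)  # the same values as a single 2D row
print(flat.shape, as_row.shape)
```

gg[..., 0] selects the same values as gg[:, :, 0] and works regardless of how many leading dimensions the array has.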

Related

Compare two matrix and find the matched index in the corresponding column

So I have two arrays. For each element in column i in array_1, I want to find whether the same element exists in column i in array_2. If it exists, I want the corresponding position/index in column i in array_2. If it doesn't exist, return False.
Instead of using a for loop, I am wondering whether there is a vectorized function that would do this task efficiently.
Below are two sample arrays and my desired output.
array_1 = np.array([[1, 2], [2 , np.nan]])
array_2 = np.array([[2, np.nan], [1 , 3 ]])
My desired output:
np.array([[1, False], [0 , 0]])
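One possible vectorized sketch uses broadcasting to compare every element of a column in array_1 with every element of the same column in array_2. The function name first_match_index is hypothetical, and two choices here are assumptions rather than part of the question: NaN is treated as equal to NaN (as the desired output implies), and -1 is returned for "not found" because False in a numeric array is indistinguishable from index 0.

```python
import numpy as np

def first_match_index(array_1, array_2):
    """For each array_1[i, c], return the first row j with array_2[j, c] equal to it, else -1."""
    a1 = np.asarray(array_1, dtype=float)
    a2 = np.asarray(array_2, dtype=float)
    # eq[i, j, c] is True when array_1[i, c] equals array_2[j, c].
    eq = a1[:, None, :] == a2[None, :, :]
    # Assumption: NaN counts as a match for NaN.
    eq |= np.isnan(a1)[:, None, :] & np.isnan(a2)[None, :, :]
    found = eq.any(axis=1)    # is there any match in column c?
    idx = eq.argmax(axis=1)   # position of the first True along j
    return np.where(found, idx, -1)

array_1 = np.array([[1, 2], [2, np.nan]])
array_2 = np.array([[2, np.nan], [1, 3]])
result = first_match_index(array_1, array_2)
print(result)
```

The intermediate boolean array has shape (rows of array_1, rows of array_2, columns), so this trades memory for the absence of a Python loop.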

A question that involves permutations of pairs of row elements

Consider two numpy arrays of integers. U has 2 columns and shows all (p,q) where p<q. For this question, I'll restrict myself to 0<=p,q<=5. The cardinality of U is C(6,2) = 15.
U = [[0,1],
[0,2],
[0,3],
[0,4],
[0,5],
[1,2],
[1,3],
[1,4],
[1,5],
[2,3],
[2,4],
[2,5],
[3,4],
[3,5],
[4,5]]
The 2nd array, V, has 6 columns. I formed it by finding the cartesian product UxUxU. So, the first row of V is [0,1,0,1,0,1], and the last row is [4,5,4,5,4,5]. The cardinality of V is C(6,2)^3 = 3375.
A SMALL SAMPLE of V, used in my question, is shown below. The elements of each row should be thought of as 3 pairs. The rationale follows.
V = [[0,1, 2,5, 2,4],
[0,1, 2,5, 2,5],
[0,1, 2,5, 3,4],
[0,1, 2,5, 3,5],
[0,1, 2,5, 4,0],
[0,1, 2,5, 4,1]]
Here's why the row elements should be thought of as a set of 3 pairs: Later in my code, I will loop through each row of V, using the pair values to 'swap' columns of a matrix M. (M is not shown because it isn't needed for this question) When we get to row [0,1, 2,5, 2,4], for example, we will swap the columns of M having indices 0 & 1, THEN swap the columns having indices 2 & 5, and finally, swap the columns having indices 2 & 4.
I'm currently wasting a lot of time because many of the rows of V could be eliminated.
The easiest case to understand involves V rows like [0,1, 2,5, 3,4] where all values are unique. This row has 6 pair permutations, but they all have the same net effect on M. Their values are unique, so none of the swaps will encounter 'interference' from another swap.
Question 1: How can I efficiently eliminate rows that have unique elements in unneeded permutations?
I would keep, say, [0,1, 2,5, 3,4], but drop:
[0,1, 3,4, 2,5],
[2,5, 0,1, 3,4],
[2,5, 3,4, 0,1],
[3,4, 0,1, 2,5],
[3,4, 2,5, 0,1]
I'm guessing a solution would involve np.sort and np.unique, but I'm struggling with getting a good result.
Question 2: (I don't think it's reasonable to expect an answer to this question, but I'd certainly appreciate any pointers or tips on resources that I could study.) The question involves rows of V having one or more common elements, like [0,1, 2,5, 2,4] or [0,5, 2,5, 2,4] or [0,5, 2,5, 3,5]. All of these have 6 pair permutations, but they don't all have the same effect on M. The row [0,1, 2,5, 2,4], for example, has 3 permutations that produce one M outcome, and 3 permutations that produce another. Ideally, I would like to keep two of the rows but eliminate the other four. The two other rows I showed are even more 'pathological'.
Does anyone see a path forward here that would allow more eliminations of V rows? If not, I'll continue what I'm currently doing even though it's really inefficient - screening the code's final outputs for doubles.
To get rows of an array, without repetitions (in your sense), you can run:
VbyRows = V[np.lexsort(V[:, ::-1].T)]
sorted_data = np.sort(VbyRows, axis=1)
result = VbyRows[np.append([True], np.any(np.diff(sorted_data, axis=0), 1))]
Details:
VbyRows = V[np.lexsort(V[:, ::-1].T)] - sort rows by all columns.
I used ::-1 as the column index to sort first on the first column,
then by the second, and so on.
sorted_data = np.sort(VbyRows, axis=1) - sort each row from VbyRows
(and save it as a separate array).
np.diff(sorted_data, axis=0) - compute "vertical" differences between
previous and current row (in sorted_data).
np.any(...) - A bool vector - "cumulative difference indicator" for
each row from sorted_data but the first (does it differ from the
previous row on any position).
np.append([True], ...) - prepend the above result with True (an
indicator that the first row should be included in the result).
The result is also a bool vector, this time for all rows. Each element
of this row answers the question: Should the respective row from VbyRows
be included in the result.
result = VbyRows[np.append([True], np.any(np.diff(sorted_data, axis=0), 1))] -
the final result.
To test the above code I prepared V as follows:
array([[ 0, 1, 2, 5, 3, 4],
[ 0, 1, 3, 4, 2, 5],
[ 2, 5, 0, 1, 3, 4],
[ 2, 5, 3, 4, 0, 1],
[ 3, 4, 0, 1, 2, 5],
[13, 14, 12, 15, 10, 11],
[ 3, 4, 2, 5, 0, 1]])
(the last but one row is "other", all remaining rows contain the same
numbers in various order).
The result is:
array([[ 0, 1, 2, 5, 3, 4],
[13, 14, 12, 15, 10, 11]])
Note that because lexsort runs first, the row kept from each group of rows
with the same set of numbers is the one that comes first when the rows are
sorted by consecutive columns.
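An alternative sketch, which is not the answerer's method: np.diff only compares neighbouring rows, so rows with the same set of numbers are detected only when they end up adjacent after the lexsort. The version below builds a canonical form for each row by sorting its pairs (keeping each pair intact) and then uses np.unique on those forms. The function name drop_redundant_pair_orders is hypothetical. It assumes non-negative integers and only collapses rows whose six values are all distinct, since those are the rows for which the question states that the swap order does not matter.

```python
import numpy as np

def drop_redundant_pair_orders(V):
    V = np.asarray(V)
    pairs = V.reshape(len(V), -1, 2)

    # Rows in which no value repeats: only these are safe to collapse.
    distinct = (np.diff(np.sort(V, axis=1), axis=1) != 0).all(axis=1)

    # Canonical form: the pairs of each row sorted by (first, second).
    base = V.max() + 1
    key = pairs[..., 0] * base + pairs[..., 1]
    order = np.argsort(key, axis=1, kind="stable")
    canon = np.take_along_axis(pairs, order[..., None], axis=1).reshape(V.shape)

    # Keep the first occurrence of each canonical form among distinct rows,
    # and keep every row that has repeated values untouched.
    d_idx = np.flatnonzero(distinct)
    _, first = np.unique(canon[d_idx], axis=0, return_index=True)
    keep = ~distinct
    keep[d_idx[first]] = True
    return V[keep]

V = np.array([[ 0,  1,  2,  5,  3,  4],
              [ 0,  1,  3,  4,  2,  5],
              [ 2,  5,  0,  1,  3,  4],
              [ 2,  5,  3,  4,  0,  1],
              [ 3,  4,  0,  1,  2,  5],
              [13, 14, 12, 15, 10, 11],
              [ 3,  4,  2,  5,  0,  1],
              [ 0,  1,  2,  5,  2,  4]])
result = drop_redundant_pair_orders(V)
print(result)
```

The rows that survive keep their original order in V, and the last test row (which repeats the value 2) is left alone for Question 2 to deal with.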

how to get the numbers of columns and rows of an array

I have come across a line of code that actually works for what I am doing, but I do not understand it. I would like someone to please explain what it means.
b=(3,1,2,1)
a=2
q=np.zeros(b+(a,))
I would like to know why length of q is always the first entry of b.
for example len(q)=3
if b=(1,2,4,3) then len(q)=1
This is really confusing, as I thought that the function 'len' returns the number of columns of a given array. Also, how do I get the number of rows of q? So far the only things I have found are len(q), q.size (which gives the total number of elements in q) and q.shape (whose output I also do not quite understand, because in the latter case q.shape = b + (a,) = (1, 2, 4, 3, 2)).
Is there a function that could return the size of the array in terms of the number of columns and rows, for example 24x2?
Thank you in advance.
A plain Python list has only one dimension, which is why len() returns a single number (for a NumPy array, len() gives the size of the first axis).
Assuming that you have a 'matrix' in form of array of arrays, like this:
1 2 3
4 5 6
7 8 9
declared like
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
you can determine the 'number of columns and rows' by the following commands:
rows = len(mat)
columns = len(mat[0])
Note that this only works if the number of elements in each row is constant.
If you are using numpy to make the arrays, another way to get the number of rows and columns is to use the tuple returned by the np.shape() function. Here is a complete example:
import numpy as np
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rownum = np.shape(mat)[0]
colnum = np.shape(mat)[1]
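For the array from the question, a short sketch shows where each number comes from. b + (a,) is ordinary tuple concatenation, so the array has five axes, and len(q) is the size of the first one. The name q2d and the choice to flatten everything except the last axis are assumptions made here to arrive at the "24x2" view the question asks about.

```python
import numpy as np

b = (1, 2, 4, 3)
a = 2

shape = b + (a,)        # tuple concatenation: (1, 2, 4, 3, 2)
q = np.zeros(shape)

print(q.shape)          # (1, 2, 4, 3, 2)
print(q.ndim)           # 5 axes, so "rows and columns" alone cannot describe it
print(len(q))           # 1, the same as q.shape[0]
print(q.size)           # 48, the product of all entries of q.shape

# Collapse every axis except the last to get a 2D view of the same data.
q2d = q.reshape(-1, a)
print(q2d.shape)        # (24, 2)
```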

what does numpy ndarray shape do?

I have a simple question about the .shape function, which confused me a lot.
a = np.array([1, 2, 3]) # Create a rank 1 array
print(type(a)) # Prints "<class 'numpy.ndarray'>"
print(a.shape) # Prints "(3,)"
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array
print(b.shape) # Prints "(2, 3)"
What exactly does .shape do? If it counts how many rows and how many columns there are,
then a.shape is supposed to be (1,3), one row and three columns, right?
yourarray.shape, np.shape() or np.ma.shape() returns the shape of your ndarray as a tuple, and you can get the number of dimensions of your array using yourarray.ndim or np.ndim(). (That is, it gives the n of the ndarray, since all arrays in NumPy are just n-dimensional arrays, called ndarrays for short.)
For a 1D array, the shape would be (n,) where n is the number of elements in your array.
For a 2D array, the shape would be (n,m) where n is the number of rows and m is the number of columns in your array.
Please note that in the 1D case, the shape is simply (n,), not (1, n) or (n, 1) as it would be for a 2D row or column vector respectively.
This is to follow the convention that:
For 1D array, return a shape tuple with only 1 element (i.e. (n,))
For 2D array, return a shape tuple with only 2 elements (i.e. (n,m))
For 3D array, return a shape tuple with only 3 elements (i.e. (n,m,k))
For 4D array, return a shape tuple with only 4 elements (i.e. (n,m,k,j))
and so on.
Also, please see the example below to see how np.shape() or np.ma.shape() behaves with 1D arrays and scalars:
# sample array
In [10]: u = np.arange(10)
# get its shape
In [11]: np.shape(u) # u.shape
Out[11]: (10,)
# get array dimension using `np.ndim`
In [12]: np.ndim(u)
Out[12]: 1
In [13]: np.shape(np.mean(u))
Out[13]: () # empty tuple (to indicate that a scalar is a 0D array).
# check using `numpy.ndim`
In [14]: np.ndim(np.mean(u))
Out[14]: 0
P.S.: So, the shape tuple is consistent with our understanding of dimensions of space, at least mathematically.
Unlike its most popular commercial competitor, numpy has been about "arbitrary-dimensional" arrays pretty much from the outset, which is why the core class is called ndarray. You can check the dimensionality of a numpy array using the .ndim property. The .shape property is a tuple of length .ndim containing the length of each dimension. There is an upper limit on the number of dimensions: 32 in NumPy 1.x (raised to 64 in NumPy 2.0):
a = np.ones(32*(1,))
a
# array([[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[ 1.]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]])
a.shape
# (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
a.ndim
# 32
If a numpy array happens to be 2d like your second example, then it's appropriate to think about it in terms of rows and columns. But a 1d array in numpy is truly 1d, no rows or columns.
If you want something like a row or column vector you can achieve this by creating a 2d array with one of its dimensions equal to 1.
a = np.array([[1,2,3]]) # a 'row vector'
b = np.array([[1],[2],[3]]) # a 'column vector'
# or if you don't want to type so many brackets:
b = np.array([[1,2,3]]).T
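The difference between a true 1D array and a 2D "row" or "column" array can be checked directly from the shapes. This is a small sketch; the names v, row, col and col2 are only illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])      # truly 1D
row = v[np.newaxis, :]       # 2D array with one row
col = v[:, np.newaxis]       # 2D array with one column
col2 = v.reshape(-1, 1)      # same result as col, written with reshape

print(v.shape, row.shape, col.shape)   # (3,) (1, 3) (3, 1)
print(v.ndim, row.ndim, col.ndim)      # 1 2 2

# Transposing a 1D array changes nothing, because there is only one axis.
print(v.T.shape)                       # (3,)
```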
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
.shape gives the actual shape of your array, that is, the number of elements along each dimension (number of rows, number of columns, and so on). Note that it is an attribute, not a method, so it is used without parentheses.
The answer you get is in the form of a tuple.
For example:
1D array:
d=np.array([1,2,3,4])
print(d.shape)
Output: (4,)
i.e. the number 4 denotes the number of elements in the 1D array.
2D array:
e=np.array([[1,2,3],[4,5,6]])
print(e.shape)
Output: (2, 3), i.e. the number of rows and the number of columns.
The number of entries in the shape tuple depends on the number of dimensions of the array: it grows by one for each added dimension.

Numpy: finding nonzero values along arbitrary dimension

It seems I just cannot solve this in Numpy: I have a matrix, with an arbitrary number of dimensions, ordered in an arbitrary way. Inside this matrix, there is always one dimension I am interested in (as I said, the position of this dimension is not always the same). Now, I want to find the first nonzero value along this dimension. In fact, I need the index of that value to perform some operations on the value itself.
An example: if my matrix a is n x m x p and the dimension I am interested in is number 1, I would do something like:
for ii in range(a.shape[0]):
    for kk in range(a.shape[2]):
        myview = np.squeeze(a[ii, :, kk])
        firsti = np.nonzero(myview)[0][0]
        myview[firsti] = dostuff  # dostuff: placeholder for the new value
Apart from performance considerations, I really do not know how to do this having different number of dimensions, and having the dimension I am interested in an arbitrary position.
You can abuse np.argmax for your purpose. Here, you can specify the axis you are interested in, where 0 is along columns, 1 is along rows, and so on. You just need an array that contains the same value for all elements that are not zero. You can achieve that with a != 0, as this will contain False (meaning 0) for all zero elements and True (meaning 1) for all non-zero elements. Since np.argmax returns the position of the first maximum, np.argmax(a != 0, axis=1) gives you the index of the first non-zero element along axis 1.
For example:
import numpy as np
a = np.array([[0, 1, 4],[1, 0, 2],[0, 0, 1]])
# a = [[0, 1, 4],
# [1, 0, 2],
# [0, 0, 1]]
print(np.argmax(a!=0, axis=0))
# >>> array([1, 0, 0]) -> along columns
print(np.argmax(a!=0, axis=1))
# >>> array([1, 0, 2]) -> along rows
This will also work for higher dimensions, but the output is less instructive, as it is harder to visualize.
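A sketch of how this could be extended to an arbitrary axis, including writing back to the first non-zero element. Two things here are assumptions rather than part of the answer: np.argmax returns 0 for a slice that contains only zeros, so the helper (hypothetically named first_nonzero_index) reports -1 in that case, and "multiply by 10" merely stands in for the question's dostuff.

```python
import numpy as np

def first_nonzero_index(a, axis):
    """Index of the first non-zero entry along axis, or -1 if the slice is all zeros."""
    mask = a != 0
    idx = np.argmax(mask, axis=axis)
    return np.where(mask.any(axis=axis), idx, -1)

a = np.zeros((2, 3, 4), dtype=int)
a[0, 1, 2] = 7
a[0, 2, 2] = 9
a[1, 0, 0] = 5

axis = 1
idx = first_nonzero_index(a, axis)
print(idx.shape)    # (2, 4): the chosen axis has been removed

# Modify the first non-zero value of every slice in place.
found = idx >= 0
safe = np.expand_dims(np.where(found, idx, 0), axis)   # valid indices everywhere
vals = np.take_along_axis(a, safe, axis=axis)
new = np.where(np.expand_dims(found, axis), vals * 10, vals)
np.put_along_axis(a, safe, new, axis=axis)
print(a[0, 1, 2], a[0, 2, 2], a[1, 0, 0])
```

Because only the axis argument refers to the dimension of interest, the same code works for any number of dimensions and any position of that dimension.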
