Numpy array questions about .shape - arrays

I'm new to numpy, and I'm having some trouble with array shapes.
I want to work with arrays the way I work with matrices in MATLAB. However, I'm confused about the following:
>>> import numpy as np
>>> b = np.array([[1,2],[3,4]])
>>> b
array([[1, 2],
       [3, 4]])
>>> c = b[:,1] # I want c to be a column vector
>>> c.shape
(2,)
>>> d = b[1,:] # I want d to be a row vector
>>> d.shape
(2,)
I want to treat c and d as a column vector and a row vector, respectively.
I don't understand why c and d have the same shape (2,),
and this causes trouble in later calculations.
Could anyone help me deal with this problem? Thanks a lot!

Indexing with a plain integer returns that column/row as a one-dimensional array, which is neither a row vector nor a column vector. This is similar to indexing a list: you only receive the element at that index, and the containing dimension is stripped away:
>>> my_list = ['a', 'b', 'c', 'd']
>>> my_list[2]
'c'
Instead, you want a slice. A slice of a list is a (sub-) list, and a slice of a matrix is a matrix. With numpy, you can specify this either as slice notation using : or a sequence of indices:
>>> c = b[:,:1] # slice notation
>>> c.shape
(2, 1)
>>> d = b[[1],:] # sequence of indices
>>> d.shape
(1, 2)
Slice notation is for consecutive index ranges. For example, :1 means "everything from the start up to, but not including, index 1". Sequence notation is for non-consecutive index sets. For example, [0, 2] selects indices 0 and 2 and skips index 1. If you just want a single index, sequence notation is simpler unless you are dealing with borders (first/last row/column).
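As a quick check of the rule above, here is a minimal sketch that reuses b from the question and compares the shapes produced by an integer index, a slice, and a one-element index list. The names int_col, slice_col and list_col are only illustrative.

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])

int_col = b[:, 1]      # integer index: the column axis is dropped
slice_col = b[:, 1:2]  # slice: both axes are kept
list_col = b[:, [1]]   # one-element index list: both axes are kept

print(int_col.shape)    # (2,)
print(slice_col.shape)  # (2, 1)
print(list_col.shape)   # (2, 1)
```

Note that b[:, 1:2] and b[:, [1]] select the same column; the slice form is a view of b, while the list form makes a copy.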

You can use
c = b[:,[1]]
d = b[[1],:]
to get the vector as an explicit row/column vector:
c.shape # (2, 1)
d.shape # (1, 2)

In general, if you want your array c to be a column vector of shape (2,1), you can reshape it by:
c = c.reshape(-1,1) # c.shape --> (2, 1)
Similarly, if you want your array d to be a row vector of shape (1,2), you can reshape it by:
d = d.reshape(1,-1) # d.shape --> (1, 2)
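To see why the explicit shapes matter in later calculations, here is a small sketch, assuming the same b as in the question, that multiplies the reshaped vectors. The names outer and inner are only illustrative.

```python
import numpy as np

b = np.array([[1, 2], [3, 4]])
c = b[:, 1].reshape(-1, 1)  # column vector [[2], [4]], shape (2, 1)
d = b[1, :].reshape(1, -1)  # row vector [[3, 4]], shape (1, 2)

outer = c @ d  # (2, 1) @ (1, 2) -> shape (2, 2)
inner = d @ c  # (1, 2) @ (2, 1) -> shape (1, 1)

print(outer.shape)  # (2, 2)
print(inner.shape)  # (1, 1)
```

With the one-dimensional versions of c and d, both products would collapse to a single scalar, which is usually the source of the confusion when coming from MATLAB.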

Related

How to do a cartesian product of a variable number of lists in Julia?

For each value j in the set {1, 2, ..., n} where the value of n can vary (it is some variable in my program that can be different depending on the inputs from the user), I have an array A_j. I would like to obtain the cartesian product of all the arrays A_j, so that I can then iterate through that cartesian product (taking one element from each A_1, A_2, ... A_n to get a tuple (a_1, a_2, ..., a_n) in A_1 x A_2 x ... x A_n). How would I accomplish this in Julia?
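Use Iterators.product. Since the number of arrays varies, keep them in a collection (say a vector named arrays, a hypothetical name) and splat it into the call as Iterators.product(arrays...):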
help?> Iterators.product
product(iters...)
Return an iterator over the product of several iterators. Each generated
element is a tuple whose ith element comes from the ith argument iterator.
The first iterator changes the fastest.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> collect(Iterators.product(1:2, 3:5))
2×3 Matrix{Tuple{Int64, Int64}}:
(1, 3) (1, 4) (1, 5)
(2, 3) (2, 4) (2, 5)

Python: Finding a numpy array in a list of numpy arrays

I have a list of 50 numpy arrays called vectors:
[array([0.1, 0.8, 0.03, 1.5], dtype=float32), array([1.2, 0.3, 0.1], dtype=float32), .......]
I also have a smaller list (means) of 10 numpy arrays, all of which are from the bigger list above. I want to loop through each array in means and find its position in vectors.
So when I do this:
for c in means:
print(vectors.index(c))
I get the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I've gone through various SO questions and I know why I'm getting this error, but I can't find a solution. Any help?
Thanks!
One possible solution is converting to a list.
vectors = np.array([[1, 2, 3], [4, 5, 6], [7,8,9], [10,11,12]], np.int32)
print(vectors.tolist().index([1,2,3]))
This will return index 0, because [1,2,3] is found at index 0 of vectors.
The example above is a 2D NumPy array; however, you seem to have a list of NumPy arrays,
so I would convert it to a list of lists this way:
vectors = [arr.tolist() for arr in vectors]
Do the same for means:
means = [arr.tolist() for arr in means]
Now we are working with two lists of lists, so your original for loop will work:
for c in means:
print(vectors.index(c))
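If you would rather not convert anything, an alternative sketch is to compare arrays explicitly with np.array_equal, which returns a single boolean and also copes with arrays of different lengths. The helper index_of and the sample data below are made up for illustration.

```python
import numpy as np

# Made-up stand-ins for the real data: arrays of different lengths.
vectors = [
    np.array([0.1, 0.8, 0.03, 1.5], dtype=np.float32),
    np.array([1.2, 0.3, 0.1], dtype=np.float32),
    np.array([0.5, 0.5], dtype=np.float32),
]
means = [vectors[2], vectors[0]]


def index_of(target, arrays):
    """Return the position of the first array equal to target (hypothetical helper)."""
    for i, arr in enumerate(arrays):
        if np.array_equal(arr, target):  # one bool, so no ambiguity error
            return i
    raise ValueError("array not found")


positions = [index_of(c, vectors) for c in means]
print(positions)  # [2, 0]
```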

Binning then sorting arrays in each bin but keeping their indices together

I have two arrays and the indices of these arrays are related. So x[0] is related to y[0], so they need to stay organized. I have binned the x array into two bins as shown in the code below.
x = [1,4,7,0,5]
y = [.1,.7,.6,.8,.3]
binx = [0,4,9]
index = np.digitize(x,binx)
Giving me the following:
In [1]: index
Out[1]: array([1, 2, 2, 1, 2])
So far so good. (I think)
The y array is a parameter telling me how well measured the x data point is, so .9 is better than .2, so I'm using the next code to sort out the best of the y array:
y.sort()
ysorted = y[int(len(y) * .5):]
which gives me:
In [2]: ysorted
Out[2]: [0.6, 0.7, 0.8]
giving me the last 50% of the array. Again, this is what I want.
My question is how do I combine these two operations? From each bin, I need to get the best 50% and put these new values into a new x and new y array. Again, keeping the indices of each array organized. Or is there an easier way to do this? I hope this makes sense.
Many numpy functions have arg... variants that don't operate "by value" but rather "by index". In your case argsort does what you want:
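x = np.asarray(x)  # index arrays only work on numpy arrays, not lists
y = np.asarray(y)
order = np.argsort(y)
# order is an array of indices such that
# y[order] is sorted
top50 = order[len(order) // 2 :]
top50x = x[top50]
top50y = y[top50]
# now top50x are the x values corresponding 1-to-1 to the best 50% of y (top50y)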
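To combine this with the binning from the question, one possible sketch is to apply argsort separately inside each bin and collect the surviving indices. The names keep, members, new_x and new_y are illustrative, and the rounding for odd-sized bins (keep the larger half) is an assumption that mirrors the int(len(y) * .5) slice in the question.

```python
import numpy as np

x = np.array([1, 4, 7, 0, 5])
y = np.array([0.1, 0.7, 0.6, 0.8, 0.3])
binx = [0, 4, 9]

index = np.digitize(x, binx)  # bin number of every x value

keep = []
for bin_id in np.unique(index):
    members = np.flatnonzero(index == bin_id)  # positions of the points in this bin
    order = members[np.argsort(y[members])]    # same positions, sorted by ascending y
    keep.append(order[len(order) // 2:])       # best half of this bin
keep = np.concatenate(keep)

new_x = x[keep]  # x and y are indexed with the same positions,
new_y = y[keep]  # so the pairs stay together
print(new_x)  # [0 7 4]
print(new_y)  # [0.8 0.6 0.7]
```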
You should make a list of pairs from your x and y lists.
This can be done with the zip function:
x = [1,4,7,0,5]
y = [.1,.7,.6,.8,.3]
values = list(zip(x, y)) # zip returns an iterator in Python 3, so turn it into a list
values
[(1, 0.1), (4, 0.7), (7, 0.6), (0, 0.8), (5, 0.3)]
To sort such a list of pairs by a specific element of each pair, you can use sort's key parameter:
values.sort(key=lambda pair: pair[1])
values
[(1, 0.1), (5, 0.3), (7, 0.6), (4, 0.7), (0, 0.8)]
Then you may do whatever you want with this sorted list of pairs.

Replace zero array with new values one by one NumPy

I'm stuck on a simple question in NumPy. I have an array of zeros. Each time I generate a new value, I would like to put it into the array, one value at a time.
arr=array([0,0,0])
# something like this
l=[1,5,10]
for x in l:
arr.append(x) # from python logic
So I would like to add each x to the array one by one, so that I get: 1st iteration arr=([1,0,0]); 2nd iteration arr=([1,5,0]); 3rd iteration arr=([1,5,10]).
Basically I need to substitute zeros with new values one by one in NumPy (I am learning NumPy!!!!!!).
I checked many NumPy options such as np.append (which adds new values after the existing ones), but I can't find the right one.
thank you
There are a few things to pick up with numpy.
You can generate an array full of zeros with
>>> np.zeros(3)
array([ 0., 0., 0.])
You can get/set array elements with indexing as with lists etc:
arr[2] = 7
for i, val in enumerate([1, 5, 10]):
arr[i] = val
Or, if you want to fill the array from something like a list, you can directly use:
>>> np.array([1, 5, 10])
array([ 1, 5, 10])
Also, numpy's signature for appending stuff to an array is a bit different:
arr = np.append(arr, 7)
Having said that, you should consider diving into NumPy's own user guide.
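Putting the pieces above together, here is a minimal sketch of the loop from the question. The snapshots list is only there to record the intermediate states for illustration, and dtype=int is an assumption so that the values stay integers.

```python
import numpy as np

arr = np.zeros(3, dtype=int)  # array([0, 0, 0])
snapshots = []                # hypothetical: keeps a copy after each step

for i, val in enumerate([1, 5, 10]):
    arr[i] = val              # overwrite the i-th zero in place
    snapshots.append(arr.copy())
    print(arr)
```

The three printed states hold the values 1, 0, 0, then 1, 5, 0, then 1, 5, 10, which is the sequence asked for.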

Looping through slices of Theano tensor

I have two 2D Theano tensors, call them x_1 and x_2, and suppose for the sake of example, both x_1 and x_2 have shape (1, 50). Now, to compute their mean squared error, I simply run:
T.sqr(x_1 - x_2).mean(axis = -1).
However, what I wanted to do was construct a new tensor that consists of their mean squared error in chunks of 10. In other words, since I'm more familiar with NumPy, what I had in mind was to create the following tensor M in Theano:
M = [theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1) for i in xrange(0, 50, 10)]
Now, since Theano doesn't have for loops, but instead uses scan (which map is a special case of), I thought I would try the following:
sequence = T.arange(0, 50, 10)
M = theano.map(lambda i: theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1), sequence)
However, this does not seem to work, as I get the error:
only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Is there a way to loop through the slices using theano.scan (or map)? Thanks in advance, as I'm new to Theano!
Similar to what can be done in numpy, a solution would be to reshape your (1, 50) tensor to a (1, 5, 10) tensor (or even a (5, 10) tensor), and then compute the mean along the last axis, so that each group of 10 consecutive values is averaged together.
To illustrate this with numpy, suppose I want to compute means over slices of 2:
x = np.array([0, 2, 0, 4, 0, 6])
x = x.reshape([3, 2])
np.mean(x, axis=1)
outputs
array([ 1., 2., 3.])
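For the (1, 50) case from the question, the same idea can be sketched in numpy as follows. The random inputs are made up, and the Theano version is assumed (not verified here) to follow the same reshape-then-mean pattern on tensors.

```python
import numpy as np

rng = np.random.default_rng(0)
x_1 = rng.normal(size=(1, 50))  # made-up inputs
x_2 = rng.normal(size=(1, 50))

sq_err = (x_1 - x_2) ** 2

# Reshape so the last axis holds one chunk of 10, then average over it.
chunked = sq_err.reshape(1, 5, 10).mean(axis=-1)  # shape (1, 5)

# Reference: the explicit loop over slices from the question.
looped = np.stack(
    [sq_err[:, i:i + 10].mean(axis=-1) for i in range(0, 50, 10)],
    axis=-1,
)

print(chunked.shape)                 # (1, 5)
print(np.allclose(chunked, looped))  # True
```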
