np.asarray error: could not broadcast input array from shape (2,2) into shape (2) - arrays

I am experimenting with influence functions to understand black-box models. I am encountering a broadcast error while working with a toy dataset of 2 features and 2 classes. Below, I have reproduced the error using two lists, a1 and a2.
a1 = [array([[-0.00491985, 0.00491965],
[-0.00334969, 0.00334955],
[-0.00136081, 0.00136076]], dtype=float32),
array([-0.00104678, 0.00104674], dtype=float32)]
a2 =
[array([[-0.00334969, 0.00334955],
[-0.00136081, 0.00136076]], dtype=float32),
array([-0.00104678, 0.00104674], dtype=float32)]
I am trying to convert the above two lists into arrays using np.asarray()
print(np.asarray(a1))
array([array([[-0.00491985, 0.00491965],
[-0.00334969, 0.00334955],
[-0.00136081, 0.00136076]], dtype=float32),
array([-0.00104678, 0.00104674], dtype=float32)], dtype=object)
While np.asarray(a1) works fine, np.asarray(a2) throws the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-51-3060768e9016> in <module>()
----> 1 np.asarray(a2)
/home/devi/.local/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
536
537 """
--> 538 return array(a, dtype, copy=False, order=order)
539
540
ValueError: could not broadcast input array from shape (2,2) into shape (2)
I went through many forum posts describing broadcasting errors but still could not figure out how np.asarray() works.
When the elements of the list are arrays of shapes (3, 2) and (2,), np.asarray() returns an object array of length 2. When the elements have shapes (2, 2) and (2,), why does it throw an error instead of returning an array of length 2 as in the previous case? Any help in understanding this will be greatly appreciated!

First, you need to reshape all the arrays so that they have the same number of dimensions.
Then you can convert the list to a NumPy array:
a2 = [a.reshape(-1, 2) for a in a2]
a2 = np.array(a2)
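If the goal is simply a length-2 container that holds both arrays whatever their shapes, one option that does not depend on how np.asarray() guesses the target shape is to pre-allocate an object array and fill it element by element. This is a minimal sketch, not the method from the answer above; the helper name to_object_array is hypothetical.

```python
import numpy as np

def to_object_array(items):
    """Store each item as one element of a 1-D object array (hypothetical helper)."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # stores the array itself, no broadcasting is attempted
    return out

a2 = [
    np.array([[-0.00334969, 0.00334955],
              [-0.00136081, 0.00136076]], dtype=np.float32),
    np.array([-0.00104678, 0.00104674], dtype=np.float32),
]

packed = to_object_array(a2)
print(packed.shape)     # shape of the container
print(packed[0].shape)  # shape of the first stored array
```

Because the container shape is fixed up front, NumPy never has to infer a common shape from the nested inputs, which is the step that fails for a2.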

Related

In Go, how does the [:] syntax differ from array assignment?

I am currently going through the GoLang tutorials and have the following doubt.
arr1:=[...]int{1,2,3}
arr2:=arr1
arr1[1]=99
fmt.Println(arr1)
fmt.Println(arr2)
It outputs the following:
[1 99 3]
[1 2 3]
Here only arr1 is modified, which makes sense because arrays are treated as values.
If I try the following, things get confusing:
a:=[...]int{1,2,3}
b:=a[:]
a[1]=88
fmt.Println(a)
fmt.Println(b)
This prints:
[1 88 3]
[1 88 3]
Question: does this mean that b:=a creates a copy of the array, whereas b:=a[:] creates a slice that points to the underlying array (a in this case)?
Slicing does not copy the slice's data. It creates a new slice value
that points to the original array. This makes slice operations as
efficient as manipulating array indices. Therefore, modifying the
elements (not the slice itself) of a re-slice modifies the elements of
the original slice.
https://blog.golang.org/slices-intro
See the link above for the internal structure of a slice.

Python ValueError: setting an array element with a sequence. while using SVM in scikit-learn

I have been working with scikit-learn SVMs on a binary classification problem. I have computed features from images and stored them in an array. This is what each row of the array looks like:
[variable(0.16749821603298187) variable(0.15862827003002167)
variable(0.15818320214748383) ..., variable(0.2765314280986786)
variable(0.2909393608570099) variable(0.2909393608570099)]
The shape of X_train_svm is (6, 7290) and the shape of Y_train is (6,).
When I print X_train_svm and Y_train, I get the expected values. But when I run
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
classifier=SVC(kernel='linear',random_state=0)
classifier.fit(X_train_svm,Y_train)
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-145-a957b86fe2dc> in <module>
2 from sklearn.metrics import accuracy_score
3 classifier=SVC(kernel='linear',random_state=0)
----> 4 classifier.fit(X_train_svm,Y_train)
c:\users\s121293.squ\appdata\local\programs\python\python35\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
480
481 """
--> 482 return array(a, dtype, copy=False, order=order)
483
484 def asanyarray(a, dtype=None, order=None):
ValueError: setting an array element with a sequence.
Can someone help me with what to do next? I am not sure what is happening internally. X_train_svm and Y_train have the same number of rows.
Note: I suspect that something goes wrong when I convert the objects to a NumPy array. Thanks in advance.
Edit: X_train_svm looks like the following:
[[variable(0.16749821603298187) variable(0.15862827003002167)
variable(0.15818320214748383) ..., variable(0.2765314280986786)
variable(0.2909393608570099) variable(0.2909393608570099)]
..............................................................
[variable(0.22378747165203094) variable(0.22378747165203094)
variable(0.20569562911987305) ..., variable(0.29241225123405457)
variable(0.31552478671073914) variable(0.31552478671073914)]]
Y_train holds the labels:
[0 0 0 1 1 1]
I used the following code to pass the output features of the fully connected layer to the SVM classifier:
X_train_SVM=Fc1_output
print(Y_train)
print(X_train_SVM.shape)
Y_train_svm=np.reshape(Y_train,(6,1))
####### SVM ######################
clf = SVC(gamma=0.01,C=10,kernel='poly')
clf.fit(X_train_SVM,Y_train_svm)
All my images are the same size; I resized them to 224x224.
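The variable(...) entries in the printout suggest that the array holds wrapper objects rather than plain floats, which would fit the suspicion about the conversion step. The sketch below only illustrates that idea under stated assumptions: the Variable class is a hypothetical stand-in (not the real framework type), and the .data attribute is assumed to expose the numeric value.

```python
import numpy as np

class Variable:
    """Hypothetical stand-in for a framework wrapper around one number."""
    def __init__(self, value):
        self.data = np.float32(value)  # assumed attribute holding the raw value

    def __repr__(self):
        return "variable({})".format(self.data)

# Build a small object array that mimics the printed feature matrix.
X_obj = np.empty((6, 4), dtype=object)
for r in range(6):
    for c in range(4):
        X_obj[r, c] = Variable(0.1 * (r + c))

# Unwrap every element into a plain float so the result has a numeric dtype.
X_float = np.array([[float(v.data) for v in row] for row in X_obj],
                   dtype=np.float64)

# Labels as a flat 1-D array of shape (n_samples,).
y = np.array([0, 0, 0, 1, 1, 1]).ravel()

print(X_float.dtype, X_float.shape, y.shape)
```

A numeric 2-D matrix and a flat label vector of this form are the kind of input that an estimator's fit method generally expects.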

Inconsistent Results - Jupyter Numpy & Transpose

I am getting odd behavior with Jupyter, NumPy, transpose(), and 1D arrays.
I found another post saying that transpose() will not transpose a 1D array, but in my previous Jupyter notebooks it appeared to.
I have an example where the behavior is inconsistent, and I do not understand why.
Please see the attached picture of my Jupyter notebook, which shows 2 more or less identical arrays with 2 different outputs.
It seems that it both IS and IS NOT transposing the 1D array. Inconsistency is bad.
The output shapes are (1000,) and (1, 1000). Why does this occur?
# GENERATE WAVEFORM:
#---------------------------------------------------------------------------------------------------
N = 1000
fxc = []
fxn = []
for t in range(0,N):
    fxc.append(A1*m.sin(2.0*pi*50.0*dt*t) + A2*m.sin(2.0*pi*120.0*dt*t))
    fxn.append(A1*m.sin(2.0*pi*50.0*dt*t) + A2*m.sin(2.0*pi*120.0*dt*t) + 5*np.random.normal(u,std,size=1))
#---------------------------------------------------------------------------------------------------
# TAKE TRANSPOSE:
#---------------------------------
fc = np.transpose(np.array(fxc))
fn = np.transpose(np.array(fxn))
#---------------------------------
# PRINT DIMENSION:
#---------------------------------
print(fc.shape)
print(fn.shape)
#---------------------------------
Remove size=1 from your call to numpy.random.normal. Then it will return a scalar instead of a 1-d array of length 1.
For example,
In [2]: np.random.normal(0, 3, size=1)
Out[2]: array([0.47058288])
In [3]: np.random.normal(0, 3)
Out[3]: 4.350733438283539
Using size=1 in your code is a problem, because it results in fxn being a list of 1-d arrays (e.g. something like [[0.123], [-.4123], [0.9455], ...]). When NumPy converts that to an array, it has shape (N, 1). Transposing such an array results in the shape (1, N).
fxc, on the other hand, is a list of scalars (e.g. something like [0.123, 0.456, ...]). When converted to a NumPy array, it will be a 1-d array with shape (N,). NumPy's transpose operation swaps dimensions, but it does not create new dimensions, so transposing a 1-d array does nothing.
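The difference can be reproduced without the waveform code. The sketch below uses made-up values and hypothetical variable names; only the shapes matter.

```python
import numpy as np

N = 5

# A list of plain scalars, like fxc
scalars = [0.1 * t for t in range(N)]
# A list of length-1 arrays, like fxn when size=1 is used
one_element_arrays = [np.array([0.1 * t]) for t in range(N)]

fc = np.transpose(np.array(scalars))             # 1-d in, 1-d out
fn = np.transpose(np.array(one_element_arrays))  # (N, 1) in, (1, N) out

print(fc.shape)
print(fn.shape)
```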

Python 2.7: looping over 1D fibers in a multidimensional Numpy array

I am looking for a way to loop over 1D fibers (row, column, and multi-dimensional equivalents) along any dimension in a 3+-dimensional array.
In a 2D array this is fairly trivial since the fibers are rows and columns, so just saying for row in A gets the job done. But for 3D arrays for example, this expression iterates over 2D slices, not 1D fibers.
A working solution is the one below:
import numpy as np
A = np.arange(27).reshape((3,3,3))
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print func(A[fiber_index])
However, I am wondering whether there is something that is:
More idiomatic
Faster
Hope you can help!
I think you might be looking for numpy.apply_along_axis
In [10]: def my_func(x):
    ...:     return x**2 + x
In [11]: np.apply_along_axis(my_func, 2, A)
Out[11]:
array([[[ 0, 2, 6],
[ 12, 20, 30],
[ 42, 56, 72]],
[[ 90, 110, 132],
[156, 182, 210],
[240, 272, 306]],
[[342, 380, 420],
[462, 506, 552],
[600, 650, 702]]])
Although many NumPy functions (including sum) have their own axis argument to specify which axis to use:
In [12]: np.sum(A, axis=2)
Out[12]:
array([[ 3, 12, 21],
[30, 39, 48],
[57, 66, 75]])
numpy provides a number of different ways of looping over 1 or more dimensions.
Your example:
func = np.sum
for fiber_index in np.ndindex(A.shape[:-1]):
    print fiber_index
    print A[fiber_index]
produces something like:
(0, 0)
[0 1 2]
(0, 1)
[3 4 5]
(0, 2)
[6 7 8]
...
np.ndindex generates all index combinations over the first 2 dimensions, so indexing A with each one gives your function the 1D fiber along the last dimension.
Look at the code for ndindex. It's instructive. I tried to extract its essence in https://stackoverflow.com/a/25097271/901925.
It uses as_strided to generate a dummy matrix over which an nditer iterates. It uses the 'multi_index' mode to generate an index set, rather than elements of that dummy. The iteration itself is done with a __next__ method. This is the same style of indexing that is currently used in numpy compiled code.
http://docs.scipy.org/doc/numpy-dev/reference/arrays.nditer.html
Iterating Over Arrays has a good explanation, including an example of doing so in cython.
Many functions, among them sum, max, product, let you specify which axis (axes) you want to iterate over. Your example, with sum, can be written as:
np.sum(A, axis=-1)
np.sum(A, axis=(1,2)) # sum over 2 axes
An equivalent is
np.add.reduce(A, axis=-1)
np.add is a ufunc, and reduce specifies an iteration mode. There are many other ufunc, and other iteration modes - accumulate, reduceat. You can also define your own ufunc.
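A short check of the equivalences mentioned above, using the same 3x3x3 array as in the question. The variable names are only for illustration.

```python
import numpy as np

A = np.arange(27).reshape((3, 3, 3))

sum_last = np.sum(A, axis=-1)            # sum along each 1D fiber on the last axis
reduce_last = np.add.reduce(A, axis=-1)  # the same reduction via the ufunc method
sum_two = np.sum(A, axis=(1, 2))         # reduce over two axes at once

# accumulate keeps the intermediate results instead of only the final one
running = np.add.accumulate(A, axis=-1)

print(np.array_equal(sum_last, reduce_last))
print(sum_two.shape)
print(running[0, 0])
```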
xnx suggests
np.apply_along_axis(np.sum, 2, A)
It's worth digging through apply_along_axis to see how it steps through the dimensions of A. In your example, it steps over all possible i,j in a while loop, calculating:
outarr[(i,j)] = np.sum(A[(i, j, slice(None))])
Including slice objects in the indexing tuple is a nice trick. Note that it edits a list, and then converts it to a tuple for indexing. That's because tuples are immutable.
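The slice-in-a-tuple trick can be tried on its own. This is a minimal sketch with a hypothetical helper, fiber, that builds the index as a list, inserts slice(None) at the requested axis, and converts it to a tuple before indexing; it assumes a non-negative axis.

```python
import numpy as np

A = np.arange(27).reshape((3, 3, 3))

def fiber(A, index, axis):
    """Return the 1D fiber along `axis` at the given indices of the other axes.

    `index` holds one integer per remaining axis; `axis` must be non-negative.
    """
    idx = list(index)              # lists are mutable, so edit the list first
    idx.insert(axis, slice(None))  # slice(None) is what ':' means in A[i, j, :]
    return A[tuple(idx)]           # convert to a tuple for indexing

print(fiber(A, (0, 1), 2))  # same as A[0, 1, :]
print(fiber(A, (0, 1), 0))  # same as A[:, 0, 1]
```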
Your iteration can be applied along any axis by rolling that axis to the end. This is a 'cheap' operation since it just changes the strides.
def with_ndindex(A, func, ax=-1):
    # apply func along axis ax
    A = np.rollaxis(A, ax, A.ndim) # roll ax to end (changes strides)
    shape = A.shape[:-1]
    B = np.empty(shape,dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])
    return B
I did some timings on 3x3x3, 10x10x10 and 100x100x100 A arrays. This np.ndindex approach is consistently a third faster than the apply_along_axis approach. Direct use of np.sum(A, -1) is much faster.
So if func is limited to operating on a 1D fiber (unlike sum), then the ndindex approach is a good choice.
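For readers on Python 3, the same idea can be written with np.moveaxis, which also returns a view with permuted strides. This is a sketch rather than a drop-in replacement; the name apply_along_fibers is hypothetical, and it assumes func returns one scalar per fiber.

```python
import numpy as np

def apply_along_fibers(A, func, axis=-1):
    """Apply func to every 1D fiber of A along `axis` (hypothetical helper)."""
    A = np.moveaxis(A, axis, -1)         # move the chosen axis to the end (a view)
    shape = A.shape[:-1]
    # The output reuses A's dtype, so a func returning floats on an integer
    # array would be truncated; pick another dtype if that matters.
    B = np.empty(shape, dtype=A.dtype)
    for ii in np.ndindex(shape):
        B[ii] = func(A[ii])              # A[ii] is one 1D fiber
    return B

A = np.arange(27).reshape((3, 3, 3))
result = apply_along_fibers(A, np.sum, axis=1)
print(np.array_equal(result, np.sum(A, axis=1)))
```

The check against np.sum(A, axis=1) is only there to confirm that the loop visits the intended fibers; for sum itself the axis argument remains the better choice.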

Despite many examples online, I cannot get my MATLAB repmat equivalent working in python

I am trying to do some numpy matrix math because I need to replicate the repmat function from MATLAB. I know there are a thousand examples online, but I cannot seem to get any of them working.
The following is the code I am trying to run:
def getDMap(image, mapSize):
    newSize = (float(mapSize[0]) / float(image.shape[1]), float(mapSize[1]) / float(image.shape[0]))
    sm = cv.resize(image, (0,0), fx=newSize[0], fy=newSize[1])
    for j in range(0, sm.shape[1]):
        for i in range(0, sm.shape[0]):
            dmap = sm[:,:,:]-np.array([np.tile(sm[j,i,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
    return dmap
The function getDMap(image, mapSize) expects an OpenCV2 HSV image as its image argument, which is a numpy array with 3 dimensions: [:,:,:]. It also expects a tuple with 2 elements as its mapSize argument; the caller must take into account that in numpy arrays the rows and columns are swapped (not x, y but y, x).
newSize then contains a tuple of fractions that are used to resize the input image to a specific scale, and sm becomes a resized version of the input image. This all works fine.
This is my goal:
The following line:
np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))]),
should function equivalent to the MATLAB expression:
repmat(sm(j,i,:),[size(sm,1) size(sm,2)]),
This is my problem:
To test this, an OpenCV2 image with dimensions 800x479x3 is passed as the image argument, and the tuple (64, 48) is passed as the mapSize argument.
However, I get the following ValueError:
dmap = sm[:,:,:]-np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
ValueError: operands could not be broadcast together with shapes (48,64,3) (64,64,192)
So it seems that the array dimensions do not match and numpy has a problem with that. But what exactly is going wrong, and how do I get this working?
These 2 calculations match:
octave:26> sm=reshape(1:12,2,2,3)
octave:27> x=repmat(sm(1,2,:),[size(sm,1) size(sm,2)])
octave:28> x(:,:,2)
7 7
7 7
In [45]: sm=np.arange(1,13).reshape(2,2,3,order='F')
In [46]: x=np.tile(sm[0,1,:],[sm.shape[0],sm.shape[1],1])
In [47]: x[:,:,1]
Out[47]:
array([[7, 7],
[7, 7]])
This runs:
sm[:,:,:]-np.array([np.tile(sm[0,1,:], (2,2,1)) for k in xrange(3)])
But it produces a (3,2,2,3) array, with replication on the 1st dimension. I don't think you want that k loop.
What's the intent of this loop structure?
for i in ...:
    for j in ...:
        data = ...
You'll only get results from the last iteration. Did you want data += ...? If so, this might work (for an (N,M,L) shaped sm):
np.sum(np.array([sm-np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)]),axis=0)
z = np.array([np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)])
np.sum(sm - z, axis=0) # let numpy broadcast sm
Actually I don't even need the tile. Let broadcasting do the work:
np.sum(np.array([sm-sm[i,j,:] for i in xrange(N) for j in xrange(M)]),axis=0)
I can get rid of the loops with repeat.
sm1 = sm.reshape(N*M,L) # combine 1st 2 dim to simplify repeat
z1 = np.repeat(sm1, N*M, axis=0).reshape(N*M,N*M,L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N,M,L)
I can also apply broadcasting to the last case
x4 = np.sum(sm1-sm1[:,None,:], 0).reshape(N,M,L)
# = np.sum(sm1[None,:,:]-sm1[:,None,:], 0).reshape(N,M,L)
With sm I have to expand (and sum) 2 dimensions:
x5 = np.sum(np.sum(sm[None,:,None,:,:]-sm[:,None,:,None,:],0),1)
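The variants above can be checked against each other on a small array. The sketch below uses Python 3 (range instead of xrange) and arbitrary sizes N, M, L chosen only for the test; x_loop is a hypothetical name for the explicit-loop reference.

```python
import numpy as np

N, M, L = 2, 3, 4
sm = np.arange(N * M * L, dtype=float).reshape(N, M, L)

# Reference: explicit loop over every pixel (i, j), relying on broadcasting
x_loop = np.sum(np.array([sm - sm[i, j, :]
                          for i in range(N) for j in range(M)]), axis=0)

# Flatten the first two dimensions, then use repeat
sm1 = sm.reshape(N * M, L)
z1 = np.repeat(sm1, N * M, axis=0).reshape(N * M, N * M, L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N, M, L)

# Broadcasting on the flattened array
x4 = np.sum(sm1 - sm1[:, None, :], axis=0).reshape(N, M, L)

# Broadcasting on the original 3D array, summing over two expanded axes
x5 = np.sum(np.sum(sm[None, :, None, :, :] - sm[:, None, :, None, :], 0), 1)

print(np.allclose(x_loop, x1), np.allclose(x_loop, x4), np.allclose(x_loop, x5))
```

Note that the broadcasting versions build an intermediate array with N*M*N*M*L elements, so for large images memory use may matter more than the loop overhead.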
len(sm[0]) and len(sm[1]) are not the sizes of the first and second dimensions of sm. They are the lengths of the first and second rows of sm, and should both return the same value. You probably want to replace them with sm.shape[0] and sm.shape[1], which are equivalent to your MATLAB code, although I am not sure that it will work as you expect it to.
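The shapes in the error message can be reproduced with a dummy array of the same size as the resized image. This is a sketch in Python 3 (range instead of xrange); the zeros array stands in for the real sm, and the names bad and good are only for illustration.

```python
import numpy as np

sm = np.zeros((48, 64, 3))  # stand-in for the resized image

# len(sm[0]) is the length of the first row, i.e. sm.shape[1], not sm.shape[0]
print(len(sm[0]), len(sm[1]), len(sm[2]))

# The expression from the question: a 2-element reps tuple tiles the
# length-3 pixel into a 2D block, and the k loop stacks copies of it.
bad = np.array([np.tile(sm[0, 0, :], (len(sm[0]), len(sm[1])))
                for k in range(len(sm[2]))])
print(bad.shape)

# Using the true dimension sizes and a trailing 1 keeps the channel axis intact
good = np.tile(sm[0, 0, :], (sm.shape[0], sm.shape[1], 1))
print(good.shape)
```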

Resources