Numpy complaining about ambiguous array: ValueError: The truth value of an array

I have a minimal example in Python 3 that uses numpy and the function apply_along_axis. I cannot understand why I am getting this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Providing a direct formula inside the lambda works. As soon as I call another function, I get this error. Am I supposed to return something else?
The minimal code:
import numpy as np
def logn(x, b):
    return np.log(x)/np.log(b)
def h(x, b):
    if x == 0:
        return 0
    else:
        return -x*logn(x, b)
p = np.array([0.00000000e+00, 9.99997956e-01, 2.04440466e-06])
print(np.apply_along_axis(lambda _e: h(_e, 3), -1, p))

Look at what apply_along_axis passes to your function:
In [99]: def foo(x):
    ...:     print(x)
    ...:     return x
    ...:
In [100]: np.apply_along_axis(foo, -1, p)
[0.00000000e+00 9.99997956e-01 2.04440466e-06]
Out[100]: array([0.00000000e+00, 9.99997956e-01, 2.04440466e-06])
In the case of a 1d array, it passes the whole array at once. It does not iterate on that dimension. That's the whole purpose of apply_along_axis: to pass 1d arrays to your function. Your h then evaluates x == 0 on the whole array, which gives a boolean array, and the if cannot reduce that to a single True or False.
Judging from other SO questions, apply_along_axis is not very useful and often causes problems. It is not faster than a more explicit iteration. For 3d (or higher) arrays it can make the iteration (over the 'other' two axes) simpler (but again not faster).
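To make the failure concrete, here is a minimal sketch that records what the callback receives and then repeats the scalar test from h on that argument. The names record, seen_shapes, comparison and error_message are made up for this illustration.

```python
import numpy as np

p = np.array([0.00000000e+00, 9.99997956e-01, 2.04440466e-06])

# Record the shape of every argument apply_along_axis hands to the callback.
seen_shapes = []

def record(x):
    seen_shapes.append(np.shape(x))
    return x

np.apply_along_axis(record, -1, p)
print(seen_shapes)  # a single call that received the whole 1d array

# The scalar test used in h(), applied to that whole array:
comparison = (p == 0)  # elementwise result, one boolean per element
try:
    if comparison:  # an array with several elements has no single truth value
        pass
    error_message = None
except ValueError as err:
    error_message = str(err)

print(comparison)
print(error_message)
```

The if statement needs one boolean, but the comparison yields one per element, which is why numpy asks for any() or all().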
For the 1d p, this is simpler:
In [102]: [h(_e,3) for _e in p]
Out[102]: [0, 1.8605270777946112e-06, 2.4378506521338855e-05]
A non-iterative approach is to use a boolean mask to select which p are used in the calculation. That way you don't have to use a scalar if expression:
In [106]: mask = p!=0
In [107]: mask
Out[107]: array([False, True, True])
In [108]: p1 = p[mask]
In [109]: res = np.zeros(p.shape)
In [110]: res[mask] = -p1*logn(p1,3)
In [111]: res
Out[111]: array([0.00000000e+00, 1.86052708e-06, 2.43785065e-05])
Ufuncs like np.log take a where parameter, which can be used to bypass bad input values:
In [114]: -p * np.log(p, where=(p!=0), out=np.zeros(p.shape))/np.log(3)
Out[114]: array([-0.00000000e+00, 1.86052708e-06, 2.43785065e-05])
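The where/out idea can be wrapped in a small helper that accepts whole arrays. This is only a sketch: the name entropy_terms is hypothetical, and it assumes the inputs are non-negative floats as in the question.

```python
import numpy as np

def entropy_terms(p, base):
    """Return -p * log_base(p) elementwise, with 0 where p == 0."""
    p = np.asarray(p, dtype=float)
    logs = np.zeros_like(p)
    # Only evaluate the log where p is nonzero; other slots keep the 0 from `logs`.
    np.log(p, out=logs, where=(p != 0))
    return -p * logs / np.log(base)

p = np.array([0.00000000e+00, 9.99997956e-01, 2.04440466e-06])
vectorized = entropy_terms(p, 3)

# Scalar reference, equivalent to calling h(x, 3) on each element.
reference = np.array([0.0 if x == 0 else -x * np.log(x) / np.log(3) for x in p])
print(np.allclose(vectorized, reference))
```

Passing out together with where matters: without out, the masked positions would be left uninitialized.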

Related

Scipy Curve Fit: "Result from function call is not a proper array of floats."

I am trying to fit a 2D Gaussian with an offset to a 2D array. The code is based on this thread here (which was written for Python2 while I am using Python3, therefore some changes were necessary to make it run somewhat):
import numpy as np
import scipy.optimize as opt
n_pixels = 2400
def twoD_Gaussian(data_list, amplitude, xo, yo, sigma_x, sigma_y, offset):
    x = data_list[0]
    y = data_list[1]
    theta = 0 # don't care about theta for the moment but want to leave the option in
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    g = offset + amplitude*np.exp( - (a*((x-xo)**2) + 2*b*(x-xo)*(y-yo) + c*((y-yo)**2)))
    return g
x = np.linspace(1, n_pixels, n_pixels) #starting with 1 because proper data is from a fits file
y = np.linspace(1, n_pixels, n_pixels)
x, y = np.meshgrid(x,y)
amp = -3
x0, y0 = n_pixels/2, n_pixels/2
sigma_x, sigma_y = 100, 100
offset = -1
initial_guess = np.asarray([amp, x0, y0, sigma_x, sigma_y, offset])
data_array = np.asarray([x, y])
testmap = twoD_Gaussian(data_array, initial_guess[0], initial_guess[1], initial_guess[2], initial_guess[3], initial_guess[4], initial_guess[5])
popt, pcov = opt.curve_fit(twoD_Gaussian, data_array, testmap, p0=initial_guess)
However, I first get a value error:
ValueError: object too deep for desired array
Which the traceback then traces to:
error: Result from function call is not a proper array of floats.
From what I understood in other threads about this error, it has to do with some part of the argument not being properly defined as an array, but e.g. as a symbolic object. I do not understand this, since the output testmap (which works as expected) is actually a numpy array, and all input to curve_fit is also either a numpy array or the function itself. What is the exact issue and how can I solve it?
edit: the full error if I try to run it from console is:
ValueError: object too deep for desired array
Traceback (most recent call last):
File "fit-2dgauss.py", line 41, in <module>
popt, pcov = opt.curve_fit(twoD_Gaussian, data_array, test, p0=initial_guess)
File "/users/drhiem/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 784, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "/users/drhiem/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 423, in leastsq
gtol, maxfev, epsfcn, factor, diag)
minpack.error: Result from function call is not a proper array of floats.
I just noticed that instead of "error", it's now "minpack.error". I ran this in an ipython console environment beforehand for testing purposes, so maybe that difference is down to that, not sure how much this difference matters.
data_array is (2, 2400, 2400) float64 (from added print)
testmap is (2400, 2400) float64 (again a diagnostic print)
curve_fit docs talk about M length or (k,M) arrays.
You are providing (2,N,N) and (N,N) shape arrays.
Let's try flattening the N,N dimensions:
In the objective function:
def twoD_Gaussian(data_list, amplitude, xo, yo, sigma_x, sigma_y, offset):
    x = data_list[0]
    y = data_list[1]
    x = x.reshape(2400,2400)
    y = y.reshape(2400,2400)
    theta = 0 # don't care about theta for the moment but want to leave the option in
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    g = offset + amplitude*np.exp( - (a*((x-xo)**2) + 2*b*(x-xo)*(y-yo) + c*((y-yo)**2)))
    return g.ravel()
and in the calls:
testmap = twoD_Gaussian(data_array.reshape(2,-1), initial_guess[0], initial_guess[1], initial_guess[2], initial_guess[3], initial_guess[4], initial_guess[5])
# shape (5760000,) float64
print(type(testmap),testmap.shape, testmap.dtype)
popt, pcov = opt.curve_fit(twoD_Gaussian, data_array.reshape(2,-1), testmap, p0=initial_guess)
And it runs:
1624:~/mypy$ python3 stack65587542.py
(2, 2400, 2400) float64
<class 'numpy.ndarray'> (5760000,) float64
popt and pcov:
[-3.0e+00 1.2e+03 1.2e+03 1.0e+02 1.0e+02 -1.0e+00]
[[ 0. -0. -0. 0. 0. -0.]
[-0. 0. -0. -0. -0. -0.]
[-0. -0. 0. -0. -0. -0.]
[ 0. -0. -0. 0. 0. 0.]
[ 0. -0. -0. 0. 0. 0.]
[-0. -0. -0. 0. 0. 0.]]
The popt values are the same as initial_guess as expected with the exact testmap.
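For a quicker experiment, the same shape convention can be tried on a small grid. The sketch below is an illustration under assumptions, not the poster's code: gaussian_2d is a simplified model with theta fixed at 0, the 40x40 grid and parameter values are invented, and the starting guess is deliberately perturbed so the fit has something to do. Because the model is purely elementwise, it works on flattened x and y without reshaping inside the function.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(xy, amplitude, xo, yo, sigma_x, sigma_y, offset):
    # xy has shape (2, M): row 0 holds x values, row 1 holds y values.
    x, y = xy
    exponent = (x - xo) ** 2 / (2 * sigma_x ** 2) + (y - yo) ** 2 / (2 * sigma_y ** 2)
    return offset + amplitude * np.exp(-exponent)  # shape (M,)

n = 40  # small hypothetical grid so the fit runs quickly
axis = np.arange(1, n + 1, dtype=float)
x, y = np.meshgrid(axis, axis)

# Flatten the two (n, n) coordinate grids into one (2, n*n) array.
xy = np.stack([x.ravel(), y.ravel()])

true_params = np.array([-3.0, 20.0, 20.0, 5.0, 5.0, -1.0])
z = gaussian_2d(xy, *true_params)  # noise-free data, shape (n*n,)

p0 = [-2.5, 19.0, 21.0, 6.0, 4.0, -0.8]  # perturbed starting guess
popt, pcov = curve_fit(gaussian_2d, xy, z, p0=p0)

print(xy.shape, z.shape)
print(popt)

# The fitted surface can be put back on the grid for plotting.
fitted_image = gaussian_2d(xy, *popt).reshape(n, n)
```

The independent variable is (k, M) and the data is length M, which is the layout the curve_fit docs describe.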
So the basic issue is that you did not take the documented specifications seriously. That
ValueError: object too deep for desired array
error message is a bit obscure, though I vaguely recall seeing it before. Sometimes we get errors like this when the inputs are ragged arrays and the resulting array has object dtype. But here it's simply a matter of shape.
Past SO questions with a similar problem and fix:
Scipy curve_fit for Two Dimensions Not Working - Object Too Deep?
ValueError When Performing scipy.stats test on Pandas Column Selection by Row
Fitting a 2D Gaussian function using scipy.optimize.curve_fit - ValueError and minpack.error
This is just a subset of SO with the same error message. Other scipy functions produce it. And often the problem is with shapes like (m,1) instead of (N,N). I'd be tempted to close this as a duplicate, but my long answer with debugging details may be instructive.

How to get a sub-shape of an array in Python?

Not sure the title is correct, but I have an array with shape (84,84,3) and I need to get subset of this array with shape (84,84), excluding that third dimension.
How can I accomplish this with Python?
your_array[:,:,0]
This is called slicing. This particular example gets the first 'layer' of the array. This assumes your subshape is a single layer.
If you are using numpy arrays, using slices would be a standard way of doing it:
import numpy as np
n = 3 # or any other positive integer
a = np.empty((84, 84, n))
i = 0 # any index with 0 <= i < n
b = a[:, :, i]
print(b.shape)
I recommend you have a look at this.
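A short sketch of the options for dropping the third dimension, assuming the (84, 84, 3) shape from the question. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(84 * 84 * 3).reshape(84, 84, 3)

first_layer = a[:, :, 0]   # pick one layer by index
same_layer = a[..., 0]     # Ellipsis form of the same slice
collapsed = a.mean(axis=2) # alternative: combine the 3 layers into one value

# Basic slicing returns a view on the original data, not a copy.
is_view = np.shares_memory(first_layer, a)

print(first_layer.shape, same_layer.shape, collapsed.shape, is_view)
```

Whether to pick one layer or reduce over all three depends on what the third dimension holds, for example colour channels.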

Nested array slicing

Let's say I have an array of vectors:
""" simple line equation """
function getline(a::Array{Float64,1},b::Array{Float64,1})
    line = Vector[]
    for i=0:0.1:1
        vector = (1-i)a+(i*b)
        push!(line, vector)
    end
    return line
end
This function returns an array of vectors containing x-y positions
Vector[11]
> Float64[2]
> Float64[2]
> Float64[2]
> Float64[2]
.
.
.
Now I want to separate all x and y coordinates of these vectors to plot them with PlotlyJS.
I have already tested some approaches with no success!
What is a correct way in Julia to achieve this?
You can broadcast getindex:
xs = getindex.(vv, 1)
ys = getindex.(vv, 2)
Edit 3:
Alternatively, use list comprehensions:
xs = [v[1] for v in vv]
ys = [v[2] for v in vv]
Edit:
For performance reasons, you should use StaticArrays to represent 2D points. E.g.:
getline(a,b) = [(1-i)a+(i*b) for i=0:0.1:1]
p1 = SVector(1.,2.)
p2 = SVector(3.,4.)
vv = getline(p1,p2)
Broadcasting getindex and list comprehensions will still work, but you can also reinterpret the vector as a 2×11 matrix:
to_matrix{T<:SVector}(a::Vector{T}) = reinterpret(eltype(T), a, (size(T,1), length(a)))
m = to_matrix(vv)
Note that this does not copy the data. You can simply use m directly or define, e.g.,
xs = @view m[1,:]
ys = @view m[2,:]
Edit 2:
Btw., not restricting the type of the arguments of the getline function has many advantages and is preferred in general. The version above will work for any type that implements multiplication with a scalar and addition, e.g., a possible implementation of immutable Point ... end (making it fully generic will require a bit more work, though).

Looping through slices of Theano tensor

I have two 2D Theano tensors, call them x_1 and x_2, and suppose for the sake of example, both x_1 and x_2 have shape (1, 50). Now, to compute their mean squared error, I simply run:
T.sqr(x_1 - x_2).mean(axis = -1).
However, what I wanted to do was construct a new tensor that consists of their mean squared error in chunks of 10. In other words, since I'm more familiar with NumPy, what I had in mind was to create the following tensor M in Theano:
M = [theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1) for i in xrange(0, 50, 10)]
Now, since Theano doesn't have for loops, but instead uses scan (which map is a special case of), I thought I would try the following:
sequence = T.arange(0, 50, 10)
M = theano.map(lambda i: theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1), sequence)
However, this does not seem to work, as I get the error:
only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Is there a way to loop through the slices using theano.scan (or map)? Thanks in advance, as I'm new to Theano!
Similar to what can be done in numpy, a solution would be to reshape your (1, 50) tensor of squared differences to a (1, 5, 10) tensor (or even a (5, 10) tensor), so that each chunk of 10 consecutive values lies along the last axis, and then compute the mean along that axis.
To illustrate this with numpy, suppose I want to compute means by slices of 2:
x = np.array([0, 2, 0, 4, 0, 6])
x = x.reshape([3, 2])
np.mean(x, axis=1)
outputs
array([ 1., 2., 3.])
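Applied to the shapes in the question, the reshape trick can be checked against the explicit loop. This is a NumPy-only sketch with random stand-in data; the same reshape-then-mean idea should carry over to Theano tensors, but that part is an assumption and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
x_1 = rng.normal(size=(1, 50))
x_2 = rng.normal(size=(1, 50))

squared_error = (x_1 - x_2) ** 2  # shape (1, 50)

# Put each chunk of 10 consecutive values along the last axis, then average it.
chunked = squared_error.reshape(1, 5, 10).mean(axis=-1)  # shape (1, 5)

# Reference: the explicit loop over slices from the question.
looped = np.stack(
    [squared_error[:, i:i + 10].mean(axis=-1) for i in range(0, 50, 10)],
    axis=1,
)

print(chunked.shape, np.allclose(chunked, looped))
```

The chunk size has to sit on the trailing axis of the reshape; putting it elsewhere would group strided rather than consecutive elements.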

Despite many examples online, I cannot get my MATLAB repmat equivalent working in python

I am trying to do some numpy matrix math because I need to replicate the repmat function from MATLAB. I know there are a thousand examples online, but I cannot seem to get any of them working.
The following is the code I am trying to run:
def getDMap(image, mapSize):
    newSize = (float(mapSize[0]) / float(image.shape[1]), float(mapSize[1]) / float(image.shape[0]))
    sm = cv.resize(image, (0,0), fx=newSize[0], fy=newSize[1])
    for j in range(0, sm.shape[1]):
        for i in range(0, sm.shape[0]):
            dmap = sm[:,:,:]-np.array([np.tile(sm[j,i,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
    return dmap
The function getDMap(image, mapSize) expects an OpenCV2 HSV image as its image argument, which is a numpy array with 3 dimensions: [:,:,:]. It also expects a tuple with 2 elements as its mapSize argument; the caller must take into account that in numpy arrays the rows and columns are swapped (not x, y but y, x).
newSize then contains a tuple of fractions that are used to resize the input image to a specific scale, and sm becomes a resized version of the input image. This all works fine.
This is my goal:
The following line:
np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))]),
should function equivalent to the MATLAB expression:
repmat(sm(j,i,:),[size(sm,1) size(sm,2)]),
This is my problem:
To test this, an OpenCV2 image with dimensions 800x479x3 is passed as the image argument, and (64, 48) (a tuple) is passed as the mapSize argument.
However, I get the following ValueError:
dmap = sm[:,:,:]-np.array([np.tile(sm[i,j,:], (len(sm[0]),
len(sm[1]))) for k in xrange(len(sm[2]))])
ValueError: operands could not be broadcast together with
shapes (48,64,3) (64,64,192)
So it seems that the array dimensions do not match and numpy has a problem with that. But my question is what? And how do I get this working?
These 2 calculations match:
octave:26> sm=reshape(1:12,2,2,3)
octave:27> x=repmat(sm(1,2,:),[size(sm,1) size(sm,2)])
octave:28> x(:,:,2)
7 7
7 7
In [45]: sm=np.arange(1,13).reshape(2,2,3,order='F')
In [46]: x=np.tile(sm[0,1,:],[sm.shape[0],sm.shape[1],1])
In [47]: x[:,:,1]
Out[47]:
array([[7, 7],
[7, 7]])
This runs:
sm[:,:,:]-np.array([np.tile(sm[0,1,:], (2,2,1)) for k in xrange(3)])
But it produces a (3,2,2,3) array, with replication on the 1st dimension. I don't think you want that k loop.
What's the intent with this?
for i in ...:
    for j in ...:
        data = ...
You'll only get results from the last iteration. Did you want data += ...? If so, this might work (for a (N,M,K) shaped sm)
np.sum(np.array([sm-np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)]),axis=0)
z = np.array([np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)])
np.sum(sm - z, axis=0) # let numpy broadcast sm
Actually I don't even need the tile. Let broadcasting do the work:
np.sum(np.array([sm-sm[i,j,:] for i in xrange(N) for j in xrange(M)]),axis=0)
I can get rid of the loops with repeat.
sm1 = sm.reshape(N*M,L) # combine 1st 2 dim to simplify repeat
z1 = np.repeat(sm1, N*M, axis=0).reshape(N*M,N*M,L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N,M,L)
I can also apply broadcasting to the last case
x4 = np.sum(sm1-sm1[:,None,:], 0).reshape(N,M,L)
# = np.sum(sm1[None,:,:]-sm1[:,None,:], 0).reshape(N,M,L)
With sm I have to expand (and sum) 2 dimensions:
x5 = np.sum(np.sum(sm[None,:,None,:,:]-sm[:,None,:,None,:],0),1)
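As a sanity check, the looped and broadcast variants can be compared on a small array. This sketch uses Python 3 (range instead of xrange), an arbitrary (3, 4, 2) test array, and K for the size of the last axis; the closed_form line is an extra observation, not part of the original answer.

```python
import numpy as np

N, M, K = 3, 4, 2
sm = np.arange(N * M * K, dtype=float).reshape(N, M, K)

# Explicit loop over every pixel, letting broadcasting subtract sm[i, j, :].
looped = np.sum(
    np.array([sm - sm[i, j, :] for i in range(N) for j in range(M)]), axis=0
)

# Broadcasting after merging the first two axes.
sm1 = sm.reshape(N * M, K)
x4 = np.sum(sm1 - sm1[:, None, :], axis=0).reshape(N, M, K)

# Broadcasting on the original 3d array, summing over both added axes.
x5 = np.sum(np.sum(sm[None, :, None, :, :] - sm[:, None, :, None, :], 0), 1)

# Summing (sm - pixel) over all N*M pixels equals N*M*sm minus the pixel total.
closed_form = N * M * sm - sm.sum(axis=(0, 1))

print(np.allclose(looped, x4), np.allclose(looped, x5), np.allclose(looped, closed_form))
```

The broadcast versions build an intermediate of size (N*M)^2 * K, so for large images the closed form is much lighter on memory.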
len(sm[0]) and len(sm[1]) are not the sizes of the first and second dimensions of sm. They are the lengths of the first and second row of sm, and should both return the same value. You probably want to replace them with sm.shape[0] and sm.shape[1], which are equivalent to your MATLAB code, although I am not sure that it will work as you expect it to.
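The shapes in the error message can be reproduced with a dummy array. The sketch below assumes a zero-filled (48, 64, 3) stand-in for the resized image and uses Python 3's range; the names bad and good are illustrative.

```python
import numpy as np

sm = np.zeros((48, 64, 3))  # stand-in for the resized image

# len() of a row gives the size of the second axis, whichever row is taken.
row_lengths = (len(sm[0]), len(sm[1]), len(sm[2]))
print(row_lengths)

# The expression from the question: tiling a length-3 pixel with a 2-tuple
# of reps gives (64, 64*3), and the list comprehension stacks 64 of those.
bad = np.array(
    [np.tile(sm[0, 0, :], (len(sm[0]), len(sm[1]))) for k in range(len(sm[2]))]
)
print(bad.shape)

# Using the real axis sizes plus a trailing 1 keeps the channel axis intact.
good = np.tile(sm[0, 0, :], (sm.shape[0], sm.shape[1], 1))
print(good.shape)

# The subtraction now broadcasts; plain sm - sm[0, 0, :] gives the same result.
diff = sm - good
```

The trailing 1 in the reps tuple is what plays the role of MATLAB's repmat leaving the third dimension unrepeated.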
