How to compute the integral of an array

I am wondering if there is a way to perform an integral on arrays in R. I have two arrays, S and R, which I want to integrate over pressure level (P) from 1000 to 850. How can I do this?
The S, R and P data are:
S<-structure(c(0.0011979939772106, 0.0011979939772106, 0.0011979939772106,
0.00122851820731487, 0.00122654890214685, 0.00122457959697883,
0.00124164690843498, 0.00123705186304294, 0.0012324568176509,
0.00133617355649982, 0.00133617355649982, 0.00133617355649982,
0.00138048292278021, 0.00137752896502818, 0.00137457500727616,
0.00140575567243643, 0.00139951953940438, 0.00139328340637232,
0.00139820666929237, 0.00139820666929237, 0.00139820666929237,
0.00151308280409338, 0.00150192340814128, 0.00149076401218919,
0.00155575108273376, 0.00154426346925366, 0.00153277585577356
), .Dim = c(3L, 3L, 3L))
R<-structure(c(-15.1752538162522, -15.1929331135921, -15.2092524649828,
-16.2142525214608, -16.2400914944961, -16.2604906837345, -17.2355719293295,
-17.2641307942633, -17.2858899294509, -13.3842050011216, -13.4059641363092,
-13.4250033795984, -14.3266475439352, -14.3361671655798, -14.3402470034274,
-15.3466070058547, -15.3398072761085, -15.3262078166163, -10.7132711568418,
-10.7350302920294, -10.7554294812678, -11.8379464568517, -11.8066677000195,
-11.7726690512888, -13.8003484615847, -13.7187517046312, -13.6317151638807
), .Dim = c(3L, 3L, 3L))
P<-c(1000,950,900,850)
I tried the following, but could not figure out how to perform the integral on an array:
f <- function(x) {x}
inr <- integrate(f, 1000, 850) # where f would be a function

I am not entirely sure what you are asking, but there are two common answers when dealing with arrays and integration. The first is a vectorization issue, which can be dealt with as follows:
IntFunc <- function(x, y) {
    sum(x - y)
}
IntFunc(1:5,c(0,0))
Warning message:
In x - y :
longer object length is not a multiple of shorter object length
integrate(Vectorize(IntFunc,vectorize.args = 'x'), upper = 1000, lower = 850, y = R)
3803862 with absolute error < 4.2e-08
Similar questions have been answered elsewhere on Stack Overflow:
How to pass vector to integrate function
R, Integrate at each point of array
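Note that integrate() expects a function, while S and R here are discrete samples at fixed pressure levels, so numerical quadrature on the samples (the trapezoidal rule; in R, for example, pracma::trapz) is the more natural tool. Below is a minimal sketch of the idea in Python/numpy for concreteness, assuming a hypothetical field sampled along a pressure axis of the same length as P:
import numpy as np

P = np.array([1000.0, 950.0, 900.0, 850.0])
S = np.random.rand(4, 3, 3)  # hypothetical field with a leading pressure axis

# Trapezoidal rule along the pressure axis (axis 0); np.trapz follows the
# order of P, so this is the integral from 1000 down to 850, shape (3, 3).
integral = np.trapz(S, x=P, axis=0)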

Related

Indexing elements in rows of multidimensional array at indexes given by another array

I have a question which should have a simple solution, but I have not found a nice way to deal with it using the standard indexing methods of numpy.
Suppose I have an array A with a general shape (a1, a2, ..., an, b), and a second array of indices I of shape (a1, a2, ..., an) whose entries are integers in 0, ..., b-1. What I would like to do is to use I to select entries along the last axis of A, returning an array F of shape (a1, a2, ..., an) such that
F[i1, i2, ..., in] = A[i1, i2, ..., in, I[i1, i2, ..., in]]
For a single leading axis (A of shape (a1, b)), a simple solution is
F = A[np.arange(a1), I]
where a1 = A.shape[0]. But for the general case I have not found a simple solution. What would you suggest?
A way to generalise this is to employ the general relation that holds for advanced indexing:
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
..., ind_N[i_1, ..., i_M]]
as given in the NumPy indexing documentation.
To use this for n = 3, for instance, we need
a0,a1,a2 = A.shape
F = A[np.reshape(np.arange(a0), (a0,1)), np.reshape(np.arange(a1), (1, a1)), I]
For a general n this can be written as follows:
sha = A.shape[:-1]
# shape template [-1, 1, ..., 1]; rolling it moves the -1 into position i
indones = [-1] + [1]*(len(sha) - 1)
ind = [np.reshape(np.arange(s), np.roll(indones, i)) for i, s in enumerate(sha)]
F = A[tuple(ind + [I])]
Not sure if this is the most efficient (and most pythonic) way, but it works.
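For what it is worth, newer NumPy versions (1.15+) provide np.take_along_axis, which expresses exactly this operation; a minimal sketch on random data:
import numpy as np

A = np.random.rand(2, 3, 4, 5)               # shape (a1, a2, a3, b)
I = np.random.randint(0, 5, size=(2, 3, 4))  # indices into the last axis

# take_along_axis needs the index array to match A's ndim, so add a
# trailing length-1 axis and drop it again afterwards
F = np.take_along_axis(A, I[..., np.newaxis], axis=-1)[..., 0]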

Scipy Curve Fit: "Result from function call is not a proper array of floats."

I am trying to fit a 2D Gaussian with an offset to a 2D array. The code is based on this thread here (which was written for Python 2 while I am using Python 3, so some changes were necessary to make it run at all):
import numpy as np
import scipy.optimize as opt
n_pixels = 2400
def twoD_Gaussian(data_list, amplitude, xo, yo, sigma_x, sigma_y, offset):
    x = data_list[0]
    y = data_list[1]
    theta = 0  # don't care about theta for the moment but want to leave the option in
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    g = offset + amplitude*np.exp(-(a*((x-xo)**2) + 2*b*(x-xo)*(y-yo) + c*((y-yo)**2)))
    return g
x = np.linspace(1, n_pixels, n_pixels) #starting with 1 because proper data is from a fits file
y = np.linspace(1, n_pixels, n_pixels)
x, y = np.meshgrid(x,y)
amp = -3
x0, y0 = n_pixels/2, n_pixels/2
sigma_x, sigma_y = 100, 100
offset = -1
initial_guess = np.asarray([amp, x0, y0, sigma_x, sigma_y, offset])
data_array = np.asarray([x, y])
testmap = twoD_Gaussian(data_array, initial_guess[0], initial_guess[1], initial_guess[2], initial_guess[3], initial_guess[4], initial_guess[5])
popt, pcov = opt.curve_fit(twoD_Gaussian, data_array, testmap, p0=initial_guess)
However, I first get a value error:
ValueError: object too deep for desired array
Which the traceback then traces to:
error: Result from function call is not a proper array of floats.
From what I understood in other threads about this error, it has to do with some part of the argument not being a proper array, e.g. being a symbolic object instead. I do not understand this, since the output testmap (which works as expected) is actually a numpy array, and all inputs to curve_fit are either numpy arrays or the function itself. What is the exact issue, and how can I solve it?
Edit: the full error when I run it from the console is:
ValueError: object too deep for desired array
Traceback (most recent call last):
File "fit-2dgauss.py", line 41, in <module>
popt, pcov = opt.curve_fit(twoD_Gaussian, data_array, test, p0=initial_guess)
File "/users/drhiem/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 784, in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File "/users/drhiem/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py", line 423, in leastsq
gtol, maxfev, epsfcn, factor, diag)
minpack.error: Result from function call is not a proper array of floats.
I just noticed that instead of "error", it's now "minpack.error". I ran this in an ipython console environment beforehand for testing purposes, so maybe that difference is down to that, not sure how much this difference matters.
data_array is (2, 2400, 2400) float64 (from an added print)
testmap is (2400, 2400) float64 (again a diagnostic print)
The curve_fit docs talk about length-M or (k, M) arrays, but you are providing (2, N, N) and (N, N) shaped arrays. Let's try flattening the (N, N) dimensions.
In the objective function:
def twoD_Gaussian(data_list, amplitude, xo, yo, sigma_x, sigma_y, offset):
    x = data_list[0]
    y = data_list[1]
    x = x.reshape(2400, 2400)
    y = y.reshape(2400, 2400)
    theta = 0  # don't care about theta for the moment but want to leave the option in
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    g = offset + amplitude*np.exp(-(a*((x-xo)**2) + 2*b*(x-xo)*(y-yo) + c*((y-yo)**2)))
    return g.ravel()
and in the calls:
testmap = twoD_Gaussian(data_array.reshape(2,-1), initial_guess[0], initial_guess[1], initial_guess[2], initial_guess[3], initial_guess[4], initial_guess[5])
# shape (5760000,) float64
print(type(testmap),testmap.shape, testmap.dtype)
popt, pcov = opt.curve_fit(twoD_Gaussian, data_array.reshape(2,-1), testmap, p0=initial_guess)
And it runs:
1624:~/mypy$ python3 stack65587542.py
(2, 2400, 2400) float64
<class 'numpy.ndarray'> (5760000,) float64
popt and pcov:
[-3.0e+00 1.2e+03 1.2e+03 1.0e+02 1.0e+02 -1.0e+00]
[[ 0. -0. -0. 0. 0. -0.]
[-0. 0. -0. -0. -0. -0.]
[-0. -0. 0. -0. -0. -0.]
[ 0. -0. -0. 0. 0. 0.]
[ 0. -0. -0. 0. 0. 0.]
[-0. -0. -0. 0. 0. 0.]]
The popt values are the same as initial_guess, as expected when fitting an exact testmap.
So the basic issue is that you did not take the documented specifications seriously. That
ValueError: object too deep for desired array
error message is a bit obscure, though I vaguely recall seeing it before. Sometimes we get errors like this when the inputs are ragged arrays and the resulting array has object dtype, but here it is simply a matter of shape.
Past SO questions with a similar problem and fix:
Scipy curve_fit for Two Dimensions Not Working - Object Too Deep?
ValueError When Performing scipy.stats test on Pandas Column Selection by Row
Fitting a 2D Gaussian function using scipy.optimize.curve_fit - ValueError and minpack.error
This is just a subset of the SO questions with the same error message; other scipy functions produce it as well, and often the problem is a shape like (m, 1) instead of (N, N). I'd be tempted to close this as a duplicate, but my long answer with debugging details may be instructive.
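To make the documented convention concrete, here is a minimal, self-contained sketch (with made-up data, not the poster's) of fitting a 2-D surface: xdata stacked as (k, M) and ydata flattened to (M,):
import numpy as np
import scipy.optimize as opt

def plane(xy, a, b, c):
    # xy has shape (2, M): row 0 holds the x values, row 1 the y values
    return a * xy[0] + b * xy[1] + c  # shape (M,)

x, y = np.meshgrid(np.arange(50.0), np.arange(50.0))
z = 2.0 * x - 3.0 * y + 1.0
xdata = np.vstack((x.ravel(), y.ravel()))            # shape (2, M)
popt, pcov = opt.curve_fit(plane, xdata, z.ravel())  # ydata must be (M,)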

Fractal dimension algorithms give results of >2 for time series

I'm trying to compute the fractal dimension of a very specific time-series array.
I've found an implementation of the Higuchi FD algorithm:
import numpy as np

def hFD(a, k_max):  # Higuchi FD
    L = []
    x = []
    N = len(a)
    for k in range(1, k_max):
        Lk = 0
        for m in range(0, k):
            # we pregenerate all idxs
            idxs = np.arange(1, int(np.floor((N-m)/k)), dtype=np.int32)
            Lmk = np.sum(np.abs(a[m+idxs*k] - a[m+k*(idxs-1)]))
            Lmk = (Lmk*(N - 1)/(((N - m)/k)*k)) / k
            Lk += Lmk
        L.append(np.log(Lk/(m+1)))
        x.append([np.log(1.0/k), 1])
    (p, r1, r2, s) = np.linalg.lstsq(x, L)
    return p[0]
from https://github.com/gilestrolab/pyrem/blob/master/src/pyrem/univariate.py
and the Katz FD algorithm:
def katz(data):
    n = len(data) - 1
    L = np.hypot(np.diff(data), 1).sum()  # sum of distances
    d = np.hypot(data - data[0], np.arange(len(data))).max()  # furthest distance from first point
    return np.log10(n) / (np.log10(d/L) + np.log10(n))
from https://github.com/ProjectBrain/brainbits/blob/master/katz.py
I expect results of ~1.5 in both cases, but get 2.2 and 4 instead:
hFD(x, 4) = 2.23965648024 (the k value here is chosen as an example; the result won't change much in the range 4-12. Edit: I was able to get a result of ~1.9 with k=22, but this still does not make any sense.)
katz(x) = 4.03911343057
This should in theory not be possible for a 1-D time-series array.
My questions are: are the Higuchi and Katz algorithms not suitable for time-series analysis in general, or am I doing something wrong on my side? Also, are there any other Python libraries with already implemented and error-free algorithms to verify my results against?
My array of interest (each element represents a point in time t, t+1, t+2, ..., t+N):
x = np.array([373.4413096546802, 418.58026161917803,
395.7387698762124, 416.21163042783206,
407.9812265426947, 430.2355284504048,
389.66095393296763, 442.18969320408166,
383.7448638776275, 452.8931822090381,
413.5696828065546, 434.45932712853585,
429.95212301648996, 436.67612861616215,
431.10235365546964, 418.86935850068545,
410.84902747247423, 444.4188867775925,
397.1576881118471, 451.6129904245434,
440.9181246439599, 438.9857353268666,
437.1800408012741, 460.6251405281339,
404.3208481355302, 500.0432305427639,
380.49579242696177, 467.72953450552893,
333.11328535523967, 444.1171938340972,
303.3024198243042, 453.16332062153276,
356.9697406524534, 520.0720647379901,
402.7949987727925, 536.0721418821788,
448.21609036718445, 521.9137447208354,
470.5822486372967, 534.0572029633416,
480.03741443274765, 549.2104258193126,
460.0853321729541, 561.2705350421926,
444.52689144575794, 560.0835589548401,
462.2154563472787, 559.7166600213686,
453.42374550322353, 559.0591804941763,
421.4899935529862, 540.7970410737004,
454.34364779193913, 531.6018122709779,
437.1545739076901, 522.4262260216169,
444.6017030695873, 533.3991716674865,
458.3492761150962, 513.1735160522104])
The array you are trying to estimate the Higuchi FD of is too short. You need to get a longer sample, or oversample the current one, to have at least 128 points for hFD and more than 4000 points for Katz:
import scipy.signal as signal
...
x_res=signal.resample(x,128)
hFD(x_res, 4) will be 1.74383694265
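As an additional, hypothetical sanity check (not from the original answer), signals with a known fractal dimension make a quick test of the implementation: a smooth curve should come out near 1 and white noise near 2.
import numpy as np

t = np.linspace(0, 10 * np.pi, 4096)
print(hFD(np.sin(t), 8))              # expect a value close to 1
print(hFD(np.random.randn(4096), 8))  # expect a value close to 2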

numpy binned mean, conserving extra axes

It seems I am stuck on the following problem with numpy.
I have an array X with shape X.shape = (nexp, ntime, ndim, npart).
I need to compute binned statistics on this array along the npart dimension, according to the values in binvals (and some bins), while keeping all the other dimensions, because I then have to use the binned statistic to remove a bias from the original array X. The binning values have shape binvals.shape = (nexp, ntime, npart).
Here is a complete, minimal example of what I am trying to do. Note that, in reality, I am working on large arrays with several hundred bins, so this implementation takes forever:
import numpy as np
np.random.seed(12345)
X = np.random.randn(24).reshape(1, 2, 3, 4)
binvals = np.random.randn(8).reshape(1, 2, 4)
bins = [-np.inf, 0, np.inf]
nexp, ntime, ndim, npart = X.shape
cleanX = np.zeros_like(X)
for ne in range(nexp):
    for nt in range(ntime):
        indices = np.digitize(binvals[ne, nt, :], bins)
        for nd in range(ndim):
            for nb in range(1, len(bins)):
                inds = indices == nb
                cleanX[ne, nt, nd, inds] = X[ne, nt, nd, inds] - \
                    np.mean(X[ne, nt, nd, inds], axis=-1)
Looking at the results may make this clearer:
In [8]: X
Out[8]:
array([[[[-0.20470766, 0.47894334, -0.51943872, -0.5557303 ],
[ 1.96578057, 1.39340583, 0.09290788, 0.28174615],
[ 0.76902257, 1.24643474, 1.00718936, -1.29622111]],
[[ 0.27499163, 0.22891288, 1.35291684, 0.88642934],
[-2.00163731, -0.37184254, 1.66902531, -0.43856974],
[-0.53974145, 0.47698501, 3.24894392, -1.02122752]]]])
In [10]: cleanX
Out[10]:
array([[[[ 0. , 0.67768523, -0.32069682, -0.35698841],
[ 0. , 0.80405255, -0.49644541, -0.30760713],
[ 0. , 0.92730041, 0.68805503, -1.61535544]],
[[ 0.02303938, -0.02303938, 0.23324375, -0.23324375],
[-0.81489739, 0.81489739, 1.05379752, -1.05379752],
[-0.50836323, 0.50836323, 2.13508572, -2.13508572]]]])
In [12]: binvals
Out[12]:
array([[[ -5.77087303e-01, 1.24121276e-01, 3.02613562e-01,
5.23772068e-01],
[ 9.40277775e-04, 1.34380979e+00, -7.13543985e-01,
-8.31153539e-01]]])
Is there a vectorized solution? I thought of using scipy.stats.binned_statistic, but I have been unable to figure out how to use it for this purpose. Thanks!
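For reference, scipy.stats.binned_statistic does compute per-bin means and bin membership in one call, but only over a single sample axis, which is why it does not directly handle the extra (nexp, ntime, ndim) axes here; a tiny sketch of what it gives you:
import numpy as np
from scipy import stats

vals = np.array([1.0, 2.0, 3.0, 4.0])
pos = np.array([0.1, 0.6, 0.2, 0.9])
# stat -> per-bin means [2., 3.]; binnum -> bin of each sample [1, 2, 1, 2]
stat, edges, binnum = stats.binned_statistic(pos, vals, statistic='mean', bins=[0, 0.5, 1])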
import numpy as np
np.random.seed(100)
nexp = 3
ntime = 4
ndim = 5
npart = 100
nbins = 4
binvals = np.random.rand(nexp, ntime, npart)
X = np.random.rand(nexp, ntime, ndim, npart)
bins = np.linspace(0, 1, nbins + 1)
# bin index per particle, with a broadcastable singleton ndim axis
d = np.digitize(binvals, bins)[:, :, np.newaxis, :]
# one boolean mask per bin, stacked along a new leading axis
r = np.arange(1, len(bins)).reshape((-1, 1, 1, 1, 1))
m = d[np.newaxis, ...] == r
# per-bin counts and means over the npart axis (clip avoids division by zero)
counts = np.sum(m, axis=-1, keepdims=True).clip(min=1)
means = np.sum(X[np.newaxis, ...] * m, axis=-1, keepdims=True) / counts
# pick, for every element, the mean of the bin it belongs to
cleanX = X - np.choose(d - 1, means)
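As a quick consistency check (a sketch added here, reusing the variables defined above, rather than part of the original answer), the explicit loop can be rerun on the same data and compared:
cleanX_loop = np.zeros_like(X)
for ne in range(nexp):
    for nt in range(ntime):
        idx = np.digitize(binvals[ne, nt], bins)
        for nd in range(ndim):
            for nb in range(1, len(bins)):
                sel = idx == nb
                cleanX_loop[ne, nt, nd, sel] = X[ne, nt, nd, sel] - X[ne, nt, nd, sel].mean()
print(np.allclose(cleanX, cleanX_loop))  # expect True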
OK, I think I got it, mainly based on the answer by @jdehesa.
clean2 = np.zeros_like(X)
d = np.digitize(binvals, bins)
for i in range(1, len(bins)):
    m = d == i
    minds = np.where(m)
    sl = (*minds[:2], slice(None), minds[2])  # a tuple; indexing with a list of mixed indices is no longer allowed
    msum = m.sum(axis=-1)
    clean2[sl] = (X -
                  (np.sum(X * m[..., np.newaxis, :], axis=-1) /
                   msum[..., np.newaxis])[..., np.newaxis])[sl]
This gives the same results as my original code.
On the small arrays in the example here, this solution is approximately three times as fast as the original code, and I expect it to be much faster on larger arrays.
Update:
Indeed it is faster on larger arrays (I did not do any formal test), but despite this it only just reaches an acceptable level of performance; any further suggestions on extra vectorizations would be very welcome.
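One such option, sketched under the assumption that the arrays from the answer above (X, binvals, bins, and the size variables) are in scope: np.bincount can accumulate all per-cell, per-bin sums at once through a combined group id, leaving only a short loop over ndim:
# combined group id: which (nexp, ntime) cell and which bin each sample is in
d0 = np.digitize(binvals, bins) - 1                    # (nexp, ntime, npart), 0-based
cell = np.arange(nexp * ntime).reshape(nexp, ntime, 1)
g = (cell * nbins + d0).ravel()                        # flat group id per sample
ngroups = nexp * ntime * nbins
counts_g = np.bincount(g, minlength=ngroups).clip(min=1)
clean3 = np.empty_like(X)
for nd in range(ndim):                                 # loop only over the short ndim axis
    v = X[:, :, nd, :].ravel()
    sums = np.bincount(g, weights=v, minlength=ngroups)
    clean3[:, :, nd, :] = (v - (sums / counts_g)[g]).reshape(nexp, ntime, npart)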

Nested array slicing

Let's say I have an array of vectors:
""" simple line equation """
function getline(a::Array{Float64,1},b::Array{Float64,1})
line = Vector[]
for i=0:0.1:1
vector = (1-i)a+(i*b)
push!(line, vector)
end
return line
end
This function returns an array of vectors containing x-y positions:
Vector[11]
> Float64[2]
> Float64[2]
> Float64[2]
> Float64[2]
...
Now I want to separate all the x and y coordinates of these vectors to plot them with PlotlyJS.
I have already tested some approaches without success.
What is the correct way in Julia to achieve this?
You can broadcast getindex:
xs = getindex.(vv, 1)
ys = getindex.(vv, 2)
Edit 3:
Alternatively, use list comprehensions:
xs = [v[1] for v in vv]
ys = [v[2] for v in vv]
Edit:
For performance reasons, you should use StaticArrays to represent 2D points, e.g.:
using StaticArrays

getline(a, b) = [(1-i)a + (i*b) for i = 0:0.1:1]
p1 = SVector(1., 2.)
p2 = SVector(3., 4.)
vv = getline(p1, p2)
Broadcasting getindex and list comprehensions will still work, but you can also reinterpret the vector as a 2×11 matrix:
to_matrix{T<:SVector}(a::Vector{T}) = reinterpret(eltype(T), a, (size(T,1), length(a)))
m = to_matrix(vv)
Note that this does not copy the data. You can simply use m directly or define, e.g.,
xs = @view m[1,:]
ys = @view m[2,:]
Edit 2:
By the way, not restricting the argument types of the getline function has many advantages and is preferred in general. The version above will work for any type that implements multiplication by a scalar and addition, e.g. a possible implementation of immutable Point ... end (making it fully generic will require a bit more work, though).
