Array subsetting in Julia

With the Julia Language, I defined a function to sample points uniformly inside the sphere of radius 3.14 using rejection sampling as follows:
function spherical_sample(N::Int64)
    # generate N points uniformly distributed inside the sphere
    # using rejection sampling:
    points = pi*(2*rand(5*N,3).-1.0)
    ind = sum(points.^2,dims=2) .<= pi^2
    ## ideally I wouldn't have to do this:
    ind_ = dropdims(ind,dims=2)
    return points[ind_,:][1:N,:]
end
I found a hack for subsetting arrays:
ind = sum(points.^2,dims=2) .<= pi^2
## ideally I wouldn't have to do this:
ind_ = dropdims(ind,dims=2)
But in principle, array indexing should be a one-liner. How could I do this better in Julia?

The problem is that you are creating a 2-dimensional index array (an N×1 BitMatrix) instead of a vector. You can avoid it by using eachrow:
ind = sum.(eachrow(points.^2)) .<= pi^2
So that your full answer would be:
function spherical_sample(N::Int64)
    points = pi*(2*rand(5*N,3).-1.0)
    ind = sum.(eachrow(points.^2)) .<= pi^2
    return points[ind,:][1:N,:]
end

Here is a one-liner:
points[(sum(points.^2,dims=2) .<= pi^2)[:],:][1:N, :]
Note that [:] flattens away the singleton dimension, so the resulting BitVector can be used for logical row indexing.
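If you prefer a named function over the [:] trick, vec does the same flattening, so an equivalent spelling would be (a small sketch reusing points and N from the question):
ind = vec(sum(points.^2, dims=2) .<= pi^2)  # a BitVector rather than a one-column BitMatrix
points[ind, :][1:N, :]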

This does not answer your question directly (you already have two suggestions), but rather hints at how you could implement the whole procedure differently if you want it to be efficient.
The first point is to avoid generating 5*N rows of data. Doing so is wasteful, and it still does not guarantee N valid samples: each row is accepted with probability equal to the ratio of the volume of the ball to the volume of the cube, (4/3)*pi*pi^3 / (2*pi)^3 = pi/6 ≈ 0.52, so it is possible that there will not be enough points to choose from and the [1:N, :] selection will throw an error.
Below is the code I would use that avoids this problem:
using Random # for rand!

function spherical_sample(N::Integer) # no need to require Int64 only here
    points = 2 .* pi .* rand(N, 3) .- pi # note that all operations are vectorized to avoid excessive allocations
    while N > 0 # we will run the code until we have N valid rows
        v = @view points[N, :] # use view to avoid allocating
        if sum(x -> x^2, v) <= pi^2 # sum accepts a transformation function as a first argument
            N -= 1 # row is valid - move to the previous one
        else
            rand!(v) # row is invalid - resample it in place
            @. v = 2 * pi * v - pi # again - do the computation in place via broadcasting
        end
    end
    return points
end
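A quick sanity check of this version might look like this (a sketch; it only assumes the function above has been defined):
pts = spherical_sample(1_000)
size(pts)                                             # (1000, 3)
all(sum(abs2, r) <= pi^2 for r in eachrow(pts))       # true - every row lies inside the radius-π ball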

This one is pretty fast, and uses StaticArrays. You can probably also implement something similar with ordinary tuples:
using StaticArrays
function sphsample(N)
    T = SVector{3, Float64}
    v = Vector{T}(undef, N)
    n = 1
    while n <= N
        p = rand(T) .- 0.5
        @inbounds v[n] = p .* 2π
        n += (sum(abs2, p) <= 0.25)
    end
    return v
end
On my laptop it is ~9x faster than the solution with views.
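If you need the samples in the same N×3 matrix layout as the other answers (for plotting, say), one way to convert the result is sketched below; reduce(hcat, ...) is just one option and it copies the data:
v = sphsample(1_000)                       # Vector of 3-element SVectors
pts = permutedims(reduce(hcat, v))         # 1000×3 Matrix{Float64}
all(p -> sum(abs2, p) <= pi^2, v)          # true - every sample is inside the radius-π ball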

Related

MethodError: no method matching -(::Int64, ::Array{Int64,1})

I tried playing around with this example in the Julia documentation. My attempt was to make the cell split into two parts, each with half of the amount of protein.
using OrdinaryDiffEq
const α = 0.3
function f(du,u,p,t)
    for i in 1:length(u)
        du[i] = α*u[i]/length(u)
    end
end
function condition(u,t,integrator) # Event when event_f(u,t) == 0
    1-maximum(u)
end
function affect!(integrator)
    u = integrator.u
    idxs = findall(x->x>=1-eps(eltype(u)),u)
    resize!(integrator,length(u)+length(idxs))
    u[idxs] ./ 2
    u[end-idxs:end] = 0.5
    nothing
end
callback = ContinuousCallback(condition,affect!)
u0 = [0.2]
tspan = (0.0,10.0)
prob = ODEProblem(f,u0,tspan)
sol = solve(prob,Tsit5(),callback=callback)
I get the error: MethodError: no method matching -(::Int64, ::Array{Int64,1}). I know there is a problem with idxs = findall(x->x>=1-eps(eltype(u)),u), and my attempt was to put a dot between the 1 and eps, but that didn't fix it. I am using Julia 1.1.1.
Running your code, the stacktrace points to the line
u[end-idxs:end] = 0.5
The issue here is that findall returns an array even when it only finds one element, e.g.
julia> findall(x -> x > 2, [1,2,3])
1-element Array{Int64,1}:
3
and you can't subtract an array from end in your indexing expression.
I don't understand enough about your code to figure out what idxs should be, but if you expect this to only return one element you could either use first(idxs) (or even only(idxs) in Julia 1.4), or replace findall with findfirst, which returns the index as an Integer (rather than an array).
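For comparison, a quick sketch of the difference at the REPL:
julia> findfirst(x -> x > 2, [1, 2, 3])       # an Int, or nothing if there is no match
3

julia> only(findall(x -> x > 2, [1, 2, 3]))   # unwraps a 1-element array (Julia 1.4+), errors otherwise
3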

Sorting Eigenvalues/Eigenvectors in Julia 1.0

This question isn't so much a need for a solution as a request to check whether my approach is natural to the Julia language (Julianic?), and if not, what a more natural implementation would be:
@doc """
function sorteigen!(evals::Array{Number,1},evecs::Array{Number,2})
Sort the eigenvalues and vectors.
"""
function sorteigen!(evals::Array{Number,1},evecs::Array{Number,2})
    n=size(evecs)[1];
    #Shallow copy and force local scope
    local sortedevals = copy(evals);
    local sortedevecs = copy(evecs);
    #Sort eigenvalue Array{Number,1}
    sortedindex = sortperm(evals);
    evals[:] = sortedevals[sortedindex];
    #Sort eigenvectors
    for i=1:n
        sortedevecs[:,i] = evecs[:,sortedindex[i]];
    end
    evecs[:,:] = sortedevecs[:,:]
end
I would make a non-mutating function in this case:
function sorteigen(evals::Vector{T},evecs::Matrix{T}) where {T<:Real}
    p = sortperm(evals)
    evals[p], evecs[:, p]
end
If you really need to save memory then you can do something like this that operates in-place:
function sorteigen!(evals::Vector{T},evecs::Matrix{T}) where {T<:Real}
    p = sortperm(evals)
    s = similar(evals)
    for i in axes(evecs, 1)
        for (j, pv) in enumerate(p)
            @inbounds s[pv] = evecs[i, j]
        end
        for j in eachindex(s)
            @inbounds evecs[i, j] = s[j]
        end
    end
    sort!(evals), evecs
end
It will be more memory efficient, but probably slower, because we operate row-wise so SIMD cannot be applied.
Also note that I use Real in the signature of the methods because a general Number does not have to have an order defined (complex numbers, in particular, do not).
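For example, a minimal sketch of how the non-mutating version would be called (the numbers are arbitrary; column i of evecs is the eigenvector for evals[i]):
evals = [3.0, 1.0, 2.0]
evecs = [1.0 0.0 0.0;
         0.0 1.0 0.0;
         0.0 0.0 1.0]
vals, vecs = sorteigen(evals, evecs)
vals   # [1.0, 2.0, 3.0]
vecs   # columns reordered accordingly: [0 0 1; 1 0 0; 0 1 0]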

Fractal dimension algorithms give results of >2 for time-series

I'm trying to compute the fractal dimension of a very specific time-series array.
I've found an implementation of the Higuchi FD algorithm:
import numpy as np

def hFD(a, k_max): #Higuchi FD
    L = []
    x = []
    N = len(a)
    for k in range(1,k_max):
        Lk = 0
        for m in range(0,k):
            #we pregenerate all idxs
            idxs = np.arange(1,int(np.floor((N-m)/k)),dtype=np.int32)
            Lmk = np.sum(np.abs(a[m+idxs*k] - a[m+k*(idxs-1)]))
            Lmk = (Lmk*(N - 1)/(((N - m)/ k)* k)) / k
            Lk += Lmk
        L.append(np.log(Lk/(m+1)))
        x.append([np.log(1.0/ k), 1])
    (p, r1, r2, s)=np.linalg.lstsq(x, L)
    return p[0]
from https://github.com/gilestrolab/pyrem/blob/master/src/pyrem/univariate.py
and the Katz FD algorithm:
def katz(data):
    n = len(data)-1
    L = np.hypot(np.diff(data), 1).sum() # Sum of distances
    d = np.hypot(data - data[0], np.arange(len(data))).max() # furthest distance from first point
    return np.log10(n) / (np.log10(d/L) + np.log10(n))
from https://github.com/ProjectBrain/brainbits/blob/master/katz.py
I expect results of ~1.5 in both cases, but instead I get 2.2 and 4...
hFD(x,4) = 2.23965648024 (the k value here is chosen as an example; the result won't change much in the range 4-12. Edit: I was able to get a result of ~1.9 with k=22, but this still does not make any sense);
katz(x) = 4.03911343057
which in theory should not be possible for a 1D time-series array.
My questions are: are the Higuchi and Katz algorithms unsuitable for time-series analysis in general, or am I doing something wrong on my side? Also, are there any other Python libraries with ready-made, error-free implementations of these algorithms that I could use to verify my results?
My array of interest (each element represents a point in time t, t+1, t+2, ..., t+N):
x = np.array([373.4413096546802, 418.58026161917803,
395.7387698762124, 416.21163042783206,
407.9812265426947, 430.2355284504048,
389.66095393296763, 442.18969320408166,
383.7448638776275, 452.8931822090381,
413.5696828065546, 434.45932712853585,
429.95212301648996, 436.67612861616215,
431.10235365546964, 418.86935850068545,
410.84902747247423, 444.4188867775925,
397.1576881118471, 451.6129904245434,
440.9181246439599, 438.9857353268666,
437.1800408012741, 460.6251405281339,
404.3208481355302, 500.0432305427639,
380.49579242696177, 467.72953450552893,
333.11328535523967, 444.1171938340972,
303.3024198243042, 453.16332062153276,
356.9697406524534, 520.0720647379901,
402.7949987727925, 536.0721418821788,
448.21609036718445, 521.9137447208354,
470.5822486372967, 534.0572029633416,
480.03741443274765, 549.2104258193126,
460.0853321729541, 561.2705350421926,
444.52689144575794, 560.0835589548401,
462.2154563472787, 559.7166600213686,
453.42374550322353, 559.0591804941763,
421.4899935529862, 540.7970410737004,
454.34364779193913, 531.6018122709779,
437.1545739076901, 522.4262260216169,
444.6017030695873, 533.3991716674865,
458.3492761150962, 513.1735160522104])
The array for which you are trying to estimate the Higuchi FD is too short. You need to get a longer sample, or oversample the current one, to have at least 128 points for the Higuchi FD and more than 4000 points for Katz:
import scipy.signal as signal
...
x_res=signal.resample(x,128)
hFD(x_res, 4) will be 1.74383694265

Nested array slicing

Let's say I have an array of vectors:
""" simple line equation """
function getline(a::Array{Float64,1},b::Array{Float64,1})
    line = Vector[]
    for i=0:0.1:1
        vector = (1-i)a+(i*b)
        push!(line, vector)
    end
    return line
end
This function returns an array of vectors containing x-y positions
Vector[11]
> Float64[2]
> Float64[2]
> Float64[2]
> Float64[2]
.
.
.
Now I want to separate all the x and y coordinates of these vectors to plot them with PlotlyJS.
I have already tested some approaches, with no success!
What is the correct way to achieve this in Julia?
You can broadcast getindex:
xs = getindex.(vv, 1)
ys = getindex.(vv, 2)
Edit 3:
Alternatively, use list comprehensions:
xs = [v[1] for v in vv]
ys = [v[2] for v in vv]
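As a tiny worked example (the vectors here are made up, standing in for getline's output):
vv = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
xs = getindex.(vv, 1)     # [1.0, 3.0, 5.0]
ys = [v[2] for v in vv]   # [2.0, 4.0, 6.0]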
Edit:
For performance reasons, you should use StaticArrays to represent 2D points. E.g.:
getline(a,b) = [(1-i)a+(i*b) for i=0:0.1:1]
p1 = SVector(1.,2.)
p2 = SVector(3.,4.)
vv = getline(p1,p2)
Broadcasting getindex and list comprehensions will still work, but you can also reinterpret the vector as a 2×11 matrix:
to_matrix(a::Vector{T}) where {T<:SVector} = reshape(reinterpret(eltype(T), a), size(T,1), length(a))
m = to_matrix(vv)
Note that this does not copy the data. You can simply use m directly or define, e.g.,
xs = @view m[1,:]
ys = @view m[2,:]
Edit 2:
Btw., not restricting the type of the arguments of the getline function has many advantages and is preferred in general. The version above will work for any type that implements multiplication with a scalar and addition, e.g., a possible implementation of struct Point ... end (making it fully generic will require a bit more work, though).
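For instance, a minimal sketch of such a point type (entirely hypothetical, just to show that the untyped getline above only needs scalar multiplication and addition):
struct Point
    x::Float64
    y::Float64
end
Base.:+(a::Point, b::Point) = Point(a.x + b.x, a.y + b.y)
Base.:*(t::Real, p::Point) = Point(t * p.x, t * p.y)

getline(Point(1.0, 2.0), Point(3.0, 4.0))   # 11-element Vector{Point}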

Despite many examples online, I cannot get my MATLAB repmat equivalent working in python

I am trying to do some numpy matrix math because I need to replicate the repmat function from MATLAB. I know there are a thousand examples online, but I cannot seem to get any of them working.
The following is the code I am trying to run:
def getDMap(image, mapSize):
    newSize = (float(mapSize[0]) / float(image.shape[1]), float(mapSize[1]) / float(image.shape[0]))
    sm = cv.resize(image, (0,0), fx=newSize[0], fy=newSize[1])
    for j in range(0, sm.shape[1]):
        for i in range(0, sm.shape[0]):
            dmap = sm[:,:,:]-np.array([np.tile(sm[j,i,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
    return dmap
The function getDMap(image, mapSize) expects an OpenCV2 HSV image as its image argument, which is a numpy array with 3 dimensions: [:,:,:]. It also expects a tuple with 2 elements as its mapSize argument, of course making sure the function passing the arguments takes into account that in numpy arrays the rows and columns are swapped (not: x, y, but: y, x).
newSize is then a tuple of fractions that are used to resize the input image to a specific scale, and sm becomes a resized version of the input image. This all works fine.
This is my goal:
The following line:
np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))]),
should function equivalently to the MATLAB expression:
repmat(sm(j,i,:),[size(sm,1) size(sm,2)]),
This is my problem:
Testing this, an OpenCV2 image with dimensions 800x479x3 is passed as the image argument, and (64, 48) (a tuple) is passed as the mapSize argument.
However when testing this, I get the following ValueError:
dmap = sm[:,:,:]-np.array([np.tile(sm[i,j,:], (len(sm[0]),
len(sm[1]))) for k in xrange(len(sm[2]))])
ValueError: operands could not be broadcast together with
shapes (48,64,3) (64,64,192)
So it seems that the array dimensions do not match, and numpy has a problem with that. But my question is: what exactly is wrong? And how do I get this working?
These 2 calculations match:
octave:26> sm=reshape(1:12,2,2,3)
octave:27> x=repmat(sm(1,2,:),[size(sm,1) size(sm,2)])
octave:28> x(:,:,2)
7 7
7 7
In [45]: sm=np.arange(1,13).reshape(2,2,3,order='F')
In [46]: x=np.tile(sm[0,1,:],[sm.shape[0],sm.shape[1],1])
In [47]: x[:,:,1]
Out[47]:
array([[7, 7],
[7, 7]])
This runs:
sm[:,:,:]-np.array([np.tile(sm[0,1,:], (2,2,1)) for k in xrange(3)])
But it produces a (3,2,2,3) array, with replication on the 1st dimension. I don't think you want that k loop.
What's the intent of this?
for i in ...:
for j in ...:
data = ...
You'll only get results from the last iteration. Did you want data += ...? If so, this might work (for an (N, M, L)-shaped sm):
np.sum(np.array([sm-np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)]),axis=0)
z = np.array([np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)])
np.sum(sm - z, axis=0) # let numpy broadcast sm
Actually I don't even need the tile. Let broadcasting do the work:
np.sum(np.array([sm-sm[i,j,:] for i in xrange(N) for j in xrange(M)]),axis=0)
I can get rid of the loops with repeat.
sm1 = sm.reshape(N*M,L) # combine 1st 2 dim to simplify repeat
z1 = np.repeat(sm1, N*M, axis=0).reshape(N*M,N*M,L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N,M,L)
I can also apply broadcasting to the last case
x4 = np.sum(sm1-sm1[:,None,:], 0).reshape(N,M,L)
# = np.sum(sm1[None,:,:]-sm1[:,None,:], 0).reshape(N,M,L)
With sm I have to expand (and sum) 2 dimensions:
x5 = np.sum(np.sum(sm[None,:,None,:,:]-sm[:,None,:,None,:],0),1)
len(sm[0]) and len(sm[1]) are not the sizes of the first and second dimensions of sm. They are the lengths of sm[0] and sm[1], the first two 2-D slices along axis 0, and both return the same value (sm.shape[1]). You probably want to replace them with sm.shape[0] and sm.shape[1], which are equivalent to your MATLAB code, although I am not sure that it will work as you expect it to.
