Efficient method for checking monotonicity in array Julia? - arrays

Trying to come up with a fast way to make sure a is monotonic in Julia.
The slow (and obvious) way to do it that I have been using is something like this:
function check_monotonicity(
timeseries::Array,
column::Int
)
current = timeseries[1,column]
for row in 1:size(timeseries, 1)
if timeseries[row,column] > current
return false
end
current = copy(timeseries[row,column])
end
return true
end
So that it works something like this:
julia> using Distributions
julia>mono_matrix = hcat(collect(0:0.1:10), rand(Uniform(0.4,0.6),101),reverse(collect(0.0:0.1:10.0)))
101×3 Matrix{Float64}:
0.0 0.574138 10.0
0.1 0.490671 9.9
0.2 0.457519 9.8
0.3 0.567687 9.7
⋮
9.8 0.513691 0.2
9.9 0.589585 0.1
10.0 0.405018 0.0
julia> check_monotonicity(mono_matrix, 2)
false
And then for the opposite example:
julia> check_monotonicity(mono_matrix, 3)
true
Does anyone know a more efficient way to do this for long time series?

Your implementation is certainly not slow! It is very nearly optimally fast. You should definitely get rid of the copy. Though it doesn't hurt when the matrix elements are just plain data, it can be bad in other cases, perhaps for BigInt for example.
This is close to your original effort, but also more robust with respect to indexing and array types:
function ismonotonic(A::AbstractMatrix, column::Int, cmp = <)
current = A[begin, column] # begin instead of 1
for i in axes(A, 1)[2:end] # skip the first element
newval = A[i, column] # don't use copy here
cmp(newval, current) && return false
current = newval
end
return true
end
Another tip: You don't need to use collect. In fact, you should almost never use collect. Do this instead (I removed Uniform since I don't have Distributions.jl):
mono_matrix = hcat(0:0.1:10, rand(101), reverse(0:0.1:10)) # or 10:-0.1:0
Or perhaps this is better, since you have more control over the numer of elements in the range:
mono_matrix = hcat(range(0, 10, 101), rand(101), range(10, 0, 101))
Then you can use it like this:
1.7.2> ismonotonic(mono_matrix, 3)
false
1.7.2> ismonotonic(mono_matrix, 3, >=)
true
1.7.2> ismonotonic(mono_matrix, 1)
true

In mathematics typically we define a series to be monotonic if it is non-decreasing or non-increasing. If this is what you want then do:
issorted(view(mono_matrix, :, 2), rev=true)
to check if it is non-increasing, and:
issorted(view(mono_matrix, :, 2))
to check if it is non-decreasing.
If you want a decreasing check do:
issorted(view(mono_matrix, :, 3), rev=true, lt = <=)
for decreasing, but treating 0.0 and -0.0 as equal or
issorted(view(mono_matrix, :, 3), lt = <=)
for increasing, but treating 0.0 and -0.0 as equal.
If you want to distinguish 0.0 and -0.0 then do respectively:
issorted(view(mono_matrix, :, 3), rev=true, lt = (x, y) -> isequal(x, y) || isless(x, y))
issorted(view(mono_matrix, :, 3), lt = (x, y) -> isequal(x, y) || isless(x, y))

Related

Why does a subtype of AbstractArray result in imprecise matrix operations in Julia?

I'm currently working on creating a subtype of AbstractArray in Julia, which allows you to store a vector in addition to an Array itself. You can think of it as the column "names", with element types as a subtype of AbstractFloat. Hence, it has some similarities to the NamedArray.jl package, but restricts to only assigning the columns with Floats (in case of matrices).
The struct that I've created so far (following the guide to create a subtype of AbstractArray) is defined as follows:
struct FooArray{T, N, AT, VT} <: AbstractArray{T, N}
data::AT
vec::VT
function FooArray(data::AbstractArray{T1, N}, vec::AbstractVector{T2}) where {T1 <: AbstractFloat, T2 <: AbstractFloat, N}
length(vec) == size(data, 2) || error("Inconsistent dimensions")
new{T1, N, typeof(data), typeof(vec)}(data, vec)
end
end
#inline Base.#propagate_inbounds Base.getindex(fooarr::FooArray, i::Int) = getindex(fooarr.data, i)
#inline Base.#propagate_inbounds Base.getindex(fooarr::FooArray, I::Vararg{Int, 2}) = getindex(fooarr.data, I...)
#inline Base.#propagate_inbounds Base.size(fooarr::FooArray) = size(fooarr.data)
Base.IndexStyle(::Type{<:FooArray}) = IndexLinear()
This already seems to be enough to create objects of type fooArray and do some simple math with it. However, I've observed that some essential functions such as matrix-vector multiplications seem to be imprecise. For example, the following should consistently return a vector of 0.0, but:
R = rand(100, 3)
S = FooArray(R, collect(1.0:3.0))
y = rand(100)
S'y - R'y
3-element Vector{Float64}:
-7.105427357601002e-15
0.0
3.552713678800501e-15
While the differences are very small, they can quickly add up over many different calculations, leading to significant errors.
Where do these differences come from?
A look at the calculations via macro #code_llvm reveals that appearently different matmul functions from LinearAlgebra are used (with other minor differences):
#code_llvm S'y
...
# C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\matmul.jl:111 within `*'
...
#code_llvm S'y
...
# C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\matmul.jl:106 within `*'
...
Redefining the adjoint and * functions on our FooArray object provides the expected, correct result:
import Base: *, adjoint, /
Base.adjoint(a::FooArray) = FooArray(a.data', zeros(size(a.data, 1)))
*(a::FooArray{T, 2, AT, VT} where {AT, VT}, b::AbstractVector{S}) where {T, S} = a.data * b
S'y - R'y
3-element Vector{Float64}:
0.0
0.0
0.0
However, this solution (which is also done in NamedArrays here) would require defining and maintaining all sorts of functions, not just the standard functions in base, adding more and more dependencies just because of this small error margin.
Is there any simpler way to get rid of this issue without redefining every operation and possibly many other functions from other packages?
I'm using Julia version 1.6.1 on Windows 64-bit system.
Yes, the implementation of matrix multiplication will vary depending upon your array type. The builtin Array will use BLAS, whereas your custom fooArray will use a generic implementation, and due to the non-associativity of floating point arithmetic, these different approaches will indeed yield different values — and note that they may be different from the ground truth, even for the builtin Arrays!
julia> using Random; Random.seed!(0); R = rand(100, 3); y = rand(100);
julia> R'y - Float64.(big.(R)'big.(y))
3-element Vector{Float64}:
-3.552713678800501e-15
0.0
0.0
You may be able to implement your custom array as a DenseArray, which will ensure that it uses the same (BLAS-enabled) codepath. You just need to implement a few more methods, most importantly strides and unsafe_convert:
julia> struct FooArray{T, N} <: DenseArray{T, N}
data::Array{T, N}
end
Base.getindex(fooarr::FooArray, i::Int) = fooarr.data[i]
Base.size(fooarr::FooArray) = size(fooarr.data)
Base.IndexStyle(::Type{<:FooArray}) = IndexLinear()
Base.strides(fooarr::FooArray) = strides(fooarr.data)
Base.unsafe_convert(P::Type{Ptr{T}}, fooarr::FooArray{T}) where {T} = Base.unsafe_convert(P, fooarr.data)
julia> R = rand(100, 3); S = FooArray(R); y = rand(100)
R'y - S'y
3-element Vector{Float64}:
0.0
0.0
0.0
julia> R = rand(100, 1000); S = FooArray(R); y = rand(100)
R'y == S'y
true

Julia: rational behind array size and index for "extra" dimensions?

I am using Julia from time to time, however I am surprised by the following behavior:
Let's define an 3x4 array
julia> m=rand(3,4)
3×4 Array{Float64,2}:
0.889018 0.500847 0.539856 0.828231
0.492425 0.582958 0.521406 0.754102
0.28227 0.834333 0.669967 0.0939701
Now I check that
julia> size(m,1), size(m,2)
(3, 4)
as expected.
However, I am surprised by this:
julia> size(m,3), size(m,2018)
(1, 1)
-> I would have expected (0,0) or an error message
Looking the Julia code confirms this behavior:
size(t::AbstractArray{T,N}, d) where {T,N} = d <= N ? size(t)[d] : 1
Moreover:
julia> m[2,1,1,1,1]
0.4924252391289974
-> I would have expected an out of bounds error
So my question is: "what is the rationale?"
( I do not thing it is a bug, I use Julia version 0.6.2)
I believe it's for broadcasting.
julia> m=rand(3,4)
3×4 Array{Float64,2}:
0.139323 0.663912 0.994985 0.517332
0.423913 0.121753 0.0327054 0.0754665
0.392672 0.47006 0.351121 0.787318
julia> size(m)
(3, 4)
julia> n = rand(3)
3-element Array{Float64,1}:
0.716752
0.98755
0.661226
julia> m .* n
3×4 Array{Float64,2}:
0.09986 0.475861 0.713157 0.370799
0.418636 0.120237 0.0322983 0.074527
0.259645 0.310816 0.23217 0.520595
Notice that n is of one dimension less, so it's size 1 in the 2nd dimension and thus applies column-wise. Scalars in broadcast are treated differently and are generally inlined into the fused broadcasting function which you cannot do with a mutable type, so the size 1 = expand in higher dimensions rule for broadcast is a nice way to implement this.

Passing two-dimensional array to a function in julia

I have a three dimensional array defined as:
x=zeros(Float64,2,2,2)
I want to assign ones to x by passing x to a function, one layer at a time.
The function is:
function init(p,y)
y=ones(p,p)
end
and I will pass x as follows:
for k=1:2
init(2,x[2,2,k])
end
but when I do that, x is zeros, not ones. Why?
julia> x
2x2x2 Array{Float64,3}:
[:, :, 1] =
0.0 0.0
0.0 0.0
[:, :, 2] =
0.0 0.0
0.0 0.0
Any idea how to get Julia to assign ones to x?
One possible solution is to use slice, which makes a SubArray:
x = zeros(2, 2, 2) # Float64 by default
function init!(y)
y[:] = ones(y) # change contents not binding
end
for k in 1:2
init!(slice(x, :, :, k)) # use slice to get SubArray
end
Note that you can use ones(y) to get a vector of ones of the same size as y.
A SubArray gives a view of an array, instead of a copy. In future versions of Julia, indexing an array may give this by default, but currently you must do it explicitly.
For a discussion about values vs. bindings, see
http://www.johnmyleswhite.com/notebook/2014/09/06/values-vs-bindings-the-map-is-not-the-territory/
EDIT: I hadn't seen #tholy's answer, which contains the same idea.
I'm also not sure I understand the question, but slice(x, :, :, k) will take a 2d slice of x.
If you're initializing x as an array of Float64 and then hoping to assign a matrix to each element (which is what it appears you're trying to do), that won't work---the type of x won't allow it. You could make x an array of Any, and then that would be permitted.
I'm not certain I understand, but if you're trying to modify x in place, you'll want to do things a little differently.
The code below should do what you need.
x = zeros(Float64, 2, 2, 2)
function init!(p, y, k)
y[:, :, k] = ones(Float64, p, p)
end
for k = 1:2
init!(2, x, k)
end
And you might also want to keep in mind that the standard convention in Julia is to include an exclamation mark in the name of a function that modifies its arguments. And if I've understood your question, then you want your init!() function to do exactly that.
A lot has changed in Julia, and I thought I would update this answer to reflect Julia 1.5 (probably most of the changes were 1.0). While I would expect the modern x[:, :, k] to work, as this is still refered to as a SubArray this actually is copy now when in an expression. Instead you must use view():
x= zeros(2, 2, 2)
function init!(y)
y[:]= ones(size(y))
end
init!(view(x, :, :, 1)) # get reference to original items
This gives you the desired result:
julia> x
2×2×2 Array{Float64,3}:
[:, :, 1] =
1.0 1.0
1.0 1.0
[:, :, 2] =
0.0 0.0
0.0 0.0
There are also helper macros for writing it in a more palatable form,
init!(#view x[:,:,1])
but you run the danger of greedy macro parsing if you have other arguments, such that
otherfunc!(#view x[:,:,1], 10)
would give you an error Invalid use of #view macro: argument must be a reference expression. To get around this, there is the kludge #views which turns all SubArrays into views, or you can wrap the argument in parenthesis.
otherfunc!(#views x[:,:,1], 10)
otherfunc!(#view( x[:,:,1]), 10)
You can find more information on the manipulation of Arrays and Matrices in this presentation:
(Youtube) Arrays: slices and views

Looping through slices of Theano tensor

I have two 2D Theano tensors, call them x_1 and x_2, and suppose for the sake of example, both x_1 and x_2 have shape (1, 50). Now, to compute their mean squared error, I simply run:
T.sqr(x_1 - x_2).mean(axis = -1).
However, what I wanted to do was construct a new tensor that consists of their mean squared error in chunks of 10. In other words, since I'm more familiar with NumPy, what I had in mind was to create the following tensor M in Theano:
M = [theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1) for i in xrange(0, 50, 10)]
Now, since Theano doesn't have for loops, but instead uses scan (which map is a special case of), I thought I would try the following:
sequence = T.arange(0, 50, 10)
M = theano.map(lambda i: theano.tensor.sqr(x_1[:, i:i+10] - x_2[:, i:i+10]).mean(axis = -1), sequence)
However, this does not seem to work, as I get the error:
only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Is there a way to loop through the slices using theano.scan (or map)? Thanks in advance, as I'm new to Theano!
Similar to what can be done in numpy, a solution would be to reshape your (1, 50) tensor to a (1, 10, 5) tensor (or even a (10, 5) tensor), and then to compute the mean along the second axis.
To illustrate this with numpy, suppose I want to compute means by slices of 2
x = np.array([0, 2, 0, 4, 0, 6])
x = x.reshape([3, 2])
np.mean(x, axis=1)
outputs
array([ 1., 2., 3.])

Fortran-like arrays such as FArray(Float64, -1:1,-7:7,-128:512) in Julia

Generally having 1-based array for Julia is a good decision, but sometimes it is desirable to have Fortran-like array with indices that span some subranges of ℤ:
julia> x = FArray(Float64, -1:1,-7:7,-128:512)
where it would be useful:
in the codes accompanying the book Numerical Solution of Hyperbolic Partial Differential Equations by prof. John A. Trangenstein these negative indices are used intensively for ghost cells for boundary conditions.
The same is true for Clawpack (stands for “Conservation Laws Package”) by prof. Randall J. LeVeque http://depts.washington.edu/clawpack/ and there are many other codes where such indices would be natural.
So such auxiliary class would be useful for speedy translation of such codes.
I just started to implement such an auxiliary type but as I'm quite new to Julia your help would be greatly appreciated.
I started with:
type FArray
ranges
array::Array
function FArray(T, r::Range1{Int}...)
dims = map((x) -> length(x), r)
array = Array(T, dims)
new(r, array)
end
end
Output:
julia> include ("FortranArray.jl")
julia> x = FArray(Float64, -1:1,-7:7,-128:512)
FArray((-1:1,-7:7,-128:512),3x15x641 Array{Float64,3}:
[:, :, 1] =
6.90321e-310 2.6821e-316 1.96042e-316 0.0 0.0 0.0 9.84474e-317 … 1.83233e-316 2.63285e-316 0.0 9.61618e-317 0.0
6.90321e-310 6.32404e-322 2.63285e-316 0.0 0.0 0.0 2.63292e-316 2.67975e-316
...
[:, :, 2] =
...
As I'm completely new to Julia any recommendations especially that lead to more efficient would be greatly appreciated.
[Edit]
The topic has been discussed here:
https://groups.google.com/forum/#!topic/julia-dev/NOF6MA6tb9Y
During the discussion two ways to have Julia arrays with arbitrary base were elaborated:
SubArray-based, sample usage is with a helper function:
function farray(T, r::Range1{Int64}...)
dims = map((x) -> length(x), r)
array = Array(T, dims)
sub_indices = map((x) -> -minimum(x) + 2 : maximum(x), r)
sub(array, sub_indices)
end
julia> y[-1,-7,-128] = 777
777
julia> y[-1,-7,-128] + 33
810.0
julia> y[-2,-7,-128]
ERROR: BoundsError()
in getindex at subarray.jl:200
julia> y[2,-7,-128]
2.3977385e-316
Please note, that bounds are not checked fully more details are here:
https://github.com/JuliaLang/julia/issues/4044
At the moment SubArray has performance issues but eventually its performance might be improved, see also:
https://github.com/JuliaLang/julia/issues/5117
https://github.com/JuliaLang/julia/issues/3496
Another approach that has better performance at the moment, besides checks both bounds:
type FArray{T<:Number, N, A<:AbstractArray} <: AbstractArray
ranges
offsets::NTuple{N,Int}
array::A
function FArray(r::Range1{Int}...)
dims = map((x) -> length(x), r)
array = Array(T, dims)
offs = map((x) -> 1 - minimum(x), r)
new(r, offs, array)
end
end
FArray(T, r::Range1{Int}...) = FArray{T, length(r,), Array{T, length(r,)}}(r...)
getindex{T<:Number}(FA::FArray{T}, i1::Int) = FA.array[i1+FA.offsets[1]]
getindex{T<:Number}(FA::FArray{T}, i1::Int, i2::Int) = FA.array[i1+FA.offsets[1], i2+FA.offsets[2]]
getindex{T<:Number}(FA::FArray{T}, i1::Int, i2::Int, i3::Int) = FA.array[i1+FA.offsets[1], i2+FA.offsets[2], i3+FA.offsets[3]]
setindex!{T}(FA::FArray{T}, x, i1::Int) = arrayset(FA.array, convert(T,x), i1+FA.offsets[1])
setindex!{T}(FA::FArray{T}, x, i1::Int, i2::Int) = arrayset(FA.array, convert(T,x), i1+FA.offsets[1], i2+FA.offsets[2])
setindex!{T}(FA::FArray{T}, x, i1::Int, i2::Int, i3::Int) = arrayset(FA.array, convert(T,x), i1+FA.offsets[1], i2+FA.offsets[2], i3+FA.offsets[3])
getindex and setindex! methods for FArray were inspired by base/array.jl code.
Use cases:
julia> y = FArray(Float64, -1:1,-7:7,-128:512);
julia> y[-1,-7,-128] = 777
777
julia> y[-1,-7,-128] + 33
810.0
julia> y[-1,2,3]
0.0
julia> y[-2,-7,-128]
ERROR: BoundsError()
in getindex at FortranArray.jl:27
julia> y[2,-7,-128]
ERROR: BoundsError()
in getindex at FortranArray.jl:27
There are now two packages that provide this kind of functionality. For arrays with arbitrary start indices, see https://github.com/alsam/OffsetArrays.jl. For even more flexibility see https://github.com/mbauman/AxisArrays.jl, where indices do not have to be integers.

Resources