Trying to come up with a fast way to make sure a is monotonic in Julia.
The slow (and obvious) way to do it that I have been using is something like this:
function check_monotonicity(
timeseries::Array,
column::Int
)
current = timeseries[1,column]
for row in 1:size(timeseries, 1)
if timeseries[row,column] > current
return false
end
current = copy(timeseries[row,column])
end
return true
end
So that it works something like this:
julia> using Distributions
julia>mono_matrix = hcat(collect(0:0.1:10), rand(Uniform(0.4,0.6),101),reverse(collect(0.0:0.1:10.0)))
101×3 Matrix{Float64}:
0.0 0.574138 10.0
0.1 0.490671 9.9
0.2 0.457519 9.8
0.3 0.567687 9.7
⋮
9.8 0.513691 0.2
9.9 0.589585 0.1
10.0 0.405018 0.0
julia> check_monotonicity(mono_matrix, 2)
false
And then for the opposite example:
julia> check_monotonicity(mono_matrix, 3)
true
Does anyone know a more efficient way to do this for long time series?
Your implementation is certainly not slow! It is very nearly optimally fast. You should definitely get rid of the copy. Though it doesn't hurt when the matrix elements are just plain data, it can be bad in other cases, perhaps for BigInt for example.
This is close to your original effort, but also more robust with respect to indexing and array types:
function ismonotonic(A::AbstractMatrix, column::Int, cmp = <)
current = A[begin, column] # begin instead of 1
for i in axes(A, 1)[2:end] # skip the first element
newval = A[i, column] # don't use copy here
cmp(newval, current) && return false
current = newval
end
return true
end
Another tip: You don't need to use collect. In fact, you should almost never use collect. Do this instead (I removed Uniform since I don't have Distributions.jl):
mono_matrix = hcat(0:0.1:10, rand(101), reverse(0:0.1:10)) # or 10:-0.1:0
Or perhaps this is better, since you have more control over the numer of elements in the range:
mono_matrix = hcat(range(0, 10, 101), rand(101), range(10, 0, 101))
Then you can use it like this:
1.7.2> ismonotonic(mono_matrix, 3)
false
1.7.2> ismonotonic(mono_matrix, 3, >=)
true
1.7.2> ismonotonic(mono_matrix, 1)
true
In mathematics typically we define a series to be monotonic if it is non-decreasing or non-increasing. If this is what you want then do:
issorted(view(mono_matrix, :, 2), rev=true)
to check if it is non-increasing, and:
issorted(view(mono_matrix, :, 2))
to check if it is non-decreasing.
If you want a decreasing check do:
issorted(view(mono_matrix, :, 3), rev=true, lt = <=)
for decreasing, but treating 0.0 and -0.0 as equal or
issorted(view(mono_matrix, :, 3), lt = <=)
for increasing, but treating 0.0 and -0.0 as equal.
If you want to distinguish 0.0 and -0.0 then do respectively:
issorted(view(mono_matrix, :, 3), rev=true, lt = (x, y) -> isequal(x, y) || isless(x, y))
issorted(view(mono_matrix, :, 3), lt = (x, y) -> isequal(x, y) || isless(x, y))
I have the following line of code in Julia:
X=[(i,i^2) for i in 1:100 if i^2%5==0]
Basically, it returns a list of tuples (i,i^2) from i=1 to 100 if the remainder of i^2 and 5 is zero. What I want to do is, in the array comprehension, break out of the for loop if i^2 becomes larger than 1000. However, if I implement
X=[(i,i^2) for i in 1:100 if i^2%5==0 else break end]
I get the error: syntax: expected "]".
Is there any way to easily break out of this for loop inside the array? I've tried looking online, but nothing came up.
It's a "fake" for-loop, so you can't break it. Take a look at the lowered code below:
julia> foo() = [(i,i^2) for i in 1:100 if i^2%5==0]
foo (generic function with 1 method)
julia> #code_lowered foo()
LambdaInfo template for foo() at REPL[0]:1
:(begin
nothing
#1 = $(Expr(:new, :(Main.##1#3)))
SSAValue(0) = #1
#2 = $(Expr(:new, :(Main.##2#4)))
SSAValue(1) = #2
SSAValue(2) = (Main.colon)(1,100)
SSAValue(3) = (Base.Filter)(SSAValue(1),SSAValue(2))
SSAValue(4) = (Base.Generator)(SSAValue(0),SSAValue(3))
return (Base.collect)(SSAValue(4))
end)
The output shows that array comprehension is implemented via Base.Generator which takes an iterator as input. It only supports the [if cond(x)::Bool] "guard" for now, so there is no way to use break here.
For your specific case, a workaround is to use isqrt:
julia> X=[(i,i^2) for i in 1:isqrt(1000) if i^2%5==0]
6-element Array{Tuple{Int64,Int64},1}:
(5,25)
(10,100)
(15,225)
(20,400)
(25,625)
(30,900)
I don't think so. You could always just
tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
X=[(i,i^2) for i in 1:100 if tmp(i)]
Using a for loop is considered idiomatic in Julia and could be more readable in this instance. Also, it could be faster.
Specifically:
julia> using BenchmarkTools
julia> tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
julia> X1 = [(i,i^2) for i in 1:100 if tmp(i)];
julia> #btime [(i,i^2) for i in 1:100 if tmp(i)];
471.883 ns (7 allocations: 528 bytes)
julia> X2 = [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
julia> #btime [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
281.435 ns (7 allocations: 528 bytes)
julia> function goodsquares()
res = Vector{Tuple{Int,Int}}()
for i=1:100
if i^2%5==0 && i^2<=1000
push!(res,(i,i^2))
elseif i^2>1000
break
end
end
return res
end
julia> X3 = goodsquares();
julia> #btime goodsquares();
129.123 ns (3 allocations: 304 bytes)
So, another 2x improvement is nothing to disregard and the long function gives plenty of room for illuminating comments.
The following function nested_arrays generates (surprisingly) a nested array of "depth" n. However, when running with even small values of n (2, 3, etc.), it takes a reasonably long time to run and display the output.
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> nested_arrays(1)
1-element Array{Int64,1}:
1
julia> nested_arrays(2)
1-element Array{Array{Int64,1},1}:
[1]
julia> nested_arrays(3)
1-element Array{Array{Array{Int64,1},1},1}:
Array{Int64,1}[[1]]
julia> nested_arrays(10)
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]
Interestingly, when using the #time macro or a ; at the end of the line, the result is taking relatively little of the time to calculate. Instead, the actual displaying of the result in the REPL takes the majority of the time.
This strange behavior is not shown in, for example, Python.
In [1]: def nested_lists(n):
...: if n == 1:
...: return [1]
...: return [nested_lists(n - 1)]
...:
In [2]: nested_lists(10)
Out[2]: [[[[[[[[[[1]]]]]]]]]]
In [3]: %time nested_lists(100)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 37.7 µs
Out[3]: [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
Why is this function so slow in Julia? Is Julia recompiling the display function for different types T in Array{T, 1}? If so, why is this?
Can the speed of this code be improved, or should this just not be done in Julia? My main concern for this in a practical sense would be, for example, loading a complex, nested JSON file, where simply using an n-dimensional array would not be possible.
Yes, this is entirely due to compilation time. You can see this by #time-ing the display. The second time you display it is fast:
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> #time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
11.682721 seconds (8.83 M allocations: 371.698 MB, 1.82% gc time)
julia> #time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
0.001688 seconds (2.38 k allocations: 102.766 KB)
So why is this so slow? The display here recursively walks through all the arrays and prints them nested inside each other. This is recursively calling show with 14 different types — one with 14 nested arrays, then its element with 13 nested arrays, then its element with 12… and so on! Each of those show methods gets independently compiled. Compiling specialized methods for specific element types is a key part of how Julia is able to produce very efficient code. It means that it's able to specialize every single operation done on each element without any runtime type checking or dispatch. Unfortunately in this case, it gets in the way.
You can work around this with an Any[] array; in the context of a JSON file this makes quite a lot of sense since you don't know if it'll contain strings or arrays or numbers, etc. This is much faster since it only needs to compile the show method for an Any[] array once, and then it recursively uses it.
# new session
julia> nested_arrays(n) = n == 1 ? Any[1] : Any[nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> #time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
1.571632 seconds (767.12 k allocations: 32.472 MB, 1.04% gc time)
julia> #time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
0.000606 seconds (839 allocations: 30.859 KB)
julia> #time display(nested_arrays(100));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
0.002523 seconds (17.76 k allocations: 579.297 KB)
Generally having 1-based array for Julia is a good decision, but sometimes it is desirable to have Fortran-like array with indices that span some subranges of ℤ:
julia> x = FArray(Float64, -1:1,-7:7,-128:512)
where it would be useful:
in the codes accompanying the book Numerical Solution of Hyperbolic Partial Differential Equations by prof. John A. Trangenstein these negative indices are used intensively for ghost cells for boundary conditions.
The same is true for Clawpack (stands for “Conservation Laws Package”) by prof. Randall J. LeVeque http://depts.washington.edu/clawpack/ and there are many other codes where such indices would be natural.
So such auxiliary class would be useful for speedy translation of such codes.
I just started to implement such an auxiliary type but as I'm quite new to Julia your help would be greatly appreciated.
I started with:
type FArray
ranges
array::Array
function FArray(T, r::Range1{Int}...)
dims = map((x) -> length(x), r)
array = Array(T, dims)
new(r, array)
end
end
Output:
julia> include ("FortranArray.jl")
julia> x = FArray(Float64, -1:1,-7:7,-128:512)
FArray((-1:1,-7:7,-128:512),3x15x641 Array{Float64,3}:
[:, :, 1] =
6.90321e-310 2.6821e-316 1.96042e-316 0.0 0.0 0.0 9.84474e-317 … 1.83233e-316 2.63285e-316 0.0 9.61618e-317 0.0
6.90321e-310 6.32404e-322 2.63285e-316 0.0 0.0 0.0 2.63292e-316 2.67975e-316
...
[:, :, 2] =
...
As I'm completely new to Julia any recommendations especially that lead to more efficient would be greatly appreciated.
[Edit]
The topic has been discussed here:
https://groups.google.com/forum/#!topic/julia-dev/NOF6MA6tb9Y
During the discussion two ways to have Julia arrays with arbitrary base were elaborated:
SubArray-based, sample usage is with a helper function:
function farray(T, r::Range1{Int64}...)
dims = map((x) -> length(x), r)
array = Array(T, dims)
sub_indices = map((x) -> -minimum(x) + 2 : maximum(x), r)
sub(array, sub_indices)
end
julia> y[-1,-7,-128] = 777
777
julia> y[-1,-7,-128] + 33
810.0
julia> y[-2,-7,-128]
ERROR: BoundsError()
in getindex at subarray.jl:200
julia> y[2,-7,-128]
2.3977385e-316
Please note, that bounds are not checked fully more details are here:
https://github.com/JuliaLang/julia/issues/4044
At the moment SubArray has performance issues but eventually its performance might be improved, see also:
https://github.com/JuliaLang/julia/issues/5117
https://github.com/JuliaLang/julia/issues/3496
Another approach that has better performance at the moment, besides checks both bounds:
type FArray{T<:Number, N, A<:AbstractArray} <: AbstractArray
ranges
offsets::NTuple{N,Int}
array::A
function FArray(r::Range1{Int}...)
dims = map((x) -> length(x), r)
array = Array(T, dims)
offs = map((x) -> 1 - minimum(x), r)
new(r, offs, array)
end
end
FArray(T, r::Range1{Int}...) = FArray{T, length(r,), Array{T, length(r,)}}(r...)
getindex{T<:Number}(FA::FArray{T}, i1::Int) = FA.array[i1+FA.offsets[1]]
getindex{T<:Number}(FA::FArray{T}, i1::Int, i2::Int) = FA.array[i1+FA.offsets[1], i2+FA.offsets[2]]
getindex{T<:Number}(FA::FArray{T}, i1::Int, i2::Int, i3::Int) = FA.array[i1+FA.offsets[1], i2+FA.offsets[2], i3+FA.offsets[3]]
setindex!{T}(FA::FArray{T}, x, i1::Int) = arrayset(FA.array, convert(T,x), i1+FA.offsets[1])
setindex!{T}(FA::FArray{T}, x, i1::Int, i2::Int) = arrayset(FA.array, convert(T,x), i1+FA.offsets[1], i2+FA.offsets[2])
setindex!{T}(FA::FArray{T}, x, i1::Int, i2::Int, i3::Int) = arrayset(FA.array, convert(T,x), i1+FA.offsets[1], i2+FA.offsets[2], i3+FA.offsets[3])
getindex and setindex! methods for FArray were inspired by base/array.jl code.
Use cases:
julia> y = FArray(Float64, -1:1,-7:7,-128:512);
julia> y[-1,-7,-128] = 777
777
julia> y[-1,-7,-128] + 33
810.0
julia> y[-1,2,3]
0.0
julia> y[-2,-7,-128]
ERROR: BoundsError()
in getindex at FortranArray.jl:27
julia> y[2,-7,-128]
ERROR: BoundsError()
in getindex at FortranArray.jl:27
There are now two packages that provide this kind of functionality. For arrays with arbitrary start indices, see https://github.com/alsam/OffsetArrays.jl. For even more flexibility see https://github.com/mbauman/AxisArrays.jl, where indices do not have to be integers.