How can the speed of nested arrays in Julia be improved? - arrays

The following function nested_arrays generates (surprisingly) a nested array of "depth" n. However, when running with even small values of n (2, 3, etc.), it takes a reasonably long time to run and display the output.
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> nested_arrays(1)
1-element Array{Int64,1}:
1
julia> nested_arrays(2)
1-element Array{Array{Int64,1},1}:
[1]
julia> nested_arrays(3)
1-element Array{Array{Array{Int64,1},1},1}:
Array{Int64,1}[[1]]
julia> nested_arrays(10)
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]
Interestingly, when using the #time macro or a ; at the end of the line, the result is taking relatively little of the time to calculate. Instead, the actual displaying of the result in the REPL takes the majority of the time.
This strange behavior is not shown in, for example, Python.
In [1]: def nested_lists(n):
...: if n == 1:
...: return [1]
...: return [nested_lists(n - 1)]
...:
In [2]: nested_lists(10)
Out[2]: [[[[[[[[[[1]]]]]]]]]]
In [3]: %time nested_lists(100)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 37.7 µs
Out[3]: [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
Why is this function so slow in Julia? Is Julia recompiling the display function for different types T in Array{T, 1}? If so, why is this?
Can the speed of this code be improved, or should this just not be done in Julia? My main concern for this in a practical sense would be, for example, loading a complex, nested JSON file, where simply using an n-dimensional array would not be possible.

Yes, this is entirely due to compilation time. You can see this by #time-ing the display. The second time you display it is fast:
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> #time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
11.682721 seconds (8.83 M allocations: 371.698 MB, 1.82% gc time)
julia> #time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
0.001688 seconds (2.38 k allocations: 102.766 KB)
So why is this so slow? The display here recursively walks through all the arrays and prints them nested inside each other. This is recursively calling show with 14 different types — one with 14 nested arrays, then its element with 13 nested arrays, then its element with 12… and so on! Each of those show methods gets independently compiled. Compiling specialized methods for specific element types is a key part of how Julia is able to produce very efficient code. It means that it's able to specialize every single operation done on each element without any runtime type checking or dispatch. Unfortunately in this case, it gets in the way.
You can work around this with an Any[] array; in the context of a JSON file this makes quite a lot of sense since you don't know if it'll contain strings or arrays or numbers, etc. This is much faster since it only needs to compile the show method for an Any[] array once, and then it recursively uses it.
# new session
julia> nested_arrays(n) = n == 1 ? Any[1] : Any[nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> #time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
1.571632 seconds (767.12 k allocations: 32.472 MB, 1.04% gc time)
julia> #time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
0.000606 seconds (839 allocations: 30.859 KB)
julia> #time display(nested_arrays(100));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
0.002523 seconds (17.76 k allocations: 579.297 KB)

Related

Is there a way to turn a array into a integer or float?

I'm trying to change an array with int into a single int in Julia 1.5.4 like that:
x = [1,2,3]
Here i would try or use a code/command (here: example())
x_new = example(x)
println(x_new)
typeof(x_new)
Ideal output would be something like this :
123
Int32
I already tried to solve this problem with parse() or push!() or something like this. But nothing worked well.
I couldn't find a similar problem...
You can find an issue about adding this functionality to Julia here: https://github.com/JuliaLang/julia/issues/40393
Bottom line, you don't want to use strings, and you should avoid unnecessary exponentiation, both of which will be really slow.
A very brief solution is
evalpoly(10, reverse([1,2,3]))
Spelling it out a bit more, you can do this
function joindigits(xs)
val = 0
for x in xs
val = 10*val + x
end
return val
end
Is this what you need?
julia> x = [1,2,3]
3-element Vector{Int64}:
1
2
3
julia> list2int(x) = sum(10 .^ (length(x)-1:-1:0) .* x)
list2int (generic function with 1 method)
julia> list2int(x)
123
You are looking for string concatenation and then parsing:
x_new = parse(Int64, string(x...))
Another interesting way to convert many small numbers to a bigger one is to combine raw bytes:
julia> reinterpret(Int16, [Int8(2),Int8(3)])
1-element reinterpret(Int16, ::Vector{Int8}):
770
Note that 770 = 256*3 + 2
Or for actual Ints:
julia> reinterpret(Int128, [10,1])
1-element reinterpret(Int128, ::Vector{Int64}):
18446744073709551626
(note that result is exactly Int128(2)^64+10)

Reducer for non-parallel for loops/multiline comprehensions

Julia has a parallel macro for for loops, which allows things like:
s = #sync #parallel vcat for i in 1:9
k = iseven(i) ? i÷2 : 3i+1
k^2
end
and since the reducer specified is vcat, we get back an array of numbers.
Is it possible to do something like this with a normal for loop (without having to explicitly initialize and push! into the array)?
Since I'm only looking to reduce using vcat, another way to ask this question is: is there a neat readable multiline form of array comprehensions? It's possible to stretch to usual comprehension syntax like this:
s = [
(k = iseven(i) ? i÷2 : 3i+1;
k^2)
for i in 1:9
]
but that seems messy and less readable compared to the #parallel vcat for syntax. Is there a better way of doing multiline comprehensions?
Extending on #Gnimuc's answer, I think mapreduce plus do-syntax is pretty nice:
julia> mapreduce(vcat, 1:9) do i
k = iseven(i) ? i÷2 : 3i+1
k^2
end
9-element Array{Int64,1}:
16
1
100
4
256
9
484
16
784
The short answer is to write multiline functions(or do-blocks as #phg reminds) with a single line array comprehension or map/mapreduce:
s = [
(k = iseven(i) ? i÷2 : 3i+1;
k^2)
for i in 1:9
]
This example is pure comprehension, no reducer is involved. Array comprehension is usually written in one line, for example, s = [iseven(i) ? i÷2 : 3i+1 |> x->x^2 for i in 1:9]. As #phg suggested, multi-line functions can be enclosed in a do-block:
julia> map(1:9) do x
k = iseven(x) ? x÷2 : 3x+1
k^2
end
However, no reducer such as vcat is needed in this case, but if the output of f in the above example is a vector:
julia> function f(x)
k = iseven(x) ? x÷2 : 3x+1
[k^2]
end
f (generic function with 1 method)
julia> s = [f(i) for i in 1:9]
9-element Array{Array{Int64,1},1}:
[16]
[1]
[100]
[4]
[256]
[9]
[484]
[16]
[784]
array comprehension will give you an array of vectors. This time you need to use mapreduce instead:
julia> mapreduce(f, vcat, 1:9)
9-element Array{Int64,1}:
16
1
100
4
256
9
484
16
784

Julia: rational behind array size and index for "extra" dimensions?

I am using Julia from time to time, however I am surprised by the following behavior:
Let's define an 3x4 array
julia> m=rand(3,4)
3×4 Array{Float64,2}:
0.889018 0.500847 0.539856 0.828231
0.492425 0.582958 0.521406 0.754102
0.28227 0.834333 0.669967 0.0939701
Now I check that
julia> size(m,1), size(m,2)
(3, 4)
as expected.
However, I am surprised by this:
julia> size(m,3), size(m,2018)
(1, 1)
-> I would have expected (0,0) or an error message
Looking the Julia code confirms this behavior:
size(t::AbstractArray{T,N}, d) where {T,N} = d <= N ? size(t)[d] : 1
Moreover:
julia> m[2,1,1,1,1]
0.4924252391289974
-> I would have expected an out of bounds error
So my question is: "what is the rationale?"
( I do not thing it is a bug, I use Julia version 0.6.2)
I believe it's for broadcasting.
julia> m=rand(3,4)
3×4 Array{Float64,2}:
0.139323 0.663912 0.994985 0.517332
0.423913 0.121753 0.0327054 0.0754665
0.392672 0.47006 0.351121 0.787318
julia> size(m)
(3, 4)
julia> n = rand(3)
3-element Array{Float64,1}:
0.716752
0.98755
0.661226
julia> m .* n
3×4 Array{Float64,2}:
0.09986 0.475861 0.713157 0.370799
0.418636 0.120237 0.0322983 0.074527
0.259645 0.310816 0.23217 0.520595
Notice that n is of one dimension less, so it's size 1 in the 2nd dimension and thus applies column-wise. Scalars in broadcast are treated differently and are generally inlined into the fused broadcasting function which you cannot do with a mutable type, so the size 1 = expand in higher dimensions rule for broadcast is a nice way to implement this.

Killing a For loop in Julia array comprehension

I have the following line of code in Julia:
X=[(i,i^2) for i in 1:100 if i^2%5==0]
Basically, it returns a list of tuples (i,i^2) from i=1 to 100 if the remainder of i^2 and 5 is zero. What I want to do is, in the array comprehension, break out of the for loop if i^2 becomes larger than 1000. However, if I implement
X=[(i,i^2) for i in 1:100 if i^2%5==0 else break end]
I get the error: syntax: expected "]".
Is there any way to easily break out of this for loop inside the array? I've tried looking online, but nothing came up.
It's a "fake" for-loop, so you can't break it. Take a look at the lowered code below:
julia> foo() = [(i,i^2) for i in 1:100 if i^2%5==0]
foo (generic function with 1 method)
julia> #code_lowered foo()
LambdaInfo template for foo() at REPL[0]:1
:(begin
nothing
#1 = $(Expr(:new, :(Main.##1#3)))
SSAValue(0) = #1
#2 = $(Expr(:new, :(Main.##2#4)))
SSAValue(1) = #2
SSAValue(2) = (Main.colon)(1,100)
SSAValue(3) = (Base.Filter)(SSAValue(1),SSAValue(2))
SSAValue(4) = (Base.Generator)(SSAValue(0),SSAValue(3))
return (Base.collect)(SSAValue(4))
end)
The output shows that array comprehension is implemented via Base.Generator which takes an iterator as input. It only supports the [if cond(x)::Bool] "guard" for now, so there is no way to use break here.
For your specific case, a workaround is to use isqrt:
julia> X=[(i,i^2) for i in 1:isqrt(1000) if i^2%5==0]
6-element Array{Tuple{Int64,Int64},1}:
(5,25)
(10,100)
(15,225)
(20,400)
(25,625)
(30,900)
I don't think so. You could always just
tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
X=[(i,i^2) for i in 1:100 if tmp(i)]
Using a for loop is considered idiomatic in Julia and could be more readable in this instance. Also, it could be faster.
Specifically:
julia> using BenchmarkTools
julia> tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
julia> X1 = [(i,i^2) for i in 1:100 if tmp(i)];
julia> #btime [(i,i^2) for i in 1:100 if tmp(i)];
471.883 ns (7 allocations: 528 bytes)
julia> X2 = [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
julia> #btime [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
281.435 ns (7 allocations: 528 bytes)
julia> function goodsquares()
res = Vector{Tuple{Int,Int}}()
for i=1:100
if i^2%5==0 && i^2<=1000
push!(res,(i,i^2))
elseif i^2>1000
break
end
end
return res
end
julia> X3 = goodsquares();
julia> #btime goodsquares();
129.123 ns (3 allocations: 304 bytes)
So, another 2x improvement is nothing to disregard and the long function gives plenty of room for illuminating comments.

Array of anonymous functions working in julia 0.4, not in 0.5.1

I'm porting some code from Julia 0.4.7 to 0.5.1. I've noticed that there is something not compatible related to the array of anonymous functions. The code is here:
f = x::Array{Function} -> size(x)
# Option 1
f([k -> k+1, k-> k+1]) # This works in 0.4 & 0.5
# Option 2
f(repmat([k -> k+1], 2)) # This only works in 0.4
As far as I can see, the difference is although in 0.4 the anonymous array is still internally seen as Array{Function, 1}, in 0.5 it's seen like Array{#11#12, 1} (the numbers may change), so then it raises a MethodError thus they don't match.
Although the example is stupid it shows what I really need: to replicate an anonymous function a variable number of times.
Thanks!
In Julia 0.5+, Function becomes an abstract type, so Array{Function} is a parametric type which is invariant.
julia> typeof(x -> 2x)
##1#2
julia> typeof(x -> 2x) <: Function
true
julia> typeof([x -> 2x]) <: Array{Function}
false
As a result, the correct way to define f is:
f{T<:Function}(x::Array{T}) = size(x)
julia> f(repmat([k -> k+1], 2))
(2,)

Resources