Killing a For loop in Julia array comprehension - arrays

I have the following line of code in Julia:
X=[(i,i^2) for i in 1:100 if i^2%5==0]
Basically, it returns a list of tuples (i,i^2) from i=1 to 100 if the remainder of i^2 and 5 is zero. What I want to do is, in the array comprehension, break out of the for loop if i^2 becomes larger than 1000. However, if I implement
X=[(i,i^2) for i in 1:100 if i^2%5==0 else break end]
I get the error: syntax: expected "]".
Is there any way to easily break out of this for loop inside the array? I've tried looking online, but nothing came up.

It's a "fake" for-loop, so you can't break it. Take a look at the lowered code below:
julia> foo() = [(i,i^2) for i in 1:100 if i^2%5==0]
foo (generic function with 1 method)
julia> #code_lowered foo()
LambdaInfo template for foo() at REPL[0]:1
:(begin
nothing
#1 = $(Expr(:new, :(Main.##1#3)))
SSAValue(0) = #1
#2 = $(Expr(:new, :(Main.##2#4)))
SSAValue(1) = #2
SSAValue(2) = (Main.colon)(1,100)
SSAValue(3) = (Base.Filter)(SSAValue(1),SSAValue(2))
SSAValue(4) = (Base.Generator)(SSAValue(0),SSAValue(3))
return (Base.collect)(SSAValue(4))
end)
The output shows that array comprehension is implemented via Base.Generator which takes an iterator as input. It only supports the [if cond(x)::Bool] "guard" for now, so there is no way to use break here.
For your specific case, a workaround is to use isqrt:
julia> X=[(i,i^2) for i in 1:isqrt(1000) if i^2%5==0]
6-element Array{Tuple{Int64,Int64},1}:
(5,25)
(10,100)
(15,225)
(20,400)
(25,625)
(30,900)

I don't think so. You could always just
tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
X=[(i,i^2) for i in 1:100 if tmp(i)]

Using a for loop is considered idiomatic in Julia and could be more readable in this instance. Also, it could be faster.
Specifically:
julia> using BenchmarkTools
julia> tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
julia> X1 = [(i,i^2) for i in 1:100 if tmp(i)];
julia> #btime [(i,i^2) for i in 1:100 if tmp(i)];
471.883 ns (7 allocations: 528 bytes)
julia> X2 = [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
julia> #btime [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
281.435 ns (7 allocations: 528 bytes)
julia> function goodsquares()
res = Vector{Tuple{Int,Int}}()
for i=1:100
if i^2%5==0 && i^2<=1000
push!(res,(i,i^2))
elseif i^2>1000
break
end
end
return res
end
julia> X3 = goodsquares();
julia> #btime goodsquares();
129.123 ns (3 allocations: 304 bytes)
So, another 2x improvement is nothing to disregard and the long function gives plenty of room for illuminating comments.

Related

Repeated membership tests in a loop

Consider this function
using Distributions
using StatsBase
test_vector = sample([1,2,3], 100000000)
function test_1( test_vector)
rand_vector = randn(3333)
sum = 0.0
for t in 1:1000
if t in test_vector
sum = sum +rand_vector[t]
else
sum = sum - rand_vector[t]
end
end
end
I applied #profview to understand performance and it turns out most of the time is spent on if t in test_vector. Is there a way to speed up this part of the program? I thought about excluding test_vector from 1:1000 and run two loops, but this creates memory allocation. Can I get a hint?
P.S. I intend to let the user pass in any test_vector. I'm using sample to create a test_vector just for illustration.
If the vector is large, changing the check for element membership will be faster if you create a Set for the check:
using Distributions
using StatsBase
using BenchmarkTools
test_vector = sample([1,2,3], 1000000)
function test_1(test_vector)
rand_vector = randn(3333)
sum1 = 0.0
for t in 1:1000
if t in test_vector
sum1 = sum1 + rand_vector[t]
else
sum1 = sum1 - rand_vector[t]
end
end
return sum1
end
function test_1_set(test_vector)
rand_vector = randn(3333)
test_set = Set(test_vector)
sum2 = 0.0
for t in 1:1000
if t in test_set
sum2 += rand_vector[t]
else
sum2 -= rand_vector[t]
end
end
return sum2
end
#btime test_1(test_vector)
#btime test_1_set(test_vector)
677.818 ms (3 allocations: 26.12 KiB)
8.795 ms (10 allocations: 18.03 MiB)
Bill's version is the fastest if you don't count the construction of the set:
julia> test_vector = rand(1:3, 10000);
julia> rand_vector = randn(3333);
julia> range = 1:1000;
julia> #btime test_1($(Set(test_vector)), $rand_vector, $range)
3.692 μs (0 allocations: 0 bytes)
14.82533505498519
However, the set creation itself is more costly, especially in terms of memory:
julia> #btime test_1(Set($test_vector), $rand_vector, $range)
52.731 μs (7 allocations: 144.59 KiB)
14.82533505498519
Here's a variant more optimized for memory:
julia> function test_2(xs, ys, range)
range = Set(range)
positive = intersect(range, xs)
negative = setdiff!(range, positive)
return sum(ys[i] for i in positive) - sum(ys[i] for i in negative)
end
test_2 (generic function with 1 method)
julia> #btime test_2($test_vector, $rand_vector, $range)
96.020 μs (11 allocations: 18.98 KiB)
14.825335054985187
Use a Set. They have O(1) lookup.

Is there a way to turn a array into a integer or float?

I'm trying to change an array with int into a single int in Julia 1.5.4 like that:
x = [1,2,3]
Here i would try or use a code/command (here: example())
x_new = example(x)
println(x_new)
typeof(x_new)
Ideal output would be something like this :
123
Int32
I already tried to solve this problem with parse() or push!() or something like this. But nothing worked well.
I couldn't find a similar problem...
You can find an issue about adding this functionality to Julia here: https://github.com/JuliaLang/julia/issues/40393
Bottom line, you don't want to use strings, and you should avoid unnecessary exponentiation, both of which will be really slow.
A very brief solution is
evalpoly(10, reverse([1,2,3]))
Spelling it out a bit more, you can do this
function joindigits(xs)
val = 0
for x in xs
val = 10*val + x
end
return val
end
Is this what you need?
julia> x = [1,2,3]
3-element Vector{Int64}:
1
2
3
julia> list2int(x) = sum(10 .^ (length(x)-1:-1:0) .* x)
list2int (generic function with 1 method)
julia> list2int(x)
123
You are looking for string concatenation and then parsing:
x_new = parse(Int64, string(x...))
Another interesting way to convert many small numbers to a bigger one is to combine raw bytes:
julia> reinterpret(Int16, [Int8(2),Int8(3)])
1-element reinterpret(Int16, ::Vector{Int8}):
770
Note that 770 = 256*3 + 2
Or for actual Ints:
julia> reinterpret(Int128, [10,1])
1-element reinterpret(Int128, ::Vector{Int64}):
18446744073709551626
(note that result is exactly Int128(2)^64+10)

Reducer for non-parallel for loops/multiline comprehensions

Julia has a parallel macro for for loops, which allows things like:
s = #sync #parallel vcat for i in 1:9
k = iseven(i) ? i÷2 : 3i+1
k^2
end
and since the reducer specified is vcat, we get back an array of numbers.
Is it possible to do something like this with a normal for loop (without having to explicitly initialize and push! into the array)?
Since I'm only looking to reduce using vcat, another way to ask this question is: is there a neat readable multiline form of array comprehensions? It's possible to stretch to usual comprehension syntax like this:
s = [
(k = iseven(i) ? i÷2 : 3i+1;
k^2)
for i in 1:9
]
but that seems messy and less readable compared to the #parallel vcat for syntax. Is there a better way of doing multiline comprehensions?
Extending on #Gnimuc's answer, I think mapreduce plus do-syntax is pretty nice:
julia> mapreduce(vcat, 1:9) do i
k = iseven(i) ? i÷2 : 3i+1
k^2
end
9-element Array{Int64,1}:
16
1
100
4
256
9
484
16
784
The short answer is to write multiline functions(or do-blocks as #phg reminds) with a single line array comprehension or map/mapreduce:
s = [
(k = iseven(i) ? i÷2 : 3i+1;
k^2)
for i in 1:9
]
This example is pure comprehension, no reducer is involved. Array comprehension is usually written in one line, for example, s = [iseven(i) ? i÷2 : 3i+1 |> x->x^2 for i in 1:9]. As #phg suggested, multi-line functions can be enclosed in a do-block:
julia> map(1:9) do x
k = iseven(x) ? x÷2 : 3x+1
k^2
end
However, no reducer such as vcat is needed in this case, but if the output of f in the above example is a vector:
julia> function f(x)
k = iseven(x) ? x÷2 : 3x+1
[k^2]
end
f (generic function with 1 method)
julia> s = [f(i) for i in 1:9]
9-element Array{Array{Int64,1},1}:
[16]
[1]
[100]
[4]
[256]
[9]
[484]
[16]
[784]
array comprehension will give you an array of vectors. This time you need to use mapreduce instead:
julia> mapreduce(f, vcat, 1:9)
9-element Array{Int64,1}:
16
1
100
4
256
9
484
16
784

How can the speed of nested arrays in Julia be improved?

The following function nested_arrays generates (surprisingly) a nested array of "depth" n. However, when running with even small values of n (2, 3, etc.), it takes a reasonably long time to run and display the output.
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> nested_arrays(1)
1-element Array{Int64,1}:
1
julia> nested_arrays(2)
1-element Array{Array{Int64,1},1}:
[1]
julia> nested_arrays(3)
1-element Array{Array{Array{Int64,1},1},1}:
Array{Int64,1}[[1]]
julia> nested_arrays(10)
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]
Interestingly, when using the #time macro or a ; at the end of the line, the result is taking relatively little of the time to calculate. Instead, the actual displaying of the result in the REPL takes the majority of the time.
This strange behavior is not shown in, for example, Python.
In [1]: def nested_lists(n):
...: if n == 1:
...: return [1]
...: return [nested_lists(n - 1)]
...:
In [2]: nested_lists(10)
Out[2]: [[[[[[[[[[1]]]]]]]]]]
In [3]: %time nested_lists(100)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 37.7 µs
Out[3]: [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
Why is this function so slow in Julia? Is Julia recompiling the display function for different types T in Array{T, 1}? If so, why is this?
Can the speed of this code be improved, or should this just not be done in Julia? My main concern for this in a practical sense would be, for example, loading a complex, nested JSON file, where simply using an n-dimensional array would not be possible.
Yes, this is entirely due to compilation time. You can see this by #time-ing the display. The second time you display it is fast:
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> #time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
11.682721 seconds (8.83 M allocations: 371.698 MB, 1.82% gc time)
julia> #time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
0.001688 seconds (2.38 k allocations: 102.766 KB)
So why is this so slow? The display here recursively walks through all the arrays and prints them nested inside each other. This is recursively calling show with 14 different types — one with 14 nested arrays, then its element with 13 nested arrays, then its element with 12… and so on! Each of those show methods gets independently compiled. Compiling specialized methods for specific element types is a key part of how Julia is able to produce very efficient code. It means that it's able to specialize every single operation done on each element without any runtime type checking or dispatch. Unfortunately in this case, it gets in the way.
You can work around this with an Any[] array; in the context of a JSON file this makes quite a lot of sense since you don't know if it'll contain strings or arrays or numbers, etc. This is much faster since it only needs to compile the show method for an Any[] array once, and then it recursively uses it.
# new session
julia> nested_arrays(n) = n == 1 ? Any[1] : Any[nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> #time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
1.571632 seconds (767.12 k allocations: 32.472 MB, 1.04% gc time)
julia> #time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
0.000606 seconds (839 allocations: 30.859 KB)
julia> #time display(nested_arrays(100));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
0.002523 seconds (17.76 k allocations: 579.297 KB)

Julia: does an Array contain a specific sub-array

In julia we can check if an array contains a value, like so:
> 6 in [4,6,5]
true
However this returns false, when attempting to check for a sub-array in a specific order:
> [4,6] in [4,6,5]
false
What is the correct syntax to verify if a specific sub-array exists in an array?
I think it is worth mentioning that in Julia 1.0 you have the function issubset
> issubset([4,6], [4,6,5])
true
You can also quite conveniently call it using the \subseteq latex symbol
> [4,6] ⊆ [4,6,5]
true
This looks pretty optimized to me:
> using Random
> x, y = randperm(10^3)[1:10^2], randperm(10^3);
> #btime issubset(x, y);
16.153 μs (12 allocations: 45.96 KiB)
It takes a little bit of code to make a function that performs well, but this is much faster than the issubvec version above:
function subset2(x,y)
lenx = length(x)
first = x[1]
if lenx == 1
return findnext(y, first, 1) != 0
end
leny = length(y)
lim = length(y) - length(x) + 1
cur = 1
while (cur = findnext(y, first, cur)) != 0
cur > lim && break
beg = cur
#inbounds for i = 2:lenx
y[beg += 1] != x[i] && (beg = 0 ; break)
end
beg != 0 && return true
cur += 1
end
false
end
Note: it would also be much more useful if the function actually returned the position of the beginning of the subarray if found, or 0 if not, similarly to the findfirst/findnext functions.
Timing information (the second one is using my subset2 function):
0.005273 seconds (65.70 k allocations: 4.073 MB)
0.000086 seconds (4 allocations: 160 bytes)
For the third condition i.e. vector [4,6] appears as a sub-vector of 4,6,5 the following function is suggested:
issubvec(v,big) =
any([v == slice(big,i:(i+length(v)-1)) for i=1:(length(big)-length(v)+1)])
For the second condition, that is, give a boolean for each element in els vectors which appears in set vector, the following is suggested:
function vecin(els,set)
res = zeros(Bool,size(els))
res[findin(els,set)]=true
res
end
With the vector in the OP, these result in:
julia> vecin([4,6],[4,6,5])
2-element Array{Bool,1}:
true
true
julia> issubvec([4,6],[4,6,5])
true
note that you can now vectorize in with a dot:
julia> in([4,6,5]).([4, 6])
2-element BitArray{1}:
true
true
and chain with all to get your answer:
julia> all(in([4,6,5]).([4, 6]))
true
I used this recently to find subsequences in arrays of integers. It's not as good or as fast as #scott's subset2(x,y)... but it returns the indices.
function findsequence(arr::Array{Int64}, seq::Array{Int64})
indices = Int64[]
i = 1
n = length(seq)
if n == 1
while true
occurrence = findnext(arr, seq[1], i)
if occurrence == 0
break
else
push!(indices, occurrence)
i = occurrence +1
end
end
else
while true
occurrence = Base._searchindex(arr, seq, i)
if occurrence == 0
break
else
push!(indices, occurrence)
i = occurrence +1
end
end
end
return indices
end
julia> #time findsequence(rand(1:9, 1000), [2,3])
0.000036 seconds (29 allocations: 8.766 KB)
16-element Array{Int64,1}:
80
118
138
158
234
243
409
470
539
589
619
629
645
666
762
856
Here is a more up-to-date implementation using findall
function issubsequence(A, B)
B1inA = findall(isequal(B[1]), A) # indices of the first element of b occuring in a
matchFirstIndex = [] # Saves the first index in A of the occurances
for i in B1inA
if length(A[i:end]) < length(B) continue end
if A[i:i + length(B) - 1] == B
push!(matchFirstIndex, i)
end
end
return matchFirstIndex
end
I get a similar runtime to #daycaster
#time issubsequence(rand(1:9, 1000), [2,3])
0.000038 seconds (111 allocations: 20.844 KiB)
7-element Vector{Any}:
57
427
616
644
771
855
982
There is no standard Julia library function to determine if a particular sequence occurs as a subsequence of another. Probably this is because this is actually a fairly trickly problem (known as the String-searching algorithm) and quite how to do it depends on whether you'll be searching repeatedly, whether you want to do multiple matches, have multiple patterns, want fuzzy matches, etc.
The other answers here give reasonable answers but some are old and Julia has improved, and I wanted to offer a slightly more idiomatic solution.
function issubarray(needle, haystack)
getView(vec, i, len) = view(vec, i:i+len-1)
ithview(i) = getView(haystack, i, length(needle))
return any(i -> ithview(i) == needle, 1:length(haystack)-length(needle)+1)
end
This is lightening fast and requires almost no memory - Julia's view is lightweight and efficient. And, as always with Julia, the solution is generally to simply define more functions.

Resources