When is `.=` more efficient than `=`?

Consider the following REPL lines using BenchmarkTools:
julia> N = 10^2; M = collect(reshape(1:N^2,N,N)); e = collect(1:N); # N=100
julia> @btime M[:,1] .= e; @btime M[:,1] = e;
1.211 μs (6 allocations: 128 bytes)
364.623 ns (1 allocation: 16 bytes)
julia> N = 10^3; M = collect(reshape(1:N^2,N,N)); e = collect(1:N); # N=1000
julia> @btime M[:,1] .= e; @btime M[:,1] = e;
1.511 μs (6 allocations: 128 bytes)
1.634 μs (1 allocation: 16 bytes)
julia> N = 10^4; M = collect(reshape(1:N^2,N,N)); e = collect(1:N); # N=10000
julia> @btime M[:,1] .= e; @btime M[:,1] = e;
3.514 μs (6 allocations: 128 bytes)
13.230 μs (1 allocation: 16 bytes)
It seems that .= is more efficient than =, but only for large N. I still do not understand very well what's happening under the hood and do not find explanations in the Julia documentation. When should I use one or the other?
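As a starting point, it may help to confirm that both forms write into M in place and give the same result, so any difference is purely one of speed. The sketch below is only an illustration: the copies A, B and C and the reversed vector e are names introduced here, not part of the question. Note also that the timings above use non-constant globals without `$` interpolation, which is a likely source of the extra allocations reported for `.=`.

```julia
N = 100
M = collect(reshape(1:N^2, N, N))
e = collect(N:-1:1)            # reversed, so the write is visible

A = copy(M)
A[:, 1] = e                    # lowers to setindex!(A, e, :, 1)

B = copy(M)
B[:, 1] .= e                   # broadcasts e into a view of column 1

C = copy(M)
copyto!(view(C, :, 1), e)      # explicit copy into the same view

println(A == B == C)
```

When benchmarking, writing `@btime $M[:, 1] .= $e` and `@btime $M[:, 1] = $e` (with BenchmarkTools loaded) should give a fairer comparison of the two forms.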

Related

Union of collection of sets in vector

If I have a vector of sets, say,
vec_of_sets = [Set(vec1), Set(vec2), ..., Set(vecp)]
how do I obtain a set equal to the union of sets in the vector? That is, how can I write the following efficiently?
S1 = Set(vec1);
union!(S1, Set(vec2))
union!(S1, Set(vec3))
...
union!(S1, Set(vecp))
I don't really know where to start!
Thanks in advance.
Edit: I have tried a solution using generating functions but it doesn't work:
union(j for j in vec_of_sets)

The best and fastest approach is:
Set(Iterators.flatten(vec_of_sets))
It is around twice as fast as the other approaches proposed in the other answer and makes less than half the memory allocations.
Here are some benchmarks:
julia> v = [Set(1:3), Set(2:6), Set(4:8)];
julia> @btime Set(Iterators.flatten($v));
270.492 ns (4 allocations: 400 bytes)
julia> @btime reduce(union, $v);
550.000 ns (11 allocations: 1.25 KiB)
julia> @btime union($v...);
506.250 ns (11 allocations: 944 bytes)
julia> @btime union((j for j in $v)...);
699.286 ns (15 allocations: 1.03 KiB)
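A quick way to check that these approaches agree is to compare their results on a small input. This is a minimal sketch using only Base; the variable names and the extra `foldl(union!, ...)` variant are additions for illustration and were not benchmarked above.

```julia
vec_of_sets = [Set(1:3), Set(2:6), Set(4:8)]

s_flat    = Set(Iterators.flatten(vec_of_sets))        # one pass over all elements
s_reduce  = reduce(union, vec_of_sets)                 # pairwise unions
s_splat   = union(vec_of_sets...)                      # splatting
s_inplace = foldl(union!, vec_of_sets; init=Set{Int}()) # grows one accumulator set

println(s_flat == s_reduce == s_splat == s_inplace)
```

The `foldl(union!, ...)` form mirrors the repeated `union!` calls in the question while leaving the input sets untouched, because the accumulator is a fresh empty set.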

I guess you should use reduce:
reduce(union, vec_of_sets)
but you could also use splatting (with ...):
union(vec_of_sets...)
FWIW, you could have used splatting with your attempt, too:
union((j for j in vec_of_sets)...)

Why is allocating an array of Union{T, Missing} an order of magnitude slower than an array of T?

Allocating an array of Union{T, Missing} is very expensive in Julia. Is there any workaround?
julia> @time Vector{Union{Missing, Int}}(undef, 10^7);
0.031052 seconds (2 allocations: 85.831 MiB)
julia> @time Vector{Union{Int}}(undef, 10^7);
0.000027 seconds (3 allocations: 76.294 MiB)

Because when you make a Union of Missing with a bits type such as Int, Julia has to set a flag for each entry recording that the vector initially stores missing there:
julia> Vector{Union{Missing, Int}}(undef, 10^7)
10000000-element Vector{Union{Missing, Int64}}:
missing
missing
⋮
missing
missing
If you use a non-bits type, such a flag does not have to be set for each entry, as you can see here:
julia> Vector{Union{Missing, String}}(undef, 10^7)
10000000-element Vector{Union{Missing, String}}:
#undef
#undef
⋮
#undef
#undef
and in consequence the performance is the same:
julia> @btime Vector{Union{String}}(undef, 10^7);
11.672 ms (3 allocations: 76.29 MiB)
julia> @btime Vector{Union{Missing, String}}(undef, 10^7);
11.480 ms (2 allocations: 76.29 MiB)
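The distinction between the two cases can be inspected directly. The sketch below is an illustration under the assumption that `Base.isbitsunion` (an internal, unexported helper) is available in your Julia version; the variable names are introduced here.

```julia
# Int is a bits type, String is not
println(isbitstype(Int))       # true
println(isbitstype(String))    # false

# Internal helper: is the union stored inline with a per-entry type flag?
inline_int = Base.isbitsunion(Union{Missing, Int})
inline_str = Base.isbitsunion(Union{Missing, String})

ints = Vector{Union{Missing, Int}}(undef, 4)
strs = Vector{Union{Missing, String}}(undef, 4)

# Inline entries are always "assigned"; pointer entries start out as #undef
assigned_int = isassigned(ints, 1)
assigned_str = isassigned(strs, 1)
```

Entries of `strs` must be assigned before they are read, whereas entries of `ints` can be read immediately.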

The difference is that union arrays get zero-initialized. You can see the code that decides this here:
https://github.com/JuliaLang/julia/blob/3f024fd0ab9e68b37d29fee6f2a9ab19819102c5/src/array.c#L191
This ends up as a call to memset:
https://github.com/JuliaLang/julia/blob/3f024fd0ab9e68b37d29fee6f2a9ab19819102c5/src/array.c#L144-L145
So as a check, we can compare zeros vs allocating the union array:
julia> @time Vector{Union{Missing, Int}}(undef, 10^7);
0.020609 seconds (2 allocations: 85.831 MiB)
julia> @time zeros(Int, 10^7);
0.018375 seconds (2 allocations: 76.294 MiB)
Quite comparable timings.
However, I don't think this performance difference should end up mattering in your application unless you have structured it in a quite strange way. There is very little work you can do with that array until the allocation time becomes insignificant. For example, just setting the values of the uninitialized array makes the timing vs the union array quite similar:
julia> function f()
a = Vector{Int}(undef, 10^7)
for i in eachindex(a)
a[i] = 1
end
a
end;
julia> function f_union()
a = Vector{Union{Missing, Int}}(undef, 10^7)
for i in eachindex(a)
a[i] = 1
end
a
end;
julia> @time f();
0.015566 seconds (2 allocations: 76.294 MiB)
julia> @time f_union();
0.026414 seconds (2 allocations: 85.831 MiB)

We had the same problem and as a workaround we used
x = Vector{Union{T,Missing}}(undef,1)
resize!(x, newlen)
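Wrapped in a function, the workaround might look like the sketch below. The helper name `alloc_union` is hypothetical, and whether this actually avoids the initialization cost on a given Julia version is an assumption that should be checked with a benchmark. The contents of the returned vector should be treated as unspecified until every entry has been written.

```julia
# Hypothetical helper: allocate a tiny union vector, then grow it with resize!
function alloc_union(::Type{T}, n::Integer) where {T}
    x = Vector{Union{T, Missing}}(undef, 1)
    resize!(x, n)
    return x
end

x = alloc_union(Int, 10^6)

# Fill every entry before reading anything back
for i in eachindex(x)
    x[i] = i
end
```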

Performance assigning and copying with StaticArrays.jl in Julia

I was thinking of using the package StaticArrays.jl to enhance the performance of my code. However, I only use arrays to store computed variables and use them later after certain conditions are set. Hence, I benchmarked the type SizedVector against a normal Vector, but I do not understand the results of the code below. I also tried SVector, using Setfield.jl as a workaround for setting elements.
using StaticArrays, BenchmarkTools, Setfield
function copySized(n::Int64)
v = SizedVector{n, Int64}(zeros(n))
w = Vector{Int64}(undef, n)
for i in eachindex(v)
v[i] = i
end
for i in eachindex(v)
w[i] = v[i]
end
end
function copyStatic(n::Int64)
v = @SVector zeros(n)
w = Vector{Int64}(undef, n)
for i in eachindex(v)
@set v[i] = i
end
for i in eachindex(v)
w[i] = v[i]
end
end
function copynormal(n::Int64)
v = zeros(n)
w = Vector{Int64}(undef, n)
for i in eachindex(v)
v[i] = i
end
for i in eachindex(v)
w[i] = v[i]
end
end
n = 10
@btime copySized($n)
@btime copyStatic($n)
@btime copynormal($n)
3.950 μs (42 allocations: 2.08 KiB)
5.417 μs (98 allocations: 4.64 KiB)
78.822 ns (2 allocations: 288 bytes)
Why does the case with SizedVector have so many more allocations and hence worse performance? Am I not using SizedVector correctly? Should it not at least have the same performance as normal arrays?
Thank you in advance.
Cross-posted on Julia Discourse.

I feel this is an apples-to-oranges comparison (and the size should be stored statically in the type). More illustrative code could look like this:
function copySized(::Val{n}) where n
v = SizedVector{n}(1:n)
w = Vector{Int64}(undef, n)
w .= v
end
function copyStatic(::Val{n}) where n
v = SVector{n}(1:n)
w = Vector{Int64}(undef, n)
w .= v
end
function copynormal(n)
v = [1:n;]
w = Vector{Int64}(undef, n)
w .= v
end
And now the benchmarks:
julia> n = 10
10
julia> @btime copySized(Val{$n}());
248.138 ns (1 allocation: 144 bytes)
julia> @btime copyStatic(Val{$n}());
251.507 ns (1 allocation: 144 bytes)
julia> @btime copynormal($n);
77.940 ns (2 allocations: 288 bytes)
julia> n = 1000
1000
julia> @btime copySized(Val{$n}());
840.000 ns (2 allocations: 7.95 KiB)
julia> @btime copyStatic(Val{$n}());
830.769 ns (2 allocations: 7.95 KiB)
julia> @btime copynormal($n);
1.100 μs (2 allocations: 15.88 KiB)
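The idea of storing the size in the type can be illustrated without StaticArrays, using plain tuples from Base. This is only a sketch of the `Val{n}` pattern used above; the function names `fill_static` and `fill_dynamic` are made up for this example.

```julia
# Size is a type parameter: the compiler sees N and the result type encodes it
fill_static(::Val{N}) where {N} = ntuple(i -> 2i, Val(N))

# Size is a runtime value: the result type says nothing about the length
fill_dynamic(n::Int) = [2i for i in 1:n]

t = fill_static(Val(3))
v = fill_dynamic(3)

println(typeof(t))   # the length 3 is part of the tuple type
println(typeof(v))   # Vector{Int64} on a 64-bit system, length not in the type
```

If `n` is only known at run time, constructing `Val(n)` itself requires a dynamic dispatch, which is one reason statically sized containers pay off mainly when the size is a compile-time constant.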

@phipsgabler is right! Statically sized arrays have their performance advantages when the size is known statically, at compile time. My arrays were, however, dynamically sized, with the size n being a runtime variable.
Changing this yields more sensible results:
using StaticArrays, BenchmarkTools, Setfield
function copySized()
v = SizedVector{10, Float64}(zeros(10))
w = Vector{Float64}(undef, 10*2)
for i in eachindex(v)
v[i] = rand()
end
for i in eachindex(v)
j = i+floor(Int64, 10/4)
w[j] = v[i]
end
end
function copyStatic()
v = @SVector zeros(10)
w = Vector{Int64}(undef, 10*2)
for i in eachindex(v)
@set v[i] = rand()
end
for i in eachindex(v)
j = i+floor(Int64, 10/4)
w[j] = v[i]
end
end
function copynormal()
v = zeros(10)
w = Vector{Float64}(undef, 10*2)
for i in eachindex(v)
v[i] = rand()
end
for i in eachindex(v)
j = i+floor(Int64, 10/4)
w[j] = v[i]
end
end
@btime copySized()
@btime copyStatic()
@btime copynormal()
110.162 ns (3 allocations: 512 bytes)
48.133 ns (1 allocation: 224 bytes)
92.045 ns (2 allocations: 368 bytes)

Skip every nth element of array

How can I remove every nth element from an array in julia? Let's say I have the following array: a = [1 2 3 4 5 6] and I want b = [1 2 4 5]
In javascript I would do something like:
b = a.filter(e => e % 3);
How can it be done in Julia?

Your question title and text ask different questions. The title asks how to skip the Nth element, whereas the Javascript code snippet details how to skip elements based on their value, not their index.
Skipping by Value
We can do this using filter.
filter((x) -> x % 3 != 0, a)
This is basically equivalent to your Javascript code. We can, incidentally, also use broadcasting.
a[a .% 3 .!= 0]
This is more akin to code you would see in array-oriented languages like MATLAB and R.
Skipping by Index
With an extra enumerate call, we can get the indices to operate on.
map((x) -> x[2], Iterators.filter(((x) -> x[1] % 3 != 0), enumerate(a)))
This is roughly what you'd do in Python. enumerate to get the indices, filter to purge, then map to eliminate the now-unnecessary indices.
Or we can, again, use broadcasting.
a[(1:length(a)) .% 3 .!= 0]
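To make the difference between the two interpretations concrete, here is a small sketch on a vector where filtering by value and by index give different answers. The sample data and variable names are chosen for illustration, and the `deleteat!` variant is an additional Base-only option not mentioned above.

```julia
a = [3, 1, 4, 1, 5, 9, 2, 6]

# By value: drop elements divisible by 3 (here 3, 9 and 6)
by_value = filter(x -> x % 3 != 0, a)

# By index: drop the elements at positions 3, 6, ...
by_index = a[(1:length(a)) .% 3 .!= 0]

# Same result by deleting positions from a copy, leaving a untouched
by_delete = deleteat!(copy(a), 3:3:length(a))

println(by_value)
println(by_index)
```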

If you need skipping by index the most elegant way is to use InvertedIndices
julia> using InvertedIndices # or using DataFrames
julia> a[Not(3:3:end)]
4-element Vector{Int64}:
1
2
4
5
As you can see all your job here is to provide a range of indices you wish to skip.

If you want to filter by the index, one convenient way is using a comprehension:
julia> a = 10:10:100;
julia> [a[i] for i in eachindex(a) if i % 3 != 0] |> permutedims
1×7 Matrix{Int64}:
10 20 40 50 70 80 100
julia> vec(ans) == [a[1 + 3(j-1)÷2] for j in 1:7]
true
This implicitly involves Iterators.filter, and collects the generator. You can also use this to filter by value, although the eager filter is probably more efficient:
julia> a = 1:10;
julia> [x for x in a if x%3!=0] |> permutedims
1×7 Matrix{Int64}:
1 2 4 5 7 8 10
Perhaps it's interesting to time all of these:
julia> using BenchmarkTools, InvertedIndices
julia> a = rand(1000); # filter by index
julia> i1 = @btime [$a[1 + 3(j-1)÷2] for j in 1:667];
373.162 ns (1 allocation: 5.38 KiB)
julia> i2 = @btime $a[eachindex($a) .% 3 .!= 0];
1.387 μs (4 allocations: 9.80 KiB)
julia> i3 = @btime [$a[i] for i in eachindex($a) if i % 3 != 0];
3.557 μs (11 allocations: 16.47 KiB)
julia> i4 = @btime map((x) -> x[2], Iterators.filter(((x) -> x[1] % 3 != 0), enumerate($a)));
4.202 μs (11 allocations: 16.47 KiB)
julia> i5 = @btime $a[Not(3:3:end)];
84.333 μs (4655 allocations: 182.28 KiB)
julia> i1 == i2 == i3 == i4 == i5
true
julia> a = rand(1:99, 1000); # filter by value
julia> v1 = @btime filter(x -> x%3!=0, $a);
532.185 ns (1 allocation: 7.94 KiB)
julia> v2 = @btime [x for x in $a if x%3!=0];
5.465 μs (11 allocations: 16.47 KiB)
julia> v1 == v2
true

This should help you:
b = a[Bool[i %3 != 0 for i = 1:length(a)]]

a[a .% 2 .!= 0]
please find the link with code.

Julia: A fast and elegant way to get a matrix from an array of arrays

There is an array of arrays containing more than 10,000 pairs of Float64 values. Something like this:
v = [[rand(),rand()], ..., [rand(),rand()]]
I want to get a matrix with two columns from it. It is possible to iterate over all pairs with a loop; it looks cumbersome, but gives the result in a fraction of a second:
x = Vector{Float64}()
y = Vector{Float64}()
for i = 1:length(v)
push!(x, v[i][1])
push!(y, v[i][2])
end
w = hcat(x,y)
The solution with permutedims(reshape(hcat(v...), (length(v[1]), length(v)))), which I found in this task, looks more elegant but completely hangs Julia, so that the session has to be restarted. Perhaps it was optimal six years ago, but now it does not work for large arrays. Is there a solution that is both compact and fast?

I hope this is short and efficient enough for you:
getindex.(v, [1 2])
and if you want something simpler to digest:
[v[i][j] for i in 1:length(v), j in 1:2]
Also the hcat solution could be written as:
permutedims(reshape(reduce(hcat, v), (length(v[1]), length(v))));
and it should not hang your Julia (please confirm - it works for me).
@Antonello: to understand why this works consider a simpler example:
julia> string.(["a", "b", "c"], [1 2])
3×2 Matrix{String}:
"a1" "a2"
"b1" "b2"
"c1" "c2"
I am broadcasting a column Vector ["a", "b", "c"] and a 1-row Matrix [1 2]. The point is that [1 2] is a Matrix. Thus broadcasting expands both rows (forced by the Vector) and columns (forced by the Matrix). For such expansion to happen it is crucial that the [1 2] Matrix has exactly one row. Is this clearer now?
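The same shape rule applied to the original problem can be checked on a tiny input. This is a minimal sketch with made-up data; each inner vector is treated as a single element by the broadcast, so entry (i, j) of the result is v[i][j].

```julia
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# 3-element Vector broadcast against a 1×2 Matrix gives a 3×2 Matrix
M = getindex.(v, [1 2])

# The reduce(hcat, ...) route builds a 2×3 Matrix first, then transposes it
M_hcat = permutedims(reduce(hcat, v))

println(size(M))
println(M == M_hcat)
```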

Your own example is pretty close to a good solution, but does some unnecessary work by creating two distinct vectors and repeatedly using push!. This solution is similar, but simpler. It is not as terse as the broadcasted getindex by @BogumilKaminski, but is faster:
function mat(v)
M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
for i in eachindex(v)
M[i, 1] = v[i][1]
M[i, 2] = v[i][2]
end
return M
end
You can simplify it a bit further, without losing performance, like this:
function mat_simpler(v)
M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
for (i, x) in pairs(v)
M[i, 1], M[i, 2] = x
end
return M
end

A benchmark of the various solutions posted so far...
using BenchmarkTools
# Creating the vector
v = [[i, i+0.1] for i in 0.1:0.2:2000]
M1 = @btime vcat([[e[1] e[2]] for e in $v]...)
M2 = @btime getindex.($v, [1 2])
M3 = @btime [v[i][j] for i in 1:length($v), j in 1:2]
M4 = @btime permutedims(reshape(reduce(hcat, $v), (length($v[1]), length($v))))
M5 = @btime permutedims(reshape(hcat($v...), (length($v[1]), length($v))))
function original(v)
x = Vector{Float64}()
y = Vector{Float64}()
for i = 1:length(v)
push!(x, v[i][1])
push!(y, v[i][2])
end
return hcat(x,y)
end
function mat(v)
M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
for i in eachindex(v)
M[i, 1] = v[i][1]
M[i, 2] = v[i][2]
end
return M
end
function mat_simpler(v)
M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
for (i, x) in pairs(v)
M[i, 1], M[i, 2] = x
end
return M
end
M6 = @btime original($v)
M7 = @btime mat($v)
M8 = @btime mat_simpler($v)
M1 == M2 == M3 == M4 == M5 == M6 == M7 == M8 # true
Output:
1.126 ms (10010 allocations: 1.53 MiB) # M1
54.161 μs (3 allocations: 156.42 KiB) # M2
809.000 μs (38983 allocations: 765.50 KiB) # M3
98.935 μs (4 allocations: 312.66 KiB) # M4
244.696 μs (10 allocations: 469.23 KiB) # M5
219.907 μs (30 allocations: 669.61 KiB) # M6
34.311 μs (2 allocations: 156.33 KiB) # M7
34.395 μs (2 allocations: 156.33 KiB) # M8
Note that the dollar sign in the benchmarked code is just to force @btime to consider the vector as a local variable.