Union of collection of sets in vector - arrays

If I have a vector of sets, say,
vec_of_sets = [Set(vec1), Set(vec2), ..., Set(vecp)]
how do I obtain a set equal to the union of sets in the vector? That is, how can I write the following efficiently?
S1 = Set(vec1);
union!(S1, Set(vec2))
union!(S1, Set(vec3))
...
union!(S1, Set(vecp))
I don't really know where to start!
Thanks in advance.
Edit: I have tried a solution using a generator, but it doesn't work:
union(j for j in vec_of_sets)

The best and fastest approach is:
Set(Iterators.flatten(vec_of_sets))
It is around twice as fast as the other approaches proposed in the other answer and makes fewer than half as many memory allocations.
Here are some benchmarks:
julia> v = [Set(1:3), Set(2:6), Set(4:8)];
julia> @btime Set(Iterators.flatten($v));
270.492 ns (4 allocations: 400 bytes)
julia> @btime reduce(union, $v);
550.000 ns (11 allocations: 1.25 KiB)
julia> @btime union($v...);
506.250 ns (11 allocations: 944 bytes)
julia> @btime union((j for j in $v)...);
699.286 ns (15 allocations: 1.03 KiB)

I guess you should use reduce:
reduce(union, vec_of_sets)
but you could also use splatting (with ...):
union(vec_of_sets...)
FWIW, you could have used splatting with your attempt, too:
union((j for j in vec_of_sets)...)
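As a quick sanity check, the approaches above can be compared side by side. This is only a sketch on made-up sample data. The `foldl(union!, ...)` variant does not come from the answers; it is an extra alternative, added here as an assumption, that folds everything into a fresh accumulator and returns the empty set when the vector is empty.

```julia
# Sample data (illustrative only)
vec_of_sets = [Set(1:3), Set(2:6), Set(4:8)]

u_reduce  = reduce(union, vec_of_sets)            # pairwise union
u_splat   = union(vec_of_sets...)                 # splatting
u_flatten = Set(Iterators.flatten(vec_of_sets))   # one pass over all elements

# Not from the answers above: fold into a fresh accumulator in place.
# Because of `init`, the input sets themselves are never mutated.
u_foldl = foldl(union!, vec_of_sets; init = Set{Int}())

@assert u_reduce == u_splat == u_flatten == u_foldl == Set(1:8)
```

All four give the same set; which one is fastest on real data is best checked with `@btime` on inputs of realistic size.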

How to convert string to array with no spaces

Related:
How to convert from string to array?
This is a follow-up question. How would I make a list of all the digits in this number (currently as a string)?
"123" -> [1,2,3]
There are no delimiters here so how should I go about doing this?
Note: I am using the latest version of Julia, v1.8.3, and parse doesn't seem to work as in the other question's answers. This is the error I get when I use parse():
ERROR: LoadError: MethodError: no method matching parse(::SubString{String})
Closest candidates are:
parse(::Type{T}, ::AbstractString) where T<:Complex at parse.jl:381
parse(::Type{Sockets.IPAddr}, ::AbstractString) at ~/usr/share/julia/stdlib/v1.8/Sockets/src/IPAddr.jl:246
parse(::Type{T}, ::AbstractChar; base) where T<:Integer at parse.jl:40
...
Stacktrace:
[1] iterate
@ ./generator.jl:47 [inlined]
[2] _collect
@ ./array.jl:807 [inlined]
[3] collect_similar
@ ./array.jl:716 [inlined]
[4] map
@ ./abstractarray.jl:2933 [inlined]
[5] top-level scope
@ ~/proc/self/fd/0:1
in expression starting at /proc/self/fd/0:1
exit status 1
Easy peasy like this:
function str2vec(s::String)
    return map(x->parse(Int,x), split(s,""))
end
julia> str2vec("124")
3-element Vector{Int64}:
1
2
4
Or by broadcasting:
julia> parse.(Int, split("124",""))
3-element Vector{Int64}:
1
2
4
By piping functions:
julia> "124" |> x->split(x, "") |> x->parse.(Int, x)
3-element Vector{Int64}:
1
2
4
Utilizing the eachsplit function, which is a lazy function and returns a generator object (introduced in Julia 1.8):
julia> eachsplit("124", "") |> x->parse.(Int, x)
3-element Vector{Int64}:
1
2
4
Following Dan's advice, you can try other ways:
Using the Int8 on the collected chars:
julia> Int8.(collect("124")).-48
3-element Vector{Int64}:
1
2
4
Using the Iterators.map:
julia> collect(Iterators.map(x->Int8(x)-48,"124"))
3-element Vector{Int64}:
1
2
4
Also, one can consider DNF's proposal:
julia> [Int(x)-48 for x in "124"]
3-element Vector{Int64}:
1
2
4
Benchmarking
julia> using BenchmarkTools
julia> @btime str2vec("124");
@btime parse.(Int, split("124",""));
@btime "124" |> x->split(x, "") |> x->parse.(Int, x);
@btime eachsplit("124", "") |> x->parse.(Int, x);
@btime Int8.(collect("124")).-48;
@btime collect(Iterators.map(x->Int8(x)-48,"123"));
@btime [Int(x)-48 for x in "123"]
681.250 ns (11 allocations: 864 bytes)
675.460 ns (11 allocations: 864 bytes)
679.747 ns (11 allocations: 864 bytes)
1.280 μs (14 allocations: 816 bytes)
92.412 ns (2 allocations: 160 bytes)
61.711 ns (1 allocation: 80 bytes)
45.152 ns (1 allocation: 80 bytes)
You can also use the inbuilt digits function.
By default, it returns the digits last-to-first:
julia> digits(parse(Int, "1234"))
4-element Vector{Int64}:
4
3
2
1
You can reverse! the result if you want them in the same order as in the string:
julia> digits(parse(Int, "1234")) |> reverse!
4-element Vector{Int64}:
1
2
3
4
This runs much faster than parsing each digit individually. The Int8(...) .- 48 method is still faster, but it fails silently if the input string happens to be invalid, which could be dangerous further down the line. Since we're using parse here, this method reports the error correctly in such cases.
julia> Int8.(collect("invalid")).-48
7-element Vector{Int64}:
57
62
70
49
60
57
52
julia> digits(parse(Int, "invalid")) |> reverse!
ERROR: ArgumentError: invalid base 10 digit 'i' in "invalid"
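One thing to keep in mind with the integer round trip is that leading zeros do not survive it. The sketch below compares it with per-character parsing; the helper names `via_digits` and `via_chars` are made up for illustration.

```julia
# Digit extraction via an integer round trip (as in the answer above)
via_digits(s) = reverse!(digits(parse(Int, s)))

# Per-character parsing, for comparison
via_chars(s) = [parse(Int, c) for c in s]

via_digits("1234")  # [1, 2, 3, 4]
via_chars("1234")   # [1, 2, 3, 4]

# Leading zeros are lost in the integer round trip:
via_digits("007")   # [7]
via_chars("007")    # [0, 0, 7]
```

The round trip is also limited to numbers that fit in an Int, so very long digit strings would need something like `parse(BigInt, s)` instead.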
Both other answers are very good, but they have forgotten about comprehensions. Using a comprehension gives both the fastest safe solution, and the absolute fastest solution, tied with the Iterators.map.
Fastest unsafe (based on the answer by @Shayan with input from @DanGetz):
julia> @btime [Int(c)-48 for c in "123"]
34.372 ns (1 allocation: 80 bytes)
3-element Vector{Int64}:
1
2
3
The above will silently return the wrong answer for invalid inputs, as noted by @SundarR.
Here's an even nicer and more intuitive version of the above, which is the same under the hood:
[c - '0' for c in "123"]
It works because Int('0') equals 48, and subtraction of Chars yields an Int.
Fastest safe solution (based on @SundarR's answer):
julia> @btime [parse(Int, c) for c in "123"]
47.822 ns (1 allocation: 80 bytes)
3-element Vector{Int64}:
1
2
3
julia> [parse(Int, c) for c in "invalid"]
ERROR: ArgumentError: invalid base 10 digit 'i'
I would probably recommend the latter in most cases.
One more thing you may or may not be aware of: You can create a generator instead of a vector, in case you don't actually need the vector itself, but want to iterate over the converted numbers for some other purpose. The syntax is almost identical to an array comprehension, just use () instead:
g = (parse(Int, c) for c in "123")
for val in g
    println(val, " squared equals ", val^2)
end
1 squared equals 1
2 squared equals 4
3 squared equals 9
This will not allocate an intermediate temporary vector, and creating the generator is essentially free:
julia> @btime (parse(Int, c) for c in "123")
1.900 ns (0 allocations: 0 bytes)
The computational cost is paid during iteration instead. This is similar to using Iterators.map without collect, but arguably has nicer syntax.
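If you want the cheap Char arithmetic but still need an error on bad input, one option is to validate first and convert afterwards. This is only a sketch: `checked_digits` is a hypothetical helper, not something from the answers, and it has not been benchmarked against the versions above.

```julia
# Hypothetical helper: validate once, then use cheap Char arithmetic.
# `isdigit` accepts only the ASCII characters '0' through '9'.
function checked_digits(s::AbstractString)
    all(isdigit, s) || throw(ArgumentError("non-digit character in $(repr(s))"))
    return [c - '0' for c in s]
end

checked_digits("124")       # [1, 2, 4]

# Lazy variant: nothing is converted until the generator is iterated.
lazy = (c - '0' for c in "124")
total = sum(lazy)           # 7
```

Calling `checked_digits("12a")` throws an ArgumentError instead of returning a wrong number.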

Why is allocating an array of Union{T, Missing} an order of magnitude slower than an array of T?

Allocating an array of Union{T, Missing} is very expensive in Julia. Is there any workaround?
julia> @time Vector{Union{Missing, Int}}(undef, 10^7);
0.031052 seconds (2 allocations: 85.831 MiB)
julia> @time Vector{Union{Int}}(undef, 10^7);
0.000027 seconds (3 allocations: 76.294 MiB)
Because if you make a Union of Missing with a bits type like Int, Julia sets a flag for each entry so that such a vector initially stores missing in all of its entries:
julia> Vector{Union{Missing, Int}}(undef, 10^7)
10000000-element Vector{Union{Missing, Int64}}:
missing
missing
⋮
missing
missing
If you use a non-bits type, no such flag has to be set for each entry, as you can see here:
julia> Vector{Union{Missing, String}}(undef, 10^7)
10000000-element Vector{Union{Missing, String}}:
#undef
#undef
⋮
#undef
#undef
and in consequence the performance is the same:
julia> @btime Vector{Union{String}}(undef, 10^7);
11.672 ms (3 allocations: 76.29 MiB)
julia> @btime Vector{Union{Missing, String}}(undef, 10^7);
11.480 ms (2 allocations: 76.29 MiB)
The difference is that union arrays get zero-initialized. You can see the code that decides this here:
https://github.com/JuliaLang/julia/blob/3f024fd0ab9e68b37d29fee6f2a9ab19819102c5/src/array.c#L191
This ends up as a call to memset:
https://github.com/JuliaLang/julia/blob/3f024fd0ab9e68b37d29fee6f2a9ab19819102c5/src/array.c#L144-L145
So as a check, we can compare zeros vs allocating the union array:
julia> @time Vector{Union{Missing, Int}}(undef, 10^7);
0.020609 seconds (2 allocations: 85.831 MiB)
julia> @time zeros(Int, 10^7);
0.018375 seconds (2 allocations: 76.294 MiB)
Quite comparable timings.
However, I don't think this performance difference should end up mattering in your application unless you have structured it in a quite strange way. It takes very little work on that array before the allocation time becomes insignificant. For example, just setting the values of the uninitialized array makes its timing quite similar to that of the union array:
julia> function f()
           a = Vector{Int}(undef, 10^7)
           for i in eachindex(a)
               a[i] = 1
           end
           a
       end;
julia> function f_union()
           a = Vector{Union{Missing, Int}}(undef, 10^7)
           for i in eachindex(a)
               a[i] = 1
           end
           a
       end;
julia> @time f();
0.015566 seconds (2 allocations: 76.294 MiB)
julia> @time f_union();
0.026414 seconds (2 allocations: 85.831 MiB)
We had the same problem and as a workaround we used
x = Vector{Union{T,Missing}}(undef,1)
resize!(x, newlen)
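Wrapped into a function, the workaround might look like the sketch below. `alloc_union` is a hypothetical name, and whether this is actually faster than the plain constructor depends on the Julia version, so it is worth measuring with `@btime` before relying on it. The sketch makes no assumption about what the new entries contain after `resize!`, which is why it fills the vector before reading from it.

```julia
# Hypothetical helper wrapping the resize! workaround shown above.
function alloc_union(::Type{T}, n::Integer) where {T}
    x = Vector{Union{T, Missing}}(undef, 1)
    resize!(x, n)   # grow to the requested length
    return x        # treat the contents as unspecified; fill before reading
end

x = alloc_union(Int, 10^6)
fill!(x, missing)   # give every entry a defined value
x[1] = 42
```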

Skip every nth element of array

How can I remove every nth element from an array in Julia? Let's say I have the following array: a = [1 2 3 4 5 6] and I want b = [1 2 4 5]
In JavaScript I would do something like:
b = a.filter(e => e % 3);
How can it be done in Julia?
Your question title and text ask different questions. The title asks how to skip every Nth element, whereas the JavaScript code snippet skips elements based on their value, not their index.
Skipping by Value
We can do this using filter.
filter((x) -> x % 3 != 0, a)
This is basically equivalent to your Javascript code. We can, incidentally, also use broadcasting.
a[a .% 3 .!= 0]
This is more akin to code you would see in array-oriented languages like MATLAB and R.
Skipping by Index
With an extra enumerate call, we can get the indices to operate on.
map((x) -> x[2], Iterators.filter(((x) -> x[1] % 3 != 0), enumerate(a)))
This is roughly what you'd do in Python. enumerate to get the indices, filter to purge, then map to eliminate the now-unnecessary indices.
Or we can, again, use broadcasting.
a[(1:length(a)) .% 3 .!= 0]
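The index-based variants can be packaged as small reusable functions. The names `skip_every` and `skip_every_del` are made up for this sketch, and the `deleteat!` version is an additional alternative that the answer above does not mention.

```julia
# Hypothetical helper: keep elements whose position is not a multiple of n.
skip_every(a::AbstractVector, n::Integer) =
    [x for (i, x) in enumerate(a) if i % n != 0]

# Alternative: copy the vector, then delete positions n, 2n, 3n, ...
skip_every_del(a::Vector, n::Integer) = deleteat!(copy(a), n:n:length(a))

a = [10, 20, 30, 40, 50, 60]
skip_every(a, 3)      # [10, 20, 40, 50]
skip_every_del(a, 3)  # [10, 20, 40, 50]
```

Using values that differ from their positions (10, 20, ...) makes it clear that the filter acts on the index and not on the value.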
If you need to skip by index, the most elegant way is to use InvertedIndices:
julia> using InvertedIndices # or using DataFrames
julia> a[Not(3:3:end)]
4-element Vector{Int64}:
1
2
4
5
As you can see, all you have to do here is provide the range of indices you wish to skip.
If you want to filter by the index, one convenient way is using a comprehension:
julia> a = 10:10:100;
julia> [a[i] for i in eachindex(a) if i % 3 != 0] |> permutedims
1×7 Matrix{Int64}:
10 20 40 50 70 80 100
julia> vec(ans) == [a[1 + 3(j-1)÷2] for j in 1:7]
true
This implicitly involves Iterators.filter, and collects the generator. You can also use this to filter by value, although the eager filter is probably more efficient:
julia> a = 1:10;
julia> [x for x in a if x%3!=0] |> permutedims
1×7 Matrix{Int64}:
1 2 4 5 7 8 10
Perhaps it's interesting to time all of these:
julia> using BenchmarkTools, InvertedIndices
julia> a = rand(1000); # filter by index
julia> i1 = @btime [$a[1 + 3(j-1)÷2] for j in 1:667];
373.162 ns (1 allocation: 5.38 KiB)
julia> i2 = @btime $a[eachindex($a) .% 3 .!= 0];
1.387 μs (4 allocations: 9.80 KiB)
julia> i3 = @btime [$a[i] for i in eachindex($a) if i % 3 != 0];
3.557 μs (11 allocations: 16.47 KiB)
julia> i4 = @btime map((x) -> x[2], Iterators.filter(((x) -> x[1] % 3 != 0), enumerate($a)));
4.202 μs (11 allocations: 16.47 KiB)
julia> i5 = @btime $a[Not(3:3:end)];
84.333 μs (4655 allocations: 182.28 KiB)
julia> i1 == i2 == i3 == i4 == i5
true
julia> a = rand(1:99, 1000); # filter by value
julia> v1 = @btime filter(x -> x%3!=0, $a);
532.185 ns (1 allocation: 7.94 KiB)
julia> v2 = @btime [x for x in $a if x%3!=0];
5.465 μs (11 allocations: 16.47 KiB)
julia> v1 == v2
true
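The fastest index-based variant in the timings relies on a closed-form index, 1 + 3(j-1)÷2, which is specific to skipping every 3rd element. A possible generalisation to any n ≥ 2 is sketched below; the formula and the names `kept_index` and `skip_every_closed` are my own derivation rather than part of the answer, and the code assumes a 1-based vector.

```julia
# Position of the j-th kept element when every n-th element is skipped (n >= 2).
# Each block of n positions keeps n - 1 of them, so the j-th kept element
# has been shifted by one for every complete block of n - 1 kept elements.
# For n = 3 this equals the 1 + 3(j-1)÷2 used above.
kept_index(j, n) = j + (j - 1) ÷ (n - 1)

function skip_every_closed(a::AbstractVector, n::Integer)
    n >= 2 || throw(ArgumentError("n must be at least 2"))
    L = length(a)
    # L ÷ n elements are dropped, so L - L ÷ n remain
    return [a[kept_index(j, n)] for j in 1:(L - L ÷ n)]
end

a = collect(10:10:100)
skip_every_closed(a, 3) == a[eachindex(a) .% 3 .!= 0]   # true
```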
This should help you:
b = a[Bool[i %3 != 0 for i = 1:length(a)]]
a[a .% 2 .!= 0]
please find the link with code.

Concatenate Multidimensional arrays in Julia

I have an array of multi-dim arrays Array{Array{Float64,3},1} and what I want is a single 4 dimensional array Array{Float64,4}.
I have gone through the other responses
concatenate array in julia
Concatenating arrays in Julia
Multidimensional Array Comprehension in Julia
But no combination of cat and reshape seems to do the trick.
There must be a good idiomatic way... what is it?
Your answer is correct and generic. Note, however, that assuming your inner arrays have the same size (not just same dimensionality), there is also the following faster way:
julia> matrix = [rand(1,2,3) for _ in 1:4]; # some test data
julia> @btime a = cat($matrix..., dims=4); # your solution
11.519 μs (80 allocations: 3.83 KiB)
julia> @btime b = reshape(collect(Iterators.flatten($matrix)), (1,2,3,4)); # much faster solution
611.960 ns (55 allocations: 2.27 KiB)
julia> a == b
true
Sorry to bother you, I figured it out soon after posting
julia> typeof(matrix)
Array{Array{Float64,3},1}
julia> typeof(matrix[1])
Array{Float64,3}
julia> typeof(cat(matrix...,dims=4))
Array{Float64,4}
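For reference, here is a sketch that puts the two approaches next to each other without hard-coding the sizes. It assumes all inner arrays have the same size. The `stack` call is an addition not mentioned in the answers; it is available in Base from Julia 1.9 on, so it is guarded here.

```julia
xs = [rand(1, 2, 3) for _ in 1:4]   # sample data, all the same size

a = cat(xs...; dims = 4)            # generic, works via splatting

# Same-size shortcut: flatten, then reshape to (size of one array..., count)
b = reshape(collect(Iterators.flatten(xs)), size(first(xs))..., length(xs))

@assert size(a) == (1, 2, 3, 4)
@assert a == b

# On Julia 1.9 or later, `stack` builds the same array directly.
if isdefined(Base, :stack)
    @assert stack(xs) == a
end
```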

Efficiently construct array with 2-element Array{Float64,1} returned by a function

I have a function which returns a two-element Array:
2-element Array{Float64,1}:
0.809919
2.00754
I now want to efficiently sample over it and store all the results in an array with 2 rows and n columns. The problem is that I get a Vector of vectors. How could I flatten it or construct it?
A toy example is the following:
julia> [rand(2) for i=1:3]
3-element Array{Array{Float64,1},1}:
[0.906644, 0.614673]
[0.426492, 0.67645]
[0.473704, 0.726284]
julia> [rand(2)' for i=1:3]
3-element Array{RowVector{Float64,Array{Float64,1}},1}:
[0.403384 0.431918]
[0.410625 0.546614]
[0.224933 0.118778]
And I would like to have the result in a form like this:
julia> [rand(2) rand(2) rand(2)]
2×3 Array{Float64,2}:
0.360833 0.205969 0.209643
0.507417 0.317295 0.588516
Actually my dream would be:
julia> [rand(2) rand(2) rand(2)]'
3×2 Array{Float64,2}:
0.0320955 0.821869
0.358808 0.26685
0.230355 0.31273
Any ideas? I know that I could construct it via a for loop, but was looking for a more efficient way.
Thanks!
RecursiveArrayTools.jl has a VectorOfArray type which dispatches in the way you'd want:
julia> using RecursiveArrayTools
julia> A = [rand(2) for i=1:3]
3-element Array{Array{Float64,1},1}:
[0.957228, 0.104218]
[0.293985, 0.83882]
[0.788157, 0.454772]
julia> VectorOfArray(A)'
3×2 Array{Float64,2}:
0.957228 0.104218
0.293985 0.83882
0.788157 0.454772
As for timing:
julia> @benchmark VectorOfArray(A)'
BenchmarkTools.Trial:
memory estimate: 144 bytes
allocs estimate: 2
--------------
minimum time: 100.658 ns (0.00% GC)
median time: 111.740 ns (0.00% GC)
mean time: 127.159 ns (3.29% GC)
maximum time: 1.360 μs (82.71% GC)
--------------
samples: 10000
evals/sample: 951
VectorOfArray itself adds almost no overhead, and the ' uses Cartesian indexing to be fast.
Something along these lines
using BenchmarkTools
function createSample!(vec::AbstractVector)
    vec .= randn(length(vec))
    return vec
end
function createSamples!(A::Matrix)
    for row in axes(A, 1)  # `indices(A, 1)` on Julia 0.6
        createSample!(view(A, row, :))
    end
    return A
end
A = zeros(10, 2)
@benchmark createSamples!(A)
might help. The timing on my laptop gives:
Main> @benchmark createSamples!(A)
BenchmarkTools.Trial:
memory estimate: 1.41 KiB
allocs estimate: 20
--------------
minimum time: 539.104 ns (0.00% GC)
median time: 581.194 ns (0.00% GC)
mean time: 694.601 ns (13.34% GC)
maximum time: 10.324 μs (90.10% GC)
--------------
samples: 10000
evals/sample: 193
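Without an extra package, a vector of equally sized vectors can also be turned into a matrix with `reduce(hcat, ...)`. The sketch below targets Julia 1.x, which is newer than the version used in the question, and `sample` is a stand-in for the function described there.

```julia
sample() = rand(2)              # stand-in for the function in the question

vs = [sample() for _ in 1:3]    # Vector of 2-element Vectors

M  = reduce(hcat, vs)           # 2×3 Matrix, one sample per column
Mt = permutedims(M)             # 3×2 Matrix, one sample per row

# Alternative: preallocate and fill column by column,
# which avoids building the vector of vectors first.
n = 3
B = Matrix{Float64}(undef, 2, n)
for j in 1:n
    B[:, j] = sample()
end
```

Since Julia stores matrices column by column, writing each sample into a column and transposing once at the end is usually the more natural layout.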
