Julia: Add missing value to array - arrays

I have an array that can take Float64 and Missing values:
local x::Array{Union{Float64, Missing}, 1} = [1.0, missing, 3.0]
I can add more Float64 values using the append! function, but I can't add a missing value this way. I get the following error:
julia> append!(x, missing)
ERROR: MethodError: no method matching length(::Missing)
What's the correct way to add missing values to this array?

Yes you are right that push! should be used.
Additionally your code does not need to be so verbose:
julia> x = [1.0, missing, 3.0]
3-element Array{Union{Missing, Float64},1}:
1.0
missing
3.0
julia> y = Union{Missing, Float64}[]
0-element Array{Union{Missing, Float64},1}
julia> push!(y,1);
julia> push!(y,missing)
2-element Array{Union{Missing, Float64},1}:
1.0
missing
Moreover, instead of Array{Union{Float64, Missing}, 1} the shorter and more readable version Vector{Union{Float64, Missing}} can be used.

I should have been using push! - append! is for adding collections, while push! is for single values.
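As a minimal sketch of the difference (the values are made up for illustration), push! adds a single element while append! adds every element of a collection:

```julia
# A vector that accepts both Float64 and missing values
x = Union{Float64, Missing}[1.0, missing, 3.0]

push!(x, missing)            # push! adds exactly one element
append!(x, [4.0, missing])   # append! adds each element of a collection
```

After these two calls x holds six elements, the fourth and sixth of which are missing.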

Related

How to export/import an array in Julia?

I want to move an array from my laptop (Julia 1.3.1) to my desktop PC (Julia 1.6.2).
I make an array in Julia 1.3.1 as follows.
using LinearAlgebra
H = ... #give a matrix H
vals, vector = eigen(H) # a different name on the left avoids assigning to the eigen function itself
Then, I'd like to move "vector" to Julia 1.6.2.
How do you do that?
The simplest way is by using DelimitedFiles:
julia> v = [1.0,2.0,3.0]
julia> using DelimitedFiles
julia> writedlm("f.txt", v)
julia> readdlm("f.txt")
3×1 Matrix{Float64}:
1.0
2.0
3.0
julia> vec(readdlm("f.txt"))
3-element Vector{Float64}:
1.0
2.0
3.0
Note that DelimitedFiles works with matrices, so the last example shows what to do if you would rather store a vector.
Edit following Bogumil's comment
When you have a Matrix of Complex numbers you need to provide the output type for readdlm:
julia> v = Complex.(rand(2,3), rand(2,3))
2×3 Matrix{ComplexF64}:
0.282157+0.540556im 0.757765+0.103518im 0.979935+0.212347im
0.557499+0.934859im 0.604032+0.338489im 0.431962+0.945946im
julia> writedlm("f.txt", v)
julia> readdlm("f.txt",'\t',Complex{Float64})
2×3 Matrix{ComplexF64}:
0.282157+0.540556im 0.757765+0.103518im 0.979935+0.212347im
0.557499+0.934859im 0.604032+0.338489im 0.431962+0.945946im
julia> readdlm("f.txt",'\t',Complex{Float64}) == v
true
Another way is to use a binary format. For long-term serialization across Julia versions, BSON (binary JSON) could be a good option:
julia> using BSON
julia> BSON.bson("v.bson", v = v)
julia> v2 = BSON.load("v.bson")[:v]
2×3 Matrix{ComplexF64}:
0.282157+0.540556im 0.757765+0.103518im 0.979935+0.212347im
0.557499+0.934859im 0.604032+0.338489im 0.431962+0.945946im
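Applied to the eigenvectors from the question, the DelimitedFiles route might look like the sketch below. The matrix H is a made-up symmetric example (so the eigenvectors are real), and the temporary path stands in for whatever file you copy between machines:

```julia
using DelimitedFiles, LinearAlgebra

# Hypothetical symmetric matrix; the question's H is not shown
H = [2.0 1.0; 1.0 2.0]
vals, vecs = eigen(Symmetric(H))

path = tempname()          # placeholder for the file moved between machines
writedlm(path, vecs)       # on the source machine
restored = readdlm(path)   # on the target machine
rm(path)
```

For a non-symmetric H the eigenvectors can be complex, in which case the element type has to be passed to readdlm as shown above.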

Two different types of arrays of arrays

I'm confused about the different types of arrays of arrays. Consider these two examples
a = Array{Float64}[]
push!(a,[1, 2])
push!(a,[3, 4])
push!(a,[1 2; 3 4])
b = Array[[1.0, 2.0], [3.0,4.0], [1.0 2.0; 3.0 4.0]]
I'm not sure how a and b differ. Suppose I intend to run a for loop over each of the elements in a or b, and multiply each element by 2. That is,
for i in 1:3 a[i] = a[i]*2 end
for i in 1:3 b[i] = b[i]*2 end
I time the run time of both lines respectively, but they are equally fast. Are a and b the same? If so, why does a even exist? It looks fairly complicated, because typeof(a) yields "Array{Array{Float64,N} where N,1}". What does where do here?
Both a and b are vectors, but they allow different types of elements. You can check this by writing:
julia> typeof(a)
Array{Array{Float64,N} where N,1}
julia> typeof(b)
Array{Array,1}
Now Array is a parametric type taking two parameters. The first parameter is the type of elements that it allows. The second parameter is the dimension. You can see that in both cases the dimension is 1, which means both a and b are vectors. You can also check it using the ndims function:
julia> ndims(a)
1
julia> ndims(b)
1
The first parameter is the allowed element type. In the case of a it is Array{Float64,N} where N, while in the case of b just Array is printed. Before I explain how to read them, note that the first parameter can be extracted using the eltype function:
julia> eltype(a)
Array{Float64,N} where N
julia> eltype(b)
Array
You can see that both a and b allow Array to be stored. First let me explain how to read Array{Float64, N} where N. It means that a allows storing arrays of Float64 of any dimension. Actually, you could have written it in the shorter form Array{Float64}, as you can check:
julia> (Array{Float64,N} where N) === Array{Float64}
true
The reason is that if you do not put a restriction on a tail parameter it can be dropped in syntax. The where N part is a restriction on the parameter. In this case there is no restriction on the second parameter.
Now we can turn to b. You see its eltype is just Array, so both parameters are dropped, thus there are no restrictions on them as explained above. So Array is the same as Array{T, N} where {T,N} as you can see here:
julia> (Array{T, N} where {T,N}) === Array
true
So the difference is that a can store arrays of any dimension but they must have Float64 element type, while b can store arrays of any dimension and any element type. The distinction, in this case, has no performance impact as you have noted, but will have an impact on what can be stored in a and b. Here are some examples.
In this case they behave the same, because you try to store an Int in them, but they allow only arrays:
julia> a[1] = 1
ERROR: MethodError: Cannot `convert` an object of type
julia> b[1] = 1
ERROR: MethodError: Cannot `convert` an object of type
but here they differ:
julia> a[1] = ["a"]
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Float64
julia> b[1] = ["a"]
1-element Array{String,1}:
"a"
julia> a
3-element Array{Array{Float64,N} where N,1}:
[1.0, 2.0]
[3.0, 4.0]
[1.0 2.0; 3.0 4.0]
julia> b
3-element Array{Array,1}:
["a"]
[3.0, 4.0]
[1.0 2.0; 3.0 4.0]
As you can see you can store an array of String in b, but this is not allowed to be stored in a.
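The same difference can be checked programmatically. This is only a sketch with shortened versions of a and b; the flag name stored_in_a is made up for illustration:

```julia
a = Array{Float64}[[1.0, 2.0]]   # elements must be arrays of Float64
b = Array[[1.0, 2.0]]            # elements can be arrays of anything

b[1] = ["a"]                     # accepted: an array of String is an Array

# Storing a String array in `a` throws, so record whether it succeeded
stored_in_a = try
    a[1] = ["a"]
    true
catch
    false
end
```

Here stored_in_a ends up false and a is left unchanged, while b now holds the String array.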
Two additional comments (both topics are a bit complex so I leave out the details, but just give you hints what is going on):
You are allowed not to specify element type of an array when defining it. In this case Julia will automatically pick its element type:
julia> [[1.0, 2.0], [3.0,4.0], [1.0 2.0; 3.0 4.0]]
3-element Array{Array{Float64,N} where N,1}:
[1.0, 2.0]
[3.0, 4.0]
[1.0 2.0; 3.0 4.0]
The performance impact of the choice of element type of an array depends on:
whether the type is abstract or otherwise not concrete (this can be checked with isabstracttype and isconcretetype; it affects the type inference done by the compiler),
whether it is a bits type (this can be checked with isbitstype; it affects the storage layout),
whether the element type is a Union (small unions are handled more efficiently).
In your case neither element type is concrete and neither is a bits type, so the performance will be the same.
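If you want to inspect these properties yourself, a small sketch along the following lines should work. The variable names are made up for illustration, and isconcretetype is my addition to the predicates named above:

```julia
elt_a = Array{Float64}   # element type of a
elt_b = Array            # element type of b

# Neither element type is concrete, so the compiler cannot specialize on it
concrete_a = isconcretetype(elt_a)
concrete_b = isconcretetype(elt_b)

# Neither is a bits type, so elements are stored as references
bits_a = isbitstype(elt_a)
bits_b = isbitstype(elt_b)

# For comparison: a fully specified array type and a plain number type
concrete_vec = isconcretetype(Vector{Float64})
bits_float = isbitstype(Float64)
```

All four checks on elt_a and elt_b return false, while the two comparison checks return true.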

Julia: Initializing numeric arrays of different types

I am trying to build a two element array in Julia, where each sub-array has a different type (one is a vector of Int64s, the other is an array of Float32s).
The code below automatically converts the element that I want to be an Int64 into a Float32, which is what I don't want:
my_multitype_array = [ collect(1:5), rand(Float32,3) ]
The resulting array automatically converts the Int64s in the first array (defined via collect(1:5)) into a Float32, and the resulting my_multitype_array has type 2-element Array{Array{Float32,1}}. How do I force it to make the first sub-array remain Int64s? Do I need to perhaps pre-define my_multitype_array to be an empty array with two elements of the desired types, before filling it out with values?
And finally, once I do have the desired array with different types, how would I refer to it, when pre-stating its type in a function? See below for what I mean:
function foo_function(first_scalar_arg::Float32, multiple_array_arg::Array{Array{Float32,1}})
# do stuff
return
end
Instead of ::Array{Array{Float32,1}}, would I write ::Array{Array{Any,1}} or something?
I think that the following code matches better what was asked in the question:
julia> a = Union{Array{Int},Array{Float64}}[[1,2,3],rand(2,2)]
2-element Array{Union{Array{Float64,N} where N, Array{Int64,N} where N},1}:
[1, 2, 3]
[0.834902264215698 0.42258382777543124; 0.5856562680004389 0.6654033155981287]
This creates an actual data structure which knows that it contains either Float64 or Int arrays.
Some usage
julia> a[1]
3-element Array{Int64,1}:
1
2
3
julia> a[2]
2×2 Array{Float64,2}:
0.834902 0.422584
0.585656 0.665403
And manipulating the structure:
julia> push!(a, [1, 1]); #works
julia> push!(a, [true, false]);
ERROR: MethodError: no method matching Union{Array{Float64,N} where N, Array{Int64,N} where N}(::Array{Bool,1})
How to instantiate a vector of different types:
If you type the vector in a terminal, it will be promoted to the largest common type:
julia> [[1], [1.0]]
2-element Array{Array{Float64,1},1}:
[1.0]
[1.0]
The reason for that is that you don't specify the type of the outer vector, so Julia will try to infer the type based on the contents. More specific types are always more efficient, so if the vector types can be converted to a single type that can represent all the inner vectors, this will be done (through the promote mechanism). To avoid it, you need to manually specify the outer vector type e.g.:
julia> Any[[1], [1.0]]
2-element Array{Any,1}:
[1]
[1.0]
How to refer to vectors of differently-typed vectors
When you think about it, "vectors of differently-typed vectors" is not a single type, but an infinite set of types. These kinds of types are called "unionall types" in Julia, and are represented by the where keyword. In this case, you want Vector{T} where T <: Vector.
But wait! Then how come:
julia> Any[[1], [1.0]] isa Vector{T} where T <: Vector
false
Well, a vector that can contain any element is not really a vector of vectors. So here you have two options:
Either relax your function signature by just removing the type annotations or relaxing them significantly (this is preferred because the value you pass in may actually be a vector of vectors even if its type is e.g. Vector{Any}):
function foo_function(first_scalar_arg, multiple_array_arg::AbstractArray)
# do stuff
return
end
Or else be vigilant that you make sure to construct a "vector of vectors" initially:
julia> Vector[[1], [1.0]]
2-element Array{Array{T,1} where T,1}:
[1]
[1.0]
julia> Vector[[1], [1.0]] isa Vector{T} where T <: Vector
true
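To see how such a signature behaves in dispatch, here is a minimal sketch. The function count_inner is hypothetical, and Vector{<:Vector} is shorthand for Vector{T} where T <: Vector:

```julia
# Hypothetical function that accepts only "vectors of vectors"
count_inner(xs::Vector{<:Vector}) = length(xs)

mixed = Vector[[1], [1.0]]    # outer element type is Vector
plain = [[1], [2]]            # outer element type is Vector{Int}
loose = Any[[1], [1.0]]       # outer element type is Any

n_mixed = count_inner(mixed)
n_plain = count_inner(plain)

# No method matches a Vector{Any}, even though its contents are vectors
accepts_loose = applicable(count_inner, loose)
```

The first two calls succeed, while accepts_loose is false, which is the reason for preferring the relaxed signature when callers may pass a Vector{Any}.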
To expand a little on @Przemyslaw Szufel's answer...
Creating vectors with elements of mixed types is tricky, as you've seen, since the literal array constructor attempts to promote the elements to a common type. There is a special syntax to get around that, which is described in the manual here.
In your case, you can construct your vector of vectors as follows:
julia> Union{Vector{Int64}, Vector{Float32}}[[1, 2], [1.0f0, 2.0f0]]
2-element Array{Union{Array{Float32,1}, Array{Int64,1}},1}:
[1, 2]
Float32[1.0, 2.0]
The prefix to the literal array constructor specifies the element type of the array. So in this case, the element type of the vector is constrained to be
Union{Vector{Int64}, Vector{Float32}}
In other words, the elements of the outer vector must be either vectors of Int64 or vectors of Float32.
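For the function-signature part of the question, one possible sketch is to annotate the argument with that same Union element type. The function total_length is hypothetical and only illustrates the annotation:

```julia
# Hypothetical function; the annotation matches the outer vector's element type exactly
function total_length(xs::Vector{Union{Vector{Int64}, Vector{Float32}}})
    return sum(length, xs)
end

xs = Union{Vector{Int64}, Vector{Float32}}[collect(1:5), rand(Float32, 3)]
n = total_length(xs)
```

Because the outer element type is given explicitly, the first inner vector keeps its Int64 elements instead of being promoted to Float32.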

Julia: rationale behind array size and index for "extra" dimensions?

I am using Julia from time to time, however I am surprised by the following behavior:
Let's define a 3x4 array
julia> m=rand(3,4)
3×4 Array{Float64,2}:
0.889018 0.500847 0.539856 0.828231
0.492425 0.582958 0.521406 0.754102
0.28227 0.834333 0.669967 0.0939701
Now I check that
julia> size(m,1), size(m,2)
(3, 4)
as expected.
However, I am surprised by this:
julia> size(m,3), size(m,2018)
(1, 1)
-> I would have expected (0,0) or an error message
Looking at the Julia code confirms this behavior:
size(t::AbstractArray{T,N}, d) where {T,N} = d <= N ? size(t)[d] : 1
Moreover:
julia> m[2,1,1,1,1]
0.4924252391289974
-> I would have expected an out of bounds error
So my question is: "what is the rationale?"
(I do not think it is a bug; I use Julia version 0.6.2.)
I believe it's for broadcasting.
julia> m=rand(3,4)
3×4 Array{Float64,2}:
0.139323 0.663912 0.994985 0.517332
0.423913 0.121753 0.0327054 0.0754665
0.392672 0.47006 0.351121 0.787318
julia> size(m)
(3, 4)
julia> n = rand(3)
3-element Array{Float64,1}:
0.716752
0.98755
0.661226
julia> m .* n
3×4 Array{Float64,2}:
0.09986 0.475861 0.713157 0.370799
0.418636 0.120237 0.0322983 0.074527
0.259645 0.310816 0.23217 0.520595
Notice that n has one dimension fewer, so its size is 1 in the 2nd dimension and it is therefore applied to every column. Scalars in broadcast are treated differently: they are generally inlined into the fused broadcasting function, which cannot be done with a mutable type. So the rule that size 1 means "expand" in higher dimensions is a nice way to implement this for broadcast.
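A small sketch of how the implicit trailing dimensions of size 1 line up with broadcasting (random data, so only the relationships matter, not the values):

```julia
m = rand(3, 4)
n = rand(3)

extra_size = size(m, 3)              # dimensions beyond ndims(m) report size 1
same_entry = m[2, 1, 1, 1] == m[2, 1] # trailing indices equal to 1 are accepted

# n behaves like a 3x1 array: its single "column" is reused for each column of m
p = m .* n
matches = all(p[i, j] == m[i, j] * n[i] for i in 1:3, j in 1:4)
```

Here extra_size is 1, and both same_entry and matches are true.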

Array of anonymous functions working in julia 0.4, not in 0.5.1

I'm porting some code from Julia 0.4.7 to 0.5.1. I've noticed that there is something not compatible related to the array of anonymous functions. The code is here:
f = x::Array{Function} -> size(x)
# Option 1
f([k -> k+1, k-> k+1]) # This works in 0.4 & 0.5
# Option 2
f(repmat([k -> k+1], 2)) # This only works in 0.4
As far as I can see, the difference is that in 0.4 the array of anonymous functions is still seen internally as Array{Function, 1}, whereas in 0.5 it is seen as Array{#11#12, 1} (the numbers may change), so a MethodError is raised because the types don't match.
Although the example is stupid it shows what I really need: to replicate an anonymous function a variable number of times.
Thanks!
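In Julia 0.5+, Function is an abstract type and each anonymous function has its own concrete type. Parametric types are invariant, so an array whose element type is one of those concrete function types is not a subtype of Array{Function}: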
julia> typeof(x -> 2x)
##1#2
julia> typeof(x -> 2x) <: Function
true
julia> typeof([x -> 2x]) <: Array{Function}
false
As a result, the correct way to define f is:
f{T<:Function}(x::Array{T}) = size(x)
julia> f(repmat([k -> k+1], 2))
(2,)
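The f{T<:Function}(...) form above is the Julia 0.5 syntax, and repmat was later removed. On Julia 1.x a rough equivalent, sketched here with repeat in place of repmat and the <: shorthand in the signature, would be:

```julia
# Accept arrays whose element type is any subtype of Function
f(x::Array{<:Function}) = size(x)

# Replicate one anonymous function; the element type is its own concrete type
fs = repeat([k -> k + 1], 2)
replicated_size = f(fs)

# A literal with two different anonymous functions also matches the signature
literal_size = f([k -> k + 1, k -> k + 2])
```

Both calls return (2,), and each element of fs is still callable, e.g. fs[2](1) gives 2.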
