Julia: Converting Vector of Arrays to Array for Arbitrary Dimensions - arrays

Using timing tests, I found that it's much more performant to grow Vector{Array{Float64}} objects using push! than it is to simply use an Array{Float64} object and either hcat or vcat. However, after the computation is completed, I need to change the resulting object to an Array{Float64} for further analysis. Is there a way that works regardless of the dimensions? For example, if I generate the Vector of Arrays via
u = [1 2 3 4
1 3 3 4
1 5 6 3
5 2 3 1]
uFull = Vector{Array{Int}}(0)
push!(uFull,u)
for i = 1:10000
push!(uFull,u)
end
I can do the conversion like this:
fill = Array{Int}(size(uFull)...,size(u)...)
for i in eachindex(uFull)
fill[i,:,:] = uFull[i]
end
but notice this requires that I know the arrays are matrices (2-dimensional). If it's 3-dimensional, I would need another :, and so this doesn't work for arbitrary dimensions.
Note that I also need a form of the "inverse transform" (except first indexed by the last index of the full array) in arbitrary dimensions, and I currently have
filla = Vector{Array{Int}}(size(fill)[end])
for i in 1:size(fill)[end]
filla[i] = fill[:,:,i]'
end
I assume the method for the first conversion will likely solve the second as well.

This is the sort of thing that Julia's custom array infrastructure excels at. I think the simplest solution here is to actually make a special array type that does this transformation for you:
immutable StackedArray{T,N,A} <: AbstractArray{T,N}
data::A # A <: AbstractVector{<:AbstractArray{T,N-1}}
dims::NTuple{N,Int}
end
function StackedArray(vec::AbstractVector)
#assert all(size(vec[1]) == size(v) for v in vec)
StackedArray(vec, (length(vec), size(vec[1])...))
end
StackedArray{T, N}(vec::AbstractVector{T}, dims::NTuple{N}) = StackedArray{eltype(T),N,typeof(vec)}(vec, dims)
Base.size(S::StackedArray) = S.dims
#inline function Base.getindex{T,N}(S::StackedArray{T,N}, I::Vararg{Int,N})
#boundscheck checkbounds(S, I...)
S.data[I[1]][Base.tail(I)...]
end
Now just wrap your vector in a StackedArray and it'll behave like an N+1 dimensional array. This could be expanded and made more featureful (it could similarly support setindex! or even push!ing arrays to concatenate natively), but I think that it's sufficient to solve your problem. By simply wrapping uFull in a StackedArray you get an object that acts like an Array{T, N+1}. Make a copy, and you get exactly a dense Array{T, N+1} without ever needing to write a for loop yourself.
julia> S = StackedArray(uFull)
10001x4x4 StackedArray{Int64,3,Array{Array{Int64,2},1}}:
[:, :, 1] =
1 1 1 5
1 1 1 5
1 1 1 5
…
julia> squeeze(S[1:1, :, :], 1) == u
true
julia> copy(S) # returns a dense Array{T,N}
10001x4x4 Array{Int64,3}:
[:, :, 1] =
1 1 1 5
1 1 1 5
…
Finally, I'll just note that there's another solution here: you could introduce the custom array type sooner, and make a GrowableArray that internally stores its elements as a linear Vector{T}, but allows pushing entire columns or arrays directly.

Matt B.'s answer is great, because it "simulates" an array without actually having to create or store it. When you can use this solution, it's likely to be your best choice.
However, there might be circumstances where you need to create a concatenated array (e.g., if you're passing this to some C code which requires contiguous memory). In that case you can just call cat, which is generic (it can handle arbitrary dimensions).
For example:
u = [1 2 3 4
1 3 3 4
1 5 6 3
5 2 3 1]
uFull = Vector{typeof(u)}(0)
push!(uFull,u)
for i = 1:10000
push!(uFull,u)
end
ucat = cat(ndims(eltype(uFull))+1, uFull)
I took the liberty of making one important change to your code: uFull = Vector{typeof(u)}(0) because it ensures that the objects stored in the Vector container have concrete type. Array{Int} is actually an abstract type, because you'd need to specify the dimensionality too (Array{Int,2}).

Related

Is there any mechanism to auto squeeze in Matlab / Octave

For an nD array, it would be nice to be able to auto squeeze to remove singleton dimensions. Is there a way to do this that I don't know about? This would be especially useful for aggregate functions (e.g. sum, mean, etc) where you always expect a result with fewer dimensions.
Here's a simple example:
>> A = ones(3,3,3);
>> B = mean(A);
>> size(B)
ans =
1 3 3
>> squeeze(B)
ans =
1 1 1
1 1 1
1 1 1
It would be nice if Matlab/Octave would automatically do the squeezing for me. Or if there was a way to turn that option on (something similar to hold on for plots).
As far as I know, Matlab does not have that. And I don't think it would be a good idea. Consider a modified version of your example:
>> A = ones(3,1,1,3);
>> B = mean(A);
>> size(B)
ans =
1 1 1 3
What should "auto-squeeze" do here? Reduce B to size [1 1 3] or to [1 3]?
You could argue that it should remove the same dimension that mean has turned into a singleton. But then it would have to be done within the mean function, perhaps with an optional input argument. Once you get the function output, there is no information how it was obtained.
Or you could argue that it should remove all singleton dimensions, like squeeze (more or less) does. But then it would remove dimensions that were already singleton in the function input, which is probably unwanted.
If you ask me, having a second input in squeeze specifiyng which (singleton) dimensions to remove would be a nice addition (in the same vein as you can use mean(A, 1) to force the operation to be applied along the first dimension even if A happens to be a row vector).
I agree with Luis and Cris, but I would add the following.
Both Matlab and Octave do automatically squeeze extra dimensions, in a very particular scenario: any dimensions at the end that have been reduced to singletons, are automatically squeezed out.
E.g.
A = ones([1,2,3,4]);
B = mean(A, 4);
size(B)
% ans = 1 2 3
Note, how the answer is [1,2,3], and not [1,2,3,1]. This is in contrast to languages like python, for instance, where a size of (1,1) is very different to a size of (1,).
Therefore, with regard to your questions, one way to use this to your advantage could be to ensure that the dimension that is to be reduced is always found at the end, and thus automatically simplified.
This becomes even more useful when you realise that:
size(A(:)) % ans = 24 1 (i.e. 24)
size(A(:,:)) % ans = 1 24
size(A(:,:,:)) % ans = 1 2 12
size(A(:,:,:,:)) % ans = 1 2 3 4
Meaning, if you order your dimensions hierarchically you can ensure that any operations that need to take place over the higher dimensions, can a) be vectorised easily, and b) give a natural result, without the need to waste time squeezing or permuting the resulting dimensions.

Repeat array rows specified number of times

New to julia, so this is probably very easy.
I have an n-by-m array and a vector of length n and want to repeat each row of the array the number of times in the corresponding element of the vector. For example:
mat = rand(3,6)
v = vec([2 3 1])
The result should be a 6-by-6 array. I tried the repeat function but
repeat(mat, inner = v)
yields a 6×18×1 Array{Float64,3}: array instead so it takes v to be the dimensions along which to repeat the elements. In matlab I would use repelem(mat, v, 1) and I hope julia offers something similar. My actual matrix is a lot bigger and I will have to call the function many times, so this operation needs to be as fast as possible.
It has been discussed to add a similar thing to Julia Base, but currently it is not implemented yet AFAIK. You can achieve what you want using the inverse_rle function from StatsBase.jl:
julia> row_idx = inverse_rle(axes(v, 1), v)
6-element Array{Int64,1}:
1
1
2
2
2
3
and now you can write:
mat[row_idx, :]
or
#view mat[row_idx, :]
(the second option creates a view which might be relevant in your use case if you say that your mat is large and you need to do such indexing many times - which option is faster will depend on your exact use case).

Updating a static array, in a nested function without making a temporary array? <Julia>

I have been banging my head against a wall trying to use static arrays in julia.
https://github.com/JuliaArrays/StaticArrays.jl
They are fast but updating them is a pain. This is no surprise, they are meant to be immutable!
But it is continuously recommended to me that I use static arrays even though I have to update them. In my case, the static arrays are small, just length 3, and i have a vector of them, but I only update 1 length three SVector at a time.
Option 1
There is a really neat package called Setfield that allows you to do inplace updates of SVectors in Julia.
https://github.com/jw3126/Setfield.jl
The catch... it updates the local copy. So if you are in a nested function, it updates the local copy. So it comes with some book keeping since you have to inplace update the local copy and then return that copy and update the actual array of interest. You can't pass in your desired array and update it in place, at least, not that I can figure out! Now, I do not mind bookeeping, but I feel like updating a local copy, then returning the value, updating another local copy, and then returning the values and finally updating the actual array must come with a speed penalty. I could be wrong.
Option 2
It bugs me that in order to do an update a static array I must
exampleSVector::SVector{3,Float64} <-- just to make clear its type and size
exampleSVector = [value1, value2, value3]
This will update the desired array even if it is inside a function, which is nice and the goal, but if you do this inside a function it creates a temporary array. And this kills me because my function is in a loop that gets called 4+ million times, so this creates a ton of allocations and slows things down.
How do I update an SVector for the Option 2 scenario without creating a temporary array?
For the Option 1 scenario, can I update the actual array of interest rather than the local copy?
If this requires a simple example code, please say so in the comments, and I will make one. My thinking is that it is answerable without one, but I will make one if it is needed.
EDIT:
MCVE code - Option 1 works, option 2 does not.
using Setfield
using StaticArrays
struct Keep
dreaming::Vector{SVector{3,Float64}}
end
function INNER!(vec::SVector{3,Float64},pre::SVector{3,Float64})
# pretend series of calculations
for i = 1:3 # illustrate use of Setfield (used in real code for this)
pre = #set pre[i] = rand() * i * 1000
end
# more pretend calculations
x = 25.0 # assume more calculations equals x
################## OPTION 1 ########################
vec = #set vec = x * [ pre[1], pre[2], pre[3] ] # UNCOMMENT FOR FOR OPTION 1
return vec # UNCOMMENT FOR FOR OPTION 1
################## OPTION 2 ########################
#vec = x * [ pre[1], pre[2], pre[3] ] # UNCOMMENT FOR FOR OPTION 2
#nothing # UNCOMMENT FOR FOR OPTION 2
end
function OUTER!(always::Keep)
preAllocate = SVector{3}(0.0,0.0,0.0)
for i=1:length(always.dreaming)
always.dreaming[i] = INNER!(always.dreaming[i], preAllocate) # UNCOMMENT FOR FOR OPTION 1
#INNER!(always.dreaming[i], preAllocate) # UNCOMMENT FOR FOR OPTION 2
end
end
code = Keep([zero(SVector{3}) for i=1:5])
OUTER!(code)
println(code.dreaming)
I hope that I have understood your question correctly. It's a bit hard with a MWE like this, that does a lot of things that are mostly redundant and a bit confusing.
There seems to be two alternative interpretations here: Either you really need to update ('mutate') an SVector, but your MWE fails to demonstrate why. Or, you have convinced yourself that you need to mutate, but you actually don't.
I have decided to focus on alternative 2: You don't really need to 'mutate'. Rewriting your code from that point of view simplifies it greatly.
I couldn't find any reason for you to mutate any static vectors here, so I just removed that. The behaviour of the INNER! function with the inputs was very confusing. You provide two inputs but don't use either of them, so I removed those inputs.
function inner()
pre = #SVector [rand() * 1000i for i in 1:3]
x = 25
return pre .* x
end
function outer!(always::Keep)
always.dreaming .= inner.() # notice the dot in inner.()
end
code = Keep([zero(SVector{3}) for i in 1:5])
outer!(code)
display(code.dreaming)
This runs fast and with zero allocations. In general with StaticArrays, don't try to mutate things, just create new instances.
Even though it's not clear from your MWE, there may be some legitimate reason why you may want to 'mutate' an SVector. In that case you can use the setindex method of StaticArrays, you don't need Setfield.jl:
julia> v = rand(SVector{3})
3-element SArray{Tuple{3},Float64,1,3}:
0.4730258499237898
0.23658547518737905
0.9140206579322541
julia> v = setindex(v, -3.1, 2)
3-element SArray{Tuple{3},Float64,1,3}:
0.4730258499237898
-3.1
0.9140206579322541
To clarify: setindex (without a !) does not mutate its input, but creates a new instance with one index value changed.
If you really do need to 'mutate', perhaps you can make a new MWE that shows this. I would recommend that you try to simplify it a bit, because it is quite confusing now. For example, the inclusion of the type Keep seems entirely unnecessary and distracting. Just make a Vector of SVectors and show what you want to do with that.
Edit: Here's an attempt based on the comments below. As far as I understand it now, the question is about modifying a vector of SVectors. You cannot really mutate the SVectors, but you can replace them using a convenient syntax, setindex, where you can keep some of the elements and change some of the others:
oldvec = [zero(SVector{3}) for _ in 1:5]
replacevec = [rand(SVector{3}) for _ in 1:5]
Now we replace the second element of each element of oldvec with the corresponding one in replacevec. First a one-liner:
oldvec .= setindex.(oldvec, getindex.(replacevec, 2), 2)
Then an even faster one with a loop:
for i in eachindex(oldvec, replacevec)
#inbounds oldvec[i] = setindex(oldvec[i], replacevec[i][2], 2)
end
There are two types of static arrays - mutable (starting with M in type name) and immutable ones (starting with S) - just use the mutable ones! have a look at the example below:
julia> mut = MVector{3,Int64}(1:3);
julia> mut[1]=55
55
julia> mut
3-element MArray{Tuple{3},Int64,1,3}:
55
2
3
julia> immut = SVector{3,Int64}(1:3);
julia> inmut[1]=55
ERROR: setindex!(::SArray{Tuple{3},Int64,1,3}, value, ::Int) is not defined.
Let us see some simple benchmark (ordinary array, vs mutable static vs immutable static):
using BenchmarkTools
julia> ord = [1,2,3];
julia> #btime $ord.*$ord;
39.680 ns (1 allocation: 112 bytes)
3-element Array{Int64,1}:
1
4
9
julia> #btime $mut.*$mut
8.533 ns (1 allocation: 32 bytes)
3-element MArray{Tuple{3},Int64,1,3}:
3025
4
9
julia> #btime $immut.*$immut
2.133 ns (0 allocations: 0 bytes)
3-element SArray{Tuple{3},Int64,1,3}:
1
4
9

Is there a way to quickly extract the parts from a vector without looping?

Consider that I have a vector/array such that it looks as follows:
each part is a sub array of some size fixed and known size (that can only be accessed through indexing, i.e. its not a tensor nor a higher order array). So for example:
x1 = x(1:d);
if d is the size of each sub array. The size of each sub array is the same but it might vary depending on the current x we are considering. However, we do know n (the number of sub arrays) and d (the size of all of the sub arrays).
I know there is usually really strange but useful tricks in matlab to do things more optimized. Is there a way to extract those using maybe indexing and and make a matrix where the rows (or columns) are those parts? as in:
X = [x_1, ..., x_n]
the caveat is that n is a variable and we don't know aprior what it is. We can find what n is, but its not fixed.
I want to minimize the amount of for loops I actually write in matlab to hope its faster...just to add some more context.
First I would consider simple reshaping to keep the output as a simple double matrix
x = (1:15).' %'
d = 3;
out = reshape(x,d,[])
and further on just use indexing to access the columns out(:,idx);
There is no need to know n in advance, as reshape is calculating it based on d and the number of elements in x.
out =
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
If you'd insist on something like cell arrays, use accumarray with ceil to get the subs:
out = accumarray( ceil( (1:numel(x))/d ).', x(:), [], #(x) {x})

What is the purpose of "post" in Matlab's padarray function?

In Matlab, there is a function called padarray. I didn't understand the "post" value of the function. Can you describe it in terms of an example?
So I will try to be clear:
A = magic(2);
A =
1 3
4 2
B = padarray(A,[0 1],'circular','post')
B =
1 3 1
4 2 4
post only pads after the last array element along each dimension: in this particular case, only along dimension 2 because of [0 1] as second input in padarray.
P.S.: MATLAB user manual is usually quite plenty of examples about any built-in function.

Resources