How to concatenate an array of tuples that contain arrays in Julia

Let's say I have an array x like this:
x = [(i*ones(4,4,3),rand(11),rand(1:10)) for i=1:5];
Now, I want to concatenate them along the last dimension. I mean, at the end of the operation I want to have 3 arrays: the first of size (4,4,3,5) [the concatenation of the 5 ones(4,4,3) arrays], the second of size (11,5), and the last of size (1,5). How can I do it in Julia?
EDIT
Of course, I can do it like below, but I want to hear if there is a cleverer way (in terms of memory consumption and speed):
julia> i=[ t[1] for t in x];
julia> q=[ t[2] for t in x];
julia> l=[ t[3] for t in x];
julia> (cat(4,i...), cat(2,q...), reshape(l,1,length(l)))

Another way could be:
ntuple(s -> reshape(
        [x[i][s][j] for j in eachindex(first(x)[s]), i = 1:length(x)],
        size(first(x)[s])..., length(x)
    ), length(first(x)))
which saves a bit of time and memory (depending on the sizes/shapes in x), but the longer solution in the question should be OK. BTW, this version looks a bit more cryptic partly because, unlike the version in the question, it works for different shapes and lengths of x.
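For reference, on Julia 0.7 and newer cat takes the dimension as a dims keyword, so the splatting version from the question's EDIT would read as below (a version-dependent sketch; everything else in this post uses the older positional syntax):
i = [t[1] for t in x];
q = [t[2] for t in x];
l = [t[3] for t in x];
(cat(i...; dims=4), cat(q...; dims=2), reshape(l, 1, length(l)))  # sizes (4,4,3,5), (11,5), (1,5)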

Related

Repeat array rows specified number of times

New to julia, so this is probably very easy.
I have an n-by-m array and a vector of length n and want to repeat each row of the array the number of times in the corresponding element of the vector. For example:
mat = rand(3,6)
v = vec([2 3 1])
The result should be a 6-by-6 array. I tried the repeat function but
repeat(mat, inner = v)
yields a 6×18×1 Array{Float64,3} instead, so it takes v to be the inner repetition counts along each dimension. In Matlab I would use repelem(mat, v, 1) and I hope Julia offers something similar. My actual matrix is a lot bigger and I will have to call the function many times, so this operation needs to be as fast as possible.
Adding something similar to Julia Base has been discussed, but AFAIK it is not implemented yet. You can achieve what you want using the inverse_rle function from StatsBase.jl:
julia> row_idx = inverse_rle(axes(v, 1), v)
6-element Array{Int64,1}:
1
1
2
2
2
3
and now you can write:
mat[row_idx, :]
or
@view mat[row_idx, :]
(the second option creates a view, which might be relevant since you say that your mat is large and you need to do such indexing many times; which option is faster will depend on your exact use case).
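Putting the pieces together, a minimal end-to-end sketch of the approach above (assuming StatsBase.jl is installed):
using StatsBase: inverse_rle

mat = rand(3, 6)
v   = [2, 3, 1]                        # repetition count for each row

row_idx  = inverse_rle(axes(v, 1), v)  # [1, 1, 2, 2, 2, 3]
repeated = mat[row_idx, :]             # 6×6 matrix with rows repeated
repview  = @view mat[row_idx, :]       # same values, but without copying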

Julia: three dimensional arrays (performance)

Going through the Julia performance tips I haven't found any suggestions regarding how to speed up code with three-dimensional arrays.
From my understanding d-element Array{Array{Float64,2},1} would perform best when d (the third dimension) is small. However, I am not sure whether this is the case when d is large.
Is there any tutorial on this topic for Julia?
Example 1a (d=50)
x = [zeros(100, 10) for d=1:50];
@time for d=1:50
    x[d] = rand(100,10);
end
0.000100 seconds (50 allocations: 396.875 KB)
Example 1b (d=50)
y=zeros(100, 10, 50);
@time for d=1:50
    y[:,:,d] = rand(100,10);
end
0.000257 seconds (200 allocations: 400.781 KB)
Example 2a (d=50000)
x = [zeros(100, 10) for d=1:50000];
@time for d=1:50000
    x[d] = rand(100,10);
end
0.410813 seconds (99.49 k allocations: 388.328 MB, 81.88% gc time)
Example 2b (d=50000)
y=zeros(100, 10, 50000);
@time for d=1:50000
    y[:,:,d] = rand(100,10);
end
0.185929 seconds (298.98 k allocations: 392.898 MB, 6.83% gc time)
From my understanding d-element Array{Array{Float64,2},1} would perform best when d (the third dimension) is small. However, I am not sure whether this is the case when d is large.
No, it's more about how you use it. A = Array{Array{Float64,2},1} is an array of pointers to matrices. The value of each element is the pointer, i.e. a reference. Thus A[i] returns a reference, which is cheap. A2 = Array{Float64,3} is a contiguous array of floats. It's really just an indexing setup over a linear slab of memory (and has a linear index A2[i] which runs through the whole thing using that linear form).
The latter has some advantages because it is contiguous. There's no indirection, so looping over all of A2's values will be faster. A has to dereference two pointers to get a value, so a simple 3D loop will be slower unless you know to dereference each internal matrix only once. Also, you can get views to the matrices via @view A2[:,:,1] etc., but you have to take note that A2[:,:,1] by itself will make a copy of the matrix. A[1] is naturally a view because it returns the reference to the matrix, and if you want a copy you have to do copy(A[1]) explicitly. Because A is just a linear array of pointers, push!ing a new matrix onto it is cheap since it only grows a relatively small array (and push! is automatically amortized) to add a new pointer at the end (this is why things like DifferentialEquations.jl use arrays of arrays to build timeseries instead of the more traditional matrix).
So they are different tools with different advantages and disadvantages.
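To make the copy/view/reference distinction above concrete, a small sketch (the toy sizes are arbitrary):
A  = [rand(4, 4) for _ in 1:3]    # vector of matrices: elements are references
A2 = rand(4, 4, 3)                # one contiguous 3D array

m = A[1]               # a reference: writing into m writes into A[1]
s = A2[:, :, 1]        # a copy: writing into s leaves A2 untouched
v = @view A2[:, :, 1]  # a view: writing into v writes into A2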
As for your timings, you're doing two different things. x[d] = rand(100,10) creates a new matrix and stores its reference in x. y[:,:,d] = rand(100,10) creates a new matrix and then copies its values into the corresponding slice of y. You can see why that's slower. But what you're leaving out is the allocation-free case.
function f2()
    y = zeros(100, 10, 50)
    @time for i in eachindex(y)
        y[i] = rand()
    end
    y
end
In the small case this matches the array-creation version. You can't naively do the same in case one, but as I said, if you dereference the pointer for each matrix only once you do really well:
function f()
    x = [zeros(100, 10) for d = 1:50]
    @time @inbounds for d = 1:50
        xd = x[d]
        for i in eachindex(xd)
            xd[i] = rand()
        end
    end
    x
end
So arrays of arrays can be great data structures in the right cases. The library RecursiveArrayTools.jl was created to take better advantage of them. For example, A3 = VectorOfArray(A) gives A3 the same indexing structure as A2 by lazily translating A3[i,j,k] into A[k][i,j]. However, it keeps the advantages of A, and will automatically make sure to broadcast in the correct way, like f. Another tool like this is the ArrayPartition, which allows heterogeneous element types in a broadcast-performant way.
So yeah, it's not always the right tool, but these heterogeneous and recursive arrays are great tools when used correctly.
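As an illustrative sketch of the VectorOfArray idea mentioned above (treat the exact API details as an assumption and check the RecursiveArrayTools.jl README):
using RecursiveArrayTools

A  = [rand(100, 10) for d in 1:50]   # vector of matrices
A3 = VectorOfArray(A)                # indexes like a 100×10×50 array

A3[3, 7, 10] == A[10][3, 7]          # true: same element, no copy made
B = convert(Array, A3)               # materialize a contiguous 3D Array if needed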

Converting Array{Array{Float64,1},1} to Array{Float64,2} in Julia

My problem is similar to the problem described earlier, with the difference that I don't input numbers manually. Thus the accepted answer there does not work for me.
I want to convert the vector of cartesian coordinates to polars:
function cart2pol(x0, x1)
    rho = sqrt(x0^2 + x1^2)
    phi = atan2(x1, x0)
    return [rho, phi]
end
@vectorize_2arg Number cart2pol
function cart2pol(x)
    x1 = view(x,:,1)
    x2 = view(x,:,2)
    return cart2pol(x1, x2)
end
x = rand(5,2)
vcat(cart2pol(x))
The last command does not collect the arrays for some reason and returns output of type 5-element Array{Array{Float64,1},1}. Any idea how to cast it to Array{Float64,2}?
If you look at the definition of cat (which is the underlying function for hcat and vcat), you see that you can collect several arrays into one single array of dimension 2:
cat(2, [1,2], [3,4], [5,6])
2×3 Array{Int64,2}:
1 3 5
2 4 6
This is basically what you want. The problem is that you have all your output polar points in an array itself. cat expects you to provide them as several arguments. This is where ... comes in.
... is used to cause a single function argument to be split apart into many different arguments when used in the context of a function call.
Therefore, you can write
cat(2, [[1,2], [3,4], [5,6]]...)
2×3 Array{Int64,2}:
1 3 5
2 4 6
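Since hcat is just cat along dimension 2, the same splatting idea can be written a bit more compactly:
hcat([[1,2], [3,4], [5,6]]...)
2×3 Array{Int64,2}:
 1  3  5
 2  4  6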
In your situation, it works exactly in the same way (I changed your x to have the points in columns):
x=rand(2,5)
cat(2, cart2pol.(view(x,1,:),view(x,2,:))...)
2×5 Array{Float64,2}:
0.587301 0.622 0.928159 0.579749 0.227605
1.30672 1.52956 0.352177 0.710973 0.909746
The function mapslices can also do this, essentially transforming the rows of the input:
julia> x = rand(5,2)
5×2 Array{Float64,2}:
0.458583 0.205246
0.285189 0.992547
0.947025 0.0853141
0.79599 0.67265
0.0273176 0.381066
julia> mapslices(row->cart2pol(row[1],row[2]), x, [2])
5×2 Array{Float64,2}:
0.502419 0.420827
1.03271 1.291
0.95086 0.0898439
1.04214 0.701612
0.382044 1.49923
The last argument specifies dimensions to operate over; e.g. passing [1] would transform columns.
As an aside, I would encourage one or two stylistic changes. First, it's good to map like to like, so if we stick with the row representation then cart2pol should accept a 2-element array (since that's what it returns). Then this call would just be mapslices(cart2pol, x, [2]). Or, if what we're really trying to represent is an array of coordinates, then the data could be an array of tuples [(x1,y1), (x2,y2), ...], and cart2pol could accept and return a tuple. In either case cart2pol would not need to be able to operate on arrays, and it's partly for this reason that we've deprecated the @vectorize_ macros.
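A sketch of the first stylistic suggestion (cart2pol taking and returning a 2-element array; this is just one way to write it, using the same pre-0.7 atan2 as above):
cart2pol(p) = [sqrt(p[1]^2 + p[2]^2), atan2(p[2], p[1])]

x = rand(5, 2)
mapslices(cart2pol, x, [2])   # 5×2 matrix: one (rho, phi) row per input row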

Is there a way to quickly extract the parts from a vector without looping?

Consider that I have a vector/array that looks as follows: each part is a sub-array of some fixed and known size (that can only be accessed through indexing, i.e. it's not a tensor nor a higher-order array). So for example:
x1 = x(1:d);
where d is the size of each sub-array. The size is the same for every sub-array, but it might vary depending on the current x we are considering. However, we do know n (the number of sub-arrays) and d (the size of all of the sub-arrays).
I know there are usually strange but useful tricks in Matlab to do things in a more optimized way. Is there a way to extract those parts, maybe using indexing, and make a matrix where the rows (or columns) are those parts? As in:
X = [x_1, ..., x_n]
The caveat is that n is a variable and we don't know a priori what it is. We can find out what n is, but it's not fixed.
I want to minimize the number of for loops I actually write in Matlab in the hope that it's faster... just to add some more context.
First, I would consider a simple reshape to keep the output as a plain double matrix:
x = (1:15).'
d = 3;
out = reshape(x,d,[])
and further on just use indexing to access the columns: out(:,idx).
There is no need to know n in advance, as reshape calculates it based on d and the number of elements in x.
out =
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
If you'd insist on something like cell arrays, use accumarray with ceil to get the subs:
out = accumarray( ceil( (1:numel(x))/d ).', x(:), [], @(x) {x})

Converting 2D cell of 2D matrices (consistent sizes) into 4D matlab double

Searching around here, one finds many questions about how to convert cell arrays of doubles into one big matrix.
In my application I have a two-dimensional cell array (let's call it celldata, of size m times n) of double matrices that all have the same size (let's say a times b).
I want to convert that data structure into one big 4D double (m times n times a times b).
At the moment I do that by
reshape(cat(3,celldata{:}),m,n,a,b)
but maybe there are other methods doing that directly? Maybe with a call like
cat([3 4],celldata{:,:})
or similar.
I think
cell2mat(permute(celldata, [3 4 1 2]))
will do the trick. However,
%// create some bogus data
m = 1.1e2;
n = 1.2e2;
a = 1.3e2;
b = 1.4e2;
celldata = cellfun(@(~) randi(10, a,b, 'uint8'), cell(m,n), 'UniformOutput', false);
%// new method
tic
cell2mat(permute(celldata, [3 4 1 2]));
toc
%// your current method
tic
reshape(cat(3,celldata{:}),m,n,a,b);
toc
Results:
Elapsed time is 1.745495 seconds. % cell2mat/permute
Elapsed time is 0.305368 seconds. % reshape/cat
cell2mat is a matlab m-file (with necessary inefficiencies in the loop due to compatibility issues), while reshape and cat are built-ins. This is where that difference comes from.
I'd stick with your current method :)
Now, I'm asking why you'd want to do this conversion in the first place. Is it an indexing problem? Because
celldata{x,y}(w,z)
already gives you the same element as
converted_celldata(x,y,w,z)
so you may not need the conversion at all. I don't see other reasons, because matrix/vector operations don't work on 4D arrays anyway...
