Fast indexing of arrays - arrays

What is the most efficient way to access (and perhaps replace) an entry in a large multidimensional array? I am using something like this inside a loop:
tup = (16,45,6,40,3)
A[tup...] = 100
but I'm wondering if there is a more efficient way. In particular, is there a way I can avoid using ...?

There's not always a penalty involved with splatting, but determining where it is efficient isn't always obvious (or easy). Your trivial example is actually just as efficient as writing A[16,45,6,40,3] = 100. You can see this by comparing
function f(A)
tup = (16,45,6,40,3)
A[tup...] = 100
A
end
function g(A)
A[16,45,6,40,3] = 100
A
end
julia> code_llvm(f, Tuple{Array{Int, 5}})
# Lots of output (bounds checks).
julia> code_llvm(g, Tuple{Array{Int, 5}})
# Identical to above
If there was a splatting penalty, you'd see it in the form of allocations. You can test for this with the #allocated macro or by simply inspecting code_llvm for a reference to #jl_pgcstack — that's the garbage collector, which is required any time there's an allocation. Note that there is very likely other things in a more complicated function that will also cause allocations, so it's presence doesn't necessarily mean that there's a splatting pessimization. But if this is in a hot loop, you want to minimize all allocations, so it's a great target… even if your problem isn't due to splatting. You should also be using #code_warntype, as poorly typed code will definitely pessimize splats and many other operations. Here's what will happen if your tuple isn't well typed:
function h(A)
tup = ntuple(x->x+1, 5) # type inference doesn't know the type or size of this tuple
A[tup...] = 100
A
end
julia> code_warntype(h, Tuple{Array{Int,5}})
# Lots of red flags
So optimizing this splat will be highly dependent upon how you construct or obtain tup.

To iterate over multidimensional arrays, it is recommended to do for index in eachindex(A); see e.g.
https://groups.google.com/forum/#!msg/julia-users/CF_Iphgt2Wo/V-b31-6oxSkJ
If A is a standard array, then this corresponds to just indexing using a single integer, which is the fastest way to access your array (your original question):
A = rand(3, 3)
for i in eachindex(A)
println(i)
end
However, if A is a more complicated object, e.g. a subarray, then eachindex(A) will give you a different, efficient, access object:
julia> for i in eachindex(slice(A, 1:3, 2:3))
println(i)
end
gives
CartesianIndex{2}((1,1))
CartesianIndex{2}((2,1))
etc.

The fastest way to index a multidimensional array is to index it linearly.
Base.Base.linearindexing is a related function from Base module to find most efficiently accesses way to elements of an array.
julia> a=rand(1:10...);
julia> Base.Base.linearindexing(a)
Base.LinearFast()
one could use ii=sub2ind(size(A),tup...) syntax to convert a tuple of index to one linear index or for i in eachindex(A) to traverse it.

Related

Why [1:2] != Array[1:2]

I am learning Julia following the Wikibook, but I don't understand why the following two commands give different results:
julia> [1:2]
1-element Array{UnitRange{Int64},1}:
1:2
julia> Array[1:2]
1-element Array{Array,1}:
[1,2]
Apologies if there is an explanation I haven't seen in the Wikibook, I have looked briefly but didn't find one.
Type[a] runs convert on the elements, and there is a simple conversion between a Range to an Array (collect). So Array[1:2] converts 1:2 to an array, and then makes an array of objects like that. This is the same thing as why Float64[1;2;3] is an array of Float64.
These previous parts answer answered the wrong thing. Oops...
a:b is not an array, it's a UnitRange. Why would you create an array for A = a:b? It only takes two numbers to store it, and you can calculate A[i] basically for free for any i. Using an array would take an amount of memory which is proportional to the b-a, and thus for larger arrays would take a lot of time to allocate, whereas allocation for UnitRange is essentially free.
These kinds of types in Julia are known as lazy iterators. LinSpace is another. Another interesting set of types are the special matrix types: why use more than an array to store a Diagonal? The UniformScaling operator acts as the identity matrix while only storing one value (it's scale) to make A-kI efficient.
Since Julia has a robust type system, there is no reason to make all of these things arrays. Instead, you can make them a specialized type which will act (*, +, etc.) and index like an array, but actually aren't. This will make them take less memory and be faster. If you ever need the array, just call collect(A) or full(A).
I realized that you posted something a little more specific. The reason here is that Array[1:2] calls the getindex function for an array. This getindex function has a special dispatch on a Range so that way it "acts like it's indexed by an array" (see the discussion from earlier). So that's "special-cased", but in actuality it just has dispatches to act like an array just like it does with every other function. [A] gives an array of typeof(A) no matter what A is, so there's no magic here.

MATLAB sum over all elements of array valued expression

So I've been wondering about this for a while now. Summing up over some array variable A is as easy as
sum(A(:))
% or
sum(...sum(sum(A,n),n-2)...,1) % where n is the dimension of A
However once it gets to expressions the (:) doesn't work anymore, like
sum((A-2*A)(:))
is no valid matlab syntax, instead we need to write
foo = A-2*A;
sum(foo(:))
%or the one liner
sum(sum(...sum(A-2*A,n)...,2),1) % n is the dimension of A
The one liner above will only work, if the dimension of A is fixed which, depending on what you are doing, may not necessary be the case. The downside of the two lines is, that foo will be kept in memory until you run clear foo or may not even be possible depending on the size of A and what else is in your workspace.
Is there a general way to circumvent this issue and sum up all elements of an array valued expression in a single line / without creating temporal variables? Something like sum(A-2*A,'-all')?
Edit: It differes from How can I index a MATLAB array returned by a function without first assigning it to a local variable?, as it doesn't concern general (nor specific) indexing of array valued expressions or return values, but rather the summation over each possible index.
While it is possible to solve my problem with the answer given in the link, gnovice says himself that using subref is a rather ugly solution. Further Andras Deak posted a much cleaner way of doing this in the comments below.
While the answers to the linked duplicate can indeed be applied to your problem, the narrower scope of your question allows us to give a much simpler solution than the answers provided there.
You can sum all the elements in an expression (including the return value of a function) by reshaping your array first to 1d:
sum(reshape(A-2*A,1,[]))
%or even sum(reshape(magic(3),1,[]))
This will reshape your array-valued expression to size [1, N] where N is inferred from the size of the array, i.e. numel(A-2*A) (but the above syntax of reshape will compute the missing dimension for you, no need to evaluate your expression twice). Then a single call to sum will sum all the elements, as needed.
The actual case where you have to resort to something like this is when a function returns an array with an unknown number of dimensions, and you want to use its sum in an anonymous function (making temporary variables unavailable):
fun = #() rand(2*ones(1,randi(10))); %function returning random 2 x 2 x ... x 2 array with randi(10) dimensions
sumfun = #(A) sum(reshape(A,1,[]));
sumfun(fun()) %use it

Array ordering in Julia

Is there a way to work with C-ordered or non-contiguous arrays natively in Julia?
For example, when using NumPy, C-ordered arrays are the default, but I can initialize a Fortran ordered array and do computations with that as well.
One easy way to do this was to take the Transpose of a matrix.
I can also work with non-contiguous arrays that are made via slicing.
I have looked through the documentation, etc. and can't find a way to make, declare, or work with a C-ordered array in Julia.
The transpose appears to return a copy.
Does Julia allow a user to work with C-ordered and non-contiguous arrays?
Is there currently any way to get a transpose or a slice without taking a copy?
Edit: I have found how to do slicing.
Currently it is available as a different type called a SubArray.
As an example, I could do the following to get the first row of a 100x100 array A
sub(A, 1, 1:100)
It looks like there are plans to improve this, as can be seen in https://github.com/JuliaLang/julia/issues/5513
This still leaves open the question of C-ordered arrays.
Is there an interface for C-ordered arrays?
Is there a way to do a transpose via a view instead of a copy?
Naturally, there's nothing that prevents you from working with row-major arrays as a chunk of memory, and certain packages (like Images.jl) support arbitrary ordering of arbitrary-dimensional arrays.
Presumably the main issue you're wondering about is linear algebra. Currently I don't know of anything out-of-the-box, but note that matrix multiplication in Julia is implemented through a series of functions with names like A_mul_B, At_mul_B, Ac_mul_Bc, etc, where t means transpose and c means conjugate. The parser replaces expressions like A'*b with Ac_mul_B(A, b) without actually taking the transpose.
Consequently, you could implement a RowMajorMatrix <: AbstractArray type yourself, and set up special multiplication rules:
A_mul_B(A::RowMajorMatrix, B::RowMajorMatrix) = At_mul_Bt(A, B)
A_mul_B(A::RowMajorMatrix, B::AbstractArray) = At_mul_B(A, B)
A_mul_B(A::AbstractArray, B::RowMajorMatrix) = A_mul_Bt(A, B)
etc. In addition to these two-argument versions, there are 3-argument versions (like A_mul_B!) that store the result in a pre-allocated output; you'd need to implement those, too. Finally, you'd also have to set up appropriate show methods (to display them appropriately), size methods, etc.
Finally, Julia's transpose function has been implemented in a cache-friendly manner, so it's quite a lot faster than the naive
for j = 1:n, i = 1:m
At[j,i] = A[i,j]
end
Consequently there are occasions where it's not worth worrying about creating custom implementations of algorithms, and you can just call transpose.
If you implement something like this, I'd encourage you to contribute it as a package, as it's likely that others may be interested.

Making a for-loop in Matlab faster by using arrayfun?

currently I have the following portion of code:
for i = 2:N-1
res(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
where as the variables k, m, c and e are vectors of size N and x is a vector of size 2*N. Is there any way to do this a lot faster using something like arrayfun!? I couldn't figure this out :( I especially want to make it faster by running on the GPU later and thus, arrayfun would be also helpfull since matlab doesn't support parallelizing for-loops and I don't want to buy the jacket package...
Thanks a lot!
You don't have to use arrayfun. It works if use use some smart indexing:
clear all
N=20;
k=rand(N,1);
m=rand(N,1);
c=rand(N,1);
e=rand(N,1);
x=rand(2*N,1);
% for-based implementation
%Watch out, you are not filling the first element of forres!
forres=zeros(N-1,1); %Initialize array first to gain some speed.
for i = 2:N-1
forres(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
%vectorized implementation
parres=k(2:N-1)./m(2:N-1).*x(1:N-2) -(c(2:N-1)+c(3:N))./m(2:N-1).*x(N+2:2*N-1) +e(3:N)./m(2:N-1).*x(3:N);
%compare results; strip the first element from forres
difference=forres(2:end)-parres %#ok<NOPTS>
Firstly, MATLAB does support parallel for loops via PARFOR. However, that doesn't have much chance of speeding up this sort of computation since the amount of computation is small compared to the amount of data you're reading and writing.
To restructure things for GPUArray "arrayfun", you need to make all the array references in the loop body refer to the loop iterate, and have the loop run across the full range. You should be able to do this by offsetting some of the arrays, and padding with dummy values. For example, you could prepend all your arrays with NaN, and replace x(i-1) with a new variable x_1 = [x(2:N) NaN]

How can I efficiently copy 2-dimensional arrays of bytes into a larger 2D array?

I have a structure called Patch that represents a 2D array of data.
newtype Size = (Int, Int)
data Patch = Patch Size Strict.ByteString
I want to construct a larger Patch from a set of smaller Patches and their assigned positions. (The Patches do not overlap.) The function looks like this:
newtype Position = (Int, Int)
combinePatches :: [(Position, Patch)] -> Patch
combinePatches plan = undefined
I see two sub-problems. First, I must define a function to translate 2D array copies into a set of 1D array copies. Second, I must construct the final Patch from all those copies.
Note that the final Patch will be around 4 MB of data. This is why I want to avoid a naive approach.
I'm fairly confident that I could do this horribly inefficiently, but I would like some advice on how to efficiently manipulate large 2D arrays in Haskell. I have been looking at the "vector" library, but I have never used it before.
Thanks for your time.
If the spec is really just a one-time creation of a new Patch from a set of previous ones and their positions, then this is a straightforward single-pass algorithm. Conceptually, I'd think of it as two steps -- first, combine the existing patches into a data structure with reasonable lookup for any give position. Next, write your new structure lazily by querying the compound structure. This should be roughly O(n log(m)) -- n being the size of the new array you're writing, and m being the number of patches.
This is conceptually much simpler if you use the Vector library instead of a raw ByteString. But it is simpler still if you simply use Data.Array.Unboxed. If you need arrays that can interop with C, then use Data.Array.Storable instead.
If you ditch purity, at least locally, and work with an ST array, you should be able to trivially do this in O(n) time. Of course, the constant factors will still be worse than using fast copying of chunks of memory at a time, but there's no way to keep that code from looking low-level.

Resources