Irregular array subsetting in R

Let's say I have the array
TestArray=array(1:(3*3*4),c(3,3,4))
In the following I will refer to TestArray[i,,], TestArray[,j,] and TestArray[,,k] as the x=i, y=j and z=k subsets, respectively. In this specific example, the indices i and j can go from 1 to 3 and k from 1 to 4.
Now, I want to subset this three-dimensional array so that I get the x=y subset. The output should be
do.call("cbind",
list(TestArray[1,1,,drop=FALSE],
TestArray[2,2,,drop=FALSE],
TestArray[3,3,,drop=FALSE]
)
)
I (naively) thought that such an operation should be possible with
library(Matrix)
TestArray[as.array(Diagonal(3,TRUE)),]
This works in 2 dimensions
matrix(1:9,3,3)[as.matrix(Diagonal(3,TRUE))]
However, in 3 dimensions it gives an error.
I know that I could produce an index array
IndexArray=outer(diag(1,3,3),c(1,1,1,1),"*")
mode(IndexArray)="logical"
and access the elements by
matrix(TestArray[IndexArray],nrow=4,ncol=3,byrow=TRUE)
But the first method would be much nicer and would need less memory as well. Do you know how I could fix TestArray[as.array(Diagonal(3,TRUE)),] so that it works as desired? Maybe I am just missing some syntactic sugar...

I don't know if abind::asub will do what I (you) want. This uses a more efficient form of matrix indexing than you have above, but I still have to coerce the results into the right shape ...
indmat <- cbind(1:3,as.matrix(expand.grid(1:3,1:4)))
matrix(TestArray[indmat],nrow=4,ncol=3,byrow=TRUE)
Slightly more generally:
d <- dim(TestArray)[1]
d2 <- dim(TestArray)[3]
indmat <- cbind(1:d,as.matrix(expand.grid(1:d,1:d2)))
matrix(TestArray[indmat],nrow=d2,ncol=d,byrow=TRUE)

In addition to Ben's answer, here is a surprisingly simple modification to my original line of code that does the job.
matrix(TestArray[as.matrix(Diagonal(3,TRUE))],ncol=3,nrow=4,byrow=TRUE)
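This works because the 3-by-3 logical matrix as.matrix(Diagonal(3,TRUE)) has only 9 elements, so it gets recycled across all 36 elements of TestArray, i.e. once per z-layer.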
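
To see why recycling the logical matrix picks out the x=y subset, here is a small sketch in plain Python rather than R. The names flat, mask, picked and rows are made up for illustration. It assumes R's column-major layout (first index varies fastest) and imitates recycling with a modulo on the position.

```python
# Flattened 3x3x4 array holding 1..36 in column-major order,
# like array(1:36, c(3,3,4)) in R.
flat = list(range(1, 37))

# Column-major flattening of the 3x3 identity mask (True where x == y).
mask = [i == j for j in range(3) for i in range(3)]

# A logical index shorter than the data is recycled: position p uses
# mask[p % 9], so the same diagonal pattern is applied to every z-layer.
picked = [value for p, value in enumerate(flat) if mask[p % len(mask)]]

# Reshape row by row (byrow=TRUE): one row per z-layer,
# one column per diagonal entry.
rows = [picked[r * 3:(r + 1) * 3] for r in range(4)]
print(rows)
```

Each row of rows corresponds to one value of z, which is why the R call needs byrow=TRUE with nrow=4 and ncol=3.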

MATLAB sum over all elements of array valued expression

So I've been wondering about this for a while now. Summing up over some array variable A is as easy as
sum(A(:))
% or
sum(...sum(sum(A,n),n-1)...,1) % where n is the number of dimensions of A
However, once it gets to expressions, the (:) doesn't work anymore. For example,
sum((A-2*A)(:))
is not valid MATLAB syntax; instead we need to write
foo = A-2*A;
sum(foo(:))
% or the one-liner
sum(sum(...sum(A-2*A,n)...,2),1) % n is the number of dimensions of A
The one-liner above will only work if the number of dimensions of A is fixed which, depending on what you are doing, may not necessarily be the case. The downside of the two-line version is that foo will be kept in memory until you run clear foo, or that it may not even be possible, depending on the size of A and what else is in your workspace.
Is there a general way to circumvent this issue and sum up all elements of an array-valued expression in a single line / without creating temporary variables? Something like sum(A-2*A,'-all')?
Edit: It differs from How can I index a MATLAB array returned by a function without first assigning it to a local variable?, as it doesn't concern general (nor specific) indexing of array-valued expressions or return values, but rather the summation over each possible index.
While it is possible to solve my problem with the answer given in the link, gnovice says himself that using subsref is a rather ugly solution. Further, Andras Deak posted a much cleaner way of doing this in the comments below.

While the answers to the linked duplicate can indeed be applied to your problem, the narrower scope of your question allows us to give a much simpler solution than the answers provided there.
You can sum all the elements in an expression (including the return value of a function) by reshaping your array first to 1d:
sum(reshape(A-2*A,1,[]))
%or even sum(reshape(magic(3),1,[]))
This will reshape your array-valued expression to size [1, N] where N is inferred from the size of the array, i.e. numel(A-2*A) (but the above syntax of reshape will compute the missing dimension for you, no need to evaluate your expression twice). Then a single call to sum will sum all the elements, as needed.
The actual case where you have to resort to something like this is when a function returns an array with an unknown number of dimensions, and you want to use its sum in an anonymous function (making temporary variables unavailable):
fun = @() rand(2*ones(1,randi(10))); %function returning random 2 x 2 x ... x 2 array with randi(10) dimensions
sumfun = @(A) sum(reshape(A,1,[]));
sumfun(fun()) %use it
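
The same idea (flatten the expression's value, then call sum once) can be sketched outside MATLAB. The following is an illustrative plain-Python version, not MATLAB code. The helpers flatten and minus_twice are hypothetical names, and nested lists stand in for an array with an unknown number of dimensions.

```python
def flatten(nested):
    """Yield every scalar from an arbitrarily nested list, depth first."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def minus_twice(x):
    """Elementwise x - 2*x on a nested list, mirroring the expression A-2*A."""
    if isinstance(x, list):
        return [minus_twice(item) for item in x]
    return x - 2 * x


# A 2 x 2 x 2 example standing in for the array A.
A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Sum the whole expression in one line, without naming a temporary.
total = sum(flatten(minus_twice(A)))
print(total)
```

As with reshape(...,1,[]), the flattening step does not need to know the depth in advance.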

Failed to plot graph of two arrays in Octave

I have created two arrays in Octave using a for loop and I want to create a graph using the data of the two arrays. But it showed the error "invalid value for array property "xdata"" and displayed an empty graph.
for i=1:16
x=1+(10^6)*2
h{i}=1/(10.^i)
fdd1{i}=(sin(1+h{i})-sin(1))/h{i}
error_f1{i}=fdd1{i}-cos(1)
endfor
fplot(loglog(h,error_f1));
Am I making mistakes in plotting the graph? May I know how to solve this problem?

Yes, you are making all the possible mistakes in that snippet.
1. Your variables h and error_f1 are cell arrays. The function loglog takes numeric arrays. I believe your specific error comes from there. You can convert them with cell2mat as in loglog (cell2mat (h), cell2mat (error_f1)) but I would argue that would still be incorrect since you should never have created a cell array in the first place (see point 4).
2. Your data has non-positive values, which you can't plot with a logarithmic scale.
3. The fplot function takes a function handle as argument. Why are you passing a figure handle (the output of loglog) to it?
4. Octave is a language designed around vectorized operations. Its syntax strongly emphasizes them, and you will suffer if you don't use them. You should not have a for loop for this. Just remove your indexing and make your multiplication and division element-wise. This also fixes problem 1, since you will end up with a numeric array.
r = 1:16;
x = 1 + (10^6)*2;
h = 1 ./ (10.^r);
fdd1 = (sin (1+h) - sin (1)) ./ h;
error_f1 = fdd1 - cos(1);
Rule of thumb in Octave: if you ever see a for loop, chances are you are doing it wrong.
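
As a rough cross-check of points 2 and 4, here is the same computation sketched in plain Python instead of Octave. The name abs_error is an addition for illustration. It builds the values as whole sequences and shows that the error is negative for the larger step sizes, so its magnitude is what can go on logarithmic axes.

```python
import math

# Step sizes h = 10^-1 ... 10^-16, built in one expression
# instead of being filled element by element in a loop.
h = [10.0 ** -i for i in range(1, 17)]

# Forward-difference approximation of the derivative of sin at 1.
fdd1 = [(math.sin(1 + step) - math.sin(1)) / step for step in h]
error_f1 = [approx - math.cos(1) for approx in fdd1]

# For the larger steps the error is negative (roughly -h/2 * sin(1)),
# so take the magnitude before using a logarithmic scale.
abs_error = [abs(e) for e in error_f1]
print(error_f1[0], abs_error[0])
```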

Postgresql: dynamic slice notation for arrays

After reading this I wrote a naive attempt to produce this
col1
---------
1
4
7
from this
ARRAY[[1,2,3], [4,5,6], [7,8,9]]
This works
SELECT unnest((ARRAY[[1,2,3], [4,5,6], [7,8,9]])[1:3][1:1]);
But in my case, I don't know the length of the outer array.
So is there a way to hack together the slice "string" to take into account this variability?
Here was my attempt. I know, it's a bit funny
_ids := _ids_2D[('1:' || array_length(_ids_2D, 1)::text)::int][1:1];
As you can see, I just want to create the effect of [1:n]. Obviously '1:3' ain't going to parse nicely into what the array slice needs.
I could obviously use something like the unnest_2d_1d Erwin mentions in the answer linked above, but hoping for something more elegant.

If you are trying to get the first element of every nested (second-dimension) array inside an outer (first-dimension) array, you can use
array_upper(anyarray, 1)
as the upper bound of the first-dimension slice, so that the slice covers every inner array:
anyarray[1:array_upper(anyarray, 1)][<dimension num>:<dimension num>]
e.g., to get the first element of each inner array
anyarray[1:array_upper(anyarray, 1)][1:1]
as in the code above. Please refer to the PostgreSQL manual section on Arrays for more information.
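
The effect of slicing with a computed upper bound can be illustrated outside SQL. This is only a plain-Python analogy, with len standing in for array_upper. The names ids_2d, sliced and col1 are chosen for illustration, and Python slices are 0-based and half-open, unlike PostgreSQL's.

```python
ids_2d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Plays the role of array_upper(ids_2d, 1): the outer length is looked up,
# not hard-coded.
n = len(ids_2d)

# Rows 1..n and column 1..1; the result keeps a 2-D shape, as [1:n][1:1] does.
sliced = [row[0:1] for row in ids_2d[0:n]]

# Flatten to a single column, as unnest would.
col1 = [value for row in sliced for value in row]
print(col1)
```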

Matlab: multiply subset of three dimensional array with two dimensional array

I have an AxBxC array where AxB points to the individual grid cells of a field that I sampled (like coordinates) and C corresponds to the layers underneath. Now I want to calculate the impact of certain activities on these individual points by multiplying it with a 2D matrix.
E.g.
x=5; %x-Dimensions of the sampled area
y=5; %y-Dimensions of the sampled area
z=3; %z-number of layers sampled
Area= zeros(x,y,z);
AreaN= zeros(x,y,z);
Now I want to multiply every layer of a given point in X*Y with:
AppA=[0.4,0.4,0.2;0.4,0.5,0.1;0.1,0.2,0.7];
I tried:
for i=1:x
for j=1:y
AreaN(i,j,:)= AppA*Area(i,j,:);
end
end
Unfortunately I get the error:
Error using *
Inputs must be 2-D, or at least one input must be scalar.
To compute elementwise TIMES, use TIMES (.*) instead.
Any help with this is appreciated, since I am not yet really familiar with MATLAB.

Correct Approach
I think, to correct your code, you need to convert that Area(i,j,:) to a column vector, which you can do with squeeze. Thus, the correct loop-based code would look something like this -
AreaN= zeros(x,y,z);
for i=1:x
for j=1:y
AreaN(i,j,:)= AppA*squeeze(Area(i,j,:));
end
end
Now, there are efficient no-loop/vectorized approaches that can be suggested here to get to the output.
Vectorized Approach #1
The first approach uses matrix multiplication and should be pretty efficient -
AreaN = reshape(reshape(Area,x*y,z)*AppA.',x,y,z)
Vectorized Approach #2
Second one with bsxfun -
AreaN = squeeze(sum(bsxfun(@times,Area,permute(AppA,[3 4 2 1])),3))
Vectorized Approach #2 Rev 1
If you would like to get rid of the squeeze in the bsxfun code, you need to use an extra permute in there -
AreaN = sum(bsxfun(@times,permute(Area,[1 2 4 3]),permute(AppA,[4 3 1 2])),4)
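
All of the approaches above compute the same quantity: for every grid point (i,j), the layer vector is multiplied by AppA. As a reference for what the result should contain, here is that definition written out in plain Python rather than MATLAB. The grid size and the sample layer values [1, 2, 3] are assumptions for illustration (the original Area is all zeros).

```python
x, y, z = 2, 2, 3

AppA = [[0.4, 0.4, 0.2],
        [0.4, 0.5, 0.1],
        [0.1, 0.2, 0.7]]

# Hypothetical sample data: every grid point holds the layer values [1, 2, 3].
Area = [[[1.0, 2.0, 3.0] for _ in range(y)] for _ in range(x)]

# AreaN[i][j][k] = sum over l of AppA[k][l] * Area[i][j][l],
# i.e. AppA times the column vector of layers at point (i, j).
AreaN = [[[sum(AppA[k][l] * Area[i][j][l] for l in range(z)) for k in range(z)]
          for j in range(y)]
         for i in range(x)]
print(AreaN[0][0])
```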

This would solve the matrix multiplication problem:
AreaN(i,j,:)= AppA*reshape(Area(i,j,:),3,[]);
You might want to consider using bsxfun to avoid loops.

Vectorized sum over slices of an array

Suppose I have an array of three dimensions:
set.seed(1)
foo <- array(rnorm(250),dim=c(5,10,5))
And I want to create a matrix of each row and layer summed over columns 4, 5 and 6. I can do this like this:
apply(foo[,4:6,],c(1,3),sum)
But this splits the array per row and layer and is pretty slow since it is not vectorized. I could also just add the slices:
foo[,4,]+foo[,5,]+foo[,6,]
Which is faster but gets a bit tedious to do manually for multiple slices. Is there a function that does the above expression without manually specifying each slice?

I think you are looking for rowSums / colSums (fast implementations of apply)
colSums(aperm(foo[,4:6,], c(2,1,3)))
> all.equal(colSums(aperm(foo[,4:6,], c(2,1,3))), foo[,4,]+foo[,5,]+foo[,6,])
[1] TRUE
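
What the result should contain can be written out explicitly: entry (i,k) is the sum of foo[i,j,k] over the chosen columns j. The following plain-Python sketch (not R) states that definition on small made-up data. The array size, the fill formula and the names cols and result are assumptions for illustration.

```python
# Hypothetical 2 x 6 x 2 data: foo[i][j][k] = 100*i + 10*j + k (0-based indices).
foo = [[[100 * i + 10 * j + k for k in range(2)] for j in range(6)]
       for i in range(2)]

# 0-based positions of R's columns 4, 5 and 6.
cols = [3, 4, 5]

# result[i][k] adds the chosen columns for every row i and layer k.
result = [[sum(foo[i][j][k] for j in cols) for k in range(2)]
          for i in range(2)]
print(result)
```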

How about this:
eval(parse(text=paste(sprintf('foo[,%i,]',4:6),collapse='+')))
I am aware that there are reasons to avoid parse, but I am not sure how to avoid it in this case.