Splitting a vector/array in R

I would like to use R to perform an operation similar to merge sort. I would like to split my vector/array into two parts. My input is stored in a variable named inp.
> inp <- c(5,6,7,8,9,1,2,3,4)
> inplen <- length(inp)
> left <- inp[1:ceiling(inplen/2)]
> right <- inp[ceiling(inplen/2)+1:inplen]
> left
[1] 5 6 7 8 9
> right
[1] 1 2 3 4 NA NA NA NA NA
> length(left)
[1] 5
> length(right)
[1] 9
As you can see, although I split the vector into two halves, the right half is larger than the left half, and some of its entries are NA. I cannot figure out why the second vector (called right) has these extra entries.

You are running into a well-known problem: the ":" operator has higher precedence than "+", so the sequence is built before the addition is applied:
ceiling(inplen/2)+1:inplen
[1] 6 7 8 9 10 11 12 13 14
NAs are being returned because your index exceeded the length of the vector.

You're missing a pair of parentheses:
right = inp[(ceiling(inplen/2)+1):inplen]
To expand, suppose we have
1 + 1:3
Does this mean 1+(1:3) or (1+1):3? R interprets it as 1+(1:3), so when you type 1+1:3 you get:
1 + c(1,2,3)
which becomes
c(2,3,4)
Another common gotcha is:
R> x = 1:5
R> x[2:length(x)-1]
[1] 1 2 3 4
Instead of selecting elements 2 to 4, we have selected elements 1 to 4 by mistake.
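Both slips come from the same precedence rule, so wrapping the arithmetic in parentheses fixes each of them. A minimal sketch reusing inp and x from above (the name mid is only illustrative):

```r
inp <- c(5, 6, 7, 8, 9, 1, 2, 3, 4)
inplen <- length(inp)
mid <- ceiling(inplen / 2)

# Parenthesise the arithmetic so it is evaluated before ":" builds the sequence
left  <- inp[1:mid]
right <- inp[(mid + 1):inplen]

x <- 1:5
inner <- x[2:(length(x) - 1)]   # elements 2 to 4, as intended
```

Note that for a vector of length 1, (mid + 1):inplen counts down (2:1), so a real merge sort would stop recursing before reaching that case.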

You can use split for this, with cut to create the breakpoints:
split(inp,cut(seq(inplen),breaks=c(0,ceiling(inplen/2),inplen),labels=c("left","right")))
$left
[1] 5 6 7 8 9
$right
[1] 1 2 3 4
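If index arithmetic is the part you want to avoid, head() and tail() from base R can do the split as well. A small sketch; split_halves is a hypothetical helper name, not something from the answers above:

```r
# Hypothetical helper: split a vector into a left and a right half,
# giving the extra element to the left half when the length is odd.
split_halves <- function(v) {
  mid <- ceiling(length(v) / 2)
  list(left = head(v, mid), right = tail(v, length(v) - mid))
}

halves <- split_halves(c(5, 6, 7, 8, 9, 1, 2, 3, 4))
halves$left    # 5 6 7 8 9
halves$right   # 1 2 3 4
```

Unlike the colon form, this returns an empty right half for a one-element vector.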

Related

Extracting array values based on values in different dimension

I've got a problem with subsetting values of an array.
raw.table <- array(data = c(1:12,13:24,rep(1:6, each=2)),
dim=c(3,4,3),
dimnames=list(LETTERS[1:3],1:4,c("target","ctrl","samples")))
The first two dimensions of my array hold the values that I want to do statistics on, and the higher dimensions hold attributes that I want to use to access specific subsets. In this case I have only sample numbers, and each sample number is always assigned to two values (measurement replicates).
, , target
  1 2 3  4
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12
, , ctrl
   1  2  3  4
A 13 16 19 22
B 14 17 20 23
C 15 18 21 24
, , samples
  1 2 3 4
A 1 2 4 5
B 1 3 4 6
C 2 3 5 6
How do I access the values in slice 1 (= target) that have the same sample number in slice 3 (= samples)? I tried different approaches using unique(), duplicated() and match(), but without getting a result. I just cannot wrap my head around the indexing of arrays -.-
Cheers,
zuup
Form a logical index with a logical test (across dimensions):
> raw.table[,,1] == raw.table[,,3]
      1     2     3     4
A  TRUE FALSE FALSE FALSE
B FALSE FALSE FALSE FALSE
C FALSE FALSE FALSE FALSE
And use it to select items from the first slice (since both are the same length, there is no recycling):
> raw.table[, , 1 ][ raw.table[,,1] == raw.table[,,3] ]
[1] 1
Chaining calls to the Extract operator is perfectly acceptable in R.
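To see where the matches sit rather than only their values, which() with arr.ind = TRUE returns row and column positions. If the goal is instead to collect the target values that belong to each sample number (one possible reading of the question, assumed here), split() can group one slice by another. A sketch using only base R; the names hits and by_sample are illustrative:

```r
raw.table <- array(data = c(1:12, 13:24, rep(1:6, each = 2)),
                   dim = c(3, 4, 3),
                   dimnames = list(LETTERS[1:3], 1:4, c("target", "ctrl", "samples")))

# Positions (row, column) where target equals the sample number
hits <- which(raw.table[, , "target"] == raw.table[, , "samples"], arr.ind = TRUE)

# Assumed reading: group the target values by their sample number
by_sample <- split(raw.table[, , "target"], raw.table[, , "samples"])
by_sample[["4"]]   # the two replicate values recorded for sample 4
```

Indexing the slices by name ("target", "samples") instead of by position keeps the code readable if more attribute slices are added later.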

Efficient "window-select" array blocks?

Suppose I have the following array:
x = [a b
c d
e f
g h
i j];
I want to "swipe a window of two rows" progressively (one row at a time) along the array to generate the following array:
y = [a b c d e f g h
c d e f g h i j];
What is the most efficient way to do this? I don't want to use cellfun or arrayfun or for loops.
im2col is going to be your best bet here if you have the Image Processing Toolbox.
x = [1 2
3 4
5 6
7 8];
im2col(x.', [1 2])
% 1 2 3 4 5 6
% 3 4 5 6 7 8
If you don't have the Image Processing Toolbox, you can also easily do this with built-ins.
reshape(permute(cat(3, x(1:end-1,:), x(2:end,:)), [3 2 1]), 2, [])
% 1 2 3 4 5 6
% 3 4 5 6 7 8
This pairs each row with the next one by concatenating a row-shifted copy along the third dimension. We then use permute to reorder the dimensions and reshape to get the desired size.
If you don't have the Image Processing Toolbox, you can do this using simple indexing:
x =
1 2
3 4
5 6
7 8
9 10
y = x.'; % Transpose it, for simplicity
z = [y(1:end-2); y(3:end)] % Take elements 1:end-2 and 3:end and concatenate them
z =
1 2 3 4 5 6 7 8
3 4 5 6 7 8 9 10
You can do the transposing and reshaping in a single step (see Suever's edit), but the above might be easier for beginners to read, understand, and debug.
Here's an approach to solve it for a generic case of selecting L rows per window -
[m,n] = size(x) % Store size
% Extend rows by indexing into them with a progressive array of indices
x_ext = x(bsxfun(@plus,(1:L)',0:m-L),:);
% Split the first dim at L into two dims, out of which "push" back the
% second dim thus created as the last dim. This would bring in the columns
% as the second dimension. Then, using linear indexing reshape into the
% desired shape of L rows for output.
out = reshape(permute(reshape(x_ext,L,[],n),[1,3,2]),L,[])
Sample run -
x = % Input array
9 1
3 1
7 5
7 8
4 9
6 2
L = % Window length
3
out =
9 1 3 1 7 5 7 8
3 1 7 5 7 8 4 9
7 5 7 8 4 9 6 2
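For readers working in R, like the other questions on this page, the same index-expansion idea carries over. This is only a sketch: it assumes a numeric matrix x and a window length L, with outer() standing in for bsxfun and aperm() for permute.

```r
x <- matrix(c(9, 1,
              3, 1,
              7, 5,
              7, 8,
              4, 9,
              6, 2), ncol = 2, byrow = TRUE)
L <- 3
m <- nrow(x)
n <- ncol(x)

# L x (m-L+1) matrix of row indices: column w holds the rows of window w
idx <- outer(seq_len(L), 0:(m - L), `+`)

# Stack all windows on top of each other, then view them as L x windows x n
x_ext <- x[as.vector(idx), , drop = FALSE]
arr <- array(x_ext, dim = c(L, m - L + 1, n))

# Move the column dimension next to the rows and flatten to L rows
out <- matrix(aperm(arr, c(1, 3, 2)), nrow = L)
```

With the sample input from the run above, out should have 3 rows and 8 columns, matching the MATLAB result.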

average operation in the first 2 of 3 dimensions of a matrix

Suppose A is a 3-D matrix as below (2 rows-2 columns-2 pages).
A(:,:,1)=[1,2;3,4];
A(:,:,2)=[5,6;7,8];
I want to have a vector, say "a", whose inputs are the average of diagonal elements of matrices on each page. So in this simple case, a=[(1+4)/2;(5+8)/2].
But I have difficulties in matlab to do so. I tried the codes below but failed.
mean(A(1,1,:),A(2,2,:))
You can use "partially linear indexing" in the two dimensions that define the diagonal, as follows:
Since partially linear indexing can only be applied on trailing dimensions, you first need to apply permute to rearrange dimensions, so that the first and second dimensions become second and third.
Now you leave the first dimension untouched, linearly index the diagonals in the second and third dimensions (which effectively reduces those two dimensions to one), and apply mean along the (combined) second dimension.
Code:
B = permute(A, [3 1 2]); % step 1: permute
result = mean(B(:,1:size(A,1)+1:size(A,1)*size(A,2)), 2); % step 2: index and mean
In your example,
A(:,:,1)=[1,2;3,4];
A(:,:,2)=[5,6;7,8];
this gives
result =
2.5000
6.5000
You can use bsxfun for a generic solution -
[m,n,r] = size(A)
mean(A(bsxfun(@plus,[1:n+1:n^2]',[0:r-1]*m*n)),1)
Sample run -
>> A
A(:,:,1) =
8 4 1
7 6 3
1 5 8
A(:,:,2) =
1 7 6
8 5 2
1 2 7
A(:,:,3) =
6 2 8
1 1 6
1 4 5
A(:,:,4) =
8 1 6
1 5 1
9 2 7
>> [m,n,r] = size(A);
>> sum(A(bsxfun(@plus,[1:n+1:n^2]',[0:r-1]*m*n)),1)
ans =
22 13 12 20
>> mean(A(bsxfun(@plus,[1:n+1:n^2]',[0:r-1]*m*n)),1)
ans =
7.3333 4.3333 4 6.6667
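The same per-page diagonal mean can be sketched in R, the language of the other questions here. This assumes square pages and uses matrix indexing with cbind() in place of the linear-index arithmetic; the names idx and a are illustrative.

```r
# Pages [1 2; 3 4] and [5 6; 7 8], filled column by column
A <- array(c(1, 3, 2, 4, 5, 7, 6, 8), dim = c(2, 2, 2))
n <- dim(A)[1]
r <- dim(A)[3]

# One (row, col, page) triple per diagonal element of every page
idx <- cbind(rep(seq_len(n), times = r),
             rep(seq_len(n), times = r),
             rep(seq_len(r), each = n))

# Reshape the picked values to n x r and average each column (one per page)
a <- colMeans(matrix(A[idx], nrow = n))
```

For the two-page example from the question this gives one mean per page, in page order.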

How to ignore NA in subscripted assignment

Given two (named) arrays x and y, where all dimnames(y) exist in x, how can I fill (update) x with values from y while ignoring the NAs in y?
This is how far I have come:
x<-array(1:15,dim=c(5,3),dimnames=list(1:5,1:3))
y<-(NA^!diag(1:3))*diag(1:3)
dimnames(y)<-list(1:3,1:3)
x[match(names(y[,1]),names(x[,1])),match(names(y[1,]),names(x[1,]))]<-y
But this also overwrites x with "NA"s from y.
   1  2  3
1  1 NA NA
2 NA  2 NA
3 NA NA  3
4  4  9 14
5  5 10 15
I guess it's something involving a filter like !is.na(y), but I haven't found the right place to put it.
Thanks for any hint
Match the rownames of 'y' against the rownames of 'x' to get the row index ('rn'), and do the same with the colnames to get the column index ('cn'). Then find the positions of the non-NA values in 'y' ('indx'). Finally, subset 'x' with the row and column indices, subset that result again with 'indx', and replace those values with the non-NA values of 'y' (y[indx]).
rn <- match(rownames(y), rownames(x))
cn <- match(colnames(y), colnames(x))
indx <- which(!is.na(y), arr.ind=TRUE)
x[rn,cn][indx] <- y[indx]
Or instead of matching, we can subset the 'x' with rownames(y) and colnames(y) and replace it as before.
x[rownames(y), colnames(y)][indx] <- y[indx]
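Spelled out step by step, the idea is to pull out the block of x that y covers, overwrite only the non-NA positions, and write the block back. A sketch with the question's x and an equivalent y built more plainly; the intermediate names keep and block are illustrative:

```r
x <- array(1:15, dim = c(5, 3), dimnames = list(1:5, 1:3))

# Same y as in the question: 1, 2, 3 on the diagonal, NA elsewhere
y <- diag(1:3)
y[y == 0] <- NA
dimnames(y) <- list(1:3, 1:3)

keep  <- !is.na(y)                       # logical mask of values worth copying
block <- x[rownames(y), colnames(y)]     # part of x that y overlaps
block[keep] <- y[keep]                   # overwrite only the non-NA positions
x[rownames(y), colnames(y)] <- block     # write the updated block back
x
```

Only the three diagonal cells of the overlapping block change; every other value of x stays as it was.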
You can index directly with rownames and colnames to get the relevant parts of x covered by y, and replace conditionally using ifelse:
x[rownames(y),colnames(y)] <- ifelse(is.na(y),x[rownames(y),colnames(y)],y)
x
  1  2  3
1 1  6 11
2 2  2 12
3 3  8  3
4 4  9 14
5 5 10 15
Just for completeness:
The accepted answer works under the assumption that we have a 2d-array (row/colnames).
But since the real problem was in a higher-dimensional space (and this may be the case for later readers), here is how the solution can also be applied to the initial, dimension-independent approach:
indx <- !is.na(y)
x[match(names(y[,1]),names(x[,1])),match(names(y[1,]),names(x[1,]))][indx] <- y[indx]
Thanks!

Looping through a Vector

I am using RStudio and trying to create a function that will loop through a vector and perform a calculation with a while condition. The function should then return a data frame with the entered vector in one column and the number of iterations it took to satisfy the while condition in another.
I have already created a function that performs the calculation with the while condition, which serves as the basic operation for the function I am having problems with. Here it is:
t5<-function(x){
  z=x
  while(x != 1){
    if(x %% 2 == 0)
      x= x/2
    else x= (3 * x +1)
    z=c(z, x)
  }
  return (z)
}
Here is what I have for the new function...my problem function (t7):
t7<-function(x){
  y=0
  i=0
  for(i in 1:length(x)){
    y[i]=length(t5(x[i]))-1
    print(y[i])
  }
  #m<-data.frame(x, y[i])
}
I had it print y[i] because that is the only way I could get the function to show anything. Here is the output (which is only half of what I need):
t7(2:10)
[1] 1
[1] 7
[1] 2
[1] 5
[1] 8
[1] 16
[1] 3
[1] 19
[1] 6
Can anybody help me understand how to make t7(2:10) run through this array and return a data frame listing the array and the number of iterations it took to reach the number 1 for each number in the array? Any help would be appreciated.
You can obtain the vector you need with the sapply function:
data.frame(x=2:10, iters=sapply(2:10, function(x) length(t5(x))-1))
#    x iters
# 1  2     1
# 2  3     7
# 3  4     2
# 4  5     5
# 5  6     8
# 6  7    16
# 7  8     3
# 8  9    19
# 9 10     6
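To keep the loop from the question instead, the function needs to preallocate y and return the data frame as its last value. A sketch under those changes; the column name iters follows the answer above, and t5 is repeated (slightly condensed) so the block runs on its own:

```r
t5 <- function(x) {
  z <- x
  while (x != 1) {
    x <- if (x %% 2 == 0) x / 2 else 3 * x + 1
    z <- c(z, x)
  }
  z
}

t7 <- function(x) {
  y <- numeric(length(x))            # one slot per input value
  for (i in seq_along(x)) {
    y[i] <- length(t5(x[i])) - 1     # steps needed to reach 1
  }
  data.frame(x = x, iters = y)       # last expression is the return value
}

res <- t7(2:10)
res
```

The original t7 returned nothing useful because its last evaluated expression was the for loop, and the data.frame call was commented out and used y[i] rather than the whole vector y.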
