I have an array where I have to omit NA values. I know that it is an array full of matrices where every row has exactly one NA value. My approach works well for >2 columns of the matrices, but apply() drops one dimension when there are only two columns (as after omitting the NA values, one column disappears).
As this step is part of a much larger code, I would like to avoid recoding the rest and make this step robust to the case when the number of columns is two. Here is a simple example:
#create an array
arr1 <- array(rnorm(3000),c(500,2,3))
#randomly distribute 1 NA value per row of the array
for(i in 1:500){
arr1[i,,sample(3,1)] <- NA
}
#omit the NAs from the array
arr1.apply <- apply(arr1, c(1,2),na.omit)
#we lose no dimension as every dimension >1
dim(arr1.apply)
[1] 2 500 2
#now repeat with a 500x2x2 array
#create an array
arr2 <- array(rnorm(2000),c(500,2,2))
#randomly distribute 1 NA value per row of the array
for(i in 1:500){
arr2[i,,sample(2,1)] <- NA
}
#omit the NAs from the array
arr2.apply <- apply(arr2, c(1,2),na.omit)
#we lose one dimension because the last dimension collapses to size 1
dim(arr2.apply)
[1] 500 2
I do not want apply() to drop the last dimension as it breaks the rest of my code.
I am aware that this is a known issue with apply(), however, I am eager to resolve the problem in this very step, so any help would be appreciated. So far I've tried to wrap apply() in an array() command using the dimensions that should result, however, I think this mixes up the values in the matrix in a way that is not desirable.
Thanks for your help.
I propose a stupid solution, but I think you have no choice if you want to keep it this way:
arr1.apply <- if(dim(arr1)[3] > 2){
apply(arr1, c(1,2),na.omit)} else{
array(apply(arr1, c(1,2),na.omit),dim = c(1,dim(arr1)[1:2]))}
Related
I have a string array:
size(entries)
ans =
1 19413
I would like to rearrange the array to 4853 rows and 4 columns:
output=permute(entries,[4853 4]);
but get following error:
Error using permute ORDER contains an invalid permutation index.
What is the (probably obvious thing) I am doing wrong? thanks
You currently have 19413 elements, yet you wish to reshape this into a 4853 x 4 matrix that consists of 4853 * 4 = 19412 elements. No function in the world will help you do this because the original and target amount of elements don't match - they're off by one element. If you remove one of the elements...say... the last one, then we're getting somewhere.
Supposing you made a mistake and included that extra element by accident, you don't use permute here, but you use reshape. The second argument to reshape is the amount of elements to spread out for each target dimension, and that's what you're looking for. First remove the extraneous element that appears at the end of the array, then reshape the matrix:
output = reshape(entries(1:end-1),[4853 4]);
I'm 3 years late, but here's to anyone still looking for an answer.
In your case as mentioned above, yes you should use reshape() while minding that you preserve the total number of elements.
You use permute() when you want to reorder the dimensionality of an n-dimensional (ND) matrix.
The ORDER parameter specifies the order of the columns.
For example, if matrix A is LxMxN, the following line would make it MxLxN.
A = permute(A,[2 1 3]);
Hope this clears things.
I convert between data formats a lot. I'm sure this is quite common. In particular, I switch between arrays and lists. I'm trying to figure out if I'm doing it right, or if I'm missing any schemas that would greatly improve quality of life. Below I'll give some examples of how to achieve desired results in a couple situations.
Begin with the following array:
dat <- array(1:60, c(5,4,3))
Then, convert one or more of the dimensions of that array to a list. For clarification and current approaches, see the following:
1 dimension, array to list
# Convert 1st dim
dat_list1 <- unlist(apply(dat, 1, list),F,F) # this is what I usually do
# Convert 1st dim, (alternative approach)
library(plyr) # I don't use this approach often b/c I try to go base if I can
dat_list1a <- alply(dat, 1) # points for being concise!
# minus points to alply for being slow (in this case)
> microbenchmark(unlist(apply(dat, 1, list),F,F), alply(dat, 1))
Unit: microseconds
expr min lq mean median uq max neval
unlist(apply(dat, 1, list), F, F) 40.515 43.519 50.6531 50.4925 53.113 88.412 100
alply(dat, 1) 1479.418 1511.823 1684.5598 1595.4405 1842.693 2605.351 100
1 dimension, list to array
# Convert elements of list into new array dimension
# bonus points for converting to original array
dat_array1_0 <- simplify2array(dat_list1)
aperm.key1 <- sapply(dim(dat), function(x)which(dim(dat_array1_0)==x))
dat_array1 <- aperm(dat_array1_0,aperm.key1)
In general, these are the tasks I'm trying to accomplish, although sometimes it's in multiple dimensions or the lists are nested, or some such other complication. So I'm asking if anyone has a "better" (concise, efficient) way of doing either of these things, but bonus points if a suggested approach can handle other related scenarios too.
I currently have a 3 dimensional matrix and I want to extract a single row (into the third dimension) from it by index (say matrix(2,1,:)). I initially anticipated that the result of this would be a 1 dimensional matrix however what I got was a 1 by 1 by n matrix. Usually this wouldn't be a problem but some of the functions I'm using don't like 3D matrices. For example see the problem replicated below:
threeDeeMatrix=rand(3,3,3);
oneDeeAttempt=threeDeeMatrix(1,1,:);
norm(oneDeeAttempt)
Which returns the error message:
Error using norm
Input must be 2-D.
This is because oneDeeAttempt is
oneDeeAttempt(:,:,1) =
0.8400
oneDeeAttempt(:,:,2) =
0.0700
oneDeeAttempt(:,:,3) =
0.7663
rather than [0.8400 0.0700 0.7663]
How can I strip these extra dimensions? The only solution I can come up with is to use a loop to manually copy the values but that seems a little excessive.
Using permute to rearrange the matrix
The solution (which I found in the final stages of asking this) is to use permute which rearranges the order of the dimensions (similar to a=a' for 2D matrices). Once the unit dimensions are last they are stripped from the matrix and it becomes 1 dimensional.
oneDee=permute(oneDeeAttempt,[3 1 2]) %rearrange so the previous third dimension is now the first
%the matrix is now 3 by 1 by 1 which becomes 3
Using squeeze to remove leading singleton dimensions
As pointed out by Luis Mendo squeeze will very simply remove these leading singleton dimensions without having to worry about which dimensions are non singleton
oneDee=squeeze(oneDeeAttempt);
In the code below "G" returns a (10*4) matrix which is in the correct orientation.
All I want then is to be able to view/call these (10*4) matrices indexed by (j,k). However when I store the matrix "G" in the 4-D matrix "test" the data is displayed in a way that is a little counter intuitive? When I look at test in the variable editor I get:
val(:,:,1,1) =
1
val(:,:,2,1) =
0
val(:,:,3,1) =
0
val(:,:,4,1) =
0
.
.
.
val(:,:,1,10) =
1
val(:,:,2,10) =
0
val(:,:,3,10) =
0
val(:,:,4,10) =
0
So all the data is there but I want it displayed at a 10*4 matrix?
Also as you will see I had to change the code for "Correl_betas" and transpose "G" to get to where I am above. However I felt I did this by playing around rather than what I thought the code should be doing. Why does the original code not work? I am having to change the order of the 3rd and 4th dimensions when declaring "Correl_betas" and then pass the transpose of G, but this seems totally counter intuitive as the last two dimensions in the original "Correl_betas" and the original (un-transposed) "G" also match? But when I did it this way the ordering seemed even further from the 10*$ matrix I want.
So I have 2 questions?
1.) How can I get to the 2-dimentional (10*4) matrices I want indexed by j,k from where I am?
2.) How come the original code above doesn't result in the last two columns producing (10,4) matrices?
A large part of the problem is that I have very little experience working with matrices that have more than 2 dimensions so sorry if this question shows a lack of understanding. Maybe a pointer to a god tutorial on how to interpret manipulate higher dimensional matrices would help too.
%Correl_betas=zeros(50,50,10,4);
Correl_betas=zeros(50,50,4,10);
mats=[1:10]';
L1=-1;
for j=1:51
L1=L1+1;
L2=-1;
for k=1:51
L2=L2+1;
lambda=[ L1; L2 ];
nObs=size(mats,1);
G= [ones(nObs,1) (1-exp(-mats./lambda(1)))./(mats./lambda(1)) ((1-exp(-mats./lambda(1)))./(mats./lambda(1))-exp(-mats./lambda(1))) ((1-exp(-mats./lambda(2)))./(mats./lambda(2))-exp(-mats./lambda(2)))];
%Correl_betas(j,k,:,:)=G;
Correl_betas(j,k,:,:)=G';
test=Correl_betas(j,k,:,:);
temp1=corrcoef(Correl_betas(j,k,:,2),Correl_betas(j,k,:,3),'rows','complete');
temp2=corrcoef(Correl_betas(j,k,:,2),Correl_betas(j,k,:,4),'rows','complete');
temp3=corrcoef(Correl_betas(j,k,:,3),Correl_betas(j,k,:,4),'rows','complete');
F2_F3(j,k)=temp1(1,2);
F2_F4(j,k)=temp2(1,2);
F3_F4(j,k)=temp3(1,2);
end
end
To reshape the matrix as desired,
val2 = permute(val,[4 3 2 1]);
This brings the 4th dimension (size of 10) onto the first, and the the 3rd dimension (size of 4) onto the second.
In your loop, both j and k cycle through 1:51 so the first two dimensions of Correl_betas will end up being length 51 too.
I am not 100% what the role of the 1: is here. At which index to start the copy? But then why not two such parameters for the rank 2 array?
To be a little more explicit:
In Fortran90 and up you access a single array value by giving a single index, and access a subarray (a window of the array) by giving a range of indices separated by a colon.
a(1) = 0.
a(2:5) = (/3.14,2.71,1.62,0./)
You can also give a step size.
a(1:5:2) = (/0,2.71,0./)
And finally, you can leave out values and a default will be inserted. So if a runs from index 1 to 5 then I could write the above as
a(::2) = (/0,2.71,0./)
and the 1 and 5 are implied. Obviously, you shouldn't leave these out if it makes the code unclear.
With a multidimensional array, you can mix and match these on each dimension, as in your example.
You're taking a slice of array2, namely the elements in the D'th column from row 1 to C and putting them in the slice of array1 which is elements 1 through A
So both slices are 1-dimensional arrays
Slice may not be the correct term in Fortran