How to ignore NA in subscripted assignment - arrays

Given two (named) arrays x and y, where all dimnames(y) exist in x.
How can I fill (update) x with values from y, but ignoring NAs in y?
This is what I have so far:
x<-array(1:15,dim=c(5,3),dimnames=list(1:5,1:3))
y<-(NA^!diag(1:3))*diag(1:3)
dimnames(y)<-list(1:3,1:3)
x[match(names(y[,1]),names(x[,1])),match(names(y[1,]),names(x[1,]))]<-y
But this also overwrites x with "NA"s from y.
   1  2  3
1  1 NA NA
2 NA  2 NA
3 NA NA  3
4  4  9 14
5  5 10 15
I guess it involves a filter such as !is.na(y), but I haven't found the right place to put it.
Thanks for any hint

We match the rownames of 'y' against the rownames of 'x' to create the row index ('rn'), and get the corresponding column index ('cn') the same way. We then get the positions of the non-NA values in 'y' ('indx'). Finally, we subset 'x' by the row and column index, subset that result again with 'indx', and replace those elements with the non-NA values of 'y' (y[indx]).
rn <- match(rownames(y), rownames(x))
cn <- match(colnames(y), colnames(x))
indx <- which(!is.na(y), arr.ind=TRUE)
x[rn,cn][indx] <- y[indx]
Or instead of matching, we can subset the 'x' with rownames(y) and colnames(y) and replace it as before.
x[rownames(y), colnames(y)][indx] <- y[indx]
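For reference, the steps above can be run end to end on the data from the question. This is only a sketch that reuses the names from the answer (indx); the final checks reflect an assumption about what "ignoring NAs" should mean here, namely that only the diagonal cells covered by 'y' change.

```r
# Data from the question
x <- array(1:15, dim = c(5, 3), dimnames = list(1:5, 1:3))
y <- (NA^!diag(1:3)) * diag(1:3)   # 1, 2, 3 on the diagonal, NA elsewhere
dimnames(y) <- list(1:3, 1:3)

# Row/column positions of the non-NA cells of y
indx <- which(!is.na(y), arr.ind = TRUE)

# Take the block of x covered by y, then overwrite only the non-NA positions
x[rownames(y), colnames(y)][indx] <- y[indx]

# Assumed expectations: no NA was copied, off-diagonal values are untouched
stopifnot(!anyNA(x), x["2", "2"] == 2, x["1", "2"] == 6)
x
```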

You can index directly with rownames and colnames to get the relevant parts of x covered by y, and replace conditionally using ifelse:
x[rownames(y),colnames(y)] <- ifelse(is.na(y),x[rownames(y),colnames(y)],y)
x
  1  2  3
1 1  6 11
2 2  2 12
3 3  8  3
4 4  9 14
5 5 10 15

Just for completeness:
The accepted answer works under the assumption that we have a 2d array (row/colnames).
But as the real problem was in a higher-dimensional space (and this may be the case for later readers), I show here how the solution can also be applied to the initial dimension-independent approach:
indx <- !is.na(y)
x[match(names(y[,1]),names(x[,1])),match(names(y[1,]),names(x[1,]))][indx] <- y[indx]
Thanks!
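For arrays with more than two dimensions, one possible generalisation is to translate the positions of the non-NA cells of y into positions in x, one dimension at a time, and then assign through a single index matrix. The helper name fill_non_na below is hypothetical (it does not appear in the thread), and the sketch assumes that both arrays have the same number of dimensions and that every dimname of y exists in x.

```r
# Hypothetical helper, not from the thread: copy the non-NA cells of y into x,
# matching on dimnames, for arrays with any number of dimensions.
fill_non_na <- function(x, y) {
  stopifnot(length(dim(x)) == length(dim(y)))
  # Positions of the non-NA cells of y, one column per dimension
  idx <- which(!is.na(y), arr.ind = TRUE)
  # Translate each position in y into the matching position in x via dimnames
  x_idx <- idx
  for (k in seq_len(ncol(idx))) {
    x_idx[, k] <- match(dimnames(y)[[k]], dimnames(x)[[k]])[idx[, k]]
  }
  # Matrix indexing: each row of x_idx addresses exactly one cell of x
  x[x_idx] <- y[idx]
  x
}

# Usage on a made-up 3-d example
x3 <- array(1:24, dim = c(4, 3, 2),
            dimnames = list(letters[1:4], LETTERS[1:3], c("p", "q")))
y3 <- array(NA_real_, dim = c(2, 2, 2),
            dimnames = list(c("b", "d"), c("A", "C"), c("p", "q")))
y3["b", "C", "q"] <- 100
y3["d", "A", "p"] <- -1
res <- fill_non_na(x3, y3)
```

Only the two cells that are non-NA in y3 differ between res and x3; every other cell keeps its original value.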

Related

How do you fill-in missing values despite differences in index values?

Here's my situation. I have predicted values in the form of an array (i.e. [1,3,1,2,3,...3]) and a data frame column with missing values (NAs). The array and the data frame column have the same dimensions, but their indices don't match.
For instance, the indices of the predicted array are 0:100.
The indices of the NA entries, on the other hand, don't begin at 0; they begin at the first index where an NA is observed in the DataFrame.
Which pandas function will fill in the first missing value with the first element of the predicted array, the second missing value with the second element, and so forth?
Assuming your missing data is represented in the DF as NaN/None values:
import pandas as pd
df = pd.DataFrame({'col1': [2,3,4,5,7,6,5], 'col2': [2,3,None,5,None,None,5]}) # Column 2 has missing values
pred_vals = [11, 22, 33] # Predicted values to be inserted in place of the missing values, in order
print('Original:')
print(df)
missing = df[pd.isnull(df['col2'])].index # Find index labels of missing values
df.loc[missing, 'col2'] = pred_vals # Replace missing values
print('\nFilled:')
print(df)
Result:
Original:
col1 col2
0 2 2
1 3 3
2 4 NaN
3 5 5
4 7 NaN
5 6 NaN
6 5 5
Filled:
col1 col2
0 2 2
1 3 3
2 4 11
3 5 5
4 7 22
5 6 33
6 5 5

How to logically index entire columns in MATLAB

Given a logical column vector (size n x 1) v and an array a (size m x n) how do I generate a new array consisting of all the columns in a where the numerical index of said column (1...n) is 1 at the corresponding location in v.
So for example if v was
1
0
0
1
and a was
1 4 7 10
2 5 8 11
3 6 9 12
the new array would be
1 10
2 11
3 12
because the first and fourth elements of v are 1 (true), so the new array should contain the first and fourth columns of a.
I have tried a bunch of things involving normal logical indexing and transpose but I can't get it to work. All help is appreciated
You want to use the logical indexing to select the columns and select all rows. In the example below, I have explicitly cast v as a logical just in case it's not a logical matrix already.
new = a(:, logical(v))
1 10
2 11
3 12

Automatic test for the equality of columns between two matrices

I have two matrices:
X =
1 2 3
4 5 6
7 8 9
Y =
1 10 11
4 12 13
7 14 15
I know that if I want to find the index of a specific element in X or Y, I can use the function find. For example:
index_3 = find(X==3)
What I want is an automatic way to check whether a column of X is also present in Y. In other words, I want a function that tells me whether a column of X is equal to a column of Y. One way to do this is the function ismember, which has an optional flag to compare rows:
rowsX = ismember(X, Y, 'rows');
So a simple way to get columns is just by taking the transpose of both matrices:
rowsX = ismember(X.', Y.', 'rows')
rowsX =
1
0
0
But how can I do that in another way?
Any help will be much appreciated!
You can do that with bsxfun and permute:
rowsX = any(all(bsxfun(@eq, X, permute(Y, [1 3 2])), 1), 3);
With
X = [ 1 2 3
4 5 6
7 8 9 ];
Y = [ 1 10 11
4 12 13
7 14 15 ];
this gives
rowsX =
1 0 0
How it works
permute "turns Y 90 degrees" along a vertical axis, so columns of Y are kept aligned with columns of X, but rows of Y are moved to the third dimension. Testing for equality with bsxfun and applying all(...,1) gives a matrix that tells which columns of X equal which columns of Y. Then any(...,3) produces the desired result: true if a column of X equals any column of Y.

Calling multiple values from data frame by row and column in R

I'm working in R and I'd like to call a selection of values from a data frame by their column and row indices. However doing this yields a matrix rather than an array. I shall demonstrate:
Given the data.frame:
a = data.frame( a = array(c(1,2,3,4,5,6,7,8,9), c(3,3)) )
(for those of you who don't want to plug it in, it looks like this)
  a.1 a.2 a.3
1   1   4   7
2   2   5   8
3   3   6   9
And let's say I have two arrays pointing to the values I'd like to grab
grab_row = c(3,1,2)
grab_col = c(1,2,1)
Now I'd expect this to be the code I want...
a[ grab_row, grab_col ]
To get these results...
[1] 3 4 2
But that comes out as a 3x3 matrix, which makes enough sense in and of itself
  a.1 a.2 a.1.1
3   3   6     3
1   1   4     1
2   2   5     2
Alright, I also see my answer is in the diagonal of the 3x3 matrix... but I'd really rather stick to an array as the output.
Any thoughts? Danka.
Passing the row and column indices in as a two-column matrix (here constructed using cbind()) will get you the elements you were expecting:
a[cbind(grab_row, grab_col)]
[1] 3 4 2
This form of indexing is documented in ?"[":
Matrices and arrays:
[...snip...]
A third form of indexing is via a numeric matrix with the one
column for each dimension: each row of the index matrix then
selects a single element of the array, and the result is a vector.
Try this:
> mapply(function(i,j)a[i,j], grab_row, grab_col)
[1] 3 4 2
Works for both dataframes and matrices.
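The two-column index matrix from the first answer can also be used on the left-hand side of an assignment. The sketch below uses a plain matrix m (an assumption made for illustration, rather than the data frame from the question) holding the same values, and reuses grab_row and grab_col.

```r
# Same values as the data frame in the question, stored as a plain matrix
m <- matrix(1:9, nrow = 3)
grab_row <- c(3, 1, 2)
grab_col <- c(1, 2, 1)

# Each row of the index matrix selects one element: (3,1), (1,2), (2,1)
index_matrix <- cbind(grab_row, grab_col)
picked <- m[index_matrix]

# The same index matrix works for replacement: only those three cells change
m[index_matrix] <- 0L
```

Here picked holds the three selected values as a plain vector, and exactly three cells of m are set to zero afterwards.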

Splitting a vector/array in R

I would like to use R to perform an operation similar to merge sort. I would like to split my vector/array into two parts. My input is in a variable named inp.
> inp <- c(5,6,7,8,9,1,2,3,4)
> inplen <- length(inp)
> left <- inp[1:ceiling(inplen/2)]
> right <- inp[ceiling(inplen/2)+1:inplen]
> left
[1] 5 6 7 8 9
> right
[1] 1 2 3 4 NA NA NA NA NA
> length(left)
[1] 5
> length(right)
[1] 9
Here you can see that although I meant to split the vector into two halves, the right half is larger than the left half, and it contains NA entries. I can't figure out why the second vector (right) has these extra entries.
You are running into a (well-known) problem caused by ":" having higher operator precedence than "+":
ceiling(inplen/2)+1:inplen
[1] 6 7 8 9 10 11 12 13 14
NAs are being returned because your index exceeded the length of the vector.
You're missing a bracket:
right = inp[(ceiling(inplen/2)+1):inplen]
To expand, suppose we have
1 + 1:3
Does this mean 1+(1:3) or (1+1):3? R interprets it as 1+(1:3), so when you type 1+1:3 you get:
1 + c(1,2,3)
which becomes
c(2,3,4)
Another common gotcha is:
R> x = 1:5
R> x[2:length(x)-1]
[1] 1 2 3 4
Instead of selecting elements 2 to 4, we have selected elements 1 to 4 by mistake.
You can use split for this, with cut to create the breakpoints:
split(inp,cut(seq(inplen),breaks=c(0,ceiling(inplen/2),inplen),labels=c("left","right")))
$left
[1] 5 6 7 8 9
$right
[1] 1 2 3 4
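One way to sidestep the precedence issue discussed above is to avoid building index ranges with ":" at all and use head() and tail() instead. The function name split_halves is hypothetical (it is not from the thread); this is a minimal sketch that puts the extra element in the left half when the length is odd, as in the question.

```r
# Hypothetical helper: split a vector into a left and a right half
# without writing any a:b index arithmetic.
split_halves <- function(v) {
  mid <- ceiling(length(v) / 2)            # size of the left half
  list(left  = head(v, mid),               # first `mid` elements
       right = tail(v, length(v) - mid))   # whatever remains
}

inp <- c(5, 6, 7, 8, 9, 1, 2, 3, 4)
halves <- split_halves(inp)
halves$left    # 5 6 7 8 9
halves$right   # 1 2 3 4
```

Because head() and tail() accept a count of zero, the sketch also handles vectors of length 0 or 1 without producing NA entries.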

Resources