R - Vector/Array Addition

I am having a little trouble with vector or array operations.
I have three 3D arrays and I want to find their element-wise average. How can I do that? I can't use mean() directly, as it only returns a single value.
More importantly, some of the cells in the arrays are NA, which means that if I just add them like
A = (B + C + D)/3
the result will be NA as well.
How can I make it recognise that a cell is NA and skip it?
For example:
A = c(NA, 10, 15, 15, NA)
B = c(10, 15, NA, 22, NA)
C = c(NA, NA, 20, 26, NA)
I want the average of these vectors to be
(10, (10+15)/2, (15+20)/2, (15+22+26)/3, NA)
I also can't use na.omit, because it would shift the positions of the remaining values.
This is the corresponding code. I hope it is helpful.
for (yr in 1950:2011) {
  temp_JFM <- sst5_sst2[,,year5_sst2==yr & (month5_sst2>=1 & month5_sst2<=3)]
  k = 0
  jfm=4*k+1
  for (i in 1:72) {
    for (j in 1:36) {
      iposst5_sst2[i,j,jfm] <- (temp_JFM[i,j,1]+temp_JFM[i,j,2]+temp_JFM[i,j,3])/3
    }
  }
}
Thank you.
It has already been solved.
The easiest fix is shown below.
iposst5_sst2[i,j,jfm] <- mean(temp_JFM[i,j,],na.rm=TRUE)

I'm not entirely sure what your desired output is, but I'm guessing that what you really want to build is not three 3D arrays, but one 4D array that you can then use apply on.
Something like this:
#Three 3D arrays...
A <- array(runif(27),dim = c(3,3,3))
B <- array(runif(27),dim = c(3,3,3))
C <- array(runif(27),dim = c(3,3,3))
#Become one 4D array
D <- array(c(A,B,C),dim = c(3,3,3,3))
#Now we can simply apply the function mean
# and use its na.rm = TRUE argument.
apply(D,1:3,mean,na.rm = TRUE)
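If stacking everything into a 4D array is inconvenient, a variant that should give the same element-wise average is to add up the non-missing values and divide by the number of non-missing values in each cell. This is only a minimal sketch: the small arrays B, C and D below are made-up illustration data, not the SST arrays from the question.

```r
# Made-up 2 x 2 x 2 arrays with some missing cells (illustration only)
B <- array(c(NA, 10, 15, 15, NA, 1, 2, 3), dim = c(2, 2, 2))
C <- array(c(10, 15, NA, 22, NA, 3, 4, 5), dim = c(2, 2, 2))
D <- array(c(NA, NA, 20, 26, NA, 5, 6, NA), dim = c(2, 2, 2))
arrays <- list(B, C, D)

# Sum of the non-missing values in each cell (NA counted as 0)
total <- Reduce(`+`, lapply(arrays, function(a) replace(a, is.na(a), 0)))
# Number of non-missing values in each cell
count <- Reduce(`+`, lapply(arrays, function(a) !is.na(a)))

A <- total / count      # cells with no data at all give 0/0 = NaN
A[count == 0] <- NA     # report those cells as NA instead
```

The result keeps the dimensions of the inputs, and because everything is done with whole-array arithmetic there is no loop over cells.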

Here's an example which makes a vector of the three values, which makes na.omit usable:
vectorAverage <- function(A,B,C) {
  Z <- rep(NA, length(A))
  for (i in seq_along(A)) {
    # na.omit only drops the missing values at position i
    x <- na.omit(c(A[i],B[i],C[i]))
    if (length(x) > 0) Z[i] <- mean(x)
  }
  Z
}
Resulting in:
vectorAverage(A,B,C)
[1] 10.0 12.5 17.5 21.0 NA
Edited: Missed the NaN in the output of the first version.
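For plain vectors like the ones in the question, the same result can probably be had without a loop by binding the vectors into columns and taking row means. A small sketch using the question's own A, B and C:

```r
A <- c(NA, 10, 15, 15, NA)
B <- c(10, 15, NA, 22, NA)
C <- c(NA, NA, 20, 26, NA)

# Each row of cbind(A, B, C) holds the three values for one position
avg <- rowMeans(cbind(A, B, C), na.rm = TRUE)
avg[is.nan(avg)] <- NA   # rows with no data at all come back as NaN
avg                      # 10.0 12.5 17.5 21.0 NA
```

The NaN replacement is the same issue mentioned in the edit note above: a mean over zero values is NaN, not NA.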

Related

Is there any function that calculates the correlation between a set of matrices included in an array in R?

I have a list that includes 20 matrices. I want to calculate Pearson's correlation between all pairs of matrices, but I cannot find any code or function for this. Could you please give me some tips for doing so?
Something like:
a=matrix(1:8100, ncol = 90)
b=matrix(8100:16199, ncol = 90)
c=matrix(sample(16200:24299),ncol = 90)
z=list(a,b,c)
I found this:
https://rdrr.io/cran/lineup/man/corbetw2mat.html and tried it:
library(lineup)
corbetw2mat(z[a], z[b], what = "all")
I got the following error:
Error in corbetw2mat(z[a], z[b], what = "all") :
(list) object cannot be coerced to type 'double'
I want a list like this for the result:
a & b
correlations
a & c
correlations
b & c
correlations
Thanks
I will create a smaller data set to illustrate the solution below.
To get pairwise combinations, the best option is to compute a matrix of combinations with combn and then loop over its columns, in this case with lapply.
set.seed(1234) # Make the results reproducible
a <- matrix(1:9, ncol = 3)
b <- matrix(rnorm(9), ncol = 3)
c <- matrix(sample(1:9), ncol = 3)
sample_list <- list(a, b, c)
cmb <- combn(3, 2)
res <- lapply(seq.int(ncol(cmb)), function(i) {
  cor(sample_list[[ cmb[1, i] ]], sample_list[[ cmb[2, i] ]])
})
The results are in the list res.
Note that sample is a base R function, so I changed the name to sample_list.
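To get output labelled by pair ("a & b", "a & c", ...) as the question asks, one option is to name the list and build the labels from the same combn matrix. The sketch below assumes a named list called mats (a name chosen here, not from the thread). It also shows an alternative reading of the question, flagged as an assumption: one overall correlation per pair, computed on the flattened matrices rather than column by column.

```r
set.seed(1234)
mats <- list(a = matrix(1:9, ncol = 3),
             b = matrix(rnorm(9), ncol = 3),
             c = matrix(sample(1:9), ncol = 3))

cmb <- combn(length(mats), 2)   # one column per pair of matrices

# Column-by-column correlation matrix for each pair, as in the answer above
res <- lapply(seq_len(ncol(cmb)), function(i) {
  cor(mats[[cmb[1, i]]], mats[[cmb[2, i]]])
})

# Label each result "a & b", "a & c", "b & c"
names(res) <- apply(cmb, 2, function(p) paste(names(mats)[p], collapse = " & "))

# Assumption: if a single number per pair is wanted instead,
# correlate the flattened matrices
overall <- vapply(seq_len(ncol(cmb)), function(i) {
  cor(as.vector(mats[[cmb[1, i]]]), as.vector(mats[[cmb[2, i]]]))
}, numeric(1))
names(overall) <- names(res)
```

res[["a & b"]] then returns the 3 x 3 matrix of correlations between the columns of a and the columns of b, while overall["a & b"] is a single coefficient.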

numpy binned mean, conserving extra axes

It seems I am stuck on the following problem with numpy.
I have an array X with shape: X.shape = (nexp, ntime, ndim, npart)
I need to compute binned statistics on this array along npart dimension, according to the values in binvals (and some bins), but keeping all the other dimensions there, because I have to use the binned statistic to remove some bias in the original array X. Binning values have shape binvals.shape = (nexp, ntime, npart).
Here is a complete, minimal example to explain what I am trying to do. Note that, in reality, I am working on large arrays and with several hundred bins (so this implementation takes forever):
import numpy as np
np.random.seed(12345)
X = np.random.randn(24).reshape(1,2,3,4)
binvals = np.random.randn(8).reshape(1,2,4)
bins = [-np.inf, 0, np.inf]
nexp, ntime, ndim, npart = X.shape
cleanX = np.zeros_like(X)
for ne in range(nexp):
    for nt in range(ntime):
        indices = np.digitize(binvals[ne, nt, :], bins)
        for nd in range(ndim):
            for nb in range(1, len(bins)):
                inds = indices==nb
                cleanX[ne, nt, nd, inds] = X[ne, nt, nd, inds] - \
                    np.mean(X[ne, nt, nd, inds], axis = -1)
Looking at the results of this may make it clearer?
In [8]: X
Out[8]:
array([[[[-0.20470766, 0.47894334, -0.51943872, -0.5557303 ],
[ 1.96578057, 1.39340583, 0.09290788, 0.28174615],
[ 0.76902257, 1.24643474, 1.00718936, -1.29622111]],
[[ 0.27499163, 0.22891288, 1.35291684, 0.88642934],
[-2.00163731, -0.37184254, 1.66902531, -0.43856974],
[-0.53974145, 0.47698501, 3.24894392, -1.02122752]]]])
In [10]: cleanX
Out[10]:
array([[[[ 0. , 0.67768523, -0.32069682, -0.35698841],
[ 0. , 0.80405255, -0.49644541, -0.30760713],
[ 0. , 0.92730041, 0.68805503, -1.61535544]],
[[ 0.02303938, -0.02303938, 0.23324375, -0.23324375],
[-0.81489739, 0.81489739, 1.05379752, -1.05379752],
[-0.50836323, 0.50836323, 2.13508572, -2.13508572]]]])
In [12]: binvals
Out[12]:
array([[[ -5.77087303e-01, 1.24121276e-01, 3.02613562e-01,
5.23772068e-01],
[ 9.40277775e-04, 1.34380979e+00, -7.13543985e-01,
-8.31153539e-01]]])
Is there a vectorized solution? I thought of using scipy.stats.binned_statistic, but I seem to be unable to understand how to use it for this aim. Thanks!
import numpy as np
np.random.seed(100)
nexp = 3
ntime = 4
ndim = 5
npart = 100
nbins = 4
binvals = np.random.rand(nexp, ntime, npart)
X = np.random.rand(nexp, ntime, ndim, npart)
bins = np.linspace(0, 1, nbins + 1)
d = np.digitize(binvals, bins)[:, :, np.newaxis, :]
r = np.arange(1, len(bins)).reshape((-1, 1, 1, 1, 1))
m = d[np.newaxis, ...] == r
counts = np.sum(m, axis=-1, keepdims=True).clip(min=1)
means = np.sum(X[np.newaxis, ...] * m, axis=-1, keepdims=True) / counts
cleanX = X - np.choose(d - 1, means)
Ok, I think I got it, mainly based on the answer by @jdehesa.
clean2 = np.zeros_like(X)
d = np.digitize(binvals, bins)
for i in range(1, len(bins)):
    m = d == i
    minds = np.where(m)
    # index with a tuple: a list of index arrays is not accepted by newer NumPy
    sl = (*minds[:2], slice(None), minds[2])
    msum = m.sum(axis=-1)
    clean2[sl] = (X - \
        (np.sum(X * m[...,np.newaxis,:], axis=-1) /
        msum[..., np.newaxis])[..., np.newaxis])[sl]
Which gives the same results as my original code.
On the small arrays I have in the example here, this solution is approximately three times as fast as the original code. I expect it to be much faster on larger arrays.
Update:
It is indeed faster on larger arrays (I didn't run any formal test), but even so its performance is only just acceptable. Any further suggestions for extra vectorization would be very welcome.

filtering 3D array using data in similar array

I have two 3-D arrays, one of which contains data and the other contains metadata. The metadata is a date signature, so an example can be produced with the following:
datamatrix <- array(data = c(rep(0,9), rep(0,9), (sample(0:100, 9)/1000),
                             (sample(30:50, 9)/100), (sample(70:80,9)/100),
                             (sample(30:50,9)/100), rep(0,9), rep(0,9)),
                    dim = c(3,3,8))
timematrix <- array(data = c(sample(1:20, 9), sample(30:50, 9), sample(70:90, 9),
                             sample(110:130,9), sample(150:170,9), sample(190:210,9),
                             sample(230:250,9), sample(260:280,9)),
                    dim = c(3,3,8))
I wish to construct a new 3D array filled with the data from the first matrix (datamatrix) and a bunch of NA's such that the element i in the datamatrix falls into its corresponding date (derived from the corresponding metadata in timematrix) in a final workingdata 3D array like so:
workingdata <- array(data = NA,
dim = c(3,3,365))
for (i in 1:length(datamatrix)){
location <- i
locationguide <- location%%9
locationfinal <- locationguide%%3
if (locationfinal == 0){
a <- 3
b <- 3
}
if (locationfinal == 1){
a <- 1
b <- 1
}
if (locationfinal == 2){
a <- 1
b <- 2
}
if (locationfinal == 3){
a <- 1
b <- 3
}
if (locationfinal == 4){
a <- 2
b <- 1
}
if (locationfinal == 5){
a <- 2
b <- 2
}
if (locationfinal == 6){
a <- 2
b <- 3
}
if (locationfinal == 7){
a <- 3
b <- 1
}
if (locationfinal == 8){
a <- 3
b <- 2
}
value <- datamatrix[i]
day <- timematrix[i]
workingdata[a,b,day] <- datamatrix[i]
}
The dataset I'm working with is thousands of columns wide and equivalently long-rowed. The current method does the job, however it would take forever using a for loop in the actual data, and coding it would be ridiculous because of all of the if's this requires. Does anyone know of a better method for filtering data such as this?
For a viewer-friendly concept of what I want, an image from ESRI best sums it up:
http://pro.arcgis.com/en/pro-app/tool-reference/space-time-pattern-mining/GUID-42A31756-6518-41E9-A900-2C892AF4023A-web.png
I'm shooting for a z-dimension for time, with one block per day, where observations fall into their appropriate row on the z-axis but remain in their original locations in the x- and y-dimension.
I'm not sure exactly what your for loop is doing, and it may not do what you intend. For example, check the result of ((1:30)%%9)%%3: it only ever takes the values 0, 1 and 2, so most of your if branches are never reached.
But, from your description of the problem, you might want to do something like this:
workingdata <- array(data = NA, dim = c(3,3,365))
for (i in 1:dim(datamatrix)[1]) {
  for (j in 1:dim(datamatrix)[2]) {
    workingdata[i, j, timematrix[i, j, ]] <- datamatrix[i, j, ]
  }
}
Note that this won't work if any day in your timematrix is 0 (as in your example data), since R uses 1-based indexing.
Final answer: Axeman's solution works in 3 dimensions with the following approach:
workingdata <- array(data = NA, dim = c(3,3,365))
for (i in 1:dim(datamatrix)[1]) {
  for (j in 1:dim(datamatrix)[2]) {
    for(k in 1:dim(datamatrix)[3]){
      workingdata[i, j, timematrix[i, j, k]] <- datamatrix[i, j, k]
    }
  }
}
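Since the real data is described as thousands of rows and columns wide, it may be worth dropping the loops altogether. R lets you index an array with a matrix that has one column per dimension, so every (row, column, day) target can be assigned in one step. The following is a sketch under stated assumptions: the example data is made up here (days increase from slice to slice, so no cell sees the same day twice), every day lies between 1 and 365, and there are no missing days.

```r
set.seed(42)
# Made-up example data: 3 x 3 cells, 8 observations per cell
datamatrix <- array(runif(72), dim = c(3, 3, 8))
timematrix <- array(
  unlist(lapply(1:8, function(k) sample((k - 1) * 40 + 1:20, 9))),
  dim = c(3, 3, 8)
)

workingdata <- array(NA_real_, dim = c(3, 3, 365))

# (i, j, k) subscripts of every element of datamatrix, in storage order
ind <- arrayInd(seq_along(datamatrix), dim(datamatrix))

# Keep the row and column, but replace the third subscript by the day
target <- cbind(ind[, 1], ind[, 2], as.vector(timematrix))
workingdata[target] <- as.vector(datamatrix)
```

If a cell can have two observations on the same day, the later one overwrites the earlier one, just as it would in the triple loop; a day of 0 or NA would need to be filtered out of target first.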

loop through column and add other row

EDIT: I've made some progress. So I read up on subsets, and was able to break down my dataframe under a certain condition. Let's say titleCSV[3] consists of file names ("file1", "file2", "file3", etc) and titleCSV[13] contains values (-18, -8, -2, etc). Code below:
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
for(x.RMS in titleRMS[2]){
  x.RMS <- gsub("[A-Za-z]","",x.RMS)
  x.RMS <- gsub(" ","",x.RMS)
  x.RMS = abs(as.numeric(x.RMS))
}
x.titleRMSJudge <- data.frame(titleRMS[1], x.RMS)
x.titleRMSResult <- subset(x.titleRMSJudge, x.RMS < 12)
My question now is, what's the best way to print each row of the first column of x.titleRMSResult with a message saying that it's loud? Thanks, guys!
BTW, here is the dput of my titleRMS:
dput(titleRMS)
structure(list(FILE.NAME = c("00-Introduction.mp3", "01-Chapter_01.mp3",
"02-Chapter_02.mp3", "03-Chapter_03.mp3", "04-Chapter_04.mp3",
"05-Chapter_05.mp3", "06-Chapter_06.mp3", "07-Chapter_07.mp3",
"08-Chapter_08.mp3", "09-Chapter_09.mp3", "10-Chapter_10.mp3",
"11-Chapter_11.mp3", "12-Chapter_12.mp3", "Bonus_content.mp3",
"End.mp3"), AVG.RMS..dB. = c(-14, -10.74, -9.97, -10.53, -10.94,
-12.14, -11, -9.19, -10.42, -11.51, -14, -10.96, -11.71, -11,
-16)), .Names = c("FILE.NAME", "AVG.RMS..dB."), row.names = c(NA,
-15L), class = "data.frame")
ORIGINAL POST BELOW
Newb here! Coding in R. I am trying to analyze a csv file. One column has 10 rows with different file names, while the other has 10 rows with different values. I want to loop over the 2nd column and, if a value is greater/less than a certain threshold, print the associated file name as well as a message. I don't know how to loop over both columns together so that the proper file name prints with the proper value/message. I wrote a loop that ends up checking each value once for every row in the other column. At the moment, all 10 rows meet the criteria for the message I want to print, so I've been getting 100 messages!
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
for(title in titleRMS[1]){
  title <- gsub(" ","",title)
}
for(r in titleRMS[2]){
  r <- gsub("[A-Za-z]","",r)
  r <- gsub(" ","",r)
  r = abs(as.numeric(r))
  for(t in title){
    for(f in r){
      if (f < 18 & f > 0) {
        message(t, "is Loud!")
      }
    }
  }
}
And this version only prints the first file name for each message:
for(r in titleRMS[2]){
  r <- gsub("[A-Za-z]","",r)
  r <- gsub(" ","",r)
  r = abs(as.numeric(r))
  for(f in r){
    if (f < 18 & f > 0) {
      message(t, "is Loud!")
    }
  }
}
Can someone throw me some tips or even re-write what I wrote to show me how to get what I need? Thanks, guys!
I've figured out my own issue. Here is what I wrote to come to the conclusion I wanted:
titleRMS <- data.frame(titleCSV[3], titleCSV[13])
filesHighRMS <- vector()
x.titleRMSJudge <- data.frame(titleCSV[3], titleCSV[13])
x.titleRMSResult <- subset(x.titleRMSJudge, titleCSV[13] > -12 & titleCSV[15] > -1)
for(i in x.titleRMSResult[,1]){
filesHighRMS <- append(filesHighRMS, i, 999)
}
emailHighRMS <- paste(filesHighRMS, collapse=", ")
blurbHighRMS <- paste("" ,nrow(x.titleRMSResult), " file(s) (" ,emailHighRMS, ") have a high RMS and are too loud.")
Being new to code, I bet there is a simpler way, I'm just glad I was able to work this out on my own. :-)
You're making things hard on yourself. You don't need regex for this, and you probably don't need a loop, at least not through your data frame. Definitely you don't need nested loops.
I think this will do what you say you want...
indicesToMessage <- titleRMS[, 2] > 0 & titleRMS[, 2] < 18
myMessages <- paste(titleRMS[indicesToMessage, 1], "is Loud!")
for (i in seq_along(myMessages)) {
  message(myMessages[i])
}
A more R-like way (read: without an explicit loop) to do the last line is like this:
invisible(lapply(myMessages, message))
The invisible is needed because message() returns NULL (it is called only for its side effect of printing to the console), and lapply collects those NULLs into a list that would otherwise be printed. invisible just suppresses that output.
Edits: Negative data
Since your data is negative, I assume you actually want messages when the absolute value, abs(), is between 0 and 18. This works for that case.
indicesToMessage <- abs(titleRMS[, 2]) > 0 & abs(titleRMS[, 2]) < 18
myMessages <- paste(titleRMS[indicesToMessage, 1], "is Loud!")
invisible(lapply(myMessages, message))
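Putting this together with the dput data from the question, a compact version might look like the sketch below. It assumes the 12 dB cut-off from the poster's own subset() call rather than 18, and the names loud and summary_line are chosen here for illustration.

```r
# Data copied from the dput() output in the question
titleRMS <- data.frame(
  FILE.NAME = c("00-Introduction.mp3", "01-Chapter_01.mp3", "02-Chapter_02.mp3",
                "03-Chapter_03.mp3", "04-Chapter_04.mp3", "05-Chapter_05.mp3",
                "06-Chapter_06.mp3", "07-Chapter_07.mp3", "08-Chapter_08.mp3",
                "09-Chapter_09.mp3", "10-Chapter_10.mp3", "11-Chapter_11.mp3",
                "12-Chapter_12.mp3", "Bonus_content.mp3", "End.mp3"),
  AVG.RMS..dB. = c(-14, -10.74, -9.97, -10.53, -10.94, -12.14, -11, -9.19,
                   -10.42, -11.51, -14, -10.96, -11.71, -11, -16),
  stringsAsFactors = FALSE
)

# Rows whose average RMS is within 12 dB of zero (assumed threshold)
loud <- titleRMS[abs(titleRMS$AVG.RMS..dB.) < 12, ]

# One message line per loud file
message(paste0(loud$FILE.NAME, " is loud!", collapse = "\n"))

# A single summary sentence, similar to the poster's blurbHighRMS
summary_line <- sprintf("%d file(s) (%s) have a high RMS and are too loud.",
                        nrow(loud), paste(loud$FILE.NAME, collapse = ", "))
```

Because the file name and its value stay in the same row of the data frame throughout, there is no need to loop over the two columns separately, which is what produced the 100 messages.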

lapply and rbind not properly appending the results

SimNo <- 10
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans<-do.call(rbind,lapply(1:SimNo, function(x) ifelse(det1<1,det1,0)))
}
When I run all the commands inside the loop except the last one, I get a different determinant each time, but when I run the whole loop at once, the last determinant is repeated in every row of the result.
Please help me understand how to handle situations like this.
Is there a shorter and more efficient way to write this code, such that each individual variable can still be accessed?
Whenever you are repeating the same operation multiple times, and without inputs, think about using replicate. Here you can use it twice:
SimNo <- 10
det1 <- replicate(SimNo, {
X <- replicate(6, rnorm(1000, 0, 1))
sx <- scale(X) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Otherwise, this is what your code should have looked like with a for loop. You needed to create a vector to store the output of each loop iteration:
SimNo <- 10
detans <- numeric(SimNo)
for (i in 1:SimNo) {
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans[i] <- ifelse(det1<1,det1,0)
}
Edit: you asked in the comments how to access X using replicate. You would have to make replicate create and store all your X matrices in a list. Then use the *apply family of functions to loop through that list to finish the computations:
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- sapply(X, function(x) {
sx <- scale(x) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Here, X is now a list of matrices, so you can get e.g. the matrix for the second simulation by doing X[[2]].
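A side observation that may simplify things further: scale(X) centres each column and divides it by its standard deviation, so t(sx) %*% sx with sx = scale(X) / sqrt(n - 1) is the sample correlation matrix of X. The determinant being simulated should therefore match det(cor(X)). The sketch below checks this numerically; sim_det is a hypothetical helper name, and the seed is arbitrary.

```r
set.seed(1)                         # arbitrary seed, for reproducibility
n <- 1000
X <- replicate(6, rnorm(n))

sx <- scale(X) / sqrt(n - 1)
det_scaled <- det(t(sx) %*% sx)     # the determinant as computed in the question
det_cor <- det(cor(X))              # the same quantity via cor()

all.equal(det_scaled, det_cor)

# Hypothetical helper: one simulated determinant per call
sim_det <- function(n = 1000, p = 6) det(cor(replicate(p, rnorm(n))))
detans <- vapply(seq_len(10), function(i) sim_det(), numeric(1))
```

Since the determinant of a correlation matrix cannot exceed 1, the det1 < 1 test in ifelse should let essentially every simulated value through unchanged.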
SimNo <- 10
matdet <- matrix(data=NA, nrow=SimNo, ncol=1, byrow=TRUE)
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
  matdet[i] <- ifelse(det1<1,det1,0)  # store only this iteration's value
}
matdet
