Let's suppose I want to minimize a function:
x<-seq(-4.5,4.5,by=.2)
y<-seq(-4.5,4.5,by=.2)
f <- function(x1,x2){(x1^2 + x2)^2 }
z <- outer(x,y,f)
Where z is a matrix of dimension 46 x 46:
> class(z)
[1] "matrix"
> dim(z)
[1] 46 46
So I make a graph of the result with:
persp(x,y,z,phi=-45,theta=45,col="yellow",shade=.65 ,ticktype="detailed")
If I write the previous, it works, but since I want to minimize the function using optim, if I use that I get:
optim(c(-4,-4), f, df)$par
> Error in fn(par, ...) : argument "x2" is missing, with no default
So I need to use an array in order to use optim after all. So if I write:
f <- function(x) (x[1]^2 + x[2])^2
x <- seq(-4.5,4.5,by=.2)
y <- seq(-4.5,4.5,by=.2)
s<-data.frame(cbind(x,y))
I can use optim:
optim(c(-4,-4), f, df)$par
But outer gives an error:
z <- outer(s,f)
Error in as.vector(x, mode) : cannot coerce type 'closure' to
vector of type 'any'
I don't know how to solve it.
I believe the goal here is to not have to write the function two different ways, right?
f0 <- function(x1,x2) ( x1^2 + x2 )^2
f <- function(x) ( x[1]^2 + x[2] )^2
Similarly, maybe you just want to use just s<-data.frame(cbind(x,y)) (without x and y).
Here's what I would consider doing:
outer(s[[1]],s[[2]],Vectorize(function(xi,yi) f(c(xi,yi))))
This way you only have to write the function once, in the way amenable to using optim (with a single argument).
Note: If you want the grids x and y to have a different number of points, you should store s <- list(x,y) instead. The code will work the same.
Related
I'm starting to use functions handles in Matlab and I have a question,
what Matlab computes when I do:
y = (0:.1:1)';
fun = #(x) x(1) + x(2).^2 + exp(x(3)*y)
and what Matlab computes when I do:
fun = #(x) x + x.^2 + exp(x*y)
Because I'm evaluating the Jacobian of these functions (from this code ) and it gives different results. I don't understand the difference of putting x(i) or only x
Let's define a vector vec as vec = [1, 2, 3].
When you use this vec in your first function as results = fun(vec), the program will take only the particular elements of the vector, meaning x(1) = vec(1), x(2) = vec(2) and x(3) = vec(3). The whole expression then will look as
results = vec(1) + vec(2).^2 + exp(vec(3)*y)
or better
results = 1 + 2^2 + exp(3*y)
However, when you use your second expression as results = fun(vec), it will use the entire vector vec in all the cases like this
results = vec + vec.^2 + exp(vec*y)
or better
results = [1, 2, 3] + [1^2, 2^2, 3^2] + exp([1, 2, 3]*y)
You can also clearly see that in the first case, I don't really need to care about matrix dimensions, and the final dimensions of the results variable are the same as the dimensions of your y variable. This is not the case in the second example, because you multiply matrices vec and y, which (in this particular example) results in error, as the vec variable has dimensions 1x3 and the y variable 11x1.
If you want to investigate this, I recommend you split this up into subexpressions and debug, e.g.
f1 = #(x) x(1);
f2 = #(x) x(2).^2;
f3 = #(x) exp(x(3)*y);
f = #(x) f1(x) + f1(x) + f3(x)
You can split it up even further if any subexpression is unclear.
The distinction is that one is an array array multiplication (x * y, I'm assuming x is an array with 11 columns in order for the matrix multiplication to be consistent) and the other is a scalar array multiplication (x(3) * y). The subscript operator (n) for any matrix extracts the n-th value from that matrix. For a scalar, the index can only be 1. For a 1D array, it extracts the n-th element of the column/row vector. For a 2D array, its the n-th element when traversed columnwise.
Also, if you only require the first derivative, I suggest using complex-step differentiation. It provides reduced numerical error and is computationally efficient.
I am trying to optimise the code designed to compute double sums of product of the elements of two square matrices. Let’s say we have two square matrices of size n, W and V. The object that needs to be computed is a vector B with elements
In simple terms: compute element-by-element products of two different rows in two different matrices and take their sum, then take an extra sum over all rows of the second matrix (sans identical indices).
The problem is, the computational complexity of this task seemingly O(n3) because the length of this object we are creating, B, is n, and each element requires two summations. This is what I have come up with:
For given i and j (i≠j), start with the inner sum over k. Sum for all k, then subtract the terms for k=i and k=j, and multiply by the indicator of j≠i.
Since the restriction j≠i has been taken care of in the inner sum, the outer sum is taken just for j=1,...,n.
If we denote , then the two steps will look like
and .
However, writing a loop turned out to be very inefficient. n=100 works quickly (0.05 seconds). But, for instance, when n=500 (we are talking about real-world applications here), the average computation time is 3 seconds, and for n=1000, it jumps to 22 s.
The inner loop over k can be easily replaced by a sum, but the outer one... In this question, the suggested solution is sapply, but it implies that the summation must be done over all elements.
This is the code I am trying to evaluate before the heat death of the Universe for large n.
set.seed(1)
N <- 500
x1 <- rnorm(N)
x2 <- rchisq(N, df=3)
bw1 <- bw.nrd(x1)
bw2 <- bw.nrd(x2)
w <- outer(x1, x1, function(x, y) dnorm((x-y)/bw1) )
w <- w/rowSums(w)
v <- outer(x2, x2, function(x, y) dnorm((x-y)/bw2) )
v <- v/rowSums(v)
Bij <- matrix(NA, ncol=N, nrow=N)
for (i in 1:N) { # Around 22 secs for N=1000
for (j in 1:N) {
Bij[i, j] <- (sum(w[i, ]*v[j, ]) - w[i, i]*v[j, i] - w[i, j]*v[j, j]) * (i!=j)
}
}
Bi <- rowSums(Bij)
How would an expert R programmer vectorise such kind of loops?
Update:
In fact, given your expression for B_{ij}, we may also do the following
diag(w) <- diag(v) <- 0
BBij <- tcrossprod(w, v)
diag(BBij) <- 0
range(rowSums(BBij) - Bi)
# [1] -2.220446e-16 0.000000e+00
range(BBij - Bij)
# [1] -6.938894e-18 5.204170e-18
Hence, while somewhat obvious, it may also be an interesting observation for your purposes that neither B_{ij} nor B_i depend on the diagonals of W and V.
Initial answer:
Since
where the diagonals of W and V can be set to zero and V_{.k} denotes the sum of the k-th column of V, we have
diag(w) <- diag(v) <- 0
A <- w * v
rowSums(sweep(w, 2, colSums(v), `*`)) - rowSums(A) + diag(A)
where
range(rowSums(sweep(w, 2, colSums(v), `*`)) - rowSums(A) + diag(A) - Bi)
# [1] -1.110223e-16 1.110223e-16
Without looking into the content of your matrices w and v, your double for-loop can be replaced with simple matrix operations, using one matrix multiplication (tcrossprod), transpose (t) and diagonal extraction:
Mat.ij <- tcrossprod(w, v) -
matrix(rep(diag(w), times = N), nrow = N) * t(v) -
w * matrix(rep(diag(v), each = N), nrow = N)
diag(Mat.ij) <- 0
all.equal(Bij, Mat.ij)
[1] TRUE
I am applying a user-defined function to individual cells of a 3D array. The contents of each cell are one of the following possibilities, all of which are character vectors because of prior formatting:
"N"
"A"
""
"1"
"0"
I want to create a new 3D array of the same dimensions, where cells contain either NA or a numeric vector containing 1 or 0. Thus, I wrote a function named Numericize and used aaply to apply it to the entire array. However, it takes forever to apply it.
Numericize <- function(x){
if(!is.na(x)){
x[x=="N"] <- NA; x
x[x=="A"] <- NA; x
x[x==""] <- NA; x
x <- as.integer(x)
}
return(x)
}
The dimensions original array are 480x866x366. The function takes forever to apply using the following code:
Final.Daily.Array <- aaply(.data = Complete.Daily.Array,
.margins = c(1,2,3),
.fun = Numericize,
.progress = "text")
I am unsure if the speed issue comes from an inefficient Numericize, an inefficient aaply, or something else entirely. I considered trying to set up parallel computing using the plyr package but I wouldn't think that such a simple command would require parallel processing.
On one hand I am concerned that I created a stack overflow for myself (see this for more), but I have applied other functions to similar arrays without problems.
ex.array <- array(dim = c(3,3,3))
ex.array[,,1] <- c("N","A","","1","0","N","A","","1")
ex.array[,,2] <- c("0","N","A","","1","0","N","A","")
ex.array[,,3] <- c("1","0","N","A","","1","0","N","A")
desired.array <- array(dim = c(3,3,3))
desired.array[,,1] <- c(NA,NA,NA,1,0,NA,NA,NA,1)
desired.array[,,2] <- c(0,NA,NA,NA,1,0,NA,NA,NA)
desired.array[,,3] <- c(1,0,NA,NA,NA,1,0,NA,NA)
ex.array
desired.array
Any suggestions?
You can just use a vectorized approach:
ex.array[ex.array %in% c("", "N", "A")] <- NA
storage.mode(ex.array) <- "integer"
You can simply use the second line and it will introduce NAs by coercion.
SimNo <- 10
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans<-do.call(rbind,lapply(1:SimNo, function(x) ifelse(det1<1,det1,0)))
}
when I run all commands with in loop except last one I get different values of determinant but when I run code with loops at once I get last value of determinant repeated for all.
Please help and guide to control all situation like this.
Is there way to have short and efficient way for this code, so that each individual variable can also be accessed.
Whenever you are repeating the same operation multiple times, and without inputs, think about using replicate. Here you can use it twice:
SimNo <- 10
det1 <- replicate(SimNo, {
X <- replicate(6, rnorm(1000, 0, 1))
sx <- scale(X) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Otherwise, this is what your code should have looked with your for loop. You needed to create a vector for storing your outputs at each loop iteration:
SimNo <- 10
detans <- numeric(SimNo)
for (i in 1:SimNo) {
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
detans[i] <- ifelse(det1<1,det1,0)
}
Edit: you asked in the comments how to access X using replicate. You would have to make replicate create and store all your X matrices in a list. Then use the *apply family of functions to loop throughout that list to finish the computations:
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- sapply(X, function(x) {
sx <- scale(x) / sqrt(999)
det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Here, X is now a list of matrices, so you can get e.g. the matrix for the second simulation by doing X[[2]].
SimNo <- 10
matdet <- matrix(data=NA, nrow=SimNo, ncol=1, byrow=TRUE)
for (i in 1:SimNo){
z1<-rnorm(1000,0,1)
z2<-rnorm(1000,0,1)
z3<-rnorm(1000,0,1)
z4<-rnorm(1000,0,1)
z5<-rnorm(1000,0,1)
z6<-rnorm(1000,0,1)
X<-cbind(z1,z2,z3,z4,z5,z6)
sx<-scale(X)/sqrt(999)
det1<-det(t(sx)%*%sx)
matdet[i] <-do.call(rbind,lapply(1:SimNo, function(x) ifelse(det1<1,det1,0)))
}
matdet
i was trying to store my simulation output inside an array. I have written the following code:
nsim=50
res=array(0,c(nsim,20,20))
for(i in 1:nsim) {
cat("simul=",i,"\n")
simulated = NULL
stik.simulated = NULL
simulated = rpp(....)
stik.simulated = STIKhat(....)
# from stik.simulated we will get $khat and $Ktheo and
# the dimension of stik.simulated$Khat-stik.simulated$Ktheo is 20 x 20
res[i,,] = stik.simulated$Khat - stik.simulated$Ktheo
}
But whenever the function is trying to store the output inside an array, I get the following error:
simul= 1
Xrange is 20 40
Yrange is -20 20
Doing quartic kernel
Error in res[, , i] = stik.simulated$Khat - stik.simulated$Ktheo :
subscript out of bounds
seeking your help. Thanks.
I think you need to organize your code to avoid such errors. I assume you are using package stpp.
First You create, a function which generate the matrix of each iteration; Try to test your function with many values.
stick_diff <- function(u,v){
u <- seq(0,u,by=1)
v <- seq(0,v,by=1)
simulated <- rpp(...) ## call here rpp with right parameters
stik <- STIKhat(xyt=simulated$xyt,
dist=u, times=v, ...)
stik$Khat-stik$Ktheo ## !! here if u#v you will have recycling!
}
Once you are sure of your function you call it the loop, with right dimensions.
nsim <- 50
u <- 20
res=array(0,c(nsim,u,u))
for(i in 1:nsim)
res[i,,] <- stick_diff(i,u,u)