Bootstrapping the uncertainty on an RMSE estimate of a location-scale generalized additive model - loops

I have height data (numeric, in cm; Height) of plants measured over time (numeric, expressed in days of the year; Doy). These data are grouped per genotype (factor; Genotype) and individual plant (factor; Individual). I've managed to calculate the RMSE of the location-scale GAM, but I can't figure out how to bootstrap the uncertainty on the RMSE estimate given that it is a hierarchical location-scale generalized additive model.
The code to extract the RMSE value looks something like this:
# The GAM
model <- gam(list(Height ~ s(Doy, bs = 'ps', by = Genotype) +
s(Doy, Individual, bs = "re") +
Genotype,
~ s(Doy, bs = 'ps', by = Genotype) +
s(Doy, Individual, bs = "re") +
Genotype),
family = gaulss(), # Gaussian location-scale
method = "REML",
data = data)
# Extract the model formula
form <- formula.gam(model)
# Cross-validation for the location
CV <- CVgam(form[[1]], data, nfold = 10, debug.level = 0, method = "GCV.Cp",
printit = TRUE, cvparts = NULL, gamma = 1, seed = 29)
# The root mean square error is given by taking the square root of the MSE
sqrt(CV$cvscale[1])
There is only one height measurement per Individual per day of the year. I figure this is problematic for maintaining the exact same formulation of the GAM. In this regard, I was thinking of making sure that the same few Individuals of each genotype (let's say n = 4) were randomly sampled over each day of the year. I can't figure out how to proceed though. Any ideas?
I've tried several methods, such as the boot package and for loops. An example of one of the things I've tried is:
lm=list();counter=0
lm2=list()
loops = 3
RMSE <- numeric(loops) # preallocate so RMSE[i] can be assigned inside the loop
for (i in 1:loops){
datax <- data %>%
group_by(Doy, Genotype) %>%
slice_sample(prop = 0.6, replace = T)
datax
model <- gam(list(Height ~ s(Doy, bs = 'ps', by = Genotype) +
s(Doy, Individual, bs = "re") +
Genotype,
~ s(Doy, bs = 'ps', by = Genotype) +
s(Doy, Individual, bs = "re") +
Genotype),
family = gaulss(),
method = "REML",
data = datax)
# Extract the model formula
form <- formula.gam(model)
# Cross-validation for the location
CV <- CVgam(form[[1]], datax, nfold = 10, debug.level = 0, method = "GCV.Cp",
printit = TRUE, cvparts = NULL, gamma = 1, seed = 29)
RMSE[i] <- sqrt(CV$cvscale[c(1)])
}
RMSE
This loop runs very slowly and just returns the same RMSE value three times. Surely there is an issue with the sampling.
Unfortunately, I can't share my data but maybe somebody has an idea on how to proceed?
Many thanks!
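One way to read the resampling idea above is as a cluster bootstrap: draw whole Individuals with replacement within each Genotype and keep every Doy row of each drawn plant, so each resample still has one measurement per plant per day. Below is a minimal Python sketch of that resampling step only, not a tested recipe for this model. It assumes the rows are dicts using the question's column names; the helper name `cluster_bootstrap` and the toy data are hypothetical. Refitting the GAM and recomputing the RMSE on each resample is not shown.

```python
import random
from collections import defaultdict

def cluster_bootstrap(rows, rng):
    """Resample whole individuals with replacement within each genotype."""
    # Group the rows of each individual under its genotype.
    by_genotype = defaultdict(lambda: defaultdict(list))
    for row in rows:
        by_genotype[row["Genotype"]][row["Individual"]].append(row)

    resampled = []
    for genotype, individuals in by_genotype.items():
        ids = sorted(individuals)
        # Draw as many individuals as the genotype originally had.
        draws = [rng.choice(ids) for _ in ids]
        for new_id, old_id in enumerate(draws):
            for row in individuals[old_id]:
                # Relabel so a plant drawn twice counts as two distinct plants.
                resampled.append({**row, "Individual": f"{genotype}_{new_id}"})
    return resampled

# Toy data (hypothetical values): 2 genotypes x 3 plants x 4 days.
rows = [
    {"Genotype": g, "Individual": f"{g}{i}", "Doy": d, "Height": 10.0 * i + d}
    for g in ("A", "B") for i in range(3) for d in (100, 110, 120, 130)
]
sample = cluster_bootstrap(rows, random.Random(29))
print(len(sample))
```

Relabelling the drawn plants keeps the random-effect grouping well defined when the same plant is drawn more than once. Whether this scheme suits the model in the question is a judgment call.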


Quantum walk on 3D grid

I am trying to apply the quantum coin walk on a 3D grid, with 3 Hadamard coins. However I can't seem to get symmetric results after 3 steps. Is it simply not possible to have a probability distribution which is symmetric with such a coin?
Thank you
ps the implementation is based on http://susan-stepney.blogspot.com/2014/02/mathjax.html and the position vector captures a 3D grid.
pps Has this been attempted in Qiskit? I couldn't get a perfectly symmetric result using the hard-coded matrix, for some reason...
Not sure I answered your question, but
from the code reference you mentioned, I only changed line 30 to: ax = fig.add_subplot(111, projection = '3d') and line 3 to: from mpl_toolkits.mplot3d import Axes3D
from numpy import *
from matplotlib.pyplot import *
from mpl_toolkits.mplot3d import Axes3D
N = 100 # number of random steps
P = 2*N+1 # number of positions
coin0 = array([1, 0]) # |0>
coin1 = array([0, 1]) # |1>
C00 = outer(coin0, coin0) # |0><0|
C01 = outer(coin0, coin1) # |0><1|
C10 = outer(coin1, coin0) # |1><0|
C11 = outer(coin1, coin1) # |1><1|
C_hat = (C00 + C01 + C10 - C11)/sqrt(2.)
ShiftPlus = roll(eye(P), 1, axis=0)
ShiftMinus = roll(eye(P), -1, axis=0)
S_hat = kron(ShiftPlus, C00) + kron(ShiftMinus, C11)
U = S_hat.dot(kron(eye(P), C_hat))
posn0 = zeros(P)
posn0[N] = 1 # array indexing starts from 0, so index N is the central posn
psi0 = kron(posn0,(coin0+coin1*1j)/sqrt(2.))
psiN = linalg.matrix_power(U, N).dot(psi0)
prob = empty(P)
for k in range(P):
    posn = zeros(P)
    posn[k] = 1
    M_hat_k = kron( outer(posn,posn), eye(2))
    proj = M_hat_k.dot(psiN)
    prob[k] = proj.dot(proj.conjugate()).real
fig = figure()
ax = fig.add_subplot(111, projection = '3d')
plot(arange(P), prob)
plot(arange(P), prob, 'o')
loc = range(0, P, P // 10) #Location of ticks
xticks(loc)
xlim(0, P)
ax.set_xticklabels(range(-N, N+1, P // 10))
show()

1D-Coupled Transient Diffusion in FiPY with Reactive Boundary Condition

I would like to solve the transient diffusion equation for two compounds, A and B, as shown in the image. I think the image is the best way to show my problem.
Diffusion equations and boundary conditions.
As you can see, the reaction only occurs at the surface, and the flux of A is equal to the flux of B, so the two equations are coupled only at the surface. The boundary condition is similar to the Robin boundary condition explained in the FiPy manual. The main difference is that the second variable appears in the boundary condition. Does anybody have any idea how to formulate this boundary condition in FiPy?
I guess I need to add some extra term to the Robin boundary condition, but I couldn't figure it out.
I would really appreciate your help.
This is the code that solves the equations with a Robin boundary condition at x = 0:
-D(dC_A/dx) = -kC_A
-D(dC_B/dx) = -kC_B
With this condition, I can easily use the Robin boundary condition to solve the equations. The results seem reasonable for this boundary condition.
"""
Question for StackOverflow
"""
#%%
from fipy import Variable, FaceVariable, CellVariable, Grid1D, TransientTerm, DiffusionTerm, Viewer, ImplicitSourceTerm
from fipy.tools import numerix
#%%
##### Model parameters
L= 8.4853e-4 # m boundary layer thickness
dx= 1e-8 # mesh size
nx = int(L/dx)+1 # number of meshes
D = 1e-9 # m^2/s diffusion coefficient
k = 1e-4 # m/s reaction coefficient R = k [c_A],
c_inf = 0. # ROBIN general condition, one can think R = k ([c_A]-[c_inf])
c_init = 1. # Initial concentration of compound A, mol/m^3
#%%
###### Meshing and variable definition
mesh = Grid1D(nx=nx, dx=dx)
c_A = CellVariable(name="c_A", hasOld = True,
mesh=mesh,
value=c_init)
c_B = CellVariable(name="c_B", hasOld = True,
mesh=mesh,
value=0.)
#%%
##### Right boundary condition
valueRight = c_init
c_A.constrain(valueRight, mesh.facesRight)
c_B.constrain(0., mesh.facesRight)
#%%
### ROBIN BC requirements, defining cellDistanceVectors
## This code is for fixing celldistance via this link:
## https://stackoverflow.com/questions/60073399/fipy-problem-with-grid2d-celltofacedistancevectors-gives-error-uniformgrid2d
MA = numerix.MA
tmp = MA.repeat(mesh._faceCenters[..., numerix.NewAxis,:], 2, 1)
cellToFaceDistanceVectors = tmp - numerix.take(mesh._cellCenters, mesh.faceCellIDs, axis=1)
tmp = numerix.take(mesh._cellCenters, mesh.faceCellIDs, axis=1)
tmp = tmp[..., 1,:] - tmp[..., 0,:]
cellDistanceVectors = MA.filled(MA.where(MA.getmaskarray(tmp), cellToFaceDistanceVectors[:, 0], tmp))
#%%
##### Defining mask and Robin BC at left boundary
mask = mesh.facesLeft
Gamma0 = D
Gamma = FaceVariable(mesh=mesh, value=Gamma0)
Gamma.setValue(0., where=mask)
dPf = FaceVariable(mesh=mesh,
value=mesh._faceToCellDistanceRatio * cellDistanceVectors)
n = mesh.faceNormals
a = FaceVariable(mesh=mesh, value=k, rank=1)
b = FaceVariable(mesh=mesh, value=D, rank=0)
g = FaceVariable(mesh=mesh, value= k * c_inf, rank=0)
RobinCoeff = (mask * Gamma0 * n / (-dPf.dot(a)+b))
#%%
#### Making a plot
viewer = Viewer(vars=(c_A, c_B),
datamin=-0.2, datamax=c_init * 1.4)
viewer.plot()
#%% Time step and simulation time definition
time = Variable()
t_simulation = 4 # seconds
timeStepDuration = .05
steps = int(t_simulation/timeStepDuration)
#%% PDE Equations
eqcA = (TransientTerm(var=c_A) == DiffusionTerm(var=c_A, coeff=Gamma) +
(RobinCoeff * g).divergence
- ImplicitSourceTerm(var=c_A, coeff=(RobinCoeff * a.dot(-n)).divergence))
eqcB = (TransientTerm(var=c_B) == DiffusionTerm(var=c_B, coeff=Gamma) -
(RobinCoeff * g).divergence
+ ImplicitSourceTerm(var=c_B, coeff=(RobinCoeff * a.dot(-n)).divergence))
#%% A loop for solving PDE equations
while time() <= (t_simulation):
    time.setValue(time() + timeStepDuration)
    c_B.updateOld()
    c_A.updateOld()
    res1=res2 = 1e10
    viewer.plot()
    while (res1 > 1e-6) & (res2 > 1e-6):
        res1 = eqcA.sweep(var=c_A, dt=timeStepDuration)
        res2 = eqcB.sweep(var=c_B, dt=timeStepDuration)
It's possible to solve this as a fully implicit system. The code below simplifies the problem to have a unity domain size and diffusion coefficient. k is set to 0.2. It captures the analytical solution quite well with some caveats (see below).
from fipy import (
CellVariable,
TransientTerm,
DiffusionTerm,
ImplicitSourceTerm,
Grid1D,
Viewer,
)
L = 1.0
nx = 1000
dx = L / nx
konstant = 0.2
coeff = 1.0
mesh = Grid1D(nx=nx, dx=dx)
var_a = CellVariable(mesh=mesh, value=1.0, hasOld=True)
var_b = CellVariable(mesh=mesh, value=0.0, hasOld=True)
var_a.constrain(1.0, mesh.facesRight)
var_b.constrain(0.0, mesh.facesRight)
coeff_mask = ~mesh.facesLeft * coeff
boundary_coeff = konstant * (mesh.facesLeft * mesh.faceNormals).divergence
eqn_a = TransientTerm(var=var_a) == DiffusionTerm(
coeff_mask, var=var_a
) - ImplicitSourceTerm(boundary_coeff, var=var_a) + ImplicitSourceTerm(
boundary_coeff, var=var_b
)
eqn_b = TransientTerm(var=var_b) == DiffusionTerm(
coeff_mask, var=var_b
) - ImplicitSourceTerm(boundary_coeff, var=var_b) + ImplicitSourceTerm(
boundary_coeff, var=var_a
)
eqn = eqn_a & eqn_b
for _ in range(5):
    var_a.updateOld()
    var_b.updateOld()
    eqn.sweep(dt=1e10)
Viewer((var_a, var_b)).plot()
print("var_a[0] (expected):", (1 + konstant) / (1 + 2 * konstant))
print("var_b[0] (expected):", konstant / (1 + 2 * konstant))
print("var_a[0] (actual):", var_a[0])
print("var_b[0] (actual):", var_b[0])
input("wait")
Note the following:
As written, the boundary condition is only first-order accurate, which doesn't really matter for this problem but might hurt you in higher dimensions. There might be ways to fix this, such as having a small cell near the boundary or adding an explicit second-order correction for the boundary condition.
The equations are coupled here. If uncoupled it would probably require loads of iterations to reach equilibrium.
It did require a few iterations to reach equilibrium, but it shouldn't. That's probably due to the solver not converging adequately without a few tries. It might be that coupled equations have some bad conditioning.
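The expected values printed by the script can be checked by hand. This sketch assumes steady state, a unit domain length and diffusion coefficient, and a net surface reaction rate of k(a0 - b0) at x = 0. That last point is my reading of the source terms in the code, not something stated explicitly. Under those assumptions both profiles are linear, and the flux balances at the surface reduce to a 2x2 linear system for the surface values a0 and b0. The function name `surface_values` is hypothetical.

```python
def surface_values(k):
    """Solve the steady-state surface balance for (a0, b0) by Cramer's rule."""
    # With a(1) = 1, b(1) = 0 and linear profiles, the flux balances at x = 0 are
    #   1 - a0 = k * (a0 - b0)   ->   (1 + k) * a0 - k * b0 = 1
    #   b0     = k * (a0 - b0)   ->   -k * a0 + (1 + k) * b0 = 0
    m11, m12, r1 = 1.0 + k, -k, 1.0
    m21, m22, r2 = -k, 1.0 + k, 0.0
    det = m11 * m22 - m12 * m21
    a0 = (r1 * m22 - m12 * r2) / det
    b0 = (m11 * r2 - m21 * r1) / det
    return a0, b0

konstant = 0.2
a0, b0 = surface_values(konstant)
print(a0, (1 + konstant) / (1 + 2 * konstant))
print(b0, konstant / (1 + 2 * konstant))
```

Each line prints the solved value next to the closed-form expression used in the script, so the two can be compared directly.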

adding array to an existing array

I perform calculations using 5-fold cross-validation. I want to collect all the predictions in one array to avoid calculating statistics per fold. I tried doing this by concatenating each fold's array of predictions onto an existing array. For example:
for train_index, test_index in skf:
    fold += 1
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    rf.fit(x_train, y_train)
    predicted = rf.predict_proba(x_test)
    round_predicted = rf.predict(x_test)
    if fold>1:
        allFolds_pred = np.concatenate((predicted, allFolds_pred), axis=1)
        allFolds_rpred = np.concatenate((round_predicted, allFolds_rpred), axis=1)
        allFolds_y = np.concatenate((y_test, allFolds_y), axis=1)
    else:
        allFolds_pred = predicted
        allFolds_rpred = round_predicted
        allFolds_y = y_test
fpr, tpr, _ = roc_curve(allFolds_y, allFolds_pred[:,1])
roc_auc = auc(fpr, tpr)
cm=confusion_matrix(allFolds_y, allFolds_rpred, labels=[0, 1])
Calculate statistics.
However it is not working. What is the best way to proceed? Is there any better way to do the same?
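A likely culprit is `axis=1`. The label and rounded-prediction arrays are 1-D and have no axis 1, and for the 2-D probability array `axis=1` appends columns rather than samples. A minimal sketch of one alternative is to collect each fold's arrays in a list and concatenate once along axis 0. It uses NumPy only; the random numbers are a stand-in for `rf.predict_proba`, and the toy labels and fold split are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0, 1] * 10)                      # 20 toy labels
folds = np.array_split(rng.permutation(len(y)), 5)

pred_parts, y_parts = [], []
for test_index in folds:
    # Stand-in for rf.predict_proba(x_test): one row per sample, two columns.
    p1 = rng.random(len(test_index))
    predicted = np.column_stack((1.0 - p1, p1))
    pred_parts.append(predicted)
    y_parts.append(y[test_index])

# Stack fold results row-wise: axis=0 appends samples, axis=1 would append columns.
allFolds_pred = np.concatenate(pred_parts, axis=0)
allFolds_y = np.concatenate(y_parts)
print(allFolds_pred.shape, allFolds_y.shape)
```

Appending to a list and concatenating once also avoids the `if fold>1` branch and the repeated copying of the growing array.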

MATLAB: how to solve coupled differential equations dependent on data stored in arrays

I want to solve a system of two ordinary differential equations in MATLAB.
The parameters of the ODEs depend on measured data stored in two arrays, F and T.
When I run the program, I always get the error shown below. I am sure it has something to do with the arrays, because when I use single numbers for F and T (e.g. F = 60; T = 30;) the program works fine.
Subscript indices must either be real positive integers or logicals.
Error in dynamics (line 46)
ddyn(1) = k1*F(t) + v_b(t) - k_1*dyn(1) - v_a(t);
Error in ode23 (line 256)
f(:,2) = feval(odeFcn,t+h*A(1),y+f*hB(:,1),odeArgs{:});
Error in main (line 33)
[t,sol] = ode23(@dynamics , (1:1:3000),[0 0]);
Here is the code I use for the main function and the ODE system:
Main function:
[t,sol] = ode45(@dynamics , (1:1:3000),[0 0]);
ODE system:
function [ddyn] = dynamics(t,dyn)
% constant numbers
k1 = 10^-2; k_1 = 8* 10^-3; k2 = 10^-2; k_2 = 4*10^-3;
V_max_a = 1.6; V_max_b = 3.5;
K_M_a = 1.5*10^-3; K_M_b = 2*10^-3;
K_a_F = 9.4*10^5; K_a_T = 3.9*10; K_b_F = 1.3*10^4; K_b_T = 1.2*10^-10;
r_a_F = 4.3*10^7; r_a_T = 4.2*10^9; r_b_F = 6.9*10^-7; r_b_T = 6.2*10^-9;
%arrays with data e.g.
F = 1:3000;
T = 1:3000;
% program works if I use numbers, e.g.:
%F = 60; T = 30;
ddyn = zeros(2,1);
R_a_F = (K_a_F + r_a_F* F)/(K_a_F + r_a_F);
R_a_T = (K_a_T + r_a_T* T)/(K_a_T + r_a_T);
R_b_F = (K_b_F + r_b_F* F)/(K_b_F + r_b_F);
R_b_T = (K_b_T + r_b_T* T)/(K_b_T + r_b_T);
v_a = (V_max_a*dyn(1))/(K_M_a + dyn(1))*R_a_F .*R_a_T;
v_b = (V_max_b*dyn(2))/(K_M_b + dyn(2))*R_b_F .*R_b_T;
ddyn(1) = k1*F(t) + v_b(t) - k_1*dyn(1) - v_a(t);
ddyn(2) = k2*T(t) + v_a(t) - k_2*dyn(2) - v_b(t);
All of the functions in the MATLAB ODE suite, including ode45, assume t to be continuous and use a dynamic time step to achieve a certain level of accuracy [1]. As such, you cannot assume t is an integer, and it should never be used as an index, as you are doing with F(t). To quote from the documentation:
If tspan contains more than two elements [t0,t1,t2,...,tf], then the solver returns the solution evaluated at the given points. This does not affect the internal steps that the solver uses to traverse from tspan(1) to tspan(end). Thus, the solver does not necessarily step precisely to each point specified in tspan.
Therefore, assuming F and T are continuous functions in time, I'd recommend making a function that performs interpolation in time, more than likely via interp1, and pass that function to your ODE function through parametrization. For example:
tspan = 1:3000;
Ffun = @(t) interp1(tspan,F,t); % Default is linear
[t,sol] = ode45(@(t,dyn) dynamics(t,dyn,Ffun) , tspan , [0 0]);
That's just an example but should, hopefully, be serviceable.
[1] In particular, ode45 uses the Dormand-Prince (4,5) Runge-Kutta pair for its time integration; in short, the function compares a fourth-order and a fifth-order solution to decide whether the result from the current time step is good enough or whether the step should be reduced.
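To illustrate why interpolation is needed, here is a small Python sketch of linear interpolation of sampled data at a non-integer time. This is roughly what interp1 does with its default linear method inside the sampled range. The function name `interp_linear` and the sample values are hypothetical.

```python
from bisect import bisect_right

def interp_linear(ts, values, t):
    """Linearly interpolate values (sampled at sorted times ts) at time t."""
    if not ts[0] <= t <= ts[-1]:
        raise ValueError("t is outside the sampled range")
    # Index of the last sample time <= t, clamped so that i + 1 stays valid.
    i = min(bisect_right(ts, t) - 1, len(ts) - 2)
    w = (t - ts[i]) / (ts[i + 1] - ts[i])
    return (1.0 - w) * values[i] + w * values[i + 1]

ts = [1, 2, 3, 4]
F = [10.0, 20.0, 40.0, 80.0]
print(interp_linear(ts, F, 2.5))   # halfway between the samples at t = 2 and t = 3
```

A solver that evaluates the right-hand side at t = 2.5 gets a well-defined value this way, whereas indexing the array at 2.5 is an error.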

How to create a grid from 1D array using R?

I have a file which contains a 209091 element 1D binary array representing the global land area
which can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_flags_2002/
I want to create a full grid from the 1D data arrays using the provided ancillary row and column files, globland_r and globland_c, which can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/
There is code written in MATLAB for this purpose, and I want to translate it to R, but I do not know MATLAB:
function [gridout, EASE_r, EASE_s] = mkgrid_global(x)
%MKGRID_GLOBAL(x) Creates a matrix for mapping
% gridout = mkgrid_global(x) uses the 209091 element array (x) and returns
%Load ancillary EASE grid row and column data, where <MyDir> is the path to
%wherever the globland_r and globland_c files are located on your machine.
fid = fopen('C:\MyDir\globland_r','r');
EASE_r = fread(fid, 209091, 'int16');
fclose(fid);
fid = fopen('C:\MyDir\globland_c','r');
EASE_s = fread(fid, 209091, 'int16');
fclose(fid);
gridout = NaN.*zeros(586,1383);
%Loop through the element array
for i=1:1:209091
%Distribute each element to the appropriate location in the output
%matrix (but MATLAB is
%(1,1)
end
Edit following the solution of @mdsumner:
The files MLLATLSB and MLLONLSB (4-byte integers) contain latitude and longitude (multiply by 1e-5) for geo-locating the full global EASE grid matrix (586×1383)
MLLATLSB and MLLONLSB can be downloaded from here:
ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/
## the sparse dims, literally the xcol * yrow indexes
dims <- c(1383, 586)
cfile <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/globland_c"
rfile <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_ancil/globland_r"
## be nice, don't abuse this
col <- readBin(cfile, "integer", n = prod(dims), size = 2, signed = FALSE)
row <- readBin(rfile, "integer", n = prod(dims), size = 2, signed = FALSE)
## example data file
fdat <- "ftp://sidads.colorado.edu/DATASETS/nsidc0451_AMSRE_Land_Parms_v01/AMSRE_flags_2002/flags_2002170A.bin"
dat <- readBin(fdat, "integer", n = prod(dims), size = 1, signed = FALSE)
## now get serious
m <- matrix(as.integer(NA), dims[2L], dims[1L])
m[cbind(row + 1L, col + 1L)] <- dat
image(t(m)[,dims[2]:1], col = rainbow(length(unique(m)), alpha = 0.5))
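The key step above is the indexed assignment `m[cbind(row + 1L, col + 1L)] <- dat`, which scatters each 1D value to its (row, column) cell. The `+ 1L` suggests the index files are 0-based while R is 1-based. The same idea as a plain Python sketch, with a hypothetical helper name and toy data rather than the real files:

```python
def scatter_to_grid(values, rows, cols, nrows, ncols, fill=None):
    """Place values[i] at grid[rows[i]][cols[i]]; untouched cells keep fill."""
    grid = [[fill] * ncols for _ in range(nrows)]
    for value, r, c in zip(values, rows, cols):
        grid[r][c] = value
    return grid

# Toy example (hypothetical data): 4 land cells on a 3 x 4 grid, 0-based indexes.
grid = scatter_to_grid([7, 8, 9, 5], rows=[0, 1, 1, 2], cols=[3, 0, 2, 1],
                       nrows=3, ncols=4)
for line in grid:
    print(line)
```

Cells that no index pair points to keep the fill value, which plays the role of the NA (or NaN in the MATLAB version) used for non-land cells.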
Maybe we can reconstruct this map projection too.
flon <- "MLLONLSB"
flat <- "MLLATLSB"
## the key is that these are integers, floats scaled by 1e5
lon <- readBin(flon, "integer", n = prod(dims), size = 4) * 1e-5
lat <- readBin(flat, "integer", n = prod(dims), size = 4) * 1e-5
## this is all we really need from now on
range(lon)
range(lat)
library(raster)
library(rgdal) ## need for coordinate transformation
ex <- extent(projectExtent(raster(extent(range(lon), range(lat)), crs = "+proj=longlat"), "+proj=cea"))
grd <- raster(ncols = dims[1L], nrows = dims[2L], xmn = xmin(ex), xmx = xmax(ex), ymn = ymin(ex), ymx = ymax(ex), crs = "+proj=cea")
There is probably an "out by half pixel" error in there, left as an exercise.
Test
plot(setValues(grd, m), col = rainbow(max(m, na.rm = TRUE), alpha = 0.5))
Hohum
library(maptools)
data(wrld_simpl)
plot(spTransform(wrld_simpl, CRS(projection(grd))), add = TRUE)
We can now save the valid cellnumbers to match our "grd" template, then read any particular dat-file and just populate the template with those values based on cellnumbers. Also, it seems someone trod nearly this path earlier but not much was gained:
How to identify lat and long for a global matrix?
