Confidence intervals for predictions of a logistic generalized linear mixed model (GLMM)

I am running a logistic generalized linear mixed model and would like to plot my effects together with confidence intervals.
I use the lme4 package to fit my model:
glmer(cbind(positive, negative) ~ F1 * F2 * F3 + V1 + F1 * I(V1^2) + V2 + F1 * I(V2^2) +
        V3 + I(V3^2) + V4 + I(V4^2) + F4 + (1 | OLRE) + (1 | ind),
      family = binomial, data = try, na.action = na.omit,
      control = glmerControl(optimizer = "optimx", calc.derivs = FALSE,
                             optCtrl = list(method = "nlminb", starttests = FALSE, kkt = FALSE)))
OLRE is an observation-level random effect, included to deal with overdispersion.
In case you are wondering about convergence warnings: I went through the lme4 troubleshooting protocol, and the fit should be fine.
To get effect plots with confidence intervals, I tried ggpredict and sjPlot's plot_model(), e.g.:
plot_model(mod, type = "pred", terms = c("F1", "F2", "F3"), ci.lvl = 0.95)
I also tried to go through this protocol:
https://rpubs.com/hughes/84484
intdata <- expand.grid(
  positive = c(0, 1),
  negative = c(0, 1),
  F1 = as.factor(c(0, 1)),
  F2 = as.factor(c(1, 2, 3)),
  F3 = as.factor(c(1, 2, 3, 4)),
  V1 = as.numeric(median(try$V1)),
  F4 = as.factor(c(30, 31)),
  ind = as.factor(c(68)),
  OLRE = as.factor(c(2450)),
  V2 = as.numeric(median(try$V2)),
  V3 = as.numeric(median(try$V3)),
  V4 = as.numeric(median(try$V4))
)
# conditional variances
cV <- ranef(mod, condVar = TRUE)
ranvar <- attr(cV[[1]], "postVar")
sqrt(diag(ranvar[, , 1]))

mm <- model.matrix(terms(mod), data = intdata,
                   contrasts.arg = lapply(intdata[, c(3:5, 7:9)], contrasts, contrasts = FALSE))
predFun <- function(.) mm %*% fixef(.)
bb <- bootMer(mod, FUN = predFun, nsim = 3)
This produced several warnings about the contrasts, followed by the error:
Error in mm %*% fixef(.) : non-conformable arguments
Nothing has worked so far, so I would really appreciate some help.
Here is the link to the data: https://drive.google.com/file/d/1qZaJBbM1ggxwPnZ9bsTCXL7_BnXlvfkW/view?usp=sharing

Related

Solving stochastic PDEs in FiPy

I would like to simulate coupled PDEs with a Gaussian white-noise field, but I was unable to find any examples or documentation suggesting how this should be done. In particular, I am interested in Cahn-Hilliard-like systems with noise:
d/dt(phi) = div(grad(psi)) + div(noise)
psi = f(phi) + div(grad(phi))
Is there a way to implement this in FiPy?
You can add a GaussianNoiseVariable as a source to your equations.
For a non-conserved field, you would do, e.g.,
noise = fp.GaussianNoiseVariable(mesh=..., mean=..., variance=...)
:
:
eq = fp.TransientTerm(...) == fp.DiffusionTerm(...) + ... + noise

for step in steps:
    noise.scramble()
    :
    :
    eq.solve(...)
For a conserved field, you would use:
eq = fp.TransientTerm(...) == fp.DiffusionTerm(...) + ... + noise.faceGrad.divergence
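To make that concrete, here is a minimal runnable sketch of the conserved case; the 1D grid, the noise variance, the unit diffusion coefficient, and the time step are arbitrary assumptions on my part, not part of the original answer:

import fipy as fp

# 1D mesh and a conserved field phi (sizes and coefficients are assumptions)
mesh = fp.Grid1D(nx=100, dx=1.0)
phi = fp.CellVariable(mesh=mesh, name="phi", value=0.0)
noise = fp.GaussianNoiseVariable(mesh=mesh, mean=0.0, variance=0.01)

# Adding noise.faceGrad.divergence (rather than noise itself) keeps the
# source term conservative, as described above.
eq = fp.TransientTerm() == fp.DiffusionTerm(coeff=1.0) + noise.faceGrad.divergence

for step in range(100):
    noise.scramble()          # redraw the white-noise field each time step
    eq.solve(var=phi, dt=0.1)

Calling scramble() inside the loop is what makes successive time steps see independent draws of the noise field.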

The gof function from package btergm gives a precision-recall AUC value greater than 1

I was trying to do out-of-sample prediction using the gof function from the btergm package. When calculating the AUC of the precision-recall curve on the test set, I get a value of 1.012909, which seems theoretically impossible.
How should I interpret this result, or am I doing something wrong? Thank you! Here is my code:
network <- readRDS(url("https://www.dropbox.com/s/zxhqxa8h9awkzpd/network.rds?dl=1"))

model.3 <- btergm(network[1:9] ~ edges + gwodegree(1, fixed = TRUE) + transitiveties +
                    ctriple + gwidegree(1, fixed = TRUE) + mutual + gwesp(1.5, fixed = TRUE) +
                    ttriple + memory(type = "stability") + delrecip, R = 1000)

gof.3 <- gof(model.3, nsim = 1000, target = network[[10]],
             formula = network[9:10] ~ edges + gwodegree(1, fixed = TRUE) +
               transitiveties + ctriple + gwidegree(1, fixed = TRUE) + mutual +
               gwesp(1.5, fixed = TRUE) + ttriple +
               memory(type = "stability") + delrecip,
             coef = coef(model.3), statistics = rocpr)

gof.3[[1]]$auc.pr

TensorFlow Probability Logistic Regression Example

I feel I must be missing something obvious while struggling to get a positive control for logistic regression going in TensorFlow Probability.
I've modified the logistic regression example here and created positive-control features and labels. I struggle to achieve accuracy over 60%, yet this is an easy problem for a 'vanilla' Keras model (100% accuracy). What am I missing? I have tried different layers, activations, etc. With this way of setting up the model, is posterior updating actually being performed? Do I need to specify an interceptor object? Many thanks.
### Added positive control
nSamples = 80
features1 = np.float32(np.hstack((np.reshape(np.ones(40), (40, 1)),
                                  np.reshape(np.random.randn(nSamples), (40, 2)))))
features2 = np.float32(np.hstack((np.reshape(np.zeros(40), (40, 1)),
                                  np.reshape(np.random.randn(nSamples), (40, 2)))))
features = np.vstack((features1, features2))
labels = np.concatenate((np.zeros(40), np.ones(40)))
featuresInt, labelsInt = build_input_pipeline(features, labels, 10)
###
# w_true, b_true, features, labels = toy_logistic_data(FLAGS.num_examples, 2)
# featuresInt, labelsInt = build_input_pipeline(features, labels, FLAGS.batch_size)
with tf.name_scope("logistic_regression", values=[featuresInt]):
    layer = tfp.layers.DenseFlipout(
        units=1,
        activation=None,
        kernel_posterior_fn=tfp.layers.default_mean_field_normal_fn(),
        bias_posterior_fn=tfp.layers.default_mean_field_normal_fn())
    logits = layer(featuresInt)
    labels_distribution = tfd.Bernoulli(logits=logits)

neg_log_likelihood = -tf.reduce_mean(labels_distribution.log_prob(labelsInt))
kl = sum(layer.losses)
elbo_loss = neg_log_likelihood + kl

predictions = tf.cast(logits > 0, dtype=tf.int32)
accuracy, accuracy_update_op = tf.metrics.accuracy(
    labels=labelsInt, predictions=predictions)

with tf.name_scope("train"):
    optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
    train_op = optimizer.minimize(elbo_loss)

init_op = tf.group(tf.global_variables_initializer(),
                   tf.local_variables_initializer())

with tf.Session() as sess:
    sess.run(init_op)
    # Fit the model to data.
    for step in range(FLAGS.max_steps):
        _ = sess.run([train_op, accuracy_update_op])
        if step % 100 == 0:
            loss_value, accuracy_value = sess.run([elbo_loss, accuracy])
            print("Step: {:>3d} Loss: {:.3f} Accuracy: {:.3f}".format(
                step, loss_value, accuracy_value))
### Check with basic Keras
kerasModel = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1)])
optimizer = tf.train.AdamOptimizer(5e-2)
kerasModel.compile(optimizer=optimizer, loss='binary_crossentropy',
                   metrics=['accuracy'])
kerasModel.fit(features, labels, epochs=50)  # 100% accuracy
Compared to the GitHub example, you forgot to divide by the number of examples when defining the KL divergence:
kl = sum(layer.losses) / FLAGS.num_examples
When I make this change to your code, I quickly get to an accuracy of 99.9% on your toy data.
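The scaling matters because sum(layer.losses) is the KL divergence between the weight posterior and prior for the whole model, while tf.reduce_mean averages the log-likelihood per example; without the division, the KL term outweighs the likelihood by a factor of the dataset size. A sketch of the corrected per-example loss, reusing the tensors from the question's code (the hard-coded num_examples = 80 is my assumption, matching the 80-row positive-control set built above):

# Per-example ELBO loss: mean negative log-likelihood plus the KL
# divergence scaled down to a per-example contribution.
num_examples = 80  # size of the positive-control training set above
neg_log_likelihood = -tf.reduce_mean(labels_distribution.log_prob(labelsInt))
kl = sum(layer.losses) / num_examples
elbo_loss = neg_log_likelihood + kl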
Additionally, the output layer of your Keras model actually expects a sigmoid activation for this problem (binary classification):
kerasModel = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid')])
It's a toy problem, but you will notice that the model gets to 100% accuracy faster with a sigmoid activation.

Echo state prediction laziness

I am trying to use PyBrain for time-series prediction by implementing this solution (the other approaches I tried produced large offsets). The problem is that, although I have tried changing the learning rate, momentum, maximum training epochs, continue epochs, number of neurons (1-500), and activation function, the result is always flat. What might the solution be?
(Plot: blue = original series, green = the network's prediction.)
INPUTS = 60
HIDDEN = 60
OUTPUTS = 1

def build_network():
    net = buildNetwork(INPUTS, HIDDEN, OUTPUTS,
                       hiddenclass=LSTMLayer,
                       outclass=LinearLayer,
                       recurrent=True,
                       bias=True, outputbias=False)
    net.sortModules()
    return net

def prepare_datasets(data, training_data_ratio):
    training_data, validation_data = split_list(data, training_data_ratio)
    training_set = SequentialDataSet(INPUTS, OUTPUTS)
    for i in range(len(training_data) - INPUTS - 1):
        training_set.newSequence()
        tr_inputs = training_data[i:i + INPUTS]
        tr_output = training_data[i + INPUTS]
        training_set.addSample(tr_inputs, tr_output)
    validation_set = []
    for i in range(len(validation_data) - INPUTS - 1):
        validation_set.append(validation_data[i:i + INPUTS])
    return training_set, validation_set

def train_network(net, data, max_iterations):
    net.randomize()
    learning_rate = 0.1
    trainer = BackpropTrainer(net, data, verbose=True,
                              momentum=0.8,
                              learningrate=learning_rate)
    errors = trainer.trainUntilConvergence(maxEpochs=max_iterations, continueEpochs=10)
    return errors

def try_network(net, data):
    outputs = []
    for item in data:
        output = net.activate(item)[0]
        outputs.append(output)
    return outputs
Normalize the data:
data = data / max(data)
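As a minimal sketch of what that looks like end to end (raw_data, the file name, and the scale-back step are my assumptions, not part of the original answer):

import numpy as np

# Scale the series into [0, 1] before building the datasets, and keep
# the factor so the network's outputs can be mapped back afterwards.
raw_data = np.loadtxt("series.txt")       # hypothetical input series
scale = raw_data.max()
data = raw_data / scale                   # feed this to prepare_datasets()

# ... train the network and collect `outputs` from try_network() ...
# predictions = np.array(outputs) * scale  # back to original units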

How can I improve ANN results by tuning the hidden layer size, reducing the MSE, or using a while loop?

This is my source code, and I want to reduce the error. When running this code, there is a large difference between the trained output and the target. I have tried different approaches, but none of them worked, so please help me reduce it.
a = [31 9333 2000; 31 9500 1500; 31 9700 2300; 31 9700 2320; 31 9120 2230; 31 9830 2420; 31 9300 2900; 31 9400 2500]'
g = [35000; 23000; 3443; 2343; 1244; 9483; 4638; 4739]'
h = [31 9333 2000]'
inputs = a;
targets = g;
% Create a Fitting Network
hiddenLayerSize = 1;
net = fitnet(hiddenLayerSize);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'dividerand'; % Divide data randomly
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% For help on training function 'trainlm' type: help trainlm
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainlm'; % Levenberg-Marquardt
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean squared error
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
                'plotregression','plotconfusion','plotfit','plotroc'};
% Train the Network
[net,tr] = train(net,inputs,targets);
plottrainstate(tr)
% Test the Network
outputs = net(inputs)
errors = gsubtract(targets,outputs)
fprintf('errors = %4.3f\t',errors);
performance = perform(net,targets,outputs);
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
valPerformance = perform(net,valTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
% View the Network
view(net);
sc=sim(net,h)
I think you need to be more specific.
What is the performance like on your training set and on your test set?
Have you tried doing any regularization?
