How to interpret the output of the oglmx function in R?

I am currently working on a project in which I am modeling public acceptance of pricing schemes.
The independent variables used in the model (age, gender, income, etc.) are categorical, so I converted them into factors using the as.factor() function.
Age Gender Income
0 1 2
0 0 0
0 0 1
I have certain other variables, like transit satisfaction and environment improvement, which are ordered factors on a scale of 1 to 5, with 1 being extremely dissatisfied and 5 being very satisfied.
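For reference, here is a minimal sketch of how such variables are typically encoded in R; the column names are illustrative, taken from the description above.

# Nominal predictors become plain factors; the 1-5 satisfaction items become
# ordered factors, which is what produces the .L/.Q/.C/^4 contrasts seen in
# the model output below.
cpdat$Gender  <- as.factor(cpdat$Gender)
cpdat$Env_imp <- factor(cpdat$Env_imp, levels = 1:5, ordered = TRUE)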
My model is as follows:
mdl <- oglmx(prcing ~ Ann_In1 + Edu + Env_imp + rs_imp, data = cpdat, link = "logit",
             constantMEAN = FALSE, constantSD = FALSE, delta = 0, threshparam = NULL)
summary(mdl)
Estimate Std. error t value Pr(>|t|)
Ann_In11 0.1605540 0.3021613 0.5314 0.5951749
Ann_In12 -0.9556992 0.4218504 -2.2655 0.0234824 *
Edu1 0.0710699 0.2678081 0.2654 0.7907196
Edu2 1.0732587 0.7112519 1.5090 0.1313061
Env_imp.L -0.8524288 0.4899275 -1.7399 0.0818752 .
Env_imp.Q 0.0784353 0.3936332 0.1993 0.8420595
Env_imp.C 0.4589036 0.4498676 1.0201 0.3076878
Env_imp^4 -0.2219108 0.4423486 -0.5017 0.6159032
rd_sft.L 2.6335035 0.7362206 3.5771 0.0003475 ***
rd_sft.Q -0.7064391 0.5773880 -1.2235 0.2211377
rd_sft.C 0.0130127 0.4408486 0.0295 0.9764519
rd_sft^4 -0.2886550 0.3582014 -0.8058 0.4203318
I obtained the results shown above, but I am unable to interpret them. Any leads would be very helpful.
In the case of rd_sft (road safety), since rd_sft.L (the linear contrast) is significant while the other contrasts are not, can we drop the other terms (Q, C, ^4) when formulating the model?
Please shed some light on the model formulation and its interpretation, as I am new to R.
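For context, the .L, .Q, .C and ^4 terms are the polynomial contrasts R generates for an ordered factor, so dropping Q, C and ^4 amounts to keeping only a linear effect of that variable. Below is a minimal sketch of one way to check whether the higher-order contrasts are needed; it assumes the cpdat data above, uses rd_sft as it appears in the summary output (the formula in the question writes rs_imp), introduces a hypothetical numeric recoding rd_sft_num, and assumes logLik() is available for oglmx fits.

library(oglmx)

# Hypothetical numeric recoding of the ordered factor (codes 1..5).
cpdat$rd_sft_num <- as.numeric(cpdat$rd_sft)

# Full model: rd_sft as an ordered factor (L, Q, C, ^4 contrasts).
mdl_full <- oglmx(prcing ~ Ann_In1 + Edu + Env_imp + rd_sft, data = cpdat,
                  link = "logit", constantMEAN = FALSE, constantSD = FALSE,
                  delta = 0, threshparam = NULL)

# Restricted model: rd_sft entered as a single linear (numeric) term.
mdl_lin <- oglmx(prcing ~ Ann_In1 + Edu + Env_imp + rd_sft_num, data = cpdat,
                 link = "logit", constantMEAN = FALSE, constantSD = FALSE,
                 delta = 0, threshparam = NULL)

# Likelihood-ratio test: if the Q, C and ^4 contrasts add nothing, the
# restricted model should not fit significantly worse (3 fewer parameters).
lr <- 2 * (as.numeric(logLik(mdl_full)) - as.numeric(logLik(mdl_lin)))
pchisq(lr, df = 3, lower.tail = FALSE)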

Related

Lasso via GridSearchCV: ConvergenceWarning: Objective did not converge

I am trying to find the optimal parameter of a Lasso regression:
alpha_tune = {'alpha': np.linspace(start=0.000005, stop=0.02, num=200)}
model_tuner = Lasso(fit_intercept=True)
cross_validation = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
model = GridSearchCV(estimator=model_tuner, param_grid=alpha_tune, cv=cross_validation, scoring='neg_mean_squared_error', n_jobs=-1).fit(features_train_std, labels_train)
print(model.best_params_['alpha'])
My variables are demeaned and standardized, but I get the following warning:
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.279e+02, tolerance: 6.395e-01
I know this warning has been reported several times, but none of the previous posts explain how to solve it. In my case, the warning is triggered by the fact that the lower bound 0.000005 is very small, but this is a reasonable value, as indicated by solving the tuning problem via the information criteria:
lasso_aic = LassoLarsIC(criterion='aic', fit_intercept=True, eps=1e-16, normalize=False)
lasso_aic.fit(X_train_std, y_train)
print('Lambda: {:.8f}'.format(lasso_aic.alpha_))
lasso_bic = LassoLarsIC(criterion='bic', fit_intercept=True, eps=1e-16, normalize=False)
lasso_bic.fit(X_train_std, y_train)
print('Lambda: {:.8f}'.format(lasso_bic.alpha_))
AIC and BIC give values of around 0.000008. How can this warning be solved?
Increasing max_iter above its default of 1000 in Lasso will do the job:
alpha_tune = {'alpha': np.linspace(start=0.000005, stop=0.02, num=200)}
model_tuner = Lasso(fit_intercept=True, max_iter=5000)
cross_validation = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
model = GridSearchCV(estimator=model_tuner, param_grid=alpha_tune, cv=cross_validation, scoring='neg_mean_squared_error', n_jobs=-1).fit(features_train_std, labels_train)
print(model.best_params_['alpha'])

Titanic Dataset. Logistic Regression Model. Confusion Matrix gives 0 as output

I am running the logistic regression model on the Titanic Dataset with the below code:
#Modeling
#Split into train and test and fit the logistic regression model
titanic_train <- titanic_complete[1:891,]
titanic_test <- titanic_complete[892:1309,]
##############Logistic Regression ###############################
glm_model = glm(Survived~.,data= titanic_train, family = 'binomial')
summary(glm_model)
## Using anova() to analyze the table of deviance
anova(glm_model, test="Chisq")
final_model = glm(Survived~Sex + Pclass + Age + SibSp + Cabin_f, data = titanic_train, family = 'binomial')
summary(final_model)
varImp(glm_model)
glm.pred <-predict(final_model, titanic_test, type = 'response')
glm.pred <- ifelse(glm.pred > 0.5, "yes", "no")
glm.pred
confusionMatrix(glm.pred, titanic_test$Survived)
As a result I've got this error message:
> confusionMatrix(glm.pred, titanic_test$Survived)
[1] no yes
<0 rows> (or 0-length row.names)
In Ops.factor(predictedScores, threshold) :
‘<’ not meaningful for factors
I cannot understand this error message or what is wrong. The model works fine on the training data.
I assume it is related to the threshold I apply to the Survived variable (which is a factor with levels 0 and 1).
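A common cause of this kind of error is passing character predictions together with a factor reference whose levels do not match, so that a factor ends up being compared with '<' somewhere downstream. Below is a minimal sketch of the usual fix; it assumes confusionMatrix() here is caret's, and that titanic_test actually contains observed Survived values (in the standard Kaggle split, rows 892:1309 have no labels, in which case no confusion matrix can be computed).

library(caret)

# Keep the raw probabilities, then threshold into the same labels (0/1) and
# the same factor levels as the reference.
probs     <- predict(final_model, titanic_test, type = "response")
glm.pred  <- factor(ifelse(probs > 0.5, 1, 0), levels = c(0, 1))
reference <- factor(titanic_test$Survived, levels = c(0, 1))

confusionMatrix(glm.pred, reference, positive = "1")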

Tensorflow Probability Logistic Regression Example

I feel I must be missing something obvious while struggling to get a positive control for logistic regression working in TensorFlow Probability.
I've modified the example for logistic regression here, and created positive-control features and labels. I struggle to achieve accuracy over 60%, even though this is an easy problem for a 'vanilla' Keras model (100% accuracy). What am I missing? I tried different layers, activations, etc. With this method of setting up the model, is posterior updating actually being performed? Do I need to specify an interceptor object? Many thanks.
### Added positive control
nSamples = 80
features1 = np.float32(np.hstack((np.reshape(np.ones(40), (40, 1)),
                                  np.reshape(np.random.randn(nSamples), (40, 2)))))
features2 = np.float32(np.hstack((np.reshape(np.zeros(40), (40, 1)),
                                  np.reshape(np.random.randn(nSamples), (40, 2)))))
features = np.vstack((features1, features2))
labels = np.concatenate((np.zeros(40), np.ones(40)))
featuresInt, labelsInt = build_input_pipeline(features, labels, 10)
###
#w_true, b_true, features, labels = toy_logistic_data(FLAGS.num_examples, 2)
#featuresInt, labelsInt = build_input_pipeline(features, labels, FLAGS.batch_size)
with tf.name_scope("logistic_regression", values=[featuresInt]):
    layer = tfp.layers.DenseFlipout(
        units=1,
        activation=None,
        kernel_posterior_fn=tfp.layers.default_mean_field_normal_fn(),
        bias_posterior_fn=tfp.layers.default_mean_field_normal_fn())
    logits = layer(featuresInt)
    labels_distribution = tfd.Bernoulli(logits=logits)

neg_log_likelihood = -tf.reduce_mean(labels_distribution.log_prob(labelsInt))
kl = sum(layer.losses)
elbo_loss = neg_log_likelihood + kl

predictions = tf.cast(logits > 0, dtype=tf.int32)
accuracy, accuracy_update_op = tf.metrics.accuracy(
    labels=labelsInt, predictions=predictions)

with tf.name_scope("train"):
    optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
    train_op = optimizer.minimize(elbo_loss)

init_op = tf.group(tf.global_variables_initializer(),
                   tf.local_variables_initializer())

with tf.Session() as sess:
    sess.run(init_op)
    # Fit the model to data.
    for step in range(FLAGS.max_steps):
        _ = sess.run([train_op, accuracy_update_op])
        if step % 100 == 0:
            loss_value, accuracy_value = sess.run([elbo_loss, accuracy])
            print("Step: {:>3d} Loss: {:.3f} Accuracy: {:.3f}".format(
                step, loss_value, accuracy_value))

### Check with basic Keras
kerasModel = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1)])
optimizer = tf.train.AdamOptimizer(5e-2)
kerasModel.compile(optimizer=optimizer, loss='binary_crossentropy',
                   metrics=['accuracy'])
kerasModel.fit(features, labels, epochs=50)  # 100% accuracy
Compared to the GitHub example, you forgot to divide by the number of examples when defining the KL divergence:
kl = sum(layer.losses) / FLAGS.num_examples
When I make this change in your code, I quickly reach an accuracy of 99.9% on your toy data.
Additionally, the output layer of your Keras model needs a sigmoid activation for this problem (binary classification):
kerasModel = tf.keras.models.Sequential([
tf.keras.layers.Dense(1, activation='sigmoid')])
It's a toy problem, but you will notice that the model gets to 100% accuracy faster with a sigmoid activation.

Using lookup tables to plot a ggplot and table

I'm creating a Shiny app and I'm letting the user choose what data should be displayed in a plot and a table. This choice is made through 3 different input variables that contain 14, 4, and 2 choices respectively.
ui <- dashboardPage(
dashboardHeader(),
dashboardSidebar(
selectInput(inputId = "DataSource", label = "Data source", choices =
c("Restoration plots", "all semi natural grasslands")),
selectInput(inputId = "Variabel", label = "Variable", choices =
choicesVariables)),
#choicesVariables definition is omitted here, because it's very long but it
#contains 14 string values
selectInput(inputId = "Factor", label = "Factor", choices = c("Company
type", "Region and type of application", "Approved or not approved
applications", "Age group" ))
),
dashboardBody(
plotOutput("thePlot"),
tableOutput("theTable")
))
This adds up to 73 choices (yes, I know the math doesn't add up there, but some choices are invalid). I would like to do this using a lookup table, so I created one with every valid combination of choices like this:
rad1<-c(rep("Company type",20), rep("Region and type of application",20),
rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2<-choicesVariable[c(1:14,1,4,5,9,10,11, 1:14,1,4,5,9,10,11, 1:7,9:14,
1:14,1,4,5,9,10,11)]
rad3<-c(rep("Restoration plots",14),rep("all semi natural grasslands",6),
rep("Restoration plots",14), rep("all semi natural grasslands",6),
rep("Restoration plots",27), rep("all semi natural grasslands",6))
rad4<-1:73
letaLista<-data.frame(rad1,rad2,rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the choice that the user made. But how do I use this information to produce the plot and table without a 73-line-long ifelse statement?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots), but I couldn't make it work. My experience with this kind of array is limited and this might be a simple issue, but any hints would be helpful!
My dataset, which is the foundation for the plots and tables, consists of a data frame with 23 variables, both factor and numeric. The plots and tables are then created using the following code for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar="Dist_brukcentrum",
groupvars="Companytype")
s_A1 <- s_A1[2:6,]
p_A1=ggplot(s_A1, aes(x=Companytype,
y=Dist_brukcentrum))+geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=Dist_brukcentrum-se,
ymax=Dist_brukcentrum+se),width=.2,position=position_dodge(.9))+
scale_y_continuous(name = "") + scale_x_discrete(name = "")
where summarySE is the following function, borrowed from Cookbook for R:
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=TRUE,
                      conf.interval=.95, .drop=TRUE) {
  # New version of length which can handle NA's: if na.rm==T, don't count them
  length2 <- function(x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else length(x)
  }
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop=.drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm=na.rm),
                     mean = mean(xx[[col]], na.rm=na.rm),
                     sd   = sd(xx[[col]], na.rm=na.rm))
                 },
                 measurevar
  )
  # Rename the "mean" column
  datac <- rename(datac, c("mean" = measurevar))
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval:
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N-1)
  datac$ci <- datac$se * ciMult
  return(datac)
}
The code in its entirety is a bit too large to include, but I hope this clarifies what I'm trying to do.
Well, thanks to florian's comment, I think I might have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I gathered the plots (created as ggplot objects) and the summary tables into lists:
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id, which selects the right plot and table from the lists.
output$thePlot <-renderPlot({
plotValue<-subset(letaLista, letaLista$Factor==input$Factor &
letaLista$Variabel== input$Variabel & letaLista$rest_alla==input$DataSource)
plotList[as.integer(plotValue[1,4])]
})
output$theTable <-renderTable({
plotValue<-subset(letaLista, letaLista$Factor==input$Factor &
letaLista$Variabel== input$Variabel & letaLista$rest_alla==input$DataSource)
skriva <- tableList[as.integer(plotValue[4])]
print(skriva)
})
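As a possible refinement (a sketch only, reusing the letaLista lookup table and input IDs from above), the duplicated subset() call can be moved into a single reactive expression shared by both outputs, with the list elements extracted using [[ rather than [:

# Sketch: compute the lookup once, then let both outputs reuse it.
selectedId <- reactive({
  hit <- subset(letaLista,
                Factor == input$Factor &
                  Variabel == input$Variabel &
                  rest_alla == input$DataSource)
  as.integer(hit$id[1])
})

output$thePlot  <- renderPlot({ plotList[[selectedId()]] })
output$theTable <- renderTable({ tableList[[selectedId()]] })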

How to update a Mnesia table in Erlang

I have a small problem with my code. I have a table containing car details (name, price, and quantity), and I am trying to create a function called buy_car which will be used to buy a specific car. When a user buys, e.g., 5 BMW cars, they will call buy_car(bmw, 5). After this I want to update the quantity of BMW cars.
My attempt is below, but I can't get it to work; I am new to Erlang.
buy_car(X, Ncars) ->
    F = fun() ->
            %% ----first i find the number of car X available in the shop
            [Xcars] = mnesia:read({car, X}),
            Nc = Xcars#car.quantity,
            Leftcars = Xcars#car{quantity = Nc - Ncars},
            %% ---now we update the database
            mnesia:write(Leftcars),
        end,
    mnesia:transaction(F).
Please help me with how I can write a function that buys cars from the shop.
Your implementation works fine, except that you added an illegal comma after mnesia:write(Leftcars).
Here is code that works (I tried your implementation as buy_car2).
-module(q).
-export([setup/0, buy_car/2, buy_car2/2]).

-record(car, {brand, quantity}).

setup() ->
    mnesia:start(),
    mnesia:create_table(car, [{attributes, record_info(fields, car)}]),
    mnesia:transaction(fun() -> mnesia:write(#car{brand=bmw, quantity=1000}) end).

buy_car(Brand, Ncars) ->
    F = fun() ->
            [Car] = mnesia:read(car, Brand),  % crash if the car is missing
            mnesia:write(Car#car{quantity = Car#car.quantity - Ncars})
        end,
    mnesia:transaction(F).

buy_car2(X, Ncars) ->
    F = fun() ->
            %% ----first i find the number of car X available in the shop
            [Xcars] = mnesia:read({car, X}),
            Nc = Xcars#car.quantity,
            Leftcars = Xcars#car{quantity = Nc - Ncars},
            %% ---now we update the database
            mnesia:write(Leftcars)
        end,
    mnesia:transaction(F).
I would do something like the following. Assuming the record is defined as:
-record(car_record, {car, quantity}).
The following function will update the data:
buy_car(X, NCars) ->
    Row = #car_record{car = X, quantity = NCars},
    mnesia:ets(fun() -> mnesia:dirty_write(Row) end),
    mnesia:change_table_copy_type(guiding_data, node(), disc_copies).
To use the above method, the Mnesia table must be created as ram_copies with no replication nodes. Also, if a lot of updates are happening, you might not want to copy the ram_copies table to disk on every update, for performance reasons; rather, you would do it in a time-triggered manner.
