I am performing lasso variable selection with the clogitL1 package for a conditional logistic regression, but I need to control for age and gender. Is there a way (code) to force these variables into the model selection? I assume it would be inappropriate to leave them out of the lasso and only control for them in the final conditional logistic regression.
This is a case-control study with 1:3 matched strata.
Are there alternative suggestions?
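For illustration, this is the kind of forcing I mean, shown with glmnet rather than clogitL1 (glmnet fits an ordinary, unconditional logistic lasso and so ignores the matching, and I do not know whether clogitL1 exposes an equivalent option; the object and column names are placeholders):

library(glmnet)
# x: model matrix of candidate predictors plus age and gender; y: case/control indicator
pen <- rep(1, ncol(x))                            # default penalty on every column
pen[colnames(x) %in% c("age", "gender")] <- 0     # zero penalty = variable is always retained
fit <- cv.glmnet(x, y, family = "binomial", penalty.factor = pen)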
While reading the paper "A Unified Approach to Interpreting Model Predictions" by Lundberg and Lee (https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf), on page 3 I see:
Shapley sampling values are meant to explain any model by: (1) applying sampling approximations
to Equation 4, and (2) approximating the effect of removing a variable from the model by integrating
over samples from the training dataset. This eliminates the need to retrain the model and allows fewer
than 2^|F| differences to be computed. Since the explanation model form of Shapley sampling values
is the same as that for Shapley regression values, it is also an additive feature attribution method.
My question is: how does sampling from the training dataset eliminate the need to retrain models? It is not obvious to me and I cannot think of a mathematical proof. Any reference or explanation would be greatly appreciated. My internet searches have been unsuccessful. Thank you.
This is a general question without code.
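To make my question concrete, here is a rough sketch (my own illustration, not from the paper) of what I understand "integrating over samples from the training dataset" to mean: the already-fitted model f is simply evaluated on copies of the point to explain, with the "removed" feature overwritten by values sampled from the training data, so nothing is retrained. The function and argument names are mine.

# f: a fitted model with a predict() method; x: one row (data frame) to explain;
# X_train: training data frame; j: name of the feature being "removed"
expected_without <- function(f, x, j, X_train, n_samples = 100) {
  idx <- sample(nrow(X_train), n_samples, replace = TRUE)
  x_rep <- x[rep(1, n_samples), , drop = FALSE]   # n_samples copies of the point to explain
  x_rep[[j]] <- X_train[[j]][idx]                 # overwrite feature j with sampled training values
  mean(predict(f, newdata = x_rep))               # average prediction; f is reused, never refit
}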
My dataframe consists of a binary response variable and ordinal predictor variables (Likert-type scale). I want to do partial least squares by retrieving the most relevant components from the predictor variables (first stage) and then using those components as the predictors in a logit model in the second stage (since my response is binary).
So far, the package plsRglm seems the most applicable, since it allows a logit in the second stage. The challenge is that plsRglm does not seem to have any provision for ordinal factor variables. If you know the plsRglm package, could you please suggest how to handle ordinal factor variables?
Or could you suggest another package that solves this problem?
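To make the setup concrete, this is roughly what I have in mind; converting the ordered factors to plain numeric scores is just my workaround attempt (I am not sure it is appropriate), and the data names are placeholders:

library(plsRglm)
# X: data frame of Likert-type (ordered factor) items; y: binary (0/1) response
X_num <- data.frame(lapply(X, as.numeric))        # ordered factors -> their integer scores
fit <- plsRglm(dataY = y, dataX = X_num, nt = 3,  # nt = number of PLS components to keep
               modele = "pls-glm-logistic")       # logistic GLM in the second stage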
Thanks
I am fitting a resource selection function using use and availability locations for a set of animals. For this type of analysis, an infinitely weighted logistic regression is suggested (Fithian and Hastie 2013), implemented by setting the weights of used locations to 1 and of available locations to some large number (e.g. 10,000). I know that implementing this approach with the glm function in R would be relatively simple:
model1 <- glm(used ~ covariates, family = binomial, weights = weights)
I am attempting to implement this as part of a larger hierarchical Bayesian model, and thus need to figure out how to incorporate weights in JAGS. In my searching online, I have not been able to find a clear example of how to use weights in a logistic regression specifically. For a Poisson model, I have seen suggestions to just multiply the weights by lambda, as described here. I was uncertain whether this logic would hold for weights in a logistic regression. Below is an excerpt of the JAGS code for the logistic regression in my model.
# Priors
alpha_NSel ~ dbeta(1, 1)              # baseline selection probability
intercept_NSel <- logit(alpha_NSel)   # intercept on the logit scale
beta_SC_NSel ~ dnorm(0, tau_NSel)     # covariate effect (slope)
tau_NSel <- 1/(pow(sigma_NSel, 2))    # precision from standard deviation
sigma_NSel ~ dunif(0, 50)

# Likelihood
for(n in 1:N_NSel){
  logit(piN[n]) <- intercept_NSel + beta_SC_NSel*cov_NSel[n]
  yN[n] ~ dbern(piN[n])
}
To implement weights, would I simply change the Bernoulli trial to the line below? In that case, I assume I would need to rescale the weights so that they lie between 0 and 1, i.e. weights of 1/10,000 for used locations and 1 for available locations?
yN[n] ~ dbern(piN[n]*weights[n])
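For reference, a commonly used alternative I have come across (not part of my model above) is the "zeros trick", which weights each observation's log-likelihood directly instead of multiplying the success probability. Here w[n] would be the raw weights (1 or 10,000) and zeros[n] a vector of 0s, both passed in as data; C is just a large constant:

C <- 10000                          # keeps the Poisson mean below positive
for(n in 1:N_NSel){
  logit(piN[n]) <- intercept_NSel + beta_SC_NSel*cov_NSel[n]
  logLik[n] <- w[n] * (yN[n]*log(piN[n]) + (1 - yN[n])*log(1 - piN[n]))  # weighted Bernoulli log-likelihood
  zeros[n] ~ dpois(C - logLik[n])   # zeros trick: contributes exp(logLik[n]) to the joint density, up to a constant
}

I am unsure whether this or the dbern(piN[n]*weights[n]) line above is the more appropriate way to express the infinite weighting.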
Similar to the question here:
If one of the dummies for a categorical variable has a high VIF (multicollinearity), I would assume it should not simply be removed from the predictor list.
But the statsmodels logistic regression then fails with a 'Singular matrix' error. What should I do when this happens?
Possible solutions: (1) remove all the dummies for this categorical variable; or (2) remove only the high-VIF dummy, which leaves the categorical variable missing one subcategory.
Thanks!
I'm performing logistic regression with SPSS and Exp(B) is showing the reciprocal of what I'd like. E.g., where I'd like it to display, say, 2.0, Exp(B) is listed as 0.5. My variables are all categorical, so the coding is arbitrary.
I know I can recode variables, but I'm wondering if there's a simple setting in one of the dialogs to display reciprocals or to recode on the fly. If possible, I'd like to do it through the UI rather than through command syntax.
If you're using the LOGISTIC REGRESSION procedure (Analyze > Regression > Binary Logistic in the menus), clicking the Categorical button lets you specify predictor variables as categorical and choose the type of contrast coding for each one. As long as the variables of interest are binary, or the contrasts you want use either the first or the last level of each variable as the reference category, you can specify everything in that dialog box to get what you want.
If a variable has more than two levels and you want to use a category other than the first or the last as the reference category, you'd have to paste the command from the dialogs and add the sequential number of the desired category to the CONTRAST subcommand for that predictor variable. For example, if you have a three-category variable named X and you want to compare the first and third categories against the second one, you'd edit it to read
/CONTRAST (X)=Indicator(2)
or
/CONTRAST (X)=Simple(2)
depending on the type of contrasts specified in the dialogs (these two produce the same results for these contrasts in models where X is not part of an interaction term, differing only in how the constant or intercept is represented).
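Putting it together, the pasted and edited syntax would look something like the sketch below; the variable names are placeholders, and the exact set of subcommands pasted from the dialogs will vary with your settings and SPSS version:

LOGISTIC REGRESSION VARIABLES outcome
  /METHOD=ENTER X
  /CONTRAST (X)=Indicator(2).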