I ran a logistic regression in R with about 10 predictors, and some of them have high p-values (> 0.05). Should we follow the same elimination techniques that we follow in multiple linear regression to remove insignificant variables, or is the method different for logistic regression?
I'm new to this field, so please pardon me if this question sounds silly.
Thank you.
I am performing a resource selection function analysis using use and availability locations for a set of animals. For this type of analysis, an infinitely weighted logistic regression is suggested (Fithian and Hastie 2013); it is done by setting the weights of used locations to 1 and those of available locations to some large number (e.g. 10,000). I know that implementing this approach with the glm function in R is relatively simple:
model1 <- glm(used ~ covariates, family = binomial, weights = weights)
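For concreteness, here is a minimal sketch of how the weights vector might be set up, assuming a data frame dat with a binary used column (1 = used, 0 = available); the covariate names are illustrative:
# weight 1 for used locations, a large weight (e.g. 10,000) for available ones
dat$w <- ifelse(dat$used == 1, 1, 10000)
model1 <- glm(used ~ cov1 + cov2, family = binomial, data = dat, weights = w)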
I am attempting to implement this as part of a larger hierarchical Bayesian model, and thus need to figure out how to incorporate weights in JAGS. In my searching online, I have not been able to find a clear example of how to use weights in a logistic regression specifically. For a Poisson model, I have seen suggestions to simply multiply lambda by the weights, as described here. I was uncertain whether this logic would hold for weights in a logistic regression. Below is an excerpt of the JAGS code for the logistic regression in my model.
# Priors
alpha_NSel ~ dbeta(1, 1)               # baseline selection probability
intercept_NSel <- logit(alpha_NSel)    # intercept on the logit scale
beta_SC_NSel ~ dnorm(0, tau_NSel)      # covariate effect
tau_NSel <- 1 / (pow(sigma_NSel, 2))   # precision from standard deviation
sigma_NSel ~ dunif(0, 50)
# Likelihood
for (n in 1:N_NSel) {
  logit(piN[n]) <- intercept_NSel + beta_SC_NSel * cov_NSel[n]
  yN[n] ~ dbern(piN[n])
}
To implement weights, would I simply change the Bernoulli trial to the line below? In that case, I assume I would need to rescale the weights so that they lie between 0 and 1, i.e. weights of 1/10,000 for used locations and 1 for available locations?
yN[n] ~ dbern(piN[n]*weights[n])
I just started with data science, so please excuse me if this is a very dumb question...
So, I just learnt about the sigmoid neuron, and learnt that its range is (0, 1).
My question is: how can it be used in regression tasks, for example to predict the price of a real-estate property, or the IMDb rating of a movie?
I am aware of the scaling method (multiplying the output of the sigmoid by some number) to get real-valued outputs, but that only works for outputs that have an upper limit, like an IMDb rating. What about things like the price of a commodity, which has no upper bound?
Thanks in advance
In regression tasks, the output layer of the neural network shouldn't be a sigmoid function; you should use a function whose range is unbounded. The sigmoid function is often used in the middle (hidden) layers of a neural net.
You can use a linear (identity) function or a ReLU (Rectified Linear Unit) in the output layer for regression tasks.
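A quick illustration in R of why the output activation matters; the values are made up:
sigmoid <- function(z) 1 / (1 + exp(-z))
relu    <- function(z) pmax(0, z)
z <- c(-3, 0, 250000)   # e.g. a raw pre-activation for a house price
sigmoid(z)              # 0.047 0.500 1.000 -- everything squashed into (0, 1)
relu(z)                 # 0 0 250000        -- unbounded above
z                       # a linear/identity output passes any real value through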
PS: Remember that logistic regression is an algorithm for classification, in contrast to what its name suggests. Make sure you don't mix them up. 😁
We have two prominent functions (or, we can say, equations) in the logistic regression algorithm:
Logistic regression function.
Logit function.
I would like to know:
Which of these equations is/are used in the logistic regression model-building process?
At what stage of the model-building process is each of these equations used?
I know that the logit function transforms probability values (which range between 0 and 1) into real values (which range between -Inf and +Inf). I would like to know the real purpose of the logit function in the logistic regression modeling process.
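As a quick check in R (qlogis is the logit, plogis its inverse, the logistic sigmoid):
p <- c(0.01, 0.5, 0.99)
qlogis(p)           # logit: -4.595  0.000  4.595 -- probabilities mapped to the real line
plogis(qlogis(p))   # the inverse transform recovers 0.01 0.50 0.99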
Here are a few queries directly related to the purpose of the logit function in logistic regression modeling:
1. Was the logit function (i.e., the logit equation LN(P/(1-P))) derived from the logistic regression equation, or is it the other way around?
2. What is the purpose of the logit equation in the logistic regression equation? How is the logit function used in the logistic regression algorithm? The reason for asking this will become clear after going through points 3 and 4.
3. Upon building a logistic regression model, we get model coefficients. When we substitute these coefficients and the respective predictor values into the logistic regression equation, we get the probability of the default class (the same values returned by predict()). Does this mean that the estimated coefficient values are determined based on the probability values (computed using the logistic regression equation, not the logit equation), which are fed into the likelihood function to check whether it is maximized? If this understanding is correct, then where is the logit function used in the entire model-building process?
4. Assume that the logit function is used neither during model building nor during prediction. If this is the case, then why do we give importance to the logit function, which maps probability values to real values (ranging between -Inf and +Inf)?
5. Where exactly is the logit function used in the entire logistic regression model-building process? Is it while estimating the model coefficients?
6. Are the model coefficient estimates that we see upon running summary(lr_model) determined using the linear form of the logistic regression equation (the logit equation) or using the actual logistic regression equation?
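To make the two scales concrete, here is a small R sketch; lr_model, df, y, x1 and x2 are hypothetical names. predict() with type = "link" returns the linear predictor (the logit scale), while type = "response" applies the inverse logit:
lr_model <- glm(y ~ x1 + x2, family = binomial, data = df)  # hypothetical model
eta <- predict(lr_model, type = "link")       # linear predictor, on the logit scale
p   <- predict(lr_model, type = "response")   # probabilities, in (0, 1)
all.equal(p, plogis(eta))                     # TRUE: response = inverse-logit(link)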
What is the purpose of the logit function?
The purpose of the logit function is to map the [0, 1] interval of probabilities onto the whole real line: logit(p) = ln(p / (1 - p)) takes values in (-inf, +inf).
Sigmoid and softmax do exactly the opposite: they map (-inf, +inf) back into (0, 1).
This is why in machine learning we may use the logit before the sigmoid and softmax functions: they match perfectly, each being the other's inverse.
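A small numeric illustration in R (softmax written out by hand, since base R has no built-in):
sigmoid <- function(z) 1 / (1 + exp(-z))
softmax <- function(z) exp(z) / sum(exp(z))
z <- c(-5, 0, 5)      # arbitrary real-valued scores
sigmoid(z)            # each score mapped independently into (0, 1)
softmax(z)            # scores mapped jointly into (0, 1), summing to 1
qlogis(sigmoid(z))    # the logit undoes the sigmoid: -5 0 5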
I had 628 predictors after creating dummies for all the categorical variables. After many iterations of traditional logistic regression, I arrived at 15 variables that gave me a pretty good model, with good ROC, recall and precision (at a certain cut-off) on the test data, and all variables significant (at p <= 0.05).
Since that took a lot of time, I tried lasso, which gave me 50 variables with non-zero coefficients after taking the best lambda value from 10-fold cross-validation. But only 5 variables were common between the 15 from the traditional method and the 50 from lasso. Moreover, when I calculated their SEs and t-statistics, I found that many of the variables were insignificant (low t-statistics and high p-values). In addition, the AUC of the ROC curve was lower than with the traditional method, and it dropped even further when I fitted a traditional logistic regression on the 50 variables selected by lasso.
Can someone help me understand the dynamics of this, and how I can justify the coefficients of the lasso model given that they are penalized? (I normalized all the variables before using lasso.)
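For reference, a sketch of the lasso workflow described above using glmnet; the object names are illustrative, with x a numeric matrix of the 628 dummy predictors and y the binary outcome:
library(glmnet)
set.seed(1)
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)  # 10-fold CV lasso
coefs  <- coef(cv_fit, s = "lambda.min")             # coefficients at the best lambda
selected <- rownames(coefs)[as.vector(coefs) != 0]   # the non-zero-coefficient variables
Using s = "lambda.1se" instead of "lambda.min" applies more penalization and typically selects a sparser set of variables.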
I have a training set with about 300,000 examples and 50-60 features; it's a multiclass problem with about 7 classes. I have a logistic regression function that finds the parameter values via gradient descent. My gradient descent algorithm computes the parameters in matrix form, as that is faster than looping over them individually.
For example, in R matrix form:
P <- P - learning_rate * ( t(X) %*% (sigmoid(X %*% P) - Y) )   # h(X) = sigmoid(X %*% P)
For small training data, it's quite fast and gives correct values with the maximum number of iterations set to around 1,000,000. But with this much training data it's extremely slow: around 500 iterations takes 18 minutes, and after that many iterations the cost is still high and it does not predict the class correctly.
I know I should implement feature selection or feature scaling, but I can't use the packages provided. The language used is R. How do I go about implementing feature selection or scaling without using any library packages?
According to the link, you can use either the Z-score standardization method or the min-max scaling method. Min-max scaling maps the data to the [0, 1] range, while Z-score standardization transforms it to mean 0 and standard deviation 1. The Z-score is calculated as
z = (x - mean(x)) / sd(x)
The min-max scaling method is calculated as:
x' = (x - min(x)) / (max(x) - min(x))
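Both are easy to implement in base R without any packages; a minimal sketch, assuming X is your numeric feature matrix:
# Z-score standardization: mean 0, standard deviation 1
z_score <- function(x) (x - mean(x)) / sd(x)
# Min-max scaling: maps values into [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
X_scaled <- apply(X, 2, min_max)   # apply column-wise to the feature matrix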