What is the purpose of the logit function? At what stage of the model-building process is it used? - logistic-regression

We have two prominent functions (or we can say equations) in the logistic regression algorithm:
Logistic regression (sigmoid) function.
Logit function.
I would like to know:
Which of these equations is/are used in the logistic regression model-building process?
At what stage of the model-building process is each of these equations used?
I know that the logit function is used to transform probability values (which range between 0 and 1) to real number values (which range from -Inf to +Inf). I would like to know the real purpose of the logit function in the logistic regression modeling process.
Here are a few queries which are directly related to the purpose of the logit function in logistic regression modeling:
1. Has the logit function (i.e., the logit equation LN(P/(1-P))) been derived from the logistic regression equation, or is it the other way around?
2. What is the purpose of the logit equation within the logistic regression equation? How is the logit function used in the logistic regression algorithm? The reason for asking this will become clear after going through points 3 and 4.
3. Upon building a logistic regression model, we get model coefficients. When we substitute these model coefficients and the respective predictor values into the logistic regression equation, we get the probability of the default class (the same values returned by predict()). Does this mean that the estimated model coefficient values are determined based on probability values (computed using the logistic regression equation, not the logit equation), which are then fed into the likelihood function to check whether it is maximized? If this understanding is correct, then where is the logit function used in the entire model-building process?
4. Assume that the logit function is used neither during model building nor during prediction. If that is the case, then why do we give importance to the logit function, which maps probability values to real number values (ranging between -Inf and +Inf)?
5. Where exactly is the logit function used in the entire logistic regression model-building process? Is it while estimating the model coefficients?
6. Are the model coefficient estimates that we see upon running summary(lr_model) determined using the linear form of the logistic regression equation (the logit equation) or the actual logistic regression equation?

What is the purpose of Logit function?
The purpose of the logit function is to map the [0, 1] probability interval onto the whole real line.
Mathematically, the logit function converts values in [0, 1] to values in (-inf, +inf).
Sigmoid and softmax do exactly the opposite: they convert the (-inf, +inf) real line back to the [0, 1] interval.
This is why in machine learning we may use logit before the sigmoid and softmax functions: they match perfectly as inverses of each other.
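A quick numeric check of this inverse relationship (a minimal Python sketch; function names are illustrative):

```python
import math

# Minimal check of the statement above: logit maps (0, 1) onto the real
# line (log-odds), and sigmoid maps that value back -- they are inverses.
def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

p = 0.8
x = logit(p)         # probability mapped onto the real line (log-odds)
p_back = sigmoid(x)  # mapped back to the original probability
```

This also answers part of the question above: the coefficients reported by summary(lr_model) live on the logit (log-odds) scale, and applying the sigmoid to the linear predictor recovers the probabilities that predict() returns.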

Related

Individual P-values in Logistic Regression

I ran a logistic regression with about 10 variables (in R) and some of them have high p-values (>0.05). Should we follow the elimination techniques that we follow in multiple linear regression to remove insignificant variables, or is the method different in logistic regression?
I'm new to this field, so please pardon me if this question sounds silly.
Thank you.

How to implement a weighted logistic regression in JAGS?

I am performing a resource selection function using use and availability locations for a set of animals. For this type of analysis, an infinitely weighted logistic regression is suggested (Fithian and Hastie 2013) and is done by setting weights of used locations to 1 and available locations to some large number (e.g. 10,000). I know that implementing this approach using the glm function in R would be relatively simple
model1 <- glm(used ~ covariates , family=binomial, weights=weights)
I am attempting to implement this as part of a larger hierarchical Bayesian model, and thus need to figure out how to incorporate weights in JAGS. In my searching online, I have not been able to find a clear example of how to use weights specifically in a logistic regression. For a Poisson model, I have seen suggestions to just multiply the weights by lambda, such as described here. I was uncertain whether this logic would hold for weights in a logistic regression. Below is an excerpt of the JAGS code for the logistic regression in my model.
alpha_NSel ~ dbeta(1,1)
intercept_NSel <- logit(alpha_NSel)
beta_SC_NSel ~ dnorm(0, tau_NSel)
tau_NSel <- 1/(pow(sigma_NSel,2))
sigma_NSel ~ dunif(0,50)
for(n in 1:N_NSel){
  logit(piN[n]) <- intercept_NSel + beta_SC_NSel*cov_NSel[n]
  yN[n] ~ dbern(piN[n])
}
To implement weights, would I simply change the Bernoulli trial to the line below? In this case, I assume I would need to adjust the weights so that they are between 0 and 1, i.e. weights of 1/10,000 for used locations and 1 for available locations?
yN[n] ~ dbern(piN[n]*weights[n])
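One thing worth checking before choosing a JAGS encoding: in a weighted likelihood, the weight multiplies an observation's log-likelihood contribution, which is not the same as rescaling the Bernoulli success probability. The sketch below (plain Python, not JAGS; variable values are made up) shows what the weighting scheme from the question does to the likelihood:

```python
import math

# Not a JAGS implementation -- just a sketch of what a weight means in a
# weighted likelihood: it multiplies that observation's log-likelihood
# term, rather than rescaling the Bernoulli probability itself (which is
# what dbern(piN[n]*weights[n]) would do).
def weighted_loglik(y, p, w):
    return sum(wi * (yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
               for yi, pi, wi in zip(y, p, w))

y = [1, 0, 0]                # one used location, two available locations
p = [0.9, 0.2, 0.2]          # fitted selection probabilities (made up)
w = [1.0, 10000.0, 10000.0]  # weighting scheme described in the question
ll = weighted_loglik(y, p, w)
```

With weights of 1 the expression reduces to the ordinary Bernoulli log-likelihood, which is a useful sanity check for whatever JAGS encoding you end up using.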

Sigmoid for regression

I just started with data science, so if this is a very dumb question then please excuse me...
So, I just learnt about the sigmoid neuron, and learnt that its range is [0, 1].
The question I have is: how can it be used in regression tasks, for example to predict the cost of a real estate property, or the IMDb rating of a movie, or something similar?
I am aware of the scaling method (multiplying the output of the sigmoid by some number) to get real outputs, but that works only for outputs which have an upper limit, like the IMDb rating. What about things like the price of a commodity?
Thanks in advance.
In regression tasks, the output layer of the neural net shouldn't be a sigmoid function. You should use a function that does not have limits in its range. The sigmoid function is often used in the middle (hidden) layers of a neural net.
You can use a linear function or a ReLU (Rectified Linear Unit) for regression tasks.
PS: Remember that logistic regression is an algorithm for classification, in contrast to its name. Make sure you don't mix them up. 😁
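A toy numeric illustration of why a sigmoid output cannot serve an unbounded target (made-up numbers, Python for illustration):

```python
import math

# A sigmoid output neuron can never exceed 1, so it cannot represent an
# unbounded target such as a house price, whereas a linear (identity)
# output passes any value through unchanged.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

pre_activation = 12.0                     # some large weighted sum from the last layer
sigmoid_output = sigmoid(pre_activation)  # squashed just below 1
linear_output = pre_activation            # unbounded, suitable for regression
```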

Logistic Regression with Gradient Descent on large data

I have a training set with about 300,000 examples and about 50-60 features; it's also multiclass, with about 7 classes. I have a logistic regression function that finds the parameter values via gradient descent. My gradient descent algorithm computes the parameters in matrix form, as that is faster than looping over the elements separately.
Ex :
Matrix(P) <- Matrix(P) - LearningRate( T(Matrix(X)) * ( Matrix(h(X)) -Matrix(Y) ) )
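The matrix-form update above can be sketched from scratch as follows (binary case for brevity, in Python for illustration; the multiclass version repeats this per class, and the names are illustrative, not from the question):

```python
import math

# One gradient descent step in the matrix form shown above:
# P <- P - LearningRate * t(X) %*% (h(X) - Y)
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gd_step(P, X, y, lr):
    # h(X): predicted probabilities for each row of X
    h = [sigmoid(sum(p * x for p, x in zip(P, row))) for row in X]
    # h(X) - Y: residuals
    residual = [hi - yi for hi, yi in zip(h, y)]
    # t(X) %*% residual: gradient of the negative log-likelihood
    grad = [sum(X[i][j] * residual[i] for i in range(len(X)))
            for j in range(len(P))]
    return [p - lr * g for p, g in zip(P, grad)]
```

In R the same step is a one-liner with `t(X) %*% (h - y)`, which is what makes the matrix form fast relative to explicit loops.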
For small training data it's quite fast and gives correct values with a maximum of around 1,000,000 iterations. But with this much training data it's extremely slow: around 500 iterations take 18 minutes, and even with that many iterations the cost is still high and it does not predict the class correctly.
I know I should implement feature selection or feature scaling, and I can't use the provided packages. The language used is R. How do I go about implementing feature selection or scaling without using any library packages?
According to the link, you can use either Z-score normalization or min-max scaling. Min-max scaling maps the data to the [0, 1] range, while Z-score normalization gives it zero mean and unit variance. Z-score normalization is calculated as
z = (x - mean(x)) / sd(x)
Min-max scaling is calculated as:
x' = (x - min(x)) / (max(x) - min(x))
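Both scalers are a few lines from scratch (sketched in Python for illustration; the question uses R, but the arithmetic is identical and needs no packages):

```python
# From-scratch implementations of the two scaling methods above.
def zscore(xs):
    # subtract the mean, divide by the (population) standard deviation
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

def min_max(xs):
    # map the smallest value to 0 and the largest to 1
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

xs = [10.0, 20.0, 30.0, 40.0]
scaled = min_max(xs)  # lands in [0, 1]
zs = zscore(xs)       # zero mean, unit variance, not confined to [0, 1]
```

In R the equivalents are `(x - mean(x)) / sd(x)` and `(x - min(x)) / (max(x) - min(x))`, both applied column-wise to the feature matrix.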

Does Convolution Neural Network need normalized input?

I have trained a convolutional neural network. After comparing two normalizations,
I found that simply subtracting the mean and dividing by the standard deviation works better than scaling into [0, 1]. It seems that the input values don't need to be in [0, 1] for a sigmoid function.
Could anybody explain this?
If you're talking about a NN using logistic regression, then you are correct that a suitable sigmoid function (or logistic function in this context) will give you a [0, 1] range from your original inputs.
However, the logistic function works best when the inputs are in a small range on either side of zero - so, for example, your input to the logistic function might be [-3, +3].
By rescaling your data to [0, 1] first, you would flatten out any underlying distribution and move all of your data to the positive side of zero, which is not what the logistic function expects. So you will get a worse result than by normalising (i.e. subtracting the mean and dividing by the standard deviation, as you said), because that normalisation step takes account of the variance of the original distribution and makes the mean zero, so the logistic function receives both positive and negative input.
In your question, you said "comparing two normalisations" - I think you are misunderstanding what "normalisation" means and actually comparing normalisation with rescaling, which is different.
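The answer's point can be checked numerically (toy data, Python for illustration): inputs rescaled to [0, 1] all land on the positive side of zero, so the sigmoid only ever produces its upper half, while standardized inputs straddle zero.

```python
import math

# Rescaling vs. standardization, as seen by a sigmoid (logistic) unit.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

data = [2.0, 4.0, 6.0, 8.0, 10.0]  # made-up inputs

# Rescale to [0, 1]: every value is non-negative, so sigmoid >= 0.5 always.
lo, hi = min(data), max(data)
rescaled = [(x - lo) / (hi - lo) for x in data]

# Standardize (subtract mean, divide by std): values straddle zero,
# so the sigmoid uses both halves of its range.
mean = sum(data) / len(data)
sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
standardized = [(x - mean) / sd for x in data]
```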
