What is better and easier to use for Multinomial Logit Regression, R or Python?
I want to build a predictive model using a racehorse dataset.
I am trying to learn the basics of both R and Python.
Related
I ran a logistic regression in R with about 10 variables, and some of them have high p-values (> 0.05). Should we follow the same elimination techniques that we use in multiple linear regression to remove insignificant variables? Or is the method different in logistic regression?
I'm new to this field so please pardon me if this question sounds silly.
Thank you.
I am trying to implement the dynamic harmonic regression ARIMA model in Python.
I have the R code for it.
xreg <- forecast:::fourier(time_series, K = 1)
How can I implement the forecast:::fourier function in Python? I am trying to implement the model for the M4 dataset (a time series dataset).
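A minimal sketch of the sine and cosine regressors that this kind of Fourier call produces, assuming the usual definition sin(2*pi*k*t/m) and cos(2*pi*k*t/m) for t = 1..n, seasonal period m, and harmonics k = 1..K. The name `fourier_terms`, its parameters, and the column ordering are assumptions for illustration. It may not match the R function in edge cases such as 2k equal to the period, where the sine column is identically zero.

```python
import numpy as np

def fourier_terms(n_obs, period, K):
    """Build an (n_obs, 2*K) matrix of sin/cos pairs for harmonics 1..K."""
    t = np.arange(1, n_obs + 1)  # time index starting at 1
    columns = []
    for k in range(1, K + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))  # sine term for harmonic k
        columns.append(np.cos(angle))  # cosine term for harmonic k
    return np.column_stack(columns)

# Hypothetical example: two years of monthly data, one harmonic (K = 1).
xreg = fourier_terms(n_obs=24, period=12, K=1)
```

The resulting matrix could then be passed as exogenous regressors to an ARIMA implementation that accepts them, for example through the `exog` argument of statsmodels' `SARIMAX`.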
I am fitting a resource selection function using used and available locations for a set of animals. For this type of analysis, an infinitely weighted logistic regression is suggested (Fithian and Hastie 2013), and it is done by setting the weights of used locations to 1 and those of available locations to some large number (e.g. 10,000). I know that implementing this approach using the glm function in R would be relatively simple:
model1 <- glm(used ~ covariates , family=binomial, weights=weights)
I am attempting to implement this as part of a larger hierarchical Bayesian model, and thus need to figure out how to incorporate weights in JAGS. In my searching online, I have not been able to find a clear example of how to use weights specifically in a logistic regression. For a Poisson model, I have seen suggestions to just multiply lambda by the weights, as described here. I was uncertain whether this logic would hold for weights in a logistic regression. Below is an excerpt of the JAGS code for the logistic regression in my model.
alpha_NSel ~ dbeta(1,1)
intercept_NSel <- logit(alpha_NSel)
beta_SC_NSel ~ dnorm(0, tau_NSel)
tau_NSel <- 1/(pow(sigma_NSel,2))
sigma_NSel ~ dunif(0,50)
for(n in 1:N_NSel){
  logit(piN[n]) <- intercept_NSel + beta_SC_NSel*cov_NSel[n]
  yN[n] ~ dbern(piN[n])
}
To implement weights, would I simply change the Bernoulli trial to the line below? In this case, I assume I would need to rescale the weights so that they lie between 0 and 1, so that the weights for used locations are 1/10,000 and those for available locations are 1?
yN[n] ~ dbern(piN[n]*weights[n])
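For comparison, a minimal sketch of how case weights usually enter a weighted logistic regression likelihood: each observation's Bernoulli log-likelihood term is multiplied by its weight, which is different from multiplying the probability itself by the weight. The data and the function name `weighted_log_likelihood` are made up for illustration, and this is plain Python rather than JAGS.

```python
import math

def weighted_log_likelihood(probs, outcomes, weights):
    """Sum of w_i * [y_i*log(p_i) + (1 - y_i)*log(1 - p_i)]."""
    total = 0.0
    for p, y, w in zip(probs, outcomes, weights):
        total += w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total

# Hypothetical fitted probabilities and outcomes, for illustration only.
probs = [0.8, 0.3, 0.6]
outcomes = [1, 0, 1]

# A weight of 2 on the second observation ...
ll_weighted = weighted_log_likelihood(probs, outcomes, [1, 2, 1])

# ... gives the same value as listing that observation twice with weight 1.
ll_repeated = weighted_log_likelihood(
    [0.8, 0.3, 0.3, 0.6], [1, 0, 0, 1], [1, 1, 1, 1]
)
```

Under this convention an integer weight behaves like a replication count, so the weights do not need to lie between 0 and 1.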
I would like to generate odds-ratios or coefficients for various features in my dataset along with their 95% confidence intervals using a logistic regression model.
Since we cannot generate 95% CI values for odds-ratios or coefficients in sklearn logistic regression models, I started to play with statsmodels.
However, I am not seeing any standard errors for the coefficients in my output when using a very large dataset that contains 17 dummy-coded categorical features and 1 outcome variable, with modest correlation seen for only a couple of features (Pearson's r < 0.45).
My code follows below:
import statsmodels.api as sm
X_atr = sm.add_constant(X_atr)  # add a constant column for the intercept
logit_model = sm.Logit(y_atr, X_atr)  # create the model instance
result = logit_model.fit(method="bfgs")  # fit the model
print(result.summary())  # print the results
Here is a sample of my output. I am getting the coefficients - but without their standard errors or 95% CI values. Can somebody suggest how to fix this issue?
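Once a fit does report standard errors, odds ratios and approximate 95% intervals follow by exponentiating the coefficient and its Wald limits. A minimal sketch with hypothetical numbers; `odds_ratio_with_ci` is an illustrative helper, not a statsmodels function.

```python
import math

def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Return (odds ratio, lower, upper) from a Wald interval on the log-odds scale."""
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return math.exp(coef), lower, upper

# Hypothetical coefficient and standard error, not taken from the output above.
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(coef=0.40, std_err=0.10)
```

With a fitted statsmodels result, the same quantities are commonly obtained as `np.exp(result.params)` and `np.exp(result.conf_int())`.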
We have two prominent functions (or we can say equations) in logistic regression algorithms:
Logistic regression function.
Logit function.
I would like to know:
Which of these equations is used in the logistic regression model-building process?
At what stage of the model-building process is each of these equations used?
I know that the logit function is used to transform probability values (which range between 0 and 1) into real numbers (which range between -Inf and +Inf). I would like to know the real purpose of the logit function in the logistic regression modeling process.
Here are a few questions that are directly related to the purpose of the logit function in logistic regression modeling:
1. Has the logit function (i.e. the logit equation ln(P/(1-P))) been derived from the logistic regression equation, or is it the other way around?
2. What is the purpose of the logit equation in the logistic regression equation? How is the logit function used in the logistic regression algorithm? The reason for asking this question will become clear after going through points 3 and 4.
3. Upon building a logistic regression model, we get model coefficients. When we substitute these model coefficients and the respective predictor values into the logistic regression equation, we get the probability of belonging to the default class (the same as the values returned by predict()). Does this mean that the estimated model coefficients are determined based on the probability values (computed using the logistic regression equation, not the logit equation), which are input to the likelihood function to determine whether they maximize it? If this understanding is correct, then where is the logit function used in the entire model-building process?
4. Assume that the logit function is used neither during model building nor during prediction. If this is the case, then why do we give importance to the logit function, which maps probability values to real numbers (ranging from -Inf to +Inf)?
5. Where exactly is the logit function used in the entire logistic regression model-building process? Is it while estimating the model coefficients?
6. Are the model coefficient estimates that we see upon running summary(lr_model) determined using the linear form of the logistic regression equation (the logit equation) or the actual logistic regression equation?
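The relationship asked about above can be sketched as follows, under the standard formulation: the logit defines the model (the log-odds are linear in the predictors), while fitting evaluates the likelihood on probabilities obtained from the inverse logit (the logistic function) of the linear predictor. The data and candidate coefficients are made up, and real software maximizes this quantity with an iterative optimizer rather than comparing two guesses.

```python
import math

def sigmoid(z):
    """Logistic function: the inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(intercept, slope, xs, ys):
    """Bernoulli log-likelihood of a one-predictor logistic regression."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(intercept + slope * x)  # linear predictor -> probability
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical data: the outcome becomes more likely as x increases.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 1, 1, 1]

ll_flat = log_likelihood(0.0, 0.0, xs, ys)    # every probability is 0.5
ll_sloped = log_likelihood(0.5, 1.0, xs, ys)  # candidate with a positive slope
```

Here `ll_sloped` is larger than `ll_flat`, so the second candidate fits these data better. The coefficients themselves live on the logit (log-odds) scale, which is why they are interpreted as changes in log-odds.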
What is the purpose of the logit function?
The purpose of the logit function is to map a probability in the open interval (0, 1) onto the entire real line (-inf, +inf).
Mathematically, logit(p) = ln(p / (1 - p)), which tends to -inf as p approaches 0 and to +inf as p approaches 1.
The sigmoid does exactly the opposite: it is the inverse of the logit and maps (-inf, +inf) back to (0, 1). Softmax generalizes the sigmoid to several classes, turning a vector of real numbers into probabilities that sum to 1.
This is why, in machine learning, the raw scores fed into a sigmoid or softmax function are called logits: the two transformations match perfectly.
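A minimal sketch of the pairing described above, using only the standard library. The function names and input values are illustrative.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to a real number (the log-odds)."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Map a real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Map a list of real scores to probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

p = 0.25
round_trip = sigmoid(logit(p))         # recovers p up to floating-point error
class_probs = softmax([2.0, 1.0, 0.1])  # three hypothetical class scores
```

Applying `sigmoid` after `logit` returns the original probability, which is the sense in which the two functions are inverses.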