I am fitting a logistic regression with 3 attributes. Based on my data set, I expect all coefficients to be positive, but the fit gives me both positive and negative coefficients. Is it possible to have all positive coefficients using logistic regression?
I am fitting a resource selection function using used and available locations for a set of animals. For this type of analysis, an infinitely weighted logistic regression is suggested (Fithian and Hastie 2013) and is done by setting weights of used locations to 1 and available locations to some large number (e.g. 10,000). I know that implementing this approach using the glm function in R would be relatively simple:
model1 <- glm(used ~ covariates , family=binomial, weights=weights)
I am attempting to implement this as part of a larger hierarchical Bayesian model, and thus need to figure out how to incorporate weights in JAGS. In my searching online, I have not been able to find a clear example of how to use weights specifically in a logistic regression. For a Poisson model, I have seen suggestions to simply multiply lambda by the weights, as described here. I was uncertain whether this logic would hold for weights in a logistic regression. Below is an excerpt of the JAGS code for the logistic regression in my model.
alpha_NSel ~ dbeta(1,1)
intercept_NSel <- logit(alpha_NSel)
beta_SC_NSel ~ dnorm(0, tau_NSel)
tau_NSel <- 1/(pow(sigma_NSel,2))
sigma_NSel ~ dunif(0,50)
for(n in 1:N_NSel){
  logit(piN[n]) <- intercept_NSel + beta_SC_NSel*cov_NSel[n]
  yN[n] ~ dbern(piN[n])
}
To implement weights, would I simply change the Bernoulli trial to the line below? In that case, I assume I would need to rescale the weights so that they lie between 0 and 1, so that weights for used locations are 1/10,000 and weights for available locations are 1?
yN[n] ~ dbern(piN[n]*weights[n])
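To make the question concrete, here is a minimal Python sketch (not JAGS) comparing two quantities. It assumes that glm-style weights act as multipliers on each observation's Bernoulli log-likelihood term; the function names and the toy numbers are hypothetical and only for illustration. Under that assumption, multiplying the probability by the weight inside dbern gives a different likelihood, because it changes the success probability itself.

```python
import math

def weighted_loglik(y, p, w):
    """Weighted Bernoulli log-likelihood: each observation's term is multiplied by its weight."""
    return sum(wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
               for yi, pi, wi in zip(y, p, w))

def scaled_prob_loglik(y, p, w):
    """Log-likelihood implied by dbern(p * w): the weight rescales the probability itself."""
    total = 0.0
    for yi, pi, wi in zip(y, p, w):
        q = pi * wi  # success probability after scaling by the weight
        total += yi * math.log(q) + (1 - yi) * math.log(1 - q)
    return total

# Toy data (assumed): one used location (y = 1) and one available location (y = 0).
y = [1, 0]
p = [0.5, 0.5]
w = [0.5, 1.0]

print(weighted_loglik(y, p, w))     # weights multiply the log-likelihood terms
print(scaled_prob_loglik(y, p, w))  # weights multiply the probabilities
```

The two printed values differ, which suggests the two formulations should not be treated as interchangeable without further justification.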
I have been developing real-time control software in C. Among other things, the software implements a discrete state-space observer of the controlled system. Implementing the observer requires calculating the inverse of a 4x4 matrix. The inverse has to be calculated every 50 microseconds, and other fairly time-consuming calculations have to run within the same period, so the matrix inversion has to take much less than 50 microseconds. Note also that the DSP used does not have an ALU with floating-point support.
I have been looking for an efficient way to do that. One idea I have is to prepare a general formula for the determinant of a 4x4 matrix and a general formula for the adjoint matrix of a 4x4 matrix, and then calculate the inverse matrix according to the formula given below.
What do you think about this approach?
As I understand the consensus among those who study numerical linear algebra, the advice is to avoid computing matrix inverses unnecessarily. For example if the inverse of A appears in your controller only in expressions such as
z = inv(A)*y
then it is better (faster, more accurate) to solve for z the equation
A*z = y
than to compute inv(A) and then multiply y by inv(A).
A common method to solve such equations is to factorize A into simpler parts. For example, if A is (strictly) positive definite then the Cholesky factorization finds a lower triangular matrix L such that
A = L*L'
Given that, we can solve A*z = y for z via:
solve L*u = y for u
solve L'*z = u for z
and each of these is easy given the triangular nature of L.
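The two triangular solves can be sketched as follows. This is an illustration in Python using floating point for readability, not a DSP implementation; a fixed-point C port would need its own scaling analysis. The function names and the test matrix are assumptions made for the example.

```python
import math

def cholesky(A):
    """Return lower triangular L with A = L*L' (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(A, y):
    """Solve A*z = y without forming inv(A)."""
    L = cholesky(A)
    n = len(y)
    # Forward substitution: solve L*u = y for u.
    u = [0.0] * n
    for i in range(n):
        u[i] = (y[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i]
    # Back substitution: solve L'*z = u for z.
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (u[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

# Example 4x4 symmetric positive definite system (assumed data).
A = [[4, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 1, 4, 1],
     [0, 0, 1, 4]]
y = [6, 12, 18, 19]
print(solve_cholesky(A, y))
```

For a fixed 4x4 size, the loops can be fully unrolled, which is usually how such code ends up on a DSP.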
Another factorization (that again only applies to positive definite matrices) is the LDL which in your case may be easier as it does not involve square roots. It is described in the wiki article linked above.
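A rough sketch of the LDL' variant, again in Python and with hypothetical function names. It uses exact rational arithmetic (Fraction) purely to show that only additions, multiplications and divisions are needed, with no square roots; this is not a suggestion to use rationals on a DSP.

```python
from fractions import Fraction

def ldl_decompose(A):
    """Return (L, D) with A = L*diag(D)*L', L unit lower triangular. No square roots needed."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    D = [Fraction(0)] * n
    for j in range(n):
        D[j] = Fraction(A[j][j]) - sum(L[j][k] * L[j][k] * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (Fraction(A[i][j]) - s) / D[j]
    return L, D

def solve_ldl(A, y):
    """Solve A*z = y using the LDL' factors."""
    L, D = ldl_decompose(A)
    n = len(y)
    # Solve L*u = y (L has ones on the diagonal, so no division here).
    u = [Fraction(0)] * n
    for i in range(n):
        u[i] = Fraction(y[i]) - sum(L[i][k] * u[k] for k in range(i))
    # Solve diag(D)*v = u.
    v = [u[i] / D[i] for i in range(n)]
    # Solve L'*z = v.
    z = [Fraction(0)] * n
    for i in reversed(range(n)):
        z[i] = v[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))
    return z

# Example 4x4 symmetric positive definite system (assumed data).
A = [[4, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 1, 4, 1],
     [0, 0, 1, 4]]
print(solve_ldl(A, [6, 12, 18, 19]))
```

The only divisions are by the entries of D, so a 4x4 solve needs four reciprocals in total.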
More general factorizations include LU and QR. They can be applied to any (invertible) matrix, but are somewhat slower than Cholesky.
Such factorisations can also be used to compute inverses.
To be pedantic, describing adj(A) in your post as the adjoint is, perhaps, a little old-fashioned; I think adjugate or adjunct is more modern. In any case adj(A) is not the transpose. Rather, the (i,j) element of adj(A) is, up to a sign, the determinant of the matrix obtained from A by deleting the j'th row and i'th column. It is awkward to compute this efficiently.
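For reference, a small Python sketch of the adjugate built from signed minors, using the standard index convention that entry (i,j) of adj(A) comes from deleting row j and column i of A. The helper names are hypothetical, and the recursive determinant is chosen for clarity, not speed. The identity A*adj(A) = det(A)*I gives a quick way to check it.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row (fine for 4x4, slow in general)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adjugate(A):
    """Entry (i, j) is the signed determinant of A with row j and column i deleted."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Example with an assumed integer matrix: A * adj(A) should equal det(A) * I.
A = [[2, 0, 1, 0],
     [1, 3, 0, 0],
     [0, 1, 4, 1],
     [0, 0, 2, 5]]
print(det(A))
print(matmul(A, adjugate(A)))
```

The inverse is then adj(A) divided by det(A), which costs sixteen 3x3 determinants plus one 4x4 determinant for the 4x4 case.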
We have two prominent functions (or we can say equations) in logistic regression algorithms:
Logistic regression function.
Logit function.
I would like to know:
Which of these equations is used in the logistic regression model building process?
At which stage of the model building process is each of these equations used?
I know that the logit function is used to transform probability values (which range between 0 and 1) to real number values (which range from -Inf to +Inf). I would like to know the real purpose of the logit function in the logistic regression modeling process.
Here are a few queries which are directly related to the purpose of the logit function in logistic regression modeling:
Has the logit function (i.e. the logit equation ln(P/(1-P))) been derived from the logistic regression equation, or is it the other way around?
What is the purpose of the logit equation in the logistic regression equation? How is the logit function used in the logistic regression algorithm? The reason for asking this question will become clear after going through points 3 and 4.
Upon building a logistic regression model, we get model coefficients. When we substitute these model coefficients and the respective predictor values into the logistic regression equation, we get the probability of belonging to the default class (the same values returned by predict()).
Does this mean that the estimated model coefficient values are determined based on the probability values (computed using the logistic regression equation, not the logit equation), which are input to the likelihood function to check whether they maximize it? If this understanding is correct, then where is the logit function used in the entire model building process?
Assume that the logit function is used neither during model building nor when predicting values. If that is the case, then why do we give importance to the logit function, which maps probability values to real numbers (ranging from -Inf to +Inf)?
Where exactly is the logit function used in the entire logistic regression model building process? Is it while estimating the model coefficients?
Are the model coefficient estimates that we see upon running summary(lr_model) determined using the linear form of the logistic regression equation (the logit equation) or the actual logistic regression equation?
What is the purpose of Logit function?
The purpose of the logit function is to map a probability in the open interval (0, 1) onto the whole real line (-inf, +inf).
Sigmoid and softmax do exactly the opposite: they map values in (-inf, +inf) back into (0, 1).
This is why, in machine learning, the real-valued scores passed to a sigmoid or softmax are often called logits: the sigmoid is the inverse of the logit, so the two match perfectly.
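A short Python check of that relationship, using the standard definitions logit(p) = ln(p/(1-p)) and sigmoid(x) = 1/(1+exp(-x)):

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Map a real number back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

for p in (0.1, 0.5, 0.8):
    x = logit(p)
    print(p, x, sigmoid(x))  # the last value recovers p up to rounding
```

Probabilities below 0.5 map to negative numbers, 0.5 maps to zero, and probabilities above 0.5 map to positive numbers.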
I have trained a convolutional neural network. After comparing two normalizations, I found that simply subtracting the mean and dividing by the standard deviation works better than scaling into [0, 1]. It seems that the input values do not need to lie in [0, 1] when using a sigmoid function.
Could anybody explain this?
If you're talking about a NN using logistic regression, then you are correct that a suitable sigmoid function (or logistic function in this context) will give you a [0, 1] range from your original inputs.
However, the logistic function works best when the inputs are in a small range on either side of zero - so, for example, your input to the logistic function might be [-3, +3].
By rescaling your data to [0, 1] first, you would flatten out any underlying distribution and move all of your data to the positive side of zero, which is not what the logistic function expects. So you will get a worse result than by normalising (i.e. subtract mean and divide by standard deviation, as you said) because that normalisation step takes account of the variance in the original distribution and makes sure that the mean is zero so you get both positive and negative data input to the logistic function.
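The difference between the two preprocessing steps can be seen on a toy sample. This is a minimal sketch with hypothetical function names and made-up data; it only illustrates where the values end up relative to zero.

```python
import statistics

def min_max_scale(xs):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

xs = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data (assumed)
print(min_max_scale(xs))  # all values are non-negative
print(standardize(xs))    # values fall on both sides of zero
```

After standardizing, the sample has mean zero and unit standard deviation, so the inputs straddle the steep central part of the logistic curve.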
In your question, you said "comparing two normalisations" - I think you are misunderstanding what "normalisation" means and actually comparing normalisation with rescaling, which is different.
I am trying to convert the following MATLAB code to C:
cX = (fft(inData, fftSize) / fftSize);
Power_X = (cX*cX')/50;
Questions:
1. Why divide the results of fft (array of fftSize complex elements) by fftSize?
2. I'm not sure at all how to convert the complex conjugate transform to C; I just don't understand what that line does.
Peter
1. Why divide the results of fft (array of fftSize complex elements) by fftSize?
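Because the "energy" (sum of squares) of the raw fft output grows with the number of points in the fft. Dividing by the number of points N removes that growth: for an input of length N, the sum of squares of fft(x)/N equals the mean square (average power) of the original signal, i.e. its sum of squares divided by N. (Dividing by sqrt(N) instead would make the two sums of squares equal.)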
2. I'm not sure at all how to convert the complex conjugate transform to c, I just don't understand what that line does.
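That line is what actually calculates the sum of the squares. For a row vector cX, it is easy to verify that cX*cX' = sum(abs(cX).^2), where cX' is the conjugate transpose (for a column vector, use cX'*cX instead). In C, this is a loop that accumulates re*re + im*im over the elements.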
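A small numeric check of both points, written in Python with a naive DFT in place of a real FFT library. It assumes the unnormalized forward transform (the convention MATLAB's fft uses) and an input whose length equals the transform size; the function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive unnormalized DFT, O(N^2); stands in for fft(x) on a short vector."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def sum_of_squares(cx):
    """Equivalent of cX*cX' for a row vector: each value times its own conjugate, summed."""
    return sum((c * c.conjugate()).real for c in cx)

x = [1.0, 2.0, 3.0, 4.0]  # toy signal (assumed)
n = len(x)
X = dft(x)
cX = [v / n for v in X]

print(sum_of_squares(x))   # energy of the signal
print(sum_of_squares(X))   # n times the signal energy
print(sum_of_squares(cX))  # signal energy divided by n (the mean square)
```

In C the same accumulation is a single loop over the real and imaginary parts, adding re*re + im*im for each element.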
Ideally a Discrete Fourier Transform (DFT) is purely a rotation, in that it returns the same vector in a different coordinate system (i.e., it describes the same signal in terms of frequencies instead of in terms of sound volumes at sampling times). However, the way the DFT is usually implemented as a Fast Fourier Transform (FFT), the values are added together in various ways that require a scale factor to keep the scale unchanged: 1/sqrt(N) on each transform, or equivalently a single 1/N across a forward and inverse pair.
Often, these multiplications are omitted from the FFT to save computing time and because many applications are unconcerned with scale changes. The resulting FFT data still contains the desired data and relationships regardless of scale, so omitting the multiplications does not cause any problems. Additionally, the correcting multiplications can sometimes be combined with other operations in an application, so there is no point in performing them separately. (E.g., if an application performs an FFT, does some manipulations, and performs an inverse FFT, then the combined multiplications can be performed once during the process instead of once in the FFT and once in the inverse FFT.)
I am not familiar with MATLAB syntax, but, if Stuart’s answer is correct that cX*cX' is computing the sum of the squares of the magnitudes of the values in the array, then I do not see the point of performing the FFT. You should be able to calculate the total energy in the same way directly from inData; the transform is just a coordinate transform that does not change energy, except for the scaling described above.