Feature scalling/normalization with sparse data - sparse-matrix

I am having a problem with training a neural network with sparse input data to solve a supervised regression problem. When I perform mean normalization (subtract mean then divide standard deviation) on the input data, I get a lot of NaN values. I am wondering if anyone has experience dealing this kind of problem. What is the correct way to scale the sparse input data?
thanks
Joe

Sounds like your data is so sparse that the standard deviation is occasionally zero.
Test for that, and if so, don't divide your input by it (stdev normalization is not necessary in that case anyway).

Related

Neural network time series normalization

I am using LSTM as the hidden layer function in a time series prediction network. Is input normalization necessary? If it is, is data = data / sum(data) the correct normalization?
Should the output also be normalized with the inputs?
Is input normalization necessary?
No, but it might make your network converge faster. Use this calculation to scale your values to [0,1]:
.
Should the output also be normalized with the inputs?
No, I can't think of a reason why you would ever want to do that.

Use case for incremental supervised learning using apache mahout

Business case:
Forecasting fuel consumption at site.
Say fuel consumption C, is dependent on various factors x1,x2,...xn. So mathematically speaking, C = F{x1,x2,...xn}. I do not have any equation to put this.
I do have historical dataset from where I can get a correlation of C to x1,x2 .. etc. C,x1,x2,.. are all quantitative. Finding out the correlation seems tough for a person like me with limited statistical knowledge, for a n variable equation.
So, I was thinking of employing some supervised machine learning techniques for the same. I will train a classifier with the historic data to get a prediction for the next consumption.
Question: Am I thinking in the right way?
Question: If this is correct, my system should be an evolving one. So the more real data I am going to feed to the system, that would evolve my model to make a better prediction the next time. Is this a correct understanding?
If the above the statements are true, does the AdaptiveLogisticRegression algorithm, as present in Mahout, will be of help to me?
Requesting advises from the experts here!
Thanks in advance.
Ok, correlation is not a forecasting model. Correlation simply ascribes some relationship between the datasets based on covariance.
In order to develop a forecasting model, what you need to peform is regression.
The simplest form of regression is linear univariate, where C = F (x1). This can easily be done in Excel. However, you state that C is a function of several variables. For this, you can employ linear multivariate regression. There are standard packages that can perform this (within Excel for example), or you can use Matlab, etc.
Now, we are assuming that there is a "linear" relationship between C and the components of X (the input vector). If the relationship were not linear, then you would need more sophisticated methods (nonlinear regression), which may very well employ machine learning methods.
Finally, some series exhibit auto-correlation. If this is the case, then it may be possible for you to ignore the C = F(x1, x2, x3...xn) relationships, and instead directly model the C function itself using time-series techniques such as ARMA and more complex variants.
I hope this helps,
Srikant Krishna

What is the suitable range of Lambda, eta, and alpha for Artificial Neural Network?

I've finished my algorithm to apply the ANN on the C++ language, but I stack with the value of lambda, eta, and alpha, to show the best resul. I don't know if there is a rule or range that will give a good result of training.
The dataset is 4000
The hidden neurons are 15
Can anyone help me please to give the reasons of chosing the best value of Lambda, Eta, and Alpha?
Thank you very much
Obviously, the exact optimal values are completely problem-dependent, but I suggest using commonly-applied values (Murray Smith makes some suggestions in "Neural Networks for Statistical Modeling") and not tinkering with them.
Even today, the complaint is sometimes made that neural networks are difficult to work with because of all of the experimentation needed to optimize parameters like eta, etc. In most situations, though, the idea is to discover a good approximation, not the optimal one. Assuming that one hasn't chosen insane parameter values, then the only experimentation needed is adjustment of the size of the hidden layer.

The correctness of neural networks

I have asked other AI folk this question, but I haven't really been given an answer that satisfied me.
For anyone else that has programmed an artificial neural network before, how do you test for its correctness?
I guess, another way to put it is, how does one debug the code behind a neural network?
With neural networks, generally what is happening is you are taking an untrained neural network, and you are training it up using a given set of data, so that it responds in the way you expect. Here's the deal; usually, you're training it up to a certain confidence level for your inputs. Generally (and again, this is just generally; your mileage may vary), you cannot get neural networks to always provide the right answer; rather, you are getting the estimation of the right answer, to within a confidence range. You know that confidence range by how you have trained the network.
The question arises as to why you would want to use neural networks if you cannot be certain that the conclusion they come to is verifiably correct; the answer is that neural networks can arrive at high-confidence answers for certain classes of problems (specifically, NP-Complete problems) in linear time, whereas verifiably correct solutions of NP-Complete problems can only be arrived at in polynomial time. In layman's terms, neural networks can "solve" problems that normal computation can't; but you can only be a certain percentage confident that you have the right answer. You can determine that confidence by the training regimen, and can usually make sure that you will have at least 99.9% confidence.
Correctness is a funny concept in most of "soft computing." The best I can tell you is: "a neural network is correct when it consistently satisfies the parameters of it's design." You do this by training it with data, and then verifying with other data, and having a feedback loop in the middle which lets you know if the neural network is functioning appropriately.
This is of-course the case only for neural networks that are large enough where a direct proof of correctness is not possible. It is possible to prove that a neural network is correct through analysis if you are attempting to build a neural network that learns XOR or something similar, but for that class of problem an aNN is seldom necessary.
You're opening up a bigger can of worms here than you might expect.
NN's are perhaps best thought of as universal function approximators, by the way, which may help you in thinking about this stuff.
Anyway, there is nothing special about NN's in terms of your question, the problem applies to any sort of learning algorithm.
The confidence you have in the results it is giving is going to rely on both the quantity and the quality (often harder to determine) of the training data that you have.
If you're really interested in this stuff, you may want to read up a bit on the problems of overtraining, and ensemble methods (bagging, boosting, etc.).
The real problem is that you usually aren't actually interested in the "correctness" (cf quality) of an answer on a given input that you've already seen, rather you care about predicting the quality of answer on an input you haven't seen yet. This is a much more difficult problem. Typical approaches then, involve "holding back" some of your training data (i.e. the stuff you know the "correct" answer for) and testing your trained system against that. It gets subtle though, when you start considering that you may not have enough data, or it may be biased, etc. So there are many researchers who basically spend all of their time thinking about these sort of issues!
I've worked on projects where there is test data as well as training data, so you know the expected outputs for a set of inputs the NN hasn't seen.
One common way of analysing the result of any classifier is use of an ROC curve; an introduction to the statistics of classifiers and ROC curves can be found at Interpreting Diagnostic Tests
I'm a complete amateur in this field, but don't you use a pre-determined set of data you know is correct?
I don't believe there is a single correct answer but there are well-proven probabilistic or statistical methods that can provide reassurance. The statistical methods are usually referred to as Resampling.
One method that I can recommend is the Jackknife.
My teacher always said his rule of thumb was to train the NN with 80% of your data and validate it with the other 20%. And, of course, make sure that data set is as comprehensive as you need.
If you want to find out whether the backpropagation of the network is correct, there is an easy way.
Since you calculate the derivate of the error landscape, you can check whether your implementation is correct numerically. You will calculate the derivative of the error with respect to a specific weight, ∂E/∂w. You can show that
∂E/∂w = (E(w + e) - E(w - e)) / (2 * e) + O(e^2).
(Bishop, Machine Learning and Pattern Recognition, p. 246)
Essentially, you evaluate the error to the left of the weight, evaluate it to the right of the weight and chheck if the numerical gradient is the same as your analytical gradient.
(Here's an implementation: http://github.com/bayerj/arac/raw/9f5b225d6293974f8adfc5f20dfc6439cc1bed35/src/cpp/utilities/utilities.cpp)
To me probably there is only one value(s) takes extra effort to verify, the gradient of the back propagation. I think Bayer's answer is actually commonly used and suggested. You need to write extra code to this but all are forward propagation matrix multiplications which is easy to write and verify.
There are some other issues which will prevent you from getting the best answer, for example:
The cost function of NN is not concave so your gradient descent is not guaranteed to find the global optimum.
Over/under fitting
Not choosing the "right" features/model
etc
However I think they are beyond the scope of programming bug.

Measuring the performance of classification algorithm

I've got a classification problem in my hand, which I'd like to address with a machine learning algorithm ( Bayes, or Markovian probably, the question is independent on the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classificator, with taking data overfitting problem into account.
That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples, and use this very same samples to measure fitness, it might stuck into a data overfitting problem -the classifier will know the exact answers for the training instances, without having much predictive power, rendering the fitness results useless.
An obvious solution would be seperating the hand-tagged samples into training, and test samples; and I'd like to learn about methods selecting the statistically significant samples for training.
White papers, book pointers, and PDFs much appreciated!
You could use 10-fold Cross-validation for this. I believe it's pretty standard approach for classification algorithm performance evaluation.
The basic idea is to divide your learning samples into 10 subsets. Then use one subset for test data and others for train data. Repeat this for each subset and calculate average performance at the end.
As Mr. Brownstone said 10-fold Cross-Validation is probably the best way to go. I recently had to evaluate the performance of a number of different classifiers for this I used Weka. Which has an API and a load of tools that allow you to easily test the performance of lots of different classifiers.

Resources