Recently I found an example where a neural network was used to classify characters. Two networks were trained: one on a noisy data set and one without noise. I tried to find a theoretical explanation for why using the noisy training set gives better results, but I couldn't find enough to understand it. Can anyone explain this to me? Thanks in advance.
Training a NN with noise improves generalization (the network's ability to give correct predictions on new, unseen data) because the noise makes it harder for the network to fit each data point exactly. This prevents the network from simply memorizing the exact values of the training data and forces it to learn more meaningful relationships. For the mathematical details, and for the relationship between noise and other forms of regularization, you can take a look at this paper.
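As a rough illustration of what "training with noise" usually means in practice, here is a minimal sketch (Python/NumPy, my own example rather than anything from the answer above) that adds a fresh Gaussian perturbation to each training batch, so the network never sees exactly the same input twice:

```python
import numpy as np

def add_gaussian_noise(batch, sigma=0.1, rng=None):
    """Return a copy of the batch with zero-mean Gaussian noise added.
    sigma controls the noise strength and would be tuned on a validation set."""
    if rng is None:
        rng = np.random.default_rng()
    return batch + rng.normal(loc=0.0, scale=sigma, size=batch.shape)

# Hypothetical usage inside a training loop: only the inputs are perturbed,
# with fresh noise on every pass; the targets stay clean.
# noisy_x = add_gaussian_noise(x_batch)
# train_step(network, noisy_x, y_batch)
```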
Related
I am a beginner when it comes to NNs. I understand the basics, but I am not sure about the following: let's consider a handwriting recognition network. I understand you can train a network to recognize a pattern, i.e. the weights are set appropriately. But if the network is trained to recognize "A", how can it then recognize "B", which would surely require the weights to be set differently?
Or does the network only search for the one letter it is currently trained on? I hope I made myself clear; I am basically trying to understand how a trained network can recognize various characters if the weights get mixed up when training on all of them.
When a neural network is being trained, the network is searching for a set of weights which, when combined with the training inputs, yields the expected outputs.
One of the key features in neural networks is the setting of the learning rate. Essentially, it controls how much of the previously acquired information is kept.
It is important that this value be neither too high (if memory serves, setting it to 1 would mean the weights are changed by taking into consideration only the current training case) nor too low (setting it to zero would mean that no weight change is made at all). In either case, the neural network would never converge.
When training for handwriting, as far as I know, the training set involves various letters written in various forms. That being said, although neural networks tend to fare better than other AI approaches when there are variations in their input, there are always limitations.
EDIT:
As per your question, assuming that you are dealing with a back-propagation neural network, what happens is that at each layer you apply an activation function and pass the result of the current layer on to the next.
The extra bit comes after the forward pass, when you compare the result you got with the result you wanted. This is where you apply the back-propagation algorithm to amend the weights, and this is where the learning rate comes in.
As you mentioned in your comment, the weights will be changed; however, the value of the learning rate determines how much the weights change. Usually you want them to change relatively slowly so that they converge, which is why you keep the learning rate fairly low. However, if you use a very high learning rate, the current training case will, as you say, wash out the improvements made on the previous ones.
The way you can look at it is that while training, the neural network is searching for a set of weights which, given its training inputs, will yield the expected results. So basically, you are looking for weights which satisfy all your training cases.
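To make the role of the learning rate concrete, here is a minimal sketch (Python/NumPy, my own illustration, not code from the answer) of the plain gradient-descent weight update used after back propagation; the learning_rate factor is exactly the value discussed above:

```python
import numpy as np

def update_weights(weights, gradient, learning_rate=0.01):
    """One gradient-descent step: move the weights a small distance against
    the error gradient. The learning rate scales the step size: too large
    and each example washes out earlier learning, too small and training
    barely moves."""
    return weights - learning_rate * gradient

# Hypothetical usage for one training example:
# grad = backprop(network, x, target)          # dE/dw for every weight
# network.weights = update_weights(network.weights, grad, 0.01)
```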
I am developing (for my senior project) a dumbbell that is able to classify and record different exercises. The device has to be able to classify a range of these exercises based on the data given from an IMU (Inertial Measurement Unit). I have acceleration, gyroscope, compass, pitch, yaw, and roll data.
I am leaning towards using an Artificial Neural Network in order to do this, but am open to other suggestions as well. Ultimately I want to pass in the IMU data into the network and have it tell me what kind of exercise it is (Bicep curl, incline fly etc...).
If I use an ANN, what kind should I use (recurrent or not) and how should I implement it? I am not sure how to get the network to recognize an exercise when I am passing it a continuous stream of data. I was thinking about constantly performing an FFT on a portion of the inputs and sending a set number of frequency magnitudes into the network, but am not sure if that will work either. Any suggestions/comments?
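As a rough sketch of the windowed-FFT idea described above (Python/NumPy, purely illustrative; the 50 Hz sample rate and 2-second window are assumptions, not values from the question):

```python
import numpy as np

def fft_features(window, n_bins=8):
    """window: array of shape (samples, channels), e.g. accel x/y/z.
    Returns the magnitudes of the first n_bins frequency components
    per channel, flattened into one feature vector."""
    spectrum = np.abs(np.fft.rfft(window, axis=0))  # magnitude spectrum
    return spectrum[:n_bins].flatten()

# Hypothetical usage on a continuous stream, with 2-second windows
# at 50 Hz (100 samples) and 50% overlap:
# for start in range(0, len(stream) - 100, 50):
#     features = fft_features(stream[start:start + 100])
#     prediction = network.predict(features)
```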
Your first task should be to collect some data from the dumbbell. There are many, many different schemes that could be used to classify the data, but until you have some sample data to work with, it is hard to predict exactly what will work best.
If you get 5 different people to do all of the exercises and look at the resulting data yourself (e.g. plot the different parts of the data collected), can you distinguish which exercise is which? This may give you hints on what pre-processing you might want to perform on the data before sending it to a classifier.
First you create a large training set.
Then you train it, telling it what actually happens.
You might use averages of the data as well.
Perhaps use both the raw movement and the movement averaged over 2, 5 and 10 seconds, and feed those in as additional input nodes.
While exercising, the trained network can be fed the averaged data as well, i.e. the last x samples divided by x; this gives a more stable result. Otherwise the neural network's output can become erratic.
Notice that the training set might then need to contain the averaged data too, so you will need a large training set.
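A minimal sketch of the averaging idea above (Python/NumPy, my own illustration; the 50 Hz sample rate is an assumption), building moving averages over a few window lengths and stacking them next to the raw samples as extra inputs:

```python
import numpy as np

def moving_average(signal, window):
    """Simple moving average of a 1-D signal over `window` samples."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def with_averaged_inputs(signal, sample_rate=50, seconds=(2, 5, 10)):
    """Stack the raw signal with its moving averages over several window
    lengths (2, 5 and 10 s by default), one column per version."""
    columns = [signal]
    for s in seconds:
        columns.append(moving_average(signal, window=s * sample_rate))
    return np.column_stack(columns)

# Hypothetical usage: accel_x is one IMU channel sampled at 50 Hz.
# inputs = with_averaged_inputs(accel_x)   # shape (samples, 4)
```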
My professor asked my class to make a neural network to try to predict if a breast cancer is benign or malignant. To do this I'm using the Breast Cancer Wisconsin (Diagnostic) Data Set.
As a tip for doing this, my professor said that not all 30 attributes need to be used as inputs (there are 32, but the first 2 are the ID and the diagnosis). What I want to ask is: how am I supposed to take those 30 inputs (which would create 100+ weights, depending on how many neurons I use) and reduce them to a smaller number?
I've already found how to "prune" a neural net, but I don't think that's what I want. I'm not trying to eliminate unnecessary neurons, but to shrink the input itself.
PS: Sorry for any English errors, it's not my native language.
This is a question that is under active research right now. It is called feature selection, and there are several established techniques. One is Principal Components Analysis (PCA), which reduces the dimensionality of your dataset by keeping the components that retain the most variance. Another thing you can do is check whether some variables are highly correlated: if two inputs are highly correlated, they carry almost the same information, so one of them can often be removed without hurting the performance of your classifier much. A third technique you could use is deep learning, which tries to learn the features that will later be used to feed your trainer. More info about deep learning and PCA can be found here: http://deeplearning.stanford.edu/wiki/index.php/Main_Page
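As a rough illustration of the PCA route (a sketch using scikit-learn, not part of the original answer; the choice of 10 components is arbitrary), you can project the 30 attributes down to a handful of components that keep most of the variance:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: array of shape (n_samples, 30) holding the dataset's attributes.
# Standardizing first matters because the attributes have very different scales.
def reduce_inputs(X, n_components=10):
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X_scaled)
    print("variance kept:", pca.explained_variance_ratio_.sum())
    return X_reduced  # feed this smaller matrix to the network instead
```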
This problem is called feature selection. It is mostly the same for neural networks as for other classifiers. You could prune your dataset while retaining the most variance using PCA. To go further, you could use a greedy approach and evaluate your features one by one by training and testing your network with each feature excluded in turn.
There is a technique for feature selection using just neural networks
Split your dataset into three groups:
Training data used for supervised training
Validation data used to verify that the neural network is able to generalize
Accuracy testing used to test which of the features are required
The steps:
Train a network on your training and validation set, just like you would normally do.
Test the accuracy of the network with the third dataset.
Locate the variable which yields the smallest drop in the accuracy test above when dropped (dropped meaning always feeding a zero as that input signal; see the sketch below)
Retrain your network with the new selection of features
Keep doing this until either the network fails to train or there is just one variable left.
Here is a paper on the technique
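A minimal sketch of the zero-input test in the third step (Python/NumPy, my own illustration rather than code from the paper; the accuracy() helper is hypothetical): each still-active feature is zeroed in turn on the accuracy-testing set, and the one whose removal hurts accuracy the least is the next candidate to drop.

```python
import numpy as np

def least_useful_feature(network, X_test, y_test, active_features):
    """Zero each still-active feature in turn and measure accuracy;
    return the index whose removal causes the smallest drop."""
    best_idx, best_acc = None, -1.0
    for idx in active_features:
        X_zeroed = X_test.copy()
        X_zeroed[:, idx] = 0.0                     # "drop" the feature
        acc = accuracy(network, X_zeroed, y_test)  # hypothetical helper
        if acc > best_acc:
            best_idx, best_acc = idx, acc
    return best_idx, best_acc

# Outer loop (sketch): remove the least useful feature, retrain,
# and repeat until accuracy degrades or one feature remains.
```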
I've trained a deep belief neural network (formed by stacked restricted Boltzmann machines) using some pseudo-code from the internet. The problem is that after training it, i.e. after adjusting its weights, I have no clear idea how to test it.
I have an input image and a trained neural network. How must the classification be done? I've saved the trained network to a file. The problem is I haven't thoroughly studied the math behind it as I need this project done ASAP. Also, Googling didn't provide any clear information.
I've trained a deep belief neural network (formed by stacked restricted Boltzmann machines) using some pseudo-code from the internet.
This means that you've "fed" your neural network with pairs consisting of an image and a value associated with it, right? This value might be 0/1 in the case of classification, or any real number in the case of regression.
Testing it means that you "feed" your neural network only with the image. In your pseudo-code, there should be two functions: void train(Image input, float trainValue) and float predict(Image input). (Replace Image with whatever is relevant in your case: a vector, a matrix, etc.)
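As a sketch of what the testing loop then looks like (Python, purely illustrative, assuming a predict function like the one named above and 0/1 classification labels):

```python
def test_accuracy(network, test_images, test_labels):
    """Run the trained network on images it has never seen and count
    how often the predicted class matches the known label."""
    correct = 0
    for image, label in zip(test_images, test_labels):
        prediction = network.predict(image)              # forward pass only
        predicted_class = 1 if prediction >= 0.5 else 0  # assuming 0/1 labels
        if predicted_class == label:
            correct += 1
    return correct / len(test_images)
```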
Can you give us your code (or at least pseudo code)?
One common approach is to train your NN on two-thirds of your available training data. The remaining third is then used to test the trained network. The ratio of training/testing data can be changed to meet your application, but it is critical that the training and test groups be free of bias. You might consider randomly partitioning your data into the two sets to ensure you don't inadvertently introduce bias.
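A minimal sketch of that random two-thirds / one-third partition (Python/NumPy, illustrative only):

```python
import numpy as np

def train_test_split(X, y, train_fraction=2/3, seed=0):
    """Shuffle the examples, then cut them into a training set and a
    held-out test set. Shuffling first helps avoid accidental bias,
    e.g. if the data file happens to be sorted by class."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(len(X) * train_fraction)
    train_idx, test_idx = order[:cut], order[cut:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```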
I have asked other AI folk this question, but I haven't really been given an answer that satisfied me.
For anyone else that has programmed an artificial neural network before, how do you test for its correctness?
I guess, another way to put it is, how does one debug the code behind a neural network?
With neural networks, generally what is happening is that you are taking an untrained neural network and training it up on a given set of data so that it responds in the way you expect. Here's the deal: usually, you're training it up to a certain confidence level for your inputs. Generally (and again, this is just generally; your mileage may vary), you cannot get neural networks to always provide the right answer; rather, you get an estimate of the right answer, within a confidence range. You know that confidence range from how you have trained the network.
The question arises as to why you would want to use neural networks if you cannot be certain that the conclusion they come to is verifiably correct; the answer is that neural networks can arrive at high-confidence answers for certain classes of hard problems (including instances of NP-complete problems) in roughly linear time, whereas verifiably correct solutions of NP-complete problems are, as far as anyone knows, only obtainable in exponential time. In layman's terms, neural networks can "solve" problems that normal computation struggles with, but you can only be a certain percentage confident that you have the right answer. You determine that confidence by the training regimen, and can usually make sure that you have at least 99.9% confidence.
Correctness is a funny concept in most of "soft computing." The best I can tell you is: "a neural network is correct when it consistently satisfies the parameters of its design." You do this by training it with data, and then verifying with other data, and having a feedback loop in the middle which lets you know whether the neural network is functioning appropriately.
This is of course the case only for neural networks that are large enough that a direct proof of correctness is not possible. It is possible to prove that a neural network is correct through analysis if you are attempting to build one that learns XOR or something similar, but for that class of problem an ANN is seldom necessary.
You're opening up a bigger can of worms here than you might expect.
NNs are perhaps best thought of as universal function approximators, by the way, which may help you in thinking about this stuff.
Anyway, there is nothing special about NNs in terms of your question; the problem applies to any sort of learning algorithm.
The confidence you have in the results it is giving is going to rely on both the quantity and the quality (often harder to determine) of the training data that you have.
If you're really interested in this stuff, you may want to read up a bit on the problems of overtraining, and ensemble methods (bagging, boosting, etc.).
The real problem is that you usually aren't actually interested in the "correctness" (cf quality) of an answer on a given input that you've already seen, rather you care about predicting the quality of answer on an input you haven't seen yet. This is a much more difficult problem. Typical approaches then, involve "holding back" some of your training data (i.e. the stuff you know the "correct" answer for) and testing your trained system against that. It gets subtle though, when you start considering that you may not have enough data, or it may be biased, etc. So there are many researchers who basically spend all of their time thinking about these sort of issues!
I've worked on projects where there is test data as well as training data, so you know the expected outputs for a set of inputs the NN hasn't seen.
One common way of analysing the result of any classifier is to use an ROC curve; an introduction to the statistics of classifiers and ROC curves can be found at Interpreting Diagnostic Tests
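A small sketch of producing an ROC curve from held-out labels and the network's scores (using scikit-learn and matplotlib, purely illustrative; y_true and y_score are assumed inputs):

```python
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# y_true: known 0/1 labels of the test set
# y_score: the network's raw output (probability-like score) per example
def plot_roc(y_true, y_score):
    fpr, tpr, _ = roc_curve(y_true, y_score)
    print("area under curve:", auc(fpr, tpr))
    plt.plot(fpr, tpr)
    plt.xlabel("false positive rate")
    plt.ylabel("true positive rate")
    plt.show()
```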
I'm a complete amateur in this field, but don't you use a pre-determined set of data you know is correct?
I don't believe there is a single correct answer but there are well-proven probabilistic or statistical methods that can provide reassurance. The statistical methods are usually referred to as Resampling.
One method that I can recommend is the Jackknife.
My teacher always said his rule of thumb was to train the NN with 80% of your data and validate it with the other 20%. And, of course, make sure that data set is as comprehensive as you need.
If you want to find out whether the backpropagation of the network is correct, there is an easy way.
Since you calculate the derivative of the error landscape, you can check numerically whether your implementation is correct. You calculate the derivative of the error with respect to a specific weight, ∂E/∂w. You can show that
∂E/∂w = (E(w + e) - E(w - e)) / (2 * e) + O(e^2).
(Bishop, Pattern Recognition and Machine Learning, p. 246)
Essentially, you evaluate the error slightly to the left of the weight, evaluate it slightly to the right, and check whether the resulting numerical gradient matches your analytical gradient.
(Here's an implementation: http://github.com/bayerj/arac/raw/9f5b225d6293974f8adfc5f20dfc6439cc1bed35/src/cpp/utilities/utilities.cpp)
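A minimal sketch of that numerical check (Python/NumPy, my own illustration rather than the linked implementation): perturb each weight by a small epsilon in both directions, recompute the error, and compare the central-difference estimate against the gradient from your back propagation code.

```python
import numpy as np

def check_gradient(error_fn, weights, analytic_grad, eps=1e-6, tol=1e-5):
    """error_fn(weights) -> scalar error E; analytic_grad is dE/dw from
    your backpropagation code. Returns True if the central-difference
    estimate (E(w + e) - E(w - e)) / (2 * e) matches it everywhere."""
    numeric_grad = np.zeros_like(weights)
    for i in range(weights.size):
        w_plus, w_minus = weights.copy(), weights.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        numeric_grad.flat[i] = (error_fn(w_plus) - error_fn(w_minus)) / (2 * eps)
    return np.allclose(numeric_grad, analytic_grad, atol=tol)
```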
To me, there is probably only one value that takes extra effort to verify: the gradient of the back propagation. I think Bayer's answer above is what is commonly used and suggested. You need to write extra code for this, but it is all forward-propagation matrix multiplication, which is easy to write and verify.
There are some other issues which will prevent you from getting the best answer, for example:
The cost function of a NN is not convex, so gradient descent is not guaranteed to find the global optimum.
Over/under fitting
Not choosing the "right" features/model
etc
However, I think those are beyond the scope of a programming bug.