How to know whether backpropagation can train successfully or not? - artificial-intelligence

I have an AI project that uses a backpropagation neural network.
It has been training for about 1 hour, and it currently gets 60-70 of the 100 training inputs correct under the backpropagation stopping condition (the number of correctly trained inputs fluctuates between 60 and 70).
More than 10,000 epochs have completed so far, and each epoch takes almost 0.5 seconds.
How can I tell whether the network will eventually train successfully if I leave it running for a long time, or whether it cannot do any better?

Check out my answer to this question: what is the difference between train, validation and test sets in neural networks?
You should use 3 sets of data:
Training
Validation
Testing
The Validation data set tells you when you should stop (as I said in the other answer):
The validation data set is used to minimize overfitting. You're not adjusting the weights of the network with this data set; you're just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before, or at least that the network hasn't trained on (i.e. the validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you're overfitting your neural network and you should stop training.
A good method for validation is to use 10-fold (k-fold) cross-validation. Additionally, there are specific "strategies" for splitting your data set into training, validation and testing. It's somewhat of a science in itself, so you should read up on that too.
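As a hedged sketch of that stopping rule in Python (the network object, train_one_epoch and accuracy here are hypothetical placeholders for whatever your framework provides):

# Early stopping driven by the validation set. train_one_epoch() and
# accuracy() are hypothetical placeholders for your own training code.
max_epochs = 100000
patience = 10          # epochs to wait for a validation improvement
best_val_acc = 0.0
stale_epochs = 0

for epoch in range(max_epochs):
    train_one_epoch(network, training_set)        # weights ARE adjusted here
    val_acc = accuracy(network, validation_set)   # weights are NOT adjusted here
    if val_acc > best_val_acc:
        best_val_acc, stale_epochs = val_acc, 0
    else:
        stale_epochs += 1
        if stale_epochs >= patience:
            break  # training accuracy may still rise, but we are overfitting

test_acc = accuracy(network, test_set)  # final, unbiased estimate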
Update
Regarding your comment on the error, I would point you to some resources which can give you a better understanding of neural networks (it's kinda math heavy, but see below for more info):
http://www.colinfahey.com/neural_network_with_back_propagation_learning/neural_network_with_back_propagation_learning_en.html
http://www.willamette.edu/~gorr/classes/cs449/linear2.html
Section 5.9 of Colin Fahey's article describes it best:
Backward error propagation formula:
The error values at the neural network outputs are computed using the following formula:
Error = (Output - Desired); // Derived from: Output = Desired + Error;
The error accumulation in a neuron body is adjusted according to the output of the neuron body and the output error (specified by links connected to the neuron body).
Each output error value contributes to the error accumulator in the following manner:
ErrorAccumulator += Output * (1 - Output) * OutputError;
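Translated into Python (assuming a logistic sigmoid activation, whose derivative with respect to the net input is Output * (1 - Output)), those two formulas might look like this:

def output_error(output, desired):
    # Error at an output neuron, derived from: Output = Desired + Error
    return output - desired

def accumulate_error(output, output_error_value):
    # output * (1 - output) is the derivative of the logistic sigmoid,
    # so this scales the error by how sensitive the neuron's output
    # is to its net input.
    return output * (1.0 - output) * output_error_value

# Example: a neuron fired 0.8 but the desired value was 1.0
err = output_error(0.8, 1.0)        # -0.2
delta = accumulate_error(0.8, err)  # 0.8 * 0.2 * -0.2 = -0.032
print(err, delta)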

Related

PyBrain RNN prediction failure

I am using a recurrent neural network for time series prediction, with LSTM as the activation function. The inputs are sequence datasets, with the output being the next datum after the input sequence. I have hundreds of inputs, one hidden layer of equal size, and a single output in the output layer. However much I train, the result is always much higher than the actual value (with other activation functions too), shown respectively by green and blue below. What is the solution?
It seems that LSTM is not suited for this kind of pattern. Softmax works well.

How to determine the threshold for neuron firings in neural networks?

I have a simple task: classify people into either the MAN or the WOMAN category by their height and hair length, using a neural network. I want to teach it the pattern with some examples and then use it to classify on its own.
I have a basic understanding of neural networks but would really need some help here.
I know that each neuron divides the plane into two subareas; basically that is why P = w0 + w1*x1 + w2*x2 + ... + wn*xn is used here (the weights just move the line, if we consider the geometric representation).
I do understand that each epoch should modify the weights to get closer to the correct result, yet I have never programmed it and I have no idea how to start.
How should I proceed? That is: how can I determine the threshold, and how should I deal with the inputs?
This is not homework, just a task for anyone who is interested. I am, and I would like to understand it.
Looks like you are dealing with a simple Perceptron with a threshold activation function. Have a look at this question. Since you ARE using a bias neuron (w0), you would set the threshold to 0.
You then simply take the output of your network and compare it to 0, so you would e.g. output class 1 if x < 0 and class 2 if x > 0. You could model the case x=0 as "indistinct".
For learning the weights you need to apply the Delta Learning Rule which can be implemented very easily. But be careful: a perceptron with a simple threshold activation function can only be correct if your data are linearly separable. If you have more complex data you will need a Multilayer Perceptron and a nonlinear activation function like the Logistic Sigmoid Function.
Have a look at Geoffrey Hinton's Coursera course, Lecture 2, for details.
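As a hedged sketch of such a perceptron with a threshold activation trained by the delta rule (all names and the toy data below are illustrative):

import random

# Minimal perceptron sketch for the height / hair-length example.
# Class labels are +1 (MAN) and -1 (WOMAN); weights[0] is the bias w0,
# so the firing threshold is effectively 0.

def predict(weights, x):
    activation = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if activation > 0 else -1

def train(samples, learning_rate=0.1, epochs=100):
    n = len(samples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, x)  # delta / perceptron rule
            weights[0] += learning_rate * error   # bias update
            for i, xi in enumerate(x):
                weights[i + 1] += learning_rate * error * xi
    return weights

# (height in m, hair length in m) -> +1 MAN, -1 WOMAN; toy data only
data = [((1.85, 0.05), 1), ((1.70, 0.35), -1),
        ((1.80, 0.10), 1), ((1.60, 0.40), -1)]
w = train(data)
print(predict(w, (1.90, 0.03)))  # expected: 1

Note that this only converges if the data are linearly separable, exactly as warned above.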
I've been working with machine learning lately (but I'm not an expert), and you should look at the Accord.NET framework. It contains all the common machine learning algorithms out of the box, so it's easy to take an existing sample and modify it instead of starting from scratch. Also, the developer of the framework is very helpful in the forum available on the same page.
With the available samples, you may also discover something that works better than a neural network, like the Kernel Support Vector Machine. If you stick with the neural network, have fun modifying all the different variables; by trial and error you will understand how it works.
Have fun!
Since you said:
I know that each neuron divides the area to two subareas
&
weights are just moving the line if we consider geometric representation
I think you want to use a perceptron or an ADALINE neural network. These networks can only classify linearly separable patterns. Since your input data is more complicated than that, it's better to use a multilayer non-linear neural network (my suggestion is a two-layer network with a tanh activation function). To train such a network you should use the backpropagation algorithm.
For answering to
how should I deal with the inputs?
I need to know more details about the inputs (e.g., are they just height and hair length or is there more, what is their range, what is your resolution, etc.).
If you're dealing with just height and hair length, I suggest dividing the heights and lengths into classes (for example 160cm-165cm, 165cm-170cm, etc.) and setting an on/off input neuron for each class. Then put one hidden layer after all the classes related to height and another hidden layer after all the classes related to hair length (tanh activation function). The number of neurons in these two hidden layers is determined by the number of training cases.
Then take the outputs of these two hidden layers and send them to an aggregation layer with one output neuron.
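As a small sketch of the suggested on/off input encoding (the 5 cm bands and their boundaries are illustrative):

# Each 5 cm height band gets its own binary input neuron.
def encode_height(height_cm, lo=150, hi=190, step=5):
    bins = [(b, b + step) for b in range(lo, hi, step)]
    return [1 if low <= height_cm < high else 0 for low, high in bins]

print(encode_height(167))  # only the 165-170 cm neuron is on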

Backpropagation overall error chart with very small slope... Is this normal?

I'm training a neural network with backpropagation algorithm and this is the chart of Overall Errors:
(I'm calculating the overall error with the formula from http://www.colinfahey.com/neural_network_with_back_propagation_learning/neural_network_with_back_propagation_learning_en.html, Part 6.3: Overall training error.)
I fitted a power trendline, and after calculations I saw that at epochs = 13000 the overall error would still be 0.2.
Isn't this too high?
Is this chart normal? It seems the training process will take too long... right? What should I do? Isn't there any faster way?
EDIT: My neural network has a hidden layer with 200 neurons, and my input and output layers have 10-12 neurons. My problem is clustering characters (it clusters Persian characters into some clusters with supervised training).
So you are using an ANN with a 200-neuron hidden layer and 10-12 nodes in the input and output layers. What activation function, if any, are you using for your hidden layer and output layer?
Is this a standard back propagation training algorithm and what training function are you using?
Each type of training function will affect the speed of training and, in some cases, its ability to generalise; you don't want to train against your data such that your neural network is only good for your training data.
So ideally you want decent training data, which could be a sub-sample of your real data, say 15%.
You could train your network using a conjugate-gradient-based algorithm:
http://www.mathworks.co.uk/help/toolbox/nnet/ug/bss331l-1.html#bss331l-2
This will train your network quickly.
10-12 nodes may not be ideal for your data; you can try changing the number in blocks of 5, or add another layer. In general, more layers will improve your network's ability to classify the problem, but will increase the computational complexity and hence slow down training.
Presumably these 10-12 nodes are 'features' you are trying to classify?
If so, you may wish to normalise them: rescale each to between 0 and 1, or -1 to 1, depending on your activation function (e.g. tanh will produce values in the range -1 to +1):
http://www.heatonresearch.com/node/706
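A minimal sketch of that rescaling (min/max taken over each feature's training values; the [-1, 1] target range matches tanh):

# Rescale a feature's values linearly into [lo, hi].
def rescale(values, lo=-1.0, hi=1.0):
    x_min, x_max = min(values), max(values)
    return [lo + (hi - lo) * (v - x_min) / (x_max - x_min) for v in values]

print(rescale([3.0, 7.0, 11.0]))  # [-1.0, 0.0, 1.0]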
You may also train a neural network to identify the ideal number of nodes you should have in your hidden layer.

Multiple Output Neural Network

I have built my first neural network in Python, and I've been playing around with a few datasets; it's going well so far!
I have a quick question regarding modelling events with multiple outcomes:
Say I wish to train a network to tell me the probability of each runner winning a 100m sprint. I would give the network all of the relevant data regarding each runner, and the number of outputs would be equal to the number of runners in the race.
My question is: using a sigmoid function, how can I ensure the sum of the outputs will be equal to 1.0? Will the network naturally learn to do this, or will I have to somehow make this happen explicitly? If so, how would I go about doing this?
Many thanks.
The output from your neural network will approach 1. I don't think it will actually get to 1.
You actually don't need to see which output is equal to 1. Once you've trained your network down to a specific error level, when you present the inputs, just look for the maximum output in your output layer. For example, let's say your output layer presents the following output: [0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002]; then the runner that won the race is runner number 4.
So yes, your network will "learn" to produce 1, but it won't exactly be 1. This is why you train to within a certain error rate. I recently created a neural network to recognize handwritten digits, and this is the method that I used. In my output layer, I have a vector with 10 components. The first component represents 0, and the last component represents 9. So when I present a 4 to the network, I expect the output vector to look like [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. Of course, it's not what I get exactly, but it's what I train the network to provide. So to find which digit it is, I simply check to see which component has the highest output or score.
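For example, the 'pick the maximum' step is just an argmax over the output vector:

# Index of the largest output; with 0-based indexing the winner below
# is index 3, i.e. the 4th runner.
outputs = [0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002]
winner = max(range(len(outputs)), key=lambda i: outputs[i])
print(winner)  # 3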
Now, your second question, I believe, asks how the network learns to provide the correct answer. To do this, you need to provide your network with some training data and train it until the output is under a certain error threshold. So what you need is a set of data that contains the inputs and the correct outputs. Initially your neural network will be set up with random weights (there are some algorithms that help you select better weights to minimize training time, but that's a little more advanced). Next you need a way to tell the neural network to learn from the data provided. So basically you give the data to the neural network and it provides an output, which is highly likely to be wrong. Then you compare that output with the expected (correct) output and you tell the neural network to update its weights so that it gets closer to the correct answer. You do this over and over again until the error is below a certain threshold.
The easiest way to do this is to implement the stochastic backpropagation algorithm. In this algorithm, you calculate the error between the actual output of the neural network and the expected output. Then you backpropagate the error from the output layer all the way up to the weights to the hidden layer, adjusting the weights as you go. Then you repeat this process until the error that you calculate is below a certain threshold. So during each step, you're getting closer and closer towards your solution.
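As a very rough sketch of that loop (one hidden layer, sigmoid activations, plain Python; every name here is illustrative and not taken from any particular library):

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hidden, w_output, x):
    xb = x + [1.0]  # bias folded in as an extra input fixed to 1.0
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_output]
    return hidden, output

def train_step(w_hidden, w_output, x, target, lr=0.5):
    hidden, output = forward(w_hidden, w_output, x)
    # Output-layer deltas: (expected - actual) times the sigmoid derivative.
    d_out = [(t - o) * o * (1 - o) for o, t in zip(output, target)]
    # Hidden-layer deltas: backpropagate the output deltas through the
    # output weights (computed BEFORE those weights are updated).
    d_hid = [h * (1 - h) * sum(d * w_output[k][j] for k, d in enumerate(d_out))
             for j, h in enumerate(hidden)]
    hb = hidden + [1.0]
    for k, row in enumerate(w_output):      # update output weights
        for j in range(len(row)):
            row[j] += lr * d_out[k] * hb[j]
    xb = x + [1.0]
    for j, row in enumerate(w_hidden):      # update hidden weights
        for i in range(len(row)):
            row[i] += lr * d_hid[j] * xb[i]
    return sum((t - o) ** 2 for o, t in zip(output, target))

# Train on XOR until the summed squared error drops below a threshold.
# Convergence depends on the random initialization; a stuck run may
# need a different seed or more epochs.
random.seed(1)
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [[random.uniform(-1, 1) for _ in range(3)]]
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
for epoch in range(100000):
    error = sum(train_step(w_h, w_o, x, t) for x, t in data)
    if error < 0.01:
        break
print(epoch, error)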
You can use the algorithm described here. There is a decent amount of math involved, so be prepared for that! If you want to see an example of an implementation of this algorithm, you can take a look at this Java code that I have on github. The code uses momentum and a simple form of simulated annealing as well, but the standard backpropagation algorithm should be easily discernible. The Wikipedia article on backpropagation has a link to an implementation of the backpropagation algorithm in Python.
You're probably not going to understand the algorithm immediately; expect to spend some time understanding it and working through some of the math. I sat down with a pencil and paper as I was coding, and that's how I eventually understood what was going on.
Here are a few resources that should help you understand backpropagation a little better:
The learning process: backpropagation
Error backpropagation
If you want some more resources, you can also take a look at my answer here.
Basically you want a function of multiple real numbers that converts those real numbers into probabilities (each between 0 and 1, summing to 1). You can do this easily by post-processing the output of your network.
Your network gives you real numbers r1, r2, ..., rn that increase with the probability of each runner winning the race.
Then compute exp(r1), exp(r2), ..., and sum them: ers = exp(r1) + exp(r2) + ... + exp(rn). The probability that the first racer wins is then exp(r1) / ers.
This is one use of the Boltzmann distribution: http://en.wikipedia.org/wiki/Boltzmann_distribution
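In code, this post-processing step is the standard softmax function; a minimal sketch:

import math

def softmax(scores):
    # Subtract the max for numerical stability; it cancels out and
    # does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # the probabilities sum to 1.0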
Your network should work that out and learn it naturally eventually.
To make the network learn that a little faster, here's what springs to mind first:
Add an additional output called 'sum' (summing all the other output neurons). If you want all the output neurons to be in a separate layer, just add a layer of outputs: the first numRunners outputs each connect to the corresponding neuron in the previous layer, and the (numRunners+1)-th neuron connects to all the neurons from the previous layer, with its weights fixed to 1.
The training set would contain 0-1 vectors for each runner (did / did not run), and the "expected" result would be a 0-1 vector 00..00001000..01, the first 1 marking the runner that won the race, the last 1 marking the "sum" of "probabilities".
For unknown races, the network would then try to predict which runner will win. Since the outputs have continuous values (more or less :D), they can be read as "the certainty of the network that the runner will win the race", which is what you're looking for.
Even without the additional sum neuron, this is a rough description of how the training data should be arranged.
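A minimal sketch of building such a target vector (the trailing 1 is the extra "sum" output described above; names illustrative):

# One slot per runner plus a final 'sum' slot fixed to 1.
def make_target(num_runners, winner_index):
    target = [0] * num_runners
    target[winner_index] = 1
    return target + [1]

print(make_target(6, 3))  # [0, 0, 0, 1, 0, 0, 1]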

Is neural network's response guaranteed on training data?

I'm trying to train an ANN (I use this library: http://leenissen.dk/fann/ ) and the results are somewhat puzzling: basically, if I run the trained network on the same data used for training, the output is not what was specified in the training set, but some random number.
For example, the first entry in the training file is something like
88.757004 88.757004 104.487999 138.156006 100.556000 86.309998 86.788002
1
with the first line being the input values and the second line being the desired output neuron's value. But when I feed the exact same data to the trained network, I get different results on each training attempt, and they are quite different from 1, e.g.:
Max epochs 500000. Desired error: 0.0010000000.
Epochs 1. Current error: 0.0686412785. Bit fail 24.
Epochs 842. Current error: 0.0008697828. Bit fail 0.
my test result -4052122560819626000.000000
and then on another attempt:
Max epochs 500000. Desired error: 0.0010000000.
Epochs 1. Current error: 0.0610717005. Bit fail 24.
Epochs 472. Current error: 0.0009952184. Bit fail 0.
my test result -0.001642
I realize that the training set size may be inadequate (I only have about 100 input/output pairs so far), but shouldn't at least the training data trigger the right output value? The same code works fine for the "getting started" XOR function described on FANN's website (I've already used up my one-link limit).
Short answer: No
Longer answer (but possibly not as correct):
1st: A training run only moves the weights of the neurons towards a position where they produce output closer to what the training data specifies. After some/many iterations the output should be close to the expected output, if the neural network is up to the task at all, which brings me to
2nd: Not every neural network works for every problem. For a single neuron it is pretty easy to come up with a simple function that cannot be approximated by that neuron (XOR is the classic example). Though not as easy to see, the same kind of limit applies to every neural network. In such cases your results will very likely look like random numbers. Edit after comment: In many cases this can be fixed by adding neurons to the network.
3rd: Actually, the first point is a strength of a neural network, because it allows the network to handle outliers nicely.
4th: I blame 3 for my lacking understanding of music. It just doesn't fit my brain ;-)
No, if you get your ANN to work perfectly on the training data, you either have a really easy problem or you're overfitting.
