How to Identify Recurrent Connections in an Arbitrary Neural Network - artificial-intelligence

I am trying to implement Neuro-Evolution of Augmenting Topologies in C#. I am running into a problem with recurrent connections. I understand that, for a recurrent connection, the output is basically temporally displaced.
http://i.imgur.com/FQYjCLZ.png
In the linked image, I show a pretty simple neural network with 2 inputs, 3 hidden nodes, and one output. Without an activation function or transfer function, I think it would be evaluated as:
n3[t] = ((i1[t]*a + n6[t-1]*e)*d + i2[t]*b*c) * f
However, I am having a hard time figuring out how to identify the fact that the link e is a recurrent connection. The paper I read about NEAT showed that the minimal solutions to both the XOR problem and the double pole balancing (no velocities) problem had recurrent connections.
It seems rather straightforward if you have a fixed topology, because you can analyze the topology yourself and identify which connections you need to time-delay.
How exactly would you identify these connections?

I had a similar problem when I started implementing this paper. I don't know what your network looks like at the moment, so I'll explain what I did.
My network starts out as input and output layers only. To create connections and neurons I implemented a kind of DNA (in my case an array of instructions like 'connect neuron nr. 2 with neuron nr. 5 and set the weight to 0.4'). Every neuron in my network has a "layerNumber" which tells me where the neuron sits in the network. This layerNumber is set for every input and output neuron: for input neurons I used Double.MIN_VALUE and for output neurons I used Double.MAX_VALUE.
This is the basic setup. From now on just follow these rules when modifying the network:
Whenever you want to create a connection, make sure the 'from' neuron has a layerNumber < Double.MAX_VALUE.
Whenever you want to create a connection, make sure that the 'to' neuron has a bigger layerNumber than the 'from' neuron.
Whenever a connection is split up into 2 connections with a new neuron between them, set the new neuron's layerNumber to NeuronFrom.layerNumber*0.5 + NeuronTo.layerNumber*0.5.
This is important: you can't simply add them and divide by 2, because the sum would likely exceed Double.MAX_VALUE and overflow (for doubles that means you get Infinity rather than a wrapped-around negative number), so take the halves first.
If you follow all these rules you should always end up with forward connections only, no recurrent ones. If you do want recurrent connections, you can create them by simply swapping 'from' and 'to' while creating a new connection. A small code sketch of these rules follows after the pro tricks below.
Pro tricks:
Use only one ArrayList of Neurons.
Make the DNA use ID's of neurons to find them, but create a 'Connection' class which will have the Neuron objects as attributes.
When filtering your connections/neurons use ArrayList.stream().filter()
When later propagating through the network you can just sort your neurons by layerNumber, set the input values, and go through all neurons with a for() loop. Calculate each neuron's output value and transfer it to every neuron that has a connection whose 'from' is == the current neuron.
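Here is a rough Java sketch of what I mean. The class shapes, the constant names and the tanh activation are only illustrative (not my actual code); the input neurons' output values are assumed to be set directly before propagate() is called.

import java.util.*;

class Network {
    static final double INPUT_LAYER  = -Double.MAX_VALUE; // smallest layer number (the post uses Double.MIN_VALUE; any smallest value works)
    static final double OUTPUT_LAYER =  Double.MAX_VALUE;  // largest layer number

    static class Neuron {
        final int id;
        double layerNumber, input, output;
        Neuron(int id, double layerNumber) { this.id = id; this.layerNumber = layerNumber; }
    }

    static class Connection {
        final Neuron from, to;
        double weight;
        Connection(Neuron from, Neuron to, double weight) { this.from = from; this.to = to; this.weight = weight; }
    }

    final List<Neuron> neurons = new ArrayList<>();        // the single neuron list
    final List<Connection> connections = new ArrayList<>();

    // Rules 1 and 2: only forward connections may be created.
    boolean canConnect(Neuron from, Neuron to) {
        return from.layerNumber < OUTPUT_LAYER && from.layerNumber < to.layerNumber;
    }

    // Rule 3: a neuron that splits an existing connection sits halfway between its endpoints.
    // Taking the halves first avoids overflowing when one endpoint is Double.MAX_VALUE.
    double splitLayerNumber(Connection c) {
        return c.from.layerNumber * 0.5 + c.to.layerNumber * 0.5;
    }

    // Forward propagation: sort by layerNumber, then push every neuron's output
    // along each connection that starts at it.
    void propagate() {
        neurons.sort(Comparator.comparingDouble(n -> n.layerNumber));
        for (Neuron n : neurons) {
            if (n.layerNumber != INPUT_LAYER) n.output = Math.tanh(n.input); // inputs keep the value set from outside
            n.input = 0;                                                     // reset the accumulator for the next pass
            for (Connection c : connections)
                if (c.from == n) c.to.input += n.output * c.weight;
        }
    }
}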
Hope it's not too complicated...

Okay, so instead of telling you to just not have recurrent connections, I'm actually going to tell you how to identify them.
First thing you need to know is that recurrent connections are calculated after all other connections and neurons. So which connection is recurrent and which is not depends on the order of calculation in your NN.
Also, the first time you put data into the system we'll just assume that every recurrent connection carries zero (every neuron's previous output is zero), otherwise some or all neurons could not be calculated.
Let's say we have this neural network:
(image: an example network with input neurons 1-2, hidden neurons 5-7, and output neurons 3-4)
We divide this network into 3 layers (even though conceptually it has 4 layers):
Input Layer [1, 2]
Hidden Layer [5, 6, 7]
Output Layer [3, 4]
First rule: All outputs from the output layer are recurrent connections.
Second rule: All outputs from the input layer may be calculated first.
We create two arrays. One containing the order of calculation of all neurons and connections and one containing all the (potentially) recurrent connections.
Right now these arrays look somewhat like this:
Order of calculation: [1->5, 2->7]
Recurrent: []
Now we begin by looking at the output layer. Can we calculate neuron 3? No, because 6 is missing. Can we calculate 6? No, because 5 is missing. And so on. It looks somewhat like this:
3, 6, 5, 7
The problem is that we are now stuck in a loop. So we introduce a temporary array storing all the neuron id's that we already visited:
[3, 6, 5, 7]
Now we ask: Can we calculate 7? No, because 6 is missing. But we already visited 6...
[3, 6, 5, 7] <- 6
Third rule is: When you visit a neuron that has already been visited before, set the connection that you followed to this neuron as a recurrent connection.
Now your arrays look like this:
Order of calculation: [1->5, 2->7]
Recurrent: [6->7]
Now you finish the process and, at the end, join the order-of-calculation array with your recurrent array so that the recurrent array comes after the other one.
It looks somewhat like this:
[1->5, 2->7, 7, 7->4, 7->5, 5, 5->6, 6, 6->3, 3, 4, 6->7]
Let's assume we have [x->y, y],
where x->y is the calculation of x*weight(x->y),
and where y is the calculation of Sum(of inputs to y), so in this case Sum(x->y) or just x->y.
There are still some problems to solve here. For example: what if the only input of a neuron is a recurrent connection? But I guess you'll be able to solve this problem on your own...
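If you want this as code: the rule above is just ordinary cycle detection. Here is a minimal Java sketch (neuron ids, the adjacency map and all names are only illustrative) that walks backwards from the outputs and marks a connection as recurrent when its source neuron is still unresolved on the current path.

import java.util.*;

class RecurrentFinder {
    enum State { UNSEEN, IN_PROGRESS, DONE }

    static Set<String> findRecurrent(Map<Integer, List<Integer>> incoming, List<Integer> outputs) {
        Map<Integer, State> state = new HashMap<>();
        Set<String> recurrent = new HashSet<>();               // stored as "from->to"
        for (int out : outputs) visit(out, incoming, state, recurrent);
        return recurrent;
    }

    static void visit(int n, Map<Integer, List<Integer>> incoming,
                      Map<Integer, State> state, Set<String> recurrent) {
        state.put(n, State.IN_PROGRESS);
        for (int from : incoming.getOrDefault(n, List.of())) {
            State s = state.getOrDefault(from, State.UNSEEN);
            if (s == State.IN_PROGRESS) {
                recurrent.add(from + "->" + n);                // rule 3: revisited on the current path
            } else if (s == State.UNSEEN) {
                visit(from, incoming, state, recurrent);
            }
            // s == DONE: the source is already resolved, so this is a normal forward connection
        }
        state.put(n, State.DONE);
    }

    public static void main(String[] args) {
        // The example network: inputs 1-2, hidden 5-7, outputs 3-4.
        Map<Integer, List<Integer>> incoming = Map.of(
            5, List.of(1, 7),
            7, List.of(2, 6),
            6, List.of(5),
            3, List.of(6),
            4, List.of(7));
        System.out.println(findRecurrent(incoming, List.of(3, 4)));   // prints [6->7]
    }
}

On the example network this prints [6->7], matching the walkthrough above. Note that a connection into a neuron that is already finished (like 7->4) is not flagged, which is why the sketch distinguishes 'in progress' from 'done'.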

Related

Artificial Neural Network: Multi-Layer Perceptron ORDER/PROCESS

I am currently learning pattern recognition. I have a 7 year background in programming, so, I think like a programmer.
The documentation on ANNs tells me nothing about what order everything is processed in, or at least does not make it very clear. This is annoying because I don't know how to code the formulas.
I found a nice gif which I hope is correct. Can someone please give me a step-by-step process of artificial neural network backpropagation with, for example, 2 inputs, 1 hidden layer with 3 nodes, and 2 outputs, using the sigmoid.
Here is the gif.
As Emile said, you go layer by layer from input to output, and then you propagate the error backwards again, layer by layer.
From what you have said I expect that you are trying to make an "object oriented" implementation where every neuron is an object. But that is not the fastest nor the easiest way. The most usual implementation is done by matrix operations, where
every layer is described by a single matrix (every row contains the weights of one neuron plus the threshold).
This MATLAB code should do the trick:
output_hidden = logsig( hidden_layer * [inputs ; 1] );
inputs is a column vector of inputs to the layer
hidden_layer is the matrix of weights, plus one extra column which holds the thresholds of the hidden layer
output_hidden is again a column vector, the outputs of all neurons in the layer, which can be used as input to the next layer
logsig is a function which applies the sigmoid transform to all members of a vector, one by one
[inputs ; 1] creates a new vector with a 1 appended to the end of the column vector inputs; it is there because you need a "virtual input" for the thresholds to be multiplied with.
If you think about it you will see that the matrix multiplication does exactly the summation over all inputs multiplied by their weights for each output, and you will also see that it doesn't matter in what order you do it. To implement it in any other language just find yourself a good linear-algebra library. Implementing back-propagation is a bit trickier and you will need to do some matrix transpositions (i.e. flipping the matrix along its diagonal).
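If you would rather see it without MATLAB, here is roughly the same layer-by-layer computation written out in Java for the asked-for 2-input / 3-hidden / 2-output sigmoid network. The weight values are made-up placeholders; the last column of each matrix holds the threshold, i.e. the weight of the "virtual input" 1.

public class ForwardPass {
    // one layer: out = logsig(W * [in ; 1]) written as plain loops
    static double[] layer(double[][] w, double[] in) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double sum = w[i][in.length];                  // threshold term (the appended 1)
            for (int j = 0; j < in.length; j++) sum += w[i][j] * in[j];
            out[i] = 1.0 / (1.0 + Math.exp(-sum));         // logsig / sigmoid
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] hidden = { {0.1, 0.2, 0.3}, {0.4, 0.5, 0.6}, {0.7, 0.8, 0.9} }; // 3 neurons x (2 inputs + threshold)
        double[][] output = { {0.1, 0.2, 0.3, 0.4}, {0.5, 0.6, 0.7, 0.8} };        // 2 neurons x (3 inputs + threshold)
        double[] in = {1.0, 0.0};
        double[] h = layer(hidden, in);     // hidden layer first
        double[] o = layer(output, h);      // then the output layer
        System.out.println(java.util.Arrays.toString(o));
    }
}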
As you can see in the gif, processing is per layer. As there are no connections within a layer, the processing order within a layer does not matter. Using the ANN (classifying) is done from input layer through hidden layers to the output layer. Training (using backpropagation) is done from output layer back to input layer.

How to determine the threshold for neuron firings in neural networks?

I have a simple task to classify people by their height and hair length to either MAN or WOMAN category using a neural network. Also teach it the pattern with some examples and then use it to classify on its own.
I have a basic understanding of neural networks but would really need some help here.
I know that each neuron divides the area to two subareas, basically that is why P = w0 + w1*x1 + w2*x2 + ... + wn*xn is being used here (weights are just moving the line if we consider geometric representation).
I do understand that each epoch should modify the weights to get closer to the correct result, yet I have never programmed it and I am at a loss about how to start.
How should I proceed, meaning: how can I determine the threshold and how should I deal with the inputs?
It is not homework, rather a task for those who are interested. I am, and I would like to understand it.
Looks like you are dealing with a simple Perceptron with a threshold activation function. Have a look at this question. Since you ARE using a bias neuron (w0), you would set the threshold to 0.
You then simply take the output of your network and compare it to 0, so you would e.g. output class 1 if x < 0 and class 2 if x > 0. You could model the case x=0 as "indistinct".
For learning the weights you need to apply the Delta Learning Rule which can be implemented very easily. But be careful: a perceptron with a simple threshold activation function can only be correct if your data are linearly separable. If you have more complex data you will need a Multilayer Perceptron and a nonlinear activation function like the Logistic Sigmoid Function.
Have a look at Geoffrey Hinton's Coursera course, Lecture 2, for details.
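To make that concrete, here is a minimal perceptron sketch trained with the delta rule on the height / hair-length task. The sample data, learning rate and class encoding (+1 = man, -1 = woman) are invented purely for illustration.

public class GenderPerceptron {
    public static void main(String[] args) {
        double[][] x = { {180, 5}, {175, 10}, {165, 40}, {160, 35} };  // height (cm), hair length (cm)
        int[] label  = {  1,        1,        -1,        -1 };          // +1 = man, -1 = woman
        double w0 = 0, w1 = 0, w2 = 0;                                  // bias plus one weight per input
        double rate = 0.001;

        for (int epoch = 0; epoch < 1000; epoch++) {
            for (int i = 0; i < x.length; i++) {
                double p = w0 + w1 * x[i][0] + w2 * x[i][1];            // P = w0 + w1*x1 + w2*x2
                int predicted = p >= 0 ? 1 : -1;                        // threshold at 0
                int error = label[i] - predicted;                       // delta rule: 0 if correct, +-2 if wrong
                w0 += rate * error;
                w1 += rate * error * x[i][0];
                w2 += rate * error * x[i][1];
            }
        }
        System.out.printf("w0=%.3f w1=%.3f w2=%.3f%n", w0, w1, w2);
    }
}

For this toy data (which is linearly separable) the weights settle on a separating line after a few epochs.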
I've been working with machine learning lately (but I'm not an expert), and you should look at the Accord.NET framework. It contains all the common machine learning algorithms out of the box, so it's easy to take an existing sample and modify it instead of starting from scratch. Also, the developer of the framework is very helpful in the forum available on the same page.
With the available samples, you may also discover something better than a neural network, like the Kernel Support Vector Machine. If you stick to the neural network, have fun modifying all the different variables, and by trial and error you will understand how it works.
Have fun!
Since you said:
I know that each neuron divides the area to two subareas
&
weights are just moving the line if we consider geometric representation
I think you want to use a perceptron or ADALINE neural network. These networks can only classify linearly separable patterns. Since your input data is more complicated, it's better to use a multi-layer non-linear neural network (my suggestion is a two-layer neural network with a tanh activation function). For training such networks you should use the backpropagation algorithm.
To answer
how should I deal with the inputs?
I need to know more details about the inputs (like: are they just height and hair length or is there more, what is their range, what resolution do you need, etc.).
If you're dealing with just height and hair length, I suggest you divide heights and lengths into classes (for example 160cm-165cm, 165cm-170cm, etc.) and set an on/off input neuron for each class. Then put one hidden layer after all the classes related to height and another hidden layer after all the classes related to hair length (tanh activation function). The number of neurons in these two hidden layers is determined by the number of training cases.
Then take the outputs of these two hidden layers and send them to an aggregation layer with 1 output neuron.
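As a small illustration of the on/off class encoding (the 5 cm bins and the 150-190 range are just examples, pick whatever resolution fits your data):

public class OneHotHeight {
    // turns a height into a vector of on/off class neurons: 150-155, 155-160, ..., 185-190
    static double[] encodeHeight(double heightCm) {
        double[] bins = new double[8];
        int index = (int) ((heightCm - 150) / 5);
        if (index >= 0 && index < bins.length) bins[index] = 1.0;
        return bins;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encodeHeight(167)));  // switches on the 165-170 neuron
    }
}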

Multiple Output Neural Network

I have built my first neural network in Python, and I've been playing around with a few datasets; it's going well so far!
I have a quick question regarding modelling events with multiple outcomes:
Say I wish to train a network to tell me the probability of each runner winning a 100m sprint. I would give the network all of the relevant data regarding each runner, and the number of outputs would be equal to the number of runners in the race.
My question is, using a sigmoid function, how can I ensure the sum of the outputs will be equal to 1.0? Will the network naturally learn to do this, or will I have to somehow make this happen explicitly? If so, how would I go about doing this?
Many Thanks.
The output from your neural network will approach 1. I don't think it will actually get to 1.
You actually don't need to see which output is equal to 1. Once you've trained your network to a specific error level, when you present the inputs, just look for the maximum output in your output layer. For example, let's say your output layer presents the following output: [0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002]; then the runner that won the race is runner number 4.
So yes, your network will "learn" to produce 1, but it won't exactly be 1. This is why you train to within a certain error rate. I recently created a neural network to recognize handwritten digits, and this is the method that I used. In my output layer, I have a vector with 10 components. The first component represents 0, and the last component represents 9. So when I present a 4 to the network, I expect the output vector to look like [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. Of course, it's not what I get exactly, but it's what I train the network to provide. So to find which digit it is, I simply check to see which component has the highest output or score.
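Picking the answer is then just an argmax over the output vector, something like this little sketch (the names are illustrative):

class ArgMax {
    static int indexOfMax(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++)
            if (outputs[i] > outputs[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        double[] out = {0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002};
        System.out.println(indexOfMax(out) + 1);    // prints 4: runner number 4 won
    }
}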
Now in your second question, I believe you're asking how the network would learn to provide the correct answer? To do this, you need to provide your network with some training data and train it until the output is under a certain error threshold. So what you need is a set of data that contains the inputs and the correct output. Initially your neural network will be set up with random weights (there are some algorithms that help you select better weights to minimize training time, but that's a little more advanced). Next you need a way to tell the neural network to learn from the data provided. So basically you give the data to the neural network and it provides an output, which is highly likely to be wrong. Then you compare that data with the expected (correct) output and you tell the neural network to update its weights so that it gets closer to the correct answer. You do this over and over again until the error is below a certain threshold.
The easiest way to do this is to implement the stochastic backpropagation algorithm. In this algorithm, you calculate the error between the actual output of the neural network and the expected output. Then you backpropagate the error from the output layer all the way up to the weights to the hidden layer, adjusting the weights as you go. Then you repeat this process until the error that you calculate is below a certain threshold. So during each step, you're getting closer and closer towards your solution.
You can use the algorithm described here. There is a decent amount of math involved, so be prepared for that! If you want to see an example of an implementation of this algorithm, you can take a look at this Java code that I have on github. The code uses momentum and a simple form of simulated annealing as well, but the standard backpropagation algorithm should be easily discernible. The Wikipedia article on backpropagation has a link to an implementation of the backpropagation algorithm in Python.
You're probably not going to understand the algorithm immediately; expect to spend some time understanding it and working through some of the math. I sat down with a pencil and paper as I was coding, and that's how I eventually understood what was going on.
Here are a few resources that should help you understand backpropagation a little better:
The learning process: backpropagation
Error backpropagation
If you want some more resources, you can also take a look at my answer here.
Basically you want a function of multiple real numbers that converts those real numbers into probabilities (each between 0 and 1, summing to 1). You can do this easily by post-processing the output of your network.
Your network gives you real numbers r1, r2, ..., rn that increase with the probability of each runner winning the race.
Then compute exp(r1), exp(r2), ..., and sum them up: ers = exp(r1) + exp(r2) + ... + exp(rn). The probability that the first racer wins is then exp(r1) / ers.
This is one use of the Boltzmann distribution. http://en.wikipedia.org/wiki/Boltzmann_distribution
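A minimal sketch of that post-processing (this is just the softmax function; the example values are made up):

class Softmax {
    static double[] toProbabilities(double[] r) {
        double[] p = new double[r.length];
        double sum = 0;
        for (int i = 0; i < r.length; i++) { p[i] = Math.exp(r[i]); sum += p[i]; }
        for (int i = 0; i < r.length; i++) p[i] /= sum;   // exp(ri) / (exp(r1) + ... + exp(rn))
        return p;
    }

    public static void main(String[] args) {
        double[] raw = {2.0, 1.0, 0.5};                   // raw network outputs
        System.out.println(java.util.Arrays.toString(toProbabilities(raw)));  // entries sum to 1.0
    }
}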
Your network should work around that and learn it naturally eventually.
To make the network learn that a little faster, here's what springs to mind first:
add an additional output called 'sum' (summing all the other output neurons) -- if you want all the output neurons to be in a separate layer, just add a layer of outputs where the first numRunners outputs each connect to the corresponding neuron in the previous layer, and the last, (numRunners+1)-th, neuron connects to all the neurons of the previous layer with its weights fixed to 1
the training set would contain 0-1 vectors for each runner (did / did not run), and the "expected" result would be a 0-1 vector 00..00001000..01, the first 1 marking the runner that won the race and the last 1 marking the "sum" of "probabilities"
for the unknown races, the network would try to predict which runner would win. Since the outputs have continuous values (more or less :D) they can be read as "the certainty of the network that the runner would win the race" -- which is what you're looking for
Even without the additional sum neuron, this is the rough description of the way the training data should be arranged.

Effects of randomizing the order of inputs to a neural network

For my Advanced Algorithms and Data Structures class, my professor asked us to pick any topic that interested us. He also told us to research it and to try and implement a solution in it. I chose Neural Networks because it's something that I've wanted to learn for a long time.
I've been able to implement an AND, OR, and XOR using a neural network whose neurons use a step function for the activator. After that I tried to implement a back-propagating neural network that learns to recognize the XOR operator (using a sigmoid function as the activator). I was able to get this to work 90% of the time by using a 3-3-1 network (1 bias at the input and hidden layer, with weights initialized randomly). At other times it seems to get stuck in what I think is a local minima, but I am not sure (I've asked questions on this before and people have told me that there shouldn't be a local minima).
The 90% of the time it was working, I was consistently presenting my inputs in this order: [0, 0], [0, 1], [1, 0], [1, 1] with the expected output set to [0, 1, 1, 0]. When I present the values in the same order consistently, the network eventually learns the pattern. It actually doesn't matter in what order I send them, as long as it is the exact same order for each epoch.
I then implemented a randomization of the training set, and so this time the order of inputs is sufficiently randomized. I've noticed now that my neural network gets stuck and the errors are decreasing, but at a very small rate (which is getting smaller at each epoch). After a while, the errors start oscillating around a value (so the error stops decreasing).
I'm a novice at this topic and everything I know so far is self-taught (reading tutorials, papers, etc.). Why does the order of presentation of inputs change the behavior of my network? Is it because the change in error is consistent from one input to the next (because the ordering is consistent), which makes it easy for the network to learn?
What can I do to fix this? I'm going over my backpropagation algorithm to make sure I've implemented it right; currently it is implemented with a learning rate and a momentum. I'm considering looking at other enhancements like an adaptive learning-rate. However, the XOR network is often portrayed as a very simple network and so I'm thinking that I shouldn't need to use a sophisticated backpropagation algorithm.
The order in which you present the observations (input vectors) comprising your training set to the network only matters in one respect: a randomized arrangement of the observations with respect to the response variable is strongly preferred over an ordered arrangement.
For instance, suppose you have 150 observations comprising your training set, and for each one the response variable is one of three class labels (class I, II, or III), such that observations 1-50 are in class I, 51-100 in class II, and 101-150 in class III. What you do not want to do is present them to the network in that order. In other words, you do not want the network to see all 50 observations in class I, then all 50 in class II, then all 50 in class III.
What happened during training your classifier? Well, initially you were presenting the four observations to your network unordered with respect to the response, i.e. [0, 1, 1, 0].
I wonder what the ordering of the input vectors was in those instances in which your network failed to converge. If it was [1, 1, 0, 0] or [0, 0, 1, 1], this is consistent with the well-documented empirical rule I mentioned above.
On the other hand, I have to wonder whether this rule even applies in your case. The reason is that you have so few training instances that even if the order is [1, 1, 0, 0], training over multiple epochs (which I am sure you must be doing) means that this ordering looks more "randomized" than the exemplar I mentioned above (i.e., [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0] is how the network would be presented with the training data over three epochs).
Some suggestions to diagnose the problem:
As I mentioned above, look at the ordering of your input vectors in the non-convergence cases--are they sorted by response variable?
In the non-convergence cases, look at your weight matrices (I assume you have two of them). Look for any values that are very large (e.g., 100x the others, or 100x the value they were initialized with). Very large weights can cause overflow.
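If it helps, reshuffling the training set before every epoch is only a couple of lines; here is a minimal sketch (the Sample type and the training step are placeholders for your own code):

import java.util.*;

class ShuffledTraining {
    record Sample(double[] input, double[] target) {}

    static void train(List<Sample> trainingSet, int epochs) {
        Random rng = new Random(42);                 // fixed seed keeps runs reproducible while debugging
        for (int epoch = 0; epoch < epochs; epoch++) {
            Collections.shuffle(trainingSet, rng);   // new presentation order every epoch
            for (Sample s : trainingSet) {
                // forward pass + backpropagation update for s goes here
            }
        }
    }
}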

How to code an artificial neural network (Tic-tac-toe)? [closed]

I want to play Tic-tac-toe using an artificial neural network. My configuration for the network is as follows:
For each of the 9 fields I use 2 input neurons, so I have 18 input neurons in total. For every field, I have 1 input neuron for a piece of Player 1 and 1 neuron for a piece of Player 2. In addition, I have 1 output neuron which gives an evaluation of the current board position. The higher the output value is, the better the position is for Player 1; the lower it is, the better it is for Player 2.
But my problem is: how could I code that neural network? My idea was to use an array[1-18] for the input neurons. The values of this array are the input weights. Then I would walk through the array using a loop. Whenever there is a neuron to be activated, I add the weight to the output value. So the output value is the sum of the weights of the activated input neurons:
Output = SUM(ActivatedInputNeurons)
Do you think this is a good way of programming the network? Do you have better ideas?
I hope you can help me. Thanks in advance!
Well, you have an input layer of 18 neurons, and an output layer of 1 neuron. That's OK. However, you need to give your neural net the opportunity to put the inputs into relation. For that, you need at least one intermediate layer. I would propose to use 9 neurons in the intermediate layer. Each of these should be connected to each input neuron, and the output neuron should be connected to each intermediate. Each such connection has a weight, and each neuron has an activation level.
Then, you go through all neurons, a layer at a time. The input layer is just activated with the board state. For all further neurons, you go through all its respective connections and sum over the product of the connected neuron's activation level and the weight of the connection. Finally, you calculate the activation level by applying a sigmoid function on this sum.
This is the working principle. Now, you need to train this net to get better results. There are several algorithms for this, you will have to do some googling and reading. Finally, you might want to adjust the number of neurons and layers when the results don't get convincing fast enough. For example, you could reduce the input layer to 9 neurons and activate them with +1 for an X and -1 for an O. Perhaps adding another intermediate layer yields better results, or increasing the number of neurons of a layer.
I don't particularly understand how you expect to get a meaningful summary of the board situation out of one output neuron. I would rather look at having:
I I I O O O
I I I x O O O
I I I O O O
9 input neurons 9 output neurons
in a fully connected network, i.e. 81 weights. Then train the output neurons for the relative desirability of playing in that position.
Have a look at my Tic project. I've solved this problem with both neural network and genetic algorithm. The source code is freely available.
http://www.roncemer.com/tic-tac-toe-an-experiment-in-machine-learning
I think you should implement a 'traditional' feed-forward ANN using transfer functions, as that allows you to train it using back-propagation. The code for these usually ends up being a few lines of code, something like this:
SetupInputs();
for (l = 1 .. layers.count)                    // skip layer 0, the input layer
    for (i = 0 .. layers[l].count)             // each neuron i in layer l
        sum = 0
        for (j = 0 .. layers[l-1].count)       // weighted sum over the previous layer
            sum += layers[l-1][j] * weights[l][i][j]
        layers[l][i] = TransferFunction(sum)
This is an excellent starter project for AI coding, but coming up with a complete solution would be way too big of an answer for SO.
As with most software, I recommend using an object-oriented design. For example: Define a Neuron class which has inputs, weights, and an output function. Then, create several of these Neuron objects in order to build your network.
See the wikipedia article on artificial neural networks for a good starting point.
Good luck with the code! Sounds like a lot of fun.
It is not a direct answer to your question, but you should have a look at the following framework/tool: SNNS or its Java counterpart JavaNNS. I'm pretty sure that there you'll find an answer to your question.
After adding up the weighted inputs, you need to pass the sum through a squashing function; people usually use tanh if you want to allow negative numbers.
edit:
Here is a Java multilayer perceptron implementation that I worked on a few years ago. That one was used for checkers, but with fewer inputs you can use it for tic-tac-toe too.
Also, you will probably need to figure out a way to teach it to win, but that's another problem.
You will save time if you use a neural network library like FANN or Neuroph.
One way to encode your input is with 9 input neurons. The output is also good to be 9 neurons. What I do not see in the other replies is the size of the hidden layer. I suppose you are going to use an MLP with the traditional 3 layers. The size of the hidden layer is always a mystery. I would try 10 hidden neurons.
If the transfer function is sigmoid you can encode input as follows:
0.0 - O player.
1.0 - X player.
0.5 - Empty.
The output of the ANN will be 9 real numbers. In this case some of the cells will be occupied already. You can search for the highest output value which corresponds to an empty cell.
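A small sketch of that encoding and of picking the move; the network itself is left out, and all the numbers are invented for illustration:

class TicTacToeMove {
    static final double O = 0.0, EMPTY = 0.5, X = 1.0;   // the input encoding suggested above

    // take the 9 network outputs and return the empty cell with the highest score
    static int bestMove(double[] board, double[] outputs) {
        int best = -1;
        for (int i = 0; i < 9; i++)
            if (board[i] == EMPTY && (best == -1 || outputs[i] > outputs[best]))
                best = i;
        return best;                                      // -1 means the board is full
    }

    public static void main(String[] args) {
        double[] board   = {X, EMPTY, O,  EMPTY, X, EMPTY,  O, EMPTY, EMPTY};
        double[] outputs = {0.1, 0.7, 0.2, 0.4, 0.9, 0.3, 0.1, 0.8, 0.95};   // pretend these came from the ANN
        System.out.println(bestMove(board, outputs));     // prints 8
    }
}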
