How should perceptron work with 0 inputs (for AND/OR) - artificial-intelligence

I am trying to make a simple perceptron that performs logical AND, but I do not know how to solve the zero-input issue. My weight update rule is:
weight += error * learning_rate * input
So when an input is 0, the corresponding weight will never change, no matter what the error is.
And one more question: in general, when training the perceptron, can I repeat the examples for both classes (let's say have one for 0 and one for 1), or do they need to be different?

That is an interesting and very important insight. That's why you should usually have a bias in a neural network.
Imagine the decision surface of a perceptron as a line of the form
y = w*x + b
When you remove b (the bias) from the equation, you will only be able to learn lines that pass through (0, 0).
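To make this concrete, here is a minimal sketch of a perceptron with a bias term learning AND; the learning rate and epoch count are arbitrary illustrative choices, not anything prescribed above.

# Minimal perceptron with a bias term, trained on logical AND.
# The bias is updated on every example, even when both inputs are 0.
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0
learning_rate = 0.1

for epoch in range(100):
    for (x1, x2), target in training_data:
        output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        error = target - output
        w[0] += learning_rate * error * x1
        w[1] += learning_rate * error * x2
        b += learning_rate * error   # no input factor, so it always moves

for (x1, x2), target in training_data:
    print((x1, x2), 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0)

Without the b update line, the (0, 0) example could never push the decision line away from the origin, which is exactly the issue the question describes.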

Related

How is the range of the last layer of a Neural Network determined when using ReLU

I'm relatively new to Neural Networks.
At the moment I am trying to program a neural network for simple image recognition of the digits 0 to 9.
The activation function I'm aiming for is ReLU (rectified linear unit).
With the sigmoid function it is pretty clear how you can determine a probability for a certain case in the end (because it's between 0 and 1).
But as far as I understand it, with the ReLU we don't have these limitations, but can get any value as a sum of previous "neurons" in the end.
So how is this commonly solved?
Do I just take the biggest of all values and say that's probability 100%?
Do I sum up all the values and say that's the 100%?
Or is there another approach I can't see at the moment?
I hope my question is understandable.
Thanks in advance for taking the time to look at my question.
You can't use the ReLU function as the output activation for classification tasks because, as you mentioned, its range can't represent a probability between 0 and 1. That's why it is typically used only for regression tasks and in hidden layers.
For binary classification, you have to use an output function with a range between 0 and 1, such as the sigmoid. In your case, with multiple classes, you need its multidimensional extension, the softmax function.
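For illustration, a minimal softmax sketch in Python with NumPy; subtracting the max is just a standard numerical-stability trick, not something the math requires.

import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

# ReLU-network outputs (logits) can be any real numbers...
print(softmax(np.array([2.0, 1.0, 0.1])))  # ...but these sum to 1.0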

Multiple Output Neural Network

I have built my first neural network in Python, and I've been playing around with a few datasets; it's going well so far!
I have a quick question regarding modelling events with multiple outcomes:
Say I wish to train a network to tell me the probability of each runner winning a 100m sprint. I would give the network all of the relevant data regarding each runner, and the number of outputs would be equal to the number of runners in the race.
My question is: using a sigmoid function, how can I ensure the sum of the outputs will be equal to 1.0? Will the network naturally learn to do this, or will I have to somehow make this happen explicitly? If so, how would I go about doing this?
Many thanks.
The output from your neural network will approach 1. I don't think it will actually get to 1.
You actually don't need an output that is exactly equal to 1. Once you've trained your network down to a specific error level, when you present the inputs, just look for the maximum output in your output layer. For example, let's say your output layer presents the following output: [0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002]; then the runner that won the race is runner number 4.
So yes, your network will "learn" to produce 1, but it won't exactly be 1. This is why you train to within a certain error rate. I recently created a neural network to recognize handwritten digits, and this is the method that I used. In my output layer, I have a vector with 10 components. The first component represents 0, and the last component represents 9. So when I present a 4 to the network, I expect the output vector to look like [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. Of course, it's not what I get exactly, but it's what I train the network to provide. So to find which digit it is, I simply check to see which component has the highest output or score.
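In NumPy, that "highest component wins" check is a one-liner:

import numpy as np

output = np.array([0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002])
print(np.argmax(output))  # 3 (zero-based index), i.e. runner number 4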
Now in your second question, I believe you're asking how the network learns to provide the correct answer. To do this, you need to provide your network with some training data and train it until the output is under a certain error threshold. So what you need is a set of data that contains the inputs and the correct outputs. Initially your neural network will be set up with random weights (there are some algorithms that help you select better initial weights to minimize training time, but that's a little more advanced). Next you need a way to tell the neural network to learn from the data provided. So basically you give the data to the neural network and it provides an output, which is highly likely to be wrong. Then you compare that output with the expected (correct) output and you tell the neural network to update its weights so that it gets closer to the correct answer. You do this over and over again until the error is below a certain threshold.
The easiest way to do this is to implement the stochastic backpropagation algorithm. In this algorithm, you calculate the error between the actual output of the neural network and the expected output. Then you propagate the error backwards from the output layer through the hidden layers, adjusting the weights as you go. You repeat this process until the error you calculate is below a certain threshold, so during each step you get closer and closer to your solution.
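For illustration, here is a compressed sketch of stochastic backpropagation for a tiny one-hidden-layer sigmoid network learning XOR; the architecture, learning rate, and iteration count are arbitrary choices of mine, not anything from the resources linked below.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)    # input -> hidden
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)    # hidden -> output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets
lr = 0.5

for step in range(20000):
    i = rng.integers(len(X))                  # stochastic: one random example per step
    h = sigmoid(X[i] @ W1 + b1)               # forward pass
    y = sigmoid(h @ W2 + b2)
    d_out = (y - T[i]) * y * (1 - y)          # error signal at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)      # error backpropagated to the hidden layer
    W2 -= lr * np.outer(h, d_out); b2 -= lr * d_out
    W1 -= lr * np.outer(X[i], d_hid); b1 -= lr * d_hid

# After training, the outputs should be close to [0, 1, 1, 0].
# (With an unlucky initialisation XOR can get stuck; try another seed or more hidden units.)
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))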
You can use the algorithm described here. There is a decent amount of math involved, so be prepared for that! If you want to see an example of an implementation of this algorithm, you can take a look at this Java code that I have on github. The code uses momentum and a simple form of simulated annealing as well, but the standard backpropagation algorithm should be easily discernible. The Wikipedia article on backpropagation has a link to an implementation of the backpropagation algorithm in Python.
You're probably not going to understand the algorithm immediately; expect to spend some time understanding it and working through some of the math. I sat down with a pencil and paper as I was coding, and that's how I eventually understood what was going on.
Here are a few resources that should help you understand backpropagation a little better:
The learning process: backpropagation
Error backpropagation
If you want some more resources, you can also take a look at my answer here.
Basically you want a function of multiple real numbers that converts those real numbers into probabilities (each between 0 and 1, summing to 1). You can do this easily by post-processing the output of your network.
Your network gives you real numbers r1, r2, ..., rn that increase with the probability of each runner winning the race.
Then compute exp(r1), exp(r2), ..., and sum them up: ers = exp(r1) + exp(r2) + ... + exp(rn). The probability that the first racer wins is then exp(r1) / ers.
This is one use of the Boltzmann distribution: http://en.wikipedia.org/wiki/Boltzmann_distribution
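As a worked example with made-up scores r1 = 1.0, r2 = 2.0, r3 = 3.0: ers = e^1.0 + e^2.0 + e^3.0 ≈ 2.718 + 7.389 + 20.086 ≈ 30.193, so the win probabilities come out as exp(r1)/ers ≈ 0.090, exp(r2)/ers ≈ 0.245 and exp(r3)/ers ≈ 0.665, which sum to 1 as required. (This is exactly the softmax function.)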
Your network should work around that constraint and learn it naturally, eventually.
To make the network learn that a little faster, here's what springs to mind first:
add an additional output called 'sum' that sums all the other output neurons. If you want all the output neurons to be in a separate layer, just add a layer of outputs: the first numRunners outputs each connect to the corresponding neuron in the previous layer, and the (numRunners+1)-th neuron connects to all the neurons of the previous layer, with its weights fixed to 1
the training set would contain 0-1 vectors for each runner (ran / did not run), and the "expected" result would be a 0-1 vector such as 00..00001000..01, with the first 1 marking the runner that won the race and the last 1 being the target for the "sum" of the "probabilities"
for unknown races, the network would try to predict which runner will win. Since the outputs take continuous values (more or less :D), they can be read as "the certainty of the network that this runner will win the race", which is what you're looking for
Even without the additional sum neuron, this is a rough description of how the training data should be arranged.
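For concreteness, a tiny sketch of how such a target vector could be built; the function name is mine, not from the description above.

import numpy as np

def make_target(num_runners, winner_index):
    # One-hot vector for the winner, plus a final 1 for the "sum" output.
    target = np.zeros(num_runners + 1)
    target[winner_index] = 1.0
    target[-1] = 1.0          # the "sum of probabilities" should be 1
    return target

print(make_target(5, 2))      # [0. 0. 1. 0. 0. 1.]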

Artificial neural networks

I want to know whether artificial neural networks can be applied to discrete-valued inputs? I know they can be applied to continuous-valued inputs, but can they be applied to discrete-valued ones? Also, will they perform well on discrete-valued inputs?
Yes, artificial neural networks may be applied to data featuring discrete-valued input variables. In the most commonly used neural network architectures (which are numeric), discrete inputs are typically represented by a series of dummy variables, just as in statistical regression. Also, as with regression, you need one fewer dummy variable than the number of distinct values. There are other methods, but this is the most straightforward.
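For example, with pandas (one possible tool; the data here is made up), drop_first=True produces exactly that "one less than the number of distinct values" coding:

import pandas as pd

colors = pd.Series(["red", "green", "blue", "green", "red"])
# 3 distinct values -> 2 dummy columns; the dropped level acts as the baseline
print(pd.get_dummies(colors, drop_first=True))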
Well, good question, let me say!
First of all, let me answer your question directly: yes!
The answer does involve considering a few aspects about the use and implementation of the network itself.
So let me explain why:
The easiest way is to normalize the input as usual; this is the first rule of thumb with neural networks.
Then let the network compute the task, and once you have your output, invert the normalization to get the output back in the original range, but still continuous. To get back discrete values, just take the integer part of your output. It is easy, it works, and it is fine; done! A good result then just depends on the topology you design for your network.
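A minimal sketch of the invert-and-discretize step, with made-up numbers (the 0..9 range is just an example, and rounding to the nearest integer is a slightly more forgiving variant of "taking the integer part"):

lo, hi = 0, 9                        # known range of the discrete target

def from_unit(u):
    # Invert the [0, 1] normalization, then snap to the nearest integer.
    return int(round(u * (hi - lo) + lo))

u = 0.43                             # pretend this came from the network
print(from_unit(u))                  # 4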
As a plus, you could consider using a "step" transfer function instead of "tan-sigmoid" between layers, just to strengthen and mimic a sort of digitization, forcing the output to be exactly 0 or 1. But you should then also reconsider the initial normalization, as well as the use of well-tuned thresholds.
NB: this latter trick is not really necessary, but it could give some secondary benefits; maybe test it in a second stage of your development and look at the differences.
PS: let me also suggest something that may apply to your issue; you might consider applying some fuzzy logic in your learning algorithm ;-)
Cheers!
I'm late on this question, but this may help someone.
Say you have a categorical output variable, for example 3 different categories (0, 1 and 2),
outputs    then becomes
0          1, 0, 0
2          0, 0, 1
1          0, 1, 0
2          0, 0, 1
1          0, 1, 0
0          1, 0, 0
A possible NN output result is
0.2, 0.3, 0.5 (winner is categ 2)
0.05, 0.9, 0.05 (winner is categ 1)
...
Then your NN will have 3 output nodes in this case, so take the max value.
To improve this, use cross-entropy as the error measure and a softmax activation on the output layer, so that the outputs sum to 1.
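As a small illustration with made-up numbers, the cross-entropy error for one example, given softmax outputs and a one-hot target:

import numpy as np

softmax_output = np.array([0.05, 0.90, 0.05])   # network says "category 1"
one_hot_target = np.array([0.0, 1.0, 0.0])      # true category is 1

# Cross-entropy picks out -log of the probability given to the true class.
loss = -np.sum(one_hot_target * np.log(softmax_output))
print(loss)  # ~0.105; it grows as the true class gets less probability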
The purpose of a neural network is to approximate complicated functions by interpolating samples. As such, they tend to be a poor fit for discrete data, unless that data can be expressed by thresholding a continuous function. Depending on your problem, there are likely to be much more effective learning methods.

Determining Bias for Neural network Perceptrons?

One thing I don't quite understand, at this early stage of learning about neural networks, is what to initially set a "bias" to.
I understand the perceptron calculates its output based on:
P * W + b > 0
and then you can calculate a learning update with b = b + [ G - O ], where G is the correct output and O is the actual output (1 or 0), to get a new bias... but what about the initial bias? I don't really understand how this is calculated, or what initial value should be used besides just "guessing". Is there any type of formula for this?
Pardon me if I'm mistaken on anything; I'm still learning the whole neural network idea before I implement my own (crappy) one.
The same goes for the learning rate... I mean, most books and such just kind of "pick one" for μ.
The short answer is, it depends...
In most cases (I believe) you can treat the bias just like any other weight (so it might get initialised to some small random value), and it will get updated as you train your network. The idea is that all the biases and weights will end up converging on some useful set of values.
However, you can also set the weights manually (with no training) to get some special behaviours: for example, you can use the bias to make a perceptron behave like a logic gate (assume binary inputs X1 and X2 are either 0 or 1, and the activation function is scaled to give an output of 0 or 1).
OR gate: W1=1, W2=1, Bias=0
AND gate: W1=1, W2=1, Bias=-1
You can solve the classic XOR problem by using AND and OR as the first layer in a multilayer network, and feeding them into a third perceptron with W1=3 (from the OR gate), W2=-2 (from the AND gate) and Bias=-2.
(Note: these values will be different if your activation function is scaled to -1/+1, i.e. a SGN function.)
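These weights are easy to verify in a few lines (binary inputs, output 1 when the weighted sum plus bias is greater than 0, as described above):

def perceptron(x1, x2, w1, w2, bias):
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

def xor(x1, x2):
    o = perceptron(x1, x2, 1, 1, 0)      # OR gate
    a = perceptron(x1, x2, 1, 1, -1)     # AND gate
    return perceptron(o, a, 3, -2, -2)   # combine: OR but not AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor(x1, x2))     # prints 0, 1, 1, 0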
As to how to set the learning rate, that also depends(!) but I think usually something like 0.01 is recommended. Basically you want the system to learn as quickly as possible, but not so quickly that the weights fail to converge properly.
Since @Richard has already answered the greater part of the question, I'll only elaborate on the learning rate. From what I've read (and it's working for me) there is a very simple formula that you can use to update the learning rate at each iteration k, and it is:
learningRate_k = constant/k
Here, obviously, the 0th iteration is excluded, since you'd be dividing by zero. The constant can be whatever you want it to be (except 0, of course, since that would make no sense :D), but the easiest choice is naturally 1, so you get
learningRate_k = 1/k
The resulting series obeys two basic rules:
lim_(t->inf) SUM from k=1 to t (learningRate_k) = inf
lim_(t->inf) SUM from k=1 to t (learningRate_k^2) < inf
Note that the convergence of your perceptron is directly connected to this learning rate series. It starts big (for k=1 you get 1/1 = 1) and gets smaller and smaller with each and every update of your perceptron, since, as in real life, when you encounter something new you learn a lot at the beginning, but later on you learn less and less.
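In code, the schedule is just a division inside the training loop; everything apart from the 1/k formula here is an illustrative placeholder.

# Decaying learning-rate schedule: learningRate_k = constant / k
constant = 1.0
for k in range(1, 6):
    learning_rate = constant / k
    print(k, learning_rate)   # 1.0, 0.5, 0.333..., 0.25, 0.2
    # ... perform one perceptron weight update here using learning_rate ...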

Determining which inputs to weigh in an evolutionary algorithm

I once wrote a Tetris AI that played Tetris quite well. The algorithm I used (described in this paper) is a two-step process.
In the first step, the programmer decides to track inputs that are "interesting" to the problem. In Tetris we might be interested in tracking how many gaps there are in a row because minimizing gaps could help place future pieces more easily. Another might be the average column height because it may be a bad idea to take risks if you're about to lose.
The second step is determining weights associated with each input. This is the part where I used a genetic algorithm. Any learning algorithm will do here, as long as the weights are adjusted over time based on the results. The idea is to let the computer decide how the input relates to the solution.
Using these inputs and their weights we can determine the value of taking any action. For example, if putting the straight line shape all the way in the right column will eliminate the gaps of 4 different rows, then this action could get a very high score if its weight is high. Likewise, laying it flat on top might actually cause gaps and so that action gets a low score.
I've always wondered if there's a way to apply a learning algorithm to the first step, where we find "interesting" potential inputs. It seems possible to write an algorithm where the computer first learns what inputs might be useful, then applies learning to weigh those inputs. Has anything been done like this before? Is it already being used in any AI applications?
In neural networks, you can select 'interesting' potential inputs by finding the ones that have the strongest correlation, positive or negative, with the classifications you're training for. I imagine you can do similarly in other contexts.
I think I might approach the problem you're describing by feeding more primitive data to a learning algorithm. For instance, a Tetris game state may be described by the list of occupied cells, and a string of bits describing this information would be a suitable input for that stage of the learning algorithm. Actually training on that is still challenging, though: how do you know whether the results are useful? I suppose you could roll the whole algorithm into a single blob, where the algorithm is fed the successive states of play and the output is just the block placements, with higher-scoring algorithms selected for future generations.
Another choice might be to use a large corpus of plays from other sources, such as recorded plays from human players or a hand-crafted AI, and select the algorithms whose outputs bear a strong correlation to some interesting fact or another from the future play, such as the score earned over the next 10 moves.
Yes, there is a way.
If you choose from M candidate features, there are 2^M subsets, so there is a lot to look at.
I would do the following:
For each subset S
run your code to optimize the weights W
save S and the corresponding W
Then for each pair S-W, you can run G games and save the score L for each one. Now you have a table like this:
feature1  feature2  feature3  featureM  subset_code  game_number  scoreL
1         0         1         1         S1           1            10500
1         0         1         1         S1           2            6230
...
0         1         1         0         S2           G + 1        30120
0         1         1         0         S2           G + 2        25900
Now you can run some component selection algorithm (PCA, for example) and decide which features are worth keeping to explain scoreL.
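In outline, the subset loop could look like the sketch below; optimize_weights and play_game are hypothetical stand-ins for your existing genetic-algorithm optimizer and Tetris simulator, stubbed out here so the sketch runs on its own.

from itertools import combinations
import random

def optimize_weights(subset):          # hypothetical stand-in for the GA optimizer
    return [1.0] * len(subset)

def play_game(subset, weights, seed):  # hypothetical stand-in for the Tetris simulator
    random.seed(seed)                  # seeding gives every subset the same piece sequence
    return random.randint(0, 30000)

features = ["gaps", "avg_height", "bumpiness", "holes"]   # M candidate features
G = 10                                                    # games per subset
results = []                                              # rows of the score table above

for m in range(1, len(features) + 1):
    for subset in combinations(features, m):              # all non-empty subsets
        weights = optimize_weights(subset)
        for game in range(G):
            results.append((subset, game, play_game(subset, weights, seed=game)))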
A tip: when running the code to optimize W, seed the random number generator so that each different "evolving brain" is tested against the same piece sequence.
I hope this helps!
