Genetic Algorithm and Neural Network

I'm currently doing research on genetic algorithms and neural networks. I want to use a genetic algorithm to train a neural network, and then use that network to solve an OCR problem. What I still don't understand is the training method. Say I have 5 training sets, each containing the 26 characters A-Z. Should I train from A1-Z1 through A5-Z5 (per set), or from A1-A5 through Z1-Z5 (per character)? And how many generations should be generated per character? Or do I keep going until the error reaches a minimum?
I think that's all I want to ask right now.
If anything about my question is unclear, please tell me.
Thank you.

Actually, you really can develop a neural network for each letter: just use autoassociative networks (autoencoders). These networks are trained so that the output reproduces the input, so you would train 26 networks, one per letter. At validation time, you present the input to all 26 networks, and the network that reconstructs it with the lowest error is the one that represents the letter. This approach has achieved excellent results for classification problems.
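For illustration, here is a minimal sketch (in C, since no language is attached to the question) of that decision rule. reconstruct() is a hypothetical placeholder for your 26 trained autoencoders, and the 8x8 input size is an assumption:

```c
#include <float.h>

#define N_LETTERS 26
#define N_PIXELS  64   /* assumption: an 8x8 binarized character image */

/* hypothetical interface: the autoencoder for `letter` maps input to its reconstruction */
void reconstruct(int letter, const double *in, double *out);

/* sum of squared differences between input and reconstruction */
static double reconstruction_error(const double *in, const double *out) {
    double err = 0.0;
    for (int i = 0; i < N_PIXELS; ++i) {
        double d = in[i] - out[i];
        err += d * d;
    }
    return err;
}

/* pick the letter whose autoencoder reconstructs the input with the lowest error */
int classify(const double *input) {
    double out[N_PIXELS];
    double best_err = DBL_MAX;
    int best_letter = -1;
    for (int l = 0; l < N_LETTERS; ++l) {
        reconstruct(l, input, out);
        double e = reconstruction_error(input, out);
        if (e < best_err) { best_err = e; best_letter = l; }
    }
    return best_letter;   /* 0 = 'A', ..., 25 = 'Z' */
}
```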

A neural network with 26 outputs is odd. You should build a network for each letter, but let it see all letters during training so it can tell you when it sees its intended letter.
For the number of generations, you typically stop when one of two conditions is met: the error drops below a threshold, or the number of generations exceeds a limit. Choosing these parameters is up to you.
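As a concrete illustration of those two stopping conditions, here is a self-contained sketch using a toy (1+1) evolutionary loop that minimizes f(x) = x*x; the threshold and generation cap are illustrative values you would tune for your own problem:

```c
#include <stdio.h>
#include <stdlib.h>

/* toy error function to minimize */
static double fitness_error(double x) { return x * x; }

int main(void) {
    const int    max_generations = 1000;  /* condition 2: generation limit */
    const double error_threshold = 1e-4;  /* condition 1: error threshold  */

    srand(42);
    double parent = 10.0;  /* initial candidate solution */
    for (int gen = 0; gen < max_generations; ++gen) {
        /* mutation: random perturbation in [-0.5, 0.5] */
        double child = parent + ((double)rand() / RAND_MAX - 0.5);
        /* selection: keep whichever of parent/child has lower error */
        if (fitness_error(child) < fitness_error(parent))
            parent = child;
        if (fitness_error(parent) < error_threshold) {
            printf("converged at generation %d, x = %f\n", gen, parent);
            return 0;
        }
    }
    printf("stopped at generation limit, x = %f\n", parent);
    return 0;
}
```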

Training an ANN with a GA will be very slow; maybe you should consider a hybrid approach. You will need to do a lot of image preprocessing before feeding the data into the ANN, and you will need to design your ANN very carefully. Think about the size of the input and the size of the output. With a GA you can optimize two things: 1. the ANN topology, or 2. the ANN weights. A sketch of the second option follows.
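A sketch of how the weights could be encoded as a genome, assuming (purely for illustration) an 8x8 input, 16 hidden units, 26 outputs, and uniform crossover:

```c
#include <stdlib.h>

#define N_INPUTS   64   /* assumption: an 8x8 binarized character image */
#define N_HIDDEN   16   /* assumption: hidden layer size */
#define N_OUTPUTS  26   /* one output per letter A-Z */
#define GENOME_LEN (N_INPUTS * N_HIDDEN + N_HIDDEN * N_OUTPUTS)

/* one individual = one complete set of network weights */
typedef struct { double genes[GENOME_LEN]; double fitness; } Individual;

/* uniform crossover: each weight is copied from one parent at random */
static void crossover(const Individual *a, const Individual *b, Individual *child) {
    for (int i = 0; i < GENOME_LEN; ++i)
        child->genes[i] = (rand() % 2) ? a->genes[i] : b->genes[i];
}

/* mutation: perturb a small fraction of the weights */
static void mutate(Individual *ind, double rate, double scale) {
    for (int i = 0; i < GENOME_LEN; ++i)
        if ((double)rand() / RAND_MAX < rate)
            ind->genes[i] += scale * ((double)rand() / RAND_MAX - 0.5);
}

int main(void) {
    static Individual a, b, child;   /* static: zero-initialized, kept off the stack */
    crossover(&a, &b, &child);
    mutate(&child, 0.05, 0.1);       /* illustrative mutation rate and scale */
    return 0;
}
```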


What is genetic drift and how does it affect EAs?

I have read in some articles on evolutionary computing that the algorithms generally converge to a single solution due to the phenomenon of genetic drift. There is a lot of content on the Internet, but I can't get a deep understanding of this concept. I need to know, simply and precisely:
What is genetic drift in the context of evolutionary computing?
How does it affect the convergence of an evolutionary algorithm?
To better understand the original concept of genetic drift (in biology), I suggest you read this Khan Academy article. Simply put, you can think of it as an evolutionary phenomenon in which the frequency of one or more alleles (versions of a gene) in a population changes due to random factors (unrelated to the fitness of each individual). If the fittest individual of a population is struck by lightning, out of sheer bad luck, and dies before reproducing, he won't leave offspring (although he has the highest fitness!). This is an example (somewhat absurd, I know) of genetic drift.
Now, in the specific context of evolutionary algorithms, this paper provides a good summary on the subject:
EAs genetic drift can be as a result of a combination of factors, primarily related to selection, fitness function and representation. It happens by unintentional loss of genotypes. For example, random chance that a good genotype solution never gets selected for reproduction. Or, if there is a ‘lifespan’ to a solution and it dies before it can reproduce. Normally such a genotype only resides in the population for a limited number of generations.
(Sloss & Gustafson, 2019)
Finally, I will give you a real example of genetic drift acting on a genetic algorithm. Recently, I've used a simple neuroevolution algorithm to create an agent capable of playing the Snake game (GitHub repo). In my implementation of the game, the apples appear in random positions of the screen. When executing the evolutionary process for the first time, I noticed a big fluctuation in the population's best fitness between consecutive generations - overall, it wasn't improving much. Because of this, my algorithm was unable to converge to a good solution.
After some debugging, I found out that this was being caused by genetic drift. Because the apples spawned in random positions, some individuals, not necessarily the fittest, were lucky and got "easy apples", thus achieving a high fitness and leaving more offspring. Do you see the problem here?
Suppose that snake A is better at the game than snake B, because it can move towards the food, while B only moves randomly. Now, suppose that the first food that appeared for snake A was in a corner of the screen (a difficult position) and A died shortly after eating the apple. Now, suppose that snake B was lucky enough to have 3 apples spawn in a row, one after the other. Although B is "dumber" than A, it will leave more offspring, because it achieved a greater fitness. B's offspring will "pollute" the next generation, because they'll probably be "dumb" like B.
I solved the problem using a better apple positioning algorithm (I defined a minimum distance between the spawning position of two consecutive apples) and by calculating each individual's final fitness as the average of its fitness in several playing sessions. This greatly reduced (although it did not eliminate) the interference of genetic drift in my algorithm.
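In code, the averaging fix looks something like this sketch (C used for illustration; play_one_session() is a hypothetical stand-in for one run of the Snake simulation with random apple positions):

```c
/* hypothetical: run one game session for this genome and return its score */
double play_one_session(const double *genome);

/* final fitness = mean score over several independent sessions,
   which damps the effect of lucky or unlucky apple placements */
double average_fitness(const double *genome, int n_sessions) {
    double total = 0.0;
    for (int s = 0; s < n_sessions; ++s)
        total += play_one_session(genome);
    return total / n_sessions;
}
```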
I hope this helps. You can also take a look at this video (it's in Portuguese, but English subtitles are available), where I explain some of the strategies I used to make the Snake AI.

Non-converging Neural Network in C

I wrote my first feed-forward neural network in C, using the sigmoid 1.0 / (1.0 + exp(-x)) as activation function and gradient descent to adjust the weights. I tried to approximate sin(x) to make sure my network works. However, the output of the neuron on the output layer seems to always oscillate between the extreme values 0 and 1 and the weights of the neurons grow to absurd sizes, no matter how many hidden layers there are, how many neurons are in the hidden layer(s), how many training samples I provide, or even what the target outputs are.
1) Are there any standard 'tried and tested' data sets used to proof-test neural networks for errors? If yes, what structures work best (e.g. numbers of neuron(s) in the hidden layer) to converge to the desired output?
2) Are there any common errors that generate the same symptoms? I found this thread, but the issue was because of faulty data, which I believe is not my case.
3) Is there any preferred way of training the network? In my implementation I cycle through the training sets and adjust the weights each time, then rinse and repeat ~1000 times. Is there any other order that works better?
So, to sum up:
Assuming that your gradient propagation works properly, the values of parameters like topology, learning rate, batch size, or the weight-penalty constants (L1 and L2 decay) are usually found using a technique called grid search or random search; a sketch of the latter follows. It has been shown empirically that random search performs better at this task.
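A minimal sketch of random search over two such hyperparameters; train_and_score() is a hypothetical routine you would supply, and the sampling ranges are illustrative only:

```c
#include <math.h>
#include <stdlib.h>

/* hypothetical: train a network with these settings and return a validation score */
double train_and_score(double learning_rate, int hidden_units);

static double rand_uniform(double lo, double hi) {
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}

/* try `trials` random configurations and remember the best score */
double random_search(int trials) {
    double best = -1.0;
    for (int t = 0; t < trials; ++t) {
        /* sample the learning rate log-uniformly, the layer size uniformly */
        double lr = pow(10.0, rand_uniform(-4.0, -1.0));
        int hidden = (int)rand_uniform(4.0, 128.0);
        double score = train_and_score(lr, hidden);
        if (score > best) best = score;
    }
    return best;
}
```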
The most common cause of weight divergence is a wrong learning rate. A large value can make learning very hard, but on the other hand, when the learning rate is too small, the learning process can take a really long time. Usually you should babysit the learning phase; detailed instructions can be found, e.g., here.
In your learning phase you used a technique called SGD. It can usually achieve good results, but it's vulnerable to the variance of the data set and to large learning rates. What I advise is to use batch learning and to set the batch size as an additional parameter learnt during grid or random search. You can read about it, e.g., here.
Another thing you might consider is changing your activation function to tanh or ReLU. The sigmoid has a lot of problems with its saturation regions and usually needs proper initialization. You can read about it here.
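For reference, a small sketch (not the asker's code) of the pieces most often implicated in these symptoms: the activation derivatives used in the backprop deltas, and the learning-rate-scaled weight update. Omitting the s*(1-s) factor, getting its sign wrong, or using too large an eta all make the weights blow up as described:

```c
#include <math.h>

double sigmoid(double x)       { return 1.0 / (1.0 + exp(-x)); }

/* derivative expressed in terms of the activation s = sigmoid(x) */
double sigmoid_deriv(double s) { return s * (1.0 - s); }

/* derivative of tanh in terms of the activation t = tanh(x) */
double tanh_deriv(double t)    { return 1.0 - t * t; }

/* gradient-descent step for a single weight; eta around 0.01 is a
   typical starting point to babysit, as suggested above */
void update_weight(double *w, double delta, double input, double eta) {
    *w -= eta * delta * input;
}
```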

Musical instrument automated teaching via microcontroller

The premise of the project will be:
There will be a prerecorded track of guitar, for example. The student will play the same track on his guitar, and I need to compare the two sounds and find out whether the student played it well or not. I will be using an STM32 microcontroller and Keil uVision software for simulation at first (programming in C).
I know that I will be using an ADC with DMA, and I assume I would apply a Fast Fourier Transform to the signals and then somehow compare the two frequency responses. Also, would there be a problem with tempo? It is not realistic for every note to land on the exact millisecond before being compared.
I've seen some methods like the Hidden Markov Model or the Goertzel algorithm, but I am not quite sure what they do or whether they are optimal and easy for this project. So my question is: is there a specific algorithm that suits this best, and how would I implement it in my code (I haven't really started working on the code; it has mostly been theoretical reading so far)?
Edit: I made a similar post yesterday, but its premise was too complicated to solve, so I am posting a new, much simpler premise. I chose not to ask on the first thread since that would mix up two different issues.
Assuming that you can use the FFT to find out which notes are playing at what time (this may prove difficult for distorted guitar chords), you can do this e.g. 10 times per second for both streams, and then check how often the notes in the two streams match. This will give you a percentage; if you need a binary value, you'd have to apply a threshold.
If the two streams are not of equal length (different tempo), you will have to stretch. You don't have to stretch the actual audio, just the times between the note measurements (e.g. every 100 ms for the first stream and every 125 ms for the second).
So the biggest problem may be finding out which notes are playing at any given moment in time.
I'd start by constructing a mapping from frequencies to notes. It may also be a good idea to low-pass filter the signal at around 1100 Hz to get rid of some of the unwanted harmonics (you can't play higher than that on the guitar anyway), and similarly to high-pass filter it at 80 Hz. Then, after the FFT or DFT (not sure it matters which you choose), find the frequencies that are close to real note frequencies. Then pick the loudest one and those that are above a certain threshold relative to it (e.g. drop anything that is less than half as loud as the loudest; some experimentation will be needed to find a good threshold value).
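A minimal sketch of that frequency-to-note mapping, using the standard equal-temperament formula relative to A4 = 440 Hz (the peak frequency is assumed to come from your own FFT stage):

```c
#include <math.h>
#include <stdio.h>

/* map a detected frequency to the nearest MIDI note number:
   midi = 69 + 12 * log2(f / 440) */
int freq_to_midi(double freq_hz) {
    return (int)lround(69.0 + 12.0 * log2(freq_hz / 440.0));
}

int main(void) {
    const char *names[12] = {"C","C#","D","D#","E","F","F#","G","G#","A","A#","B"};
    double peak_hz = 196.0;             /* e.g. an FFT peak near G3 */
    int midi = freq_to_midi(peak_hz);
    printf("%.1f Hz -> MIDI %d (%s%d)\n",
           peak_hz, midi, names[midi % 12], midi / 12 - 1);
    return 0;
}
```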

Kernel methods for large scale dataset

A kernel-based classifier usually requires O(n^3) training time because of the inner-product computations between pairs of instances. To speed up training, the inner-product values can be precomputed and stored in a two-dimensional array. However, when the number of instances is very large, say over 100,000, there will not be sufficient memory to do so: the full matrix in double precision would take 100,000^2 x 8 bytes, roughly 80 GB.
So, any better ideas for this?
For modern implementations of support vector machines, the scaling of the training algorithm is dependent on lots of factors, such as the nature of the training data and kernel that you are using. The scaling factor of O(n^3) is an analytical result and isn't particularly useful in predicting how SVM training will scale in real-world situations. For example, empirical estimates of the training algorithm used by SVMLight put the scaling against training set size to be approximately O(n^2).
I would suggest you ask this question in the kernel machines forum. I think you're more likely to get a better answer than on Stack Overflow, which is more of a general-purpose programming site.
The Relevance Vector Machine has a sequential training mode in which you do not need to keep the entire kernel matrix in memory. You can basically calculate a column at a time, determine if it appears relevant, and throw it away otherwise. I have not had much luck with it myself, though, and the RVM has some other issues. There is most likely a better solution in the realm of Gaussian Processes. I haven't really sat down much with those, but I have seen mention of an online algorithm for it.
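To illustrate the column-at-a-time idea, here is a sketch that materializes a single kernel column into an O(n) buffer instead of the full n-by-n matrix; the RBF kernel is an illustrative choice, and the relevance test itself is left out:

```c
#include <math.h>

/* RBF kernel between two dim-dimensional instances */
double rbf_kernel(const double *a, const double *b, int dim, double gamma) {
    double d2 = 0.0;
    for (int i = 0; i < dim; ++i) {
        double d = a[i] - b[i];
        d2 += d * d;
    }
    return exp(-gamma * d2);
}

/* fill column j of the kernel matrix into a reusable length-n buffer;
   X is row-major: instance i lives at X + i * dim */
void kernel_column(const double *X, int n, int dim, int j,
                   double gamma, double *col) {
    for (int i = 0; i < n; ++i)
        col[i] = rbf_kernel(X + i * dim, X + j * dim, dim, gamma);
}
```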
I am not a numerical analyst, but isn't the QR decomposition that you need for ordinary least-squares linear regression also O(n^3)?
Anyway, you'll probably want to search the literature (since this is fairly new stuff) for online learning or active learning versions of the algorithm you're using. The general idea is to either discard data far from your decision boundary or not to include them in the first place. The danger is that you might get locked into a bad local maximum, and then your online/active algorithm will ignore data that would help you get out.

Are evolutionary algorithms and neural networks used in the same domains? [closed]

I am trying to get a feel for the difference between the various classes of machine-learning algorithms.
I understand that the implementations of evolutionary algorithms are quite different from the implementations of neural networks.
However, they both seem to be geared toward determining a correlation between inputs and outputs from a potentially noisy set of training/historical data.
From a qualitative perspective, are there problem domains that are better targets for neural networks as opposed to evolutionary algorithms?
I've skimmed some articles that suggest using them in a complementary fashion. Is there a decent example of a use case for that?
Here is the deal: in machine learning problems, you typically have two components:
a) The model (function class, etc.)
b) Methods of fitting the model (optimization algorithms)
Neural networks are a model: given a layout and a setting of weights, the neural net produces some output. There exist some canonical methods of fitting neural nets, such as backpropagation, contrastive divergence, etc. However, the big point of neural networks is that if someone gave you the 'right' weights, you'd do well on the problem.
Evolutionary algorithms address the second part -- fitting the model. Again, there are some canonical models that go with evolutionary algorithms: for example, evolutionary programming typically tries to optimize over all programs of a particular type. However, EAs are essentially a way of finding the right parameter values for a particular model. Usually, you write your model parameters in such a way that the crossover operation is a reasonable thing to do and turn the EA crank to get a reasonable setting of parameters out.
Now, you could, for example, use evolutionary algorithms to train a neural network, and I'm sure it's been done. However, the critical bit that EAs require to work is that the crossover operation must be a reasonable thing to do: by taking part of the parameters from one reasonable setting and the rest from another reasonable setting, you'll often end up with an even better parameter setting. Most times EAs are used, this is not the case, and they end up being something like simulated annealing, only more confusing and inefficient.
Problems that require "intuition" are better suited to ANNs, for example handwriting recognition. You train a neural network with a huge amount of input and rate it until you're done (this takes a long time), but afterwards you have a black-box algorithm/system that can "guess" the handwriting, so you keep your little brain and use it as a module for many years or something. Training a quality ANN for a complex problem can take months in the worst case, and some luck.
Most other evolutionary algorithms "calculate" an ad hoc solution on the spot, in a sort of hill-climbing pattern.
Also, as pointed out in another answer, during runtime an ANN can "guess" faster than most other evolutionary algorithms can "calculate". However, one must be careful, since the ANN is just "guessing" and it might be wrong.
Evolutionary algorithms (more specifically, genetic algorithms) and neural networks can both be used for similar objectives, and other answers describe the difference well.
However, there is one specific case where evolutionary algorithms are indicated over neural networks: when the solution space is non-differentiable.
Indeed, neural networks learn via gradient descent with backpropagation (or a similar algorithm). The calculation of a gradient relies on derivatives, which requires a continuous, differentiable space; in other words, you must be able to shift gradually and progressively from one solution to the next.
If your solution space is non-differentiable (i.e., you can choose solution A, or B, or C, but nothing in between like 0.5 A + 0.5 B, so that some solutions are impossible), then you are trying to fit a non-differentiable function, and neural networks cannot work.
(Side note: discrete state spaces partially share the same issue, and they are a common difficulty for most algorithms, but there is usually some work done to get around it; for example, decision trees can work easily on categorical variables, while other models like SVMs have more difficulty and generally require encoding categorical variables into continuous values.)
In this case, evolutionary and genetic algorithms are perfect, one could even say a godsend, since they can "jump" from one solution to the next without any issue. They don't care that some solutions are impossible, nor whether the gaps between subsets of the possible state space are big or small; evolutionary algorithms can jump randomly far away or close by until they find appropriate solutions.
Also worth mentioning is that evolutionary algorithms are not as subject to the curse of dimensionality as other machine learning algorithms, including neural networks. This might seem a bit counterintuitive, since convergence to a global maximum is not guaranteed and the procedure might seem slow to evolve toward a good solution, but in practice the selection procedure works fast and converges to a good local maximum.
This makes evolutionary algorithms a very versatile and generic tool for naively approaching any problem, and one of the very few tools that can deal with non-differentiable functions, discrete functions, or astronomically high-dimensional datasets.
Look at neuroevolution (NE).
The current best methods are NEAT and HyperNEAT by Kenneth Stanley.
Genetic algorithms only find a genome of some sort; it's great to have that genome encode a neural network, because you get the reactive nature of the neural network rather than just a bunch of static genes.
There aren't many limits to what it can learn, but it takes time, of course. The neural topology has to be evolved through the usual mutation and crossover, with the weights updated the same way; there can be no backpropagation.
Also, you can train it with a fitness function, which makes it superior to backpropagation when you do not know what the output should be. It is perfect for learning complex behaviour for systems that you do not know any optimal strategies for. The only problem is that it will learn behaviour you didn't anticipate. Often that behaviour can be very alien, although it does exactly what you rewarded it for in the fitness function. Thus you'll spend as much time devising fitness functions as you would have spent creating output sets for backpropagation :P
Evolutionary algorithms (EAs) are slow because they rely on unsupervised learning: EAs are told that some solutions are better than others, but not how to improve them. Neural networks are generally faster, being an instance of supervised learning: they know how to make a solution better by using gradient descent within a function space over certain parameters; this allows them to reach a valid solution faster. Neural networks are often used when there isn't enough knowledge about the problem for other methods to work.
In terms of problem domains, I compare artificial neural networks trained by backpropagation to an evolutionary algorithm.
An evolutionary algorithm deploys a randomized beam search: your evolutionary operators develop candidates to be tested and compared by their fitness. Those operators are usually non-deterministic, and you can design them so they can find candidates both in close proximity and farther away in the parameter space, to overcome the problem of getting stuck in local optima.
However, the success of an EA approach greatly depends on the model you develop, which is a tradeoff between high expressive power (you might overfit) and generality (the model might not be able to express the target function).
Because neural networks are usually multilayered, the parameter space is not convex and contains local optima that gradient descent algorithms might get stuck in. Gradient descent is a deterministic algorithm that searches in close proximity. That's why neural networks are usually randomly initialised and why you should train many more than one model.
Moreover, since each hidden node in a neural network defines a hyperplane, you can design a neural network so that it fits your problem well. There are some techniques to prevent neural networks from overfitting.
All in all, neural networks can be trained quickly and get reasonable results with little effort (just try some parameters). In theory, a neural network that is large enough is able to approximate any target function, which on the other hand makes it prone to overfitting. Evolutionary algorithms require you to make a lot of design choices to get good results, the hardest probably being which model to optimise. But EAs are able to search through very complex problem spaces (in a manner you define) and get good results quickly. EAs can even stay successful when the problem (the target function) changes over time.
Tom Mitchell's Machine Learning Book:
http://www.cs.cmu.edu/~tom/mlbook.html
Evolutionary algorithms (EAs) represent a manner of training a model, whereas neural nets (NNs) ARE a model. Most commonly throughout the literature, you will find that NNs are trained using the backpropagation algorithm. This method is very attractive to mathematicians, BUT it requires that you can express the error rate of the model using a mathematical formula. This is the case for situations in which you know lots of input and output values for the function you are trying to approximate. The problem can then be modeled mathematically as the minimization of a loss function, which can be achieved thanks to calculus (and that is why mathematicians love it).
But neural nets are also useful for modeling systems that try to maximize or minimize some outcome whose formula is very difficult to model mathematically. For instance, a neural net could control the muscles of a cyborg to achieve running. At each time frame, the model would have to establish how much tension should be present in each muscle of the cyborg's body, based on the input from various sensors. It is impossible to provide such training data. EAs allow training by providing only a manner of evaluating the model. For our example, we would punish falling and reward the distance traveled across a surface (in a fixed timeframe). The EA would then select the models that do best in this sense. The first generations suck but, surprisingly, after a few hundred generations such individuals achieve very "natural" movements and manage to run without falling. Such models may also be capable of dealing with obstacles and external physical forces.
