Optimizing functions with GAs - artificial-intelligence

Firstly, sorry if this is not the right stack exchange for this question it might fit better on math.
I've been working on a project to maximize a functions output using a GA. However, from the limited calculus I know I thought there were methods to find the maximum of a mathematical function using calculus? I'd assume the reason GAs are sometimes used to maximize functions is because there are functions where the mathematical methods don't work. I wondered what conditions those were? Maybe that it's not continuous or differentiable?

A superficial explanation
For simple™ mathematical functions, the solution would be to use your calculus and find the derivate function f'(x). If it's not mathematically possible to differentiate the error function f(x), you need to break out the other tools from you math-box. If the error function's solution space is convex, you could possibly use a numerical approach to find your optimum such as the gradient descent or the conjugate gradient algorithm.
The Genetic Algorithm (and other search algorithms) comes in handy if the function you are trying to optimize consists of multiple undefined variables. This would make calculating the optimum using calculus very difficult. If you are familiar with Neural Networks: the genetic algorithm has been applied to find optimal weight configurations for neural networks. In these problem instances, there might be thousands of unknown variables (weights).
A mathematical approach would have to search the solution space in some incremental approach, the genetic algorithm is a bit "all over the place"™. By adjusting the mutation frequency, the GA would be able to jump around in the search space.
A (oversimplified) difficult solution space:
Image: Ciumac Sergiu

Well, for starters, you don't always have easily differentiable functions. You may especially have very high dimensional functions, which can be very difficult to differentiate.
Moreover, even if you do have a function you can differentiate, what you find are local optima, not global optima, and you may end up finding a great many of them-- potentially infinite numbers of them-- with no clear way to decide which are better than others.
While you may know enough about some particular function to be able to optimize with calculus, there is no method guaranteed to go you the global optimum for any possible function. Hence we rely on a number of probabilistic techniques and heuristics, of which genetic algorithms are only one.

Related

which encoding to use for genetic algorithm?

I want to code a genetic algorithm in C for optimizing a function of 10 variables (x1 to x10). However I am not able to figure out which encoding I should use. I have mostly seen binary encoding being used in example but the variables in my case can take real values. Also, is value encoding a good option for these types of problems?
For real valued problems I would suggest to try CMA-ES or another ES variant. CMA-ES certainly is the current state of the art for real-valued problems. It is designed to find good solutions in multidimensional problems quickly. There are implementations available on Hansen's page. There's also a C# implementation in the work for HeuristicLab. Evolution strategies are algorithms that were specifically designed for real-valued optimization problems. They are very similar to genetic algorithms (both were invented around the same time, but in different places). The main distinction is that for ES the main driver is mutation and it features a clever adaption of the mutation strength. Without this adaption the (local) optimum cannot be located in time. CMA-ES is easy to configure, all it needs is the initial standard deviation and optionally the population size (otherwise there's a formula that estimates this given the problem size).
Genetic algorithms can of course also be applied, but you have to use some specific operators which are able to mutate variables only with very small degree. For example there's the Breeder Genetic Algorithm from Mühlenbein. In general however genetic algorithms are more suited for problems that need a right combination of things. E.g. which items to include in a knapsack problem or which functions and terminals to combine to a formula (genetic programming). Less for problems, where you need to find the right value for something. Although of course there are variants of the genetic algorithm to solve these, look for Real coded Genetic Algorithm (RCGA or RGA).
Another algorithm suited for real-valued problems is Particle Swarm Optimization, but in my opinion it is harder to configure. I'd start with SPSO-2011 the 2011 standard PSO.
If your problem contains integer variables choices become more difficult. Evolution strategies do not perform so well when variables are discrete, because the adaptation schemes for integer variables are different. A genetic algorithm becomes an interesting first-choice algorithm again.
A genetic algorithm is best used when two answers that are pretty close to optimal will make something else pretty close to optimal when combined. The problem with a pure binary encoding is that if you don't check your crossover you end up getting two answers which may not have all that much to do with the original answers.
That said, this is only really an issue if your number of variables is very small and the amount of data in your variables is large. As far as picking an encoding, it's more of an art than a science and it depends on your problem. I would suggest going with an encoding that fits the amount of precision you want. With 10 variables you won't got that far wrong however you encode it, an 8-bit ASCII encoder would probably work fine.
Hope that helps.

What's differential evolution and how does it compare to a genetic algorithm?

From what I've read so far they seem very similar.
Differential evolution uses floating point numbers instead, and the solutions are called vectors? I'm not quite sure what that means.
If someone could provide an overview with a little bit about the advantages and disadvantages of both.
Well, both genetic algorithms and differential evolution are examples of evolutionary computation.
Genetic algorithms keep pretty closely to the metaphor of genetic reproduction. Even the language is mostly the same-- both talk of chromosomes, both talk of genes, the genes are distinct alphabets, both talk of crossover, and the crossover is fairly close to a low-level understanding of genetic reproduction, etc.
Differential evolution is in the same style, but the correspondences are not as exact. The first big change is that DE is using actual real numbers (in the strict mathematical sense-- they're implemented as floats, or doubles, or whatever, but in theory they're ranging over the field of reals.) As a result, the ideas of mutation and crossover are substantially different. The mutation operator is modified so far that it's hard for me to even see why it's called mutation, as such, except that it serves the same purpose of breaking things out of local minima.
On the plus side, there are a handful of results showing DEs are often more effective and/or more efficient than genetic algorithms. And when working in numerical optimization, it's nice to be able to represent things as actual real numbers instead of having to work your way around to a chromosomal kind of representation, first. (Note: I've read about them, but I've not messed extensively with them so I can't really comment from first hand knowledge.)
On the negative side, I don't think there's been any proof of convergence for DEs, yet.
Differential evolution is actually a specific subset of the broader space of genetic algorithms, with the following restrictions:
The genotype is some form of real-valued vector
The mutation / crossover operations make use of the difference between two or more vectors in the population to create a new vector (typically by adding some random proportion of the difference to one of the existing vectors, plus a small amount of random noise)
DE performs well for certain situations because the vectors can be considered to form a "cloud" that explores the high value areas of the solution solution space quite effectively. It's pretty closely related to particle swarm optimization in some senses.
It still has the usual GA problem of getting stuck in local minima however.

Kernel methods for large scale dataset

Kernel-based classifier usually requires O(n^3) training time because of the inner-product computation between two instances. To speed up the training, inner-product values can be pre-computed and stored in a two-dimensional array. However when the no. of instances is very large, say over 100,000, there will not be sufficient memory to do so.
So any better idea for this?
For modern implementations of support vector machines, the scaling of the training algorithm is dependent on lots of factors, such as the nature of the training data and kernel that you are using. The scaling factor of O(n^3) is an analytical result and isn't particularly useful in predicting how SVM training will scale in real-world situations. For example, empirical estimates of the training algorithm used by SVMLight put the scaling against training set size to be approximately O(n^2).
I would suggest you ask this question in the kernel machines forum. I think you're more likely to get a better answer than on Stack Overflow, which is more of a general-purpose programming site.
The Relevance Vector Machine has a sequential training mode in which you do not need to keep the entire kernel matrix in memory. You can basically calculate a column at a time, determine if it appears relevant, and throw it away otherwise. I have not had much luck with it myself, though, and the RVM has some other issues. There is most likely a better solution in the realm of Gaussian Processes. I haven't really sat down much with those, but I have seen mention of an online algorithm for it.
I am not a numerical analyst, but isn't the QR decomposition which you need to do ordinary least-squares linear regression also O(n^3)?
Anyways, you'll probably want to search the literature (since this is fairly new stuff) for online learning or active learning versions of the algorithm you're using. The general idea is to either discard data far from your decision boundary or to not include them in the first place. The danger is that you might get locked into a bad local maximum and then your online/active algorithm will ignore data that would help you get out.

Are evolutionary algorithms and neural networks used in the same domains? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 12 months ago.
Improve this question
I am trying to get a feel for the difference between the various classes of machine-learning algorithms.
I understand that the implementations of evolutionary algorithms are quite different from the implementations of neural networks.
However, they both seem to be geared at determining a correlation between inputs and outputs from a potentially noisy set of training/historical data.
From a qualitative perspective, are there problem domains that are better targets for neural networks as opposed to evolutionary algorithms?
I've skimmed some articles that suggest using them in a complementary fashion. Is there a decent example of a use case for that?
Here is the deal: in machine learning problems, you typically have two components:
a) The model (function class, etc)
b) Methods of fitting the model (optimizaiton algorithms)
Neural networks are a model: given a layout and a setting of weights, the neural net produces some output. There exist some canonical methods of fitting neural nets, such as backpropagation, contrastive divergence, etc. However, the big point of neural networks is that if someone gave you the 'right' weights, you'd do well on the problem.
Evolutionary algorithms address the second part -- fitting the model. Again, there are some canonical models that go with evolutionary algorithms: for example, evolutionary programming typically tries to optimize over all programs of a particular type. However, EAs are essentially a way of finding the right parameter values for a particular model. Usually, you write your model parameters in such a way that the crossover operation is a reasonable thing to do and turn the EA crank to get a reasonable setting of parameters out.
Now, you could, for example, use evolutionary algorithms to train a neural network and I'm sure it's been done. However, the critical bit that EA require to work is that the crossover operation must be a reasonable thing to do -- by taking part of the parameters from one reasonable setting and the rest from another reasonable setting, you'll often end up with an even better parameter setting. Most times EA is used, this is not the case and it ends up being something like simulated annealing, only more confusing and inefficient.
Problems that require "intuition" are better suited to ANNs, for example hand writing recognition. You train a neural network with a huge amount of input and rate it until you're done (this takes a long time), but afterwards you have a blackbox algorithm/system that can "guess" the hand writing, so you keep your little brain and use it as a module for many years or something. Because training a quality ANN for a complex problem can take months I'm worst case, and luck.
Most other evolutionary algorithms "calculate" an adhoc solution on the spot, in a sort of hill climbing pattern.
Also as pointed out in another answer, during runtime an ANN can "guess" faster than most other evolutionary algorithms can "calculate". However one must be careful, since the ANN is just "guessing" an it might be wrong.
Evolutionary, or more generically genetic algorithms, and neural networks can both be used for similar objectives, and other answers describe well the difference.
However, there is one specific case where evolutionary algorithms are more indicated than neural networks: when the solution space is non-differentiable.
Indeed, neural networks use gradient descent to learn from backpropagation (or similar algorithm). The calculation of a gradient relies on derivatives, which needs a continuous and derivative space, in other words that you can shift gradually and progressively from one solution to the next.
If your solution space is non-differentiable (ie, either you can choose solution A, or B, or C, but nothing in the middle like 0.5% A + 0.5% B, so that some solutions are impossible), then you are trying to fit a non-differentiable function, and then neural networks cannot work.
(Side note: discrete state space partially share the same issue and so are a common issue for most algorithms but there are usually some work done to workaround these issues, for example decision trees can work easily on categorical variables, while other models like svm have more difficulties and generally require encoding categorical variables into continuous values).
In this case, evolutionary and genetic algorithms are perfect, one could even say a god send, since they can "jump" from one solution to the next without any issue. They don't care that some solutions are impossible, nor that the gaps are big or small between subset of the possible state space, evolutionary algorithms can jump randomly far away or close by until they find appropriate solutions.
Also worth mentioning is that evolutionary algorithms are not subject to the curse of dimensionality as much as any other machine learning algorithm, including neural networks. This might seem a bit counter intuitive, since the convergence to a global maximum is not guaranteed, and the procedure might seem to be slow to evolve to a good solution, but in practice the selection procedure works fast and converges to a good local maximum.
This makes evolutionary algorithms a very versatile and generic tool to approach naively any problem, and one of the very few tools to deal with either non-differentiable functions, discrete functions, or with astronomically high dimensional datasets.
Look at Neuro Evolution. (NE)
The current best methods is NEAT and HyperNEAT by Kenneth Stanley.
Genetic Algorithms only find a genome of some sort; It's great to create the genome of a neural network, because you get the reactive nature of the neural network, rather than just a bunch of static genes.
There's not many limits to what it can learn. But it takes time of course. Neural topology have to be evolved through the usual mutation and crossover, as well as weights updated. There can be no back propagation.
Also you can train it with a fitness function, which is thus superior to back propagation when you do not know what the output should be. Perfect for learning complex behaviour for systems that you do not know any optimal strategies for. Only problem is that it'll learn behaviour you didn't anticipate. Often that behaviour can be very alien, although it does exactly what you rewarded it for in the fitness function. Thus you'll be using as much time deriving fitness functions as you would have creating output sets for backpropagation :P
Evolutionary algorithms (EAs) are slow because they rely on unsupervised learning: EAs are told that some solutions are better than others, but not how to improve them. Neural networks are generally faster, being an instance of supervised learning: they know how to make a solution better by using gradient descent within a function space over certain parameters; this allows them to reach a valid solution faster. Neural networks are often used when there isn't enough knowledge about the problem for other methods to work.
In terms of problem domains, I compare artificial neural networks trained by backpropagation to an evolutionary algorithm.
An evolutionary algorithm deploys a randomized beamsearch, that means your evolutionary operators develop candidates to be tested and compared by their fitness. Those operators are usually non deterministic and you can design them so they can both find candidates in close proximity and candidates that are further away in the parameter space to overcome the problem of getting stuck in local optima.
However the success of a EA approach greatly depends on the model you develop, which is a tradeoff between high expression potential (you might overfit) and generality (the model might not be able to express the target function).
Because neural networks usually are multilayered the parameter space is not convex and contains local optima, the gradient descent algorithms might get stuck in. The gradient descent is a deterministic algorithm, that searches through close proximity. That's why neural networks usually are randomly initialised and why you should train many more than one model.
Moreover you know each hidden node in a neural network defines a hyperplane you can design a neural network so it fits your problem well. There are some techniques to prevent neural networks from overfitting.
All in all, neural networks might be trained fast and get reasonable results with few efford (just try some parameters). In theory a neural network that is large enough is able to approximate every target function, which on the other side makes it prone to overfitting. Evolutionary algorithms require you to make a lot of design choices to get good results, the hardest probably being which model to optimise. But EA are able to search through very complex problem spaces (in a manner you define) and get good results quickly. AEs even can stay successful when the problem (the target function) is changing over time.
Tom Mitchell's Machine Learning Book:
http://www.cs.cmu.edu/~tom/mlbook.html
Evolutionary algorithms (EA) represent a manner of training a model, where as neuronal nets (NN) ARE a model. Most commonly throughout the literature, you will find that NNs are trained using the backpropagation algorithm. This method is very attractive to mathematicians BUT it requires that you can express the error rate of the model using a mathematical formula. This is the case for situations in which you know lots of input and output values for the function that you are trying to approximate. This problem can be modeled mathematically, as the minimization of a loss function, which can be achieved thanks to calculus (and that is why mathematicians love it).
But neuronal nets are also useful for modeling systems which try to maximize or minimize some outcome, the formula of which is very difficult to model mathematically. For instance, a neuronal net could control the muscles of a cyborg to achieve running. At each different time frame, the model would have to establish how much tension should be present in each muscle of the cyborg's body, based on the input from various sensors. It is impossible to provide such training data. EAs allow training by only providing a manner of evaluation of the model. For our example, we would punish falling and reward the traveled distance across a surface (in a fixed timeframe). EA would just select the models which do their best in this sense. First generations suck but, surprisingly, after a few hundred generations, such individuals achieve very "natural" movements and manage to run without falling off. Such models may also be capable of dealing with obstacles and external physical forces.

Did you apply computational complexity theory in real life?

I'm taking a course in computational complexity and have so far had an impression that it won't be of much help to a developer.
I might be wrong but if you have gone down this path before, could you please provide an example of how the complexity theory helped you in your work? Tons of thanks.
O(1): Plain code without loops. Just flows through. Lookups in a lookup table are O(1), too.
O(log(n)): efficiently optimized algorithms. Example: binary tree algorithms and binary search. Usually doesn't hurt. You're lucky if you have such an algorithm at hand.
O(n): a single loop over data. Hurts for very large n.
O(n*log(n)): an algorithm that does some sort of divide and conquer strategy. Hurts for large n. Typical example: merge sort
O(n*n): a nested loop of some sort. Hurts even with small n. Common with naive matrix calculations. You want to avoid this sort of algorithm if you can.
O(n^x for x>2): a wicked construction with multiple nested loops. Hurts for very small n.
O(x^n, n! and worse): freaky (and often recursive) algorithms you don't want to have in production code except in very controlled cases, for very small n and if there really is no better alternative. Computation time may explode with n=n+1.
Moving your algorithm down from a higher complexity class can make your algorithm fly. Think of Fourier transformation which has an O(n*n) algorithm that was unusable with 1960s hardware except in rare cases. Then Cooley and Tukey made some clever complexity reductions by re-using already calculated values. That led to the widespread introduction of FFT into signal processing. And in the end it's also why Steve Jobs made a fortune with the iPod.
Simple example: Naive C programmers write this sort of loop:
for (int cnt=0; cnt < strlen(s) ; cnt++) {
/* some code */
}
That's an O(n*n) algorithm because of the implementation of strlen(). Nesting loops leads to multiplication of complexities inside the big-O. O(n) inside O(n) gives O(n*n). O(n^3) inside O(n) gives O(n^4). In the example, precalculating the string length will immediately turn the loop into O(n). Joel has also written about this.
Yet the complexity class is not everything. You have to keep an eye on the size of n. Reworking an O(n*log(n)) algorithm to O(n) won't help if the number of (now linear) instructions grows massively due to the reworking. And if n is small anyway, optimizing won't give much bang, too.
While it is true that one can get really far in software development without the slightest understanding of algorithmic complexity. I find I use my knowledge of complexity all the time; though, at this point it is often without realizing it. The two things that learning about complexity gives you as a software developer are a way to compare non-similar algorithms that do the same thing (sorting algorithms are the classic example, but most people don't actually write their own sorts). The more useful thing that it gives you is a way to quickly describe an algorithm.
For example, consider SQL. SQL is used every day by a very large number of programmers. If you were to see the following query, your understanding of the query is very different if you've studied complexity.
SELECT User.*, COUNT(Order.*) OrderCount FROM User Join Order ON User.UserId = Order.UserId
If you have studied complexity, then you would understand if someone said it was O(n^2) for a certain DBMS. Without complexity theory, the person would have to explain about table scans and such. If we add an index to the Order table
CREATE INDEX ORDER_USERID ON Order(UserId)
Then the above query might be O(n log n), which would make a huge difference for a large DB, but for a small one, it is nothing at all.
One might argue that complexity theory is not needed to understand how databases work, and they would be correct, but complexity theory gives a language for thinking about and talking about algorithms working on data.
For most types of programming work the theory part and proofs may not be useful in themselves but what they're doing is try to give you the intuition of being able to immediately say "this algorithm is O(n^2) so we can't run it on these one million data points". Even in the most elementary processing of large amounts of data you'll be running into this.
Thinking quickly complexity theory has been important to me in business data processing, GIS, graphics programming and understanding algorithms in general. It's one of the most useful lessons you can get from CS studies compared to what you'd generally self-study otherwise.
Computers are not smart, they will do whatever you instruct them to do. Compilers can optimize code a bit for you, but they can't optimize algorithms. Human brain works differently and that is why you need to understand the Big O. Consider calculating Fibonacci numbers. We all know F(n) = F(n-1) + F(n-2), and starting with 1,1 you can easily calculate following numbers without much effort, in linear time. But if you tell computer to calculate it with that formula (recursively), it wouldn't be linear (at least, in imperative languages). Somehow, our brain optimized algorithm, but compiler can't do this. So, you have to work on the algorithm to make it better.
And then, you need training, to spot brain optimizations which look so obvious, to see when code might be ineffective, to know patterns for bad and good algorithms (in terms of computational complexity) and so on. Basically, those courses serve several things:
understand executional patterns and data structures and what effect they have on the time your program needs to finish;
train your mind to spot potential problems in algorithm, when it could be inefficient on large data sets. Or understand the results of profiling;
learn well-known ways to improve algorithms by reducing their computational complexity;
prepare yourself to pass an interview in the cool company :)
It's extremely important. If you don't understand how to estimate and figure out how long your algorithms will take to run, then you will end up writing some pretty slow code. I think about compuational complexity all the time when writing algorithms. It's something that should always be on your mind when programming.
This is especially true in many cases because while your app may work fine on your desktop computer with a small test data set, it's important to understand how quickly your app will respond once you go live with it, and there are hundreds of thousands of people using it.
Yes, I frequently use Big-O notation, or rather, I use the thought processes behind it, not the notation itself. Largely because so few developers in the organization(s) I frequent understand it. I don't mean to be disrespectful to those people, but in my experience, knowledge of this stuff is one of those things that "sorts the men from the boys".
I wonder if this is one of those questions that can only receive "yes" answers? It strikes me that the set of people that understand computational complexity is roughly equivalent to the set of people that think it's important. So, anyone that might answer no perhaps doesn't understand the question and therefore would skip on to the next question rather than pause to respond. Just a thought ;-)
There are points in time when you will face problems that require thinking about them. There are many real world problems that require manipulation of large set of data...
Examples are:
Maps application... like Google Maps - how would you process the road line data worldwide and draw them? and you need to draw them fast!
Logistics application... think traveling sales man on steroids
Data mining... all big enterprises requires one, how would you mine a database containing 100 tables and 10m+ rows and come up with a useful results before the trends get outdated?
Taking a course in computational complexity will help you in analyzing and choosing/creating algorithms that are efficient for such scenarios.
Believe me, something as simple as reducing a coefficient, say from T(3n) down to T(2n) can make a HUGE differences when the "n" is measured in days if not months.
There's lots of good advice here, and I'm sure most programmers have used their complexity knowledge once in a while.
However I should say understanding computational complexity is of extreme importance in the field of Games! Yes you heard it, that "useless" stuff is the kind of stuff game programming lives on.
I'd bet very few professionals probably care about the Big-O as much as game programmers.
I use complexity calculations regularly, largely because I work in the geospatial domain with very large datasets, e.g. processes involving millions and occasionally billions of cartesian coordinates. Once you start hitting multi-dimensional problems, complexity can be a real issue, as greedy algorithms that would be O(n) in one dimension suddenly hop to O(n^3) in three dimensions and it doesn't take much data to create a serious bottleneck. As I mentioned in a similar post, you also see big O notation becoming cumbersome when you start dealing with groups of complex objects of varying size. The order of complexity can also be very data dependent, with typical cases performing much better than general cases for well designed ad hoc algorithms.
It is also worth testing your algorithms under a profiler to see if what you have designed is what you have achieved. I find most bottlenecks are resolved much better with algorithm tweaking than improved processor speed for all the obvious reasons.
For more reading on general algorithms and their complexities I found Sedgewicks work both informative and accessible. For spatial algorithms, O'Rourkes book on computational geometry is excellent.
In your normal life, not near a computer you should apply concepts of complexity and parallel processing. This will allow you to be more efficient. Cache coherency. That sort of thing.
Yes, my knowledge of sorting algorithms came in handy one day when I had to sort a stack of student exams. I used merge sort (but not quicksort or heapsort). When programming, I just employ whatever sorting routine the library offers. ( haven't had to sort really large amount of data yet.)
I do use complexity theory in programming all the time, mostly in deciding which data structures to use, but also in when deciding whether or when to sort things, and for many other decisions.
'yes' and 'no'
yes) I frequently use big O-notation when developing and implementing algorithms.
E.g. when you should handle 10^3 items and complexity of the first algorithm is O(n log(n)) and of the second one O(n^3), you simply can say that first algorithm is almost real time while the second require considerable calculations.
Sometimes knowledges about NP complexities classes can be useful. It can help you to realize that you can stop thinking about inventing efficient algorithm when some NP-complete problem can be reduced to the problem you are thinking about.
no) What I have described above is a small part of the complexities theory. As a result it is difficult to say that I use it, I use minor-minor part of it.
I should admit that there are many software development project which don't touch algorithm development or usage of them in sophisticated way. In such cases complexity theory is useless. Ordinary users of algorithms frequently operate using words 'fast' and 'slow', 'x seconds' etc.
#Martin: Can you please elaborate on the thought processes behind it?
it might not be so explicit as sitting down and working out the Big-O notation for a solution, but it creates an awareness of the problem - and that steers you towards looking for a more efficient answer and away from problems in approaches you might take. e.g. O(n*n) versus something faster e.g. searching for words stored in a list versus stored in a trie (contrived example)
I find that it makes a difference with what data structures I'll choose to use, and how I'll work on large numbers of records.
A good example could be when your boss tells you to do some program and you can demonstrate by using the computational complexity theory that what your boss is asking you to do is not possible.

Resources