How does a non deterministic turing machine work? - theory

I understand they aren't real and they seem to branch computation whenever there are 2 options, instead of picking one. But, for example, if I say this:
"Non deterministically guess a bijection p of vertices from Graph G to Graph H" (context here is Graph Isomorphism)
What is that supposed to mean? I understand the bijection, but it says "non deterministically guess". If it's guessing, how is that an algorithmic approach? How can it guarantee it's going to work?

They don't, they just sort of illustrate a point. Basically what they do is guess an answer, and check if it's right(deterministically). It's not the guessing the answer part that's important though, it's checking that the answer is right. It's just like saying given an arbitrary solution, is it correct? So for example there are problems that take exponential time to compute, and some of their answers can be checked in polynomial time, but some can't. So what the non-deterministic TM does is it divides those two, the ones that can be checked quickly from the ones that can't. And then this brings up the bigger question, if one group of questions solutions can be verified much quicker than another, can their solutions also be generated quicker? This question hasn't been answered, yet.

There's different ways to picture one. One I find useful is the oracle model. Did you ever see the Far Side cartoon where a derivation on the blackboard has "Here a miracle occurs" as one of the intermediate steps? In this version of a NDTM, when you need to choose something, the oracle writes the correct version on the right part of the tape. (This is taken from Garey and Johnson, Computers and Intractability, their classic book on NP-complete problems.) You aren't allowed to assume you've got the right one, though, and there may not be a correct one.
Therefore, when you non-deterministically guess a bijection, you're getting the correct bijection for your purposes, provided one exists.
It isn't a good basis for an algorithm, since the complexity of implementing a non-deterministic Turing machine is basically exponential in the nondeterministic states, and the algorithmic equivalent of the nondeterministic guess is to try every possible bijection.
From a theoretical point of view, I'd translate it as "If there is a bijection such that....". From an algorithmic point of view, find another book, or another chapter of the same book, since that approach is useless for even moderately large graphs.

I believe what is meant is "non deterministically choose a solution" and then test that the solution is true. Since all possible choices (guesses) are tested, the solution is guaranteed.

A physical implementation of the non-deterministic Turing machine is the DNA computer. For example, here's an outline of how to solve the traveling salesman problem in DNA:
Get/make a bunch of DNA sequences, each with length proportional to the cost of an edge in your graph and sticky ends with sequences uniquely identifying one of the vertices that the edge connects.
Mix them together, with DNA ligase in a big beaker. They'll anneal to each other in sequences that represent every possible path through the graph (ok, not the really long ones).
Remove all the sequences that are missing at least one vertex. To do this, sequentially select for each vertex using hybridization. For example, if "ACGTACA" encodes vertex 1, select for sequences that bind to "TGTACGA". Then repeat this selection for every other vertex.
Sort the remaining sequences by size using gel electrophoresis. Then sequence the shortest one. The sequence encodes the shortest path through your graph.

Related

Is the HTM cortical learning algorithm defined by Numenta's paper restricted by Euclidean geometry?

Specifically, their most recent implementation.
http://www.numenta.com/htm-overview/htm-algorithms.php
Essentially, I'm asking whether non-euclidean relationships, or relationships in patterns that exceed the dimensionality of the inputs, can be effectively inferred by the algorithm in its present state?
HTM uses Euclidean geometry to determine "neighborship" when analyzing patterns. Consistently framed input causes the algorithm to exhibit predictive behavior, and sequence length is practically unlimited. This algorithm learns very well - but I'm wondering whether it has the capacity to infer nonlinear attributes from its input data.
For example, if you input the entire set of texts from Project Gutenberg, it's going to pick up on the set of probabilistic rules that comprise English spelling, grammar, and readily apparent features from the subject matter, such as gender associations with words, and so forth. These are first level "linear" relations, and can be easily defined with probabilities in a logical network.
A nonlinear relation would be an association of assumptions and implications, such as "Time flies like an arrow, fruit flies like a banana." If correctly framed, the ambiguity of the sentence causes a predictive interpretation of the sentence to generate many possible meanings.
If the algorithm is capable of "understanding" nonlinear relations, then it would be able to process the first phrase and correctly identify that "Time flies" is talking about time doing something, and "fruit flies" are a type of bug.
The answer to the question is probably a simple one to find, but I can't decide either way. Does mapping down the input into a uniform, 2d, Euclidean plane preclude the association of nonlinear attributes of the data?
If it doesn't prevent nonlinear associations, my assumption would then be that you could simply vary the resolution, repetition, and other input attributes to automate the discovery of nonlinear relations - in effect, adding a "think harder" process to the algorithm.
From what I understand of HTM's, the structure of layers and columns mimics the structure of the neocortex. See appendix B here: http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf
So the short answer would be that since the brain can understand non-linear phenomenon with this structure, so can an HTM.
Initial, instantaneous sensory input is indeed mapped to 2D regions within an HTM. This does not limit HTM's to dealing with 2D representations any more than a one dimensional string of bits is limited to representing only one dimensional things. It's just a way of encoding stuff so that sparse distributed representations can be formed and their efficiencies can be taken advantage of.
To answer your question about Project Gutenberg, I don't think an HTM will really understand language without first understanding the physical world on which language is based and creates symbols for. That said, this is a very interesting sequence for an HTM, since predictions are only made in one direction, and in a way the understanding of what's happening to the fruit goes backwards. i.e. I see the pattern 'flies like a' and assume the phrase applies to the fruit the same way it did to time. HTM's do group subsequent input (words in this case) together at higher levels, so if you used Fuzzy Grouping (perhaps) as Davide Maltoni has shown to be effective, the two halves of the sentence could be grouped together into the same high level representation and feedback could be sent down linking the two specific sentences. Numenta, to my knowledge has not done too much with feedback messages yet, but it's definitely part of the theory.
The software which runs the HTM is called NuPIC (Numenta Platform for Intelligent Computing). A NuPIC region (representing a region of neocortex) can be configured to either use topology or not, depending on the type of data it's receiving.
If you use topology, the usual setup maps each column to a set of inputs which is centred on the corresponding position in the input space (the connections will be selected randomly according to a probability distribution which favours the centre). The spatial pattern recognising component of NuPIC, known as the Spatial Pooler (SP), will then learn to recognise and represent localised topological features in the data.
There is absolutely no restriction on the "linearity" of the input data which NuPIC can learn. NuPIC can learn sequences of spatial patterns in extremely high-dimensional spaces, and is limited only by the presence (or lack of) spatial and temporal structure in the data.
To answer the specific part of your question, yes, NuPIC can learn non-Euclidean and non-linear relationships, because NuPIC is not, and cannot be modelled by, a linear system. On the other hand, it seems logically impossible to infer relationships of a dimensionality which exceeds that of the data.
The best place to find out about HTM and NuPIC, its Open Source implementation, is at NuPIC's community website (and mailing list).
Yes, It can do non-linear. Basically it is multilayer. And all multilayer neural networks can infer non linear relationships. And I think the neighborship is calculated locally. If it is calcualted locally then globally it can be piece wise non linear for example look at Local Linear Embedding.
Yes HTM uses euclidean geometry to connect synapses, but this is only because it is mimicking a biological system that sends out dendrites and creates connections to other nearby cells that have strong activation at that point in time.
The Cortical Learning Algorithm (CLA) is very good at predicting sequences, so it would be good at determining "Time flies like an arrow, fruit flies like a" and predict "banana" if it has encountered this sequence before or something close to it. I don't think it could infer that a fruit fly is a type of insect unless you trained it on that sequence. Thus the T for Temporal. HTMs are sequence association compressors and retrievers (a form of memory). To get the pattern out of the HTM you play in a sequence and it will match the strongest representation it has encountered to date and predict the next bits of the sequence. It seems to be very good at this and the main application for HTMs right now are predicting sequences and anomalies out of streams of data.
To get more complex representations and more abstraction you would cascade a trained HTMs outputs to another HTMs inputs along with some other new sequence based input to correlate to. I suppose you could wire in some feedback and do some other tricks to combine multiple HTMs, but you would need lots of training on primitives first, just like a baby does, before you will ever get something as sophisticated as associating concepts based on syntax of the written word.
ok guys, dont get silly, htms just copy data into them, if you want a concept, its going to be a group of the data, and then you can have motor depend on the relation, and then it all works.
our cortex, is probably way better, and actually generates new images, but a computer cortex WONT, but as it happens, it doesnt matter, and its very very useful already.
but drawing concepts from a data pool, is tricky, the easiest way to do it is by recording an invarient combination of its senses, and when it comes up, associate everything else to it, this will give you organism or animal like intelligence.
drawing harder relations, is what humans do, and its ad hoc logic, imagine a set explaining the most ad hoc relation, and then it slowly gets more and more specific, until it gets to exact motor programs... and all knowledge you have is controlling your motor, and making relations that trigger pathways in the cortex, and tell it where to go, from the blast search that checks all motor, and finds the most successful trigger.
woah that was a mouthful, but watch out dummies, you wont get no concepts from a predictive assimilator, which is what htm is, unless you work out how people draw relations in the data pool, like a machine, and if you do that, its like a program thats programming itself.
no shit.

How do I pick a good representation for a board game tactic for a genetic algorithm?

For my bachelor's thesis I want to write a genetic algorithm that learns to play the game of Stratego (if you don't know this game, it's probably safe to assume I said chess). I haven't ever before done actual AI projects, so it's an eye-opener to see how little I actually know of implementing things.
The thing I'm stuck with is coming up with a good representation for an actual strategy. I'm probably making some thinking error, but some problems I encounter:
I don't assume you would have a representation containing a lot of
transitions between board positions, since that would just be
bruteforcing it, right?
What could branches of a decision tree look
like? Any representation I come up with don't have interchangeable
branches... If I were to use a bit string, which is apparently also
common, what would the bits represent?
Do I assign scores to the distance between certain pieces? How would I represent that?
I think I ought to know these things after three+ years of study, so I feel pretty stupid - this must look likeI have no clue at all. Still, any help or tips on what to Google would be appreciated!
I think, you could define a decision model and then try to optimize the parameters of that model. You can create multi-stage decision models also. I once did something similar for solving a dynamic dial-a-ride problem (paper here) by modeling it as a two stage linear decision problem. To give you an example, you could:
For each of your figures decide which one is to move next. Each figure is characterized by certain features derived from its position on the board, e.g. ability to make a score, danger, protecting x other figures, and so on. Each of these features can be combined (e.g. in a linear model, through a neural network, through a symbolic expression tree, a decision tree, ...) and give you a rank on which figure to act next with.
Acting with the figure you selected. Again there are a certain number of actions that can be taken, each has certain features. Again you can combine and rank them and one action will have the highest priority. This is the one you choose to perform.
The features you extract can be very simple or insanely complex, it's up to what you think will work best vs what takes how long to compute.
To evaluate and improve the quality of your decision model you can then simulate these decisions in several games against opponents and train the parameters of the model that combines these features to rank the moves (e.g. using a GA). This way you tune the model to win as many games as possible against the specified opponents. You can test the generality of that model by playing against opponents it has not seen before.
As Mathew Hall just said, you can use GP for this (if your model is a complex rule), but this is just one kind of model. In my case a linear combination of the weights did very well.
Btw, if you're interested we've also got a software on heuristic optimization which provides you with GA, GP and that stuff. It's called HeuristicLab. It's GPL and open source, but comes with a GUI (Windows). We've some Howto on how to evaluate the fitness function in an external program (data exchange using protocol buffers), so you can work on your simulation and your decision model and let the algorithms present in HeuristicLab optimize your parameters.
Vincent,
First, don't feel stupid. You've been (I infer) studying basic computer science for three years; now you're applying those basic techniques to something pretty specialized-- a particular application (Stratego) in a narrow field (artificial intelligence.)
Second, make sure your advisor fully understands the rules of Stratego. Stratego is played on a larger board, with more pieces (and more types of pieces) than chess. This gives it a vastly larger space of legal positions, and a vastly larger space of legal moves. It is also a game of hidden information, increasing the difficulty yet again. Your advisor may want to limit the scope of the project, e.g., concentrate on a variant with full observation. I don't know why you think this is simpler, except that the moves of the pieces are a little simpler.
Third, I think the right thing to do at first is to take a look at how games in general are handled in the field of AI. Russell and Norvig, chapters 3 (for general background) and 5 (for two player games) are pretty accessible and well-written. You'll see two basic ideas: One, that you're basically performing a huge search in a tree looking for a win, and two, that for any non-trivial game, the trees are too large, so you search to a certain depth and then cop out with a "board evaluation function" and look for one of those. I think your third bullet point is in this vein.
The board evaluation function is the magic, and probably a good candidate for using either a genetic algorithm, or a genetic program, either of which might be used in conjunction with a neural network. The basic idea is that you are trying to design (or evolve, actually) a function that takes as input a board position, and outputs a single number. Large numbers correspond to strong positions, and small numbers to weak positions. There is a famous paper by Chellapilla and Fogel showing how to do this for a game of Checkers:
http://library.natural-selection.com/Library/1999/Evolving_NN_Checkers.pdf
I think that's a great paper, tying three great strands of AI together: Adversarial search, genetic algorithms, and neural networks. It should give you some inspiration about how to represent your board, how to think about board evaluations, etc.
Be warned, though, that what you're trying to do is substantially more complex than Chellapilla and Fogel's work. That's okay-- it's 13 years later, after all, and you'll be at this for a while. You're still going to have a problem representing the board, because the AI player has imperfect knowledge of its opponent's state; initially, nothing is known but positions, but eventually as pieces are eliminated in conflict, one can start using First Order Logic or related techniques to start narrowing down individual pieces, and possibly even probabilistic methods to infer information about the whole set. (Some of these may be beyond the scope of an undergrad project.)
The fact you are having problems coming up with a representation for an actual strategy is not that surprising. In fact I would argue that it is the most challenging part of what you are attempting. Unfortunately, I haven't heard of Stratego so being a bit lazy I am going to assume you said chess.
The trouble is that a chess strategy is rather a complex thing. You suggest in your answer containing lots of transitions between board positions in the GA, but a chess board has more possible positions than the number of atoms in the universe this is clearly not going to work very well. What you will likely need to do is encode in the GA a series of weights/parameters that are attached to something that takes in the board position and fires out a move, I believe this is what you are hinting at in your second suggestion.
Probably the simplest suggestion would be to use some sort of generic function approximation like a neural network; Perceptrons or Radial Basis Functions are two possibilities. You can encode weights for the various nodes into the GA, although there are other fairly sound ways to train a neural network, see Backpropagation. You could perhaps encode the network structure instead/as well, this also has the advantage that I am pretty sure a fair amount of research has been done into developing neural networks with a genetic algorithm so you wouldn't be starting completely from scratch.
You still need to come up with how you are going to present the board to the neural network and interpret the result from it. Especially, with chess you would have to take note that a lot of moves will be illegal. It would be very beneficial if you could encode the board and interpret the result such that only legal moves are presented. I would suggest implementing the mechanics of the system and then playing around with different board representations to see what gives good results. A few ideas top of the head ideas to get you started could be, although I am not really convinced any of them are especially great ways to do this:
A bit string with all 64 squares one after another with a number presenting what is present in each square. Most obvious, but probably a rather bad representation as a lot of work will be required to filter out illegal moves.
A bit string with all 64 squares one after another with a number presenting what can move to each square. This has the advantage of embodying the covering concept of chess where you what to gain as much coverage of the board with your pieces as possible, but still has problems with illegal moves and dealing with friendly/enemy pieces.
A bit string with all 32 pieces one after another with a number presenting the location of that piece in each square.
In general though I would suggest that chess is rather a complex game to start with, I think it will be rather hard to get something playing to standard which is noticeably better than random. I don't know if Stratego is any simpler, but I would strongly suggest you opt for a fairly simple game. This will let you focus on getting the mechanics of the implementation correct and the representation of the game state.
Anyway hope that is of some help to you.
EDIT: As a quick addition it is worth looking into how standard chess AI's work, I believe most use some sort of Minimax system.
When you say "tactic", do you mean you want the GA to give you a general algorithm to play the game (i.e. evolve an AI) or do you want the game to use a GA to search the space of possible moves to generate a move at each turn?
If you want to do the former, then look into using Genetic programming (GP). You could try to use it to produce the best AI you can for a fixed tree size. JGAP already comes with support for GP as well. See the JGAP Robocode example for an instance of this. This approach does mean you need a domain specific language for a Stratego AI, so you'll need to think carefully how you expose the board and pieces to it.
Using GP means your fitness function can just be how well the AI does at a fixed number of pre-programmed games, but that requires a good AI player to start with (or a very patient human).
#DonAndre's answer is absolutely correct for movement. In general, problems involving state-based decisions are hard to model with GAs, requiring some form of GP (either explicit or, as #DonAndre suggested, trees that are essentially declarative programs).
A general Stratego player seems to me quite challenging, but if you have a reasonable Stratego playing program, "Setting up your Stratego board" would be an excellent GA problem. The initial positions of your pieces would be the phenotype and the outcome of the external Stratego-playing code would be the fitness. It is intuitively likely that random setups would be disadvantaged versus setups that have a few "good ideas" and that small "good ideas" could be combined into fitter-and-fitter setups.
...
On the general problem of what a decision tree, even trying to come up with a simple example, I kept finding it hard to come up with a small enough example, but maybe in the case where you are evaluation whether to attack a same-ranked piece (which, IIRC destroys both you and the other piece?):
double locationNeed = aVeryComplexDecisionTree();
if(thatRank == thisRank){
double sacrificeWillingness = SACRIFICE_GENETIC_BASE; //Assume range 0.0 - 1.0
double sacrificeNeed = anotherComplexTree(); //0.0 - 1.0
double sacrificeInContext = sacrificeNeed * SACRIFICE_NEED_GENETIC_DISCOUNT; //0.0 - 1.0
if(sacrificeInContext > sacrificeNeed){
...OK, this piece is "willing" to sacrifice itself
One way or the other, the basic idea is that you'd still have a lot of coding of Stratego-play, you'd just be seeking places where you could insert parameters that would change the outcome. Here I had the idea of a "base" disposition to sacrifice itself (presumably higher in common pieces) and a "discount" genetically-determined parameter that would weight whether the piece would "accept or reject" the need for a sacrifice.

Algorithm to find related words in a text

I would like to have a word (e.g. "Apple) and process a text (or maybe more). I'd like to come up with related terms. For example: process a document for Apple and find that iPod, iPhone, Mac are terms related to "Apple".
Any idea on how to solve this?
As a starting point: your question relates to text mining.
There are two ways: a statistical approach, and one form natural language processing (nlp).
I do not know much about nlp, but can say something about the statistical approach:
You need some vector space representation of your documents, see
http://en.wikipedia.org/wiki/Vector_space_model
http://en.wikipedia.org/wiki/Document-term_matrix
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
In order to learn semantics, that is: different words mean the same, or one word can have different meanings, you need a large text corpus for learning. As I said this is a statistical approach, so you need lots of samples.
http://www.daviddlewis.com/resources/testcollections/
Maybe you have lots of documents from the context you are going to use. That is the best situation.
You have to retrieve latent factors from this corpus. Most common are:
LSA (http://en.wikipedia.org/wiki/Latent_semantic_analysis)
PLSA (http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis)
nonnegative matrix factorization (http://en.wikipedia.org/wiki/Non-negative_matrix_factorization)
latent dirichlet allocation (http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)
These methods involve lots of math. Either you dig it, or you have to find good libraries.
I can recommend the following books:
http://www.oreilly.de/catalog/9780596529321/toc.html
http://www.oreilly.de/catalog/9780596516499/index.html
Like all of AI, it's a very difficult problem. You should look into natural language processing to learn about some of the issues.
One very, very simplistic approach can be to build a 2d-table of words, with for each pair of words the average distance (in words) that they appear in the text. Obviously you'll need to limit the maximum distance considered, and possibly the number of words as well. Then, after processing a lot of text you'll have an indicator of how often certain words appear in the same context.
What I would do is get all the words in a text and make a frequency list (how often each word appears). Maybe also add to it a heuristic factor on how far the word is from "Apple". Then read multiple documents, and cross out words that are not common in all the documents. Then prioritize based on the frequency and distance from the keyword. Of course, you will get a lot of garbage and possibly miss some relevant words, but by adjusting the heuristics you should get at least some decent matches.
The technique that you are looking for is called Latent Semantic Analysis (LSA). It is also sometimes called Latent Semantic Indexing. The technique operates on the idea that related concepts occur together in text. It uses statistics to build the word relationships. Given a large enough corpus of documents it will definitely solve your problem of finding related words.
Take a look at vector space models.

What are the differences between simulated annealing and genetic algorithms?

What are the relevant differences, in terms of performance and use cases, between simulated annealing (with bean search) and genetic algorithms?
I know that SA can be thought as GA where the population size is only one, but I don't know the key difference between the two.
Also, I am trying to think of a situation where SA will outperform GA or GA will outperform SA. Just one simple example which will help me understand will be enough.
Well strictly speaking, these two things--simulated annealing (SA) and genetic algorithms are neither algorithms nor is their purpose 'data mining'.
Both are meta-heuristics--a couple of levels above 'algorithm' on the abstraction scale. In other words, both terms refer to high-level metaphors--one borrowed from metallurgy and the other from evolutionary biology. In the meta-heuristic taxonomy, SA is a single-state method and GA is a population method (in a sub-class along with PSO, ACO, et al, usually referred to as biologically-inspired meta-heuristics).
These two meta-heuristics are used to solve optimization problems, particularly (though not exclusively) in combinatorial optimization (aka constraint-satisfaction programming). Combinatorial optimization refers to optimization by selecting from among a set of discrete items--in other words, there is no continuous function to minimize. The knapsack problem, traveling salesman problem, cutting stock problem--are all combinatorial optimization problems.
The connection to data mining is that the core of many (most?) supervised Machine Learning (ML) algorithms is the solution of an optimization problem--(Multi-Layer Perceptron and Support Vector Machines for instance).
Any solution technique to solve cap problems, regardless of the algorithm, will consist essentially of these steps (which are typically coded as a single block within a recursive loop):
encode the domain-specific details
in a cost function (it's the
step-wise minimization of the value
returned from this function that
constitutes a 'solution' to the c/o
problem);
evaluate the cost function passing
in an initial 'guess' (to begin
iteration);
based on the value returned from the
cost function, generate a subsequent
candidate solution (or more than
one, depending on the
meta-heuristic) to the cost
function;
evaluate each candidate solution by
passing it in an argument set, to
the cost function;
repeat steps (iii) and (iv) until
either some convergence criterion is
satisfied or a maximum number of
iterations is reached.
Meta-heuristics are directed to step (iii) above; hence, SA and GA differ in how they generate candidate solutions for evaluation by the cost function. In other words, that's the place to look to understand how these two meta-heuristics differ.
Informally, the essence of an algorithm directed to solution of combinatorial optimization is how it handles a candidate solution whose value returned from the cost function is worse than the current best candidate solution (the one that returns the lowest value from the cost function). The simplest way for an optimization algorithm to handle such a candidate solution is to reject it outright--that's what the hill climbing algorithm does. But by doing this, simple hill climbing will always miss a better solution separated from the current solution by a hill. Put another way, a sophisticated optimization algorithm has to include a technique for (temporarily) accepting a candidate solution worse than (i.e., uphill from) the current best solution because an even better solution than the current one might lie along a path through that worse solution.
So how do SA and GA generate candidate solutions?
The essence of SA is usually expressed in terms of the probability that a higher-cost candidate solution will be accepted (the entire expression inside the double parenthesis is an exponent:
p = e((-highCost - lowCost)/temperature)
Or in python:
p = pow(math.e, (-hiCost - loCost) / T)
The 'temperature' term is a variable whose value decays during progress of the optimization--and therefore, the probability that SA will accept a worse solution decreases as iteration number increases.
Put another way, when the algorithm begins iterating, T is very large, which as you can see, causes the algorithm to move to every newly created candidate solution, whether better or worse than the current best solution--i.e., it is doing a random walk in the solution space. As iteration number increases (i.e., as the temperature cools) the algorithm's search of the solution space becomes less permissive, until at T = 0, the behavior is identical to a simple hill-climbing algorithm (i.e., only solutions better than the current best solution are accepted).
Genetic Algorithms are very different. For one thing--and this is a big thing--it generates not a single candidate solution but an entire 'population of them'. It works like this: GA calls the cost function on each member (candidate solution) of the population. It then ranks them, from best to worse, ordered by the value returned from the cost function ('best' has the lowest value). From these ranked values (and their corresponding candidate solutions) the next population is created. New members of the population are created in essentially one of three ways. The first is usually referred to as 'elitism' and in practice usually refers to just taking the highest ranked candidate solutions and passing them straight through--unmodified--to the next generation. The other two ways that new members of the population are usually referred to as 'mutation' and 'crossover'. Mutation usually involves a change in one element in a candidate solution vector from the current population to create a solution vector in the new population, e.g., [4, 5, 1, 0, 2] => [4, 5, 2, 0, 2]. The result of the crossover operation is like what would happen if vectors could have sex--i.e., a new child vector whose elements are comprised of some from each of two parents.
So those are the algorithmic differences between GA and SA. What about the differences in performance?
In practice: (my observations are limited to combinatorial optimization problems) GA nearly always beats SA (returns a lower 'best' return value from the cost function--ie, a value close to the solution space's global minimum), but at a higher computation cost. As far as i am aware, the textbooks and technical publications recite the same conclusion on resolution.
but here's the thing: GA is inherently parallelizable; what's more, it's trivial to do so because the individual "search agents" comprising each population do not need to exchange messages--ie, they work independently of each other. Obviously that means GA computation can be distributed, which means in practice, you can get much better results (closer to the global minimum) and better performance (execution speed).
In what circumstances might SA outperform GA? The general scenario i think would be those optimization problems having a small solution space so that the result from SA and GA are practically the same, yet the execution context (e.g., hundreds of similar problems run in batch mode) favors the faster algorithm (which should always be SA).
It is really difficult to compare the two since they were inspired from different domains..
A Genetic Algorithm maintains a population of possible solutions, and at each step, selects pairs of possible solution, combines them (crossover), and applies some random changes (mutation). The algorithm is based the idea of "survival of the fittest" where the selection process is done according to a fitness criteria (usually in optimization problems it is simply the value of the objective function evaluated using the current solution). The crossover is done in hope that two good solutions, when combined, might give even better solution.
On the other hand, Simulated Annealing only tracks one solution in the space of possible solutions, and at each iteration considers whether to move to a neighboring solution or stay in the current one according to some probabilities (which decays over time). This is different from a heuristic search (say greedy search) in that it doesn't suffer from the problems of local optimum since it can get unstuck from cases where all neighboring solutions are worst the current one.
I'm far from an expert on these algorithms, but I'll try and help out.
I think the biggest difference between the two is the idea of crossover in GA and so any example of a learning task that is better suited to GA than SA is going to hinge on what crossover means in that situation and how it is implemented.
The idea of crossover is that you can meaningfully combine two solutions to produce a better one. I think this only makes sense if the solutions to a problem are structured in some way. I could imagine, for example, in multi-class classification taking two (or many) classifiers that are good at classifying a particular class and combining them by voting to make a much better classifier. Another example might be Genetic Programming, where the solution can be expressed as a tree, but I find it hard to come up with a good example where you could combine two programs to create a better one.
I think it's difficult to come up with a compelling case for one over the other because they really are quite similar algorithms, perhaps having been developed from very different starting points.

Artificial Intelligence - Intelligence Agent that cleans and paints

I remember when I was in college we went over some problem where there was a smart agent that was on a grid of squares and it had to clean the squares. It was awarded points for cleaning. It also was deducted points for moving. It had to refuel every now and then and at the end it got a final score based on how many squares on the grid were dirty or clean.
I'm trying to study that problem since it was very interesting when I saw it in college, however I cannot find anything on wikipedia or anywhere online. Is there a specific name for that problem that you know about? Or maybe it was just something my teacher came up with for the class.
I'm searching for AI cleaning agent and similar things, but I don't find anything. I don't know, I'm thinking maybe it has some other name.
If you know where I can find more information about this problem I would appreciate it. Thanks.
Perhaps a "stigmergy" approach is closely related to your problem. There is a starting point here, and you can find something by searching for "dead ants" and "robots" on google scholar.
Basically: instead of modelling a precise strategy you work toward a probabilistic approach. Ants (probably) collect their deads by piling up according to a simple rule such as "if there is a pile of dead ants there, I bring this corpse hither; otherwise, I'll make a new pile". You can start by simplifying your 'cleaning' situation with that, and see where you go.
Also, I think (another?) suitable approach could be modelled with a Genetic Algorithm using a carefully chosen combination of fitness functions such as:
the end number of 'clean' tiles
the number of steps made by the robot
of course if the robots 'dies' out of starvation it automatically removes itself from the gene pool, a-la darwin awards :)
You could start by modelling a very, very simple genotype that will be 'computed' into a behaviour. Consider using a simple GA such as this one by Inman Harvey, then to each gene assign either a part of the strategy, or a complete behaviour. E.g.: if gene A is turned to 1 then the robot will try to wander randomly; if gene B is also turned to 1, then it will give priority to self-charging unless there are dirty tiles at distance X. Or use floats and model probability. Your mileage may vary but I can assure it will be fun :)
The problem is reminiscent of Shakey, although there's cleaning involved (which is like the Roomba -- a device that can also be programmed to perform these very tasks).
If the "problem space" (or room) is small enough, you can solve for an optimal solution using a simple A*-based search, but likely it won't be, since that won't leave for very interesting problems.
The machine learning approach suggested here using genetic algorithms is an interesting approach. Given the problem domain you would only have one "rule" (a move-to action, since clean could be eliminated by implicitly cleaning any square you move to that is dirty) so your learner would essentially be learning how to move around an environment. The problem there would be to build a learner that would be adaptable to any given floor plan, instead of just becoming proficient at cleaning a very specific space.
Whatever approach you have, I'd also consider doing a further meta-reasoning step if the problem sets are big enough, and use a partition approach to divide the floor up into separate areas and then conquering them one at a time.
Can you use techniques to create data to use "offline"? In that case, I'd even consider creating a "database" of optimal routes to take to clean certain floor spaces (1x1 up to, say, 5x5) that include all possible start and end squares. This is similar to "endgame databases" that game AIs use to effectively "solve" games once they reach a certain depth (c.f. Chinook).
This problem reminds me of this. A similar problem is briefly mentioned in the book Complexity as an example of a genetic algorithm. These versions are simplified though, they don't take into account fuel consumption.

Resources