How does the Blue Brain Project (and NEURON software) work?

How does the Blue Brain Project (and NEURON software) work? - c

This question is related to 873448.
From Wikipedia:
The Blue Brain Project is an attempt to create a synthetic brain by reverse-engineering the mammalian brain down to the molecular level. [...] Using a Blue Gene supercomputer running Michael Hines's NEURON software, the simulation does not consist simply of an artificial neural network, but involves a biologically realistic model of neurons.
"If we build it correctly it should speak and have an intelligence and behave very much as a human does."
My question is how the software works internally. If it "involves a biologically realistic model of neurons", how is that different from a neural network, and why can't neural networks simulate a biological brain well while this project would be able to? And, how is NEURON software used in the simulation?
Lastly, I apologize if this question doesn't belong here (maybe the BioStar StackExchance would be a better place to ask).

NEURON software models neuronal cells by modeling fluxes of ions inside and outside the cell through different ion channels. These movement generate a difference of electrical potential between the interior and the exterior of the neuronal membrane, and modulations of this potential allows different neurons to communicate between each other. Several biophysical models for neurons exist, such as the integrate-and-fire model or the Hodgkin-Huxley model
Artificial neural networks have pretty much nothing to do with biological neural networks, apart from sharing the same name. They're mathematical constructs that are connected with each other in a weighted manner, allowing to take one or more inputs and produce one or more outputs.
EDIT: I have to add, as much as the Blue Project is an incredible and very admirable step towards modeling an entire brain, we are far far far far away from that goal. All these are models, so they approximate the behaviour of biological cells, but they are in no way complete. Furthermore, there is a high bias in the "choice" of which neurons these models analyze. Most of the models represent certain areas of the brain (such as the cortex or the hippocampus) of which 1) we have quite a bit of knowledge and 2) are constituted by very organized structures of neuronal cells working together. Other parts of the brain may not be as trivial to model (note that I use "trivial" in a jokingly way, I'm not in any way saying that modeling the cortex is easy!), but I guess the details of this would be a bit outside the scope of SO. Maybe when the cognitive science proposal will be operative you could pose the question there!
Finally, to correct the quoted statement, the project did model a column of the somatosensory cortex of the rat, which is only a very tiny part of an entire rat brain.

Related

How to choose number of hidden layers and nodes in neural network? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
What does number of hidden layers in a multilayer perceptron neural network do to the way neural network behaves? Same question for number of nodes in hidden layers?
Let's say I want to use a neural network for hand written character recognition. In this case I put pixel colour intensity values as input nodes, and character classes as output nodes.
How would I choose number of hidden layers and nodes to solve such problem?

Note: this answer was correct at the time it was made, but has since become outdated.
It is rare to have more than two hidden layers in a neural network. The number of layers will usually not be a parameter of your network you will worry much about.
Although multi-layer neural networks with many layers can represent
deep circuits, training deep networks has always been seen as somewhat
of a challenge. Until very recently, empirical studies often found
that deep networks generally performed no better, and often worse,
than neural networks with one or two hidden layers.
Bengio, Y. & LeCun, Y., 2007. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, (1), pp.1-41.
The cited paper is a good reference for learning about the effect of network depth, recent progress in teaching deep networks, and deep learning in general.

The general answer is to for picking hyperparameters is to cross-validate. Hold out some data, train the networks with different configurations, and use the one that performs best on the held out set.

Most of the problems I have seen were solved with 1-2 hidden layers. It is proven that MLPs with only one hidden layer are universal function approximators (Hornik et. al.). More hidden layers can make the problem easier or harder. You usually have to try different topologies. I heard that you cannot add an arbitrary number of hidden layers if you want to train your MLP with backprop because the gradient will become too small in the first layers (I have no reference for that). But there are some applications where people used up to nine layers. Maybe you are interested in a standard benchmark problem which is solved by different classifiers and MLP topologies.

Besides the fact that cross-validation on different model configurations(no. of hidden layers OR neurons per layer) will lead you to choose better configuration.
One approach is training a model, as big and deep as possible and use dropout regularization to turn off some neurons and reduce overfitting.
the reference to this approach can be seen in this paper.
https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf

All the above answers are of course correct but just to add some more ideas:
Some general rules are the following based on this paper: 'Approximating Number of Hidden layer neurons in Multiple Hidden Layer BPNN Architecture' by Saurabh Karsoliya
In general:
The number of hidden layer neurons are 2/3 (or 70% to 90%) of the size of the input layer. If this is insufficient then number of output layer neurons can be added later on.
The number of hidden layer neurons should be less than twice of the number of neurons in input layer.
The size of the hidden layer neurons is between the input layer size and the output layer size.
Keep always in mind that you need to explore and try a lot of different combinations. Also, using GridSearch you could find the "best model and parameters".
E.g. we can do a GridSearch in order to determine the "best" size of the hidden layer.

How do I pick a good representation for a board game tactic for a genetic algorithm?

For my bachelor's thesis I want to write a genetic algorithm that learns to play the game of Stratego (if you don't know this game, it's probably safe to assume I said chess). I haven't ever before done actual AI projects, so it's an eye-opener to see how little I actually know of implementing things.
The thing I'm stuck with is coming up with a good representation for an actual strategy. I'm probably making some thinking error, but some problems I encounter:
I don't assume you would have a representation containing a lot of
transitions between board positions, since that would just be
bruteforcing it, right?
What could branches of a decision tree look
like? Any representation I come up with don't have interchangeable
branches... If I were to use a bit string, which is apparently also
common, what would the bits represent?
Do I assign scores to the distance between certain pieces? How would I represent that?
I think I ought to know these things after three+ years of study, so I feel pretty stupid - this must look likeI have no clue at all. Still, any help or tips on what to Google would be appreciated!

I think, you could define a decision model and then try to optimize the parameters of that model. You can create multi-stage decision models also. I once did something similar for solving a dynamic dial-a-ride problem (paper here) by modeling it as a two stage linear decision problem. To give you an example, you could:
For each of your figures decide which one is to move next. Each figure is characterized by certain features derived from its position on the board, e.g. ability to make a score, danger, protecting x other figures, and so on. Each of these features can be combined (e.g. in a linear model, through a neural network, through a symbolic expression tree, a decision tree, ...) and give you a rank on which figure to act next with.
Acting with the figure you selected. Again there are a certain number of actions that can be taken, each has certain features. Again you can combine and rank them and one action will have the highest priority. This is the one you choose to perform.
The features you extract can be very simple or insanely complex, it's up to what you think will work best vs what takes how long to compute.
To evaluate and improve the quality of your decision model you can then simulate these decisions in several games against opponents and train the parameters of the model that combines these features to rank the moves (e.g. using a GA). This way you tune the model to win as many games as possible against the specified opponents. You can test the generality of that model by playing against opponents it has not seen before.
As Mathew Hall just said, you can use GP for this (if your model is a complex rule), but this is just one kind of model. In my case a linear combination of the weights did very well.
Btw, if you're interested we've also got a software on heuristic optimization which provides you with GA, GP and that stuff. It's called HeuristicLab. It's GPL and open source, but comes with a GUI (Windows). We've some Howto on how to evaluate the fitness function in an external program (data exchange using protocol buffers), so you can work on your simulation and your decision model and let the algorithms present in HeuristicLab optimize your parameters.

Vincent,
First, don't feel stupid. You've been (I infer) studying basic computer science for three years; now you're applying those basic techniques to something pretty specialized-- a particular application (Stratego) in a narrow field (artificial intelligence.)
Second, make sure your advisor fully understands the rules of Stratego. Stratego is played on a larger board, with more pieces (and more types of pieces) than chess. This gives it a vastly larger space of legal positions, and a vastly larger space of legal moves. It is also a game of hidden information, increasing the difficulty yet again. Your advisor may want to limit the scope of the project, e.g., concentrate on a variant with full observation. I don't know why you think this is simpler, except that the moves of the pieces are a little simpler.
Third, I think the right thing to do at first is to take a look at how games in general are handled in the field of AI. Russell and Norvig, chapters 3 (for general background) and 5 (for two player games) are pretty accessible and well-written. You'll see two basic ideas: One, that you're basically performing a huge search in a tree looking for a win, and two, that for any non-trivial game, the trees are too large, so you search to a certain depth and then cop out with a "board evaluation function" and look for one of those. I think your third bullet point is in this vein.
The board evaluation function is the magic, and probably a good candidate for using either a genetic algorithm, or a genetic program, either of which might be used in conjunction with a neural network. The basic idea is that you are trying to design (or evolve, actually) a function that takes as input a board position, and outputs a single number. Large numbers correspond to strong positions, and small numbers to weak positions. There is a famous paper by Chellapilla and Fogel showing how to do this for a game of Checkers:
http://library.natural-selection.com/Library/1999/Evolving_NN_Checkers.pdf
I think that's a great paper, tying three great strands of AI together: Adversarial search, genetic algorithms, and neural networks. It should give you some inspiration about how to represent your board, how to think about board evaluations, etc.
Be warned, though, that what you're trying to do is substantially more complex than Chellapilla and Fogel's work. That's okay-- it's 13 years later, after all, and you'll be at this for a while. You're still going to have a problem representing the board, because the AI player has imperfect knowledge of its opponent's state; initially, nothing is known but positions, but eventually as pieces are eliminated in conflict, one can start using First Order Logic or related techniques to start narrowing down individual pieces, and possibly even probabilistic methods to infer information about the whole set. (Some of these may be beyond the scope of an undergrad project.)

The fact you are having problems coming up with a representation for an actual strategy is not that surprising. In fact I would argue that it is the most challenging part of what you are attempting. Unfortunately, I haven't heard of Stratego so being a bit lazy I am going to assume you said chess.
The trouble is that a chess strategy is rather a complex thing. You suggest in your answer containing lots of transitions between board positions in the GA, but a chess board has more possible positions than the number of atoms in the universe this is clearly not going to work very well. What you will likely need to do is encode in the GA a series of weights/parameters that are attached to something that takes in the board position and fires out a move, I believe this is what you are hinting at in your second suggestion.
Probably the simplest suggestion would be to use some sort of generic function approximation like a neural network; Perceptrons or Radial Basis Functions are two possibilities. You can encode weights for the various nodes into the GA, although there are other fairly sound ways to train a neural network, see Backpropagation. You could perhaps encode the network structure instead/as well, this also has the advantage that I am pretty sure a fair amount of research has been done into developing neural networks with a genetic algorithm so you wouldn't be starting completely from scratch.
You still need to come up with how you are going to present the board to the neural network and interpret the result from it. Especially, with chess you would have to take note that a lot of moves will be illegal. It would be very beneficial if you could encode the board and interpret the result such that only legal moves are presented. I would suggest implementing the mechanics of the system and then playing around with different board representations to see what gives good results. A few ideas top of the head ideas to get you started could be, although I am not really convinced any of them are especially great ways to do this:
A bit string with all 64 squares one after another with a number presenting what is present in each square. Most obvious, but probably a rather bad representation as a lot of work will be required to filter out illegal moves.
A bit string with all 64 squares one after another with a number presenting what can move to each square. This has the advantage of embodying the covering concept of chess where you what to gain as much coverage of the board with your pieces as possible, but still has problems with illegal moves and dealing with friendly/enemy pieces.
A bit string with all 32 pieces one after another with a number presenting the location of that piece in each square.
In general though I would suggest that chess is rather a complex game to start with, I think it will be rather hard to get something playing to standard which is noticeably better than random. I don't know if Stratego is any simpler, but I would strongly suggest you opt for a fairly simple game. This will let you focus on getting the mechanics of the implementation correct and the representation of the game state.
Anyway hope that is of some help to you.
EDIT: As a quick addition it is worth looking into how standard chess AI's work, I believe most use some sort of Minimax system.

When you say "tactic", do you mean you want the GA to give you a general algorithm to play the game (i.e. evolve an AI) or do you want the game to use a GA to search the space of possible moves to generate a move at each turn?
If you want to do the former, then look into using Genetic programming (GP). You could try to use it to produce the best AI you can for a fixed tree size. JGAP already comes with support for GP as well. See the JGAP Robocode example for an instance of this. This approach does mean you need a domain specific language for a Stratego AI, so you'll need to think carefully how you expose the board and pieces to it.
Using GP means your fitness function can just be how well the AI does at a fixed number of pre-programmed games, but that requires a good AI player to start with (or a very patient human).

#DonAndre's answer is absolutely correct for movement. In general, problems involving state-based decisions are hard to model with GAs, requiring some form of GP (either explicit or, as #DonAndre suggested, trees that are essentially declarative programs).
A general Stratego player seems to me quite challenging, but if you have a reasonable Stratego playing program, "Setting up your Stratego board" would be an excellent GA problem. The initial positions of your pieces would be the phenotype and the outcome of the external Stratego-playing code would be the fitness. It is intuitively likely that random setups would be disadvantaged versus setups that have a few "good ideas" and that small "good ideas" could be combined into fitter-and-fitter setups.
...
On the general problem of what a decision tree, even trying to come up with a simple example, I kept finding it hard to come up with a small enough example, but maybe in the case where you are evaluation whether to attack a same-ranked piece (which, IIRC destroys both you and the other piece?):
double locationNeed = aVeryComplexDecisionTree();
if(thatRank == thisRank){
double sacrificeWillingness = SACRIFICE_GENETIC_BASE; //Assume range 0.0 - 1.0
double sacrificeNeed = anotherComplexTree(); //0.0 - 1.0
double sacrificeInContext = sacrificeNeed * SACRIFICE_NEED_GENETIC_DISCOUNT; //0.0 - 1.0
if(sacrificeInContext > sacrificeNeed){
...OK, this piece is "willing" to sacrifice itself
One way or the other, the basic idea is that you'd still have a lot of coding of Stratego-play, you'd just be seeking places where you could insert parameters that would change the outcome. Here I had the idea of a "base" disposition to sacrifice itself (presumably higher in common pieces) and a "discount" genetically-determined parameter that would weight whether the piece would "accept or reject" the need for a sacrifice.

AI Behavior Decision making

I am running a physics simulation and applying a set of movement instructions to a simulated skeleton. I have a multiple sets of instructions for the skeleton consisting of force application to legs, arms, torso etc. and duration of force applied to their respective bone. Each set of instructions (behavior) is developed by testing its effectiveness performing the desired behavior, and then modifying the behavior with a genetic algorithm with other similar behaviors, and testing it again. The skeleton will have an array behaviors in its set list.
I have fitness functions which test for stability, speed, minimization of entropy and force on joints. The problem is that any given behavior will work for a specific context. One behavior works on flat ground, another works if there is a bump in front of the right foot, another if it's in front of the left, and so on. So the fitness of each behavior varies based on the context. Picking a behavior simply on its previous fitness level won't work because that fitness score doesn't apply to this context.
My question is, how do I program to have the skeleton pick the best behavior for the context? Such as picking the best walking behavior for a randomized bumpy terrain.

In a different answer I've given to this question, I assumed that the "terrain" information you have for your model was very approximate and large-grained, e.g., "smooth and flat", "rough", "rocky", etc. and perhaps only at a grid level. However, if the world model is in fact very detailed, such as from a simulated version of a 3-D laser range scanner, then algorithmic and computational path/motion planning approaches from robotics are likely to be more useful than a machine-learning classifier system.
PATH/MOTION PLANNING METHODS
There are a fairly large number of path and motion planning methods, including some perhaps more suited to walking/locomotion, but a few of the more general ones worth mentioning are:
Visibility graphs
Potential Fields
Sampling-based methods
The general solution approach would be use a path planning method to determine the walking trajectory that your skeleton should follow to avoid obstacles, and then use your GA-based controller to achieve the appropriate motion. This is very much at the core of robotics: sense the world and determine actions and motor control required to achieve some goal(s).
Also, a quick literature search turned up the following papers and a book as a source of ideas and starting points for further investigation. The paper on legged robot motion planning may be especially useful as it discusses several motion planning strategies.
Reading Suggestions
Steven Michael LaValle (2006). Planning Algorithms, Cambridge University Press.
Kris Hauser, Timothy Bretl, Jean-Claude Latombe, Kensuke Harada, Brian Wilcox (2008). "Motion Planning for Legged Robots on Varied Terrain", The International Journal of Robotics Research, Vol. 27, No. 11-12, 1325-1349,
DOI: 10.1177/0278364908098447
Guilherme N. DeSouza and Avinash C. Kak (2002). "Vision for Mobile Robot Navigation: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 2, February, pp 237-267.

Why not test the behaviors against a randomized bumpy terrain? Just set the parameters of the GA so that it's a little forgiving, and won't condemn a behavior for one or two failures.
You have two problems:
Bipedal locomotion without senses is very difficult. I've seen good robotic locomotion over rough terrain without senses, but never with only two legs. So the best solution you can possibly find this way might not be very good.
Running a GA is as much art as science. There are a lot of knobs you can turn, and it's hard to find parameters that will allow novelty to grow without drowning it in noise.
Starting simple (e.g. crawling) will help with both of these.
EDIT:
Wait... you're training it over and over on the same randomized terrain? Well no wonder you're having trouble! It's optimizing for that particular layout of rocks and bumps, which is much easier than generalizing. Depending on how your GA works, you might get some benefit from making the course really long, but a better solution is to randomize the terrain for every pass. When it can no longer exploit specific features of the terrain, it will have an evolutionary incentive to generalize. Since this is a more difficult problem it will not learn as quickly as it did before, and it might not be able to get very good at all with its current parameters; be prepared to tinker.

There are three aspects to my answer: (1) control theory, (2) sensing, and (3) merging sensing and action.
CONTROL THEORY
The answer to your problem depends partially on what kind of control scheme you are using: is it feed-forward or feedback control? If the latter, what simulated real-time sensors do you have other than terrain information?
Simply having terrain information and incorporating it into your control strategy would not mean you are using feedback control. It is possible to use such information to select a feed-forward strategy, which seems closest to the problem that you have described.
SENSING
Whether you are using feed-forward or feedback control, you need to represent the terrain information and any other sensory data as an input space for your control system. Part of training your GA-based motion controller should be moving your skeleton through a broad range of random terrain in order to learn feature detectors. The feature detectors classify the terrain scenarios by segmenting the input space into regions critical to deciding what is the best action policy, i.e., what control behavior to employ.
How to best represent the input space depends on the level of granularity of the terrain information you have for your simulation. If it's just a discrete space of terrain type and/or obstacles in some grid space, you may be able to present it directly to your GA without transformation. If, however, the data is in a continuous space such as terrain type and obstacles at arbitrary range/direction, you may need to transform it into a space from which it may be easier to infer spatial relationships, such as coarse-coded range and direction, e.g., near, mid, far and forward, left-forward, left, etc. Gaussian and fuzzy classifiers can be useful for the latter approach, but discrete-valued coding can also work.
MERGING SENSING AND ACTION
Using one of the input-space-encoding approaches above, you have a few options for how to connect behavior selection search space and motion control search space:
Separate the two spaces into two learning problems and use a separate GA to evolve the parameters of a standard multi-layer perceptron neural network. The latter would have your sensor data (perhaps transformed) as inputs and your set of skeleton behaviors as outputs. Instead of using back-propagation or some other ANN-learning method to learn the network weights, your GA could use some fitness function to evolve the parameters over a series of simulated trials, e.g., fitness = distance traveled in a fixed time period toward point B starting from point A. This should evolve over successive generations from completely random selection of behaviors to something more coordinated and useful.
Merge the two search spaces (behavior selection and skeleton motor control) by linking a multi-layer perceptron network as described in (1) above into the existing GA-based controller framework that you have, using the skeleton behavior set as the linkage. The parameter space that will be evolved will be both the neural network weights and whatever your existing controller parameter space is. Assuming that you are using a multi-objective genetic algorithm, such as the NSGA-II algorithm, (since you have multiple fitness functions), the fitness functions would be stability, speed, minimization of entropy, force on joints, etc, plus some fitness function(s) targeted at learning the behavior-selection policy, e.g., distance moved toward point B starting from point A in a fixed time period.
The difference between this approach and (1) above is that you may be able to learn both better coordination of behaviors and finer-grain motor control since the parameter space is likely to be better explored when the two problems are merged as opposed to being separate. The downside is that it may take much longer to converge on reasonable parameter solutions(s), and not all aspects of motor control may be learned as well as they would if the two learning problems were kept separate.
Given that you already have working evolved solutions for the motor control problem, you are probably better off using approach (1) to learn the behavior-selection model with a separate GA. Also, there are many alternatives to the hybrid GA-ANN scheme I described above for learning the latter model, including not learning a model at all and instead using a path planning algorithm as described in a separate answer from me. I simply offered this approach since you are already familiar with GA-based machine learning.
The action selection problem is a robust area of research in both machine learning and autonomous robotics. It's probably well-worth reading up on this topic in itself to gain better perspective and insight into your current problem, and you may be able to devise a simpler strategy than anything I've suggested so far by viewing your problem through the lens of this paradigm.

You're using a genetic algorithm to modify the behavior, so that must mean you have devised a fitness function for each combination of factors. Is that your question?
If yes, the answer depends on what metrics you use to define best walking behavior:
Maximize stability
Maximize speed
Minimize forces on joints
Minimize energy or entropy production
Or do you just try a bunch of parameters, record the values, and then let the genetic algorithm drive you to the best solution?
If each behavior works well in one context and not another, I'd try quantifying how to sense and interpolate between contexts and blend the strategies to see if that would help.

It sounds like at this point you have just a classification problem. You want to map some knowledge about what you are currently walking on to one of a set of classes. Knowing the class of the terrain allows you to then invoke the proper subroutine. Is this correct?
If so, then there are a wide array of classification engines that you can use including neural networks, Bayesian networks, decision trees, nearest neighbor, etc. In order to pick the best fit, we will need more information about your problem.
First, what kind of input or sensory data do you have available to help you identify the behavior class you should invoke? Second, can you describe the circumstances in which you will be training this classifier and what the circumstances are during runtime when you deploy it, such as any limits on computational resources or requirements of robustness to noise?
EDIT: Since you have a fixed number of classes, and you have some parameterized model for generating all possible terrains, I would consider using k-means clustering. The principle is as follows. You cluster a whole bunch of terrains into k different classes, where each cluster is associated with one of your specialized subroutines that performs best for that cluster of terrains. Then when a new terrain comes in, it will probably fall near one of these clusters. You then invoke the corresponding specialized subroutine to navigate that terrain.
Do this offline: Generate enough random terrains to sufficiently sample the parameter space, map these terrains to your sensory space (but remember which points in sensory space correspond to which terrains), and then run k-means clustering on this sensory space corpus where k is the number of classes you want to learn. Your distance function between a class representative C and a point P in sensory space would be simply the fitness function of letting algorithm C navigate the terrain that generated P. You would then get a partitioning of your sensory space into k clusters, each cluster mapping to the best subroutine that you've got. Each cluster will have a representative point in sensory space.
Now during runtime: You will get some unlabeled point in sensory space. Use a different distance function to find the closest representative point to this new incoming point. That tells you what class the terrain is.
Note that the success of this method depends on the quality of the mapping from the parameter space of terrain generation to sensory space, from sensory space to your fitness functions, and the eventual distance function you use to compare points in sensory space.
Note also that if you had enough memory, instead of only using the k representative sensory points to tell you which class an unlabeled sensory point belongs to, you might go through your training set and label all points with the learned class. Then during runtime you pick the nearest neighbor, and conclude that your unlabeled point in sensory space is in the same class as that neighbor.

Neural Network "Breeding"

I just watched a Google tech talk video covering "Polyworld" (found here) and they talk about breeding two neural networks together to form offspring. My question is, how would one go about combining two neural networks? They seem so different that any attempt to combine them would simply form a third, totally unrelated network. Perhaps I'm missing something, but I don't see a good way to take the positive aspects of two separate neural networks and combine them into a single one. If anyone could elaborate on this process, I'd appreciate it.

Neither response so far is true to the nature of Polyworld!...
They both describe a typical Genetic Algorithm (GA) application. While GA incorporates some of the elements found in Polyworld (breeding, selection), GA also implies some form of "objective" criteria aimed at guiding evolution towards [relatively] specific goals.
Polyworld, on the other hand is a framework for Artificial Life (ALife). With ALife, the survival of individual creatures and their ability to pass their genes to other generations, is not directed so much by their ability to satisfy a particular "fitness function", but instead it is tied to various broader, non-goal-oriented, criteria, such as the ability of the individual to feed itself in ways commensurate with its size and its metabolism, its ability to avoid predators, its ability to find mating partners and also various doses of luck and randomness.
Polyworld's model associated with the creatures and their world is relatively fixed (for example they all have access to (though may elect not to use) various basic sensors (for color, for shape...) and various actuators ("devices" to eat, to mate, to turn, to move...) and these basic sensorial and motor functions do not evolve (as it may in nature, for example when creatures find ways to become sensitive to heat or to sounds and/or find ways of moving that are different from the original motion primitives etc...)
On the other hand, the brain of creatures has structure and connections which are both the product of the creature's genetic make-up ("stuff" from its ancestors) and of its own experience. For example the main algorithm used to determine the strength of connections between neurons uses Hebbian logic (i.e. fire-together, wire-together) during the lifetime of the creature (early on, I'm guessing, as the algorithm often has a "cooling" factor which minimize its ability to change things in a big way, as times goes by). It is unclear if the model includes some form of Lamarkian evolution, whereby some of the high-level behaviors are [directly] passed on through the genes, rather than being [possibly] relearnt with each generation (on the indirect basis of some genetically passed structure).
The salient difference between ALife and GA (and there are others!) is that with ALife, the focus is on observing and fostering in non-directed ways, emergent behaviors -whatever they may be- such as, for example, when some creatures evolve a makeup which prompts them to wait nearby piles of green food and wait for dark green creatures to kill them, or some creatures may start collaborating with one another, for example by seeking each other's presence for other purposes than mating etc. With GA, the focus is on a particular behavior of the program being evolved. For example the goal may be to have the program recognize edges in a video image, and therefore evolution is favored in this specific direction. Individual programs which perform this task better (as measure with some "fitness function") are favored with regards to evolution.
Another less obvious but important difference regards the way creatures (or programs in the case of GA) reproduce themselves. With ALife, individual creatures find their own mating partners, at random at first although, after some time they may learn to reproduce only with creatures exhibiting a particular attribute or behavior. With GA, on the other hand, "sex" is left to the GA framework itself, which chooses, for example, to preferably cross-breed individuals (and clones thereof) which score well on the fitness function (and always leaving room for some randomness, lest the solution search stays stuck at some local maxima, but the point is that the GA framework decides mostly who has sex with whom)...
Having clarified this, we can return to the OP's original question...
... how would one go about combining two neural networks? They seem so different that any attempt to combine them would simply form a third, totally unrelated network. ...I don't see a good way to take the positive aspects of two separate neural networks and combine them into a single one...
The "genetic makeup" of a particular creature affects parameters such as the size of the creature, its color and such. It also includes parameters associated with the brain, in particular its structure: the number of neurons, the existence of connection from various sensors (eg. does the creature see the Blue color very well ?) the existence of connections towards various actuators (eg. does the creature use its light?). The specific connections between neurons and the relative strength of these may also be passed in the genes, if only to serve as initial values, to be quickly changed during brain learning phase.
By taking two creatures, we [nature!] can select in a more or less random fashion, which parameter come from the first creature and which come from the other creature (as well as a few novel "mutations" which come from neither parents). For example if the "father" had many connections with red color sensor, but the mother didn't the offspring may look like the father in this area, but also get his mother's 4 neuron-layers structure rather than father's 6 neuron-layers structure.
The interest of doing so is to discover new capabilities from the individuals; in the example above, the creature may now better detect red colored predators, and also process info more quickly in its slightly simpler brain (compared with the father's). Not all offspring are better equipped than their parents, such weaker individuals, may disappear in short order (or possibly and luckily survive long enough, to provide, say, their fancy way of moving and evading predators, even though their parent made them blind or too big or whatever... The key thing again: is not to be so worried about immediate usefulness of a particular trait, only to see it play in the long term.

They wouldn't really be breeding two neural networks together. Presumably they have a variety of genetic algorithm that produces a particular neural network structure given a particular sequence of "genes". They would start with a population of gene sequences, produce their characteristic neural networks, and then expose each of these networks to the same training regimen. Presumably, some of these networks would respond to the training better than some others (i.e. they would be more easily "trainable" to achieve the desired behavior). They would then take the genetic sequences that produced the best "trainees", cross-breed them with each other, produce their characteristic neural networks, which would then be exposed to the same training regimen. Presumably, some of these neural networks in the second generation would be even more trainable than those from the first generation. These would become the parents of the third generation, and so on and so forth.

Neural networks aren't (probably) in this case arbitrary trees. They are probably networks with a constant structure, i.e. same nodes and connections, so 'breeding' them would involve 'averaging' the weights of nodes. You could average the weights for each pair of nodes in the two corresponding nets to produce the 'offspring' net. Or you could use a more complicated function dependent on ever-further sets of neighboring nodes – the possibilities are Vast.
My answer is incomplete if the assumption about the fixed structure is false or unwarranted.

Brain modelling

Just wondering, since we've reached 1 teraflop per PC, yet we are still not able to model an insect's brain.
Has anyone seen a decent implementation of a self-learning, self-developing neural network?

I saw an interesting experiment mapping the physical neural layout of a rat's brain to a digital neural network with weighting modelled on the neuron chemistry of each component taken using MRI and others. Quite interesting. (new scientist or Focus, 2 issues ago?)
IBM Blue Brain comes to mind
http://news.bbc.co.uk/1/hi/sci/tech/8012496.stm
The problem is computation power as you rightly point out. But for a sequence of stimuli to a neural network the range of calculations tends to be exponential as that stimuli encounters deeper nested nodes. Any complex weighting algorithm means that time spent at each node can get expensive. Domain specific neural-maps tend to be quicker because they are specialized. Brains in mammals have many general paths, making it harder to teach them, and for a computer to model a real mammal brain in a given space/time.
Real brains also have tons of cross-talk like static (some people think this is where creativity or original thought stems from). Brains also don't learn using 'direct' stimulus/reward ... they use past experience of non-related matter to create their own learning. Recreating the neurons is one thing in a computational space, creating an accurate learning is another. Never-mind the dopamine (octopamine in insects) and other neurological chemicals.
imagine giving a digital brain LSD or anti-depressants. As a real simulation. Awesome. That would be a complex simulation I suspect.

I think you're kind of making the assumption that our idea of how neural networks work is a good model for the brain at a large-scale level; I'm not sure that is a good assumption. Hell, not too many years ago we didn't think the glial cells were important to mental functions, and it was the idea for a long time that there is no neurogenesis after the brain matures.
On the other hand, neural networks do seem to handle some apparently complex functions pretty well.
So, here's a little puzzle question for you: how many teraflops or petaflops do you think a human brain's computation represents?

Jeff Hawkins would say that a neural net is a poor approximation of a brain. His "On Intelligence" is a terrific read.

Yup: OpenCog is working on it.

It's the structure. Even if we had computers today with the same or higher performance than a human brain (there are different predictions when we'll get there, but there are still a few years to go), we still need to program it. And while we know a lot of the brain today, there are still many, many more things we do not know. And these aren't just details, but large areas that are not understood at all.
Focusing only on the Tera-/Peta-FLOPS is like looking only at megapixels with digital cameras: it focuses on only one value when there are many factors involved (and there are a few more of those in a brain than in a camera). I also believe that many of the estimates just how many FLOPS would be needed to simulate a brain are way off - but that's a different discussion altogether.

Just wondering, we've reached 1 teraflop per PC, and we are still not able to model an insect's brain. has anyone seen a decent implementation of a self-learning self-developing neural network?
We can already model brains. The question these days, is how fast, and how accurate.
In the beginning, there was effort expended on trying to find the most abstract representation of neurons with the least amount of physical properties needed.
This led to the invention of the perceptron at Cornell University, which is a very simple model indeed. In fact, it may have been too simple, as the famous MIT AI professor, Marvin Minsky, wrote a paper which mistakenly concluded that it would be impossible for this type of model to learn XOR (a basic logic gate that could be emulated by every computer we have today). Unfortunately, his paper plunged neural network research into the dark ages for at least 10 years.
While probably not as impressive as many would like, there are learning networks that are already in existence that can do visual and speech learning and recognition.
And even though we have faster CPUs, it is still not the same as a neuron. Neurons in our brain are, at the very least, parallel adder units. So imagine 100 billion simulated human neurons, adding each second, sending their outputs to 100 trillion connections with a "clock" of about 20hz. The amount of computation going on here far exceeds the petaflops of processing power we have, especially when our cpus are mostly serial instead of parallel.

In 2007, they simulated the equivalent of a half mouse brain for 10 seconds at half the actual speed: http://news.bbc.co.uk/1/hi/technology/6600965.stm

There is a worm named C. Elegance and its anatomy is completely know to us. Every cell is mapped out and every neuron is well studied. This worm has an interesting property by birth and that is it follows or grow towards only those temperature regions in which it was born. Here is link to the paper. This paper has implementation of the property with neuronal model. And there are some students who have built robot that only follows dark regions in the region having different shades of light, using this neuronal model. This work could have been done using other methods as well but this method is more noise resilient as proved by paper to which I have given link above.