This might not be the proper place to ask this question, but I didn't find a better one. I have a program that has, for example, 10 parameters. Every time I run it, it can produce one of 3 results: 0, 0.5, or 1. I don't know how the parameters influence the final result. I need something that will gradually improve my program so it produces more 1s and fewer 0s.
First, just to get the terminology right, this is really a "search" problem, not a "machine learning" problem (you're trying to find a very good solution, not trying to recognize how inputs relate to outputs). Your problem sounds like a classic "function optimization" search problem.
There are many techniques that can be used. The right one depends on a few different factors, but the biggest question is the size and shape of the solution space. The key question there is "how sensitive is the output to small changes in the inputs?" If you hold all the inputs but one constant and make a tiny change to that one, do you get a huge change in the output or just a small one? Do the inputs interact with each other, especially in complex ways?
The smaller and "smoother" the solution space (that is, the less sensitive it is to tiny changes in inputs), the more you would want to pursue straightforward statistical techniques, guided search, or perhaps, if you wanted something a little more interesting, simulated annealing.
The larger and more complex the solution space, the more that would guide you towards either more sophisticated statistical techniques or my favorite class of algorithms, which are genetic algorithms, which can very rapidly search a large solution space.
Just to sketch out how you might apply genetic algorithms to your problem, let's assume that the inputs are independent from each other (a rare case, I know):
Create a mapping to your inputs from a series of binary digits 0011 1100 0100 ...etc...
Generate a random population of some significant size using this mapping
Determine the fitness of each individual in the population (in your case, "count the 1s" in the output)
Choose two "parents" by lottery:
For each half-point in the output, an individual gets a "lottery ticket" (in other words, an output that has 2 "1"s and 3 "0.5"s will get 7 "tickets" while one with 1 "1" and 2 "0.5"s will get 4 "tickets")
Choose a lottery ticket randomly. Since "more fit" individuals will have more "tickets" this means that "more fit" individuals will be more likely to be "parents"
Create a child from the parents' genomes:
Start copying one parent's genome from left to right: 0011 11...
At every step, switch to the other parent with some fixed probability (say, 20% of the time)
The resulting child will have some amount of one parent's genome and some amount of the other's. Because the child was created from "high fitness" individuals, it is likely that the child will have a fitness higher than the average of the current generation (although it is certainly possible that it will have a lower fitness)
Replace some percentage of the population with children generated in this manner
Repeat from the "Determine fitness" step... In the ideal case, every generation will have an average fitness that is higher than the previous generation and you will find a very good (or maybe even ideal) solution.
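If it helps to see the moving parts, here is a minimal sketch in C of the scheme above. Everything specific here is an assumption for illustration: a 40-bit genome standing in for your 10 parameters, a placeholder evaluate() that just counts 1-bits (you would decode the genome into your parameters and run your program there instead), no mutation step, and the whole population replaced each generation rather than only a percentage of it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define GENOME_BITS 40       /* e.g. 10 parameters x 4 bits each (assumption) */
    #define POP_SIZE    100
    #define GENERATIONS 50
    #define SWITCH_PROB 0.20     /* chance of switching parents at each bit */

    typedef struct {
        unsigned char gene[GENOME_BITS];   /* each entry is 0 or 1 */
        double fitness;
    } Individual;

    /* Placeholder fitness: counts 1-bits.  In your case this would decode the
       genome into the 10 parameters, run your program (perhaps several times),
       and add up the 0 / 0.5 / 1 results. */
    static double evaluate(const Individual *ind)
    {
        double f = 0.0;
        for (int i = 0; i < GENOME_BITS; i++)
            f += ind->gene[i];
        return f;
    }

    /* Fitness-proportionate ("lottery ticket") selection. */
    static const Individual *pick_parent(const Individual *pop, double total)
    {
        double ticket = ((double)rand() / RAND_MAX) * total;
        for (int i = 0; i < POP_SIZE; i++) {
            ticket -= pop[i].fitness;
            if (ticket <= 0.0)
                return &pop[i];
        }
        return &pop[POP_SIZE - 1];
    }

    /* Crossover: copy one parent bit by bit, switching the source parent
       with probability SWITCH_PROB at every position. */
    static void crossover(const Individual *a, const Individual *b,
                          Individual *child)
    {
        const Individual *src = a;
        for (int i = 0; i < GENOME_BITS; i++) {
            if ((double)rand() / RAND_MAX < SWITCH_PROB)
                src = (src == a) ? b : a;
            child->gene[i] = src->gene[i];
        }
    }

    int main(void)
    {
        static Individual pop[POP_SIZE], next[POP_SIZE];
        srand((unsigned)time(NULL));

        for (int i = 0; i < POP_SIZE; i++)            /* random initial population */
            for (int j = 0; j < GENOME_BITS; j++)
                pop[i].gene[j] = rand() % 2;

        for (int g = 0; g < GENERATIONS; g++) {
            double total = 0.0;
            for (int i = 0; i < POP_SIZE; i++) {      /* determine fitness */
                pop[i].fitness = evaluate(&pop[i]);
                total += pop[i].fitness;
            }
            for (int i = 0; i < POP_SIZE; i++)        /* breed the next generation */
                crossover(pick_parent(pop, total), pick_parent(pop, total),
                          &next[i]);
            for (int i = 0; i < POP_SIZE; i++)
                pop[i] = next[i];
        }
        return 0;
    }

A real implementation would also flip a few bits now and then (mutation) and keep the best individual around, but the lottery selection and the 20% switch-probability crossover are the parts described above.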
Are you just trying to adjust the parameters so the results come out to 1? It sounds like the program is a black box where you pick the input parameters and then see the results. Since that is the case, I think it would be best to choose a range of values for each input parameter, cycle through those inputs, and view the outputs to try to discern a pattern. If you can automate it, that will help a lot. After you run through the data you may be able to spot-check which parameters give you which results, or you could apply some machine learning techniques to determine which parameters lead to which outputs.
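For example, here is a sketch of that kind of automated sweep in C. run_program() is a hypothetical stand-in for the black box, and the three candidate values per parameter are made up; note that the grid grows exponentially with the number of parameters (3^10 = 59049 runs here).

    #include <stdio.h>

    #define NUM_PARAMS 10
    #define NUM_LEVELS 3

    /* Hypothetical stand-in for the black box: in reality you would call the
       real program (e.g. via system()) and collect its 0 / 0.5 / 1 result. */
    static double run_program(const double p[NUM_PARAMS])
    {
        double s = 0.0;
        for (int i = 0; i < NUM_PARAMS; i++)
            s += p[i];
        return s > 5.0 ? 1.0 : (s > 2.5 ? 0.5 : 0.0);
    }

    int main(void)
    {
        const double levels[NUM_LEVELS] = { 0.0, 0.5, 1.0 };  /* example values */
        int idx[NUM_PARAMS] = { 0 };        /* odometer over the parameter grid */
        double params[NUM_PARAMS];

        for (;;) {
            for (int i = 0; i < NUM_PARAMS; i++)
                params[i] = levels[idx[i]];

            printf("result = %.1f\n", run_program(params));

            /* advance the odometer; stop once every combination has been tried */
            int i = 0;
            while (i < NUM_PARAMS && ++idx[i] == NUM_LEVELS)
                idx[i++] = 0;
            if (i == NUM_PARAMS)
                break;
        }
        return 0;
    }

Logging each parameter combination next to its result gives you the raw material for spotting patterns afterwards.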
As Larry said, this looks like a combinatorial search, and the solution will depend on the "topology" of the problem.
If you can, try to get The Algorithm Design Manual (S. Skiena); it has a chapter on this that can help you determine a good method for this problem...
Here is the problem:
You are given a list of ingredients (assume each has unit value) with their respective quantities, and a list of products. Each product has a price and a recipe listing the ingredients it needs and their quantities.
You need to maximize the total proceeds from the products made with the given ingredients.
The first thing that popped into my mind is to compute a price/(number of needed items) ratio and start making the products with the highest ratio first. I know this is a kind of greedy algorithm (if I'm not wrong) and doesn't always lead to the best solution, but I had no other implementable ideas.
Another way might be to brute-force all the possibilities, but I can't see how to implement it; I'm not that familiar with brute-forcing. My first brute-force algorithm was this one, but that was easy because it dealt with plain numbers and, furthermore, the choice of each element was not constrained by the previous elements.
Here things are different, because the next element is a function of the available ingredients, which are in turn affected by the previously made products, and so on.
Do you have any hints? This is a kind of homework, so I'd prefer a starting point rather than a direct solution!
The language I have to use is C.
Many thanks in advance :)
I would first try looking at this as a linear programming problem; there are algorithms available to solve them efficiently.
If your problem can't accept a fractional number of items, then it is actually an integer programming problem. There are algorithms available to solve these as well, but in general it can be difficult (as in time-consuming) to solve large integer programming problems exactly.
Note that a linear programming solution may be a good first approximation to an integer programming solution, e.g. if your production quantities are large.
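Spelled out with assumed symbols (x_j for the number of units of product j made, p_j for its price, a_ij for the amount of ingredient i its recipe requires, and b_i for the stock of ingredient i), the integer program is:

    \begin{aligned}
    \text{maximize}   \quad & \sum_{j} p_j x_j \\
    \text{subject to} \quad & \sum_{j} a_{ij} x_j \le b_i \quad \text{for every ingredient } i, \\
                            & x_j \in \mathbb{Z}_{\ge 0} \quad \text{for every product } j.
    \end{aligned}

The linear-programming relaxation mentioned above simply replaces x_j in Z_{>=0} with x_j >= 0, allowing fractional production.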
If you have the CPU cycles to do it (and efficiency doesn't matter), brute force is probably the best way to go, because it's the simplest and also guaranteed to always (eventually) find the best answer.
Probably the first thing to do is figure out how to enumerate your options -- i.e. come up with a way to list all the different possible combinations of pastries you could make with the given ingredients. Don't worry about prices at first.
As a (contrived) example, with a cup of milk and a dozen eggs and some flour and sugar, I could make:
12 brownies
11 brownies and 1 cookie
10 brownies and 2 cookies
[...]
1 brownie and 11 cookies
12 cookies
Then once you have that list, you can iterate over the list, calculate how much money you would make on each option, and choose the one that makes the most money.
As far as generating the list of options goes, I would start by calculating how many cookies you could make if you were to make only cookies; then how many brownies you could make if you were to make only brownies, and so on. That will give you an absolute upper bound on how many of each item you ever need to consider. Then you can just consider every combination of items with per-type-numbers less than or equal to that bound, and throw out any combinations that turn out to require more ingredients than you have on hand. This would be really inefficient and slow, of course, but it would work.
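As a sketch of that enumeration in C (with made-up stock, prices, and recipes for the brownie/cookie example above), you can recurse over products: for each product, try every count the remaining ingredients allow, subtract what it uses, and recurse on the next product. The per-product upper bound described above falls out naturally, because the loop over n stops as soon as the stock can no longer cover n units.

    #include <stdio.h>

    #define NUM_INGREDIENTS 4
    #define NUM_PRODUCTS    2

    /* Example data (hypothetical): stock of each ingredient, price of each
       product, and recipe[p][i] = amount of ingredient i that product p needs. */
    static const int    stock[NUM_INGREDIENTS] = { 1, 12, 20, 20 };  /* milk, eggs, flour, sugar */
    static const double price[NUM_PRODUCTS]    = { 3.0, 1.0 };       /* brownie, cookie */
    static const int    recipe[NUM_PRODUCTS][NUM_INGREDIENTS] = {
        { 0, 1, 1, 1 },   /* brownie */
        { 0, 0, 1, 1 },   /* cookie  */
    };

    static int    best_counts[NUM_PRODUCTS];
    static double best_value = -1.0;

    /* Try every count of product p that the remaining ingredients allow,
       then recurse on the next product. */
    static void enumerate(int p, const int remaining[NUM_INGREDIENTS],
                          int counts[NUM_PRODUCTS], double value)
    {
        if (p == NUM_PRODUCTS) {
            if (value > best_value) {
                best_value = value;
                for (int j = 0; j < NUM_PRODUCTS; j++)
                    best_counts[j] = counts[j];
            }
            return;
        }
        for (int n = 0; ; n++) {
            int feasible = 1;                 /* do n units still fit in stock? */
            for (int i = 0; i < NUM_INGREDIENTS; i++)
                if (n * recipe[p][i] > remaining[i]) { feasible = 0; break; }
            if (!feasible)
                break;

            int next_remaining[NUM_INGREDIENTS];
            for (int i = 0; i < NUM_INGREDIENTS; i++)
                next_remaining[i] = remaining[i] - n * recipe[p][i];

            counts[p] = n;
            enumerate(p + 1, next_remaining, counts, value + n * price[p]);
        }
    }

    int main(void)
    {
        int counts[NUM_PRODUCTS] = { 0 };
        enumerate(0, stock, counts, 0.0);
        printf("best value: %.2f (%d brownies, %d cookies)\n",
               best_value, best_counts[0], best_counts[1]);
        return 0;
    }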
According to Wikipedia (which is a bad source, I know), a neural network is composed of:
An input layer of A neurons
Multiple (B) hidden layers, each composed of C neurons
An output layer of D neurons
I understand what the input and output layers mean.
My question is: how do I determine an optimal number of layers and neurons per layer?
What are the advantages/disadvantages of increasing B?
What are the advantages/disadvantages of increasing C?
What is the difference between increasing B versus increasing C?
Is it only a matter of time (limits of processing power), or will making the network deeper limit the quality of the results? Should I focus more on depth (more layers) or on breadth (more neurons per layer)?
Answer 1. One hidden layer will model most problems; at most, two layers can be used.
Answer 2. If an inadequate number of neurons is used, the network will be unable to model complex data, and the resulting fit will be poor. If too many neurons are used, training time may become excessively long and, worse, the network may overfit the data. When overfitting $ occurs, the network begins to model random noise in the data. The result is that the model fits the training data extremely well but generalizes poorly to new, unseen data. Validation must be used to test for this.
$ What is overfitting?
In statistics, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model which has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.
The concept of overfitting is important in machine learning. Usually a learning algorithm is trained using some set of training examples, i.e. exemplary situations for which the desired output is known. The learner is assumed to reach a state where it will also be able to predict the correct output for other examples, thus generalizing to situations not presented during training (based on its inductive bias). However, especially in cases where learning was performed too long or where training examples are rare, the learner may adjust to very specific random features of the training data, that have no causal relation to the target function. In this process of overfitting, the performance on the training examples still increases while the performance on unseen data becomes worse.
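As a toy illustration of what "validation must be used to test for this" can look like in practice, here is a sketch of early stopping in C. The two error functions are fabricated stand-ins that just mimic the typical curves (training error keeps falling while validation error eventually turns back up); in a real system they would train the network for one pass and measure error on a held-out set.

    #include <stdio.h>

    /* Fabricated error curves: training error keeps improving, validation
       error bottoms out around epoch 20 and then worsens (overfitting). */
    static double training_error(int epoch)   { return 1.0 / (epoch + 1); }
    static double validation_error(int epoch)
    {
        return 0.30 + (double)(epoch - 20) * (epoch - 20) / 10000.0;
    }

    int main(void)
    {
        double best = 1e9;
        int best_epoch = 0, patience = 5;

        for (int epoch = 0; epoch < 200; epoch++) {
            double val = validation_error(epoch);
            printf("epoch %3d  train %.3f  val %.3f\n",
                   epoch, training_error(epoch), val);
            if (val < best) {                 /* still generalising better */
                best = val;
                best_epoch = epoch;
            } else if (epoch - best_epoch >= patience) {
                printf("stopping: validation error has not improved "
                       "since epoch %d (overfitting)\n", best_epoch);
                break;
            }
        }
        return 0;
    }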
Answer 3. Read Answers 1 & 2.
The Supervised Learning article on Wikipedia (http://en.wikipedia.org/wiki/Supervised_learning) will give you more insight into the factors that are really important for any supervised learning system, including neural networks. The article discusses the dimensionality of the input space, the amount of training data, noise, etc.
The number of layers/nodes depends on the classification task and what you expect of the NN. Theoretically, if you have a linearly separable function/decision (e.g. the boolean AND function), 1 layer (i.e. only the input layer, with no hidden layer) will be able to form a separating hyperplane and will be enough. If your function isn't linearly separable (e.g. the boolean XOR), then you need hidden layers.
With 1 hidden layer, you can form any, possibly unbounded convex region. Any bounded continuous function with a finite mapping can be represented. More on that here.
2 hidden layers, on the other hand, are capable of representing arbitrarily complex decision boundaries. The only limitation is the number of nodes. In a typical 2-hidden layer network, first layer computes the regions and the second layer computes an AND operation (one for each hypercube). Lastly, the output layer computes an OR operation.
According to Kolmogorov's Theorem, all functions can be learned by a 2-hidden layer network and you never ever need more than 2 hidden layers. However, in practice, 1-hidden-layer almost always does the work.
In summary, fix B=0 for linearly separable functions and B=1 for everything else.
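To make the XOR point concrete, here is a tiny C sketch of a network with one hidden layer of two threshold units. The weights are picked by hand rather than learned, purely to show that a single hidden layer suffices where no hidden layer can work, since XOR is not linearly separable.

    #include <stdio.h>

    /* Hard-threshold unit: fires iff its weighted input exceeds 0. */
    static int step(double x) { return x > 0.0 ? 1 : 0; }

    /* XOR with one hidden layer of two threshold units (hand-picked weights,
       not learned): h1 = OR(a,b), h2 = AND(a,b), output = h1 AND NOT h2. */
    static int xor_net(int a, int b)
    {
        int h1 = step(a + b - 0.5);   /* OR  */
        int h2 = step(a + b - 1.5);   /* AND */
        return step(h1 - h2 - 0.5);
    }

    int main(void)
    {
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                printf("%d XOR %d = %d\n", a, b, xor_net(a, b));
        return 0;
    }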
As for C and the relationship between B and C, have a look at The Number of Hidden Layers. It provides general information and discusses underfitting and overfitting.
The author suggests one of the following as a rule of thumb:
C is between the size of the input layer and the size of the output layer.
C = 2/3 the size of the input layer, plus the size of the output layer.
C < twice the size of the input layer.
In a rule system, or any reasoning system that deduces facts via forward-chaining inference rules, how would you prune "unnecessary" branches? I'm not sure what the formal terminology is, but I'm just trying to understand how people are able to limit their train-of-thought when reasoning over problems, whereas all semantic reasoners I've seen appear unable to do this.
For example, in John McCarthy's paper An Example for Natural Language Understanding and the AI Problems It Raises, he describes potential problems in getting a program to intelligently answer questions about a news article in the New York Times. In section 4, "The Need For Nonmonotonic Reasoning", he discusses the use of Occam's Razor to restrict the inclusion of facts when reasoning about the story. The sample story he uses is one about robbers who victimize a furniture store owner.
If a program were asked to form a "minimal completion" of the story in predicate calculus, it might need to include facts not directly mentioned in the original story. However, it would also need some way of knowing when to limit its chain of deduction, so as not to include irrelevant details. For example, it might want to include the exact number of police involved in the case, which the article omits, but it won't want to include the fact that each police officer has a mother.
Good Question.
From your question, I think what you refer to as 'pruning' is a model-building step performed ex ante, i.e., limiting the inputs available to the algorithm used to build the model. The term 'pruning' in Machine Learning refers to something different: an ex post step, performed after model construction, that operates on the model itself and not on the available inputs. (There could be a second meaning of the term 'pruning' in the ML domain, but I'm not aware of it.) In other words, pruning is indeed literally a technique to "limit its chain of deduction" as you put it, but it does so ex post, by excising components of a complete (working) model, not by limiting the inputs used to create that model.
On the other hand, isolating or limiting the inputs available for model construction, which is what I think you might have had in mind, is indeed a key Machine Learning theme; it's clearly a factor responsible for the superior performance of many of the more recent ML algorithms, for instance Support Vector Machines (the insight underlying SVM is construction of the maximum-margin hyperplane from only a small subset of the data, i.e., the 'support vectors') and Multi-Adaptive Regression Splines (a regression technique in which no attempt is made to fit the data by "drawing a single continuous curve through it"; instead, discrete sections of the data are fit, one by one, using a bounded linear equation for each portion, i.e., the 'splines', so the preliminary step of optimally partitioning the data is obviously the crux of this algorithm).
First, what problem is solved by pruning?
At least with respect to the specific ML algorithms I have actually coded and used (Decision Trees, MARS, and Neural Networks), pruning is performed on an initially over-fit model: a model that fits the training data so closely that it is unable to generalize (accurately predict new instances). In each case, pruning involves removing marginal nodes (DT, NN) or terms in the regression equation (MARS) one by one.
Second, why is pruning necessary/desirable?
Isn't it better to just set the convergence/splitting criteria accurately? That won't always help. Pruning works from "the bottom up"; the model is constructed from the top down, so tuning the model (to achieve the same benefit as pruning) eliminates not just one or more decision nodes but also the child nodes beneath them (like trimming a tree closer to the trunk). So eliminating a marginal node during construction might also eliminate one or more strong nodes that would have been subordinate to that marginal node, but the modeler would never know, because the tuning prevented further node creation at that marginal node. Pruning works from the other direction: from the most subordinate (lowest-level) child nodes upward towards the root node.
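To illustrate the "bottom up" direction, here is a sketch in C of reduced-error-style pruning of a decision tree. The data structure is hypothetical, and it assumes each node already carries the number of validation errors it would make if collapsed into a leaf.

    #include <stddef.h>
    #include <stdio.h>

    /* Minimal decision-tree node for illustration (hypothetical layout). */
    typedef struct Node {
        struct Node *left, *right;   /* NULL for a leaf */
        int label;                   /* majority class at this node */
        int val_errors_leaf;         /* validation errors if collapsed to a leaf */
    } Node;

    /* Bottom-up pruning: prune the children first, then collapse this node
       into a leaf if doing so is no worse on the validation data.
       Returns the validation errors of the (possibly pruned) subtree. */
    static int prune(Node *n)
    {
        if (n->left == NULL && n->right == NULL)
            return n->val_errors_leaf;

        int subtree_errors = prune(n->left) + prune(n->right);

        if (n->val_errors_leaf <= subtree_errors) {
            n->left = n->right = NULL;   /* collapse (a real tree would free the children) */
            return n->val_errors_leaf;
        }
        return subtree_errors;
    }

    int main(void)
    {
        /* Tiny example: collapsing the root is no worse (3 errors) than
           keeping its two leaves (1 + 2 = 3), so it gets pruned. */
        Node leaf1 = { NULL, NULL, 0, 1 };
        Node leaf2 = { NULL, NULL, 1, 2 };
        Node root  = { &leaf1, &leaf2, 0, 3 };
        prune(&root);
        printf("root is %s\n", root.left == NULL ? "now a leaf" : "still internal");
        return 0;
    }

Because the recursion reaches the leaves before it ever considers the root, a strong node is never thrown away just because a marginal ancestor stopped growth early, which is exactly the asymmetry described above.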
I understand they aren't real and they seem to branch computation whenever there are 2 options, instead of picking one. But, for example, if I say this:
"Non deterministically guess a bijection p of vertices from Graph G to Graph H" (context here is Graph Isomorphism)
What is that supposed to mean? I understand the bijection, but it says "non deterministically guess". If it's guessing, how is that an algorithmic approach? How can it guarantee it's going to work?
They don't; they just sort of illustrate a point. Basically, what they do is guess an answer and check whether it's right (deterministically). It's not the guessing-the-answer part that's important, though; it's checking that the answer is right. It's just like asking: given an arbitrary solution, is it correct? So, for example, there are problems that take exponential time to solve, and the answers to some of them can be checked in polynomial time, while others can't. What the non-deterministic TM does is separate those two: the problems whose solutions can be checked quickly from those that can't. And this brings up the bigger question: if one group of problems' solutions can be verified much more quickly than another's, can their solutions also be generated more quickly? That question (P vs. NP) hasn't been answered yet.
There's different ways to picture one. One I find useful is the oracle model. Did you ever see the Far Side cartoon where a derivation on the blackboard has "Here a miracle occurs" as one of the intermediate steps? In this version of a NDTM, when you need to choose something, the oracle writes the correct version on the right part of the tape. (This is taken from Garey and Johnson, Computers and Intractability, their classic book on NP-complete problems.) You aren't allowed to assume you've got the right one, though, and there may not be a correct one.
Therefore, when you non-deterministically guess a bijection, you're getting the correct bijection for your purposes, provided one exists.
It isn't a good basis for an algorithm, since the cost of simulating a non-deterministic Turing machine deterministically is basically exponential in the number of nondeterministic choices, and the algorithmic equivalent of the nondeterministic guess is to try every possible bijection.
From a theoretical point of view, I'd translate it as "If there is a bijection such that....". From an algorithmic point of view, find another book, or another chapter of the same book, since that approach is useless for even moderately large graphs.
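To see what "try every possible bijection" costs, here is a small C sketch with two example graphs hard-coded. The polynomial-time "check" half is is_isomorphism(); the nondeterministic "guess" half is replaced by enumerating all N! permutations, which is exactly where the exponential blow-up comes from.

    #include <stdio.h>
    #include <stdbool.h>

    #define N 4   /* number of vertices (small on purpose: N! guesses) */

    /* Adjacency matrices of two example graphs (hypothetical data). */
    static const bool G[N][N] = {
        {0,1,0,0}, {1,0,1,0}, {0,1,0,1}, {0,0,1,0}   /* path 0-1-2-3 */
    };
    static const bool H[N][N] = {
        {0,0,1,0}, {0,0,0,1}, {1,0,0,1}, {0,1,1,0}   /* path 0-2-3-1 */
    };

    /* The deterministic "check": does the bijection p preserve adjacency?
       Runs in polynomial time. */
    static bool is_isomorphism(const int p[N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (G[i][j] != H[p[i]][p[j]])
                    return false;
        return true;
    }

    /* The deterministic stand-in for the "guess": enumerate every one of
       the N! bijections. */
    static bool try_all(int p[N], bool used[N], int pos)
    {
        if (pos == N)
            return is_isomorphism(p);
        for (int v = 0; v < N; v++) {
            if (used[v]) continue;
            used[v] = true;  p[pos] = v;
            if (try_all(p, used, pos + 1))
                return true;
            used[v] = false;
        }
        return false;
    }

    int main(void)
    {
        int  p[N];
        bool used[N] = { false };
        printf("isomorphic: %s\n", try_all(p, used, 0) ? "yes" : "no");
        return 0;
    }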
I believe what is meant is "non-deterministically choose a solution" and then test whether that solution is correct. Since all possible choices (guesses) are tested, a solution is guaranteed to be found if one exists.
A physical implementation of the non-deterministic Turing machine is the DNA computer. For example, here's an outline of how to solve the traveling salesman problem in DNA:
Get/make a bunch of DNA sequences, each with length proportional to the cost of an edge in your graph and sticky ends with sequences uniquely identifying one of the vertices that the edge connects.
Mix them together, with DNA ligase in a big beaker. They'll anneal to each other in sequences that represent every possible path through the graph (ok, not the really long ones).
Remove all the sequences that are missing at least one vertex. To do this, sequentially select for each vertex using hybridization. For example, if "ACGTACA" encodes vertex 1, select for sequences that bind to "TGTACGA". Then repeat this selection for every other vertex.
Sort the remaining sequences by size using gel electrophoresis. Then sequence the shortest one. The sequence encodes the shortest path through your graph.
I'm writing a genetic programming (GP) system (in C but that's a minor detail). I've read a lot of the literature (Koza, Poli, Langdon, Banzhaf, Brameier, et al) but there are some implementation details I've never seen explained. For example:
I'm using a steady state population rather than a generational approach, primarily to use all of the computer's memory rather than reserve half for the interim population.
Q1. In GP, as opposed to GA, when you perform crossover you select two parents but do you create one child or two, or is that a free choice you have?
Q2. In steady state GP, as opposed to a generational system, what members of the population do the children created by crossover replace? This is what I haven't seen discussed. Is it the two parents, or is it two other, randomly-selected members? I can understand if it's the latter, and that you might use negative tournament selection to choose members to replace, but would that not create premature convergence? (After a crossover event the population contains the two original parents plus two children of those parents, and two other random members get removed. Elitism is inherent.)
Q3. Is there a Web forum or mailing list focused on GP? Oddly I haven't found one. Yahoo's GP group is used almost exclusively for announcements, the Poli/Langdon Field Guide forum is almost silent, and GP discussions on general/game programming sites like gamedev.net are very basic.
Thanks for any help you can provide!
Firstly, relax.
There are no "correct" methods in GP. GP is more art than science. Try lots of schemes and pick the ones that work best.
Q1: 1, 2, or many. You choose.
Q2: Replace, 1, 2, all. Or try some elitism.
Q3: You probably won't find forums discussing these questions b/c there are no right/best answers. Sorry.
PS. In my research, crossover never really performed well...
If you can read Python, you may want to take a look at Pyevolve. I am mainly involved in it on the GA side, but it has support for GP as well. Maybe you can get some hints there.
Q1 is your choice, but single child would probably be more common. Every time you do the lottery selection of parents, you're applying selection pressure, which is what you want.
Q2: Negative tournament selection is exactly the right approach. Yes, losing low-fitness members of the population causes rapid convergence initially, but once your population gets into the hard-to-search part of the solution space, it won't be as cut-and-dried which ones lose the tournament / lottery. What you do have to beware of is stagnation of the gene pool; I suggest monitoring the entropy of the genome to track its heterogeneity. "elitism is inherent" -- Well, yeah, that's the point! ;-)
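A minimal sketch of that negative tournament in C (assumed sizes and a dummy fitness field; not taken from any particular GP framework): pick a few individuals at random and overwrite the least fit of them with the new child.

    #include <stdlib.h>

    #define POP_SIZE        500
    #define TOURNAMENT_SIZE 4

    typedef struct {
        double fitness;
        /* ...genome / program tree would live here... */
    } Individual;

    /* Negative tournament: return the index of the least-fit individual
       among TOURNAMENT_SIZE randomly chosen ones. */
    static int negative_tournament(const Individual *pop)
    {
        int loser = rand() % POP_SIZE;
        for (int k = 1; k < TOURNAMENT_SIZE; k++) {
            int i = rand() % POP_SIZE;
            if (pop[i].fitness < pop[loser].fitness)
                loser = i;
        }
        return loser;
    }

    int main(void)
    {
        static Individual pop[POP_SIZE];
        for (int i = 0; i < POP_SIZE; i++)
            pop[i].fitness = (double)rand() / RAND_MAX;   /* dummy fitnesses */

        Individual child = { 0.9 };                  /* a freshly bred child */
        pop[negative_tournament(pop)] = child;       /* steady-state replacement */
        return 0;
    }

Raising TOURNAMENT_SIZE increases the replacement pressure; keeping it small preserves more diversity, which is one knob for the stagnation concern mentioned above.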
Q3: comp.ai.genetic is probably your best bet. Sometimes the topic is picked up in game development fora, like on Gamasutra.
P.S. Genetic programming in C?!? How are you assuring the viability of the offspring? Doing genetic programming in a non-homoiconic language is a real challenge.
Check out MetaOptimize.com for your stacky needs.
As Ray says, it's mostly up to you, but typically in a steady-state setup you would only create a single offspring.
Again you have options. I wouldn't replace the parents. If they've been picked as parents based on their fitness you could be eliminating some of the fittest members of the population. Easiest is just to randomly pick an individual to be replaced. Alternatively, you could replace the least fit individual, but that can lead to premature convergence. Another option is to use the same selection strategy that you use to choose parents but use the inverse fitness so that it favours less fit individuals.
You could try comp.ai.genetic on USENET (and Google Groups).
It sounds like some of your questions are not necessarily specific to genetic programming; if that's true, you might have some luck asking the folks over at the NEAT Users Group.
They primarily discuss the Neuroevolution of Augmenting Topologies (or NEAT) algorithm, which is a genetic algorithm used to evolve neural networks. But topics like elitism and crossover strategies are pretty general, and can apply to both GA and GP algorithms.
Otherwise, as Dan and Ray have said, a lot of these decisions are made after experimentation with one's particular software and domain. Try applying your algorithm to different problems and pay attention to how it behaves -- after a while, you'll probably develop an intuition for what works and what doesn't.
I would create an unlimited number of offspring, but only on the basis of success, and let older members of the population die. Lack of fitness can also lead to early death. This just seems to follow a natural order.
Q1. In GP, as opposed to GA, when you perform crossover you select two parents, but do you create one child or two, or is that a free choice you have?
Yes, it's your choice; but generally it's not advisable to create many individuals from the same parents, because the variation among individuals created by the same parents would be very limited, and that could cost processing speed and memory that could have been spent on other individuals showing different trends and behaviors worth analyzing (though creating more individuals from the same parents is not a problem if the evolution process is close to reaching its endpoint).
Q2. In steady state GP...
It is advisable to replace individuals based on the ranking provided by the fitness function you have adopted.