Let's say I have a set of training examples where A_i is an attribute and the outcome is binary (yes or no):
A1, A2, A3, Outcome
red dark large yes
green dark small yes
orange bright large no
I know I have to define the fitness function. But what is it for this problem? In my actual problem there are 10 parameters and 100 training examples but this is a similar problem.
I think the confusion here comes from the fact that fitness functions usually give you back some scalar, sometimes on a discrete scale, but never a binary yes/no (or true/false). In this sense, this looks more like a 'classification' problem to be solved with neural nets (or possibly Bayesian logic). That said, you could certainly devise a GA to evolve any kind of classifier, and the fitness function would basically be expressed in terms of correct classifications over total evaluations.
Another pure GA approach to this - probably more relevant to the question - is to encode the whole classification rule set as a given individual for the genetic algorithm. In this sense, the fitness function could be expressed as a scalar representing how many yes/no classifications the given candidate solution gets right over the total, and so forth. A similar approach can be found in the paper Using Real-Valued Genetic Algorithms to Evolve Rule Sets for Classification.
Example (one of the possible ways to encode this):
A1, A2, A3, Outcome
red dark large yes
green dark small yes
orange bright large no
Encoding: red = 000, dark = 001, large = 010, green = 011, small = 100, orange = 101, bright = 111, etc.
Outcome: yes = 1, no = 0
Chromosome:
A1, A2, A3, Outcome
000 001 010 1
011 001 100 1
101 111 010 0
All of the above gets translated into a candidate solution as:
000001010-1/011001100-1/101111010-0
You will generate a random bunch of these and evolve them whichever way you like, testing the fitness (correct classifications / total classifications) of the entire rule set (be careful picking your cross-over strategy here!).
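To make the fitness computation concrete, here is a minimal C sketch; the Rule struct and the matches() helper are my own placeholders mirroring the 3-bit encoding above, so treat this as a sketch of the idea, not a full GA:

#define N_RULES    3   /* rules per chromosome */
#define N_EXAMPLES 3   /* training examples    */

/* One rule: three 3-bit attribute codes plus a predicted outcome.
   For brevity the same struct doubles as a training example.       */
typedef struct { unsigned a1, a2, a3; int outcome; } Rule;

/* Hypothetical helper: does rule r cover training example e? */
static int matches(const Rule *r, const Rule *e)
{
    return r->a1 == e->a1 && r->a2 == e->a2 && r->a3 == e->a3;
}

/* Fitness = correct classifications / total classifications. */
double fitness(const Rule rules[N_RULES], const Rule examples[N_EXAMPLES])
{
    int correct = 0;
    for (int e = 0; e < N_EXAMPLES; ++e)
        for (int r = 0; r < N_RULES; ++r)
            if (matches(&rules[r], &examples[e])) {
                correct += (rules[r].outcome == examples[e].outcome);
                break;  /* first matching rule classifies the example */
            }
    return (double)correct / N_EXAMPLES;
}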
I also suggest you listen to a binary solo, to get you in the mood.
NOTE: I highly doubt this would work with a rule-set composed by only 3 rules, not enough breadth for the GA.
I have created a Gomoku (5 in a row) AI using Alpha-Beta Pruning. It makes moves on a not-so-stupid level. First, let me vaguely describe the grading function of the Alpha-Beta algorithm.
When it receives a board as an input, it first finds all repetitions of stones and gives each one a score out of 4 possible values depending on its usefulness as a threat, which is decided by length. It then returns the summation of all the repetition scores.
But the problem is that I explicitly decided the scores (4 in total), and they don't seem like the best choices. So I've decided to implement a genetic algorithm to generate these scores. Each of the genes will be one of the 4 scores. So, for example, the chromosome of the hard-coded scores would be: [5, 40000, 10000000, 50000]
However, because I'm using the genetic algorithm to create the scores of the grading function, I'm not sure how I should implement the genetic fitness function. So instead, I have thought of the following:
Instead of using a fitness function, I'll just merge the selection process together: If I have 2 chromosomes, A and B, and need to select one, I'll simulate a game using both A and B chromosomes in each AI, and select the chromosome which wins.
1. Is this a viable replacement for the fitness function?
2. Because of the characteristics of the Alpha-Beta algorithm, I need to give the max score to the win condition, which in most cases is set to infinity. However, because I can't use infinity, I just used an absurdly large number. Do I also need to add this score to the chromosome? Or, because it's insignificant and doesn't change the values of the grading function, should I leave it as a constant?
3. When initially creating chromosomes, random generation following a standard distribution is said to be optimal. However, the genes in my case have large deviations. Would it still be okay to generate the chromosomes randomly?
Is this a viable replacement for the fitness function?
Yes, it is. It's a fairly common way to define a fitness function for board games. Probably a single round is not enough (but you have to experiment).
A slight variant is something like:
/* Fitness of Agent_k: play M games against randomly chosen opponents.
   Assumes a play(k, i) helper returning +1 if Agent_k wins, -1 if
   Agent_i wins, and 0 for a draw; POP_SIZE is the population size.   */
double fitness(int k)
{
    double fit = 0;
    for (int m = 0; m < M; ++m) {
        int i;
        do { i = rand() % POP_SIZE; } while (i == k);  /* no self-match */
        switch (play(k, i)) {
            case  1: fit += 1; break;   /* Agent_k wins    */
            case -1: fit -= 2; break;   /* Agent_i wins    */
            default: break;             /* draw: no change */
        }
    }
    return fit;
}
i.e. an agent plays against M randomly selected opponents from the population (with replacement, but avoiding self-matches).
Increasing M decreases the noise, but requires longer simulation times (M = 5 is a value used in some chess-related experiments).
2. Because of the characteristics of the Alpha-Beta algorithm...
Not sure of the question. A very large value is a standard approach for a static evaluation function signaling a winning condition.
The exact value isn't very important and probably shouldn't be subject to optimization.
3. When initially creating chromosomes, random generation following a standard distribution is said to be optimal. However, the genes in my case have large deviations. Would it still be okay to generate the chromosomes randomly?
This is somewhat related to the specific genetic algorithm "flavor" you are going to use.
A standard genetic algorithm could work better with not completely random initial values.
Other variants (e.g. Differential Evolution) could be less sensitive to this aspect.
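As a hedged illustration of the initialization point: since the hand-tuned Gomoku scores span several orders of magnitude ([5, 40000, 10000000, 50000]), one option is to draw the initial genes log-uniformly instead of uniformly, so every order of magnitude is equally represented (the bounds here are made up for the sketch):

#include <stdlib.h>
#include <math.h>

/* Draw one gene log-uniformly in [lo, hi], e.g. lo = 1, hi = 1e8.
   A plain uniform draw over that range would almost never produce
   small values like 5.                                              */
double log_uniform(double lo, double hi)
{
    double u = (double)rand() / RAND_MAX;           /* uniform in [0,1] */
    return exp(log(lo) + u * (log(hi) - log(lo)));  /* log-scale sample */
}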
Also take a look at this question/answer: Getting started with machine learning a zero sum game?
I have just learned the basics (more of an introduction) of genetic algorithms. For an assignment, we are to find the value of x that maximizes f(x) = sin(x * pi / 256) in the interval 0 <= x <= 256.
While I understand how to get the fitness of an individual and how to normalize the fitness, I am a little lost on generating the population. In the text, for the purposes of performing crossover and mutation, each individual is represented using 8 bits. Example:
189 = 10111101
35 = 00100011
My questions are this:
Using C, what is the best way to generate the population? I have looked it up and all I could find was using uint8_t. I'm thinking of generating it as an array and then finding a way to convert it to its integer representation.
What purpose does normalizing fitness serve?
As this is my first time at writing a program that uses genetic algorithm, is there any advice I should keep in mind?
Thank you for your time.
The usual way is for the population to be random, but if you have some preliminary optimization you can form the population around results already available.
It is very common to use hybrid algorithms in which a GA is mixed with algorithms like PSO, simulated annealing, and so on.
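For the assignment as described, a minimal C sketch of population generation could look like the following. Since a uint8_t already is an 8-bit chromosome, no separate bit array is needed; note that 8 bits cover 0..255, which is harmless here because the maximum of sin(x*pi/256) lies at x = 128 anyway:

#include <stdint.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define POP_SIZE 20
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* f(x) = sin(x * pi / 256), the function to maximize. */
double fitness(uint8_t x)
{
    return sin(x * M_PI / 256.0);
}

int main(void)
{
    uint8_t pop[POP_SIZE];
    srand((unsigned)time(NULL));
    for (int i = 0; i < POP_SIZE; ++i)
        pop[i] = (uint8_t)(rand() & 0xFF);  /* random 8-bit chromosome */
    /* bit j of pop[i] is (pop[i] >> j) & 1, which is all that
       crossover and mutation need to operate on.                    */
    return 0;
}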
I would like to use a genetic program (GP) to estimate the probability of an 'outcome' from an 'event'. To train the GP I am using a genetic algorithm.
So, in my database I have many events, with each event containing many possible outcomes.
I will give the gp a set of input variables that relate to each outcome in each event.
My question is: what should the fitness function in the GP be?
For instance, right now I am giving the GP a set of input data (outcome input variables) and a set of target data (1 if the outcome DID occur, 0 if the outcome DIDN'T occur), with the fitness function being the mean squared error of the outputs and targets. I then take the sum of the outputs over all outcomes, and divide each output by that sum (to give the probability). However, I know for sure that this is not the right way to be doing this.
For clarity, this is how I am CURRENTLY doing this:
I would like to estimate the probability of 5 different outcomes occurring in an event:
Outcome 1 - inputs = [0.1, 0.2, 0.1, 0.4]
Outcome 2 - inputs = [0.1, 0.3, 0.1, 0.3]
Outcome 3 - inputs = [0.5, 0.6, 0.2, 0.1]
Outcome 4 - inputs = [0.9, 0.2, 0.1, 0.3]
Outcome 5 - inputs = [0.9, 0.2, 0.9, 0.2]
I will then calculate the gp output for each input:
Outcome 1 - output = 0.1
Outcome 2 - output = 0.7
Outcome 3 - output = 0.2
Outcome 4 - output = 0.4
Outcome 5 - output = 0.4
The sum of the outputs over all outcomes in this event would be 1.80. I would then calculate the 'probability' of each outcome by dividing its output by the sum:
Outcome 1 - p = 0.056
Outcome 2 - p = 0.389
Outcome 3 - p = 0.111
Outcome 4 - p = 0.222
Outcome 5 - p = 0.222
Before you start - I know that these aren't real probabilities, and that this approach does not work !! I just put this here to help you understand what I am trying to achieve.
Can anyone give me some pointers on how I can estimate the probability of each outcome ? (also, please note my maths is not great)
Many thanks
I understand the first part of your question: What you described is a classification problem. You're learning if your inputs relate to whether an outcome was observed (1) or not (0).
There are difficulties with the second part, though. If I understand you correctly, you take the raw GP output for a certain row of inputs (e.g. 0.7) and treat it as a probability. You said this doesn't work, obviously. In GP you can do classification by introducing a threshold value that splits your classes: if the output is bigger than, say, 0.3 the outcome should be 1; if it's smaller it should be 0. This threshold isn't necessarily 0.5 (again, it's just a number, not a probability).
I think if you want to obtain a probability you should attempt to learn multiple models that all explain your classification problem well. I don't expect you have a model that explains your data perfectly, and if you did, you wouldn't want a probability anyway. You can bag these models together (create an ensemble), and for each outcome you can observe how many models predicted 1 and how many predicted 0. The number of models that predicted 1, divided by the total number of models, can then be interpreted as the probability that this outcome will be observed. If the models are all equally good you can forget about weighting between them; if they differ in quality you could of course factor that into your decision, since models with lower quality on their training set are less likely to contribute to a good estimate.
So in summary, you should attempt to apply GP e.g. 10 times and then use all 10 models on the training set to calculate their estimates (0 or 1), as sketched below. However, don't restrict yourself to GP only; there are many classification algorithms that can give good results.
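As a small sketch of the ensemble idea (N_MODELS and the predictions array are placeholders of mine): train, say, 10 GP classifiers, threshold each one's output to 0 or 1 for a given row, and take the fraction of 1-votes as the probability estimate:

#define N_MODELS 10

/* predict[m] holds model m's thresholded output (0 or 1) for one row.
   The returned fraction of 1-votes is the probability estimate.        */
double ensemble_probability(const int predict[N_MODELS])
{
    int votes = 0;
    for (int m = 0; m < N_MODELS; ++m)
        votes += predict[m];
    return (double)votes / N_MODELS;
}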
As a sidenote, I'm part of the development team of a software package called HeuristicLab, which runs under Windows and with which you can run GP and create such ensembles. The software is open source.
AI is all about complex algorithms. Think about it: the downside, very often, is that these algorithms become black boxes. So the drawback of algorithms such as NNs and GAs is that they are inherently opaque. That is what you want if you want a car that drives itself. On the other hand, this means that you need tools to look into the black box.
What I'm saying is that a GA is probably not what you want to solve your problem. If you want to solve AI types of problems, you first have to know how to use standard techniques, such as regression, LDA, etc.
So, combining an NN and a GA is usually a bad sign, because you are stacking one black box on another. I believe this is bad design. An NN and a GA are nothing else than non-linear optimizers. I would suggest that you look at principal component analysis (PCA), SVD, and linear classifiers first (see Wikipedia). If you figure out how to solve simple statistical problems, move on to more complex ones. Check out the great textbook by Russell/Norvig, and read some of their source code.
To answer the questions, one really has to look at the dataset extensively. If you are working on a small problem, define the probabilities etc., and you might get an answer here. Perhaps check out Bayesian statistics as well. This will get you started, I believe.
I once wrote a Tetris AI that played Tetris quite well. The algorithm I used (described in this paper) is a two-step process.
In the first step, the programmer decides to track inputs that are "interesting" to the problem. In Tetris we might be interested in tracking how many gaps there are in a row because minimizing gaps could help place future pieces more easily. Another might be the average column height because it may be a bad idea to take risks if you're about to lose.
The second step is determining weights associated with each input. This is the part where I used a genetic algorithm. Any learning algorithm will do here, as long as the weights are adjusted over time based on the results. The idea is to let the computer decide how the input relates to the solution.
Using these inputs and their weights we can determine the value of taking any action. For example, if putting the straight line shape all the way in the right column will eliminate the gaps of 4 different rows, then this action could get a very high score if its weight is high. Likewise, laying it flat on top might actually cause gaps and so that action gets a low score.
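The evaluation described here is essentially a weighted sum; something like the following sketch (the two features named are just the examples from the text, and a real evaluator would track more):

#define N_FEATURES 2   /* e.g. gaps created, resulting average height */

/* Linear evaluation of one candidate action: score = sum of
   weight * feature. The GA's only job is to evolve weights[].        */
double evaluate_action(const double features[N_FEATURES],
                       const double weights[N_FEATURES])
{
    double score = 0.0;
    for (int i = 0; i < N_FEATURES; ++i)
        score += weights[i] * features[i];
    return score;
}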
I've always wondered if there's a way to apply a learning algorithm to the first step, where we find "interesting" potential inputs. It seems possible to write an algorithm where the computer first learns what inputs might be useful, then applies learning to weigh those inputs. Has anything been done like this before? Is it already being used in any AI applications?
In neural networks, you can select 'interesting' potential inputs by finding the ones that have the strongest correlation, positive or negative, with the classifications you're training for. I imagine you can do similarly in other contexts.
I think I might approach the problem you're describing by feeding more primitive data to a learning algorithm. For instance, a Tetris game state may be described by the list of occupied cells. A string of bits describing this information would be a suitable input to that stage of the learning algorithm. Actually training on that is still challenging, though; how do you know whether the results are useful? I suppose you could roll the whole algorithm into a single blob, where the algorithm is fed the successive states of play and the output is just the block placements, with higher-scoring algorithms selected for future generations.
Another choice might be to use a large corpus of plays from other sources, such as recorded plays from human players or a hand-crafted AI, and select the algorithms whose outputs bear a strong correlation to some interesting fact or other from the future play, such as the score earned over the next 10 moves.
Yes, there is a way.
If you choose M selected features there are 2^M subsets, so there is a lot to look at.
I would do the following (a rough C sketch follows the loop):
For each subset S:
 run your code to optimize the weights W
 save S and the corresponding W
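A rough sketch of that sweep, where every subset of the M features is a bitmask; optimize_weights() and save() are hypothetical hooks standing in for your own code:

#define M 4   /* number of candidate features */

void optimize_weights(unsigned subset, double w[M]);  /* hypothetical */
void save(unsigned subset, const double w[M]);        /* hypothetical */

/* Walk all 2^M feature subsets; bit f of mask s says whether
   feature f is active in subset s.                               */
void sweep_subsets(void)
{
    for (unsigned s = 0; s < (1u << M); ++s) {
        double w[M];
        optimize_weights(s, w);  /* run the GA restricted to subset s */
        save(s, w);              /* remember the pair S-W             */
    }
}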
Then, for each pair S-W, you can run G games and save the score L for each one. Now you have a table like this:

feature1  feature2  feature3  ...  featureM  subset_code  game_number  scoreL
    1         0         1     ...      1         S1             1       10500
    1         0         1     ...      1         S1             2        6230
   ...
    0         1         1     ...      0         S2          G + 1      30120
    0         1         1     ...      0         S2          G + 2      25900
Now you can run some component selection algorithm (PCA, for example) and decide which features are worth keeping to explain scoreL.
A tip: when running the code to optimize W, seed the random number generator so that each different 'evolving brain' is tested against the same piece sequence.
I hope this helps!
I'm working with a couple of AI algorithms at school and I find people use the words Fuzzy Logic to explain any situation that they can solve with a couple of cases. When I go back to the books, I just read about how, instead of a state going from On to Off, it's a diagonal line, and something can be in both states but at different "levels".
I've read the Wikipedia entry and a couple of tutorials and even programmed stuff that "uses fuzzy logic" (an edge detector and a 1-wheel self-controlled robot), and still I find it very confusing going from theory to code... For you, in the least complicated definition, what is fuzzy logic?
Fuzzy logic is logic where state membership is, essentially, a float with range 0..1 instead of an int 0 or 1. The mileage you get out of it is that things like, for example, the changes you make in a control system are somewhat naturally more fine-tuned than what you'd get with naive binary logic.
An example might be logic that throttles back system activity based on active TCP connections. Say you define "a little bit too many" TCP connections on your machine as 1000 and "a lot too many" as 2000. At any given time, your system has a "too many TCP connections" state from 0 (<= 1000) to 1 (>= 2000), which you can use as a coefficient in applying whatever throttling mechanisms you have available. This is much more forgiving and responsive to system behavior than naive binary logic that only knows how to determine "too many", and throttle completely, or "not too many", and not throttle at all.
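A hedged sketch of that throttling coefficient, using the thresholds from the example:

/* Fuzzy "too many TCP connections" membership: 0 at or below 1000,
   1 at or above 2000, linear ramp in between. The result can be fed
   straight into whatever throttling mechanism is available.          */
double too_many_connections(int conns)
{
    if (conns <= 1000) return 0.0;
    if (conns >= 2000) return 1.0;
    return (conns - 1000) / 1000.0;
}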
I'd like to add to the answers (that have been modded up) that a good way to visualize fuzzy logic is as follows:
Traditionally, with binary logic you would have a graph whose membership function is true or false, whereas in a fuzzy logic system the membership function is graded:
1|
| /\
| / \
| / \
0|/ \
------------
a b c d
Assume for a second that the function is "likes peanuts"
a. kinda likes peanuts
b. really likes peanuts
c. kinda likes peanuts
d. doesn't like peanuts
The function itself doesn't have to be triangular and often isn't (it's just easier with ascii art).
A fuzzy system will likely have many of these, some even overlapping (even opposites) like so:
1| A B
| /\ /\ A = Likes Peanuts
| / \/ \ B = Doesn't Like Peanuts
| / /\ \
0|/ / \ \
------------
a b c d
so now c is "kinda likes peanuts, kinda doesn't like peanuts" and d is "really doesn't like peanuts"
And you can program accordingly based on that info.
Hope this helps for the visual learners out there.
The best definition of fuzzy logic is given by its inventor Lotfi Zadeh:
“Fuzzy logic is a means of representing problems to computers in a way akin to the way humans solve them, and the essence of fuzzy logic is that everything is a matter of degree.”
The meaning of solving problems with computers akin to the way humans solve them can easily be explained with a simple example from a basketball game: if a player wants to guard another player, he should first consider how tall the opponent is and how good his playing skills are. Simply put, if the player he wants to guard is tall and plays very slowly relative to him, then he will use his instinct to decide whether to guard that player, as there is uncertainty involved. The important point in this example is that the properties are relative to the player, and there is a degree to the rival player's height and playing skill. Fuzzy logic provides a deterministic way to handle this uncertain situation.
There are several steps in the fuzzy logic process (Figure 1): first, fuzzification, where crisp inputs are converted to fuzzy inputs; second, these inputs are processed with fuzzy rules to create a fuzzy output; and lastly, defuzzification, which yields a degree for each result, since in fuzzy logic there can be more than one result, each with a different degree.
Figure 1 – Fuzzy Process Steps (David M. Bourg P.192)
To exemplify the fuzzy process steps, the previous basketball situation can be used. As mentioned in the example, the rival player is 1.87 meters tall, which is quite tall relative to our player, and can dribble at 3 m/s, which is slow relative to our player. In addition to these data, some rules are needed, called fuzzy rules, such as:
if player is short but not fast then guard,
if player is fast but not short then don't guard,
if player is tall then don't guard,
if player is average tall and average fast then guard.
Figure 2 – how tall
Figure 3- how fast
According to the rules and the input data, an output will be created by the fuzzy system, such as: the degree for guard is 0.7, the degree for sometimes guard is 0.4, and the degree for never guard is 0.2.
Figure 4-output fuzzy sets
The last step, defuzzification, is used to create a crisp output: a number which may determine the energy that we should use to guard the player during the game. The centre of mass is a common method to create this output. In this phase, the weights used to calculate the mean point depend entirely on the implementation. In this application, high weights are given to guard and don't guard, but a low weight is given to sometimes guard. (David M. Bourg, 2004)
Figure 5- fuzzy output (David M. Bourg P.204)
Output = [0.7 * (-10) + 0.4 * 1 + 0.2 * 10] / (0.7 + 0.4 + 0.2) ≈ -3.5
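That computation is just a weighted mean; here is a small C sketch, with the representative positions -10, 1, and 10 taken as given from the example above:

#define N_SETS 3

/* Centre-of-mass defuzzification: weighted mean of each output set's
   representative position, weighted by its membership degree.          */
double defuzzify(const double degree[N_SETS], const double position[N_SETS])
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < N_SETS; ++i) {
        num += degree[i] * position[i];
        den += degree[i];
    }
    return num / den;
}

/* With degree = {0.7, 0.4, 0.2} (guard / sometimes / never) and
   position = {-10, 1, 10}, this returns about -3.5 as above.          */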
As a result, fuzzy logic is used under uncertainty to make a decision and to find the degree of that decision. The problem with fuzzy logic is that as the number of inputs increases, the number of rules increases exponentially.
For more information and a possible application in a game, check out the little article I wrote.
To build off of chaos' answer, a formal logic is nothing but an inductively defined set that maps sentences to a valuation. At least, that's how a model theorist thinks of logic. In the case of a sentential boolean logic:
(basis clause) For all A, v(A) in {0,1}
(iterative) For the following connectives,
v(!A) = 1 - v(A)
v(A & B) = min{v(A), v(B)}
v(A | B) = max{v(A), v(B)}
(closure) All sentences in a boolean sentential logic are evaluated per above.
A fuzzy logic would be inductively defined with one change:
(basis clause) For all A, v(A) between [0,1]
(iterative) For the following connectives,
v(!A) = 1 - v(A)
v(A & B) = min{v(A), v(B)}
v(A | B) = max{v(A), v(B)}
(closure) All sentences in a fuzzy sentential logic are evaluated per above.
Notice the only difference in the underlying logic is the permission to evaluate a sentence as having a "truth value" such as 0.5. An important question for a fuzzy logic model is the threshold that counts for truth satisfaction. That is: for a valuation v(A), for what value D is it the case that v(A) > D means A is satisfied?
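For the record, those clauses translate directly into code; a minimal C sketch of the connectives:

/* Truth values live in [0,1]; NOT is 1 - v, AND is min, OR is max,
   exactly as in the clauses above.                                  */
double f_not(double a)           { return 1.0 - a; }
double f_and(double a, double b) { return a < b ? a : b; }
double f_or(double a, double b)  { return a > b ? a : b; }

/* Truth satisfaction against a threshold D: A counts as satisfied
   when v(A) > D.                                                    */
int satisfied(double v, double D) { return v > D; }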
If you really want to found out more about non-classical logics like fuzzy logic, I would recommend either An Introduction to Non-Classical Logic: From If to Is or Possibilities and Paradox
Putting my coder hat back on, I would be careful with the use of fuzzy logic in real world programming, because of the tendency for a fuzzy logic to be undecidable. Maybe it's too much complexity for little gain. For instance a supervaluational logic may do just fine to help a program model vagueness. Or maybe probability would be good enough. In short, I need to be convinced that the domain model dovetails with a fuzzy logic.
Maybe an example clears up what the benefits can be:
Let's say you want to make a thermostat and you want it to be 24 degrees.
This is how you'd implement it using boolean logic:
Rule 1: heat up at full power when it's colder than 21 degrees.
Rule 2: cool down at full power when it's warmer than 27 degrees.
Such a system will only be at 24 degrees once in a while, and it will be very inefficient.
Now, using fuzzy logic, it would be something like this:
Rule 1: for each degree that it's colder than 24 degrees, turn up the heater one notch (0 at 24).
Rule 2: for each degree that it's warmer than 24 degrees, turn up the cooler one notch (0 at 24).
This system will always be somewhere around 24 degrees, and it will only make a tiny adjustment once in a while. It will also be more energy-efficient.
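In code, those two fuzzy rules collapse into what is essentially a proportional controller; a minimal sketch, assuming a signed power level where positive means heat and negative means cool (the limit of 10 notches is an assumption of mine):

/* One "notch" per degree of error from the 24-degree setpoint,
   clamped to an assumed limit of 10 notches in either direction.  */
double thermostat(double temperature)
{
    double notches = 24.0 - temperature;
    if (notches >  10.0) notches =  10.0;   /* full heating    */
    if (notches < -10.0) notches = -10.0;   /* full cooling    */
    return notches;                         /* 0 exactly at 24 */
}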
Well, you could read the works of Bart Kosko, one of the 'founding fathers'. 'Fuzzy Thinking: The New Science of Fuzzy Logic' from 1994 is readable (and available quite cheaply secondhand via Amazon). Apparently, he has a newer book 'Noise' from 2006 which is also quite approachable.
Basically though (in my paraphrase - not having read the first of those books for several years now), fuzzy logic is about how to deal with the world where something is perhaps 10% cool, 50% warm, and 10% hot, where different decisions may be made on the degree to which the different states are true (and no, it wasn't entirely an accident that those percentages don't add up to 100% - though I'd accept correction if needed).
A very good explanation, with the help of Fuzzy Logic Washing Machines.
I know what you mean about it being difficult to go from concept to code. I'm writing a scoring system that looks at the values of sysinfo and /proc on Linux systems and comes up with a number between 0 and 10, 10 being the absolute worst. A simple example:
You have 3 load averages (1, 5, and 15 minutes) with (at least) three possible states: good, getting bad, bad. Expanding that, you could have six possible states per average, adding 'about to' to the three I just noted. Yet the result of all 18 possibilities can only deduct 1 from the score. Repeat that with swap consumed, actual VM allocated (committed) memory and other stuff... and you have one big bowl of conditional spaghetti :)
It's as much a definition as it is an art; how you implement the decision-making process is always more interesting than the paradigm itself, whereas in a boolean world it's rather cut and dried.
It would be very easy for me to say 'if load1 < 2, deduct 1', but that's not very accurate at all.
If you can teach a program to do what you would do when evaluating some set of circumstances and keep the code readable, you have implemented a good example of fuzzy logic.
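For instance, a graded version of the load-average check might look like this (the 1.0 and 4.0 thresholds are made up for the sketch):

/* Graded deduction for the 1-minute load average, instead of a hard
   "if load1 < 2, deduct 1": no penalty up to 1.0, a full one-point
   deduction at 4.0 or above, and a proportional penalty in between.  */
double load_penalty(double load1)
{
    if (load1 <= 1.0) return 0.0;
    if (load1 >= 4.0) return 1.0;
    return (load1 - 1.0) / 3.0;
}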
Fuzzy logic is a problem-solving methodology that lends itself to implementation in systems ranging from simple, small, embedded microcontrollers to large, networked, multi-channel PC- or workstation-based data acquisition and control systems. It can be implemented in hardware, software, or a combination of both. Fuzzy logic provides a simple way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or missing input information. The fuzzy logic approach to control problems mimics how a person would make decisions, only much faster.
Fuzzy logic has proved to be particularly useful in expert system and other artificial intelligence applications. It is also used in some spell checkers to suggest a list of probable words to replace a misspelled one.
To learn more, just check out: http://en.wikipedia.org/wiki/Fuzzy_logic.
The following is sort of an empirical answer.
A simple (possibly simplistic) answer is that "fuzzy logic" is any logic that returns values other than straight true / false, or 1 / 0. There are a lot of variations on this, and they tend to be highly domain-specific.
For example, in my previous life I did search engines that used "content similarity searching" as opposed to the then-common "boolean search". Our similarity system used the cosine coefficient of weighted-attribute vectors representing the query and the documents, and produced values in the range 0..1. Users would supply "relevance feedback", which was used to shift the query vector in the direction of desirable documents. This is somewhat related to the training done in certain AI systems where the logic gets "rewarded" or "punished" for the results of trial runs.
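For reference, the cosine coefficient mentioned there is straightforward to compute; a small sketch over two weighted-attribute vectors (non-negative weights give a value in 0..1):

#include <math.h>

/* Cosine coefficient: dot product over the product of magnitudes. */
double cosine(const double *q, const double *d, int n)
{
    double dot = 0.0, qq = 0.0, dd = 0.0;
    for (int i = 0; i < n; ++i) {
        dot += q[i] * d[i];
        qq  += q[i] * q[i];
        dd  += d[i] * d[i];
    }
    return (qq > 0.0 && dd > 0.0) ? dot / (sqrt(qq) * sqrt(dd)) : 0.0;
}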
Right now, Netflix is running a competition to find a better suggestion algorithm for their company (see http://www.netflixprize.com/). Effectively, all of the algorithms involved could be characterized as "fuzzy logic".
Fuzzy logic is a computing approach based on a human-like way of thinking. It is particularly useful when there is a large number of input variables. An online fuzzy logic calculator for two input variables is available here:
http://www.cirvirlab.com/simulation/fuzzy_logic_calculator.php