I'm currently using the expectiminimax algorithm, which is working great in my current situation:
max -> min -> chance -> max -> min -> chance -> (repeat)
I cannot in any way do
max/min -> Chance -> (repeat)
due to the way the game works.
I feel as though alpha is going to be inaccurate if I convert my algorithm over to use pruning.
Are there any side effects to implementing pruning (apart from the horizon effect) with my current setup, or am I just overthinking this?
I'm not sure which game you are trying to implement, but it does sound interesting. It's really hard to come up with a good answer when we know nothing about the branching factor of any of the nodes, or how the game proceeds. I think this really depends on your game.
I've tried different pruning methods in the game of backgammon, and based on that experience I think the result of your search algorithm depends on how much the chance nodes influence the game's expectation compared to how much the min and max nodes change it.
If a chance node can change the expectation of the game drastically (high variance for each dice roll) but the five or six moves to choose among do not change the situation much, then I don't think you should worry too much.
However, if it is the opposite, where the chance nodes are basically just driving the game forward without influencing the expected result and the game moves (or actions) really are important, I think you can gain a lot from a good search algorithm.
Also check out the *-minimax algorithm (Bruce Ballard, 1983).
The same thing applies to expectiminimax and *-minimax as to normal minimax: try the assumed best moves first (from some heuristic) to create cutoffs. You should also try to heuristically order the chance outcomes in the chance nodes.
This is actually really interesting when you start testing it, but the only way to get good answers is to get your fingers dirty and try it out.
Good luck!
It appears that pruning still holds in this environment. I had serious doubts, but everything has checked out so far: in every test case, the non-pruning algorithm and the pruning algorithm chose the same move with the same heuristic value, and the pruning version simply visited fewer nodes.
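For anyone wanting to reproduce that kind of check, here is a minimal sketch in Python. It is not the poster's code: the tuple-based tree encoding and the function names are assumptions made for illustration. It applies alpha-beta cutoffs only at max and min nodes and gives every chance-node child a full window, so the expectation stays exact. *-minimax goes further by narrowing windows inside chance nodes, which this sketch does not attempt.

```python
import math

# Hypothetical tree encoding (an assumption, not from the thread):
#   ("max", [children]), ("min", [children]),
#   ("chance", [(probability, child), ...]), or a bare number for a leaf.

def plain(node):
    """Reference expectiminimax without any pruning."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "chance":
        return sum(p * plain(child) for p, child in children)
    values = [plain(child) for child in children]
    return max(values) if kind == "max" else min(values)

def pruned(node, alpha=-math.inf, beta=math.inf, counter=None):
    """Expectiminimax with alpha-beta cutoffs at max and min nodes only."""
    if counter is not None:
        counter[0] += 1  # count visited nodes
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "chance":
        # Children are searched with a full window so the expectation is exact.
        return sum(p * pruned(child, -math.inf, math.inf, counter)
                   for p, child in children)
    if kind == "max":
        best = -math.inf
        for child in children:
            best = max(best, pruned(child, alpha, beta, counter))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the min parent will never allow this line
        return best
    best = math.inf
    for child in children:
        best = min(best, pruned(child, alpha, beta, counter))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the max parent already has something better
    return best

# A max -> min -> chance tree with made-up leaf values.
tree = ("max", [
    ("min", [("chance", [(0.5, 4), (0.5, 6)]), ("chance", [(0.5, 8), (0.5, 10)])]),
    ("min", [("chance", [(0.5, 1), (0.5, 3)]), ("chance", [(0.9, 20), (0.1, 0)])]),
])
```

Running both functions on the same tree and comparing the values (and the node counter) is the same sanity check described above.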
I also found that in my case I would hand control of the chance nodes to the user, who can decide what loss, if any, is acceptable for certain events. If an ability has a 95% chance to hit and a 5% chance to miss, the user could deem the 5% negligible and ignore the miss outcome. This does not mean the heuristic is treated as 100% of its value, though; the hit outcome keeps its 95% weight. In rare situations a single ability could produce 16 possible outcomes given the environment, and I wanted flexibility for such cases.
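A small sketch of that idea, with hypothetical names (`chance_value`, `min_probability`) that are not from the actual implementation: outcomes below a user-chosen probability are skipped, and the remaining probabilities are deliberately not renormalized, so a 95% hit still contributes only 95% of its value.

```python
def chance_value(outcomes, evaluate, min_probability=0.0):
    """Expected value of a chance node, ignoring outcomes the user deems negligible.

    outcomes: list of (probability, state) pairs.
    evaluate: function returning the heuristic value of a state.
    """
    total = 0.0
    for probability, state in outcomes:
        if probability < min_probability:
            continue  # skipped entirely; the other weights are left unchanged
        total += probability * evaluate(state)
    return total
```

With 16 possible outcomes, raising `min_probability` trades accuracy for fewer subtrees to search.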
As for the game, it would be similar to Pokémon in terms of play style at least.
In backpropagation training, during gradient descent down the error surface, a network with a large number of neurons in the hidden layer can get stuck in a local minimum. I have read that reinitializing the weights to random numbers will eventually avoid this problem in all cases.
This means that there always IS a set of "correct" initial weight values. (Is this safe to assume?)
I need to find or make an algorithm that finds them.
I have tried googling the algorithm, tried devising it myself but to no avail. Can anyone propose a solution? Perhaps a name of algorithm that I can search for?
Note: this is a regular feed-forward 3-layer burrito :)
Note: I know attempts have been made to use GAs for that purpose, but that requires re-training the network on each iteration which is time costly when it gets large enough.
Thanks in advance.
There is never a guarantee that you will not get stuck in a local optimum, sadly. Unless you can prove certain properties about the function you are trying to optimize, local optima exist and hill-climbing methods will fall prey to them. (And typically, if you can prove the things you need to prove, you can also select a better tool than a neural network.)
One classic technique is to gradually reduce the learning rate, then increase it and slowly draw it down again, repeating this several times. Raising the learning rate reduces the stability of the algorithm, but gives the algorithm the ability to jump out of a local optimum. This is closely related to simulated annealing.
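As a rough sketch of such a schedule (the function name and all default values are assumptions, not recommendations), the learning rate below decays geometrically within each cycle and then jumps back up at the start of the next one:

```python
def learning_rate(step, high=0.5, low=0.01, period=100):
    """Decay from `high` to `low` over `period` steps, then restart at `high`."""
    position = (step % period) / (period - 1)  # 0.0 at cycle start, 1.0 at cycle end
    return high * (low / high) ** position
```

The value returned for the current training step would be passed to whatever weight-update rule the backprop code already uses.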
I am surprised that Google has not helped you here, as this is a topic with many published papers. Try terms like "local minima" and "local minima problem" in conjunction with neural networks and backpropagation. You should see many references to improved backprop methods.
I think I understand the basic concept of simulated annealing. It's basically adding random solutions to cover a better area of the search space at the beginning then slowly reducing the randomness as the algorithm continues running.
I'm a little confused on how I would implement this into my genetic algorithm.
Can anyone give me a simple explanation of what I need to do, and confirm that my understanding of how simulated annealing works is correct?
When constructing a new generation of individuals in a genetic algorithm, there are three random aspects to it:
Matching parent individuals to parent individuals, with preference according to their proportional fitness,
Choosing the crossover point, and,
Mutating the offspring.
There's not much you can do about the second one, since that's typically a uniform random distribution. You could conceivably try to add some random factor to the roulette wheel as you're selecting your parent individuals, and then slowly decrease that random function. But that goes against the spirit of the genetic algorithm and (more importantly) I don't think it will do much good. I think it would hurt, actually.
That leaves the third factor: change the mutation rate from high to low as the generations go by.
It's really not any more complicated than that.
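A minimal sketch of that annealed mutation step, assuming a bit-string genome; the linear schedule and the start/end rates are illustrative choices, not the only option:

```python
import random

def mutation_rate(generation, generations, start=0.2, end=0.01):
    """Linearly anneal the per-bit mutation probability from `start` to `end`."""
    fraction = generation / max(1, generations - 1)
    return start + (end - start) * fraction

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in bits]

# Usage inside a generation loop (selection and crossover omitted):
rng = random.Random(0)
child = mutate([0, 1, 1, 0, 1], mutation_rate(generation=0, generations=50), rng)
```

Selection and crossover stay exactly as they were; only the rate passed to `mutate` changes from one generation to the next.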
For my bachelor's thesis I want to write a genetic algorithm that learns to play the game of Stratego (if you don't know this game, it's probably safe to assume I said chess). I haven't ever before done actual AI projects, so it's an eye-opener to see how little I actually know of implementing things.
The thing I'm stuck with is coming up with a good representation for an actual strategy. I'm probably making some thinking error, but some problems I encounter:
I don't assume you would have a representation containing a lot of transitions between board positions, since that would just be brute-forcing it, right?
What could branches of a decision tree look like? Any representation I come up with doesn't have interchangeable branches... If I were to use a bit string, which is apparently also common, what would the bits represent?
Do I assign scores to the distance between certain pieces? How would I represent that?
I think I ought to know these things after three+ years of study, so I feel pretty stupid - this must look like I have no clue at all. Still, any help or tips on what to Google would be appreciated!
I think you could define a decision model and then try to optimize the parameters of that model. You can also create multi-stage decision models. I once did something similar for solving a dynamic dial-a-ride problem (paper here) by modeling it as a two-stage linear decision problem. To give you an example, you could:
Decide which of your figures is to move next. Each figure is characterized by certain features derived from its position on the board, e.g. ability to make a score, danger, protecting x other figures, and so on. These features can be combined (e.g. in a linear model, through a neural network, through a symbolic expression tree, a decision tree, ...) to give you a ranking of which figure to act with next.
Act with the figure you selected. Again there are a certain number of actions that can be taken, and each has certain features. Again you can combine and rank them, and one action will have the highest priority. This is the one you choose to perform.
The features you extract can be very simple or insanely complex; it comes down to what you think will work best versus how long it takes to compute.
To evaluate and improve the quality of your decision model you can then simulate these decisions in several games against opponents and train the parameters of the model that combines these features to rank the moves (e.g. using a GA). This way you tune the model to win as many games as possible against the specified opponents. You can test the generality of that model by playing against opponents it has not seen before.
As Mathew Hall just said, you can use GP for this (if your model is a complex rule), but this is just one kind of model. In my case a linear combination of the features did very well.
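To make the two-stage linear model concrete, here is a minimal sketch. The feature names and weight values are invented for illustration; in practice the weight vectors are exactly what a GA would be tuning.

```python
def score(weights, features):
    """Linear combination of a candidate's features."""
    return sum(w * f for w, f in zip(weights, features))

def choose(candidates, weights):
    """Return the name of the highest-scoring candidate.

    candidates: list of (name, feature_vector) pairs.
    """
    return max(candidates, key=lambda candidate: score(weights, candidate[1]))[0]

# Stage 1: which figure moves? Hypothetical features: (can_capture, in_danger, protects_n)
figure_weights = (2.0, -1.5, 0.5)
figures = [("scout", (0, 1, 0)), ("miner", (1, 0, 2))]
figure = choose(figures, figure_weights)

# Stage 2: which action does it take? Hypothetical features: (captures, ends_in_danger)
action_weights = (3.0, -2.0)
actions = [("advance", (0, 0)), ("attack", (1, 1)), ("retreat", (0, 0.5))]
action = choose(actions, action_weights)
```

The genome for the GA would then simply be the concatenation of `figure_weights` and `action_weights`.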
Btw, if you're interested, we also have software for heuristic optimization which provides you with GA, GP and that kind of thing. It's called HeuristicLab. It's GPL and open source, but comes with a GUI (Windows). We have a how-to on evaluating the fitness function in an external program (data exchange using protocol buffers), so you can work on your simulation and your decision model and let the algorithms present in HeuristicLab optimize your parameters.
Vincent,
First, don't feel stupid. You've been (I infer) studying basic computer science for three years; now you're applying those basic techniques to something pretty specialized: a particular application (Stratego) in a narrow field (artificial intelligence).
Second, make sure your advisor fully understands the rules of Stratego. Stratego is played on a larger board, with more pieces (and more types of pieces) than chess. This gives it a vastly larger space of legal positions, and a vastly larger space of legal moves. It is also a game of hidden information, increasing the difficulty yet again. Your advisor may want to limit the scope of the project, e.g., concentrate on a variant with full observation. I don't know why you think this is simpler, except that the moves of the pieces are a little simpler.
Third, I think the right thing to do at first is to take a look at how games in general are handled in the field of AI. Russell and Norvig, chapters 3 (for general background) and 5 (for two player games) are pretty accessible and well-written. You'll see two basic ideas: One, that you're basically performing a huge search in a tree looking for a win, and two, that for any non-trivial game, the trees are too large, so you search to a certain depth and then cop out with a "board evaluation function" and look for one of those. I think your third bullet point is in this vein.
The board evaluation function is the magic, and probably a good candidate for using either a genetic algorithm, or a genetic program, either of which might be used in conjunction with a neural network. The basic idea is that you are trying to design (or evolve, actually) a function that takes as input a board position, and outputs a single number. Large numbers correspond to strong positions, and small numbers to weak positions. There is a famous paper by Chellapilla and Fogel showing how to do this for a game of Checkers:
http://library.natural-selection.com/Library/1999/Evolving_NN_Checkers.pdf
I think that's a great paper, tying three great strands of AI together: Adversarial search, genetic algorithms, and neural networks. It should give you some inspiration about how to represent your board, how to think about board evaluations, etc.
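As a rough illustration of a board evaluation function whose parameters could be evolved, here is a sketch with one small hidden layer. This is not the architecture from the Chellapilla and Fogel paper; the network shape, the flat genome layout, and the board encoding (+1 own piece, -1 enemy piece, 0 empty) are all assumptions made for the example.

```python
import math

def genome_length(inputs, hidden):
    """Number of weights and biases for one hidden layer plus one output unit."""
    return hidden * (inputs + 1) + hidden + 1

def evaluate(board, genome, hidden=3):
    """Map a board (list of numbers) to a single score in [-1, 1].

    genome is a flat list: per hidden unit its input weights then its bias,
    followed by the output weights and the output bias.
    """
    inputs = len(board)
    if len(genome) != genome_length(inputs, hidden):
        raise ValueError("genome does not match the network shape")
    position = 0
    activations = []
    for _ in range(hidden):
        weights = genome[position:position + inputs]
        bias = genome[position + inputs]
        position += inputs + 1
        total = sum(w * x for w, x in zip(weights, board)) + bias
        activations.append(math.tanh(total))
    output_weights = genome[position:position + hidden]
    output_bias = genome[position + hidden]
    return math.tanh(sum(w * a for w, a in zip(output_weights, activations)) + output_bias)
```

A GA would evolve the `genome` list, and the search would call `evaluate` at its depth limit.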
Be warned, though, that what you're trying to do is substantially more complex than Chellapilla and Fogel's work. That's okay-- it's 13 years later, after all, and you'll be at this for a while. You're still going to have a problem representing the board, because the AI player has imperfect knowledge of its opponent's state; initially, nothing is known but positions, but eventually as pieces are eliminated in conflict, one can start using First Order Logic or related techniques to start narrowing down individual pieces, and possibly even probabilistic methods to infer information about the whole set. (Some of these may be beyond the scope of an undergrad project.)
The fact you are having problems coming up with a representation for an actual strategy is not that surprising. In fact I would argue that it is the most challenging part of what you are attempting. Unfortunately, I haven't heard of Stratego so being a bit lazy I am going to assume you said chess.
The trouble is that a chess strategy is rather a complex thing. You suggest in your question encoding lots of transitions between board positions in the GA, but since a chess board has more possible positions than there are atoms in the universe, this is clearly not going to work very well. What you will likely need to do is encode in the GA a series of weights/parameters attached to something that takes in the board position and fires out a move; I believe this is what you are hinting at in your second suggestion.
Probably the simplest suggestion would be to use some sort of generic function approximator like a neural network; Perceptrons or Radial Basis Functions are two possibilities. You can encode weights for the various nodes into the GA, although there are other fairly sound ways to train a neural network (see Backpropagation). You could perhaps encode the network structure instead or as well. This has the added advantage that a fair amount of research has, I am pretty sure, been done into developing neural networks with a genetic algorithm, so you wouldn't be starting completely from scratch.
You still need to come up with how you are going to present the board to the neural network and interpret the result from it. With chess in particular, you would have to take note that a lot of moves will be illegal. It would be very beneficial if you could encode the board and interpret the result such that only legal moves are presented. I would suggest implementing the mechanics of the system and then playing around with different board representations to see what gives good results. A few off-the-top-of-my-head ideas to get you started, although I am not really convinced any of them are especially great ways to do this:
A bit string with all 64 squares one after another with a number representing what is present in each square. Most obvious, but probably a rather bad representation as a lot of work will be required to filter out illegal moves.
A bit string with all 64 squares one after another with a number representing what can move to each square. This has the advantage of embodying the covering concept of chess, where you want to gain as much coverage of the board with your pieces as possible, but it still has problems with illegal moves and with telling friendly and enemy pieces apart.
A bit string with all 32 pieces one after another with a number representing the square each piece is on.
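For instance, the first representation could be sketched like this, using 4 bits per square. The piece codes and the row-of-strings board format are assumptions made for the example:

```python
# Code 0 is an empty square; uppercase is one side, lowercase the other.
PIECES = ".PNBRQKpnbrqk"

def encode(rows):
    """Turn 8 strings of 8 characters into a 256-character bit string."""
    return "".join(format(PIECES.index(ch), "04b") for row in rows for ch in row)

def decode(bits):
    """Inverse of encode: rebuild the 8 row strings."""
    chars = [PIECES[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4)]
    return ["".join(chars[i:i + 8]) for i in range(0, 64, 8)]

START = [
    "rnbqkbnr", "pppppppp", "........", "........",
    "........", "........", "PPPPPPPP", "RNBQKBNR",
]
```

Note that three of the sixteen 4-bit codes are unused, so random mutation can produce strings that do not decode to a board; that is one of the reasons this encoding is awkward for a GA.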
In general, though, I would suggest that chess is rather a complex game to start with; I think it will be rather hard to get something that plays noticeably better than random. I don't know if Stratego is any simpler, but I would strongly suggest you opt for a fairly simple game. This will let you focus on getting the mechanics of the implementation and the representation of the game state correct.
Anyway hope that is of some help to you.
EDIT: As a quick addition, it is worth looking into how standard chess AIs work; I believe most use some sort of Minimax system.
When you say "tactic", do you mean you want the GA to give you a general algorithm to play the game (i.e. evolve an AI) or do you want the game to use a GA to search the space of possible moves to generate a move at each turn?
If you want to do the former, then look into using Genetic programming (GP). You could try to use it to produce the best AI you can for a fixed tree size. JGAP already comes with support for GP as well. See the JGAP Robocode example for an instance of this. This approach does mean you need a domain specific language for a Stratego AI, so you'll need to think carefully how you expose the board and pieces to it.
Using GP means your fitness function can just be how well the AI does at a fixed number of pre-programmed games, but that requires a good AI player to start with (or a very patient human).
@DonAndre's answer is absolutely correct for movement. In general, problems involving state-based decisions are hard to model with GAs, requiring some form of GP (either explicit or, as @DonAndre suggested, trees that are essentially declarative programs).
A general Stratego player seems to me quite challenging, but if you have a reasonable Stratego playing program, "Setting up your Stratego board" would be an excellent GA problem. The initial positions of your pieces would be the phenotype and the outcome of the external Stratego-playing code would be the fitness. It is intuitively likely that random setups would be disadvantaged versus setups that have a few "good ideas" and that small "good ideas" could be combined into fitter-and-fitter setups.
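A minimal sketch of how such a setup could be represented and mutated. The piece counts assume the classic 40-piece edition, and the fitness would come from an external Stratego-playing program, which is not shown here:

```python
import random

# Assumed piece counts for the classic 40-piece game.
PIECE_COUNTS = {
    "flag": 1, "bomb": 6, "spy": 1, "scout": 8, "miner": 5, "sergeant": 4,
    "lieutenant": 4, "captain": 4, "major": 3, "colonel": 2, "general": 1,
    "marshal": 1,
}

def random_setup(rng):
    """A random arrangement; index i is square i of the 4x10 home area."""
    pieces = [name for name, count in PIECE_COUNTS.items() for _ in range(count)]
    rng.shuffle(pieces)
    return pieces

def mutate(setup, rng):
    """Swap two squares, so the result is always a legal set of pieces."""
    child = list(setup)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child
```

Swap mutation is used because flipping individual genes would break the piece counts; crossover needs the same care (for example, a permutation-preserving operator).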
...
On the general problem of what a decision tree might look like: I kept finding it hard to come up with a small enough example, but consider the case where you are evaluating whether to attack a same-ranked piece (which, IIRC, destroys both you and the other piece):
    double locationNeed = aVeryComplexDecisionTree();
    if (thatRank == thisRank) {
        double sacrificeWillingness = SACRIFICE_GENETIC_BASE; // assume range 0.0 - 1.0
        double sacrificeNeed = anotherComplexTree(); // 0.0 - 1.0
        double sacrificeInContext = sacrificeNeed * SACRIFICE_NEED_GENETIC_DISCOUNT; // 0.0 - 1.0
        // Assumed test: the discounted need must exceed the piece's reluctance (1 - willingness)
        if (sacrificeInContext > 1.0 - sacrificeWillingness) {
            // ...OK, this piece is "willing" to sacrifice itself
        }
    }
One way or the other, the basic idea is that you'd still have a lot of coding of Stratego-play, you'd just be seeking places where you could insert parameters that would change the outcome. Here I had the idea of a "base" disposition to sacrifice itself (presumably higher in common pieces) and a "discount" genetically-determined parameter that would weight whether the piece would "accept or reject" the need for a sacrifice.
I remember when I was in college we went over a problem where a smart agent on a grid of squares had to clean the squares. It was awarded points for cleaning and lost points for moving. It had to refuel every now and then, and at the end it got a final score based on how many squares on the grid were dirty or clean.
I'm trying to study that problem since it was very interesting when I saw it in college, but I cannot find anything on Wikipedia or anywhere else online. Is there a specific name for that problem that you know of? Or maybe it was just something my teacher came up with for the class.
I've been searching for "AI cleaning agent" and similar terms, but I can't find anything. Maybe it goes by some other name.
If you know where I can find more information about this problem I would appreciate it. Thanks.
Perhaps a "stigmergy" approach is closely related to your problem. There is a starting point here, and you can find something by searching for "dead ants" and "robots" on Google Scholar.
Basically: instead of modelling a precise strategy you work toward a probabilistic approach. Ants (probably) collect their dead by piling them up according to a simple rule such as "if there is a pile of dead ants there, I bring this corpse hither; otherwise, I'll make a new pile". You can start by simplifying your 'cleaning' situation with that, and see where you go.
Also, I think (another?) suitable approach could be modelled with a Genetic Algorithm using a carefully chosen combination of fitness functions such as:
the end number of 'clean' tiles
the number of steps made by the robot
of course, if the robot 'dies' of starvation it automatically removes itself from the gene pool, à la Darwin Awards :)
You could start by modelling a very, very simple genotype that will be 'computed' into a behaviour. Consider using a simple GA such as this one by Inman Harvey, then to each gene assign either a part of the strategy, or a complete behaviour. E.g.: if gene A is turned to 1 then the robot will try to wander randomly; if gene B is also turned to 1, then it will give priority to self-charging unless there are dirty tiles at distance X. Or use floats and model probability. Your mileage may vary but I can assure you it will be fun :)
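A tiny sketch of what the genotype decoding and the combined fitness might look like. The gene layout, the behaviour names, and the `step_cost` value are all made up for the example, and the simulation that produces `clean_tiles` and `steps` is not shown:

```python
def decode(genome):
    """Turn a (gene_a, gene_b, distance_x) genome into behaviour switches."""
    gene_a, gene_b, distance_x = genome
    return {
        "wander_randomly": bool(gene_a),
        "prioritise_recharging": bool(gene_b),
        "dirty_tile_override_distance": distance_x,
    }

def fitness(clean_tiles, steps, ran_out_of_fuel, step_cost=0.5):
    """Reward clean tiles, penalise steps, and zero out robots that 'died'."""
    if ran_out_of_fuel:
        return 0.0
    return max(0.0, clean_tiles - step_cost * steps)
```

A fitness of zero means the individual is never selected under fitness-proportional selection, which is the 'removes itself from the gene pool' rule.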
The problem is reminiscent of Shakey, although there's cleaning involved (which is like the Roomba -- a device that can also be programmed to perform these very tasks).
If the "problem space" (or room) is small enough, you can solve for an optimal solution using a simple A*-based search, but likely it won't be, since that wouldn't make for very interesting problems.
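For small rooms, a sketch of the exhaustive version looks like this. It uses plain breadth-first search rather than A* (equivalent to A* with a zero heuristic when every move costs the same), ignores fuel, and assumes a tile is cleaned just by moving onto it; all names are illustrative:

```python
from collections import deque

def fewest_moves(width, height, start, dirty):
    """Minimum number of moves needed to visit every dirty tile.

    The search state is (position, set of tiles still dirty).
    """
    remaining = frozenset(dirty) - {start}
    queue = deque([(start, remaining, 0)])
    seen = {(start, remaining)}
    while queue:
        (x, y), remaining, moves = queue.popleft()
        if not remaining:
            return moves
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                state = ((nx, ny), remaining - {(nx, ny)})
                if state not in seen:
                    seen.add(state)
                    queue.append((state[0], state[1], moves + 1))
    return None  # only reachable if some dirty tile lies outside the grid
```

The number of states grows with 2 to the power of the number of dirty tiles, which is why this only works for small rooms.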
The machine learning approach suggested here using genetic algorithms is an interesting approach. Given the problem domain you would only have one "rule" (a move-to action, since clean could be eliminated by implicitly cleaning any square you move to that is dirty) so your learner would essentially be learning how to move around an environment. The problem there would be to build a learner that would be adaptable to any given floor plan, instead of just becoming proficient at cleaning a very specific space.
Whatever approach you take, I'd also consider doing a further meta-reasoning step if the problem sets are big enough: use a partition approach to divide the floor up into separate areas and then conquer them one at a time.
Can you use techniques to create data to use "offline"? In that case, I'd even consider creating a "database" of optimal routes to take to clean certain floor spaces (1x1 up to, say, 5x5) that include all possible start and end squares. This is similar to the "endgame databases" that game AIs use to effectively "solve" games once they reach a certain depth (cf. Chinook).
This problem reminds me of this. A similar problem is briefly mentioned in the book Complexity as an example of a genetic algorithm. These versions are simplified, though: they don't take fuel consumption into account.
I would like to know the various AI algorithms or logic used in arcade/strategy games for selecting the best target for an individual unit to attack.
I have to write a small piece of AI logic in which a group of units is attacked by various tankers, and I am stuck on finding a good algorithm for selecting the best tanker for each unit to attack.
Data available are:
Tanker position, range, hitpoints, damage.
If anybody knows the most suitable algorithm/logic for solving this problem, please respond soon.
Thanks in advance,
Ramanand.
I'm going to express this in a perspective similar to RPG gamers:
What character would you bring down first in order to strike a crippling blow to the rest of your enemies? It would be common sense to bring down the healers of the party, as they can heal the rest of the team. Once the healers are gone, the team needs to use medicine - which is limited in supply - and once medicine is exhausted, the party is screwed.
Similar logic would apply to the tank program. In your AI, you need to figure out which tanks provide the most strength and support to the user's fleet, and eliminate them first. Don't focus on any other tanks unless they become critical in achieving their goal: Kill the strongest, most useful members of the group first.
So I'm going to break down what I feel most likely pertains to the attributes of your tanks.
RANGE: Far range tanks can hit from a distance but have weak STRENGTH in their attacks.
TANKER POSITION: Closer tanks are faster tanks, but have less STRENGTH in their attacks. Also low HITPOINTS because they're meant for SPEED, and not for DAMAGE.
TANKER HP: Higher HP means a slower-moving tank, as they're stronger. But they won't be close to the front lines.
DAMAGE: Higher DAMAGE means a STRONGER tank with lots of HP, but SLOWER as well to move.
So if I were you, I'd focus first on the tanks that have the highest HP/strongest attacks, followed by the closest ones, and then worry about the ranged tanks - you can't do anything to them anyway until they move into your attack radius :P
And the algorithm would be pretty simple. If you have a list of tanks in a party, create a custom sort for them (using CompareTo) and sort the tanks by class with the highest possible HP at the top of the list, followed by tanks with their focus being speed, and then range.
And then go through each item in the list. If it is possible to attack Tank(0), attack. If not, go to Tank(1).
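In Python terms (a sort key instead of CompareTo), the idea could be sketched as follows. The `Tank` fields mirror the data listed in the question, but the exact ordering key is an assumption: highest hit points first, then highest damage, then nearest.

```python
import math
from dataclasses import dataclass

@dataclass
class Tank:
    name: str
    x: float
    y: float
    hitpoints: int
    damage: int

def pick_target(unit_pos, unit_range, tanks):
    """Return the highest-priority tank that is within range, or None."""
    ordered = sorted(
        tanks,
        key=lambda t: (-t.hitpoints, -t.damage, math.dist(unit_pos, (t.x, t.y))),
    )
    for tank in ordered:
        if math.dist(unit_pos, (tank.x, tank.y)) <= unit_range:
            return tank  # first attackable tank in priority order
    return None
```

If the strongest tank is out of range, the loop simply falls through to the next one in the list, as described above.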
The goal is to attack only one opponent at a time and receive fire from at most one enemy at a time (though, preferably, none).
Ideally, you would attack the tanks by remaining behind cover and flanking them with surprise attacks. This allows you to destroy the tanks one at a time, while receiving no or little fire.
If you don't have cover, then you should use the enemy as cover. Move into a position that puts one enemy behind another. This also improves your chance to hit.
You can also use range to reduce fire from multiple enemies. Retreat until you are only within range of one enemy.
If the enemies can all fire on you, you want to attack one target until it is no longer a threat, then move on to the next target. The goal is to reduce the amount of fire that you receive as quickly as possible.
If more than one enemy can fire on you at the same time, and you can choose your target, you should fire at the one that allows you to reduce the most amount of damage for the least cost. Simply divide the hit points by the damage, and attack the one with the smallest result. You should also figure in any other relevant stats. Range probably affects you and the enemy equally, but considering the ability to maneuver out of the way of fire, closer enemies are more harmful and should be given some weight in the calculation.
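That calculation could be sketched like this. The dictionary keys and the optional `distance_weight` term are assumptions; the weighting is just one simple way to make closer enemies count for more:

```python
def best_target(enemies, distance_weight=0.0):
    """Pick the enemy with the smallest hit points per point of damage.

    enemies: list of dicts with 'name', 'hitpoints', 'damage', 'distance'.
    A positive distance_weight makes far-away enemies less attractive.
    """
    threats = [e for e in enemies if e["damage"] > 0]  # harmless enemies can wait
    if not threats:
        return None

    def cost(enemy):
        ratio = enemy["hitpoints"] / enemy["damage"]
        return ratio * (1.0 + distance_weight * enemy["distance"])

    return min(threats, key=cost)
```

With `distance_weight=0.0` this is exactly the hit-points-divided-by-damage rule from the paragraph above.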
If moving decreases the likelihood of being hit, then you should keep moving, typically by circling your opponent to stay at their flank.
Team tactics would mostly include flanking and diversions.
What's the ammo situation, and is it possible to miss a stationary target?
Based on your comments it sounds like you already have some ad hoc set of rules or heuristics that gives you something around 70% success by your own measures, and you want to optimize this further to get a higher win rate.
As a general solution method I would use a hill-climbing algorithm. Since I don't know the details of your current algorithm that is responsible for the 70% success rate, I can only describe in abstract terms how to adapt hill-climbing to optimize your algorithm.
The general principle of hill-climbing is as follows. Hopefully, a small change in some numeric parameter of your current algorithm would be responsible for a small (hopefully linear) change in the resulting success rate. If this is true then you would first parameterize your current set of rules -- meaning you must decide which numeric parameters in your current algorithm may be tweaked and optimized to achieve a higher success rate. Once you've decided what they are, the learning process is straightforward. Start with your current algorithm. Generate a variety of new algorithms with slightly tweaked parameters, and run your simulations to evaluate the performance of this new set of algorithms. Pick the best one as your next starting point. Repeat this process until the algorithm can't get any better.
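A minimal sketch of that loop, assuming the algorithm's tunable parameters are a list of numbers and `evaluate` is a stand-in for running your simulations and returning the success rate:

```python
def hill_climb(evaluate, params, step=0.1, max_rounds=1000):
    """Repeatedly move to the best neighbouring parameter set until none improves."""
    best, best_score = list(params), evaluate(params)
    for _ in range(max_rounds):
        # Neighbours: each parameter nudged up or down by one step.
        neighbours = []
        for i in range(len(best)):
            for delta in (-step, step):
                candidate = list(best)
                candidate[i] += delta
                neighbours.append(candidate)
        top = max(neighbours, key=evaluate)
        top_score = evaluate(top)
        if top_score <= best_score:
            break  # no neighbour is better: a (possibly local) optimum
        best, best_score = top, top_score
    return best, best_score
```

Because a simulated win rate is noisy, in practice each `evaluate` call should average over many games, and it is worth caching results since this sketch evaluates some candidates twice.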
If your algorithm is a set of if-then rules (this includes rule-matching systems), and improving the performance involves reordering or restructuring those rules, then you may want to consider genetic algorithms, which are a little more complex. To apply genetic algorithms, it is essential that you define the mutation and crossover operators such that a single application of mutation or crossover results in a small change in the overall performance, while many applications of mutation and crossover result in a large change in the overall performance of your algorithm. I'm not an expert in this field, but there should be much that comes up when you google for "genetic algorithms on decision trees". The pitfall to avoid is that if you simply consider swapping branches in a decision tree for the mutation operator, a single application might modify the root of your decision tree, generating a huge performance difference. This typically adds too much noise for a genetic algorithm, so my advice with this approach is to be very careful about the encoding of your operators.
Note that these two methods are very popular AI methods for learning or improving your current algorithm. You would do all of these simulations and learning offline. Then you would simply deploy the resulting, learned algorithm.