The description of the minimax algorithm says that both players have to play optimally for the algorithm to be optimal. Intuitively that is understandable, but could anyone make it concrete, or prove what happens if MIN does not play optimally?
Thanks.
The definition of "optimal" is that you play so as to minimize the "score" (or whatever you measure) of your opponent's optimal answer, which in turn is defined by the play that minimizes the score of your optimal answer, and so forth.
Thus, by definition, if you don't play optimally, your opponent has at least one path that will give him a higher score than the best he could get if you had played optimally.
One way to find out what is optimal is to brute-force the entire game tree. For anything beyond trivial problems you can use alpha-beta search, which guarantees the optimal result without needing to search the entire tree. If your tree is still too complex, you need a heuristic that estimates the score of a "position" and stops at a certain depth.
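As a rough illustration, here is a minimal alpha-beta sketch; the `is_terminal`, `children`, and `score` helpers are hypothetical placeholders for your own game's rules and evaluation function, not part of any particular library.

```python
import math

# Minimal alpha-beta sketch. `is_terminal`, `children`, and `score` are
# hypothetical placeholders for your game's rules and evaluation function.
def alphabeta(state, depth, alpha=-math.inf, beta=math.inf, maximizing=True):
    if depth == 0 or is_terminal(state):
        return score(state)                    # exact result or heuristic estimate
    if maximizing:
        best = -math.inf
        for child in children(state):
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:                  # MIN will never allow this branch
                break
        return best
    else:
        best = math.inf
        for child in children(state):
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:                  # MAX already has something better
                break
        return best
```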
Was that understandable?
I was having problems with that precise question.
When you think about it for a bit, you will get the idea that the minimax graph contains ALL possible games, including the bad games. So if a player plays a suboptimal game, then that game is part of the tree, but it has been discarded in favor of a better game.
It's similar to alpha-beta. I was getting stuck on what happens if I sacrifice some pieces intentionally to create space and then make a winning move through the gap, i.e. when there is a better move further down the tree.
With alpha-beta, let's say a sequence of losing moves followed by a killer move is in fact in the tree; in that case alpha and beta act as a window filter "a < x < b" and would have discarded it if YOU had a better game. You can see this in alpha-beta if you imagine putting a +/- infinity into a pruned branch to see what happens.
In any case, both algorithms recalculate on every move, so if a player plays a suboptimal game, that will open up branches of the graph that are better for the opponent.
Rinse, repeat.
Consider a MIN node whose children are terminal nodes. If MIN plays suboptimally, then the value of the node is greater than or equal to the value it would have if MIN played optimally. Hence, the value of the MAX node that is the MIN node’s parent can only be increased. This argument can be extended by a simple induction all the way to the root. If the suboptimal play by MIN is predictable, then one can do better than a minimax strategy. For example, if MIN always falls for a certain kind of trap and loses, then setting the trap guarantees a win even if there is actually a devastating response for MIN.
Source: https://www.studocu.com/en-us/document/university-of-oregon/introduction-to-artificial-intelligence/assignments/solution-2-past-exam-questions-on-computer-information-system/1052571/view
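As a tiny numeric illustration of that induction step (the leaf values below are made up), any suboptimal choice at a MIN node yields a value at least as large as the optimal one, and therefore the MAX parent's value can only go up:

```python
# Toy illustration with made-up leaf values under one MIN node.
min_children = [3, 5, 9]
optimal = min(min_children)                     # optimal MIN play -> 3

# Any suboptimal choice by MIN yields a value >= the optimal one ...
assert all(choice >= optimal for choice in min_children)

# ... so the MAX parent's value (a max over its children) can only increase.
other_max_child = 4
for suboptimal in min_children:
    assert max(other_max_child, suboptimal) >= max(other_max_child, optimal)
```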
I have implemented MCTS for a 4-player game and it is working well, but I'm not sure I understand expansion when the game-ending move is in the actual tree rather than in the rollout.
At the start of the game, winning/losing positions are only found in the rollout, and I understand how to score these and propagate them back up the tree. But as the game progresses, I eventually find a leaf node, chosen by UCB1, that cannot be expanded because it is a losing position with no possible move allowed, so there is nothing to expand, nor is there a game to 'roll out'. At the moment I just score this as a 'win' for the last remaining player and backpropagate a win for them.
However, when I look at the visit stats, this node gets revisited thousands of times, so obviously UCB1 'chooses' to visit this node many times. But really this is a bit of a waste; should I be back-propagating something other than a single win for these 'always win' nodes?
I've had a good Google search for this and can't really find much mention of it, so am I misunderstanding something or missing something obvious? None of the 'standard' MCTS tutorials/algorithms even mention game-ending nodes in the tree as special cases, so I'm worried I've misunderstood something fundamental.
At the moment I just score this as a 'win' for the last remaining player and backpropagate a win for them.
However when I look at the visit stats this node gets revisited thousands of times, so obviously UCB1 'chooses' to visit this node many times, but really this is a bit of a waste, should I be back-propagating something other than a single win for these 'always win' nodes?
No, what you're already doing is correct.
MCTS essentially evaluates the value of a node as the average of the outcomes of all paths you have run through that node. In reality, we are generally interested in minimax-style evaluations.
For MCTS' average-based evaluations to become equal to minimax-evaluations in the limit (after an infinite amount of time), we rely on the Selection phase (e.g. UCB1) to send so many simulations (= Selection + Play-out phases) down the path(s) that would be optimal according to minimax evaluations that the average evaluations also tend, in the limit, to the minimax evaluations.
Suppose, for example, that there is a winning node directly below the root node. This is an extreme example of your situation, where the terminal node is already reached in the Selection phase, and no Play-out is required afterwards. The minimax evaluation of the root node would be a win, since we can directly get to a win in one step. This means we want the average-based scoring of MCTS to also become very close to a winning evaluation for the root node. This means that we want the Selection phase to send the vast majority of simulations immediately down into this node. If e.g. 99% of all simulations immediately go to this winning node from the root node, the average evaluation of the root node will also become very close to a win, and that's exactly what we need.
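For concreteness, here is a plain UCB1 selection step of the kind this relies on; the node fields (`visits`, `total_reward`) are assumptions about how you store statistics, not a fixed API:

```python
import math

# Plain UCB1 child selection for the Selection phase. Each child is assumed to
# carry `visits` and `total_reward` (rewards from the parent player's perspective).
def ucb1_select(children, parent_visits, c=math.sqrt(2)):
    def ucb1(child):
        if child.visits == 0:
            return math.inf                                  # try unvisited children first
        exploit = child.total_reward / child.visits          # average outcome so far
        explore = c * math.sqrt(math.log(parent_visits) / child.visits)
        return exploit + explore
    return max(children, key=ucb1)
```

A child that is a guaranteed win keeps returning the maximum reward, so its average never drops and UCB1 keeps routing simulations through it, which is exactly what pulls the parent's average evaluation toward a win as described above.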
This answer is only about the implementation of basic UCT (MCTS with UCB1 for Selection). For more sophisticated modifications to that basic MCTS implementation related to the question, see manlio's answer.
none of the 'standard' MCTS tutorials/algorithms even mention game ending nodes in the tree as special cases
There are MCTS variants able to prove the game-theoretical value of a position.
MCTS-Solver is (quite) well known: the backpropagation and selection steps are modified for this variant, as well as the procedure for choosing the final move to play.
Terminal win and loss positions occurring in the tree are handled differently and a special provision is taken when backing such proven values up the tree.
You can take a look at Monte-Carlo Tree Search Solver by Mark H. M. Winands, Yngvi Björnsson and Jahn Takeshi Saito (part of the Lecture Notes in Computer Science book series, volume 5131) for details.
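Very roughly, the solver idea is to propagate proven values with minimax rules alongside the usual averages. A simplified, negamax-style sketch of that proof-status update (my own simplification, not the paper's exact pseudocode; the `children` and `proven` fields are hypothetical node attributes):

```python
# Rough negamax-style sketch of the MCTS-Solver idea (not the paper's exact pseudocode).
# `proven` is +1 for a proven win for the player to move at that node, -1 for a
# proven loss, and None while the node is still unproven.
def update_proof_status(node):
    child_proofs = [child.proven for child in node.children]
    if any(p == -1 for p in child_proofs):
        node.proven = +1    # some move leads to a position the opponent has provably lost
    elif child_proofs and all(p == +1 for p in child_proofs):
        node.proven = -1    # every move hands the opponent a proven win
    # Otherwise keep the ordinary average-based MCTS value for this node.
```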
so I'm worried I've misunderstood something fundamental.
Although in the long run MCTS equipped with the UCT formula is able to converge to the game-theoretical value, basic MCTS is unable to prove the game-theoretical value.
I'm trying to solve a variant of 2048 with Monte-Carlo Tree Search. I found that UCT could be a good way to get some trade-off between exploration and exploitation.
My only issue is that all the versions I've seen assume that the score is a win percentage. How can I adapt it to a game where the score is the value of the board at the final state, and thus ranges from 1 to MAX rather than being a win or a loss?
I could normalize the score using the constant C by dividing by MAX, but then it would overweight exploration in the early stages of the game (since you get bad average scores) and overweight exploitation in the late stages.
Indeed, most of the literature assumes your games are either lost or won and awards a score of 0 or 1, which turns into a win ratio when averaged over the number of games played. The exploration parameter C is then usually set to sqrt(2), which is optimal for UCB in bandit problems.
To find out what a good C is in general, you have to step back a bit and see what UCT is really doing. If one node in your tree had an exceptionally bad score in the one rollout it had, then exploitation says you should never choose it again. But you've only played that node once, so it might have just been bad luck. To acknowledge this, you give that node a bonus. How much? Enough to make it a viable choice even if its average score is the lowest possible and some other node has the highest possible average score. Because with enough plays it might turn out that the one rollout your bad node had was indeed a fluke, and the node actually turns out to be pretty reliable with good scores. Of course, if you get more bad scores, then it was likely not bad luck, so the node won't deserve more rollouts.
So with scores ranging from 0 to 1, a C of sqrt(2) is a good value. If your game has a maximum achievable score, then you can normalize your scores by dividing by that maximum, forcing them into the 0-1 range to suit a C of sqrt(2). Alternatively, don't normalize the scores but multiply C by your maximum score. The effect is the same: the UCT exploration bonus is large enough to give your underdog nodes some rollouts and a chance to prove themselves.
There is an alternative way of setting C dynamically that has given me good results. As you play, you keep track of the highest and lowest scores you've ever seen in each node (and subtree). This is the range of scores possible, and it gives you a hint of how big C should be in order to give poorly explored underdog nodes a fair chance. Every time I descend into the tree and pick a new root, I adjust C to be sqrt(2) * the score range of the new root. In addition, as rollouts complete and their scores turn out to be a new highest or lowest score, I adjust C in the same way. By continually adjusting C this way, both as you play and as you pick a new root, you keep C as large as it needs to be to converge, but as small as it can be to converge fast. Note that the minimum score is as important as the maximum: if every rollout yields at least a certain score, then C won't need to overcome it. Only the difference between max and min matters.
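A rough sketch of that dynamic-C selection rule, under the assumption of hypothetical node fields (`visits`, `total_score`, `min_score_seen`, `max_score_seen`, `children`) rather than any particular library:

```python
import math

# Scale the exploration term by the score range observed so far in this subtree,
# instead of assuming rewards in [0, 1].
def uct_select(node, base_c=math.sqrt(2)):
    score_range = max(node.max_score_seen - node.min_score_seen, 1e-9)
    c = base_c * score_range                     # sqrt(2) * score range, as described above
    def uct(child):
        if child.visits == 0:
            return math.inf
        mean = child.total_score / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=uct)
```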
In the minimax algorithm, the first player plays optimally, which means it wants to maximise its score, and the second player tries to minimise the first player's chances of winning. Does this mean that the second player also plays optimally to win the game? Does trying to choose some path in order to minimise the first player's chances of winning also mean trying to win?
I am actually trying to solve this task from TopCoder: EllysCandyGame. I wonder whether we can apply the minimax algorithm here. That statement that "both play optimally" really confuses me, and I would like some advice on how to deal with this type of problem, if there is some general idea.
Yes, you can use the minimax algorithm here.
The problem statement says that the winner of the game is "the girl who has more candies at the end of the game." So one reasonable scoring function you could use is the difference in the number of candies held by the first and second player.
Does this mean that the second player also plays optimally to win the game?
Yes. When you are evaluating a MIN level, the MIN player will always choose the path with the lowest score for the MAX player.
Note: both the MIN and MAX levels can be implemented with the same code, if you evaluate every node from the perspective of the player making the move in that round, and convert scores between levels. If the score is a difference in number of candies, you could simply negate it between levels.
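A minimal sketch of that idea (negamax), assuming hypothetical game helpers `legal_moves`, `apply_move`, and `candy_difference` that you would supply for this particular game:

```python
# Negamax sketch: every position is scored from the perspective of the player to
# move, so one function covers both MIN and MAX levels by negating between them.
# `legal_moves`, `apply_move`, and `candy_difference` are hypothetical game helpers;
# `candy_difference` returns (mover's candies - opponent's candies) at a finished game.
def negamax(state):
    moves = legal_moves(state)
    if not moves:                                   # all boxes empty: game over
        return candy_difference(state)
    return max(-negamax(apply_move(state, move)) for move in moves)
```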
Trying to choose some path in order to minimize the first player's chances of winning also means trying to win?
Yes. The second player is trying to minimize the first player's score. A reasonable scoring function will give the first player a lower score for a loss than a tie.
I wonder whether we can apply the minimax algorithm here.
Yes. If I've read the problem correctly, the number of levels will be equal to the number of boxes. If there's no limit on the number of boxes, you'll need to use an n-move lookahead, evaluating nodes in the minimax tree to a maximum depth.
Properties of the game:
At each point, there are a limited, well defined number of moves (picking one of the non-empty boxes)
The game ends after a finite number of moves (when all boxes are empty)
As a result, the search tree consists of a finite number of leaves. You are right that by applying Minimax, you can find the best move.
Note that you only have to evaluate the game at the final positions (when there are no more moves left). At that point, there are only three results: The first player won, the second player won, or it is a draw.
Note that the standard Minimax algorithm has nothing to do with probabilities. The result of the Minimax algorithm determines the perfect play for both side (assuming that both sides make no mistakes).
By the way, if you need to improve the search algorithm, a safe and simple optimization is to apply Alpha Beta pruning.
I'm trying to implement Pacman. It works fine, but so far the ghosts aren't using any pathfinding; instead they just decide randomly at each path junction which path to take. So you can imagine that it isn't really difficult for Pacman to win the game ;)
So I read a little bit about path finding algorithms in Pacman and here on SO I found a really good answer: Pathfinding Algorithm For Pacman
The answers are referring to http://home.comcast.net/~jpittman2/pacman/pacmandossier.html#Chapter%204
This is all fine, but in my implementation of Pacman, there are two Pacmans which are played by two different players. So I wonder how to adapt the pathfinding algorithms, so that the ghosts are not always chasing one player.
Any thoughts on how to modify the algorithm so that the ghosts are more or less equally fair to both players?
I think the easiest strategy is to make each ghost chase the player closest to it. Proximity can be calculated using Manhattan distance (there was a link to it in the pathfinding question), Euclidean distance, or the length of a path to each player. The last option means that you will have to compute paths to both players. Try all these options and choose one to your taste.
Also, on a side note: none of the people answering the pathfinding question mentioned Dijkstra's algorithm, which is even slower than BFS :) but lets you find all shortest paths with a single search. That is, if you implement A* or BFS and have n ghosts, you will make at least n pathfinding queries; with Dijkstra you can do it only once, starting from the player. But it all depends: if your game field is very large, Dijkstra is not the best choice. Try, experiment, and maybe it'll suit you.
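Since the maze is an unweighted grid, a single BFS from a player's tile already gives shortest-path distances to every reachable tile and plays the role of Dijkstra here. A minimal sketch, with a hypothetical `walls` set of blocked cells:

```python
from collections import deque

# One BFS from the player's tile gives the shortest-path distance to every
# reachable tile; each ghost then just looks up dist[ghost_position].
def distances_from(player, walls, rows, cols):
    dist = {player: 0}
    queue = deque([player])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and (nr, nc) not in walls and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist
```

Run it once per player (two searches in total); each ghost then compares its distance in the two maps and chases the closer player.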
(Haven't looked but) I'm guessing that all the ghost algorithms base their behaviour on the relative positions of the ghost and 'the player' - well, simply have each ghost change its mind about which of the two players it uses as 'the player' in its algorithm, every so often.
Determining what exactly "every so often" means here is going to be a question for playtesting: should it be on a fixed schedule? Vary per ghost? Vary based on the relative proximity of the two players? Randomly, on a uniform / Poisson / other distribution?
There are as you can see many possibilities. Bear in mind that you want to avoid both behaviour which is 'too good' and behaviour which is 'too stupid'...
If you can query the distance and direction to any one Pacman from any one Ghost and also the number of Ghosts (and which Ghosts) are currently chasing any one Pacman, you should be able to make a pretty good and simple AI with some creativity.
I think you should keep the pathfinding algorithms described on the web page you mentioned. That will make the game feel more true to the original. The only problem then is determining how many ghosts chase a particular Pacman. I think this behavior should include scenarios where all of the ghosts are chasing one player, so an algorithm is needed to determine whether 1, 2, 3, or 4 ghosts are chasing a player.
The algorithm could be based on the point difference between the players, so the player in the lead would get chased by more ghosts. It should probably also factor in the number of lives left: if the player in the lead has fewer lives, the algorithm should delay increasing the number of ghosts chasing them. The number of ghosts chasing a player should also not change too often; if a ghost switches the player being chased too much, it will seem to not really be chasing either.
Just like the web page mentioned, getting good behavior is going to take some experimentation. I think keeping it simple at first is key, because sometimes complex-looking behavior can be achieved with a few simple rules. Good luck, and I would love to see what you come up with. Please post a link when you're done!
I don't know if this coincides with your notion of "fairness", but I imagine you would like to prevent the case where one player happens to be the closer target for all 4 ghosts, so they end up ganging up on him and following him around, never to chase the other player again. This would be a possible result of the rule of always having each ghost follow the closest player.
You might consider first allocating two ghosts to player 1 and the other two ghosts to player 2, and then having them chase their targets (reassigning this every so often). Although, if I were a ghost in the real world, I wouldn't care if all my friends and I were ganging up on one Pacman.
Instead of BFS or Dijkstra, I would use depth-first search to depth 3 or 4, using the straight-line (Cartesian) distance between your ghost and the Pacman at the leaves of this search tree and picking the value of the best leaf up to the root. For a small lookahead, it would be faster and easier to code than BFS or Dijkstra. Depth-limited search should give you pretty intelligent behavior for your ghosts, assuming your game board does not have spiraling corridors where the number of moves required to escape the spiral is greater than 3 or 4. It also means the running time of the algorithm doesn't grow with larger and larger boards, as it does for BFS and Dijkstra, again assuming you don't have spiraling corridors.
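A minimal sketch of that depth-limited lookahead, assuming hypothetical `legal_moves` and `apply_move` helpers for the ghost's movement on your board:

```python
import math

# Depth-limited lookahead for a ghost: try every sequence of `depth` moves,
# evaluate the leaves by straight-line distance to Pacman, and pick the first
# move of the best (closest) sequence.
def best_ghost_move(ghost_pos, pacman_pos, depth=3):
    def search(pos, d):
        if d == 0:
            return math.dist(pos, pacman_pos)          # smaller is better for the ghost
        return min(search(apply_move(pos, m), d - 1) for m in legal_moves(pos))
    return min(legal_moves(ghost_pos),
               key=lambda m: search(apply_move(ghost_pos, m), depth - 1))
```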
I've read in one of my AI books that the popular algorithms (A-Star, Dijkstra) for path-finding in simulations or games are also used to solve the well-known "15-puzzle".
Can anyone give me some pointers on how I would reduce the 15-puzzle to a graph of nodes and edges so that I could apply one of these algorithms?
If I were to treat each node in the graph as a game state then wouldn't that tree become quite large? Or is that just the way to do it?
A good heuristic for A-Star with the 15 puzzle is the number of squares that are in the wrong location. Because you need at least 1 move per square that is out of place, the number of squares out of place is guaranteed to be less than or equal to the number of moves required to solve the puzzle, making it an appropriate heuristic for A-Star.
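As a rough sketch of how that heuristic plugs into A* (the state representation and priority-queue layout below are my own assumptions, not taken from the answer above):

```python
import heapq

# A* sketch for the 15-puzzle. A state is a tuple of 16 numbers read row by row,
# with 0 as the blank; the heuristic counts tiles not in their goal position.
GOAL = tuple(range(1, 16)) + (0,)

def misplaced(state):
    return sum(1 for tile, goal in zip(state, GOAL) if tile != 0 and tile != goal)

def neighbors(state):
    blank = state.index(0)
    r, c = divmod(blank, 4)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 4 and 0 <= nc < 4:
            other = nr * 4 + nc
            board = list(state)
            board[blank], board[other] = board[other], board[blank]
            yield tuple(board)

def astar(start):
    frontier = [(misplaced(start), 0, start)]          # (f = g + h, g, state)
    best_cost = {start: 0}
    while frontier:
        _, cost, state = heapq.heappop(frontier)
        if state == GOAL:
            return cost                                 # length of an optimal solution
        for nxt in neighbors(state):
            if cost + 1 < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = cost + 1
                heapq.heappush(frontier, (cost + 1 + misplaced(nxt), cost + 1, nxt))
    return None
```

Manhattan distance (the sum of each tile's distance from its goal square) is also admissible and usually expands far fewer nodes than the misplaced-tiles count on hard instances.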
A quick Google search turns up a couple of papers that cover this in some detail: one on Parallel Combinatorial Search, and one on External-Memory Graph Search.
General rule of thumb when it comes to algorithmic problems: someone has likely done it before you, and published their findings.
This assignment on the 8-puzzle problem discusses using the A* algorithm in some detail, but is also fairly straightforward:
http://www.cs.princeton.edu/courses/archive/spring09/cos226/assignments/8puzzle.html
The graph-theoretic way to solve the problem is to imagine every configuration of the board as a vertex of the graph, and then use a breadth-first search, with pruning based on something like the Manhattan distance of the board, to derive a shortest path from the starting configuration to the solution.
One problem with this approach is that for any n x n board with n > 3, the game space becomes so large that it is not clear how you can efficiently mark the visited vertices. In other words, there is no obvious way to assess whether the current configuration of the board is identical to one that has previously been discovered through traversing some other path. Another problem is that the graph size grows so quickly with n (it's approximately (n^2)!) that it is just not suitable for a brute-force attack, as the number of paths becomes computationally infeasible to traverse.
This paper by Ian Parberry, A Real-Time Algorithm for the (n^2 − 1)-Puzzle, describes a simple greedy algorithm that iteratively arrives at a solution by completing the first row, then the first column, then the second row... It arrives at a solution almost immediately, but the solution is far from optimal; essentially it solves the problem the way a human would, without leveraging any computational muscle.
This problem is closely related to that of solving the Rubik's cube. The graph of all game states is too large to solve by brute force, but there is a fairly simple 7-step method that a dexterous human can use to solve any cube in about 1-2 minutes. This path is of course non-optimal. By learning to recognise patterns that define sequences of moves, the time can be brought down to 17 seconds. However, this feat by Jiri is somewhat superhuman!
The method Parberry describes moves only one tile at a time; one imagines that the algorithm could be sped up by employing Jiri's dexterity and moving multiple tiles at a time. This would not, as Parberry proves, reduce the path length below order n^3, but it would reduce the coefficient of the leading term.
Remember that A* will search through the problem space proceeding down the most likely path to the goal, as defined by your heuristic.
Only in the worst case will it end up having to flood-fill the entire problem space; this tends to happen when there is no actual solution to your problem.
Just use the game tree. Remember that a tree is a special form of graph.
In your case, the children of each node will be the game positions that result from making one of the moves available at the current node.
Here you go: http://www.heyes-jones.com/astar.html
Also, be mindful that with the A-Star algorithm, at least, you will need to figure out an admissible heuristic to determine whether one possible next step is closer to the goal than another.
From my experience with solving an 8-puzzle: you need to create nodes and keep track of each step taken, compute the Manhattan distance for each possible following step, take the one with the shortest distance, update the nodes, and continue until you reach the goal.