What to Do When Monte Carlo Tree Search Hits a Memory Limit - artificial-intelligence

I have recently taken an interest in Monte Carlo tree search applied to games.
I have read several papers, but I mainly use "Monte-Carlo Tree Search", a PhD thesis by Chaslot, G., as I find it easier to understand the basics of Monte Carlo tree search from it.
I have tried to code it, and I am stuck on a certain problem. The algorithm expands one node of the game tree for every simulation. This quickly escalates into a memory problem. I have skimmed the paper, but it doesn't seem to explain what the technique should do once it hits a certain memory limit.
Can you suggest what the technique should do when it hits a memory limit?
You can see the paper here:
http://www.unimaas.nl/games/files/phd/Chaslot_thesis.pdf

One very effective approach is to grow the tree more slowly. That is, instead of expanding the tree every time you reach a leaf node, you expand it once it has at least k visits. This will significantly slow the growth of the tree, and often does not reduce performance. I was told by one of the authors of the Fuego Go program that he tried the approach, and it worked well in practice.
This idea was originally described in this paper:
Remi Coulom. Efficient selectivity and backup operators in monte-carlo tree search. In Computers and games, pages 72–83. Springer, 2007.
It was also used in:
Max Roschke and Nathan Sturtevant. UCT Enhancements in Chinese Checkers Using an Endgame Database, IJCAI Workshop on Computer Games, 2013.
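For illustration, here is a minimal sketch of that delayed-expansion rule in Python. The node structure, the game API (`legal_moves`, `apply`), and the threshold value are assumptions for the sketch, not code from the papers above.

```python
# Sketch of delayed expansion: a leaf is only expanded once it has accumulated
# at least EXPANSION_THRESHOLD visits; before that, every simulation reaching
# it just runs a play-out from the same leaf.
EXPANSION_THRESHOLD = 8  # the "k" visits mentioned above (tunable)

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def maybe_expand(leaf):
    """Expand the leaf only after it has been visited often enough."""
    if leaf.visits >= EXPANSION_THRESHOLD and not leaf.children:
        for move in leaf.state.legal_moves():                    # assumed game API
            leaf.children.append(Node(leaf.state.apply(move), parent=leaf))
    # If no children were created yet, the play-out simply starts at the leaf.
    return leaf.children[0] if leaf.children else leaf
```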

The paper Memory-Bounded Monte Carlo Tree Search evaluates a variety of solutions to this problem (rough sketches of two of them follow the list):
Stopping: stop the algorithm when you hit your memory limit.
Stunting: stop growing the tree when you hit your memory limit (but keep updating it).
Ensemble: keep your result and restart the search from an empty tree when you hit your memory limit (fusing the results at the end).
Flattening: when you hit your memory limit, discard all nodes except the root and its direct children and restart the search from this new basis.
Garbage collection: when you hit your memory limit, remove all nodes that have not been visited a given number of times.
Recycling: when you add a node, delete the node that has not been visited for the longest time.
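As a rough illustration of the last two ideas, here is how garbage collection and recycling could look in Python. This is a sketch based on the descriptions above, not the paper's implementation; the node fields (`visits`, `children`, `parent`, `last_visited`) are assumptions.

```python
def garbage_collect(root, min_visits):
    """Garbage collection (sketch): when the memory limit is hit, prune every
    subtree whose root has been visited fewer than `min_visits` times."""
    root.children = [c for c in root.children if c.visits >= min_visits]
    for child in root.children:
        garbage_collect(child, min_visits)

def recycle_one(leaves):
    """Recycling (sketch): before adding a new node, remove the leaf that has
    not been visited for the longest time.  Assumes each node records a
    `last_visited` counter that is updated during backpropagation."""
    oldest = min(leaves, key=lambda n: n.last_visited)
    oldest.parent.children.remove(oldest)
    leaves.remove(oldest)
```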

You can throw away all nodes whose visit count is below some threshold and that have not been visited recently (i.e., within the last however many playouts).
That's a quick but not very efficient solution.
It's better to implement progressive widening as well (a sketch follows below).
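One common formulation of progressive widening is to allow a node to gain a new child only while its number of children is below C * visits^alpha. A minimal sketch, where the constants C and ALPHA are assumptions to be tuned per game:

```python
# Progressive widening (sketch): a node may only be given a new child while
# len(children) < C * visits**ALPHA, so the branching grows with the visit count.
C = 1.0      # widening constant (assumption, tune per game)
ALPHA = 0.5  # widening exponent (assumption, typically between 0 and 1)

def may_add_child(node):
    return len(node.children) < C * (node.visits ** ALPHA)
```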

Related

Monte Carlo tree search - handling game ending nodes

I have implemented MCTS for a 4-player game, which is working well, but I'm not sure I understand expansion when the game-ending move is in the actual tree rather than in the rollout.
At the start of the game, winning/losing positions are only found in the rollout, and I understand how to score these and propagate them back up the tree. But as the game progresses, I eventually find a leaf node, chosen by UCB1, that cannot be expanded, as it is a losing position with no possible move allowed; there is nothing to expand, nor is there a game to 'rollout'. At the moment I just score this as a 'win' for the last remaining player and backpropagate a win for them.
However, when I look at the visit stats, this node gets revisited thousands of times, so obviously UCB1 'chooses' to visit this node many times. But this seems like a bit of a waste; should I be backpropagating something other than a single win for these 'always win' nodes?
I've had a good Google search for this and can't really find much mention of it, so am I misunderstanding something or missing something obvious? None of the 'standard' MCTS tutorials/algorithms even mention game-ending nodes in the tree as special cases, so I'm worried I've misunderstood something fundamental.
At the moment I just score this as a 'win' for the last remaining player and backpropagate a win for them.
However, when I look at the visit stats, this node gets revisited thousands of times, so obviously UCB1 'chooses' to visit this node many times. But this seems like a bit of a waste; should I be backpropagating something other than a single win for these 'always win' nodes?
No, what you're currently doing is already correct.
MCTS essentially evaluates the value of a node as the average of the outcomes of all paths you have run through that node. In reality, we are generally interested in minimax-style evaluations.
For MCTS' average-based evaluations to become equal to minimax-evaluations in the limit (after an infinite amount of time), we rely on the Selection phase (e.g. UCB1) to send so many simulations (= Selection + Play-out phases) down the path(s) that would be optimal according to minimax evaluations that the average evaluations also tend, in the limit, to the minimax evaluations.
Suppose, for example, that there is a winning node directly below the root node. This is an extreme example of your situation, where the terminal node is already reached in the Selection phase, and no Play-out is required afterwards. The minimax evaluation of the root node would be a win, since we can get to a win directly in one step. This means we want the average-based scoring of MCTS to also become very close to a winning evaluation for the root node, which in turn means we want the Selection phase to send the vast majority of simulations immediately down into this node. If, for example, 99% of all simulations immediately go to this winning node from the root node, the average evaluation of the root node will also become very close to a win, and that's exactly what we need.
This answer only concerns the implementation of basic UCT (MCTS with UCB1 for Selection). For more sophisticated modifications of that basic MCTS implementation related to the question, see manlio's answer.
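To make the point concrete, here is a minimal Python sketch of one basic UCT iteration handling a terminal node reached during Selection: the play-out is simply skipped and the terminal result is backpropagated as-is, exactly as described in the question. The node/state API and the `expand`, `random_playout`, and `backpropagate` helpers are assumptions and not shown.

```python
import math

def ucb1(child, parent_visits, c=1.41):
    if child.visits == 0:
        return float('inf')            # always try unvisited children first
    return (child.total_reward / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def run_iteration(root):
    # Selection: descend through the tree using UCB1.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
    if node.state.is_terminal():
        # Terminal node reached during Selection: nothing to expand and no
        # game left to roll out, so backpropagate the terminal result directly
        # (e.g. a win for the last remaining player, as in the question).
        result = node.state.terminal_result()    # assumed game API
    else:
        node = expand(node)                       # Expansion (helper not shown)
        result = random_playout(node.state)       # Play-out (helper not shown)
    backpropagate(node, result)                   # Backpropagation (helper not shown)
```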
none of the 'standard' MCTS tutorials/algorithms even mention game ending nodes in the tree as special cases
There are MCTS variants able to prove the game theoretical value of a position.
MCTS-Solver is (quite) well known: the backpropagation and selection steps are modified for this variant, as well as the procedure for choosing the final move to play.
Terminal win and loss positions occurring in the tree are handled differently and a special provision is taken when backing such proven values up the tree.
You can take a look at:
Monte-Carlo Tree Search Solver by Mark H. M. Winands, Yngvi Björnsson, and Jahn Takeshi Saito (part of the Lecture Notes in Computer Science book series, volume 5131)
for details.
so I'm worried I've misunderstood something fundamental.
Although in the long run MCTS equipped with the UCT formula is able to converge to the game-theoretical value, basic MCTS is unable to prove the game-theoretical value.
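Very roughly, the "solver" part of MCTS-Solver can be sketched as below for a two-player, zero-sum game (the question is about 4 players; see the Winands et al. paper for the actual rules, including the changes to selection and final-move choice). The node fields and the negamax reward convention are assumptions, and terminal nodes are assumed to get `solved` set when they are created.

```python
def try_to_prove(node):
    """`solved` is the proven game-theoretical value from the perspective of
    the player to move at that node: +1 (proven win), -1 (proven loss), None."""
    values = [child.solved for child in node.children]
    if any(v == -1 for v in values):
        node.solved = +1   # some move leads to a position the opponent loses
    elif node.children and all(v == +1 for v in values):
        node.solved = -1   # every move leads to a position the opponent wins

def backpropagate(leaf, result):
    # Ordinary averaged statistics are updated as usual; on the way up we
    # additionally try to prove each node from its children's proven values.
    node = leaf
    while node is not None:
        node.visits += 1
        node.total_reward += result
        try_to_prove(node)
        result = -result   # negamax convention: flip perspective at each level
        node = node.parent
```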

Best-first vs. breadth-first search

What is the difference between best-first search and breadth-first search? And which one do we call "BFS"?
To answer your second question first:
which one do we call "BFS"?
Typically, when we refer to BFS, we are talking about breadth-first search.
What is the difference between best-first search and breadth-first search?
The analogy that I like to consult when comparing such algorithms is robots digging for gold.
Given a hill, our goal is to simply find gold.
Breadth-first search has no prior knowledge of the whereabouts of the gold, so the robot simply digs 1 foot deep along the 10-foot strip; if it doesn't find any gold, it digs 1 foot deeper.
Best-first search, however, has a built-in metal detector, meaning it has prior knowledge. There is, of course, the cost of having a metal detector, and the cost of turning it on and checking which place would be the best to start digging.
Best-first search is informed whereas Breadth-first search is uninformed, as in one has a metal detector and the other doesn't!
Breadth-first search is complete, meaning it'll find a solution if one exists, and given enough resources will find the optimal solution.
Best-first search is also complete, provided the heuristic (the estimator of the cost, i.e., the prior knowledge) is admissible, meaning it never overestimates the cost of getting to the solution.
I got the BFS image from http://slideplayer.com/slide/9063462/; the best-first search image is my failed attempt at Photoshop!
Those are two algorithms for searching a graph (or tree).
Breadth-first search looks at all elements (nodes) at a certain depth, trying to find a solution (the searched-for value or whatever), then continues one level deeper and looks at every node there, and so on.
Best-first search looks at the "best" node, defined mostly by a heuristic, then checks the best child of that node, and so on.
A* would be an example of a heuristic (best-first) search, and it is much faster. But you need a heuristic, which you wouldn't need for breadth-first search.
Creating a heuristic requires some effort of your own; breadth-first search works out of the box.
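To make the contrast concrete, here is a small Python sketch: the only structural difference is that breadth-first search pops from a FIFO queue, while best-first search pops the node with the lowest heuristic estimate from a priority queue. The `neighbors` and `heuristic` functions are assumed to be supplied by you.

```python
from collections import deque
import heapq
import itertools

def breadth_first_search(start, is_goal, neighbors):
    frontier, visited = deque([start]), {start}
    while frontier:
        node = frontier.popleft()             # FIFO: expand the shallowest node first
        if is_goal(node):
            return node
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return None

def best_first_search(start, is_goal, neighbors, heuristic):
    tie = itertools.count()                   # tie-breaker so states needn't be comparable
    frontier, visited = [(heuristic(start), next(tie), start)], {start}
    while frontier:
        _, _, node = heapq.heappop(frontier)  # expand the most promising node first
        if is_goal(node):
            return node
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt), next(tie), nxt))
    return None
```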

Number of simulations per node in Monte Carlo tree search

The MCTS algorithm described on Wikipedia performs exactly one playout (simulation) per node selection. Now, I am experimenting with this algorithm in a simple connect-k game. I wonder: in practice, do we perform more playouts to reduce the variance?
I tried the original algorithm with exactly one random (unbiased) playout. The result is bad compared to my heuristic search with alpha-beta pruning, and it converges very slowly. When I perform 500 playouts instead, the noise is much lower. However, each node simulation is then too slow for the algorithm to explore other parts of the tree in the given time, so it sometimes misses the most critical move.
I then added the AMAF heuristic (in particular the RAVE variant) to the basic MCTS. I don't notice much difference with 500 playouts, perhaps because the variance is already low. I haven't analyzed the result with 1 playout yet.
Could anyone give me any insights?
Typically, you'd do exactly one play-out per selection step. However, subsequent selection steps can go through the same node multiple times.
Consider, for example, a case where there are only two moves available in the root node. If you then run, let's say, 10,000 complete iterations of MCTS (where one iteration = Selection + Expansion + Play-out + Backpropagation), each of the two nodes below the root node will get selected roughly 5,000 times (or maybe one gets selected 9,000 times and the other 1,000 times if the first is clearly a better option than the second, but still, both get selected more than once).
Does this match what you are currently doing in your implementation? If not, try providing some of the code you currently have so that we can see where it goes wrong. But if this is how you implemented it (which is how it should be), then there should be no problem with doing only one play-out per selection step.
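For reference, here is how the outer loop typically looks, with exactly one play-out per iteration (a Python sketch; the `select`, `expand`, `playout`, and `backpropagate` helpers and the node fields are assumptions, not shown here):

```python
def mcts(root, num_iterations=10000):
    # Each iteration = Selection + Expansion + one Play-out + Backpropagation.
    for _ in range(num_iterations):
        leaf = select(root)              # walk down the tree via UCB1
        child = expand(leaf)             # add (at most) one new node
        result = playout(child.state)    # exactly ONE random play-out
        backpropagate(child, result)     # update stats along the selected path
    # After many iterations, each child of the root has been selected many
    # times, so its value estimate is an average over many play-outs even
    # though every individual iteration ran only one.
    return max(root.children, key=lambda c: c.visits)
```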

How can I dynamically select and expand a certain search node depending on its score and the score of its children?

I'm familiar with most path-finding and graph-search algorithms, but I'm not sure how I can solve this dynamically, and I'm sure I've overlooked something.
Currently, my approach is very static and hard-coded. This is about a single-player Tetris-like game for which I'm creating the AI.
Only the current and next piece are known, nothing further. Since the branching factor is pretty wide, I only look at the best 3 states out of all possible states for the current piece, and then again the best 3 states that are generated with the next piece.
To get a deeper look into the future at depth 3, I generate all states for all possible pieces, take the best, and calculate their average. This could be continued for further depths and only depends on CPU power.
Since I only take the best 3, then the next best 3, and then only the best to calculate the average, this doesn't seem balanced. I would need something that dynamically selects and expands a certain search node depending on its score and the score of its children.
To have a more informed search strategy, you can look at:
the Expectimax algorithm, which is a version of minimax search for stochastic problems (a rough sketch follows below), or
Monte-Carlo Tree Search (or UCT in particular).
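A rough, depth-limited expectimax sketch for a single-player Tetris-like game might look as follows. The game API (`placements`, `apply`, `is_game_over`) and the `evaluate` function are assumptions for illustration, not part of any particular library.

```python
import statistics

def expectimax(state, depth, known_pieces, all_pieces):
    """Depth-limited expectimax sketch.  `known_pieces` is the queue of pieces
    we can actually see (current + next); once it is exhausted, we average
    over all possible pieces, which acts as the chance node."""
    if depth == 0 or state.is_game_over():
        return evaluate(state)                      # assumed board evaluation
    if known_pieces:                                # max node: we choose the placement
        piece, rest = known_pieces[0], known_pieces[1:]
        return max(expectimax(state.apply(p), depth - 1, rest, all_pieces)
                   for p in state.placements(piece))
    # chance node: the next piece is unknown, so take the expected value
    return statistics.mean(
        expectimax(state, depth, (piece,), all_pieces) for piece in all_pieces)
```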

How do you solve the 15-puzzle with A-Star or Dijkstra's Algorithm?

I've read in one of my AI books that popular path-finding algorithms (A-Star, Dijkstra) used in simulations or games are also used to solve the well-known "15-puzzle".
Can anyone give me some pointers on how I would reduce the 15-puzzle to a graph of nodes and edges so that I could apply one of these algorithms?
If I were to treat each node in the graph as a game state then wouldn't that tree become quite large? Or is that just the way to do it?
A good heuristic for A-Star with the 15-puzzle is the number of squares that are in the wrong location. Because you need at least 1 move per square that is out of place, the number of squares out of place is guaranteed to be less than or equal to the number of moves required to solve the puzzle, making it an admissible heuristic for A-Star.
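Concretely, that heuristic (often called the Hamming or misplaced-tiles heuristic) can be computed like this, assuming the board is stored as a flat tuple with 0 for the blank:

```python
GOAL = tuple(range(1, 16)) + (0,)   # tiles 1..15 in order, then the blank

def misplaced_tiles(board):
    """Number of tiles (ignoring the blank) that are not in their goal spot."""
    return sum(1 for tile, goal in zip(board, GOAL) if tile != 0 and tile != goal)
```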
A quick Google search turns up a couple papers that cover this in some detail: one on Parallel Combinatorial Search, and one on External-Memory Graph Search
General rule of thumb when it comes to algorithmic problems: someone has likely done it before you, and published their findings.
This assignment on the 8-puzzle problem discusses using the A* algorithm in some detail, but it is also fairly straightforward:
http://www.cs.princeton.edu/courses/archive/spring09/cos226/assignments/8puzzle.html
The graph-theoretic way to solve the problem is to imagine every configuration of the board as a vertex of the graph and then use a breadth-first search with pruning, based on something like the Manhattan distance of the board, to derive a shortest path from the starting configuration to the solution.
One problem with this approach is that for any n x n board where n > 3, the game space becomes so large that it is not clear how you can efficiently mark the visited vertices. In other words, there is no obvious way to assess whether the current configuration of the board is identical to one that has previously been discovered by traversing some other path. Another problem is that the graph size grows so quickly with n (it's approximately (n^2)!) that it is just not suitable for a brute-force attack, as the number of paths becomes computationally infeasible to traverse.
This paper by Ian Parberry, A Real-Time Algorithm for the (n^2 − 1)-Puzzle, describes a simple greedy algorithm that iteratively arrives at a solution by completing the first row, then the first column, then the second row... It arrives at a solution almost immediately; however, the solution is far from optimal. Essentially, it solves the problem the way a human would, without leveraging any computational muscle.
This problem is closely related to that of solving the Rubik's cube. The graph of all game states is too large to solve by brute force, but there is a fairly simple 7-step method that can be used to solve any cube in about 1 to 2 minutes by a dexterous human. This path is of course non-optimal. By learning to recognise patterns that define sequences of moves, the speed can be brought down to 17 seconds. However, this feat by Jiri is somewhat superhuman!
The method Parberry describes moves only one tile at a time; one imagines that the algorithm could be sped up by employing Jiri's dexterity and moving multiple tiles at a time. This would not, as Parberry proves, reduce the path length from n^3, but it would reduce the coefficient of the leading term.
Remember that A* will search through the problem space, proceeding down the most likely path to the goal as defined by your heuristic.
Only in the worst case will it end up having to flood-fill the entire problem space; this tends to happen when there is no actual solution to your problem.
Just use the game tree. Remember that a tree is a special form of graph.
In your case, the children of each node will be the game positions reached after you make one of the moves available at the current node.
Here you go: http://www.heyes-jones.com/astar.html
Also, be mindful that with the A-Star algorithm, at least, you will need to figure out an admissible heuristic to determine whether a possible next step is closer to the goal than another step.
From my experience with solving an 8-puzzle: you need to create nodes and keep track of each step taken, compute the Manhattan distance for each possible following step and move to the one with the shortest distance, then update the nodes and continue until you reach the goal.
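Here is a minimal sketch of that approach for the 8-puzzle: A* with the Manhattan-distance heuristic, with the board stored as a flat tuple (0 is the blank). It is an illustration of the idea, not an optimized solver.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)
N = 3  # board is N x N

def manhattan(board):
    """Sum over all tiles of |row - goal_row| + |col - goal_col|."""
    dist = 0
    for i, tile in enumerate(board):
        if tile == 0:
            continue
        goal = tile - 1
        dist += abs(i // N - goal // N) + abs(i % N - goal % N)
    return dist

def neighbors(board):
    """Boards reachable by sliding one tile into the blank."""
    blank = board.index(0)
    r, c = divmod(blank, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            j = nr * N + nc
            nxt = list(board)
            nxt[blank], nxt[j] = nxt[j], nxt[blank]
            yield tuple(nxt)

def astar(start):
    frontier = [(manhattan(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, board, path = heapq.heappop(frontier)
        if board == GOAL:
            return path                      # list of boards from start to goal
        for nxt in neighbors(board):
            ng = g + 1
            if ng < best_g.get(nxt, float('inf')):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + manhattan(nxt), ng, nxt, path + [nxt]))
    return None                              # unsolvable configuration
```

For example, `astar((1, 2, 3, 4, 5, 6, 7, 0, 8))` returns the two-board path ending in the goal configuration.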
