Unexpected path dependence in alpha-beta search? - artificial-intelligence

I'm writing an artificial intelligence for the old Norse tafl family of board games (project here, source file at issue here). They're close enough to chess in broad strokes for knowledge of chess AI to apply here. (The variant in question is played on a 7x7 board with a radially symmetric starting position, white starting in the middle and black starting at the edge.) I'm running into a curious problem with how I've implemented alpha-beta search: the result of a search to a fixed depth, with no optimizations enabled besides alpha-beta pruning, changes depending on the order in which nodes are explored.
In the file at issue, the important methods are explore, exploreChildren, handleEvaluationResults, and generateSuccessorMoves. explore checks for a transposition table hit (disabled elsewhere for this test), evaluates the state if it's a victory or a leaf node, and otherwise calls exploreChildren. exploreChildren does the recursive search on child nodes. generateSuccessorMoves generates (and optionally sorts) the moves exiting the current state. handleEvaluationResults determines whether a child evaluation has caused a cutoff.
So, I wrote a minimal test case: generateSuccessorMoves first does no sorting whatsoever, then, on a second run, simply shuffles the list of moves instead of sorting it. The two searches do not agree on the best move, nor on a symmetry-equivalent move, nor even on its value:
MAIN SEARCH
# cutoffs/avg. to 1st a/b a/b
Depth 0: 0/0 0/0
Depth 1: 0/22 0/1
Depth 2: 42/0 3/0
Finding best state...
Best move: d3-g3 with path...
d3-g3
e1-f1
e4-e1xf1
End of best path scored -477
Observed/effective branching factor: 23.00/9.63
Thought for: 72msec. Tree sizes: main search 893 nodes, continuation search: 0 nodes, horizon search: 0 nodes
Overall speed: 12402.77777777778 nodes/sec
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0
1. move: d3-g3 value: -477
Using 5000msec, desired 9223372036854775807
Depth 3 explored 1093 states in 0.037 sec at 29540.54/sec
MAIN SEARCH
# cutoffs/avg. to 1st a/b a/b
Depth 0: 0/0 0/0
Depth 1: 0/21 0/2
Depth 2: 104/0 2/0
Finding best state...
Best move: d3-f3 with path...
d3-f3
d2-c2
d5-f5xf4
End of best path scored -521
Observed/effective branching factor: 23.00/10.30
Thought for: 37msec. Tree sizes: main search 1093 nodes, continuation search: 0 nodes, horizon search: 0 nodes
Overall speed: 29540.540540540544 nodes/sec
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0
7. move: d3-f3 value: -521
This is an extreme case, obviously, but it's my understanding that alpha-beta in this situation (that is, with no feature besides alpha-beta pruning itself) should be stable no matter what the order of the search is: at the very least, it should return a node of the same value. Am I wrong? Am I doing something wrong?
First edit: although I suppose it's obvious from the description of this problem, it turns out that there is some as-yet unknown bug in my alpha-beta implementation. Further testing shows that it does not provide the same result as pure minimax.
Second edit: this is the pseudocode version of the alpha-beta search implemented in the file linked above.
explore(depth, maxDepth, alpha, beta)
    // some tafl variants feature rules where one player moves more than once in a turn,
    // so each game tree node knows whether it's maximizing or minimizing
    var isMaximizing = this.isMaximizing()
    var value = NO_VALUE

    if(isTerminal(depth, maxDepth))
        value = eval()
    else
        for(move in successorMoves)
            if(cutoff) break
            nodeValue = nextNode(move).explore(depth + 1, maxDepth, alpha, beta)
            if(value == NO_VALUE) value = nodeValue

            if(isMaximizing)
                value = max(value, nodeValue)
                alpha = max(alpha, value)
                if(beta <= alpha) break
            else
                value = min(value, nodeValue)
                beta = min(beta, value)
                if(beta <= alpha) break

    return value

rootNode.explore(0, 5, -infinity, infinity)

It turns out it was my fault. I have a bit of code which recursively revalues the nodes above a certain node, for use in extension searches, and I was calling it in the wrong place (after exploring all the children of any node). That early back-propagation was causing incorrect alpha and beta values, and therefore early cutoffs.
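For anyone debugging something similar: the quickest way to pin this down is the differential test mentioned in the first edit. Below is a minimal sketch in Java; the GameNode interface and its method names are hypothetical stand-ins, not OpenTafl's actual API. Alpha-beta with no other features must return exactly the same root value as plain minimax under every move ordering, so any disagreement localizes the bug to the pruning logic.

    interface GameNode {
        boolean isTerminal();
        double evaluate();
        boolean isMaximizing();
        java.util.List<GameNode> successors();
    }

    class SearchCheck {
        // Reference implementation: plain minimax, no pruning.
        static double minimax(GameNode n, int depth) {
            if (depth == 0 || n.isTerminal()) return n.evaluate();
            double best = n.isMaximizing() ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
            for (GameNode child : n.successors()) {
                double v = minimax(child, depth - 1);
                best = n.isMaximizing() ? Math.max(best, v) : Math.min(best, v);
            }
            return best;
        }

        // Alpha-beta with no other features: must match minimax at the root.
        static double alphaBeta(GameNode n, int depth, double alpha, double beta) {
            if (depth == 0 || n.isTerminal()) return n.evaluate();
            double best = n.isMaximizing() ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
            for (GameNode child : n.successors()) {
                double v = alphaBeta(child, depth - 1, alpha, beta);
                if (n.isMaximizing()) {
                    best = Math.max(best, v);
                    alpha = Math.max(alpha, best);
                } else {
                    best = Math.min(best, v);
                    beta = Math.min(beta, best);
                }
                if (beta <= alpha) break;  // the only cutoff alpha-beta is allowed to make
            }
            return best;
        }

        static void assertEquivalent(GameNode root, int depth) {
            double mm = minimax(root, depth);
            double ab = alphaBeta(root, depth,
                    Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
            if (mm != ab)
                throw new AssertionError("alpha-beta disagrees with minimax: " + ab + " vs " + mm);
        }
    }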

Related

Can a cache be used for an alpha-beta search algorithm?

I'm working on a minimax tic-tac-toe algorithm. I got it working fine, caching each state in the tree.
Then I implemented alpha-beta pruning, which seemed to affect the game. I think the problem is that nodes cannot be "trusted" if any of their descendants (children, grandchildren, etc.) were pruned. Is this true?
For now, I'm only caching states if they don't have pruned descendants. An example tree (not tic-tac-toe) illustrates the point: the max player is the upwards triangle, which should choose the move on the left. However, if the move on the right is cached during alpha-beta pruning, the red triangle will have a false value of 4, so the move on the right would be wrongly chosen.
If by a "cache" you mean a transposition table, then you can't always trust the value in the transposition table. That is, when you store a value in a transposition table, you need to also store the alpha and beta values (perhaps the depth as well) used for the search below that state. If the alpha and beta values are not the same*, then you can't use the value from the transposition table.
*In practice they don't have to be identical; the stored alpha-beta window just needs to be a superset of the window in use at the current node where you want to apply the cached value.
Edit: Additional info for those dealing with this in larger games. When you search at a node you have a lower bound (alpha) and upper bound (beta) on the final value. If the returned value is between alpha and beta, then you know it is the true value of the state. If it is equal to alpha or beta, then you know it is only a bound on the final value. But, you can still use this information to help the search.
In particular, suppose that you have alpha=10 and beta=20 in the current search and the value in the transposition table is [alpha = 12, beta = 30, value = 12]. Then, when you (re-)search below the branch, you can search with bounds of alpha=10 and beta=12.
This is because you've already proven that the value is <= 12 in the previous search. When you get the final result, you can then update the transposition table entry to reflect the additional information from this search.
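In code, a common way to record this extra information (a sketch; the entry layout and flag names are illustrative, not any particular engine's API) is to store the value together with a flag saying whether it is exact or only a bound:

    enum Bound { EXACT, LOWER, UPPER }

    class TTEntry {
        int depth;     // depth of the search that produced this value
        double value;
        Bound bound;   // EXACT: alpha < value < beta when stored
                       // LOWER: value caused a beta cutoff, so true value >= value
                       // UPPER: value never raised alpha, so true value <= value
    }

    // Returns the cached value if it can be trusted at the current window,
    // otherwise null (the caller may still use the entry to narrow alpha/beta).
    Double probe(TTEntry e, int depth, double alpha, double beta) {
        if (e == null || e.depth < depth) return null;                  // too shallow to reuse
        if (e.bound == Bound.EXACT) return e.value;                     // true value: always safe
        if (e.bound == Bound.LOWER && e.value >= beta) return e.value;  // still causes a cutoff
        if (e.bound == Bound.UPPER && e.value <= alpha) return e.value; // still fails low
        return null;
    }

In the example above, the entry [alpha = 12, beta = 30, value = 12] would be saved as an UPPER bound of 12, which is why the re-search can proceed with beta = 12.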

usage of heuristic function in alpha beta pruning

When running alpha-beta pruning, how does a heuristic function help to prune as many nodes as possible? If, in the worst case, no nodes get pruned, how can they be pruned using a heuristic function?
In alpha-beta pruning on a perfect-information, extensive-form two-player game (that is, a game tree), you always have lower and upper bounds during the search: alpha and beta. These bounds essentially say that only values between them are interesting; anything else will be pruned by an ancestor in the tree.
If you have a heuristic which is a guaranteed lower/upper bound on the value of all children of a given state, you can test it against the bounds for the current state. If the heuristic says the value of all future states is guaranteed to be greater than or equal to the upper bound (or less than or equal to the lower bound), then you can immediately prune and stop your search.
For example, suppose the max player is at a state with alpha = -10 and beta = 3, and your heuristic says that h(s) >= 5. Then you can immediately update alpha to 5. Since alpha >= beta, you can prune and return immediately. With the same alpha/beta values at a min node, you'd need your heuristic to say that h(s) <= -10 to be able to prune.
Alpha-beta pruning can reduce the tree size from b^d to b^(d/2). With a perfect heuristic you could essentially reduce the tree size to d; that is, your heuristic could immediately cut off all remaining children at every state. In practice you won't be able to do that well, but a heuristic can significantly reduce the size of the search on top of alpha-beta pruning.
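As a sketch of what that test looks like in code (reusing the hypothetical GameNode interface from the first sketch on this page; the two bound functions are assumptions about your game and must be true guarantees, not estimates):

    class HeuristicPruning {
        // Guaranteed bounds on the value of every state reachable from s.
        static double guaranteedLowerBound(GameNode s) { /* domain-specific */ return Double.NEGATIVE_INFINITY; }
        static double guaranteedUpperBound(GameNode s) { /* domain-specific */ return Double.POSITIVE_INFINITY; }

        static double search(GameNode s, int depth, double alpha, double beta) {
            if (depth == 0 || s.isTerminal()) return s.evaluate();

            if (s.isMaximizing()) {
                // Every successor is worth at least the guaranteed lower bound, so
                // alpha can rise, possibly past beta, before any child is searched.
                alpha = Math.max(alpha, guaranteedLowerBound(s));
                if (alpha >= beta) return alpha;   // prune the whole subtree immediately
            } else {
                beta = Math.min(beta, guaranteedUpperBound(s));
                if (alpha >= beta) return beta;
            }

            // Otherwise fall back to the ordinary alpha-beta loop (as in the
            // SearchCheck.alphaBeta sketch above), with the possibly tightened window.
            return SearchCheck.alphaBeta(s, depth, alpha, beta);
        }
    }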

How to improve performance using Transposition Table in Game Playing?

I have implemented iterative deepening with alpha-beta pruning in my game and I also added a Transposition Table to store already evaluated boards.
Right now, I am doing the following:
When running iterative deepening at depth = 0, it evaluates and stores all positions with their scores in the TT.
Now, when it re-runs with depth = 1, I simply return the value of the board if it exists in the TT. This stops the algorithm at depth = 0, as the values for all depth = 0 boards are already in the TT.
If I instead only return values from the TT when the depth limit is reached, e.g. at depth = MAX_DEPTH, then big sub-trees will never be cut.
So I don't understand: how should I re-use the values stored in the TT to make my game faster?
I will use chess for the examples in this answer; of course, this reasoning can be applied to other board games as well, with slight modifications.
Transposition tables in board game programs are caches which store already-evaluated positions. It is great to have an easy-to-handle cache key which uniquely identifies a position, like:
WKe5Qd6Pg2h3h4 BKa8Qa7
So when you get to a position, you check whether the cache key is present, and if so, reuse its evaluation. Whenever you visit a position at depth = 0, after it's properly evaluated, it can be cached. Then, as moves are made, you can more or less skip evaluation in the sub-variations. For example, suppose that from the starting position white plays 1. Nf3 and black replies 1... Nf6, and the resulting position after each ply is cached. White's 2. Ng1 leads to a position that needs evaluation, since it has not been evaluated or cached yet, but black's possible 2... Ng8 doesn't need to be evaluated, because it results in the starting position.
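A minimal sketch of that lookup (the key format and evaluator are supplied by the caller and purely illustrative; real engines usually hash positions with Zobrist keys rather than strings, but the string form makes the idea visible):

    import java.util.HashMap;
    import java.util.Map;

    class EvalCache {
        private final Map<String, Double> cache = new HashMap<>();
        private final java.util.function.ToDoubleFunction<String> evaluator;

        EvalCache(java.util.function.ToDoubleFunction<String> evaluator) {
            this.evaluator = evaluator;
        }

        // key is a serialized position, e.g. "WKe5Qd6Pg2h3h4 BKa8Qa7", plus
        // side-to-move and anything else that distinguishes equal-looking boards.
        double evaluate(String key) {
            Double cached = cache.get(key);
            if (cached != null) return cached;            // transposition: reuse
            double value = evaluator.applyAsDouble(key);  // expensive evaluation, once per position
            cache.put(key, value);
            return value;
        }
    }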
Of course, you can do more aggressive caching and store positions up to depth = 1 or even more.
You will need to make sure that you do not miss some strategic details of the game. In the case of chess you will need to keep in mind:
the effect of the 50-move rule
the threefold-repetition draw
who is to move
whether special moves like castling or en passant are available in one position but not in an otherwise-identical one
So you might want to add some further nuances to your algorithm, but to answer the original question: positions that have already occurred in the game, or that sit very high in the variation table, can be cached and more or less skipped ("more" meaning in most cases, "less" meaning the nuances outlined above).
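To connect this back to the original question: the piece missing from the asker's setup is the depth at which each cached value was computed. A value stored during a shallow iteration must not short-circuit a deeper one. A sketch, with illustrative field and method names:

    import java.util.HashMap;
    import java.util.Map;

    class DepthAwareTable {
        static class Entry {
            double value;
            int searchedDepth;  // how many plies were searched below the position
        }

        private final Map<Long, Entry> table = new HashMap<>();

        // remainingDepth: plies still to be searched below this node in the current pass.
        // Only an entry from an equally deep or deeper search may be reused; entries
        // from earlier, shallower iterations force a re-search.
        Double lookup(long key, int remainingDepth) {
            Entry e = table.get(key);
            return (e != null && e.searchedDepth >= remainingDepth) ? e.value : null;
        }

        void store(long key, double value, int searchedDepth) {
            Entry e = table.get(key);
            if (e == null || searchedDepth >= e.searchedDepth) {  // keep the deeper result
                Entry fresh = new Entry();
                fresh.value = value;
                fresh.searchedDepth = searchedDepth;
                table.put(key, fresh);
            }
        }
    }

Combined with the bound flags discussed in the first related answer above, this is the standard transposition table scheme.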

How can I tell if a particular heuristic is admissible, and why mine is not?

The definition of an admissible heuristic is one that "never overestimates the cost of reaching the goal".
I am attempting to write a Pac-Man heuristic for finding the fastest way to eat dots, some of which are randomly scattered across the grid. However, it is failing my admissibility test.
Here are the steps of my algorithm:
sum = 0, list = grid.getListofDots()
1. Find nearest dot from starting position (or previous dot that was removed) using manhattan distance
2. add to sum
3. Remove dot from list of possible dots
4. repeat steps 1 - 3 until list is empty
5. return the sum
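In code, the procedure above looks roughly like this (a sketch; the coordinate representation and class name are illustrative):

    import java.util.ArrayList;
    import java.util.List;

    class GreedyDotSum {
        // The greedy nearest-dot sum from steps 1-5; dots are (x, y) pairs.
        static int estimate(int startX, int startY, List<int[]> dotList) {
            List<int[]> remaining = new ArrayList<>(dotList);
            int sum = 0, x = startX, y = startY;
            while (!remaining.isEmpty()) {                  // step 4: repeat until empty
                int[] nearest = null;
                int bestDist = Integer.MAX_VALUE;
                for (int[] dot : remaining) {               // step 1: nearest by manhattan distance
                    int d = Math.abs(x - dot[0]) + Math.abs(y - dot[1]);
                    if (d < bestDist) { bestDist = d; nearest = dot; }
                }
                sum += bestDist;                            // step 2: add to sum
                x = nearest[0];                             // subsequent searches start from
                y = nearest[1];                             //   the dot just handled
                remaining.remove(nearest);                  // step 3: remove from list
            }
            return sum;                                     // step 5
        }
    }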
Since I'm using manhattan distance, shouldn't this be admissible? If not, are there any suggestions or other approaches to make this algorithm admissible?
As said, your heuristic isn't admissible. Another example is a configuration where your heuristic gives a cost of 9 but the best path has cost 6.
A very, very simple admissible heuristic is:
number_of_remaining_dots
but it isn't very tight. A small improvement (sketched in code after this list) is:
manhattan_distance_to_nearest_dot + dots_left_out
Other possibilities are:
distance_to_nearest_dot // Found via Breadth-first search
or
manhattan_distance_to_farthest_dot
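A sketch of that improved heuristic, interpreting dots_left_out as the count of dots other than the nearest (coordinate representation again illustrative). It is admissible because Pac-Man must travel at least the Manhattan distance to reach the nearest dot, and must then spend at least one more move per remaining dot:

    import java.util.List;

    class PacmanHeuristic {
        // dots: positions of all uneaten dots; (px, py): Pac-Man's position.
        static int estimate(int px, int py, List<int[]> dots) {
            if (dots.isEmpty()) return 0;
            int nearest = Integer.MAX_VALUE;
            for (int[] d : dots) {
                nearest = Math.min(nearest, Math.abs(px - d[0]) + Math.abs(py - d[1]));
            }
            // Reaching the nearest dot takes at least `nearest` moves, and each
            // of the other dots costs at least one further move: never an overestimate.
            return nearest + (dots.size() - 1);
        }
    }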

A* search algorithm heuristic function

I am trying to find the optimal solution to a Sliding Block Puzzle of any length using the A* algorithm.
The Sliding Block Puzzle is a game with white (W) and black (B) tiles arranged on a linear game board with a single empty space (-). Given the initial state of the board, the aim of the game is to arrange the tiles into a target pattern.
For example my current state on the board is BBW-WWB and I have to achieve BBB-WWW state.
Tiles can move in these ways :
1. slide into an adjacent empty space with a cost of 1.
2. hop over another tile into the empty space with a cost of 1.
3. hop over 2 tiles into the empty space with a cost of 2.
I have everything implemented, but I am not sure about the heuristic function. It computes, for each misplaced tile in the current state, the shortest distance (minimal cost) to the closest same-color tile position in the goal state.
Considering the given problem for the current state BWB-W and goal state BB-WW the heuristic function gives me a result of 3. (according to minimal distance: B=0 + W=2 + B=1 + W=0). But the actual cost of reaching the goal is not 3 (moving the misplaced W => cost 1 then the misplaced B => cost 1) but 2.
My question is: should I compute the minimal distance this way and not care about the overestimation, or should I divide it by 2? According to the ways tiles can move, a tile can cover twice the distance for the same cost (see moves 1 and 2).
I tried both versions. While the divided distance gives a better final path cost to the achieved goal, it visits more nodes and so takes more time than the undivided one. What is the proper way to compute it? Which one should I use?
It is not obvious to me what an admissible heuristic function for this problem looks like, so I won't commit to saying, "Use the divided by two function." But I will tell you that the naive function you came up with is not admissible, and therefore will not give you good performance. In order for A* to work properly, the heuristic used must be admissible; in order to be admissible, the heuristic must absolutely always give an optimistic estimate. This one doesn't, for exactly the reason you highlight in your example.
(Although now that I think about it, dividing by two does seem like a reasonable way to force admissibility. I'm just not going to commit to it.)
Your heuristic is not admissible, so your A* is not guaranteed to find the optimal answer every time. An admissible heuristic must never overestimate the cost.
A better heuristic than dividing your heuristic cost by 3 would be: instead of adding the distance D of each letter to its final position, add ceil(D/2). This way, a letter 1 or 2 squares away gets a value of 1, a letter 3 or 4 away gets a value of 2, and so on.
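A sketch of this suggestion, using the question's closest-same-color matching (the class name is illustrative; ceil(D/2) reflects that a tile covers at most two squares per unit of cost):

    class SlidingHeuristic {
        // board, goal: e.g. "BWB-W" and "BB-WW"; '-' is the empty space.
        static int estimate(String board, String goal) {
            int h = 0;
            for (int i = 0; i < board.length(); i++) {
                char tile = board.charAt(i);
                if (tile == '-') continue;                 // the blank doesn't count
                int d = Integer.MAX_VALUE;
                for (int j = 0; j < goal.length(); j++) {  // nearest same-color goal cell
                    if (goal.charAt(j) == tile) d = Math.min(d, Math.abs(i - j));
                }
                h += (d + 1) / 2;                          // ceil(d / 2)
            }
            return h;
        }
    }

On the question's example (BWB-W to BB-WW) this gives 0 + 1 + 1 + 0 = 2, matching the true cost of 2.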
