usage of heuristic function in alpha beta pruning - artificial-intelligence

When running the alpha-beta pruning algorithm, how does a heuristic function help to prune as many nodes as possible? If, in the worst case, no nodes get pruned, how can a heuristic function be used to prune them?

In alpha-beta pruning in a perfect-information extensive form two-player game (that is, a game tree) you always have lower and upper bounds during the search from alpha-beta pruning. These bounds essentially say that only values between these bounds are interesting -- anything else will be pruned by an ancestor in the tree.
If you have a heuristic which is a guaranteed lower/upper bound on the value of all children of a given state, you can test it against the bounds for the current state. If the heuristic says the value of every future state is guaranteed to be larger than or equal to the upper bound (or smaller than or equal to the lower bound), then you can immediately prune and stop your search.
For example, suppose the max player is at a state with alpha = -10 and beta = 3, and your heuristic says that h(s) >= 5. Then you can immediately update alpha to 5. Since alpha >= beta, you can prune and return immediately. With the same alpha/beta values at a min node, you'd need your heuristic to say that h(s) <= -10 to be able to prune.
Alpha-beta pruning can reduce a tree of size b^d to roughly b^(d/2). With a perfect heuristic you could essentially reduce the tree size to d; that is, the heuristic could immediately cut off all remaining children at every state. In practice you won't do that well, but a heuristic can significantly reduce the size of the search on top of alpha-beta pruning.
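To make this concrete, here is a minimal Python sketch of heuristic cutoffs layered on alpha-beta. It is hedged: the state interface and the bounding functions h_lower/h_upper are hypothetical names, not from the question, and the technique only works if the heuristics are guaranteed bounds on the value of every descendant.

    import math

    def alpha_beta(state, depth, alpha, beta, h_lower, h_upper):
        # h_lower(s): guaranteed lower bound on the value of s and all its descendants
        # h_upper(s): guaranteed upper bound on the same
        if depth == 0 or state.is_terminal():
            return state.evaluate()

        # Heuristic cutoffs: if every value below this state provably falls
        # outside the (alpha, beta) window, stop searching immediately.
        if h_lower(state) >= beta:
            return h_lower(state)   # e.g. alpha = -10, beta = 3, h(s) >= 5: prune
        if h_upper(state) <= alpha:
            return h_upper(state)   # the symmetric case against the lower bound

        if state.is_maximizing():
            value = -math.inf
            for child in state.children():
                value = max(value, alpha_beta(child, depth - 1, alpha, beta, h_lower, h_upper))
                alpha = max(alpha, value)
                if beta <= alpha:
                    break           # ordinary alpha-beta cutoff
        else:
            value = math.inf
            for child in state.children():
                value = min(value, alpha_beta(child, depth - 1, alpha, beta, h_lower, h_upper))
                beta = min(beta, value)
                if beta <= alpha:
                    break
        return value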

Related

Can a cache be used for an alpha-beta search algorithm?

I'm working on a minimax tic-tac-toe algorithm. I got it working fine, caching each state in the tree.
Then I implemented alpha-beta pruning, which seemed to affect the game. I think the problem is that nodes cannot be "trusted" if any of their descendants (children, grandchildren, etc.) were pruned. Is this true?
For now, I'm only caching states if they don't have pruned descendants. This example (from an image, not tic-tac-toe) shows my point: the max player is the upwards triangle, which should choose the move on the left. However, if the move on the right is cached during alpha-beta pruning, the red triangle will have a false value of 4, so the move on the right would be wrongly chosen.
If by a "cache" you mean a transposition table, then you can't always trust the value in the transposition table. That is, when you store a value in a transposition table, you need to also store the alpha and beta values (perhaps the depth as well) used for the search below that state. If the alpha and beta values are not the same*, then you can't use the value from the transposition table.
*In practice they don't have to be identical: the stored alpha-beta window just needs to be a superset of the window in use at the current node where you want to apply the cached value.
Edit: Additional info for those dealing with this in larger games. When you search at a node you have a lower bound (alpha) and upper bound (beta) on the final value. If the returned value is between alpha and beta, then you know it is the true value of the state. If it is equal to alpha or beta, then you know it is only a bound on the final value. But, you can still use this information to help the search.
In particular, suppose that you have alpha=10 and beta=20 in the current search and the value in the transposition table is [alpha = 12, beta = 30, value = 12]. Then, when you (re-)search below the branch, you can search with bounds of alpha=10 and beta=12.
This is because you've already proven that the value is <= 12 in the previous search. When you get the final result, you can then update the transposition table entry to reflect the additional information from this search.
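As a hedged sketch of how this is commonly implemented (the flag names EXACT/LOWER/UPPER and the entry layout are illustrative, not from the question): store, along with the value, whether it is exact or only a bound, and at probe time either reuse it directly or use it to narrow the current window.

    from enum import Enum

    class Flag(Enum):
        EXACT = 0   # value fell strictly inside (alpha, beta): true value
        LOWER = 1   # search failed high (value >= beta): value is a lower bound
        UPPER = 2   # search failed low (value <= alpha): value is an upper bound

    table = {}  # position key -> (flag, value, depth)

    def store(key, depth, value, alpha_at_entry, beta):
        if value <= alpha_at_entry:
            flag = Flag.UPPER
        elif value >= beta:
            flag = Flag.LOWER
        else:
            flag = Flag.EXACT
        table[key] = (flag, value, depth)

    def probe(key, depth, alpha, beta):
        # Returns (usable_value or None, narrowed alpha, narrowed beta).
        entry = table.get(key)
        if entry is None:
            return None, alpha, beta
        flag, value, stored_depth = entry
        if stored_depth >= depth:
            if flag == Flag.EXACT:
                return value, alpha, beta
            if flag == Flag.LOWER:
                alpha = max(alpha, value)   # already proven: true value >= value
            elif flag == Flag.UPPER:
                beta = min(beta, value)     # already proven: true value <= value
            if alpha >= beta:
                return value, alpha, beta   # the stored bound alone causes a cutoff
        return None, alpha, beta

In the example above, an entry stored as an upper bound of 12 would shrink the re-search window from (10, 20) to (10, 12), exactly as described.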

Is best first search optimal and complete?

I have some doubts regarding best first search algorithm. The pseudocode that I have is the following:
(image: best first search pseudocode)
First doubt: is it complete? I have read that it is not, because it can enter a dead end, but I don't see when that can happen: if the algorithm chooses a node that has no more neighbours, it does not get stuck there, because that node is removed from the open list, and in the next iteration the next node in the open list is processed and the search continues.
Second doubt: is it optimal? I thought that since it visits the nodes closest to the goal during the search, the solution found would be the shortest, but that is not the case, and I do not understand why this algorithm is not optimal.
The heuristic I was using is the straight line distance between two points.
Thanks for your help!!
Of course, if the heuristic function underestimates the costs, best-first search is not optimal. In fact, even if your heuristic function is exactly right, best-first search is not guaranteed to be optimal. Here is a counterexample. Consider the following graph:
The green numbers are the actual costs and the red numbers are the exact heuristic function. Let's try to find a path from node S to node G.
Best-first search would give you S->A->G, following the heuristic function. However, if you look at the graph more closely, you would see that the path S->B->C->G has a lower cost: 5 instead of 6. Thus, this is an example of best-first search performing suboptimally even with a perfect heuristic function.
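Since the graph image is not reproduced here, the following Python sketch uses assumed edge costs that match the stated path costs (S->A=3, A->G=3, S->B=1, B->C=2, C->G=2, so S->A->G costs 6 and S->B->C->G costs 5) together with the exact remaining-cost heuristic:

    import heapq

    # Assumed edge costs reconstructing the missing figure.
    graph = {
        'S': [('A', 3), ('B', 1)],
        'A': [('G', 3)],
        'B': [('C', 2)],
        'C': [('G', 2)],
        'G': [],
    }
    # Exact heuristic: true cheapest remaining cost to G.
    h = {'S': 5, 'A': 3, 'B': 4, 'C': 2, 'G': 0}

    def greedy_best_first(start, goal):
        # Always expand the frontier node with the smallest h; ignore path cost g.
        frontier = [(h[start], start, [start])]
        visited = set()
        while frontier:
            _, node, path = heapq.heappop(frontier)
            if node == goal:
                return path
            if node in visited:
                continue
            visited.add(node)
            for neighbour, _cost in graph[node]:
                heapq.heappush(frontier, (h[neighbour], neighbour, path + [neighbour]))
        return None

    print(greedy_best_first('S', 'G'))  # ['S', 'A', 'G'] with cost 6, not the optimal cost-5 path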
In the general case, best-first search is complete, since in the worst case it will search the whole space. It should also be optimal, provided the heuristic function is admissible, meaning it never overestimates the cost of the path from any node to the goal. (It also needs to be consistent, that is, to obey the triangle inequality; if it does not, the algorithm could enter a cycle and lose completeness.)
Checking your algorithm, I do not see how the heuristic function is calculated, nor where the cost of the path to reach a particular node is computed.
So it needs to calculate the actual cost of the path to reach a particular node and then add a heuristic estimate of the cost of the path from that node to the goal.
The formula is f(n) = g(n) + h(n), where g(n) is the cost of the path to reach node n and h(n) is the heuristic estimate of the cost of the cheapest path from n to the goal.
Check the implementation of the A* algorithm, which is an example of best-first search applied to path planning.
TLDR: In best-first search, you need to calculate the cost of a node as the sum of the cost of the path to reach that node and a heuristic estimate of the cost of the path from that node to the goal. If the heuristic function is admissible and consistent, the algorithm will be optimal and complete.
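A minimal sketch of that f(n) = g(n) + h(n) ordering (a generic A* with hypothetical neighbors/h callables, not code from the question):

    import heapq, math

    def a_star(start, goal, neighbors, h):
        # neighbors(n) yields (neighbour, edge_cost) pairs.
        # h(n) must be admissible (never overestimate) for the result to be optimal.
        frontier = [(h(start), 0, start, [start])]   # entries are (f, g, node, path)
        best_g = {start: 0}
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return path, g
            for neighbour, cost in neighbors(node):
                new_g = g + cost
                if new_g < best_g.get(neighbour, math.inf):
                    best_g[neighbour] = new_g
                    heapq.heappush(frontier, (new_g + h(neighbour), new_g, neighbour, path + [neighbour]))
        return None, math.inf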

Unexpected path dependence in alpha-beta search?

I'm writing an artificial intelligence for the old Norse tafl family of board games (project here, source file at issue here). They're close enough to chess in broad strokes for knowledge of chess AI to apply here. (The variant in question is played on a 7x7 board with a radially symmetric starting position, white starting in the middle and black starting at the edge.) I'm running into a curious problem with how I've implemented alpha-beta search: the result of a search to a fixed depth, with no optimizations enabled besides alpha-beta pruning, changes depending on the order in which nodes are explored.
In the file at issue, the important methods are 'explore', 'exploreChildren', 'handleEvaluationResults', and 'generateSuccessorMoves'. 'explore' checks to see if there's a transposition table hit (disabled elsewhere for this test), evaluates the state if it's a victory or a leaf node, or calls exploreChildren. exploreChildren does the recursive searching on child nodes. generateSuccessorMoves generates (and optionally sorts) the moves exiting the current state. handleEvaluationResults determines whether a child evaluation has caused a cutoff.
So I wrote a minimal test case: generateSuccessorMoves first does no sorting whatsoever, then simply shuffles the list of moves rather than sorting it. The two searches are not equivalent: not in the move returned, not in the move up to symmetry, and not even in value:
MAIN SEARCH
# cutoffs/avg. to 1st a/b a/b
Depth 0: 0/0 0/0
Depth 1: 0/22 0/1
Depth 2: 42/0 3/0
Finding best state...
Best move: d3-g3 with path...
d3-g3
e1-f1
e4-e1xf1
End of best path scored -477
Observed/effective branching factor: 23.00/9.63
Thought for: 72msec. Tree sizes: main search 893 nodes, continuation search: 0 nodes, horizon search: 0 nodes
Overall speed: 12402.77777777778 nodes/sec
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0
1. move: d3-g3 value: -477
Using 5000msec, desired 9223372036854775807
Depth 3 explored 1093 states in 0.037 sec at 29540.54/sec
MAIN SEARCH
# cutoffs/avg. to 1st a/b a/b
Depth 0: 0/0 0/0
Depth 1: 0/21 0/2
Depth 2: 104/0 2/0
Finding best state...
Best move: d3-f3 with path...
d3-f3
d2-c2
d5-f5xf4
End of best path scored -521
Observed/effective branching factor: 23.00/10.30
Thought for: 37msec. Tree sizes: main search 1093 nodes, continuation search: 0 nodes, horizon search: 0 nodes
Overall speed: 29540.540540540544 nodes/sec
Transposition table stats: Table hits/queries: 0/0 Table inserts/attempts: 0/0
7. move: d3-f3 value: -521
This is an extreme case, obviously, but it's my understanding that alpha-beta in this situation (that is, without any feature besides 'alpha-beta pruning') should be stable no matter what the order of the search is—at the very least, it should return a node of the same value. Am I wrong? Am I doing something wrong?
First edit: although I suppose it's obvious from the description of this problem, it turns out that there is some as-yet unknown bug in my alpha-beta implementation. Further testing shows that it does not provide the same result as pure minimax.
Second edit: this is the pseudocode version of the alpha-beta search implemented in the file linked above.
explore(depth, maxDepth, alpha, beta)
    // some tafl variants feature rules where one player moves more than once in a turn
    // each game tree node knows whether it's maximizing or minimizing
    var isMaximizing = this.isMaximizing()
    var value = NO_VALUE

    if(isTerminal(depth, maxDepth))
        value = eval()
    else
        for(move in successorMoves)
            if(cutoff) break

            nodeValue = nextNode(move).explore(depth + 1, maxDepth, alpha, beta)
            if(value == NO_VALUE) value = nodeValue

            if(isMaximizing)
                value = max(value, nodeValue)
                alpha = max(alpha, value)
                if(beta <= alpha) break
            else
                value = min(value, nodeValue)
                beta = min(beta, value)
                if(beta <= alpha) break

rootNode.explore(0, 5, -infinity, infinity)
It turns out it was my fault. I have a bit of code which recursively revalues the nodes above a certain node, for use in extension searches, and I was calling it in the wrong place (after exploring all the children of any node). That early back-propagation was causing incorrect alpha and beta values, and therefore early cutoffs.
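For anyone hitting a similar bug, a plain reference implementation can be handy to diff against. Here is a hedged Python sketch (the node interface is hypothetical): with no transposition table and no ancestor re-valuation, the root value must equal the pure minimax value for any move ordering, although the chosen move may differ between equally valued moves.

    import math, random

    def alpha_beta(node, depth, alpha, beta):
        # Plain alpha-beta: ordering changes the node count, never the root value.
        if depth == 0 or node.is_terminal():
            return node.evaluate()
        moves = node.moves()
        random.shuffle(moves)   # deliberate: shuffling must not change the result
        if node.is_maximizing():
            value = -math.inf
            for move in moves:
                value = max(value, alpha_beta(node.apply(move), depth - 1, alpha, beta))
                alpha = max(alpha, value)
                if beta <= alpha:
                    break       # beta cutoff: pruned siblings cannot affect the root value
        else:
            value = math.inf
            for move in moves:
                value = min(value, alpha_beta(node.apply(move), depth - 1, alpha, beta))
                beta = min(beta, value)
                if beta <= alpha:
                    break       # alpha cutoff
        return value

A useful regression test is to assert that alpha_beta(root, d, -math.inf, math.inf) equals a plain minimax of the same depth across many shuffles.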

A* search algorithm heuristic function

I am trying to find the optimal solution to a Sliding Block Puzzle of any length using the A* algorithm.
The Sliding Block Puzzle is a game with white (W) and black (B) tiles arranged on a linear game board with a single empty space (-). Given the initial state of the board, the aim of the game is to arrange the tiles into a target pattern.
For example, my current state on the board is BBW-WWB and I have to reach the state BBB-WWW.
Tiles can move in these ways :
1. slide into an adjacent empty space with a cost of 1.
2. hop over another tile into the empty space with a cost of 1.
3. hop over 2 tiles into the empty space with a cost of 2.
I have everything implemented, but I am not sure about the heuristic function. It computes, for each misplaced tile in the current state, the shortest distance (minimal cost) to the closest correctly placed tile of the same colour in the goal state.
For the current state BWB-W and goal state BB-WW, the heuristic function gives me a result of 3 (per-tile minimal distances: B=0 + W=2 + B=1 + W=0). But the actual cost of reaching the goal is not 3 but 2 (move the misplaced W => cost 1, then the misplaced B => cost 1).
My question is: should I compute the minimal distance this way and not worry about the overestimation, or should I divide it by 2? Given the ways tiles can move, one tile can cover twice the distance for the same cost (see moves 1 and 2).
I tried both versions. While the divided distance gives a better final path cost to the goal, it visits more nodes and therefore takes more time than the undivided one. What is the proper way to compute it? Which one should I use?
It is not obvious to me what an admissible heuristic function for this problem looks like, so I won't commit to saying, "Use the divided-by-two function." But I will tell you that the naive function you came up with is not admissible, and therefore A* is not guaranteed to find the optimal answer. In order for A* to work properly, the heuristic used must be admissible; in order to be admissible, the heuristic must always give an optimistic estimate. Yours doesn't, for exactly the reason you highlight in your example.
(Although now that I think about it, dividing by two does seem like a reasonable way to force admissibility. I'm just not going to commit to it.)
Your heuristic is not admissible, so your A* is not guaranteed to find the optimal answer every time. An admissible heuristic must never overestimate the cost.
A better heuristic than simply dividing the total distance by a constant would be: instead of adding the distance D of each letter to its final position, add ceil(D/2). This way a letter 1 or 2 squares away contributes 1, a letter 3 or 4 squares away contributes 2, and so on.
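A hedged sketch of that heuristic in Python (the left-to-right matching of same-coloured tiles is my assumption; the question instead matched each tile to the closest goal tile of its colour):

    from math import ceil

    def heuristic(state, goal):
        # Sum ceil(D/2) over all tiles, where D is each tile's distance to its
        # matched goal position. No move covers more than 2 squares per unit of
        # cost, so ceil(D/2) never overestimates a tile's cost.
        total = 0
        for colour in ('B', 'W'):
            current = [i for i, c in enumerate(state) if c == colour]
            target = [i for i, c in enumerate(goal) if c == colour]
            # Match same-coloured tiles left to right, which minimises total displacement.
            for pos, tgt in zip(current, target):
                total += ceil(abs(pos - tgt) / 2)
        return total

    print(heuristic('BWB-W', 'BB-WW'))  # 2, matching the true optimal cost in the question's example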

minimax: what happens if min does not play optimally

The description of the minimax algorithm says that both players have to play optimally for the algorithm to be optimal. Intuitively that is understandable. But could anyone make concrete, or prove, what happens if min does not play optimally?
Thanks!
The definition of "optimal" is that you play so as to minimize the "score" (or whatever you measure) of your opponent's optimal answer, which is defined by the play that minimizes the score of your optimal answer and so forth.
Thus, by definition, if you don't play optimally, your opponent has at least one path that will give him a higher score than his best score had you played optimally.
One way to find out what is optimal is to brute-force the entire game tree. For anything beyond trivial problems you can use alpha-beta search, which guarantees finding the optimum without searching the entire tree. If your tree is still too complex, you need a heuristic that estimates the score of a "position" and halts at a certain depth.
Was that understandable?
I was having problems with that precise question.
When you think about it for a bit, you get the idea that the minimax graph contains ALL possible games, including the bad ones. So if a player plays a suboptimal game, then that game is part of the tree; it has simply been discarded in favor of a better game.
It's similar with alpha-beta. I was getting stuck on what happens if I sacrifice some pieces intentionally to create space and then make a winning move through the gap, i.e. there is a better move further down the tree.
With alpha-beta, let's say a sequence of losing moves followed by a killer move is in fact in the tree. In that case the alpha and beta act as a window filter "a < x < b" and would have discarded it if YOU had a better game. You can see this in alpha-beta if you imagine putting a +/- infinity into a pruned branch to see what happens.
In any case, both algorithms recalculate every move, so if a player plays a suboptimal game, that will open up branches of the graph that are better for the opponent.
Rinse, repeat.
Consider a MIN node whose children are terminal nodes. If MIN plays suboptimally, then the value of the node is greater than or equal to the value it would have if MIN played optimally. Hence, the value of the MAX node that is the MIN node’s parent can only be increased. This argument can be extended by a simple induction all the way to the root. If the suboptimal play by MIN is predictable, then one can do better than a minimax strategy. For example, if MIN always falls for a certain kind of trap and loses, then setting the trap guarantees a win even if there is actually a devastating response for MIN.
Source: https://www.studocu.com/en-us/document/university-of-oregon/introduction-to-artificial-intelligence/assignments/solution-2-past-exam-questions-on-computer-information-system/1052571/view
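To see the induction argument concretely, here is a hedged sketch on random game trees (encoding trees as nested lists is my own choice): MAX plays against a randomly moving MIN, and the achieved value is never below the minimax value.

    import random

    def minimax(tree, maximizing):
        # A tree is either a number (leaf value) or a list of subtrees.
        if isinstance(tree, (int, float)):
            return tree
        values = [minimax(sub, not maximizing) for sub in tree]
        return max(values) if maximizing else min(values)

    def value_vs_random_min(tree, maximizing, rng):
        # MAX best-responds; MIN picks a uniformly random child.
        if isinstance(tree, (int, float)):
            return tree
        if maximizing:
            return max(value_vs_random_min(sub, False, rng) for sub in tree)
        return value_vs_random_min(rng.choice(tree), True, rng)

    rng = random.Random(0)

    def random_tree(depth, branching=2):
        if depth == 0:
            return rng.randint(-10, 10)
        return [random_tree(depth - 1, branching) for _ in range(branching)]

    for _ in range(1000):
        t = random_tree(4)
        assert value_vs_random_min(t, True, rng) >= minimax(t, True)  # never worse for MAX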