utility functions minimax search - artificial-intelligence

Hi
I'm confused about how you determine the utility functions used with a minimax search.
Could you explain it with any game that a minimax search can be used with?
Basically, I am asking how you determine the utility functions.
Cheers

The utility value is just some arbitrary value that the player receives when arriving at a certain state in the game. For instance, in Tic-tac-toe, your utility function could simply be 1 for a win, 0 for a tie, or -1 for a loss.
Running minimax on this would at best find a set of actions that result in 1 (a win).
Another example would be chess (not that you can feasibly run minimax to the end of a game of chess). Say your utility function is a number based on the value of the pieces you have captured or lost.
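As a rough sketch of that chess idea, a material-count function might look like this (the piece values are the conventional ones; the flat-board representation is just an assumption for the sketch):

    # Conventional piece values; any consistent scale works.
    PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

    def material_score(board):
        """Material balance from White's point of view: positive means
        White is ahead. Uppercase letters are White pieces, lowercase
        are Black, '.' is an empty square."""
        score = 0
        for square in board:
            value = PIECE_VALUES.get(square.upper())
            if value is not None:
                score += value if square.isupper() else -value
        return score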

Determining the utility value of a move at a certain state comes down to the programmer's experience and knowledge of the game.
Utility values at a terminal state are fairly easy to determine. In Tic-tac-toe, for instance, a terminal state for player X is when three Xs are aligned diagonally, vertically, or horizontally. Any move that creates such a state produces a terminal state, and you can write a function that checks for that. If it is a terminal state, the function returns a 1 or -1.
If your player agent is player X and, after a move, the function determines that player O has won, then it returns a -1. It returns a 1 if it determines that X's own move is the winning move.
If all cells are occupied after the last possible move and nobody has won, then the function returns a zero.
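A minimal sketch of that terminal check, assuming the board is a 3x3 nested list holding 'X', 'O', or None:

    # The 8 winning lines of the 3x3 board as (row, col) triples.
    LINES = [
        [(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)],
        [(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)],
        [(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)],
    ]

    def terminal_utility(board):
        """Return +1 if X has won, -1 if O has won, 0 for a full-board
        tie, or None if the state is not terminal."""
        for line in LINES:
            cells = [board[r][c] for r, c in line]
            if cells[0] is not None and cells[0] == cells[1] == cells[2]:
                return 1 if cells[0] == 'X' else -1
        if all(cell is not None for row in board for cell in row):
            return 0  # board full, nobody aligned: a tie
        return None   # game still in progress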
This covers terminal states only. It is also critical to evaluate intermediate states because, even in a 3x3 game, there are lots of combinations to consider: counting symmetrical variations, Tic-tac-toe has up to 9! possible move sequences. For those intermediate cases, you need to come up with an evaluation function that returns a score for each state as it relates to other states.
Suppose that I assign terminal-state values of 810, 0, and -810. For each game, the score would be 810 / (# of moves). So if I reach a terminal state in 6 moves, the score would be 810/6 = 135; in 9 moves, the score would be 90. An evaluation function fashioned this way favors moves that reach a terminal state faster. However, it still evaluates all the way down to a leaf node. We often need to evaluate before reaching a leaf node, and that can also be part of an evaluation function.
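As a sketch, that preference for faster outcomes could be coded like this (the 810 constant and the division are just the scheme described above):

    def terminal_score(winner, num_moves):
        """Score a terminal state so that faster outcomes rate higher.
        winner is +1 (X wins), -1 (O wins), or 0 (tie); num_moves is
        how many moves it took to reach the terminal state."""
        return winner * 810 / num_moves if winner != 0 else 0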
Suppose that, in the game below, player 1 is X. So X moves next. The following are the legal moves (row, column) for X:
(1) 0,0
(2) 0,2
(3) 2,0
(4) 2,1
(5) 2,2
| |O| |
|O|X|X|
| | | |
The utility value for each move should favor the best moves.
The best moves, in this case, are either (2) or (5). So an evaluation function would assign a utility value of 81, for instance, to each of those. Move (4) is the worst possible move for the X player (and would also guarantee that you lose the game against an intelligent player), so the function would assign a value of -9 to that move. Moves (1) and (3), while not ideal, will not make you lose, so we might assign a 1.
So when minimax evaluates those 5 moves, because your player, X, is MAX, the choice would be either (2) or (5).
If we focus on options (2) or (5), the game will reach a terminal state two moves after these. So, in reality, the evaluation function should look 2 moves ahead of the current legal moves to return the utility values. (This strategy follows the lines of depth-limited search, where your function evaluates at a certain depth and produces a utility value without reaching a leaf node, i.e. a terminal state.)
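A rough depth-limited minimax sketch along those lines; is_terminal, evaluate, legal_moves, and apply_move are hypothetical placeholders for your game's own logic:

    def minimax(state, depth, maximizing):
        """Depth-limited minimax: recurse until a terminal state or the
        depth limit, then fall back on the evaluation function instead
        of insisting on reaching a leaf node."""
        if depth == 0 or is_terminal(state):
            return evaluate(state)  # exact utility or heuristic estimate
        children = [minimax(apply_move(state, m), depth - 1, not maximizing)
                    for m in legal_moves(state)]
        return max(children) if maximizing else min(children)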
Now I'll circle back to my first statement. The utility value will be determined by an evaluation function coded per the programmer's knowledge of the game.
Hopefully, I'm not confusing you...

Related

How to improve performance using Transposition Table in Game Playing?

I have implemented iterative deepening with alpha-beta pruning in my game and I also added a Transposition Table to store already evaluated boards.
Right now, I am doing the following:
When running iterative deepening, at depth = 0 it evaluates and stores all positions with their scores in TT.
Now, when it re-runs with depth = 1, I simply return the value of the board if it exists in the TT. This stops the algorithm at depth = 0, as all values are already in the TT for depth = 0 boards.
If I only return values from the TT when the depth limit is reached, e.g. depth = MAX_DEPTH, then big sub-trees will never be cut.
So, I am not understanding how I should re-use the values stored in the TT to make my game faster.
I will use chess for the explanation in this answer; of course, this reasoning can, with slight modifications, be applied to other board games as well.
Transposition tables in board game programs are caches which store already evaluated boards. It is great to have an easy-to-handle cache key that uniquely identifies a position, like:
WKe5Qd6Pg2h3h4 BKa8Qa7
So when you get to a position, you check whether the cache key is present, and if so, you reuse its evaluation. Whenever you visit a position at depth = 0, after it is properly evaluated, it can be cached. Then, as moves are made, you can more-or-less jump over the evaluation in the sub-variations. For example, consider the case where, from the starting position, white played 1. Nf3 and black replied 1... Nf6, and the resulting position of each ply was cached. White's 2. Ng1 needs evaluation, since that position was not evaluated or cached yet, but Black's possible 2... Ng8 does not, because it leads back to the starting position.
Of course, you can do more aggressive caching and store positions up to depth = 1 or even more.
You will need to make sure that you do not miss some strategic details of the game. In the case of chess you will need to keep in mind:
the 50-move rule's effect
3-time repetition draw
who's on the move
whether special moves like castling or en passant were/are possible in the past/present in one of the positions but not in the other
So, you might want to add some further nuances to your algorithm, but to answer the original question: positions that have already occurred in the game, or that are very high in the variation table, can be cached and more-or-less ignored (the "more" covers most cases; the "less" covers the nuances outlined above).
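One standard way to make those cached values safe to reuse across iterative-deepening passes, and the direct answer to the original question, is to store the search depth together with the score and only trust an entry that was searched at least as deep as you currently need. A rough sketch (the position/move helpers are hypothetical placeholders):

    transposition_table = {}  # position key -> (stored_depth, score)

    def alpha_beta(pos, depth, alpha, beta, maximizing):
        key = position_key(pos)  # e.g. the string encoding shown above
        cached = transposition_table.get(key)
        # Only trust entries searched to at least the remaining depth;
        # this prevents shallow scores from earlier, shallower passes
        # from cutting off the deeper iterative-deepening passes.
        if cached is not None and cached[0] >= depth:
            return cached[1]
        if depth == 0 or is_terminal(pos):
            score = evaluate(pos)
        elif maximizing:
            score = float('-inf')
            for move in legal_moves(pos):
                score = max(score, alpha_beta(make_move(pos, move),
                                              depth - 1, alpha, beta, False))
                alpha = max(alpha, score)
                if beta <= alpha:
                    break  # beta cutoff
        else:
            score = float('inf')
            for move in legal_moves(pos):
                score = min(score, alpha_beta(make_move(pos, move),
                                              depth - 1, alpha, beta, True))
                beta = min(beta, score)
                if beta <= alpha:
                    break  # alpha cutoff
        transposition_table[key] = (depth, score)
        return score

A production engine would also store a bound flag (exact / lower / upper) with each entry, because alpha-beta returns bounds rather than exact scores when a cutoff occurs; but the depth check alone already fixes the depth = 0 problem described in the question.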

Better Heuristic function for a game (AI Minimax)

There is a game that I've programmed in Java. The game is simple (refer to the figure below). There are 4 birds and 1 larva. It is a 2-player game (AI vs. Human).
Larva can move diagonally forward AND diagonally backward
Birds can ONLY move diagonally forward
Larva wins if it can get to line 1 (fence)
Larva also wins if birds have no moves left
Birds CANNOT "eat" the larva.
Birds win if Larva has NO move left (cannot move at all)
When the game starts, Larva begins, then ONE bird can move (any one), then Larva, etc...
I have implemented a MiniMax (Alpha Beta Pruning) and I'm using the following evaluate() function (heuristic function).
Let us give the following numbers to each square on the board.
Therefore, our evaluation function will be
h(n) = value of position of larva - value of position of bird 1 - value of position of bird 2 - value of position of bird 3 - value of position of bird 4
The Larva will try to MAXIMIZE the heuristic value, whereas the Birds will try to MINIMIZE it.
However, this is a simple and naive heuristic. It does not act in a smart manner. I am a beginner in AI, and I would like to know what I can do to IMPROVE this heuristic function.
What would be a good/informed heuristic?
How about this:
Maximum: larva
Minimum: birds
H(t) = max_distance(larva, line_8) + Σ min_distance(bird_n, larva)
or
H(t) = Σ min_distance(bird_n, larva) - min_distance(larva, line_1)
max_distance(larva, line_8): reflects the condition that the larva is closer to line 1.
Σ min_distance(bird_n, larva): reflects the condition that the birds are closer to the larva (to block it).
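A sketch of the second formula, assuming (row, col) coordinates with row 1 at the fence, and Chebyshev distance as a stand-in for the board's diagonal-move distance:

    def chebyshev(a, b):
        # Diagonal-move distance on a grid: the larger coordinate gap.
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    def heuristic(larva, birds):
        # H(t) = sum of bird-to-larva distances - larva's distance to line 1.
        # The Larva (MAX) wants itself near the fence and the birds far away;
        # the Birds (MIN) want the opposite.
        larva_to_fence = larva[0] - 1  # rows remaining to reach line 1
        birds_to_larva = sum(chebyshev(bird, larva) for bird in birds)
        return birds_to_larva - larva_to_fence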
I believe there are still many things that could be considered; for example, the bird closest to the larva should have high priority to be chosen to move. But the direction of the function above makes sense, and many details can be thought up to improve it easily.
There is one simple way to improve your heuristic considerably. In your current heuristic, the value of square A1 is 8 less than the value of square A8. This makes the Birds inclined to move towards the left side of the game board, as a move to the left will always score higher than a move to the right. This is not accurate: all squares on row 1 should have the same value. Thus, assign all squares in row 1 a 1, in row 2 a 2, and so on. This way the birds and larva won't be inclined to move to the left, and can instead focus on making a good move.
You could also take into account the fact that the birds have a positional advantage over the larva when the larva is on the sides of the board; so if the Larva is MAX, make the side tile values of the board smaller.
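Combining the two suggestions, a sketch of corrected square values, following the row-numbering above (the size of the side-column penalty is just a guess to tune):

    def square_value(row, col, board_width=8):
        # Row-based value: every square in row 1 is worth 1, row 2 worth
        # 2, and so on, so left and right moves along a row score the same.
        value = row
        # Penalize the side columns, where the birds have a positional
        # advantage over the larva (the 0.5 weight is arbitrary).
        if col in (0, board_width - 1):
            value -= 0.5
        return value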

Does the min player in the minimax algorithm play optimally?

In the minimax algorithm, the first player plays optimally, which means it wants to maximise its score, and the second player tries to minimise the first player's chances of winning. Does this mean that the second player also plays optimally to win the game? Does trying to choose some path that minimises the first player's chances of winning also mean trying to win?
I am actually trying to solve this task from TopCoder: EllysCandyGame. I wonder whether we can apply the minimax algorithm here. The statement that "both play optimally" really confuses me, and I would like some advice on how to deal with this type of problem, if there is some general idea.
Yes, you can use the minimax algorithm here.
The problem statement says that the winner of the game is "the girl who has more candies at the end of the game." So one reasonable scoring function you could use is the difference in the number of candies held by the first and second player.
Does this mean that the second player also plays optimally to win the game?
Yes. When you are evaluating a MIN level, the MIN player will always choose the path with the lowest score for the MAX player.
Note: both the MIN and MAX levels can be implemented with the same code, if you evaluate every node from the perspective of the player making the move in that round and convert scores between levels. If the score is a difference in the number of candies, you can simply negate it between levels.
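That single-code-path variant is usually called negamax; a sketch, with hypothetical helpers standing in for the game's own rules:

    def negamax(state, depth):
        """One code path for both players: score each node from the
        perspective of the player to move, and negate a child's value
        when passing it up a level."""
        if depth == 0 or not legal_moves(state):
            return score_for_player_to_move(state)  # e.g. my candies - yours
        return max(-negamax(apply_move(state, m), depth - 1)
                   for m in legal_moves(state))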
Does trying to choose some path in order to minimize the first player's chances of winning also mean trying to win?
Yes. The second player is trying to minimize the first player's score. A reasonable scoring function will give the first player a lower score for a loss than a tie.
I wonder whether we can apply the minimax algorithm here.
Yes. If I've read the problem correctly, the number of levels will be equal to the number of boxes. If there's no limit on the number of boxes, you'll need to use an n-move lookahead, evaluating nodes in the minimax tree to a maximum depth.
Properties of the game:
At each point, there are a limited, well defined number of moves (picking one of the non-empty boxes)
The game ends after a finite number of moves (when all boxes are empty)
As a result, the search tree consists of a finite number of leaves. You are right that by applying minimax, you can find the best move.
Note that you only have to evaluate the game at the final positions (when there are no more moves left). At that point, there are only three results: The first player won, the second player won, or it is a draw.
Note that the standard minimax algorithm has nothing to do with probabilities. The result of the minimax algorithm determines perfect play for both sides (assuming that both sides make no mistakes).
By the way, if you need to improve the search algorithm, a safe and simple optimization is to apply alpha-beta pruning.
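A sketch of that full-depth search which evaluates only final positions (the box helpers are hypothetical placeholders for the game's rules):

    def solve(state, first_player_to_move):
        """Search the finite game to the end and evaluate only final
        positions: +1 if the first player ends with more candies, -1 if
        fewer, 0 for a draw."""
        moves = non_empty_boxes(state)
        if not moves:                    # every box empty: the game is over
            return final_outcome(state)  # +1, 0, or -1 for the first player
        results = [solve(pick_box(state, m), not first_player_to_move)
                   for m in moves]
        return max(results) if first_player_to_move else min(results)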

A* search algorithm heuristic function

I am trying to find the optimal solution to a Sliding Block Puzzle of any length using the A* algorithm.
The Sliding Block Puzzle is a game with white (W) and black (B) tiles arranged on a linear game board with a single empty space (-). Given the initial state of the board, the aim of the game is to arrange the tiles into a target pattern.
For example my current state on the board is BBW-WWB and I have to achieve BBB-WWW state.
Tiles can move in these ways:
1. slide into an adjacent empty space with a cost of 1.
2. hop over another tile into the empty space with a cost of 1.
3. hop over 2 tiles into the empty space with a cost of 2.
I have everything implemented, but I am not sure about the heuristic function. It computes the shortest possible distance (minimal cost) from each misplaced tile in the current state to the closest same-colored tile position in the goal state.
Considering the given problem, for the current state BWB-W and goal state BB-WW, the heuristic function gives me a result of 3 (according to minimal distance: B=0 + W=2 + B=1 + W=0). But the actual cost of reaching the goal is not 3 but 2 (moving the misplaced W => cost 1, then the misplaced B => cost 1).
My question is: should I compute the minimal distance this way and not care about the overestimation, or should I divide it by 2? According to the ways tiles can move, one tile can, for the same cost, cover twice the distance (see moves 1 and 2).
I tried both versions. While the divided distance gives a better final path cost to the achieved goal, it visits more nodes, so it takes more time than the undivided one. What is the proper way to compute it? Which one should I use?
It is not obvious to me what an admissible heuristic function for this problem looks like, so I won't commit to saying, "Use the divided by two function." But I will tell you that the naive function you came up with is not admissible, and therefore will not give you good performance. In order for A* to work properly, the heuristic used must be admissible; in order to be admissible, the heuristic must absolutely always give an optimistic estimate. This one doesn't, for exactly the reason you highlight in your example.
(Although now that I think about it, dividing by two does seem like a reasonable way to force admissibility. I'm just not going to commit to it.)
Your heuristic is not admissible, so your A* is not guaranteed to find the optimal answer every time. An admissible heuristic must never overestimate the cost.
A better heuristic than simply dividing your distance sum by 2 would be: instead of adding the distance D of each letter to its final position, add ceil(D/2). This way, a letter 1 or 2 away gets a value of 1, a letter 3 or 4 away gets a value of 2, and so on.
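A sketch of that heuristic for the linear board, assuming the state and goal are strings like 'BWB-W':

    import math

    def heuristic(state, goal):
        """Sum ceil(D/2) over misplaced tiles, where D is the distance
        to the nearest same-colored position in the goal. A cost-1 move
        shifts a tile at most 2 positions and the cost-2 hop shifts it
        3, so each unit of cost reduces this sum by at most 1."""
        total = 0
        for i, tile in enumerate(state):
            if tile == '-' or goal[i] == tile:
                continue  # empty space, or tile already on a matching square
            d = min(abs(i - j) for j, g in enumerate(goal) if g == tile)
            total += math.ceil(d / 2)
        return total

On the BWB-W => BB-WW example above, this returns 1 + 1 = 2, matching the true cost.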

Understanding definition of minimax value

In Russell and Norvig, third edition, they give the following definition of the minimax value of a node in a game tree (zero-sum, perfect information, deterministic):
The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game.
The only thing is that, in their setup of a game, the utility of a node is only defined for terminal nodes, so how should one understand the utility of a general node? Thanks.
Utility is defined for non-terminal nodes too. It's just that for terminal nodes, utility is given directly by the game's utility function (which they call UTILITY), while for non-terminal nodes utility is computed by the minimax algorithm: the minimax value (or utility) of a non-terminal node is either the maximum or the minimum of the minimax values of its children (depending on whose move it is). The minimax value of the root is the utility of the outcome you'll reach if both you and your opponent make optimal moves from there on out.
There's a worked-out example here that should make it clearer.
Each node should represent a gamestate, given a set of actions by each player.
Utility should be defined for each gamestate, and thereby, each node. It should represent how favorable a gamestate is for a player.
In practice, minimax tree nodes are evaluated every other layer. That is to say, I don't evaluate the state of the game directly after my own move, but instead after each time my opponent(s) move.
For a two player game:
I have X possible moves.
For each of my X possible moves, there is a gamestate. We do not need the utility of these gamestates.
For each of those X gamestates, my opponent has Y possible moves.
For each of those Y possible moves, there is another gamestate. We need the utilities of these gamestates.
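The definition from the book transcribes almost directly into code; a sketch, where UTILITY, is_terminal, and successors stand in for the formal functions:

    def minimax_value(node, max_to_move):
        """The minimax value of a node: UTILITY at terminal nodes,
        otherwise the max (or min, depending on whose move it is) of
        the children's minimax values."""
        if is_terminal(node):
            return UTILITY(node)
        values = [minimax_value(child, not max_to_move)
                  for child in successors(node)]
        return max(values) if max_to_move else min(values)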
