Heuristic function for Pylos game - artificial-intelligence

Pylos is played on a pyramid-shaped board with a 4x4 base (4x4 below a 3x3 below a 2x2 below a 1). There are two players, one with White marbles and the other with Black marbles.
Each player has 15 marbles initially and takes turns placing a marble on the board on a free square (or if it is on a higher level, the 4 'support' squares of the lower level must be occupied).
The goal is that the opponent has no more marbles in his stock.
If you complete a square of marbles of the same color, you can remove two of your marbles from the board.
If you can move a marble to a higher level, you can do so (you save putting down a marble).
In short, my goal is to implement the best possible strategy for this game. For that, I have implemented a MinMax and I need a heuristic evaluation function. I can't go deeper than depth 4 in MinMax.
The naive heuristic returns the difference between my number of marbles in stock and that of my opponent.
I have tried to improve the heuristic by implementing same-color square detection, and upward movement detection, and also by giving importance to constraining an opponent's move, but the naive strategy sometimes still wins if it plays 2nd.
If you have any ideas for improvement, I would be grateful.
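The naive heuristic described above, and one way of folding extra features into it, could be sketched roughly as follows. The `PylosState` class, the feature names and the weights are hypothetical placeholders for illustration, not part of any existing Pylos implementation.

```python
from dataclasses import dataclass


@dataclass
class PylosState:
    # Hypothetical minimal state: marbles left in each player's reserve,
    # e.g. {"W": 15, "B": 15}. A real state would also hold the board.
    reserve: dict


def naive_heuristic(state, me, opponent):
    """Difference between my reserve and the opponent's reserve."""
    return state.reserve[me] - state.reserve[opponent]


def weighted_heuristic(features, weights):
    """Linear combination of feature values (feature names are illustrative)."""
    return sum(weights[name] * value for name, value in features.items())
```

With a linear form like this, features such as same-color squares or available upward moves become extra entries in `features`, and only the weights need tuning.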

Related

Game playing AI, how to find a good board evaluation function?

I'm working on an AI to play a fairly simple game, using minimax and genetic algorithms to find weights to score board states with.
The game resembles 4x4 tictactoe, but a turn can be spent to move a piece to an adjacent space, pieces come in different sizes, and larger pieces can cover up smaller pieces.
I want to score the board by looking at a variety of factors, such as how close they are to completing a 4 in a row, and how many adjacent enemy pieces could potentially be moved onto, but I have no idea what these factors should specifically be.
My ideas:
For each line, making a scoring expression based on number of friendly pieces, number of empty spaces, and number of enemy pieces, but I can't think of a simple expression to score that with weights, since the value probably won't be a linear function.
For each line, making a piecewise scoring expression split based on the number of enemy pieces in the row, and an expression based on the number of allies. Hence having 1 piece in an empty row might be worth more than having 1 piece in a row full of enemies, thus blocking them off, and the inverse would be true for having 3 in a row in a row that is already blocked.
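One way to express the per-line idea without committing to a formula is a lookup table keyed by the (friendly, enemy) counts of each line, so the value does not have to be linear. The sketch below assumes a square board stored as a list of rows, with `None` for empty cells, and it ignores piece sizes and covering entirely. The table entries are the weights a genetic algorithm could tune.

```python
def line_score(line, me, table):
    """Score one line via a table keyed by (friendly count, enemy count)."""
    friendly = sum(1 for cell in line if cell == me)
    enemy = sum(1 for cell in line if cell is not None and cell != me)
    return table.get((friendly, enemy), 0.0)  # unlisted combinations score 0


def board_score(board, me, table):
    """Sum the line scores over all rows, columns and both diagonals."""
    n = len(board)
    lines = [list(row) for row in board]
    lines += [[board[r][c] for r in range(n)] for c in range(n)]
    lines.append([board[i][i] for i in range(n)])
    lines.append([board[i][n - 1 - i] for i in range(n)])
    return sum(line_score(line, me, table) for line in lines)
```

For a 4x4 board the table has at most 15 entries, which keeps the search space for the weights small.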
Some complications I've noticed:
Having 3 pieces in a row, but then one of the enemy large pieces also in the row, is virtually worthless for anything except preventing their piece's movement.
Having 3 pieces in a row, with a small enemy piece in that row is almost a win, if you can place a large piece adjacent to their small piece to move onto it. This seems particularly hard to detect. It's also possible if this is worked into a factor, the above "number of adjacent enemies that can be moved onto" won't be necessary.
Thanks for any help. I have no clue how to proceed.

Better Heuristic function for a game (AI Minimax)

There is a game that I've programmed in java. The game is simple (refer to the figure below). There are 4 birds and 1 larva. It is a 2 player game (AI vs Human).
Larva can move diagonally forward AND diagonally backward
Birds can ONLY move diagonally forward
Larva wins if it can get to line 1 (fence)
Larva also wins if birds have no moves left
Birds CANNOT "eat" the larva.
Birds win if Larva has NO move left (cannot move at all)
When the game starts, Larva begins, then ONE bird can move (any one), then Larva, etc...
I have implemented a MiniMax (Alpha Beta Pruning) and I'm using the following evaluate() function (heuristic function).
Let us give the following numbers to each square on the board.
Therefore, our evaluation function will be
h(n) = value of position of larva - value of position of bird 1 - value of position of bird 2 - value of position of bird 3 - value of position of bird 4
The Larva will try to MAXIMIZE the heuristic value, whereas the Birds will try to MINIMIZE it.
Example:
However, this is a simple and naive heuristic. It does not act in a smart manner. I am a beginner in AI and I would like to know what I can do to IMPROVE this heuristic function.
What would be a good/informed heuristic?
How about this :
Maximum :larva
Minimum :birds
H(t)=max_distance(larva,line_8)+Σmin_distance(bird_n,larva)
or
H(t)=Σmin_distance(bird_n,larva) - min_distance(larva,line_1)
max_distance(larva,line_8): to reflect the condition that larva is closer to the line 1.
Σmin_distance(bird_n,larva): to reflect the condition that birds are closer to the larva(to block it).
I believe there are still many things that could be considered. For example, the bird closest to the larva should have high priority when choosing which bird to move. But the direction of the function above makes sense, and many details can easily be added to improve it.
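The second formula above could be sketched as follows. The answer does not define `min_distance`, so this sketch assumes squares are (row, column) pairs with row 1 being the fence, uses the Chebyshev distance between a bird and the larva, and uses the row difference as the larva's distance to line 1. All of these are assumptions made for illustration.

```python
def chebyshev(a, b):
    """Distance in king-style steps between two (row, column) squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def heuristic(larva, birds):
    """Sum of bird-to-larva distances minus the larva's distance to row 1.

    Larger is better for the larva (MAX): birds far away, fence close.
    """
    birds_to_larva = sum(chebyshev(bird, larva) for bird in birds)
    larva_to_fence = larva[0] - 1
    return birds_to_larva - larva_to_fence
```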
There is one simple way to improve your heuristic considerably. In your current heuristic, the value of square A1 is 8 less than the value of square A8. This makes the Birds inclined to move towards the left side of the game board, as a move to the left will always be higher than a move to the right. This is not accurate. All squares on row 1 should have the same value. Thus assign all squares in row 1 a 1, in row 2 a 2, etc. This way the birds and larva won't be inclined to move to the left, and instead can focus on making a good move.
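A minimal sketch of this row-based valuation, assuming squares are (row, column) pairs with rows numbered from 1; the function names are illustrative. Whether a larger value favours the larva depends on the numbering used in the original figure, which is not reproduced here.

```python
def square_value(square):
    """Every square in a row gets the same value: its row number."""
    row, _column = square
    return row


def evaluate(larva, birds):
    """h(n) = value of the larva's square minus the values of the birds' squares."""
    return square_value(larva) - sum(square_value(bird) for bird in birds)
```

Because the column no longer contributes, moving left or right within a row leaves the evaluation unchanged.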
You could take into account the fact that birds will have a positional advantage over the larva when the larva is on the sides of the board, so if Larva is MAX then change the side tile values of the board to be smaller.

Proper Heuristic Mechanism For Hill Climbing

The following problem is an exam exercise I found from an Artificial Intelligence course.
"Suggest a heuristic mechanism that allows this problem to be solved, using the Hill-Climbing algorithm. (S=Start point, F=Final point/goal). No diagonal movement is allowed."
Since it's obvious that Manhattan Distance or Euclidean Distance will send the robot to (3,4) and no backtracking is allowed, what is a possible solution (heuristic mechanism) to this problem?
EDIT: To make the problem clearer, I've marked some of the Manhattan distances on the board:
Using Manhattan distance, the robot's next move would obviously be to (3,4), since that cell has a heuristic value of 2. HC will choose it and get stuck forever. The aim is to find a heuristic that never takes that path.
I thought of the obstructions as being hot, and that heat rises. I make the net cost of a cell the sum of the Manhattan metric distance to F plus a heat-penalty. Thus there is an attractive force drawing the robot towards F as well as a repelling force which forces it away from the obstructions.
There are two types of heat penalties:
1) It is very bad to touch an obstruction. Look at the 2 or 3 neighboring cells in the row immediately below a given cell. Add 15 for every obstruction cell which is directly below the given cell and 10 for every diagonal neighbor below it which is an obstruction.
2) For cells not in direct contact with the obstructions, the heat is more diffuse. I calculate it as 6 times the average number of obstruction blocks below the cell, both in its column and in its neighboring columns.
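The two heat penalties could be combined with the Manhattan distance roughly as below. This is one reading of the description, not the answerer's exact computation: it assumes the grid is a list of strings with `#` marking obstructions, row 0 at the top so that "below" means a larger row index, and it takes the average over the cell's own column and its in-bounds neighboring columns.

```python
def heat_cost(grid, cell, goal):
    """Manhattan distance to the goal plus a heat penalty from obstructions below."""
    rows, cols = len(grid), len(grid[0])
    r, c = cell
    manhattan = abs(r - goal[0]) + abs(c - goal[1])
    columns = [k for k in (c - 1, c, c + 1) if 0 <= k < cols]

    # Penalty 1: direct contact with an obstruction in the row just below.
    contact = 0
    if r + 1 < rows:
        for k in columns:
            if grid[r + 1][k] == "#":
                contact += 15 if k == c else 10
    if contact:
        return manhattan + contact

    # Penalty 2: diffuse heat, 6 times the average obstruction count below.
    below = [sum(1 for i in range(r + 1, rows) if grid[i][k] == "#")
             for k in columns]
    return manhattan + 6 * sum(below) / len(below)
```

Hill climbing would then move to the neighboring cell with the lowest `heat_cost` at each step.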
The following shows the result of combining this all, as well as the path taken from S to F:
A crucial point is the way that the averaging causes the robot to turn left rather than right when it hits the top row. The unheated columns towards the left make that the cooler direction. It is interesting to note how all cells (with the possible exception of the two at the upper-right corner) are drawn to F by this heuristic.

Does the min player in the minimax algorithm play optimally?

In the minimax algorithm, the first player plays optimally, which means it wants to maximise its score, and the second player tries to minimise the first player's chances of winning. Does this mean that the second player also plays optimally to win the game? Trying to choose some path in order to minimise the first player's chances of winning also means trying to win?
I am actually trying to solve this task from TopCoder: EllysCandyGame. I wonder whether we can apply the minimax algorithm here. That statement "both to play optimally" really confuses me, and I would like some advice on how to deal with this type of problem, if there is some general idea.
Yes, you can use the minimax algorithm here.
The problem statement says that the winner of the game is "the girl who has more candies at the end of the game." So one reasonable scoring function you could use is the difference in the number of candies held by the first and second player.
Does this mean that the second player also plays optimally to win the game?
Yes. When you are evaluating a MIN level, the MIN player will always choose the path with the lowest score for the MAX player.
Note: both the MIN and MAX levels can be implemented with the same code, if you evaluate every node from the perspective of the player making the move in that round, and convert scores between levels. If the score is a difference in number of candies, you could simply negate it between levels.
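The single-code-path idea from the note above is usually called negamax. Here is a minimal sketch on a toy box-picking game. The toy rules (take one whole box per turn) and the helper names are assumptions for illustration and are not the actual EllysCandyGame rules. The state is `(boxes, diff)`, where `diff` is the candy lead of the player to move.

```python
def negamax(state, moves, apply_move, score):
    """Value of `state` from the viewpoint of the player about to move."""
    legal = moves(state)
    if not legal:
        return score(state)
    # The child's value is from the opponent's viewpoint, so negate it.
    return max(-negamax(apply_move(state, m), moves, apply_move, score)
               for m in legal)


def moves(state):
    boxes, _diff = state
    return [i for i, candies in enumerate(boxes) if candies > 0]


def apply_move(state, i):
    boxes, diff = state
    remaining = boxes[:i] + (0,) + boxes[i + 1:]
    # After taking box i the other player moves, so flip the sign of the lead.
    return remaining, -(diff + boxes[i])


def score(state):
    return state[1]  # candy lead of the player to move at the end
```

For example, `negamax(((3, 5), 0), moves, apply_move, score)` evaluates the position with boxes of 3 and 5 candies and no lead yet.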
Trying to choose some path in order to minimize the first player's chances of winning also means trying to win?
Yes. The second player is trying to minimize the first player's score. A reasonable scoring function will give the first player a lower score for a loss than a tie.
I wonder whether we can apply the minimax algorithm here.
Yes. If I've read the problem correctly, the number of levels will be equal to the number of boxes. If there's no limit on the number of boxes, you'll need to use an n-move lookahead, evaluating nodes in the minimax tree to a maximum depth.
Properties of the game:
At each point, there are a limited, well defined number of moves (picking one of the non-empty boxes)
The game ends after a finite number of moves (when all boxes are empty)
As a result, the search tree consists of a finite number of leaves. You are right that by applying Minimax, you can find the best move.
Note that you only have to evaluate the game at the final positions (when there are no more moves left). At that point, there are only three results: The first player won, the second player won, or it is a draw.
Note that the standard Minimax algorithm has nothing to do with probabilities. The result of the Minimax algorithm determines the perfect play for both sides (assuming that both sides make no mistakes).
By the way, if you need to improve the search algorithm, a safe and simple optimization is to apply Alpha Beta pruning.
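A minimal sketch of Alpha Beta pruning, assuming the game tree is given as nested lists whose leaves are numeric scores from the first player's point of view. This representation is chosen only to keep the example short.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping branches that cannot affect the result."""
    if not isinstance(node, list):
        return node  # leaf: a final score
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # MIN already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # MAX already has a better option elsewhere
    return best
```

The pruning is safe in the sense that it returns the same value as plain Minimax while visiting fewer nodes.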

Find optimal/good-enough strategy and AI for the game 'Proximity'?

'Proximity' is a strategy game of territorial domination similar to Othello, Go and Risk.
Two players, uses a 10x12 hex grid. Game invented by Brian Cable in 2007.
Seems to be a worthy game for discussing a) optimal algorithm then b) how to build an AI.
Strategies are going to be probabilistic or heuristic-based, due to the randomness factor, and the insane branching factor (20^120).
So it will be kind of hard to compare objectively.
A compute time limit of 5 seconds max per turn seems reasonable => this rules out all brute-force attempts. (Play the game's AI on Expert level to get a feel - it does a very good job based on some simple heuristic)
Game: Flash version here, iPhone version iProximity here and many copies elsewhere on the web
Rules: here
Object: to have control of the most armies after all tiles have been placed. You start with an empty hexboard. Each turn you receive a randomly numbered tile (value between 1 and 20 armies) to place on any vacant board space. If this tile is adjacent to any ALLY tiles, it will strengthen each of those tile's defenses +1 (up to a max value of 20). If it is adjacent to any ENEMY tiles, it will take control over them IF its number is higher than the number on the enemy tile.
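The placement rule described above could be sketched as follows. The board representation (a dict from axial hex coordinates to `(owner, value)` pairs) and the neighbor offsets are assumptions for illustration. The sketch also assumes a captured tile keeps its number, which the rules summary above does not state either way.

```python
# Six neighbors of a hex cell in axial (q, r) coordinates (an assumed layout).
HEX_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def place_tile(board, pos, owner, value):
    """Return a new board after `owner` places a tile of `value` at `pos`."""
    new_board = dict(board)
    new_board[pos] = (owner, value)
    q, r = pos
    for dq, dr in HEX_DIRS:
        neighbor = (q + dq, r + dr)
        if neighbor not in board:
            continue  # vacant or off the board
        n_owner, n_value = board[neighbor]
        if n_owner == owner:
            # Ally tiles are strengthened by 1, capped at 20.
            new_board[neighbor] = (owner, min(20, n_value + 1))
        elif value > n_value:
            # Weaker enemy tiles change hands.
            new_board[neighbor] = (owner, n_value)
    return new_board


def army_total(board, owner):
    """Total armies controlled by `owner`, the quantity the game scores."""
    return sum(v for o, v in board.values() if o == owner)
```

A one-ply evaluation could call `place_tile` for every vacant cell and compare the resulting `army_total` difference between the two players.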
Thoughts on strategy: Here are some initial thoughts; setting the computer AI to Expert will probably teach a lot:
minimizing your perimeter seems to be a good strategy, to prevent flips and minimize worst-case damage
like in Go, leaving holes inside your formation is lethal, only more so with the hex grid because you can lose armies on up to 6 squares in one move
low-numbered tiles are a liability, so place them away from your main territory, near the board edges and scattered. You can also use low-numbered tiles to plug holes in your formation, or make small gains along the perimeter which the opponent will not tend to bother attacking.
a triangle formation of three pieces is strong since they mutually reinforce, and also reduce the perimeter
Each tile can be flipped at most 6 times, i.e. when its neighbor tiles are occupied. Control of a formation can flow back and forth. Sometimes you lose part of a formation and plug any holes to render that part of the board 'dead' and lock in your territory/ prevent further losses.
Low-numbered tiles are obvious-but-low-valued liabilities, but high-numbered tiles can be bigger liabilities if they get flipped (which is harder). One lucky play with a 20-army tile can cause a swing of 200 (from +100 to -100 armies). So tile placement will have both offensive and defensive considerations.
Comment 1,2,4 seem to resemble a minimax strategy where we minimize the maximum expected possible loss (modified by some probabilistic consideration of the value ß the opponent can get from 1..20 i.e. a structure which can only be flipped by a ß=20 tile is 'nearly impregnable'.)
I'm not clear what the implications of comments 3,5,6 are for optimal strategy.
Interested in comments from Go, Chess or Othello players.
(The sequel ProximityHD for XBox Live allows 4-player cooperative or competitive local multiplayer and increases the branching factor, since you now have 5 tiles in your hand at any given time, of which you can only play one. Reinforcement of ally tiles is increased to +2 per ally.)
A former member of the U of A GAMES group here.
That branching factor is insane. Far worse than Go.
Basically, you're hooped.
The problem with this game is that it is not deterministic due to the selection of a random tile. This actually adds another layer of nodes between each existing layer of nodes in the tree. You'll be interested in my publications on *-Minimax to learn about techniques for searching in stochastic domains.
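The extra layer of chance nodes can be illustrated with plain expectiminimax, which is the unpruned baseline that *-Minimax improves on, not *-Minimax itself. The tree encoding below is an assumption made for the example: a leaf is a number, and an inner node is a `(kind, children)` pair where chance nodes list `(probability, child)` pairs.

```python
def expectiminimax(node):
    """Value of a game tree with MAX, MIN and chance nodes."""
    if isinstance(node, (int, float)):
        return node  # leaf: evaluation of the position
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":
        # Expected value over the random outcomes (e.g. the tile drawn).
        return sum(p * expectiminimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")
```

In Proximity each chance node would have up to 20 outcomes, one per possible tile value, which is part of why the tree grows so quickly.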
In order to complete one-ply searches before the end of this century, you're going to need some very aggressive forward pruning techniques. Throw provably best move out the window early and concentrate on building good move ordering.
For general algorithms, I would suggest checking the research done by the Alberta University AI Games group: http://games.cs.ualberta.ca Many of the algorithms there are guaranteed to find optimal policies. However, I doubt you're really interested in finding the optimal one; aim for "good enough" unless you want to sell that game in Korea :D
From your description, I understand the game to be two-player with full observability (i.e. no hidden units and such) and fully deterministic (i.e. the outcomes of a player's actions do not require rolling). If so, you should take a look at the real-time bounded-search minimax derivatives proposed by the U Alberta guys. Being able to bound the depth of the backups of the value function as well would perhaps be a nice way to add a "difficulty level" to your game. They have been doing some work - a bit fishy imo - on sampling the search space for improving value function estimates.
About the "strategy" section you describe: in the framework I am mentioning, you will have to encode that knowledge as an evaluation function. Look at the work of Michael Büro and others - also in the U Alberta group - for examples of such knowledge engineering.
Another possibility would be to pose the problem as a Reinforcement Learning problem, where adversary moves are compiled as "afterstates". Look that up in the Sutton & Barto book: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html However, the value function for an RL problem resulting from such a compilation might prove a bit difficult to solve optimally - the number of states will blow up like an H-Bomb. If you see how to use a factored representation, though, things can be much easier. And your "strategy" could perhaps be encoded as some shaping function, which would speed up the learning process considerably.
EDIT: Damn English prepositions

Resources