I am building a 2048 AI, and it has led to an observation that I find rather peculiar.
The code is not well optimized yet (and it is written in Python), so I can only search to a depth of 3 moves (plies).
As the results show, expectimax clearly outperforms minimax (the results are similar for minimax without alpha-beta pruning). Both use the same evaluation function, and neither searches deeper than 3 moves.
AFAIK, minimax should play optimally in such games, but that doesn't seem to be the case here. Is this observation due to the fact that:
I am not going deep enough into the search tree?
2048 is a stochastic game, and that is hampering the performance of minimax (or boosting the performance of expectimax)?
The opponent (the 2048 game logic) is not playing optimally (it is a random adversary that places a 2 tile with 90% probability and a 4 tile with 10% probability)? If so, why should this affect the performance of minimax?
Anything else that is not apparent to me?
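For reference, the two searches differ only in how they back up values at the tile-spawn nodes. Here is a minimal sketch of that difference; the board values and the two candidate moves are made up for illustration, and only the 90%/10% spawn probabilities come from the game.

```python
def chance_value_minimax(outcomes):
    """Worst-case backup: treat the tile spawn as an adversary's choice."""
    return min(value for _, value in outcomes)

def chance_value_expectimax(outcomes):
    """Expected-value backup: weight each spawn by its probability."""
    return sum(prob * value for prob, value in outcomes)

# Hypothetical evaluations after two candidate moves.
# Each entry is (spawn probability, heuristic value of the resulting board).
move_a = [(0.9, 100.0), (0.1, 10.0)]  # usually great, rarely bad
move_b = [(0.9, 30.0), (0.1, 20.0)]   # consistently mediocre

for name, backup in [("minimax", chance_value_minimax),
                     ("expectimax", chance_value_expectimax)]:
    scores = {"A": backup(move_a), "B": backup(move_b)}
    best = max(scores, key=scores.get)
    print(name, "prefers move", best)
```

With these numbers, minimax prefers move B because it only looks at the rare worst case, while expectimax prefers move A because the bad spawn is unlikely.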
I have read in some articles on evolutionary computing that the algorithms generally converge to a single solution due to the phenomenon of genetic drift. There is a lot of content on the Internet, but I can't get a deep understanding of this concept. I need to know, simply and precisely:
What is genetic drift in the context of evolutionary computing?
How does it affect the convergence of an evolutionary algorithm?
To better understand the original concept of genetic drift (in biology), I suggest you read this Khan Academy article. Simply put, you can think of it as an evolutionary phenomenon in which the frequency of one or more alleles (versions of a gene) in a population changes due to random factors (unrelated to the fitness of each individual). If the fittest individual of a population is, by bad luck, struck by lightning and dies before reproducing, it won't leave any offspring (although it has the highest fitness!). This is an example (somewhat absurd, I know) of genetic drift.
Now, in the specific context of evolutionary algorithms, this paper provides a good summary on the subject:
EAs genetic drift can be as a result of a combination of factors,
primarily related to selection, fitness function and representation.
It happens by unintentional loss of genotypes. For example, random
chance that a good genotype solution never gets selected for
reproduction. Or, if there is a ‘lifespan’ to a solution and it dies
before it can reproduce. Normally such a genotype only resides in the
population for a limited number of generations.
(Sloss & Gustafson, 2019)
Finally, I will give you a real example of genetic drift acting on a genetic algorithm. Recently, I've used a simple neuroevolution algorithm to create an agent capable of playing the Snake game (GitHub repo). In my implementation of the game, the apples appear in random positions of the screen. When executing the evolutionary process for the first time, I noticed a big fluctuation in the population's best fitness between consecutive generations - overall, it wasn't improving much. Because of this, my algorithm was unable to converge to a good solution.
After some debugging, I found out that this was being caused by genetic drift. Because the apples spawned in random positions, some individuals, not necessarily the fittest, were lucky and got "easy apples", thus achieving a high fitness and leaving more offspring. Do you see the problem here?
Suppose that snake A is better at the game than snake B, because it can move towards the food, while B only moves randomly. Now, suppose that the first food that appeared for snake A was in a corner of the screen (a difficult position) and A died shortly after eating the apple. Now, suppose that snake B was lucky enough to have 3 apples spawn in a row, one after the other. Although B is "dumber" than A, it will leave more offspring, because it achieved a greater fitness. B's offspring will "pollute" the next generation, because they'll probably be "dumb" like B.
I solved the problem using a better apple positioning algorithm (I defined a minimum distance between the spawning position of two consecutive apples) and by calculating each individual's final fitness as the average of its fitness in several playing sessions. This greatly reduced (although it did not eliminate) the interference of genetic drift in my algorithm.
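The effect of averaging can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not the code from my repository: `play_session` is a hypothetical stand-in that models a game score as true skill plus Gaussian luck, and the skill values and noise level are arbitrary.

```python
import random
import statistics

def play_session(skill, rng):
    """Hypothetical stand-in for one game: true skill plus luck from apple placement."""
    return skill + rng.gauss(0.0, 5.0)

def averaged_fitness(skill, sessions, rng):
    """Final fitness = mean score over several independent sessions."""
    return statistics.fmean([play_session(skill, rng) for _ in range(sessions)])

def misranking_rate(sessions, trials=2000, seed=0):
    """Fraction of trials in which the weaker snake B outscores the better snake A."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        a = averaged_fitness(10.0, sessions, rng)  # snake A: better (assumed skill)
        b = averaged_fitness(8.0, sessions, rng)   # snake B: worse (assumed skill)
        wrong += b > a
    return wrong / trials

print("1 session:  ", misranking_rate(sessions=1))
print("20 sessions:", misranking_rate(sessions=20))
```

The more sessions are averaged, the less often the lucky but weaker individual is ranked above the better one, at the price of more evaluation time per generation.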
I hope this helps. You can also take a look at this video (it's in Portuguese, but English subtitles are available), where I explain some of the strategies I used to make the Snake AI.
I am programming a chess AI using an alpha-beta pruning algorithm that works at a fixed depth. I was quite surprised to see that when I set the AI to a higher depth, it played even worse. But I think I have figured out why.
It currently works this way: all moves from the current position are listed, and for each of them, every possible follow-up move is listed, and so on, until the fixed depth is reached. There, the board is evaluated by checking which pieces are present and assigning a value to each piece type. Then the value bubbles up to the root using the minimax algorithm with alpha-beta.
But I need to account for the move order. For instance, if there are two options, a checkmate in 2 moves and another in 7 moves, then the first one has to be chosen. The same goes for taking a queen in either 3 or 6 moves.
But since I only evaluate the board at the deepest nodes, and the evaluation only looks at the board itself, it doesn't know what the previous moves were.
I'm sure there is a better way to evaluate the game that can account for the way the pieces moved through the search.
EDIT: I figured out why it was playing weirdly. When I searched for moves (depth 5), the search ended with an AI move (a MAX node level). As a result, it counted moves such as taking a knight with a rook, even if that left the rook vulnerable (the algorithm cannot see this because it doesn't search any deeper).
So I changed that and set the depth to 6, so that the search ends with a MIN node level.
Its moves now make more sense, as it actually retaliates when attacked (which it sometimes didn't do before, playing a dumb move instead).
However, it is now more defensive than ever and does not really play: it moves its knight, then moves it back to where it was before, and therefore ends up losing.
My evaluation is very standard: only the presence of pieces matters to the node value, so the AI is free to pick the strategy it wants without being forced to do things it doesn't need to.
Considering that, is this normal behaviour for my algorithm? Is it a sign that my alpha-beta algorithm is badly implemented, or is it perfectly normal with such an evaluation function?
If you want to select the shortest path to a win, you probably also want to select the longest path to a loss. If you were to try to account for this in the evaluation function, you would have to track the path length along with the score and have separate evaluation functions for min and max. That is a lot of complex and confusing overhead.
The standard way to solve this problem is with an iterative deepening approach to the search. First you search 1 move deep for all players, then you run the entire search again 2 moves deep for each player, and so on until you run out of time. If you find a win in 2 moves, you stop searching and you'll never run into the 7-move situation. This also solves your problem of searching odd depths and getting strange evaluations. It has many other benefits, like always having a move ready to go when you run out of time, and some significant algorithmic improvements because you won't need the overhead of tracking visited states.
As for the defensive play, that is a little bit of the horizon effect and a little bit of the evaluation function. If you have a perfect evaluation function, the algorithm only needs to see one move deep. If it's not perfect (and it's not), then you'll need to search much deeper. Last I checked, algorithms that can run on your laptop and see about 8 plies deep (a ply is one move by one player) can compete with strong humans.
In order to let the program choose the shortest checkmate, the standard approach is to give a higher value to mates that occur closer to the root. Of course, you must detect checkmates, and give them some score.
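A minimal sketch of such a mate score, assuming a negamax-style search in which every node is scored from the point of view of the side to move. The constant `MATE` and the helper names are illustrative, not taken from any particular engine.

```python
MATE = 100_000  # hypothetical base value, far above any material evaluation

def mated_score(ply):
    """Negamax score for the side to move when it is checkmated `ply` half-moves from the root."""
    return -(MATE - ply)

def from_root(score, ply):
    """Convert a score seen by the side to move at `ply` into the root player's view."""
    return score if ply % 2 == 0 else -score

win_in_2 = from_root(mated_score(3), 3)     # root delivers mate on its 2nd move (ply 3)
win_in_7 = from_root(mated_score(13), 13)   # root delivers mate on its 7th move (ply 13)
loss_in_1 = from_root(mated_score(2), 2)    # root is mated after 2 plies
loss_in_6 = from_root(mated_score(12), 12)  # root is mated after 12 plies

print(win_in_2 > win_in_7)    # the faster win scores higher
print(loss_in_6 > loss_in_1)  # the slower loss scores higher
```

Because the distance from the root is folded into the score itself, the search prefers the quickest mate and, when it is losing, the longest defence, without any change to the static evaluation.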
Also, from what you describe, you need a quiescence search.
All of this (and much more) is explained in the chess programming wiki. You should check it out:
https://chessprogramming.wikispaces.com/Checkmate#MateScore
https://chessprogramming.wikispaces.com/Quiescence+Search
I have implemented a Connect 4 AI to play in a tournament for my class. It uses depth-limited minimax with alpha-beta pruning. We are allowed to pass one depth as an argument for the tournament. My program will make a move, then another student's program will make a move, and this continues until there is a winner. It is also a modified Connect 4 in which all 42 spots of the 6×7 game board are filled, every 4-in-a-row is worth a point, and the most points wins.
My question is about the alpha-beta pruning. Our moves have to take "about 1 second", so anything under 2 seconds should be fine. Running my program without alpha-beta pruning allows a move about 1.3 seconds or less at depth 6. Depth 7 is unacceptable. Now, with alpha-beta pruning, can I guarantee that I can change my depth to go deeper? I know on average that it will let me go deeper, but I believe on the worst case nothing gets pruned, and I would exceed the time limit. Is this correct?
This IS correct: in the worst-case scenario, alpha-beta is just as slow as minimax.
But there is only a really small chance of that happening. To optimise alpha-beta and prevent that problem, search for "move ordering alpha beta" on Google.
If you have to stay inside a time limit, I suggest using iterative deepening (search with depth 1, 2, ..., x). Repeating the shallow searches costs little, because the tree grows exponentially with depth and the deepest iteration dominates the total time. If your program runs out of time, just play the move you found with the previous search depth.
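Here is a minimal sketch of time-limited iterative deepening. To keep it self-contained it uses a toy take-away game (take 1 to 3 stones, whoever takes the last stone wins) instead of Connect 4; the game, the `Timeout` exception and all function names are assumptions made for illustration.

```python
import time

class Timeout(Exception):
    """Raised inside the search when the deadline has passed."""

def moves(pile):
    return [take for take in (1, 2, 3) if take <= pile]

def negamax(pile, depth, alpha, beta, deadline):
    if time.monotonic() > deadline:
        raise Timeout
    if pile == 0:
        return -1  # the previous player took the last stone: side to move has lost
    if depth == 0:
        return 0   # placeholder heuristic for an unresolved position
    best = -2
    for take in moves(pile):
        score = -negamax(pile - take, depth - 1, -beta, -alpha, deadline)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # alpha-beta cutoff
    return best

def search_root(pile, depth, deadline):
    best_move, best_score = None, -2
    for take in moves(pile):
        score = -negamax(pile - take, depth - 1, -2, 2, deadline)
        if score > best_score:
            best_move, best_score = take, score
    return best_move

def iterative_deepening(pile, time_limit, max_depth=64):
    deadline = time.monotonic() + time_limit
    best_move = moves(pile)[0]  # always have some legal move ready
    for depth in range(1, max_depth + 1):
        try:
            best_move = search_root(pile, depth, deadline)
        except Timeout:
            break  # keep the move from the last completed depth
    return best_move

print(iterative_deepening(pile=10, time_limit=1.0))
```

An interrupted iteration is simply discarded, so the move that gets played always comes from a search that ran to completion.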
I have to create an AI which has to compete against other AIs.
Both AIs will run on the same hardware, have the same amount of processing time and memory. I know the opponent AI will be using the minimax algorithm with alpha beta pruning.
Now my question is - what are some approaches for beating such an opponent? If I use minimax myself - then both AI's perfectly predict each other's moves and the game resolves based on an inherent property of the game (first move wins etc).
The obvious solution is to somehow see further ahead into the possible moves which would allow for a better evaluation - since the processor time is the same I couldn't evaluate to a greater depth (assuming the opposing AI code is equally optimized). I could use a precomputed tree for an extra advantage but without a super computer I certainly couldn't "solve" any nontrivial game.
Is there some value in intentionally picking a non optimal node such as one that alpha beta would have pruned? This could potentially incur a CPU time penalty on the opponent as they'd have to go back and re-evaluate the tree. It would incur a penalty on me as well as I'd have to evaluate the minimax tree + alpha beta to see which nodes alpha beta would prune without reaping any direct benefits.
What are some other strategies for optimizing against such an opponent?
First, there isn't any value in choosing a non-optimal line of play. Assuming your opponent will play optimally (and that's a fundamental assumption of minimax search), your opponent will make a move that capitalizes on the mistake. A good game engine will have a hashed refutation table entry containing the countermove for your blunder, so you'll gain no time by making a wild move. Making bad moves allows a computer opponent to find good moves faster.
The key thing to realize with a game like Othello is that you can't be sure what the optimal move is until late in the game. That's because the search tree is almost always too large to be exhaustively searched for all won or lost positions, and so minimax can't tell you with certainty which moves will lead to victory or defeat. You can only heuristically decide where to stop searching, arbitrarily call those nodes "terminal", and then run an evaluation function that guesses the win/loss potential of a position.
The evaluation function's job is to assess the value of a position, typically using static metrics that can be computed without searching the game tree further. Piece counts, positional features, endgame tablebases, and even opponent psychology can play a role here. The more intelligence you put into your evaluation function, generally the better your engine will play. But the point of static evaluation is to replace searches that would be too expensive. If your evaluation function does too much, or does what it does too inefficiently, it can become slower than the game tree search needed to obtain the same information. Knowing what to put in an evaluation function and when to use static evaluation instead of search is a large part of the art of writing a good game engine.
There are a lot of ways to improve standard minimax with AB pruning. For example, the killer heuristic attempts to improve the order moves are looked at, since AB's efficiency is better with well-ordered moves.
A lot of information on different search enhancements and variations on AB can be found at chessprogramming.wikispaces.com.
I hope this isn't too much of an arbitrary question, but I have been looking through the source codes of Faile and TSCP and I have been playing them against each other. As far as I can see the engines have a lot in common, yet Faile searches ~1.3 million nodes per second while TSCP searches only 300k nodes per second.
The source code for faile can be found here: http://faile.sourceforge.net/download.php. TSCP source code can be found here: http://www.tckerrigan.com/Chess/TSCP.
After looking through them I see some similarities: both use an array board representation (although Faile uses a 144-square board), both use an alpha-beta search with some sort of transposition table, and both have very similar evaluation functions. The main difference I can find is that Faile keeps a redundant representation of the board by also maintaining arrays of the piece locations. This means that when the moves are generated (by very similar functions in both programs), Faile loops over fewer irrelevant squares, and maintaining these arrays costs considerably less than the work it saves.
My question is: why is there a 4x difference in the speed of these two programs? Also, why does Faile consistently beat TSCP (I estimate about a ~200 ELO difference just by watching their moves)? For the latter, it seems to be because Faile is searching several plies deeper.
Short answer: TSCP is very simple (as you can guess from its name). Faile is more advanced, and its developers spent some time optimizing it. So it is reasonable for Faile to be faster, which also means a deeper search and a higher ELO.
Long answer: As far as I remember, the part of an alpha-beta program that influences performance the most is the move generator. TSCP's move generator does not generate moves in any particular order. Faile's generator (as you noticed) uses a piece list, which is sorted in order of decreasing piece value, so it generates the more important moves first. This allows alpha-beta pruning to cut more unneeded moves and makes the search tree less branchy. A less branchy tree can be deeper with the same number of nodes, which allows a deeper search.
Here is a very simplified example of how move order allows a faster search. Suppose White's last move was silly: a piece was moved to an unprotected square. If we find a Black move that captures this piece, we can ignore all the other moves not yet examined and go back to processing White's move list. A queen controls much more space than a pawn, so it has a better chance of capturing this piece; if we look at the queen's moves first, we are more likely to skip the unneeded moves.
I didn't compare the other parts of these programs, but most likely Faile optimizes them better as well. Things like the alpha-beta algorithm itself, variable search depth, and static position analysis may also be optimized.
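The effect of move order on pruning can also be seen on a random game tree. The sketch below is not how Faile or TSCP order moves: `ideal_order` cheats by sorting children by their true minimax value, which a real engine can only approximate with heuristics such as a sorted piece list. The tree shape, seed and function names are arbitrary choices for illustration.

```python
import random

def build_tree(depth, branching, rng):
    """Uniform game tree as nested lists; leaves are integer evaluations."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]

def minimax(node, maximizing):
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha, beta, order, counter):
    if isinstance(node, int):
        counter[0] += 1  # count every leaf that actually gets evaluated
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in order(node, maximizing):
        value = alphabeta(child, not maximizing, alpha, beta, order, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # remaining siblings cannot change the result
    return best

def natural_order(node, maximizing):
    """No ordering: children are searched as generated."""
    return node

def ideal_order(node, maximizing):
    """Best move first (uses the true minimax values, for illustration only)."""
    return sorted(node, key=lambda child: minimax(child, not maximizing),
                  reverse=maximizing)

def count_leaves(tree, order):
    counter = [0]
    value = alphabeta(tree, True, float("-inf"), float("inf"), order, counter)
    return value, counter[0]

tree = build_tree(depth=6, branching=4, rng=random.Random(1))
print("total leaves:", 4 ** 6)
print("unordered (value, leaves):", count_leaves(tree, natural_order))
print("best-first (value, leaves):", count_leaves(tree, ideal_order))
```

Both runs return the same value, but the best-first run evaluates far fewer leaves, which is the budget an engine can then spend on searching deeper.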
TSCP has no hash tables (-75 ELO).
TSCP has no killer moves for move ordering (-50 ELO).
TSCP has no null move (-100 ELO).
TSCP has a badly designed attack function (-25 ELO).
Together, these 4 things account for a difference of about 250 ELO points. They will also increase the number of nodes per second, but you cannot compare nodes per second between different engines, because programmers can use different interpretations of what a node is.