Reinforcement learning with incomplete information

I want to program an AI using reinforcement learning. The game is Ghosts! (https://en.wikipedia.org/wiki/Ghosts_(board_game)):
In Ghosts!, each player has four good ghosts and four evil ghosts – but only the player who owns a ghost can see whether it's good or evil (as with the pieces in Stratego). These ghosts start the game in the back rows of a 6x6 game board with the corners removed. Each turn, a player moves one of his ghosts one square orthogonally. Moving into an opponent's ghost captures that ghost. To win, you must get all of your own evil ghosts captured, capture all of your opponent's good ghosts, or move one of your good ghosts off the board from one of your opponent's corner squares.
Which algorithm would you use to program an AI for this game?

Related

What is genetic drift and how does it affect EAs?

I have read in some articles on evolutionary computing that the algorithms generally converge to a single solution due to the phenomenon of genetic drift. There is a lot of content on the Internet, but I can't get a deep understanding of this concept. I need to know, simply and precisely:
What is genetic drift in the context of evolutionary computing?
How does it affect the convergence of an evolutionary algorithm?
To better understand the original concept of genetic drift (in biology), I suggest you read Khan Academy's article on the topic. Simply put, you can think of it as an evolutionary phenomenon in which the frequency of one or more alleles (versions of a gene) in a population changes due to random factors (unrelated to the fitness of each individual). If the fittest individual in a population is struck, out of bad luck, by lightning and dies before reproducing, it won't leave offspring (despite having the highest fitness!). This is an example (somewhat absurd, I know) of genetic drift.
Now, in the specific context of evolutionary algorithms, this paper provides a good summary on the subject:
EAs genetic drift can be as a result of a combination of factors, primarily related to selection, fitness function and representation. It happens by unintentional loss of genotypes. For example, random chance that a good genotype solution never gets selected for reproduction. Or, if there is a 'lifespan' to a solution and it dies before it can reproduce. Normally such a genotype only resides in the population for a limited number of generations. (Sloss & Gustafson, 2019)
Finally, I will give you a real example of genetic drift acting on a genetic algorithm. Recently, I used a simple neuroevolution algorithm to create an agent capable of playing the Snake game (GitHub repo). In my implementation of the game, the apples appear at random positions on the screen. When executing the evolutionary process for the first time, I noticed a big fluctuation in the population's best fitness between consecutive generations - overall, it wasn't improving much. Because of this, my algorithm was unable to converge to a good solution.
After some debugging, I found out that this was being caused by genetic drift. Because the apples spawned in random positions, some individuals, not necessarily the fittest, were lucky and got "easy apples", thus achieving a high fitness and leaving more offspring. Do you see the problem here?
Suppose that snake A is better at the game than snake B, because it can move towards the food, while B only moves randomly. Now, suppose that the first food that appeared for snake A was in a corner of the screen (a difficult position) and A died shortly after eating the apple. Now, suppose that snake B was lucky enough to have 3 apples spawn in a row, one after the other. Although B is "dumber" than A, it will leave more offspring, because it achieved a greater fitness. B's offspring will "pollute" the next generation, because they'll probably be "dumb" like B.
I solved the problem by using a better apple-positioning algorithm (I defined a minimum distance between the spawn positions of two consecutive apples) and by calculating each individual's final fitness as the average of its fitness over several playing sessions. This greatly reduced (although it did not eliminate) the interference of genetic drift in my algorithm.
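A minimal sketch of the averaging idea (not the exact code from the repo), assuming a hypothetical play_one_session(genome) function that runs a single game and returns its fitness:

    import statistics

    def evaluate_genome(genome, play_one_session, n_sessions=5):
        # Run several independent games and average their scores, so a single
        # lucky (or unlucky) apple layout has less influence on selection.
        scores = [play_one_session(genome) for _ in range(n_sessions)]
        return statistics.mean(scores)
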
I hope this helps. You can also take a look at this video (it's in Portuguese, but English subtitles are available), where I explain some of the strategies I used to make the Snake AI.

TSP where vertices may be visited more than once

In a study project, my project group is working on a robot which drives on a 5x5 grid containing mines. We don't know the locations of the mines, nor can we drive over them (only the sensor is allowed to detect them).
Our goal is to scan this matrix (as fast as possible) for mines, with a starting point located below matrix point (1,5). We decided that the edges and the starting point of this matrix will be our vertices for the TSP problem, but we are stuck on finding a good algorithm which can (if needed) cross an edge multiple times if this is faster.
The only thing we have at the moment is a 41x41 matrix of all possible ways to go from one edge to another (entries can be set to infinity if a mine is detected) and a backup plan: predefine a route and, if a mine is detected, send the robot to the next point.
What is the fastest algorithm that can tackle our problem, and can you also provide example C code or an idea of how to create it?
To answer the general question (how do you solve a TSP where vertices can be visited more than once): run the Floyd-Warshall algorithm to compute all-pairs shortest paths, then run a standard TSP solver on the resulting complete cost matrix.
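As a minimal sketch (assuming the graph is given as an n x n cost matrix with math.inf where there is no direct edge), Floyd-Warshall turns the matrix into shortest-path costs between every pair of vertices; running any standard TSP solver on that matrix is equivalent to allowing repeated visits in the original graph:

    import math

    def floyd_warshall(cost):
        # cost: n x n matrix, math.inf where there is no direct edge.
        # Returns the matrix of shortest-path costs between all pairs.
        n = len(cost)
        dist = [row[:] for row in cost]
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        return dist

    # Tiny example: vertex 2 is only reachable from vertex 0 via vertex 1.
    INF = math.inf
    cost = [[0, 1, INF],
            [1, 0, 2],
            [INF, 2, 0]]
    print(floyd_warshall(cost))  # [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
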
To answer your specific question: since you don't know where the mines are at the start of the game you'll have to recompute your route each time you discover that your route is blocked.

How to account for move order in chess board evaluation

I am programming a Chess AI using an alpha-beta pruning algorithm that works at fixed depth. I was quite surprised to see that by setting the AI to a higher depth, it played even worse. But I think I figured out why.
It currently works this way: all possible moves are listed, and for each of them, every possible reply is listed, and so on, until the fixed depth is reached. At that point the board is evaluated by checking which pieces are present and assigning a value to each piece type. Then the value bubbles up to the root using the minimax algorithm with alpha-beta pruning.
But I need to account for move order. For instance, if there are two options, a checkmate in 2 moves and another in 7 moves, the first one has to be chosen. The same goes for taking a queen in either 3 or 6 moves.
But since I only evaluate the board at the deepest nodes, and the evaluation only looks at the resulting position, the algorithm doesn't know what the previous moves were.
I'm sure there is a better way to evaluate the game that can account for the way the pieces moved through the search.
EDIT: I figured out why it was playing strangely. When I searched for moves (depth 5), the search ended with an AI move (a MAX node level). Because of this, it counted moves such as taking a knight with a rook, even if that left the rook vulnerable (the algorithm cannot see it because it doesn't search any deeper).
So I changed that and set the depth to 6, so the search ends at a MIN node level.
Its moves now make more sense, as it actually takes revenge when attacked (which it sometimes didn't do before, instead playing a pointless move).
However, it is now more defensive than ever and does not really play: it moves its knight, then moves it back to where it was before, and therefore it ends up losing.
My evaluation is very standard: only the presence of pieces matters to the node value, so the AI is free to pick whatever strategy it wants without being forced to do anything in particular.
Considering that, is this normal behaviour for my algorithm? Is it a sign that my alpha-beta algorithm is badly implemented, or is it perfectly normal with such an evaluation function?
If you want to select the shortest path to a win, you probably also want to select the longest path to a loss. If you were to try to account for this in the evaluation function, you would have to track the path length along with the score and have separate evaluation functions for min and max. It's a lot of complex and confusing overhead.
The standard way to solve this problem is with an iterative deepening approach to the search. First you search 1 move deep for all players, then you run the entire search again searching 2 moves deep for each player, and so on until you run out of time. If you find a win in 2 moves, you stop searching and you'll never run into the 7-move situation. This also solves your problem of searching odd depths and getting strange evaluations. It has many other benefits, like always having a move ready to go when you run out of time, and some significant algorithmic improvements because you won't need the overhead of tracking visited states.
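A rough sketch of that loop, assuming hypothetical alpha_beta(position, depth) and out_of_time() helpers (the names are illustrative, not from your code):

    def iterative_deepening(position, alpha_beta, out_of_time,
                            max_depth=64, win_score=100_000):
        # Re-run the full search at increasing depths, always keeping the best
        # move from the last completed iteration so a move is ready whenever
        # time runs out.
        best_move = None
        for depth in range(1, max_depth + 1):
            if out_of_time():
                break
            score, move = alpha_beta(position, depth)
            best_move = move
            if abs(score) >= win_score:  # forced win/loss found: stop early
                break
        return best_move
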
As for the defensive play, that is a little bit of the horizon effect and a little bit of the evaluation function. If you have a perfect evaluation function, the algorithm only needs to see one move deep. If it's not perfect (and it's not), then you'll need to get much deeper into the search. Last I checked, algorithms that can run on your laptop and see about 8 plies deep (a ply is a single move by one player) can compete with strong humans.
In order to let the program choose the shortest checkmate, the standard approach is to give a higher value to mates that occur closer to the root. Of course, you must detect checkmates, and give them some score.
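A small sketch of that convention, assuming a negamax-style search where ply is the distance from the root and MATE is an illustrative constant far above any material score:

    MATE = 1_000_000  # illustrative constant, well above any material evaluation

    def terminal_score(ply_from_root, in_check):
        # Called when the side to move has no legal moves.
        # A mate closer to the root gets a more extreme score, so the mating
        # side automatically prefers the shortest mate.
        if in_check:
            return -MATE + ply_from_root  # the side to move is checkmated
        return 0                          # stalemate: draw
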
Also, from what you describe, you need a quiescence search.
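In rough terms, a quiescence search keeps extending the search with captures only, so the static evaluation is applied to a "quiet" position rather than in the middle of an exchange. A minimal negamax-style sketch, where evaluate, capture_moves and make_move stand in for your own game-specific functions:

    def quiescence(position, alpha, beta, evaluate, capture_moves, make_move):
        # Stand pat: the score if we stop capturing right now.
        stand_pat = evaluate(position)
        if stand_pat >= beta:
            return beta
        alpha = max(alpha, stand_pat)
        # Only search forcing moves (captures) until the position calms down.
        for move in capture_moves(position):
            score = -quiescence(make_move(position, move), -beta, -alpha,
                                evaluate, capture_moves, make_move)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha
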
All of this (and much more) is explained in the chess programming wiki. You should check it out:
https://chessprogramming.wikispaces.com/Checkmate#MateScore
https://chessprogramming.wikispaces.com/Quiescence+Search

Beating a minimax opponent

I have to create an AI which has to compete against other AIs.
Both AIs will run on the same hardware, have the same amount of processing time and memory. I know the opponent AI will be using the minimax algorithm with alpha beta pruning.
Now my question is - what are some approaches for beating such an opponent? If I use minimax myself, then both AIs perfectly predict each other's moves and the game resolves based on an inherent property of the game (first move wins, etc.).
The obvious solution is to somehow see further ahead into the possible moves, which would allow for a better evaluation - but since the processor time is the same, I couldn't evaluate to a greater depth (assuming the opposing AI code is equally optimized). I could use a precomputed tree for an extra advantage, but without a supercomputer I certainly couldn't "solve" any nontrivial game.
Is there some value in intentionally picking a non-optimal node, such as one that alpha-beta would have pruned? This could potentially incur a CPU time penalty on the opponent, as they'd have to go back and re-evaluate the tree. It would incur a penalty on me as well, since I'd have to evaluate the minimax tree plus alpha-beta to see which nodes alpha-beta would prune, without reaping any direct benefits.
What are some other strategies for optimizing against such an opponent?
First, there isn't any value in choosing a non-optimal line of play. Assuming your opponent will play optimally (and that's a fundamental assumption of minimax search), your opponent will make a move that capitalizes on the mistake. A good game engine will have a hashed refutation table entry containing the countermove for your blunder, so you'll gain no time by making a wild move. Making bad moves allows a computer opponent to find good moves faster.
The key thing to realize with a game like Othello is that you can't be sure what the optimal move is until late in the game. That's because the search tree is almost always too large to be exhaustively searched for all won or lost positions, and so minimax can't tell you with certainty which moves will lead to victory or defeat. You can only heuristically decide where to stop searching, arbitrarily call those nodes "terminal", and then run an evaluation function that guesses the win/loss potential of a position.
The evaluation function's job is to assess the value of a position, typically using static metrics that can be computed without searching the game tree further. Piece counts, positional features, endgame tablebases, and even opponent psychology can play a role here. The more intelligence you put into your evaluation function, generally the better your engine will play. But the point of static evaluation is to replace searches that would be too expensive. If your evaluation function does too much, or does what it does too inefficiently, it can become slower than the game tree search needed to obtain the same information. Knowing what to put in an evaluation function, and when to use static evaluation instead of search, is a large part of the art of writing a good game engine.
There are a lot of ways to improve standard minimax with AB pruning. For example, the killer heuristic attempts to improve the order in which moves are examined, since AB's efficiency is better with well-ordered moves.
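For illustration, a tiny sketch of killer-move bookkeeping (the helper names here are made up for the example): keep a couple of quiet moves that recently caused a beta cutoff at a given ply and try them first.

    def order_moves(moves, killers_at_ply):
        # Try killer moves (quiet moves that caused a beta cutoff at the same
        # ply elsewhere in the tree) before the remaining moves.
        killers = [m for m in killers_at_ply if m in moves]
        others = [m for m in moves if m not in killers_at_ply]
        return killers + others

    def record_killer(killers_at_ply, move, max_killers=2):
        # Remember a quiet move that just produced a beta cutoff.
        if move not in killers_at_ply:
            killers_at_ply.insert(0, move)
            del killers_at_ply[max_killers:]
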
A lot of information on different search enhancements and variations on AB can be found at chessprogramming.wikispaces.com.

Is the board game "Go" NP-complete?

There are plenty of Chess AI's around, and evidently some are good enough to beat some of the world's greatest players.
I've heard that many attempts have been made to write successful AIs for the board game Go, but so far nothing has been conceived beyond average amateur level.
Could it be that the task of mathematically calculating the optimal move at any given time in Go is an NP-complete problem?
Chess and Go are both EXPTIME-complete. IIRC, Go has more possible moves, so I think of it as an even harder member of that complexity class than chess. Wikipedia has a good article on the complexity of Go.
Even if Go is merely in P it could still be something horrendous like O(n^m) where n is the number of spaces and m is some (large) fixed number. Even being in P doesn't make something reasonable to compute.
Neither Chess nor Go AIs completely evaluate all possibilities before deciding on a move.
Chess AIs use various heuristics to narrow down the search space, and to quantify how 'good' a given position on the board happens to be. This can be done recursively by evaluating possible board positions 14-15 moves ahead and choosing a path that leads to a good position.
There's a bit of 'magic' in how a board position is quantified so that at the top level, the AI can simply say Move A > Move B, therefore let's do Move A. But since there's a limited number of pieces and they all have a quantifiable value, a 'good enough' evaluation algorithm can be implemented.
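As a sketch of that kind of quantification for chess (the board representation here is assumed, purely for illustration): sum conventional piece values for each side and compare the totals.

    # Conventional piece values in centipawns (typical textbook numbers).
    PIECE_VALUE = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

    def material_eval(board):
        # board: hypothetical mapping of square -> piece letter, uppercase for
        # "our" pieces and lowercase for the opponent's. The returned balance
        # is the number used for the "Move A > Move B" comparison above.
        score = 0
        for piece in board.values():
            value = PIECE_VALUE.get(piece.upper(), 0)  # kings contribute 0 here
            score += value if piece.isupper() else -value
        return score
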
But it turns out to be a lot harder for a program to evaluate two possible board positions in Go and make that A > B calculation. Without that critical piece, it's a little hard to make the rest of the AI work.
