Using minimax search for card games with imperfect information - artificial-intelligence

I want to use minimax search (with alpha-beta pruning), or rather negamax search, to make a computer program play a card game.
The card game actually has 4 players. So, in order to be able to use minimax etc., I simplify the game to "me" against the "others". After each "move", you can objectively read the current state's evaluation from the game itself. When all 4 players have placed a card, the highest card wins them all, and the cards' values count.
As you don't know exactly how the cards are distributed among the other 3 players, I thought you must simulate all possible distributions ("worlds") of the cards that are not yours. You have 12 cards, the other 3 players have 36 cards in total.
So my approach is this algorithm, where player is a number between 1 and 3 symbolizing the three computer players that the program might need to find moves for. And -player stands for the opponents, namely all the other three players together.
private Card computerPickCard(GameState state, ArrayList<Card> cards) {
    int bestScore = Integer.MIN_VALUE;
    Card bestMove = null;
    int nCards = cards.size();
    for (int i = 0; i < nCards; i++) {
        if (state.moveIsLegal(cards.get(i))) { // if you are allowed to place this card
            int score;
            GameState futureState = state.testMove(cards.get(i)); // a move is the placing of a card (which returns a new game state)
            // negamax scores the position from the opponents' point of view, so negate the result;
            // -Integer.MAX_VALUE is the lower bound because -Integer.MIN_VALUE overflows
            score = -negamaxSearch(-state.getPlayersTurn(), futureState, 1, -Integer.MAX_VALUE, Integer.MAX_VALUE);
            if (score > bestScore) {
                bestScore = score;
                bestMove = cards.get(i);
            }
        }
    }
    return bestMove; // now bestMove is the card to place
}
private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
    ArrayList<Card> cards;
    if (player >= 1 && player <= 3) {
        cards = state.getCards(player);
    }
    else {
        // copy the list so that addAll() below cannot modify the state's own list
        cards = new ArrayList<Card>(state.getCards(0));
        if (player == -1) {
            cards.addAll(state.getCards(2));
            cards.addAll(state.getCards(3));
        }
        else if (player == -2) {
            cards.addAll(state.getCards(1));
            cards.addAll(state.getCards(3));
        }
        else {
            cards.addAll(state.getCards(1));
            cards.addAll(state.getCards(2));
        }
    }
    if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
        if (player >= 1 && player <= 3) {
            return state.getCurrentPoints(player); // player's points as a positive value (for self)
        }
        else {
            return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
        }
    }
    else {
        int score;
        int nCards = cards.size();
        if (player > 0) { // make one move (it's player's turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // if move is valid
                    // negate: the recursive call scores the position for the other side
                    score = -negamaxSearch(-player, futureState, depthLeft-1, -beta, -alpha);
                    if (score >= beta) {
                        return score;
                    }
                    if (score > alpha) {
                        alpha = score; // alpha acts like max
                    }
                }
            }
            return alpha;
        }
        else { // make three moves (it's the others' turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // if move is valid
                    for (int k = 0; k < nCards; k++) {
                        if (k != i) {
                            GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                            if (futureStateLevel2 != null) { // if move is valid
                                for (int m = 0; m < nCards; m++) {
                                    if (m != i && m != k) {
                                        GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                        if (futureStateLevel3 != null) { // if move is valid
                                            // negate: the recursive call scores the position for the other side
                                            score = -negamaxSearch(-player, futureStateLevel3, depthLeft-1, -beta, -alpha);
                                            if (score >= beta) {
                                                return score;
                                            }
                                            if (score > alpha) {
                                                alpha = score; // alpha acts like max
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
            return alpha;
        }
    }
}
This seems to work fine, but for a depth of 1 (depthLeft=1), the program already needs to calculate 50,000 moves (placed cards) on average. This is too much, of course!
So my questions are:
Is the implementation correct at all? Can you simulate a game like this? Regarding the imperfect information, especially?
How can you improve the algorithm in speed and work load?
Can I, for example, reduce the set of possible moves to a random set of 50% to improve speed, while keeping good results?
I found the UCT algorithm, which may be a good solution. Do you know this algorithm? Can you help me implement it?

I want to clarify details that the accepted answer doesn't really go into.
In many card games you can sample the unknown cards that your opponents could have instead of generating all of them. When doing this sampling, you can take into account information like short suits and the probability of holding certain cards given play so far, to weight the likelihood of each possible hand (each hand is a possible world that we'll solve independently). Then, you solve each hand using perfect information search. The best move over all of these worlds is often the best move overall - with some caveats.
In games like Poker this won't work very well -- the game is all about the hidden information. You have to precisely balance your actions to keep the information about your hand hidden.
But, in games like trick-based card games, this works pretty well - particularly since new information is being revealed all the time. Really good players have a good idea what everyone holds anyway. So, reasonably strong Skat and Bridge programs have been based on these ideas.
If you can completely solve the underlying world, that is best, but if you can't, you can use minimax or UCT to choose the best move in each world. There are also hybrid algorithms (ISMCTS) that try to mix this process together. Be careful about the claims here. Simple sampling approaches are easier to code -- you should try the simpler approach before a more complex one.
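The sampling idea can be sketched in a few lines. The following Python sketch is only an illustration, not code from any of the programs mentioned: it assumes cards are plain integers, that worlds are sampled uniformly (no weighting by inferred probabilities), and it uses a toy one-trick evaluator, `trick_value`, in place of a real perfect-information search. All function names are hypothetical.

```python
import random
from collections import defaultdict


def deal_world(unknown_cards, hand_sizes, rng):
    """Randomly split the unseen cards among the opponents (one possible world)."""
    cards = list(unknown_cards)
    rng.shuffle(cards)
    world, start = [], 0
    for size in hand_sizes:
        world.append(cards[start:start + size])
        start += size
    return world


def trick_value(my_card, world):
    """Toy stand-in for a perfect-information search over a single trick.

    Assumption: every opponent plays their highest card, the highest card
    wins the trick, and the winner scores the sum of all cards played.
    """
    played = [max(hand) for hand in world]
    if my_card > max(played):
        return my_card + sum(played)
    return 0


def pick_card(my_hand, unknown_cards, hand_sizes, n_worlds=200, seed=0):
    """Score every card in every sampled world and return the best total."""
    rng = random.Random(seed)
    totals = defaultdict(int)
    for _ in range(n_worlds):
        world = deal_world(unknown_cards, hand_sizes, rng)
        for card in my_hand:
            totals[card] += trick_value(card, world)
    return max(my_hand, key=lambda card: totals[card])
```

In a real program, `trick_value` would be replaced by an alpha-beta or UCT search of the sampled world, and `deal_world` would reject or down-weight deals that contradict what has been observed so far.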
Here are some research papers that will give some more information on when the sampling approach to imperfect information has worked well:
Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search (This paper analyzes when the sampling approach is likely to work.)
Improving State Evaluation, Inference, and Search in Trick-Based Card Games (This paper describes the use of sampling in Skat)
Imperfect information in a computationally challenging game (This paper describes sampling in Bridge)
Information Set Monte Carlo Tree Search (This paper merges sampling and UCT/Monte Carlo Tree Search to avoid the issues in the first reference.)
The problem with rule-based approaches in the accepted answer is that they can't take advantage of computational resources beyond that required to create the initial rules. Furthermore, rule-based approaches will be limited by the power of the rules that you can write. Search-based approaches can use the power of combinatorial search to produce much stronger play than the author of the program.

Minimax search as you've implemented it is the wrong approach for games where there is so much uncertainty. Since you don't know the card distribution among the other players, your search will spend an exponential amount of time exploring games that could not happen given the actual distribution of the cards.
I think a better approach would be to start with good rules for play when you have little or no information about the other players' hands. Things like:
If you play first in a round, play your lowest card since you have little chance of winning the round.
If you play last in a round, play your lowest card that will win the round. If you can't win the round, then play your lowest card.
Have your program initially not bother with search and just play by these rules and have it assume that all the other players will use these heuristics as well. As the program observes what cards the first and last players of each round play it can build up a table of information about the cards each player likely holds. E.g. a 9 would have won this round, but player 3 didn't play it so he must not have any cards 9 or higher. As information is gathered about each player's hand the search space will eventually be constrained to the point where a minimax search of possible games could produce useful information about the next card to play.
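As a rough sketch of what such rules and the resulting table could look like, here is a small Python example. It assumes a single suit with integer card values and that every player follows the two rules above; the rule for middle positions (play the lowest card) is my own placeholder, and the names `heuristic_card` and `update_bound` are made up for illustration.

```python
def heuristic_card(hand, trick, num_players=4):
    """Pick a card using the two rules of thumb from the text.

    `trick` holds the cards already played in this round, in order.
    """
    lowest = min(hand)
    if len(trick) == num_players - 1:  # we play last in the round
        winners = [card for card in hand if card > max(trick)]
        return min(winners) if winners else lowest
    # Leading: play the lowest card (rule from the text).
    # Middle positions: also play the lowest card (placeholder assumption).
    return lowest


def update_bound(upper_bound, player, card_played, best_before):
    """Record what a losing last play reveals about a player's hand.

    If the last player did not beat `best_before`, then under the rules above
    they hold no card higher than `best_before`.
    """
    if card_played < best_before:
        current = upper_bound.get(player, float("inf"))
        upper_bound[player] = min(current, best_before)
    return upper_bound
```

With `best_before = 8` this reproduces the example from the text: a 9 would have won, so the player is assumed to hold nothing higher than 8.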

Related

How to improve Alpha-beta pruning performance

Here is my code for a gomoku AI. My AI currently takes over 5 seconds per move, but the time limit is 5 seconds. I am trying to improve the performance, so I tried move ordering, but it does not seem to work. I calculate the scores first in the getChildStates(int player) function and then sort the vector in descending order, but it just does not help. Can somebody help me?
Also, my depth is two. A transposition table doesn't seem like it would help, so I haven't tried it.
int minimax(int depth, GameState state, bool maximizingPlayer, int alpha, int beta)
{
if (depth == 2)
return state.score;
if (maximizingPlayer)
{
vector<GameState> children = state.getChildStates(1);
sort(children.begin(), children.end(), greaterA());
int best = MIN;
for (auto& value : children) {
int val = minimax(depth + 1, value,
false, alpha, beta);
int oldBest = best;
best = max(best, val);
alpha = max(alpha, best);
if (depth == 0 && oldBest != best){
bestMoveX = value.lastMove.x;
bestMoveY = value.lastMove.y;
}
// Alpha Beta Pruning
if (beta <= alpha)
break;
}
return best;
}
else
{
vector<GameState> children = state.getChildStates(2);
sort(children.begin(), children.end(),greaterA());
int best = MAX;
// Recur for left and right children
for (auto& value : children) {
int val = minimax(depth + 1, value,
true, alpha, beta);
best = min(best, val);
beta = min(beta, best);
// Alpha Beta Pruning
if (beta <= alpha)
break;
}
return best;
}
}
I wouldn't recommend sorting the game states to prioritize them so that a move can be forced when the set timeout expires. Even with alpha-beta pruning, the minimax tree may be just too big. For reference, you can have a look at GNU Chess on GitHub. Here are some options to reduce the best move search time:
1) Reduce depth of search.
2) Weed out redundant moves from the possible moves.
3) Use multi-threading in the first ply to gain speed
4) Allow pondering (searching during the opponent's thinking time), so that minimax tree branches can continue to be generated in the background while the human opponent is still thinking.
5) Instead of generating the minimax tree for every move, you can think about a reusable minimax tree, where you only prune the moves already made and generate just one more ply each iteration (instead of the whole tree, see this article).
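For option 2, one common way to weed out moves in gomoku is to consider only empty cells close to stones already on the board. The Python sketch below is an assumption about how that could be done, not something taken from the question's code: the board is a square list of lists with 0 for an empty cell, and `candidate_moves` and `radius` are names chosen for illustration.

```python
def candidate_moves(board, radius=1):
    """Return the empty cells within `radius` of any stone, in sorted order.

    On an empty board the only candidate is the centre cell.
    """
    n = len(board)
    stones = [(r, c) for r in range(n) for c in range(n) if board[r][c] != 0]
    if not stones:
        return [(n // 2, n // 2)]
    candidates = set()
    for r, c in stones:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n and board[rr][cc] == 0:
                    candidates.add((rr, cc))
    return sorted(candidates)
```

Early in the game this cuts the branching factor from a few hundred cells to a handful, at the risk of missing a strong move far away from all stones.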

Computing a move score in a Minimax Tree of a certain depth

I've implemented a Chess game in C, with the following structs:
move - which represents a move from (a,b) to (c,d) on a char board[8][8] (Chess board)
moves - which is a linked list of moves with head and tail.
Variables:
playing_color is 'W' or 'B'.
minimax_depth is a minimax depth that was set before.
Here is my code of the Minimax function with alpha-beta pruning and the getMoveScore function which should return the score of the move in Minimax Tree of a certain minimax_depth that was set before.
I'm also using the getBestMoves function, which I will list here as well; it basically finds the best moves during the minimax algorithm and saves them into a global variable so that I can use them later.
I must add that all the functions called within the three functions listed here work properly and were tested, so the problem is either a logic problem in the alphabeta algorithm or in the implementation of getBestMoves/getMoveScore.
The main problem is that when I get my best moves at depth N (which are also not computed correctly for some reason) and then check their scores at the same depth with the getMoveScore function, I get different scores that don't match the scores of those actual best moves. I've spent hours debugging this and couldn't find the error; I hope someone can give me a tip on finding the problem.
Here is the code:
/*
* Getting best possible moves for the playing color with the minimax algorithm
*/
moves* getBestMoves(char playing_color){
//Allocate memory for the best_moves which is a global variable to fill it in a minimax algorithm//
best_moves = calloc(1, sizeof(moves));
//Call an alpha-beta pruned minimax to compute the best moves//
alphabeta(playing_color, board, minimax_depth, INT_MIN, INT_MAX, 1);
return best_moves;
}
/*
* Getting the score of a given move for a current player
*/
int getMoveScore(char playing_color, move* curr_move){
//Allocate memory for best_moves although its not used so its just freed later//
best_moves = calloc(1, sizeof(moves));
int score;
char board_cpy[BOARD_SIZE][BOARD_SIZE];
//Copying a a current board and making a move on that board which score I want to compute//
boardCopy(board, board_cpy);
actualBoardUpdate(curr_move, board_cpy, playing_color);
//Calling the alphabeta Minimax now with the opposite color , a board after a given move and as a minimizing player, because basicly I made my move so its now the opponents turn and he is the minimizing player//
score = alphabeta(OppositeColor(playing_color), board_cpy, minimax_depth, INT_MIN, INT_MAX, 0);
freeMoves(best_moves->head);
free(best_moves);
return score;
}
/*
* Minimax function - finding the score of the best move possible from the input board
*/
int alphabeta(char playing_color, char curr_board[BOARD_SIZE][BOARD_SIZE], int depth,int alpha,int beta, int maximizing) {
if (depth == 0){
//If I'm at depth 0 I'm evaluating the current board with my scoring function//
return scoringFunc(curr_board, playing_color);
}
int score;
int max_score;
char board_cpy[BOARD_SIZE][BOARD_SIZE];
//I'm getting all the possible legal moves for the playing color//
moves * all_moves = getMoves(playing_color, curr_board);
move* curr_move = all_moves->head;
//If its terminating move I'm evaluating board as well, its separate from depth == 0 because only here I want to free memory//
if (curr_move == NULL){
free(all_moves);
return scoringFunc(curr_board,playing_color);
}
//If maximizing player is playing//
if (maximizing) {
score = INT_MIN;
max_score = score;
while (curr_move != NULL){
//Make the move and call alphabeta with the current board after the move for opposite color and !maximizing player//
boardCopy(curr_board, board_cpy);
actualBoardUpdate(curr_move, board_cpy, playing_color);
score = alphabeta(OppositeColor(playing_color), board_cpy, depth - 1,alpha,beta, !maximizing);
alpha = MAX(alpha, score);
if (beta <= alpha){
break;
}
//If I'm at the maximum depth I want to get current player best moves//
if (depth == minimax_depth){
move* best_move;
//If I found a move with a score that is bigger then the max score, I will free all previous moves and append him, and update the max_score//
if (score > max_score){
max_score = score;
freeMoves(best_moves->head);
free(best_moves);
best_moves = calloc(1, sizeof(moves));
best_move = copyMove(curr_move);
concatMoves(best_moves, best_move);
}
//If I have found a move with the same score and want to concatenate it to a list of best moves//
else if (score == max_score){
best_move = copyMove(curr_move);
concatMoves(best_moves, best_move);
}
}
//Move to the next move//
curr_move = curr_move->next;
}
freeMoves(all_moves->head);
free(all_moves);
return alpha;
}
else {
//The same as maximizing just for a minimizing player and I dont want to look for best moves here because I dont want to minimize my outcome//
score = INT_MAX;
while (curr_move != NULL){
boardCopy(curr_board, board_cpy);
actualBoardUpdate(curr_move, board_cpy, playing_color);
score = alphabeta(OppositeColor(playing_color), board_cpy, depth - 1,alpha,beta, !maximizing);
beta = MIN(beta, score);
if (beta <= alpha){
break;
}
curr_move = curr_move->next;
}
freeMoves(all_moves->head);
free(all_moves);
return beta;
}
}
As Eugene has pointed out-I'm adding an example here:
http://imageshack.com/a/img910/4643/fmQvlm.png
I'm currently the white player. I have only a king (k) and a queen (q); the opposite color has a king (K) and a rook (R). Obviously my best move here is to capture the rook, or at least to give check. The moves of the pieces are tested and they work fine. However, when I call the getBestMoves function at depth 3, I get lots of unnecessary moves with negative scores at that depth. Maybe now it's a little clearer. Thanks!
Without debugging your whole code, at least ONE of the problems is that your score verification might work with a plain minimax algorithm, but not with alpha-beta. The problem is the following:
The getMoveScore() function has to start with an open AB window.
getBestMoves(), however, scores the moves with an AB window that is already (partly) closed.
So in the case of getBestMoves(), branches can be pruned that are not pruned in getMoveScore(); the score is therefore not exact, and that's the reason (or at least ONE of them) why these values can differ.
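A toy example makes the window effect concrete. This is a Python sketch on a hand-made tree (nested lists with integer leaves), not the asker's C code; the function names are made up for illustration. It uses the same fail-hard convention as the question, returning alpha or beta.

```python
import math


def alphabeta(node, alpha, beta, maximizing):
    """Fail-hard alpha-beta on a tree of nested lists with integer leaves."""
    if isinstance(node, int):
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, alpha, beta, False))
            if beta <= alpha:
                break
        return alpha
    for child in node:
        beta = min(beta, alphabeta(child, alpha, beta, True))
        if beta <= alpha:
            break
    return beta


def root_scores_shared_window(root):
    """Score the root moves one after another, narrowing alpha as we go."""
    alpha, scores = -math.inf, []
    for child in root:
        score = alphabeta(child, alpha, math.inf, False)
        scores.append(score)
        alpha = max(alpha, score)
    return scores


def root_scores_open_window(root):
    """Score every root move on its own with a fully open window."""
    return [alphabeta(child, -math.inf, math.inf, False) for child in root]


tree = [[3, 5], [3, 1]]  # two root moves, each answered by a minimizing reply
shared = root_scores_shared_window(tree)
exact = root_scores_open_window(tree)
```

Under the shared window the second root move is cut off after its first reply and reports 3, so it looks tied with the first move, although its exact value is 1. A score obtained after a cutoff is only a bound, which is one way extra moves can end up in a list of "best" moves.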

Implementing multi-threading in an already existing chess engine in C

I want to know if it's possible to modify an existing chess engine in C that works without multi-threading so that it supports multi-threading. I have no experience in this subject and would appreciate some guidance.
EDIT: To be more specific, is there anything I can add to my implementation of negamax to make it multi-thread compatible?
static double alphaBetaMax(double alpha, double beta, int depthleft, game_t game, bool player)
{
move_t *cur;
move_t *tmp;
double score = 0;
bool did_move = false;
cur = getAllMoves(game, player);
if(cur == NULL) /* checkmate */
return -9999999*(player*2-1);
tmp = firstMove;
firstMove = 0;
while (cur != NULL)
{
game_t copy;
if(depthleft<=0 && !isCapture(game, cur)) { /* Quiescence search */
cur = cur->next;
continue;
}
did_move = true;
copyGame(game, &copy);
makeMove(&copy, *cur);
firstMove = NULL;
score = -alphaBetaMax(-beta, -alpha, depthleft - 1, copy, !player);
if(board_count > MAX_BOARDS)
break;
freeGame(copy);
if(score > alpha)
alpha = score;
if (beta <= alpha)
break;
cur = cur->next;
}
firstMove=tmp;
freeMoves();
if(!did_move)
alpha = evaluate(game)*(player*2-1);
return alpha;
}
A fast chess engine relies on two things: Caching the evaluation of positions, and the alpha/beta strategy. Caching positions and making it thread safe and fast is hard. The alpha/beta strategy relies on the seemingly best move being completely evaluated before you start evaluating other moves. This also makes it tough to use multiple threads.
Beginner composer to Mozart: "Can you tell me how to compose a symphony"? Mozart to beginner: "Maybe at your young age you should try something easier first. " Beginner to Mozart: "But you wrote symphonies when you were much younger than I am now. " Mozart to beginner: "True, but I didn't have to ask anyone".
Alpha-beta pruning is inherently single-threaded in nature. There have been successful approaches using variations of Dynamic Tree Splitting, which basically means searching various branches at the same time. However, in a well-tuned engine, the likelihood that the next branch will have to be searched at all (rather than being beta-cut) does not usually outweigh the other parallelism bottlenecks, such as memory waits.
I would suggest, first modify your search to a "re-search" algorithm like NegaScout or PVS which with small code changes will give good improvements over your current pure Alpha-Beta, then secondly fine-tune your move ordering to yield efficient beta-cut.
Thereafter you could try to split the tree based on beta-cut chances. Typically there would be higher chance of cutoff when a move is found in the transposition-table or a killer move and lesser chance when starting to search bad captures and quiet moves.
Take a look at CPW for some thoughts on it and the YBWC algorithm.
Young Brothers Wait Concept
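To show how small the change from plain alpha-beta to a "re-search" algorithm is, here is a minimal Principal Variation Search sketch in Python. It is an illustration only, not the asker's engine: the tree is a nested list whose integer leaves are scores from the point of view of the side to move, scores are assumed to be integers so that the null window can be (alpha, alpha + 1), and `pvs` and `negamax` are names chosen for this example.

```python
import math


def negamax(node):
    """Plain negamax without pruning, used here only as a reference."""
    if isinstance(node, int):
        return node
    return max(-negamax(child) for child in node)


def pvs(node, alpha, beta):
    """Principal Variation Search: full window for the first move only."""
    if isinstance(node, int):
        return node
    first = True
    for child in node:
        if first:
            score = -pvs(child, -beta, -alpha)
            first = False
        else:
            # Null-window probe: only asks "is this move better than alpha?"
            score = -pvs(child, -alpha - 1, -alpha)
            if alpha < score < beta:
                # The probe failed high, so re-search with the full window.
                score = -pvs(child, -beta, -alpha)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return alpha


tree = [[3, 5], [3, 1], [[2], [7, -4]]]
value = pvs(tree, -math.inf, math.inf)
```

The null-window probes are cheap when the first move really is the best one, which is why the advice above pairs this change with better move ordering.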
I'm currently writing a C++ chess engine and I have made a quite simple, though not optimal, solution:
First, I generate all moves in the form of a list of structs.
I start n threads, which repeatedly grab "jobs" from the list, do their search, and write the result back into the struct. When the list is empty, a thread terminates.
In the general search function, I join the threads and loop through the results afterwards.
Is this approach the most efficient? No: engines currently focus on searching the most relevant moves first, whereas here you eventually look at all moves and hardly ever get a cutoff at depth one. But it works fine.
It is simple to implement and in any case better than a "one-CPU" search. Even if one move takes way more time, it's running on one CPU, like back then :D
Maybe start out with that.

C pthreads: police officer that allows cars to pass through a one way bridge

I'm implementing a problem in C using pthreads. It is about an old bridge which crosses a river from east to west. Since it is a small bridge, cars can only go in one direction at a time.
I have solved the problem of creating the threads which will cross the bridge (use the critical region); they cross if there are no cars on the bridge or if the cars crossing are going in the same direction.
The next step is to make another thread that has to run during the entire process. This thread has to let a certain number of cars pass in one direction and then in the other direction, like a police officer would do. Right now I'm doing the following:
I'm creating a thread (police officer):
pthread_create(&policeOfficer, NULL, officerFunction, &bridge);
Before the creation of the "car" threads:
pthread_t thread[totalCars];
int randomDir;
int contWest = 0;
int contEast = 0;
while (contWest + contEast < totalCars) {
randomDir = (int) (rand() % 2); //random number to define the direction (east/west)
if (contWest < westCars && randomDir == 1) {
contWest++;
//sleep(1);
pthread_create(&thread[(contWest + contEast) - 1], NULL, west, &bridge);
} else if (contEast < eastCars && randomDir == 0) {
contEast++;
//sleep(1);
pthread_create(&thread[(contWest + contEast) - 1], NULL, east, &bridge);
}
}
The function that the "officer" executes is the following (this is not working):
static void* officerFunction(void *data) {
bridge *pBridge = (bridge *) data;
while(totalThreads>=0){
while(pBridge->numberOfCars <= carsAllowed);//busy waiting I think. numberOfCars is the actual number of cars at the bridge (crossing)
if(pBridge->direction == WEST){
pBridge->direction = EAST;
}else{
pBridge->direction = WEST;
}
}
return NULL;
}
I'm used to working with threads and arrays of threads, but not with one thread that controls other threads, so I don't know if this is right. The functions that control how the threads use the bridge are all working, so I don't think I need to modify them that much.
I think the "officer" just has to allow a certain number of cars to cross in one direction or the other (changing the bridge direction after the cars pass through), so I hope you can help me with this one.
Thanks!

How can I extract my best move from Min Max in TicTacToe?

int minmax(Board game, int depth)
{
if (game.IsFinished() || depth < 0)
return game.Score(game.Turn);
int alpha = int.MinValue + 1;
foreach (Point move in game.Generate_Moves())
{
Board currentBoard = game;
currentBoard.Do_Move(move);
alpha = max(alpha, -minmax(currentBoard, depth-1));
currentBoard.Undo_Move(move);
}
return alpha;
}
The thing is that this little function tells me if the game is a win, a loss or a draw, but how can I get the move that will lead me to a win? My Point class is a simple class with 2 coordinates, X and Y, and I want to get the answer as a Point so I can later say something like game.Do_Move(myPoint).
In case some functions aren't obvious:
game.IsFinished() - returns true on a win/loss/draw, false otherwise
game.Score(turn) - returns -1/0/1 for a loss/draw/win for the player with the next move
game.Generate_Moves() - returns a List with the available moves
game.Do_Move() - void that applies the move to game
game.Undo_Move() - speaks for itself
It would be enough if the minimax function that gets called on the root node of the game tree returned both the chosen move and the score. For all other nodes of the game tree, the function only needs to return the score. Thus the usual way is to implement two slightly different minimax functions. Look at Note #2 in the description of this NegaMax Framework.
Applied to your minimax interface, you would have the following additional function:
int minimaxWithMove(Board game, int depth, out Point chosen)
{
    // not possible at root node (Debug.Assert needs "using System.Diagnostics;")
    Debug.Assert(!game.IsFinished() && depth >= 0);
    chosen = default(Point); // an out parameter must be assigned on every path
    int alpha = int.MinValue + 1;
    foreach (Point move in game.Generate_Moves())
    {
        Board currentBoard = game;
        currentBoard.Do_Move(move);
        int score = -minmax(currentBoard, depth-1);
        if (score > alpha)
        {
            alpha = score;
            chosen = move;
        }
    }
    return alpha;
}
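Note that I have removed the call to Undo_Move, as it is not needed because you make a copy of game in each iteration. This assumes that Board is a value type (a struct); if Board is a class, the assignment only copies the reference, and you need either an explicit copy or the Undo_Move call.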
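The same two-function idea, one function for the root that remembers the move and one for all other nodes that returns only the score, can be tried out in a self-contained form. The following Python sketch is an illustration, not a translation of the asker's Board class: the board is assumed to be a list of 9 cells holding 'X', 'O' or ' ', and `winner`, `negamax` and `best_move` are names chosen for this example.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def negamax(board, player):
    """Score (-1, 0 or 1) from the point of view of `player`, who is to move."""
    won = winner(board)
    if won is not None:
        return 1 if won == player else -1
    if ' ' not in board:
        return 0  # draw
    other = 'O' if player == 'X' else 'X'
    best = -2
    for i in range(9):
        if board[i] == ' ':
            board[i] = player
            best = max(best, -negamax(board, other))
            board[i] = ' '  # undo the move
    return best


def best_move(board, player):
    """Root search: like negamax, but also remembers which move was best."""
    other = 'O' if player == 'X' else 'X'
    best_score, chosen = -2, None
    for i in range(9):
        if board[i] == ' ':
            board[i] = player
            score = -negamax(board, other)
            board[i] = ' '  # undo the move
            if score > best_score:
                best_score, chosen = score, i
    return chosen, best_score
```

Here the moves are made and undone on a single board, so no copies are needed; with a copied board, as in the answer above, the undo step disappears instead.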
You need to apply the minimax theorem.
You basically have to make a game tree, where each node in the tree is a board position, and each child is the result of a legal move. The leaf nodes (where the game has ended) will have scores according to game.Score(), and one player is trying to pick moves down a path leading to a high score, while the other is trying to pick moves that force a low score. The theorem will help you see how to apply that idea rigorously.
