I'm supposed to make a C++ program that builds a DFA for Tic Tac Toe, accepting first-player wins only. I have working code and it is generating a DFA. I also have a function that counts the number of states. I'm getting 2,203,642 states, but I'm not sure whether that is right or wrong. Can anyone tell me how many states I should have?
I'm not sure whether I fully understand the question, but I can answer based on my understanding.
The first thing we need to do is model a game of Tic Tac Toe. There is no one right or wrong way to do this. I propose the following model: a game of Tic-Tac-Toe is a string of length 9 consisting of 0s, 1s and 2s. 0s correspond to unclaimed positions, 1s to those claimed by player 1, and 2s to those claimed by player 2. Positions in the string correspond to board cells read left to right, then top to bottom. Player 1 wins this game: 122010001; loses this game: 100222110; and this game is invalid: 111222111.
So we are looking for a DFA to recognize valid games where player 1 wins. A game is valid if (a) there is at least one player who hasn't won and (b) both players have the same number of moves or player 1 has one more move than player 2. Player 1 wins in any of these cases:
111SSSSSS
SSS111SSS
SSSSSS111
1SS1SS1SS
S1SS1SS1S
SS1SS1SS1
1SSS1SSS1
SS1S1S1SS
Here, S stands for any of the symbols 0, 1, 2. So, here's one way to approach making a DFA:
1. Write a DFA that accepts the union of the languages described by the 8 regular expressions in the list above.
2. Copy the DFA from step 1, but for player 2 winning instead of player 1. Now, negate the DFA by toggling states from accepting to non-accepting and vice versa.
3. Construct the Cartesian product machine by considering the intersection of the DFAs obtained from steps 1 and 2. This DFA now recognizes "player 1 wins and player 2 does not".
4. Now construct a new DFA to recognize whether the numbers of moves are valid. We will have 100 states: 10 move counts for player 1 and 10 for player 2. Mark as accepting those states corresponding to #1 = #2 or #1 = #2 + 1 (there should be 19 such states: 10 with equal counts and 9 where player 1 is one ahead).
5. Construct the Cartesian product machine of the machines from steps 3 and 4, using intersection to determine accepting states. The resulting DFA recognizes exactly the valid games which player 1 wins and player 2 loses.
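To make the move-count machine concrete, here is a minimal Python sketch of it (not the asker's C++ code). The function names and the extra `DEAD` sink state are my own assumptions for illustration; the sink only matters for inputs longer than 9 symbols, which the model above does not otherwise rule out.

```python
# States are pairs (a, b): a = moves by player 1, b = moves by player 2.
# DEAD is a hypothetical sink state for inputs that would push a count past 9.
DEAD = None

def count_dfa_step(state, symbol):
    """Transition function of the move-count DFA over the alphabet '0', '1', '2'."""
    if state is DEAD:
        return DEAD
    a, b = state
    if symbol == "1":
        a += 1
    elif symbol == "2":
        b += 1
    elif symbol != "0":
        raise ValueError(f"unexpected symbol {symbol!r}")
    return (a, b) if a <= 9 and b <= 9 else DEAD

def is_accepting(state):
    """Accept when both players moved equally often or player 1 moved once more."""
    if state is DEAD:
        return False
    a, b = state
    return a == b or a == b + 1

def count_dfa_accepts(board):
    """Run the DFA over a board string and report whether it ends in an accepting state."""
    state = (0, 0)
    for symbol in board:
        state = count_dfa_step(state, symbol)
    return is_accepting(state)

print(count_dfa_accepts("122010001"))  # 3 ones, 2 twos
print(count_dfa_accepts("111222111"))  # 6 ones, 3 twos
```

Keeping the transition function separate from the acceptance test makes it easier to plug this machine into a product construction later.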
Note: after each step, you can minimize the resulting DFA in order to make the Cartesian product machine constructions have fewer states. Also, you can minimize the machine from step 5 to get the smallest possible DFA accepting the language.
This is possibly not super helpful, but if you write a program to generate the DFAs outlined above, and have the program minimize the DFAs along the way, you should get the right answer.
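One way to sanity-check a generated DFA is to brute-force the language and compare. The sketch below is Python rather than C++ and assumes the string model described above; `in_language` and `has_line` are hypothetical helper names. It tells you which of the 3^9 strings should be accepted, not how many states the minimal DFA has, but any correct DFA must agree with it on every string.

```python
from itertools import product

# The eight winning lines as index triples into the 9-character board string.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def has_line(board, player):
    """True if `player` ('1' or '2') owns all three cells of some winning line."""
    return any(all(board[i] == player for i in line) for line in LINES)

def in_language(board):
    """True if `board` is a valid game that player 1 wins, under the model above."""
    if len(board) != 9 or set(board) - set("012"):
        return False
    ones, twos = board.count("1"), board.count("2")
    counts_ok = ones == twos or ones == twos + 1
    return counts_ok and has_line(board, "1") and not has_line(board, "2")

# Enumerate every length-9 string over {0, 1, 2} and keep the accepted ones.
accepted = ["".join(b) for b in product("012", repeat=9) if in_language("".join(b))]
print(len(accepted))
```

Feeding each of the 19,683 strings to both this function and the DFA, and checking that they always agree, is a cheap regression test.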
I have an AI class and we have to make projects. I chose to do a genetic algorithm, and since I'm new to the concept I have a couple of questions. I have researched it, I get the idea, and I followed Coding Train's video on a simple genetic algorithm without any problem. However, I have seen multiple videos on YouTube where cars evolve, and I don't get how they can have a population of, let's say, 20 if only one car is being rendered to the screen. I want to try to create a Pong-like game (I'll use a basic physics engine) where Player A is the computer, which always follows the Y coordinate of the ball and thus can't lose, and Player B is supposed to evolve using a genetic algorithm. How would I evolve Player B every time it loses? What would the chromosomes be? What would the population be? If you can give me any advice I would be very thankful.
Regarding the cars, it's most likely that each car in the generation is being evaluated and rendered sequentially. Suppose the population size is 20: the first 20 cars you see would be the initial population, the next 20 cars you see would be the second generation's population, and so on.
Regarding Pong, you need to decide on a fitness function for your Player B. If Player B always loses, then perhaps your fitness function could be how long it is able to last before it loses. To determine your chromosome, you first need to decide how you will control Player B's paddle. The chromosome would then be some set of design variables that affect that system. For example, you might use a small neural net where your chromosome encodes the weights of the connections. Your population is a set of chromosomes used to produce the next generation's set of chromosomes through crossover and mutation.
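To show how population, chromosome, fitness, crossover and mutation fit together, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function `evolve`, its parameters, the truncation-style selection, and especially `toy_fitness`, which is only a stand-in for "how long Player B survives" since running a Pong simulation is out of scope here.

```python
import random

def evolve(fitness, genome_len, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve real-valued chromosomes; returns (best_chromosome, best_fitness_per_generation)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]   # keep the fitter half as parents
        children = [ranked[0][:]]           # elitism: carry the best over unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            # Mutation: occasionally nudge a gene with Gaussian noise.
            child = [g + rng.gauss(0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history

# Placeholder fitness: in the Pong setting this would be how long Player B survives.
def toy_fitness(chromosome):
    target = [0.5, -0.25, 0.75]
    return -sum((g - t) ** 2 for g, t in zip(chromosome, target))

best, history = evolve(toy_fitness, genome_len=3)
print(history[0], history[-1])
```

In the Pong case each chromosome would be the weight vector of the paddle controller, and evaluating `fitness` would mean playing one game with those weights.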
There is a game that I've programmed in Java. The game is simple (refer to the figure below). There are 4 birds and 1 larva. It is a 2-player game (AI vs Human).
Larva can move diagonally forward AND diagonally backward
Birds can ONLY move diagonally forward
Larva wins if it can get to line 1 (fence)
Larva also wins if birds have no moves left
Birds CANNOT "eat" the larva.
Birds win if Larva has NO move left (cannot move at all)
When the game starts, Larva begins, then ONE bird can move (any one), then Larva, etc...
I have implemented a MiniMax (Alpha Beta Pruning) and I'm using the following evaluate() function (heuristic function).
Let us give the following numbers to each square on the board.
Therefore, our evaluation function will be
h(n) = value of position of larva - value of position of bird 1 - value of position of bird 2 - value of position of bird 3 - value of position of bird 4
The Larva will try to MAXIMIZE the heuristic value, whereas the Birds will try to MINIMIZE it.
Example:
However, this is a simple and naive heuristic. It does not act in a smart manner. I am a beginner in AI and I would like to know what can I do to IMPROVE this heuristic function?
What would be a good/informed heuristic?
How about this:
Maximum: larva
Minimum: birds
H(t)=max_distance(larva,line_8)+Σmin_distance(bird_n,larva)
or
H(t)=Σmin_distance(bird_n,larva) - min_distance(larva,line_1)
max_distance(larva,line_8): reflects how close the larva is to line 1 (the farther it is from line 8, the closer it is to line 1).
Σmin_distance(bird_n,larva): reflects how close the birds are to the larva (so they can block it).
I believe there are still many things that could be considered; for example, the bird closest to the larva should have high priority when choosing which bird to move. Still, the general direction of the function above makes sense, and many details can easily be added to improve it.
There is one simple way to improve your heuristic considerably. In your current heuristic, the value of square A1 is 8 less than the value of square A8. This makes the Birds inclined to move towards the left side of the game board, as a move to the left will always be higher than a move to the right. This is not accurate. All squares on row 1 should have the same value. Thus, assign all squares in row 1 a 1, in row 2 a 2, etc. This way the birds and larva won't be inclined to move to the left, and can instead focus on making a good move.
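A row-based version of the asker's h(n) could be sketched like this in Python (the original program is in Java). I am assuming squares are addressed as (row, column) pairs with rows numbered 1 to 8, and the piece positions are made up for the example; the original board figure is not reproduced here, so which direction should score higher depends on its numbering.

```python
def square_value(square):
    """Row-based value: every square in row r is worth r, regardless of column."""
    row, _col = square
    return row

def evaluate(larva, birds):
    """h(n) = value(larva) - sum of value(bird) over the four birds."""
    return square_value(larva) - sum(square_value(b) for b in birds)

# Hypothetical positions: four birds on row 1, larva on row 2.
birds = [(1, 1), (1, 3), (1, 5), (1, 7)]
print(evaluate((2, 4), birds), evaluate((2, 6), birds))
```

The two printed values are equal, which is the point: moving sideways within a row no longer changes the score.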
You could take into account the fact that birds will have a positional advantage over the larva when the larva is on the sides of the board, so if Larva is MAX then change the side tile values of the board to be smaller.
In the minimax algorithm, the first player plays optimally, which means it wants to maximise its score, and the second player tries to minimise the first player's chances of winning. Does this mean that the second player also plays optimally to win the game? Trying to choose some path in order to minimise the first player's chances of winning also means trying to win?
I am actually trying to solve this task from TopCoder: EllysCandyGame. I wonder whether we can apply the minimax algorithm here. The statement that both players "play optimally" really confuses me, and I would like some advice on how to deal with this type of problem, if there is some general idea.
Yes, you can use the minimax algorithm here.
The problem statement says that the winner of the game is "the girl who has more candies at the end of the game." So one reasonable scoring function you could use is the difference in the number of candies held by the first and second player.
Does this mean that the second player also plays optimally to win the game?
Yes. When you are evaluating a MIN level, the MIN player will always choose the path with the lowest score for the MAX player.
Note: both the MIN and MAX levels can be implemented with the same code, if you evaluate every node from the perspective of the player making the move in that round, and convert scores between levels. If the score is a difference in number of candies, you could simply negate it between levels.
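The "same code for both levels" idea is usually called negamax. Here is a minimal Python sketch on a deliberately simple toy game, which is my own assumption and NOT the rules of EllysCandyGame: two players alternately take a number from either end of a list, and the score is the difference between the two totals.

```python
from functools import lru_cache

def best_margin(values):
    """Best achievable (own total - opponent total) for the player about to move."""
    values = tuple(values)

    @lru_cache(maxsize=None)
    def negamax(lo, hi):
        if lo > hi:
            return 0  # no items left: the margin from here on is zero
        # The opponent's best margin on the remaining items is negated,
        # because their gain is our loss.
        take_left = values[lo] - negamax(lo + 1, hi)
        take_right = values[hi] - negamax(lo, hi - 1)
        return max(take_left, take_right)

    return negamax(0, len(values) - 1)

print(best_margin([1, 5, 233, 7]))
```

There is no separate MIN branch: every call maximizes from the mover's point of view, and the sign flip does the rest.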
Trying to choose some path in order to minimize the first player's chances of winning also means trying to win?
Yes. The second player is trying to minimize the first player's score. A reasonable scoring function will give the first player a lower score for a loss than a tie.
I wonder whether we can apply the minimax algorithm here.
Yes. If I've read the problem correctly, the number of levels will be equal to the number of boxes. If there's no limit on the number of boxes, you'll need to use an n-move lookahead, evaluating nodes in the minimax tree to a maximum depth.
Properties of the game:
At each point, there are a limited, well defined number of moves (picking one of the non-empty boxes)
The game ends after a finite number of moves (when all boxes are empty)
As a result, the search tree consists of a finite number of leaves. You are right that by applying Minimax, you can find the best move.
Note that you only have to evaluate the game at the final positions (when there are no more moves left). At that point, there are only three results: The first player won, the second player won, or it is a draw.
Note that the standard Minimax algorithm has nothing to do with probabilities. The result of the Minimax algorithm determines the perfect play for both sides (assuming that both sides make no mistakes).
By the way, if you need to improve the search algorithm, a safe and simple optimization is to apply Alpha Beta pruning.
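For reference, a minimal Python sketch of alpha-beta pruning on an abstract game tree follows. The tree encoding (nested lists with numeric leaves) and the function names are assumptions for illustration; a plain minimax is included only to cross-check that pruning does not change the result.

```python
import math

def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of a tree whose leaves are numbers and inner nodes are lists."""
    if not isinstance(node, list):
        return node  # leaf: final score from the first player's point of view
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the other player would never allow this branch
    return best

def minimax(node, maximizing=True):
    """Plain minimax without pruning, used here only as a cross-check."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 5], [2, 9], [0, -1]]
print(alphabeta(tree), minimax(tree))
```

Both functions return the same value; alpha-beta simply skips children that cannot affect it.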
I was challenged by a coworker to create a Tic Tac Toe game AI that plays five-in-a-row games (not the traditional 3). My initial thought is to create a "scoreboard", i.e. every cell in the game gets a score between 0 and infinity. The AI finds shapes, determines how much value each place holds, and gives scores to the cells. In the end, the highest-scored cell is the choice.
Is there a better way to approach this problem?
5x5 Tic-Tac-Toe might still be small enough to solve directly, depending on your time constraints, if you're clever about the board symmetries. Oddly enough, I just wrote a description of the general technique last night, for this question:
How to code simple AI for a windows phone board game?
If not, that's still a good starting point. The next most obvious thing to me would be to change the board evaluation function and search only as deep in the tree as is feasible for your time constraints. The idea is that you, as a human, might have some ideas about what strong and weak positions are. So, as a guess, we know five in a row wins, so assign X wins as +5 and O wins as -5. One way to win is to get four in a row prior to that, so if X has four in a row, that might be worth 4, and if O has four in a row, that might be worth -4. The idea is that if you can't search all the way down the tree, you search as far as you can with the minimax technique, confident that you're working your way toward a strong position.
That board eval function is only an example. Coming up with a good board evaluation function can be tricky, and the one I described misses some obvious details.
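As a rough illustration of that kind of evaluation function, here is a Python sketch that scores a position by the longest straight run each side has. Extending "four in a row is worth 4" to "longest run of X minus longest run of O" is my own reading, and the board encoding (a list of strings with 'X', 'O' and '.') is an assumption; it ignores whether a run can still be extended, which a serious function would have to consider.

```python
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, and both diagonals

def longest_run(board, player):
    """Length of the longest straight run of `player` marks on a square grid."""
    n = len(board)
    best = 0
    for r in range(n):
        for c in range(n):
            for dr, dc in DIRECTIONS:
                length, rr, cc = 0, r, c
                while 0 <= rr < n and 0 <= cc < n and board[rr][cc] == player:
                    length += 1
                    rr, cc = rr + dr, cc + dc
                best = max(best, length)
    return best

def evaluate(board, win_length=5):
    """Positive favors X, negative favors O; a completed line scores +/- win_length."""
    x_run, o_run = longest_run(board, "X"), longest_run(board, "O")
    if x_run >= win_length:
        return win_length
    if o_run >= win_length:
        return -win_length
    return x_run - o_run

board = [
    "XXXX.",
    "OO...",
    ".....",
    "..O..",
    ".....",
]
print(evaluate(board))
```

A depth-limited minimax would call `evaluate` at the cutoff depth instead of searching to the end of the game.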
Another thing to try is to use a genetic algorithm and neural networks to evolve the board evaluation function. Now the idea is to feed board positions into neural networks, which do the board evaluations, and let them play according to the technique I described above, tournament style. Then, after tournament rounds, new neural networks are created (through genetic algorithm) from the winners and losers are eliminated. The board evaluation function evolves naturally.
Hi
I'm confused about how you determine the utility function for a minimax search.
Explain it with any game that you can use a minimax search with.
Basically, I am asking how you determine the utility function.
Cheers
The utility value is just some arbitrary value that the player receives when arriving at a certain state in the game. For instance, in Tic-tac-toe, your utility function could simply be 1 for a win, 0 for a tie, or -1 for a loss.
Running minimax on this would at best find a set of actions that result in 1 (a win).
Another example would be chess (not that you can feasibly run minimax on a game of chess). Say your utility function comes from a certain number that is based on the value of the piece you captured or lost.
Determining the utility value of a move at a certain state has to do with the experience of the programmer and his/her knowledge of the game.
Utility values for a terminal state are fairly easy to determine. In Tic-tac-toe, for instance, a terminal state for player X is when the Xs are aligned diagonally, vertically, or horizontally. Any move that creates such a state leads to a terminal state, and you can create a function that checks for that. If it is a terminal state, the function returns a 1 or -1.
If your player agent is player X and after player X's move it determines that player O will win, then the function returns a -1. The function returns a 1 if it determines that it is its own winning move.
If all cells are occupied with the last possible move and nobody has won, then the function returns a zero.
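The terminal-state rules above can be sketched as a small Python function. The board encoding (three strings, with a space for an empty cell) and the names `winner` and `utility` are assumptions for illustration; returning `None` for a non-terminal state is also my choice, since the text only defines values for terminal ones.

```python
LINES = [
    [(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)],  # rows
    [(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)],  # columns
    [(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)],                            # diagonals
]

def winner(board):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for line in LINES:
        marks = {board[r][c] for r, c in line}
        if len(marks) == 1 and marks != {" "}:
            return marks.pop()
    return None

def utility(board, agent="X"):
    """1 if `agent` has won, -1 if the opponent has, 0 for a full board with
    no winner, and None when the state is not terminal."""
    won = winner(board)
    if won is not None:
        return 1 if won == agent else -1
    if all(cell != " " for row in board for cell in row):
        return 0
    return None

print(utility(["XXX", "OO ", "   "]))
print(utility(["XOX", "OXX", "OXO"]))
```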
This covers terminal states only. It is critical to evaluate intermediate states as well because, even in a 3x3 game, there are lots of combinations to consider. If you do not collapse symmetrical moves, there are up to 9! = 362,880 possible orderings of moves in Tic-tac-toe. For those intermediate cases, you need to come up with an evaluation function that returns a score for each state as it relates to other states.
Suppose that I assign the terminal state values of 810, 0, and -810. For each move, the score would be 810 / (# of moves). So if I reach a terminal state in 6 moves, the score would be 810/6 = 135. In 9 moves, the score would be 90. An evaluation function fashioned this way would favor moves that reach a terminal state faster. However, it still requires searching all the way down to a leaf node. We need to evaluate before reaching a leaf node, but this scaling could also be part of such an evaluation function.
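In code, that scaling is a one-liner. This Python sketch uses a hypothetical `terminal_score` function; treating a loss symmetrically (so a quick loss scores lower than a slow one) is my assumption, since the text only works through the winning case.

```python
def terminal_score(outcome, moves, base=810):
    """Scale a terminal outcome (+1 win, 0 draw, -1 loss) by how quickly it was reached."""
    return outcome * base / moves

print(terminal_score(1, 6), terminal_score(1, 9))
```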
Suppose that, in the game below, player 1 is X, so X moves next. The following are the legal moves (row, column) for X:
(1) 0,0
(2) 0,2
(3) 2,0
(4) 2,1
(5) 2,2
| |O| |
|O|X|X|
| | | |
The utility value for each move should favor the best moves.
The best moves, in this case, are either (2) or (5). So an evaluation function will assign a utility value of 81, for instance, to each of those. Move (4) is the worst possible move for the X player (and would also guarantee that you lose the game against an intelligent player), so the function would assign a value of -9 to that move. Moves (1) and (3), while not ideal, will not make you lose, so we might assign a 1.
So when minimax evaluates those 5 moves, because your player, X, is max, the choice would be either (2) or (5).
If we focus on options (2) or (5), the game will be on a terminal state two moves after these. So, in reality, the evaluation function should look 2 moves ahead of the current legal moves to return the utility values. (This strategy follows the lines of depth limited search, where your function evaluates at a certain depth and produces a utility value without reaching a leaf node - or terminal state)
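On a board this small you can also just search to the end and see what plain minimax says about the five moves. The Python sketch below does that with the simple 1 / 0 / -1 utility rather than the 81 / 1 / -9 values used above; the board encoding and function names are assumptions for illustration.

```python
LINES = [
    [(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)],  # rows
    [(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)],  # columns
    [(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)],                            # diagonals
]

def winner(board):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for line in LINES:
        marks = {board[r][c] for r, c in line}
        if len(marks) == 1 and marks != {" "}:
            return marks.pop()
    return None

def minimax(board, to_move):
    """Value of `board` for X (1 win, 0 draw, -1 loss) with `to_move` about to play."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    moves = [(r, c) for r in range(3) for c in range(3) if board[r][c] == " "]
    if not moves:
        return 0
    values = []
    for r, c in moves:
        board[r][c] = to_move                      # try the move
        values.append(minimax(board, "O" if to_move == "X" else "X"))
        board[r][c] = " "                          # undo it
    return max(values) if to_move == "X" else min(values)

def move_values(board):
    """Minimax value of each legal X move on `board`."""
    result = {}
    for r in range(3):
        for c in range(3):
            if board[r][c] == " ":
                board[r][c] = "X"
                result[(r, c)] = minimax(board, "O")
                board[r][c] = " "
    return result

board = [list(" O "), list("OXX"), list("   ")]
for move, value in move_values(board).items():
    print(move, value)
```

The ordering agrees with the hand analysis: (0,2) and (2,2) are forced wins, (2,1) is a forced loss, and (0,0) and (2,0) lead to a draw with best play.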
Now I'll circle back to my first statement. The utility value will be determined by an evaluation function coded per the programmer's knowledge of the game.
Hopefully, I'm not confusing you...