I'm referring mostly to this paper here: http://clgiles.ist.psu.edu/papers/UMD-CS-TR-3617.what.size.neural.net.to.use.pdf
Current Setup:
I'm currently trying to port a neural-genetic AI solution that I have lying around into a multi-purpose, multi-agent tool. For example, it should work as an AI in a game engine that moves entities around and lets them shoot and destroy the enemy (so e.g. 4 inputs like distance x, y and angle x, y, and 2 outputs like accelerate left, right).
The state so far is that I'm using as many genomes as there are agents to determine the fittest agents. The fittest 20% of agents are combined with each other (zz, zw genomes selected) and create 2 babies each for the new population. The rest of each new generation is selected randomly from the old population, including the fittest with an unfit genome.
That works pretty well to prime the AI: after generation 50-100 it is pretty much unbeatable by a human in a Breakout clone and in a little tank game where you can shoot and move around.
As I had the idea to use one evolution population for each "type of agent", the question is now whether it is possible to determine the number of hidden layers and the number of neurons in the hidden layers generically.
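To make the selection scheme above concrete, here is a minimal sketch of one generation step. It assumes genomes are flat lists of weights, that "2 babies each" means two babies per elite pair, and that crossover is uniform; the names next_generation and uniform_crossover are made up for illustration and are not from my actual code.

```python
import random

def uniform_crossover(a, b, rng):
    # Assumption: each gene is taken from either parent with equal probability.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def next_generation(population, fitness, crossover=uniform_crossover,
                    elite_fraction=0.2, rng=random):
    """Return a new population with the same size as the old one."""
    size = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:max(2, int(size * elite_fraction))]

    children = []
    # Pair up the elite; each pair contributes two babies.
    for i in range(0, len(elite) - 1, 2):
        children.append(crossover(elite[i], elite[i + 1], rng))
        children.append(crossover(elite[i + 1], elite[i], rng))

    # The rest is drawn at random from the whole old population.
    while len(children) < size:
        children.append(list(rng.choice(population)))
    return children[:size]
```

With 50 agents this gives 10 elite genomes, 10 babies and 40 random survivors per generation.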
My setup for the tank game is 4 inputs, 3 outputs and 1 hidden layer with 12 neurons that worked the best (around 50 generations to be really strong).
My setup for a breakout game is 6 inputs, 2 outputs and 2 hidden layers with 12 neurons that seems to work best.
Done Research:
So, back to the paper: page 32 suggests that more neurons per hidden layer of course need more time for priming, but the more neurons there are in between, the better the chances of fitting the function without noise.
I currently prime my AI using only the fitness increase it gets for doing better than on the last try.
So in the tank game it means it successfully shot the other tank (wounding it 4 times is better, because then the enemy is dead) and won the round.
In the breakout game it's similar as I have a paddle that the AI can move around and it can collect points. "Getting shot" or negative treatment here is that it forgot to catch the ball. So potential noise input would be 2 output values (move-left, move-right) that depend on 4 input values (ball x, y, degx, degy).
Questions:
So, what kind of calculation for the number of hidden layers and the number of neurons do you think is a good tradeoff, so that there is no noise that kills the genome evolution?
What is the minimum amount of agents until you can say that "it evolves further"? My current training setup is always around having 50 agents that learn in parallel (so they basically simulate 50 games in parallel "behind the scenes").
In sum, for most problems, one could probably get decent performance (even without a second optimization step) by setting the hidden layer configuration using just two rules: (i) number of hidden layers equals one; and (ii) the number of neurons in that layer is the mean of the neurons in the input and output layers.
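As a quick illustration of those two rules, here is a minimal sketch. The function name is hypothetical, and rounding the mean up when it is not a whole number is my own assumption, since the rule itself does not say how to round.

```python
import math

def rule_of_thumb_layers(n_inputs, n_outputs):
    """One hidden layer whose size is the mean of the input and output sizes."""
    # Assumption: round a fractional mean up, and never go below one neuron.
    hidden = max(1, math.ceil((n_inputs + n_outputs) / 2))
    return [n_inputs, hidden, n_outputs]

print(rule_of_thumb_layers(4, 3))  # tank game from the question
print(rule_of_thumb_layers(6, 2))  # breakout game from the question
```

Treat the result as a starting point for experiments rather than a final answer.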
-doug
In short: it's an ongoing area of research. Most (all that I know of) ANNs that use many neurons and hidden layers don't set a static number of either; instead they use algorithms to continuously modify these values, usually adding and removing neurons and layers as outputs converge or diverge.
Since it sounds like you're already using some evolutionary computing, consider looking into Andrew Turner's work on CGPANN. I remember it getting pretty decent improvements on benchmarks similar to your work.
I want to ask whether it makes sense to use a standard backpropagation neural network with the TD-learning method in a board game.
My method looks like:
Play 1 game. The net plays as both players with a greedy policy and occasional random moves.
For each stored game position (starting from terminal-1 and moving back to the starting position), calculate the estimated position value and the desired position value, e.g.
boards_values[i]['desired_value'] = boards_values[i]['estimated_value'] + 0.4 * ( boards_values[i+1]['estimated_value'] - boards_values[i]['estimated_value'] )
Create training patterns for the net from the entire game and train on each with a small learning rate for 1 epoch with the standard backpropagation algorithm.
NN.train([pattern], iterations=1, N=0.001, M=0.000001)
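The target computation in the steps above can be sketched as a standalone function. This is only an illustration: td_targets and final_reward are hypothetical names, and it assumes the terminal position is trained toward the actual game result rather than toward the net's own estimate.

```python
def td_targets(estimated_values, final_reward, alpha=0.4):
    """Desired values for one game, walking back from the terminal position.

    estimated_values: the net's estimate for each stored position, in play order.
    final_reward: actual outcome of the game (assumed to be in [-1, 1]).
    """
    if not estimated_values:
        return []
    values = list(estimated_values)
    values[-1] = final_reward  # assumption: the terminal position uses the real result
    targets = [0.0] * len(values)
    targets[-1] = final_reward
    for i in range(len(values) - 2, -1, -1):
        # Same update as above: move the estimate part of the way to the next one.
        targets[i] = values[i] + alpha * (values[i + 1] - values[i])
    return targets
```

Each (position, target) pair then becomes one training pattern for the net.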
I tried some combinations of the above (learning not from one example but from 30-40 patterns, decreasing/increasing the learning speed and so on) in my tic tac toe game and never trained an ideal player (it should never lose vs. random). One of the best results when the NN agent plays against a random player is:
(play as first: win, tie, lose), (play as second: win, tie, lose), (sum: win, tie, lose)
(191, 34, 275), (159, 102, 239), (350, 136, 514) - fresh net
(427, 21, 52), (312, 16, 172), (739, 37, 224) - after +50k games
Input is 18 neurons in format:
For each board cell, set (1,0) for x, (0,0) for an empty cell and (0,1) for o. The output is one unit, a win/lose probability estimate in the range [-1, 1].
Tic tac toe is only a testing sandbox; when I finish it successfully I will move on to a more complex card game ('Lost Cities').
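For reference, the input encoding can be sketched like this. It assumes the board is stored as a 9-character string with 'x', 'o' and ' ' for empty cells, which is a representation chosen just for this example.

```python
def encode_board(cells):
    """Turn a 9-cell board into the 18 network inputs described above."""
    codes = {'x': (1, 0), ' ': (0, 0), 'o': (0, 1)}
    inputs = []
    for cell in cells:
        inputs.extend(codes[cell])
    return inputs

print(encode_board("x o      ")[:6])
```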
Yes, this is relatively standard. It was the approach taken by Tesauro in his program TDGammon 2.1 which trained an artificial neural network to play backgammon better than the best human players (after bootstrapping on 1.5 million games).
There are many caveats, however:
Artificial neural networks are notoriously difficult to use correctly. Have you ensured that your implementation performs as expected, by testing it on some simple supervised learning problems?
TDGammon used a neural network to give heuristic utilities for each game state, and combined this with a 2-ply alpha/beta pruning algorithm. With modern computers it is possible to use a deeper look-ahead (for example, I recently coded up an alpha/beta search algorithm that easily manages 10-ply search on a game with a branching factor of 7, on interpreted (non-compiled) code and before taking heuristics into account).
TD Learning is not the only reinforcement learning algorithm. I have had success in the past applying SARSA and Q-Learning, which speed up search by preferentially exploring strategies which appear promising and ignoring strategies which look bad. You need to combine them with an exploration policy to ensure that they sometimes explore strategies that look bad, to avoid getting stuck in local minima. A simple policy such as epsilon-greedy with ε = 0.1 often works well.
Eligibility traces are a powerful way of speeding up learning in reinforcement learning algorithms. Algorithms that use eligibility traces include TD(λ), SARSA(λ) and Q(λ). You need to be careful, though - there is now another parameter to fit, which means that it's even more important to take care when training your model. Use a test set!
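As an illustration of the epsilon-greedy policy mentioned in the caveats above, here is a minimal sketch. The names epsilon_greedy and value_of are placeholders; value_of stands for whatever estimate your network or table gives for an action.

```python
import random

def epsilon_greedy(actions, value_of, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, otherwise the best-looking one."""
    if rng.random() < epsilon:
        return rng.choice(actions)   # explore
    return max(actions, key=value_of)  # exploit

values = {'left': 0.2, 'stay': 0.5, 'right': 0.1}
print(epsilon_greedy(list(values), values.get, epsilon=0.0))
```

With epsilon=0.0 the policy is purely greedy, and with epsilon=1.0 it is purely random, so 0.1 sits close to the greedy end.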
I was challenged by a coworker to create a Tic Tac Toe game AI that plays five-in-a-row games (not the traditional 3). My initial thought is to create a "scoreboard", i.e. every cell in the game gets a score between 0 and infinity. The AI finds shapes, determines how much value each place holds, and gives a score to each cell. In the end, the highest-scored cell is the choice.
Is there a better way to approach this problem?
5x5 Tic-Tac-Toe might still be small enough to solve directly, depending on your time constraints, if you're clever about the board symmetries. Oddly enough, I just wrote a description of the general technique last night, for this question:
How to code simple AI for a windows phone board game?
If not, that's still a good starting point. The next most obvious thing to me would be to change the board evaluation function and search only as deep in the tree as is feasible for your time constraints. The idea is that you, as a human, might have some ideas about what strong and weak positions are. So, as a guess, we know five in a row wins, so assign X wins as +5 and O wins as -5. One way to win is to get four in a row prior to that, so if X has four in a row, that might be worth 4, and if O has four in a row, that might be worth -4. The idea is that if you can't search all the way down the tree, you search as far as you can with the minimax technique, confident that you're working your way toward a strong position.
That board eval function is only an example. Coming up with a good board evaluation function can be tricky, and the one I described misses some obvious details.
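The depth-limited search described above could look roughly like the sketch below. It is deliberately generic: moves, apply_move, evaluate and is_terminal are callbacks you would supply for your own board, and none of these names come from an existing library. The toy example uses nested lists as a stand-in game tree.

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate, is_terminal):
    """Search 'depth' plies, then fall back on the board evaluation function."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    values = [
        minimax(apply_move(state, m), depth - 1, not maximizing,
                moves, apply_move, evaluate, is_terminal)
        for m in moves(state)
    ]
    return max(values) if maximizing else min(values)

# Toy tree: inner lists are positions, numbers are evaluated leaf positions.
tree = [[3, 5], [2, 9]]
is_leaf = lambda s: not isinstance(s, list)
score = minimax(tree, 2, True,
                moves=lambda s: range(len(s)),
                apply_move=lambda s, m: s[m],
                evaluate=lambda s: s if is_leaf(s) else 0,
                is_terminal=is_leaf)
print(score)
```

For the five-in-a-row game, evaluate is where the "+5 for a win, 4 for four in a row" idea would go.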
Another thing to try is to use a genetic algorithm and neural networks to evolve the board evaluation function. The idea is to feed board positions into neural networks, which do the board evaluations, and let them play tournament style according to the technique I described above. Then, after each tournament round, new neural networks are created from the winners (through the genetic algorithm) and the losers are eliminated. The board evaluation function evolves naturally.
'Proximity' is a strategy game of territorial domination similar to Othello, Go and Risk.
Two players, uses a 10x12 hex grid. Game invented by Brian Cable in 2007.
Seems to be a worthy game for discussing a) optimal algorithm then b) how to build an AI.
Strategies are going to be probabilistic or heuristic-based, due to the randomness factor, and the insane branching factor (20^120).
So it will be kind of hard to compare objectively.
A compute time limit of 5 seconds max per turn seems reasonable => this rules out all brute-force attempts. (Play the game's AI on Expert level to get a feel - it does a very good job based on some simple heuristic)
Game: Flash version here, iPhone version iProximity here and many copies elsewhere on the web
Rules: here
Object: to have control of the most armies after all tiles have been placed. You start with an empty hexboard. Each turn you receive a randomly numbered tile (value between 1 and 20 armies) to place on any vacant board space. If this tile is adjacent to any ALLY tiles, it will strengthen each of those tiles' defenses by +1 (up to a max value of 20). If it is adjacent to any ENEMY tiles, it will take control of them IF its number is higher than the number on the enemy tile.
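A minimal sketch of the placement rule, to pin down the mechanics for anyone writing an AI. It assumes axial hex coordinates, a dict-based board, and that a captured tile keeps its number; board bounds are not checked, and all names are made up for this example.

```python
# Assumption: axial (q, r) hex coordinates, six neighbours per cell.
AXIAL_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def place_tile(board, pos, owner, value, max_value=20):
    """board maps (q, r) -> (owner, value) for occupied cells; it is modified in place."""
    if pos in board:
        raise ValueError("cell already occupied")
    board[pos] = (owner, value)
    q, r = pos
    for dq, dr in AXIAL_DIRECTIONS:
        neighbor = (q + dq, r + dr)
        if neighbor not in board:
            continue
        n_owner, n_value = board[neighbor]
        if n_owner == owner:
            # Ally: reinforce by +1, capped at the maximum.
            board[neighbor] = (n_owner, min(max_value, n_value + 1))
        elif value > n_value:
            # Enemy with a lower number: flip it (assumed to keep its number).
            board[neighbor] = (owner, n_value)
    return board

def armies(board, owner):
    return sum(v for o, v in board.values() if o == owner)
```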
Thoughts on strategy: Here are some initial thoughts; setting the computer AI to Expert will probably teach a lot:
minimizing your perimeter seems to be a good strategy, to prevent flips and minimize worst-case damage
like in Go, leaving holes inside your formation is lethal, only more so with the hex grid because you can lose armies on up to 6 squares in one move
low-numbered tiles are a liability, so place them away from your main territory, near the board edges and scattered. You can also use low-numbered tiles to plug holes in your formation, or make small gains along the perimeter which the opponent will not tend to bother attacking.
a triangle formation of three pieces is strong since they mutually reinforce, and also reduce the perimeter
Each tile can be flipped at most 6 times, i.e. when its neighbor tiles are occupied. Control of a formation can flow back and forth. Sometimes you lose part of a formation and plug any holes to render that part of the board 'dead' and lock in your territory/ prevent further losses.
Low-numbered tiles are obvious-but-low-valued liabilities, but high-numbered tiles can be bigger liabilities if they get flipped (which is harder). One lucky play with a 20-army tile can cause a swing of 200 (from +100 to -100 armies). So tile placement will have both offensive and defensive considerations.
Comments 1, 2 and 4 seem to resemble a minimax strategy where we minimize the maximum expected possible loss (modified by some probabilistic consideration of the value β the opponent can get from 1..20, i.e. a structure which can only be flipped by a β=20 tile is 'nearly impregnable').
I'm not clear what the implications of comments 3,5,6 are for optimal strategy.
Interested in comments from Go, Chess or Othello players.
(The sequel, ProximityHD for Xbox Live, allows 4-player cooperative or competitive local multiplayer. It also increases the branching factor, since you now have 5 tiles in your hand at any given time, of which you can only play one. Reinforcement of ally tiles is increased to +2 per ally.)
A former member of the U of A GAMES group here.
That branching factor is insane. Far worse than Go.
Basically, you're hooped.
The problem with this game is that it is not deterministic due to the selection of a random tile. This actually adds another layer of nodes between each existing layer of nodes in the tree. You'll be interested in my publications on *-Minimax to learn about techniques for searching in stochastic domains.
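To show what that extra layer of nodes means, here is a sketch of plain expectiminimax on a hand-built tree. Note that this is the basic, unpruned algorithm, not *-Minimax itself, and the tuple-based tree format is just an assumption for the example.

```python
def expectiminimax(node):
    """node is a number (leaf) or a (kind, children) pair.

    'max' / 'min' children are nodes; 'chance' children are (probability, node) pairs.
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    if kind == 'chance':
        # The random tile draw: average over outcomes, weighted by probability.
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError("unknown node kind: %r" % (kind,))

tree = ('max', [
    ('chance', [(0.5, ('min', [4, 8])), (0.5, ('min', [0, 2]))]),
    ('chance', [(1.0, ('min', [3, 5]))]),
])
print(expectiminimax(tree))
```

In this game each chance node would have 20 children (one per tile value), which is a large part of why the tree explodes.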
In order to complete one-ply searches before the end of this century, you're going to need some very aggressive forward pruning techniques. Throw the idea of a provably best move out the window early and concentrate on building good move ordering.
For general algorithms, I would suggest you check the research done by the University of Alberta AI Games group: http://games.cs.ualberta.ca Many of the algorithms there are guaranteed to find optimal policies. However, I doubt you're really interested in finding the optimum; aim for "good enough" unless you want to sell that game in Korea :D
If, as I understand from your description, the game is two-player with full observability (i.e. no hidden units and such) and fully deterministic (i.e. the outcomes of the players' actions do not require rolling), then you should take a look at the real-time bounded-search minimax derivatives proposed by the U Alberta guys. Being able to bound the depth of the backups of the value function as well would perhaps be a nice way to add a "difficulty level" to your game. They have also been doing some work, a bit fishy imo, on sampling the search space to improve value function estimates.
About the "strategy" section you describe: in the framework I am mentioning, you will have to encode that knowledge as an evaluation function. Look at the work of Michael Büro and others - also in the U Alberta group - for examples of such knowledge engineering.
Another possibility would be to pose the problem as a Reinforcement Learning problem, where adversary moves are compiled as "afterstates". Look that up in the Sutton & Barto book: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html The value function for an RL problem resulting from such a compilation might prove a bit difficult to solve optimally, because the number of states will blow up like an H-bomb. However, if you see how to use a factored representation, things can be much easier. And your "strategy" could perhaps be encoded as a shaping function, which would speed up the learning process considerably.
EDIT: Damn English prepositions
I'm trying to implement Pacman. It works fine, but so far, the ghosts aren't using any pathfinding, but instead just decide randomly on each path junction which path to take. So you can imagine that it isn't really difficult for Pacman to win the game ;)
So I read a little bit about path finding algorithms in Pacman and here on SO I found a really good answer: Pathfinding Algorithm For Pacman
The answers are referring to http://home.comcast.net/~jpittman2/pacman/pacmandossier.html#Chapter%204
This is all fine, but in my implementation of Pacman, there are two Pacmans which are played by two different players. So I wonder how to adapt the pathfinding algorithms, so that the ghosts are not always chasing one player.
Any thoughts on how to modify the algorithm so that the ghosts are more or less equally fair to both players?
I think the easiest strategy is to make each ghost chase the player closest to it. Proximity can be calculated using Manhattan distance (there was a link to it in the pathfinding question) or Euclidean distance or by a path length to the players. The last option means that you will have to compute paths to both players. Try all these options and choose one to your taste.
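A minimal sketch of the closest-player rule using Manhattan distance. It assumes positions are (x, y) grid tuples, and the function names are only illustrative.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def pick_target(ghost, players):
    """Return the position of the player closest to this ghost."""
    return min(players, key=lambda p: manhattan(ghost, p))

players = [(1, 1), (10, 4)]
print(pick_target((8, 5), players))
```

Swapping manhattan for a Euclidean or path-length function only changes the key passed to min.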
Also, on a side note: none of the people answering the pathfinding question mentioned Dijkstra's algorithm, which is even slower than BFS :) but lets you compute all shortest paths in a single search. That is, if you implement A* or BFS and have n ghosts, you will make at least n pathfinding queries. With Dijkstra you can do it only once, starting from the player. But it all depends. If your game field is too large, Dijkstra is not the best choice. Try, experiment and maybe it'll suit you.
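Here is a rough sketch of the "search once from the player" idea. It assumes a rectangular grid with a set of wall cells and a cost of 1 per step; distances_from and step_toward are hypothetical names.

```python
import heapq

def distances_from(start, walls, width, height):
    """Dijkstra from one cell; returns {cell: shortest distance} for reachable cells."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if d > dist[(x, y)]:
            continue  # stale queue entry
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in walls:
                nd = d + 1  # assumption: every step costs 1
                if nd < dist.get((nx, ny), float('inf')):
                    dist[(nx, ny)] = nd
                    heapq.heappush(heap, (nd, (nx, ny)))
    return dist

def step_toward(ghost, dist):
    """Move the ghost to the neighbouring cell that is closest to the player."""
    x, y = ghost
    options = [n for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)) if n in dist]
    return min(options, key=dist.get, default=ghost)
```

You compute dist once per player per tick, and then every ghost just calls step_toward with it.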
(Haven't looked but) I'm guessing that all the ghost algorithms base their behaviour on the relative positions of the ghost and 'the player' - well, simply have each ghost change its mind about which of the two players it uses as 'the player' in its algorithm, every so often.
Determining what exactly "every so often" means here is going to be a question for playtesting - should it be on a fixed schedule? Vary per ghost? Vary based on the relative proximity of the two players? Randomly - on a uniform / Poisson / other distribution?
There are as you can see many possibilities. Bear in mind that you want to avoid both behaviour which is 'too good' and behaviour which is 'too stupid'...
If you can query the distance and direction to any one Pacman from any one Ghost, and also how many Ghosts (and which ones) are currently chasing any one Pacman, you should be able to make a pretty good and simple AI with some creativity.
I think you should keep the pathfinding algorithms described on the web page you mentioned. That will make the game feel more true to the original. The only problem then is to determine how many ghosts chase a particular Pacman. I think this behavior should include scenarios where all of the ghosts are chasing one player, so an algorithm is needed to determine whether 1, 2, 3, or 4 ghosts are chasing a player. The algorithm could be based on the point difference between the players, so that the player in the lead gets chased by more ghosts. It should probably also factor in the number of lives each player has left: if the player in the lead has fewer lives, the algorithm should delay increasing the number of ghosts chasing that player. The number of ghosts chasing a player should also not change too often. If a ghost switches targets too frequently, it will seem not to be chasing either player. As the web page mentions, getting good behavior is going to take some experimentation. I think keeping it simple at first is key, because complex-looking behavior can sometimes be achieved with a few simple rules. Good luck, and I would love to see what you come up with. Please post a link when you get done!
I don't know if this coincides with your notion of "fairness", but I imagine one would like to prevent the case where one player happened to be the closer target to all 4 ghosts and so they end up ganging up on him and following him around, never again to chase the other player. This would be a possible result of the rule to have the ghost always follow the closest player.
You might consider first allocating fairly 2 ghosts to player 1 and 2 other ghosts to player 2, and then have them chase their targets (and reassigning this every so often). Although, if I were a ghost in the real world I wouldn't care if all my friends and I were ganging up on one pacman.
Instead of BFS or Dijkstra, I would use depth-first search to depth 3 or 4, evaluating the Cartesian (straight-line) distance between your ghost and the Pacman at the leaves of this search tree and propagating the value of the best leaf up to the root. For a small lookahead, it would be faster and easier to code than BFS or Dijkstra. Depth-limited search should give you pretty intelligent behavior for your ghosts, assuming your gameboard does not have spiraling corridors where the number of moves required to escape the spiral is greater than 3 or 4. It also means the running time of the algorithm doesn't increase with larger and larger boards, as it does with BFS and Dijkstra, again assuming you don't have spiraling corridors.
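A sketch of that depth-limited search, under a few assumptions of mine: the maze is given as a set of walkable (x, y) cells, ghosts move in four directions, and ties are broken in favour of reaching the Pacman sooner. The function names are placeholders.

```python
import math

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def neighbors(pos, open_cells):
    x, y = pos
    return [(x + dx, y + dy) for dx, dy in STEPS if (x + dx, y + dy) in open_cells]

def leaf_value(pos, target, open_cells, depth):
    """Best (distance, tie-break) reachable from pos within 'depth' more moves."""
    options = neighbors(pos, open_cells)
    if depth == 0 or pos == target or not options:
        # Second element prefers leaves that stop with more search depth left over.
        return (math.dist(pos, target), -depth)
    return min(leaf_value(n, target, open_cells, depth - 1) for n in options)

def choose_move(ghost, target, open_cells, depth=4):
    """Pick the neighbouring cell whose subtree contains the best leaf."""
    options = neighbors(ghost, open_cells)
    if not options:
        return ghost
    return min(options, key=lambda n: leaf_value(n, target, open_cells, depth - 1))

corridor = {(0, 0), (1, 0), (2, 0), (3, 0)}
print(choose_move((2, 0), (0, 0), corridor))
```

The cost is bounded by roughly 4^depth node visits per ghost, regardless of how big the maze is.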