Questions about Q-learning in a 2D maze - artificial-intelligence

I just read about Q-learning and I'm not sure if I understand this correctly. All examples I saw are rat-in-a-maze, where the rat must move towards the cheese, and the cheese doesn't move.
I'm just wondering if it's possible to do Q-learning in a situation where both the mouse and the cheese move (so one agent chases and the other runs away).
If Q-learning doesn't work in that situation, do we have any other algorithms (greedy or non-greedy) that work?
Also, is there a formal/academic name for this situation? I'd like to search for papers that talk about it but can't find its formal/academic name.
Thank you so much!

All RL algorithms enable a single agent to learn a policy. In problems that involve multiple actors such as a mouse and a cheese, one actor (the mouse) would learn a policy using an RL algorithm and the other actor (the cheese) would be guided by some AI that is not RL. If both the mouse and cheese are RL agents, then you're looking at multiagent RL. Here is a nice framework for it: https://github.com/PettingZoo-Team/PettingZoo/
Q-learning is probably the most popular RL technique for beginners, but in its basic tabular form it is practical only for simple toy problems with a small discrete state space, such as a 2D maze. It is not very effective on problems with a continuous state space, even simple ones such as CartPole: it might solve them, but would take much longer than other RL methods. Q-learning combined with a neural network, however, can be very powerful, as demonstrated by RL methods such as deep Q-network (DQN) and double DQN.
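To make the chase setting concrete, here is a minimal sketch of tabular Q-learning where the state is the pair (mouse position, cheese position), so the table can react to a moving cheese. It assumes a small 4x4 grid, a cheese that moves randomly (a stand-in for "some AI that is not RL"), and a reward of 1 only when the mouse catches the cheese. The names `step`, `q_update` and `train` and all parameter values are made up for illustration.

```python
import random
from collections import defaultdict

SIZE = 4                                        # assumed grid size
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # four moves on the grid

def step(pos, action):
    """Move one cell, clamping at the walls."""
    x = min(max(pos[0] + action[0], 0), SIZE - 1)
    y = min(max(pos[1] + action[1], 0), SIZE - 1)
    return (x, y)

def q_update(Q, s, a, r, s2, done, alpha=0.1, gamma=0.9):
    """Standard Q-learning update for one transition (s, a, r, s2)."""
    best_next = 0.0 if done else max(Q[(s2, b)] for b in range(len(ACTIONS)))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def train(episodes=2000, epsilon=0.1, seed=0, max_steps=50):
    rng = random.Random(seed)
    Q = defaultdict(float)                      # unseen (state, action) pairs start at 0
    for _ in range(episodes):
        mouse, cheese = (0, 0), (SIZE - 1, SIZE - 1)
        for _ in range(max_steps):
            s = (mouse, cheese)                 # the state includes BOTH positions
            if rng.random() < epsilon:          # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda b: Q[(s, b)])
            mouse = step(mouse, ACTIONS[a])
            caught = mouse == cheese
            if not caught:                      # the cheese flees with a random move
                cheese = step(cheese, rng.choice(ACTIONS))
                caught = mouse == cheese
            q_update(Q, s, a, 1.0 if caught else 0.0, (mouse, cheese), caught)
            if caught:
                break
    return Q
```

The table grows with the square of the number of cells because both positions are part of the state, which is one reason the tabular approach stops scaling quickly.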

Related

Find the uninformed search technique for the River Crossing puzzle

I have to use an uninformed search technique to solve the following problem.
The puzzle goes like this:
On one side of the river, there are a policeman, a robber, a woman in a red dress and her two children, and a woman in a yellow dress and her two children. There is a boat that can carry at most two persons. The children cannot drive the boat.
If the policeman is absent then the robber will kill the people. If the red-dress woman is absent then the yellow-dressed woman will kill the red-dressed woman’s children and vice versa.
I am confused as usual. Please help me figure it out.
The problem and how can it be solved (without programming) is shown in the video below:
https://www.youtube.com/watch?v=vSusAZBSWwg
Thank you.
Problems like the River Crossing puzzle, Sokoban or Lemmings are normally solved with brute-force search in the game tree. The domain is specified declaratively as rules (which moves are possible and which are not), together with a function that scores a policy (policy = a plan through the game tree). The solver's aim is to find a good policy. The ideal hardware for this would be a machine with unlimited speed that tests as many moves per second as possible.
The reason this is not practical is a phenomenon called "combinatorial explosion", which James Lighthill highlighted in 1973 to argue that artificial intelligence was not ready for real-world use. The answer to that problem is to use alternative strategies that have nothing to do with brute-force search.
One possibility is to use heuristics that are hardcoded into the program. These heuristics are often called macro-actions or motion primitives. An example would be "bring-robber-to-other-side". This subfunction executes a predefined number of actions. Another macro-action could be "check-if-two-women-are-on-the-same-side". Implementing this kind of strategy for the complete game is a hard task, not because of high CPU usage, but because every detail has to be coded into the software.
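For this particular puzzle the state space is tiny (8 people and a boat, so at most 512 states), and a plain uninformed search is enough. The following is a rough sketch of breadth-first search over it, not a reference solution. It assumes that every adult, including the robber, may drive (the post only excludes the children), and that the safety rules also apply to the pair in the boat. The person labels and the function names `safe` and `solve` are invented for the example.

```python
from collections import deque
from itertools import combinations

PEOPLE = ("police", "robber", "red", "red_c1", "red_c2",
          "yellow", "yellow_c1", "yellow_c2")
DRIVERS = {"police", "robber", "red", "yellow"}   # assumption: only children cannot drive
RED_KIDS = {"red_c1", "red_c2"}
YELLOW_KIDS = {"yellow_c1", "yellow_c2"}

def safe(group):
    """True if nobody in this group (a bank or the boat) is in danger."""
    if "robber" in group and "police" not in group and len(group) > 1:
        return False
    if "yellow" in group and "red" not in group and group & RED_KIDS:
        return False
    if "red" in group and "yellow" not in group and group & YELLOW_KIDS:
        return False
    return True

def solve():
    """Breadth-first search; a state is (people on the start bank, boat on start bank?)."""
    everyone = frozenset(PEOPLE)
    start, goal = (everyone, True), (frozenset(), False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:           # walk back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        left, boat_left = state
        bank = left if boat_left else everyone - left
        for size in (1, 2):                    # the boat carries one or two people
            for crew in combinations(sorted(bank), size):
                crew = frozenset(crew)
                if not (crew & DRIVERS) or not safe(crew):
                    continue
                new_left = left - crew if boat_left else left | crew
                nxt = (new_left, not boat_left)
                if nxt not in parent and safe(new_left) and safe(everyone - new_left):
                    parent[nxt] = state
                    queue.append(nxt)
    return None

if __name__ == "__main__":
    for left, boat_left in solve():
        print(sorted(left), "| boat on start bank" if boat_left else "| boat on far bank")
```

Because BFS expands states in order of the number of crossings, the first path it returns uses the fewest crossings under these assumed rules.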

Tree generation in abalone artificial intelligence

I need to implement an intelligent agent to play the game Abalone. For this kind of game, the best way to proceed seems to be a min-max strategy with alpha-beta pruning.
I have already implemented a naive search algorithm that uses min-max with pruning.
My problem is:
How do I generate the nodes of the tree on which to perform the search?
I have no idea of the right way to do this, or how to assign a weight to each node.
For generating the tree nodes: You need to implement a method that returns a collection of all possible legal moves given the current board position and the player whose turn it is. All these moves will become children of the node representing the current board position. Repeat until memory is exhausted to generate the game tree ;) or rather until you reach a reasonable tree depth.
For alpha-beta search you also need an evaluation function which calculates the weight for each board position/node. You can do some research or think about such a function yourself, maybe considering the number of stones still on the board. However a bad evaluation function can seriously screw up your results, so take care and run a lot of tests.
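As a rough illustration of how the two pieces fit together, here is a generic alpha-beta sketch in which the move generator and the evaluation function are passed in as callbacks. The parameter names `legal_moves`, `apply_move` and `evaluate` are placeholders, not part of any Abalone library, and the toy tree at the bottom only stands in for a real board.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing,
              legal_moves, apply_move, evaluate):
    """Depth-limited minimax with alpha-beta pruning.

    legal_moves(state, maximizing) -> list of moves for the side to play
    apply_move(state, move)        -> child state (a new node of the tree)
    evaluate(state)                -> weight of a position for the maximizer
    """
    moves = legal_moves(state, maximizing)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for move in moves:
            child = apply_move(state, move)    # nodes are generated lazily
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         legal_moves, apply_move, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:                  # the minimizer avoids this branch
                break
        return value
    value = math.inf
    for move in moves:
        child = apply_move(state, move)
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     legal_moves, apply_move, evaluate))
        beta = min(beta, value)
        if alpha >= beta:                      # the maximizer avoids this branch
            break
    return value

# Toy stand-in for a game: nested lists are positions, integers are leaf scores.
toy_tree = [[3, 5], [2, 9]]
toy_moves = lambda node, maximizing: list(range(len(node))) if isinstance(node, list) else []
toy_apply = lambda node, i: node[i]
toy_eval = lambda node: node

print(alphabeta(toy_tree, 2, -math.inf, math.inf, True,
                toy_moves, toy_apply, toy_eval))   # prints 3
```

Note that the children are created only when the search visits them, so the whole tree never has to sit in memory at once.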
If you have trouble coming up with a reasonable evaluation function, I recommend you take a look into Monte-Carlo techniques such as UCT.
http://en.wikipedia.org/wiki/Monte_Carlo_tree_search
These tackle the problem using a probabilistic approach and have some nice advantages over alpha-beta. Also they don't require an evaluation function so you could skip this step.
Good luck!
I have published two libraries for move generation in Abalone. You didn't mention the programming language used for your search implementation, but you can easily port the functions.
For C++, https://sourceforge.net/projects/abnet/
For Python, https://gitlab.com/peer.sommerlund/haliotis
For an evaluation function, distance between all your marbles, or distance to their gravity center (same thing), works nicely. Tino Werner used this with a twist for his program that won ICGA 2003.
For understanding distance when using hex coordinates, I can recommend Amit Patel's page: https://www.redblobgames.com/grids/hexagons/
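A small sketch of the "distance to the gravity center" idea, assuming marbles are stored as axial hex coordinates `(q, r)` as described on that page. The function names and the choice to return the negated sum (so that higher means more compact) are my own, and this is not Tino Werner's actual evaluation.

```python
def hex_distance(a, b):
    """Distance between two hexes in axial coordinates (q, r)."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) / 2

def compactness(marbles):
    """Negated sum of distances from each marble to the group's center of gravity.

    Higher (closer to zero) means the marbles are packed more tightly.
    """
    cq = sum(q for q, _ in marbles) / len(marbles)
    cr = sum(r for _, r in marbles) / len(marbles)
    return -sum(hex_distance(m, (cq, cr)) for m in marbles)

tight = [(0, 0), (1, 0), (0, 1)]
spread = [(0, 0), (3, 0), (0, 3)]
print(compactness(tight) > compactness(spread))   # prints True
```

In a full evaluation function one would typically combine this with other terms, for example the difference in marble counts mentioned above.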

AI Minesweeper project

I need to implement Minesweeper solver. I have started to implement rule based agent.
I have implemented certain rules. I have a heuristic function for choosing the best-matching rule for the cell currently being treated (using information about the surrounding cells). So for each chosen cell, it can decide, for each of the 8 surrounding cells, whether to open it, to mark it, or to do nothing. That is, at the moment the agent gets some revealed cell as input and decides what to do with the surrounding cells (the agent does not yet know how to decide which cell to treat).
My question is, what algorithm to implement for deciding which cell to treat?
Suppose that, for the first move, the agent reveals a corner cell (or some other cell, according to some rule for the first move). What should it do after that?
I understand that I need to implement some kind of search. I know many search algorithms (BFS, DFS, A* and others), so that is not the problem; I just do not understand how I can use these searches here.
I need to implement it following the principles of Artificial Intelligence: A Modern Approach.
BFS, DFS, and A* are probably not appropriate here. Those algorithms are good if you are trying to plan out a course of action when you have complete knowledge of the world. In Minesweeper, you don't have such knowledge.
Instead, I would suggest trying to use some of the logical inference techniques from Section III of the book, particularly using SAT or the techniques from Chapter 10. This will let you draw conclusions about where the mines are using facts like "exactly one of these eight squares is a mine" and "exactly two of those eight squares are mines." Doing this at each step will help you identify where the mines are, or realize that you must guess before continuing.
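To give a feel for what such inference looks like, here is a minimal sketch that brute-forces every mine assignment consistent with the revealed numbers and reports the cells that are mines (or safe) in all of them. This is plain model enumeration rather than a real SAT solver, and the `infer` function and its constraint format are invented for the example.

```python
from itertools import product

def infer(constraints):
    """Find cells whose status is forced by the revealed numbers.

    constraints: list of (cells, mine_count) pairs, one per revealed number,
    where cells are the still-unknown neighbours of that number.
    Returns (definite_mines, definitely_safe).
    """
    cells = sorted({c for group, _ in constraints for c in group})
    models = []
    for bits in product((0, 1), repeat=len(cells)):      # every possible layout
        assignment = dict(zip(cells, bits))
        if all(sum(assignment[c] for c in group) == count
               for group, count in constraints):
            models.append(assignment)                    # layout fits all numbers
    mines = {c for c in cells if models and all(m[c] for m in models)}
    safe = {c for c in cells if models and all(not m[c] for m in models)}
    return mines, safe

# "Exactly one of a, b is a mine" and "exactly one of a, b, c is a mine".
print(infer([(("a", "b"), 1), (("a", "b", "c"), 1)]))    # prints (set(), {'c'})
```

If both returned sets are empty, no move is logically forced and the agent has to guess. The enumeration is exponential in the number of unknown cells, so in practice it would be applied only to small groups of frontier cells.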
Hope this helps!
I ported this (with a bit of help). Here is the link to it working: http://robertleeplummerjr.github.io/smartSweepers.js/ . Here is the project: https://github.com/robertleeplummerjr/smartSweepers.js
Have fun!

How to use MinMax trees with Q-Learning?

How to use MinMax trees with Q-Learning?
I want to implement a Q-Learning connect four agent and heard that adding MinMax trees into it helps.
Q-learning is a Temporal difference learning algorithm. For every possible state (board), it learns the value of the available actions (moves). However, it is not suitable for use with Minimax, because the Minimax algorithm needs an evaluation function that returns the value of a position, not the value of an action at that position.
However, temporal difference methods can be used to learn such an evaluation function. Most notably, Gerald Tesauro used the TD(λ) ("TD lambda") algorithm to create TD-Gammon, a human-competitive Backgammon playing program. He wrote an article describing the approach, which you can find here.
TD(λ) was later extended to TDLeaf(λ), specifically to better deal with Minimax searches. TDLeaf(λ) has been used, for example, in the chess program KnightCap. You can read about TDLeaf in this paper.
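For intuition only, here is a sketch of the simplest member of this family, TD(0), which is TD(λ) with λ = 0, learning a table of position values that a Minimax search could then use as its evaluation function. It is not Tesauro's network-based method or TDLeaf(λ); the function name `td0_update` and the default parameters are assumptions.

```python
from collections import defaultdict

def td0_update(V, state, reward, next_state, done, alpha=0.1, gamma=1.0):
    """Move V[state] toward the one-step target reward + gamma * V[next_state]."""
    target = reward + (0.0 if done else gamma * V[next_state])
    V[state] += alpha * (target - V[state])
    return V[state]

V = defaultdict(float)               # value of a position, not of an action
# Replaying a finished game backwards: the last position led to a win (reward 1).
td0_update(V, "s1", 1.0, None, True, alpha=0.5)
td0_update(V, "s0", 0.0, "s1", False, alpha=0.5)
print(V["s1"], V["s0"])              # prints 0.5 0.25
```

The key difference from Q-learning is that the table is indexed by position alone, which is exactly the shape of function Minimax needs at its leaf nodes.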
Minimax allows you to look a number of moves into the future and play in a way to maximize your chances of scoring in that timespan. This is good for Connect-4, where a game can end almost at any moment and the number of moves available at each turn is not very large. Q-Learning would provide you with a value function to guide the Minimax search.
Littman combined minimax with Q-learning and proposed the Minimax-Q learning algorithm in his famous and pioneering work "Markov Games as a Framework for Multi-Agent Reinforcement Learning". His work covers zero-sum games in multiagent settings. Later, Hu & Wellman extended his work to develop Nash-Q learning, which you can find here.

How to create a reasonable AI?

I'm creating a logic game based on Fox and Hounds game. The player plays the fox and AI plays the hounds. (as far as I can see) I managed to make the AI perfect, so it never loses. Leaving it as such would not be much fun for human players.
Now, I have to dumb-down the AI so human can win, but I'm not sure how. The current AI logic is based on pattern-matching - if I introduce random moves which make the board go out of pattern space the AI would most probably play dumb until the end of the game.
I'm also thinking about removing a set of patterns, so it would seem as AI does not know that "trick" but this way players could find a way to beat the computer using the same moves every time.
Any ideas on how to dumb down the AI in such a way that it does not go from "genius" to "completely dumb" in a single move?
We used MinMax as the AI algorithm for our game, and we implemented the AI levels by setting a different search depth for each level.
I ended up creating a couple of quasi-smart pattern plays (as if a 10 year old might play) so it is not completely dumb, and then I pick one or two of those at random before the game starts. This way the game is always beatable, but the player does not know how (i.e. he cannot use the same strategy to always win, he has to explore for weak spot first).
If your game is a zero-sum one, the MiniMax algorithm with an alpha-beta optimization is a good choice. You can create difficulty levels by making the search stop when the algorithm reaches a certain depth.
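A small sketch of the depth-as-difficulty idea, using a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins) as a stand-in for Fox and Hounds. The level names, the depth values and the functions `negamax` and `best_move` are all assumptions made for illustration.

```python
DEPTH_BY_LEVEL = {"easy": 1, "medium": 2, "hard": 3}   # assumed difficulty mapping

def negamax(stones, depth):
    """Score for the side to move: 1 = win seen, -1 = loss seen, 0 = unknown."""
    if stones == 0:
        return -1            # the opponent just took the last stone
    if depth == 0:
        return 0             # search cut off: no opinion about this position
    return max(-negamax(stones - take, depth - 1)
               for take in range(1, min(3, stones) + 1))

def best_move(stones, level):
    """Pick the move with the best score at the depth assigned to this level."""
    depth = DEPTH_BY_LEVEL[level]
    moves = range(1, min(3, stones) + 1)
    return max(moves, key=lambda take: -negamax(stones - take, depth - 1))

print(best_move(6, "hard"))   # prints 2: leaves 4 stones, a lost position for the opponent
print(best_move(6, "easy"))   # prints 1: too shallow to see the winning move
```

The shallow level is not random; it simply cannot see far enough, so it degrades gradually instead of going from "genius" to "completely dumb".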
Since I'm not an expert game developer and my AI is actually too dumb at the moment, I'd make a sort of 'learning AI'. Say you keep all your regexes disabled and you enable them once the player uses them.
I think this is a zero-sum game, and you can use the MinMax algorithm to solve the game instead of pattern matching, in that way by controlling the search depth you can control the level of expertise of the agent.
On the other hand, you can use A* search to determine the best move for a given fox/hound. By choosing different heuristics, you can control the effectiveness of the agent.
