I am trying to model the UNO card game as a partially observable Markov decision process (POMDP). After a little research, I concluded that the states will be the number of cards, and the actions will be either to play a card or to pick one from the unseen deck. I am having difficulty formulating the state transition and observation models. I think the observation model will depend on past actions and observations (the history), but for that I would need to relax the Markov assumption. Is relaxing the Markov assumption a good choice or not? Additionally, how exactly should I form the state and observation models? Thanks in advance.
I think in a POMDP the states should still be the "full truth" (position of all the cards) and the transitions are simply the rules of the game (including the strategy of the other players?!). The observations should certainly not depend on any history, only on the state, or else you're violating the Markov assumption. The point of a POMDP is that the agent can gain information about the current state by analyzing history. I'm not really sure if or how this applies to UNO, though. If you know which cards have been played and their order, can you still gain information by using the history? Probably not. Not sure, but maybe it does not make sense to think of this game as a POMDP, even if you use a solution that was designed for a POMDP.
I have some questions related to POMDPs.
What do we mean by controllable actions in a partially observable Markov decision process? Or no controllable actions in hidden Markov states?
When computing policies through value or policy iteration, could we say that the POMDP is an expert system (because we model the environment)? While, when using Q-learning, it is a more flexible system in terms of intelligence or adaptability to a changing environment?
Actions
Controllable actions are the results of choices that the decision maker makes. In the classic POMDP tiger problem, there is a tiger hidden behind one of two doors. At each time step, the decision maker can choose to listen or to open one of the doors. The actions in this scenario are {listen, open left door, open right door}. The transition function from one state to another depends on both the previous state and the action chosen.
In a hidden Markov model (HMM), there are no actions for the decision maker. In the tiger problem context, this means the participant can only listen without opening doors. In this case, the transition function only depends on the previous state, since there are no actions.
For more details on the tiger problem, see Kaelbling, Littman, and Cassandra's 1998 POMDP paper, Section 5.1. There's also a more introductory walk-through available in this tutorial.
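To make the role of the "listen" action concrete, here is a minimal sketch of the belief update it triggers. The function name update_belief and the 0.85 listening accuracy are assumptions for illustration (check the paper for the exact figures). Listening does not move the tiger, so the transition part of the update is the identity and only the observation likelihood matters.

```python
def update_belief(p_left, heard, accuracy=0.85):
    """Bayes update of P(tiger behind left door) after one 'listen' action.

    heard is "left" or "right"; accuracy is the assumed probability that
    the sound is heard on the side where the tiger really is.
    """
    like_left = accuracy if heard == "left" else 1.0 - accuracy
    like_right = 1.0 - accuracy if heard == "left" else accuracy
    numerator = like_left * p_left
    return numerator / (numerator + like_right * (1.0 - p_left))


belief = 0.5  # start with no idea where the tiger is
for heard in ["left", "left"]:
    belief = update_belief(belief, heard)
    print(f"heard {heard}: P(tiger left) = {belief:.3f}")
```

In an HMM the same update applies, but the agent never gets to choose between listening and opening a door.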
Adaptability
The basic intuition in your question is correct, but can be refined. POMDPs are a class of models, whereas Q-learning is a solution technique. The basic difference in your question is between model-based and model-free approaches. POMDPs are model-based, although the partial observability allows for additional uncertainty. Reinforcement learning can be applied in a model-free context, with Q-learning. The model-free approach will be more flexible for non-stationary problems. That being said, depending on the complexity of the problem, you could incorporate the non-stationarity into the model itself and treat it as an MDP.
There's a very thorough discussion on these non-stationary modelling trade-offs in the answer to this question.
Lastly, it is correct that POMDPs can be considered expert systems. Mazumdar et al. (2017) have suggested treating Markov decision processes (MDPs) as expert systems.
I'm creating a cricket manager stats game. I need to create a ball-by-ball simulation of the game. The game/ball outcome will be influenced by player stats and other external factors like weather or chosen tactics.
I've been reading around that most of the games can be implemented as a state machine, which sounds appealing to me, but because I'm a newbie at cricket I'm failing to envision this game as a state machine.
Should the ball be a state machine, or the match, or the player, or all three? I'm also not sure how I will orchestrate these state machines (through events).
I'm also having a hard time identifying the states and transitions. Any help would be greatly appreciated.
So here's what I understand from your question - Your cricket manager game will simulate a match ball by ball depending on player stats (bowler's skill/experience, batsman's skill/exp, fielding/wicketkeeping stats, so on...) and other related variables. From my understanding this will be more of an algorithmic engine rather than a visual representation of a cricket game.
Now answering your question, first of all, I don't believe you're looking at FSMs the right way. An FSM is a piece of code designed such that at any point in its lifetime, it is in one of many possible states of execution. Each state can have, and usually does have (that's the point of it), a different update routine. Also, each state can transition to another state upon predefined triggers/events. What you need to understand is that states implement different behaviour for the same entity.
Now, "most of the games can be implemented as a state machine" - Not "a" state machine but rather a whole nest of state machines. Several manager classes in a game, the renderer, gameplay objects, menu systems, more or less everything works off a state machine of its own. Imagine a game character, say a boxer, for the purpose of this example. Some states you'll find in the 'CBoxer'(?) class will be 'Blocking', 'TakingHit', 'Dodge', 'RightUpper', 'LeftHook' and so on.
Keep in mind though, that FSMs are more of a design construct - a way to envision the solution to the problem at hand. You don't HAVE to necessarily use them. You could make a complete game without a state machine(I think :) ). But FSMs make your code design really intuitive and straightforward, and it's frankly difficult to not find one in any decent sized project.
I suggest you take a look at some code samples of FSMs at work. Once you get the idea behind it, you'll find yourself using them everywhere :)
As a first step you should go through the rules of cricket and your model for the ball outcome to summarise how previous balls affect a given ball.
Then identify what you need to keep track of, and whether it is convenient to use a state machine to represent it. For example, statistics are usually not very convenient to keep track of as FSMs.
With that information in mind, you should be able to build a model. Information you need to keep track of might be either state machines or an internal value of a particular state. The interactions between balls will dictate the transitions and the events circulating from one machine to another.
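As a rough illustration of how that split might look, here is a minimal sketch of one innings as a state machine. Everything in it is an assumption for illustration: the class and state names are made up, extras (wides, no-balls) are ignored, and the score is kept as plain counters rather than states, in line with the point about statistics above.

```python
from enum import Enum, auto


class InningsState(Enum):
    BETWEEN_OVERS = auto()
    IN_OVER = auto()
    FINISHED = auto()


class Innings:
    """Hypothetical innings FSM: overs of 6 balls, all out at 10 wickets."""

    def __init__(self, max_overs=20):
        self.state = InningsState.BETWEEN_OVERS
        self.max_overs = max_overs
        self.runs = self.wickets = self.overs = self.balls_in_over = 0

    def start_over(self):
        # Event: a new bowler has been chosen and the over begins.
        if self.state is InningsState.BETWEEN_OVERS:
            self.balls_in_over = 0
            self.state = InningsState.IN_OVER

    def ball(self, runs=0, wicket=False):
        # Event: one ball has been simulated elsewhere; apply its outcome.
        if self.state is not InningsState.IN_OVER:
            raise RuntimeError("no over in progress")
        self.runs += runs
        self.wickets += int(wicket)
        self.balls_in_over += 1
        if self.wickets == 10:
            self.state = InningsState.FINISHED
        elif self.balls_in_over == 6:
            self.overs += 1
            over_limit_hit = self.overs == self.max_overs
            self.state = (InningsState.FINISHED if over_limit_hit
                          else InningsState.BETWEEN_OVERS)


innings = Innings(max_overs=1)
innings.start_over()
for _ in range(6):
    innings.ball(runs=1)
print(innings.state, innings.runs)
```

The ball outcome itself (from player stats, weather, tactics) would be computed outside the machine and fed in as the event, and a match machine could own two such innings.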
I'm creating a game that requires the units onscreen to fight each other, based on teams and designated enemies for each team. The player doesn't control any of the tanks or teams.
The issue is that the battle between the units (tanks at the moment) should be interesting enough to the player that they can watch and have fun without doing anything.
I currently have the tanks moving around totally randomly and shooting at each other when in range, but I'm looking for something smarter.
What types of ai and ai algorithms should I look into? All ideas are welcome, I simply want to make every battle interesting.
For strategies and tactics, your AI probably needs to do some rational decision making to make it look smarter. There are many ways to do this; the simplest is to write down a couple of condition-action rules for your tanks and implement them as a finite state machine. FSMs are simple to implement and easy to debug, but they get tedious later when you want to revise the condition rules or add/remove states. You can also use utility agents - the AI periodically performs a utility check on each potential goal (e.g. engage, retreat, reload/refuel, take cover, repair, etc.) based on current stats (ammo, health, enemy counts and locations) and then chooses the most preferable goal. This takes more time to implement than an FSM, but it's more flexible in that you don't need to change the decision flow when you add or remove behaviors. It makes the AI look like it follows a general rule without always being predictable. A utility agent is also harder to debug and control because you don't have any rigid condition-action rules to trace, like you do with an FSM, when your AI goes crazy. Another popular method is the behavior tree. Action sequences are implemented as a tree structure. It requires more code to write upfront but usually gives you a better balance between control and flexibility than an FSM or a utility agent. These decision making processes are not mutually exclusive - you can use one method for top level strategies and a different method for low level tactics.
Whatever decision making process you choose, you need some input to feed to your AI. You can use an influence map to help the AI determine which parts of the battlefield are considered hostile and which are considered safe. The influence map is shared among the team, so it can also help with group tactics. When your AI engages multiple enemies, selecting the right target is important. If your AIs pick a target that most human players wouldn't, the player is going to feel the AI is "stupid", even when the chosen target is actually the best one. You can run a distance check on the enemy units and filter/prioritize the targets by line of sight, current weapon range, threat level, etc. Some tests are more expensive than others (the line of sight check is usually one of the worst offenders), so if you have a lot of enemy units in range you want to run those slower tests last.
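To show what a utility check can look like, here is a minimal sketch. The goal names, the stat keys and the scoring formulas are all made up for illustration; real scores would need tuning against how the battles actually play out.

```python
def choose_goal(stats):
    """Return the goal with the highest utility for the tank's current stats.

    stats holds 'health' and 'ammo' in [0, 1] and 'enemies_in_range' as a
    count. The formulas below are placeholders, not recommended values.
    """
    health = stats["health"]
    ammo = stats["ammo"]
    enemies = stats["enemies_in_range"]
    utilities = {
        "engage": health * ammo * (1.0 if enemies else 0.0),
        "retreat": (1.0 - health) * min(1.0, enemies / 3.0),
        "reload": (1.0 - ammo) * 0.8,
        "patrol": 0.1,  # small constant so there is always a fallback goal
    }
    return max(utilities, key=utilities.get)


print(choose_goal({"health": 0.9, "ammo": 0.8, "enemies_in_range": 1}))
print(choose_goal({"health": 0.2, "ammo": 0.8, "enemies_in_range": 3}))
```

Adding a new behavior is then just one more entry in the dictionary, which is the flexibility mentioned above. The flip side is that there is no explicit rule to point at when a tank picks a strange goal.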
For tanks' movement, look into steering behaviors. It covers a lot of vehicle movement behaviors but pursue and evade are the ones that you need the most. Also look into A* for pathfinding if your tanks need to navigate around a complex terrain. There are other good pathing solutions that give you the shortest/fastest path, but in a game the shortest/fastest path is not always the optimal path. If your shortest path is open but too close to the enemy line, you want to give your tank some heuristic to take a different route. You can easily configure your path preference with A*.
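Here is a minimal sketch of that idea: A* on a grid where the cost of entering a cell is 1 plus a threat value, so cells near the enemy line are expensive. The grid layout and the function name a_star are assumptions for illustration. The Manhattan heuristic stays admissible only as long as every cell costs at least 1.

```python
import heapq


def a_star(grid_cost, start, goal):
    """Cheapest path on a 4-connected grid.

    grid_cost[r][c] is the cost of entering that cell (None means blocked).
    Returns (path, total_cost); the start cell itself is not charged.
    """
    rows, cols = len(grid_cost), len(grid_cost[0])

    def h(cell):
        # Manhattan distance: never overestimates when every cost is >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]
    best = {start: 0}
    parent = {start: None}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], g
        if g > best[cell]:
            continue  # stale queue entry, a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid_cost[nr][nc] is None:
                continue
            ng = g + grid_cost[nr][nc]
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None, float("inf")


# The middle cell of the top row is close to the enemy: cost 1 + threat 8.
grid = [[1, 9, 1],
        [1, 1, 1]]
path, cost = a_star(grid, (0, 0), (0, 2))
print(path, cost)
```

With the threat in place the tank takes the longer route along the bottom row. With uniform costs it would drive straight across.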
Things to look into: finite state machine, utility based agent, behavior tree, steering behaviors, a* search algorithm, navigation waypoints or navigation mesh, influence map.
The simplest thing would be to have them drive in a random direction and when there is an enemy tank within range, they start shooting until one of them is destroyed. You could also have them randomly retreat when their health gets too low. You could also try adding group tactics where any tank that is not engaged will join (with some probability so that maybe it will, maybe it won't - just to keep things interesting) its nearest neighbour in combat.
If you're looking for algorithms, A* ("A-Star") is a generic path-finding algorithm that could help your tanks move around, but I don't know of any generic algorithms to control the battles.
I've programmed a non-directional neural network. So kind of like the brain, all neurons are updated at the same time, and there are no explicit layers.
Now I'm wondering, how does pain work? How can I structure a neural network so that a "pain" signal will make it want to do anything to get rid of said pain.
It doesn't really work quite like that. The network you have described is too simple to have a concept like pain that it would try to get rid of. On a low level it's nothing but just another input, but obviously that doesn't make the network "dislike" it.
In order to gain such a signal, you could train the network to perform certain actions when it receives this particular signal. As it becomes more refined, this signal starts looking like a real pain signal, but it's nothing more than a specific training of the network.
The pain signal in higher animals has this "do anything to get rid of it" response because higher animals have rather advanced cognitive abilities compared to the network you have described. Worms, on the other hand, might respond in a very specific way to a "pain" input - twitch a certain way. It's hard-wired that way, and to say that the worm tries to do anything to get rid of the signal would be wrong; it's more like a motor connected to a button that spins every time you press the button.
Realistic mechanisms for getting artificial neural networks to do useful things are collectively known as "neural network training", which is a large and complex research area. You can google for this phrase to get various ideas.
You should be aware, however, that neural networks are not a panacea for solving hard problems; they don't automatically get things done through magic. Using them effectively requires a good deal of experimentation with training algorithm tweaks and network parameter tweaks.
I don't know much (if anything) about AI theory, except that we are still looking for a way to give AI the model it needs to reason and think and ponder like real humans do. (We're still looking for the key - and maybe it's pain.)
Most of my adult life has been focused on computer programming and studying and understanding the mind.
I am writing here because I think that PAIN might be the missing link. (Also stackoverflow rocks right now.) I know that creating a model that actually enables higher thinking is a large leap, but I just had this amazing aha-type moment and had to share it. :)
In my studies of Buddhism, I learned of a scientist who studied leprosy cases. The reason lepers become deformed is because they don't feel pain when they come into contact with damaging forces. It's here that science and Buddhist reasoning collide in a fundamental truth.
Pain is what keeps us alive, defines our boundaries, and shapes how we make our choices and our world-view.
In an AI model, the principle would be to define a series of forces perhaps, that are constantly at play. The idea is to keep the mind alive.
The concept of ideas having life is something we humans also seem to play out. When someone "kills" your idea, by proving it wrong, at first, there is a resistance to the "death" of the idea. In fact, it takes a lot sometimes, to force an idea to be changed. We all know stubborn people... It has been said that the "death" of an idea, is the "death" of part of one's ego. The ego is always trying to build itself up.
So you see, to give AI an ego, you must give it pain, and then it will have to fight to build "safe" thoughts so that it may grow its own ideas and eventually human psychosis and "consciousness".
Artificial neural networks do not recognize such a thing as "pain", but may actually be trained in order to avoid certain states. In a Hopfield network, the final state of the network is attained at the energy minimum that is closest to the starting state. The starting state in this context is the state where the network is at "pain". If you train the network to have its local energy minimum at a state where the "pain" is gone, it should modify itself until that state is achieved. A simple way to train a Hopfield network is assigning a weight to the interactions between neurons. That weight is decided according to Hebb's rule, which is given by: Wij = (1/n) * [i] * [j].
Wij is the weight of the connection between neuron i and neuron j, n is the total number of neurons in the matrix, and [i] and [j] are the states of neurons i and j, respectively, which can have values of 1 or -1. Once you have completed the weight matrix for a state in which the "pain" does not exist, the network should shift towards that state most of the time, regardless of the initial state.
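A minimal sketch of this, using the Hebb rule as written above for a single stored "no pain" pattern. The pattern itself, the function names and the choice of one-neuron-at-a-time updates with a zero diagonal are assumptions for illustration.

```python
def train_hebb(pattern):
    """Weight matrix Wij = (1/n) * [i] * [j], with no self-connections."""
    n = len(pattern)
    return [[0.0 if i == j else pattern[i] * pattern[j] / n
             for j in range(n)]
            for i in range(n)]


def recall(weights, state, sweeps=5):
    """Update neurons one at a time until the state settles (or sweeps run out)."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if field >= 0 else -1
    return state


no_pain = [1, -1, 1, -1, 1, -1]        # hypothetical pain-free state
in_pain = [-1, -1, 1, -1, 1, 1]        # same state with two neurons flipped
weights = train_hebb(no_pain)
print(recall(weights, in_pain))
```

Note the "most of the time" caveat: the exact inverse of the stored pattern is an energy minimum too, so a start state that is closer to the inverse will settle there instead.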
Think of Neural Networks as a multi-dimensional plane. Training a Neural Network is basically placing high and low points in the plane. The plane supports the "weights" and forms a depression around them. A depression in the plane is a desired output, and a highland is an undesired output. The idea of a neural network is to put the depressions in the areas that matter. Pain would look like a giant mountain. So an input neuron representing pain would have a very high probability of producing an undesired output.
But pain isn't the only thing that makes a creature behave the way it does. Pain to a tree doesn't cause much of a reaction. In animals, pain causes physiological reactions such as a surge in adrenaline. This causes a heightened state of awareness and a big uptick in energy consumption. To model the behavior of pain, you must provide a model of these mechanisms so that a stimulus of pain provides the appropriate output. In a NN, I imagine that it would need to be a recurrent neural network so that the pain has a duration proportionate to the input, so that the creature you are modeling avoids the pain for longer than the pain stimulus duration. This would be a healing period.
NNs tend to be more tree-like. By modeling an energy state with an energy cost, the creature would use minimal energy to survive, but use a lot of energy if by doing so, it moves into the desired state faster than the cost of remaining in the undesired pain state. Going back to the hyperplane, this would look like a higher velocity off of the pain highland and into a desired "safe" depression. The vector's magnitude into the nearest depression is the motivation level of the NN to avoid pain. Training should naturally do this by adding heavy negative weights and biases to the pain inputs by always making the pain input result in a wrong answer, assuming the energy and awareness reaction is modeled into a recurrent neural net.
I may have a partial answer to this question of how pain can be expressed in a neural network. For reference, the base network I use is an HTM algorithm. It is essentially a series of interconnected layers, each predicting its next input; correct predictions are rewarded using Hebbian logic.
Theoretically, there could be some connections between layers that are gated, and this gate can only be opened by sufficient activation in another layer. This other layer would be rigged to only learn to recognize new patterns in the context of the pain trigger. Therefore, in the presence of pain-anticipated stimulus, the gated channel would be opened, creating a simulated attention system for the recognition of future pain. While this is not pain in itself, it is similar to fear.
A friend of mine is beginning to build a NetHack bot (a bot that plays the Roguelike game: NetHack). There is a very good working bot for the similar game Angband, but it works partially because of the ease in going back to the town and always being able to scum low levels to gain items.
In NetHack, the problem is much more difficult, because the game rewards ballsy experimentation and is built basically as 1,000 edge cases.
Recently I suggested using some kind of naive Bayesian analysis, in very much the same way spam is filtered.
Basically the bot would at first build a corpus, by trying every possible action with every item or creature it finds and storing that information with, for instance, how close to a death, injury or negative effect it was. Over time it seems like you could generate a reasonably playable model.
Can anyone point us in the right direction of what a good start would be? Am I barking up the wrong tree or misunderstanding the idea of bayesian analysis?
Edit: My friend put up a github repo of his NetHack patch that allows python bindings. It's still in a pretty primitive state but if anyone's interested...
Although Bayesian analysis encompasses much more, the Naive Bayes algorithm well known from spam filters is based on one very fundamental assumption: all variables are essentially independent of each other. So for instance, in spam filtering each word is usually treated as a variable, so this means assuming that if the email contains the word 'viagra', that knowledge does not affect the probability that it will also contain the word 'medicine' (or 'foo' or 'spam' or anything else). The interesting thing is that this assumption is quite obviously false when it comes to natural language but still manages to produce reasonable results.
Now one way people sometimes get around the independence assumption is to define variables that are technically combinations of things (like searching for the token 'buy viagra'). That can work if you know specific cases to look for, but in general, in a game environment, it means that you can't generally remember anything. So each time you have to move, perform an action, etc, it's completely independent of anything else you've done so far. I would say for even the simplest games, this is a very inefficient way to go about learning the game.
I would suggest looking into using Q-learning instead. Most of the examples you'll find are usually just simple games anyway (like learning to navigate a map while avoiding walls, traps, monsters, etc). Q-learning is a form of reinforcement learning, a type of online learning that does really well in situations that can be modeled as an agent interacting with an environment, like a game (or robots). It does this by trying to figure out what the optimal action is at each state in the environment (where each state can include as many variables as needed, much more than just 'where am I'). The trick then is to maintain just enough state to help the bot make good decisions without having a distinct point in your state 'space' for every possible combination of previous actions.
To put that in more concrete terms, if you were to build a chess bot you would probably have trouble if you tried to create a decision policy that made decisions based on all previous moves, since the set of all possible combinations of chess moves grows really quickly. Even a simpler model of where every piece is on the board is still a very large state space, so you have to find a way to simplify what you keep track of. But notice that you do get to keep track of some state so that your bot doesn't just keep trying to make a left turn into a wall over and over again.
The wikipedia article is pretty jargon heavy but this tutorial does a much better job translating the concepts into real world examples.
The one catch is that you do need to be able to define rewards to provide as the positive 'reinforcement'. That is you need to be able to define the states that the bot is trying to get to, otherwise it will just continue forever.
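For a feel of the mechanics, here is a minimal sketch of tabular Q-learning on a toy problem, nothing NetHack-specific. The five-cell corridor, the reward of 1 for reaching the last cell, and all parameter values are assumptions chosen to keep the example tiny. The bot explores with purely random moves, which is fine because Q-learning learns about the greedy policy regardless of how it explores.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action_index):
    """Move along the corridor; walls clamp the position. Reward 1 at the goal."""
    nxt = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(2)  # random exploration
            nxt, reward, done = step(state, action)
            # Standard Q-learning target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = q_learning()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:GOAL]]
print(policy)
```

Here the state is just the position. For a real bot the hard part is exactly what was described above: choosing which variables go into the state and what counts as a reward.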
There is precedent: the monstrous rog-o-matic program succeeded in playing rogue and even returned with the amulet of Yendor a few times. Unfortunately, rogue was only released as a binary, not source, so it has died (unless you can set up a 4.3BSD system on a MicroVAX), leaving rog-o-matic unable to play any of the clones. It just hangs cos they're not close enough emulations.
However, rog-o-matic is, I think, my favourite program of all time, not only because of what it achieved but because of the readability of the code and the comprehensible intelligence of its algorithms. It used "genetic inheritance": a new player would inherit a combination of preferences from a previous pair of successful players, with some random offset, then be pitted against the machine. More successful preferences would migrate up in the gene pool and less successful ones down.
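The inheritance step described above can be sketched in a few lines. To be clear, this is not rog-o-matic's code: the function name, the preference names and the size of the random offset are all invented for illustration.

```python
import random


def breed(parent_a, parent_b, rng, jitter=0.1):
    """Child takes each preference from one parent at random, plus a small offset."""
    child = {}
    for name in parent_a:
        inherited = rng.choice((parent_a[name], parent_b[name]))
        child[name] = inherited + rng.uniform(-jitter, jitter)
    return child


rng = random.Random(1)
# Hypothetical preference weights of two successful earlier players.
mum = {"caution": 0.8, "curiosity": 0.3, "hoarding": 0.5}
dad = {"caution": 0.4, "curiosity": 0.9, "hoarding": 0.6}
print(breed(mum, dad, rng))
```

The other half of the scheme, ranking players by how well they did so that successful preferences move up the gene pool, needs a score from actual games and is left out here.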
The source can be hard to find these days, but searching "rogomatic" will set you on the path.
I doubt bayesian analysis will get you far because most of NetHack is highly contextual. There are very few actions which are always a bad idea; most are also life-savers in the "right" situation (an extreme example is eating a cockatrice: that's bad, unless you are starving and currently polymorphed into a stone-resistant monster, in which case eating the cockatrice is the right thing to do). Some of those "almost bad" actions are required to win the game (e.g. coming up the stairs on level 1, or deliberately falling in traps to reach Gehennom).
What you could try would be trying to do it at the "meta" level. Design the bot as choosing randomly among a variety of "elementary behaviors". Then try to measure how these bots fare. Then extract the combinations of behaviors which seem to promote survival; bayesian analysis could do that among a wide corpus of games along with their "success level". For instance, if there are behaviors "pick up daggers" and "avoid engaging monsters in melee", I would assume that analysis would show that those two behaviors fit well together: bots which pick daggers up without using them, and bots which try to throw missiles at monsters without gathering such missiles, will probably fare worse.
This somewhat mimics what learning gamers often ask for in rec.games.roguelike.nethack. Most questions are similar to: "should I drink unknown potions to identify them?" or "what level should my character be before going that deep in the dungeon?". Answers to those questions heavily depend on what else the player is doing, and there is no good absolute answer.
A difficult point here is how to measure the success at survival. If you simply try to maximize the time spent before dying, then you will favor bots which never leave the first levels; those may live long but will never win the game. If you measure success by how deep the character goes before dying then the best bots will be archeologists (who start with a pick-axe) in a digging frenzy.
Apparently there are a good number of Nethack bots out there. Check out this listing:
In NetHack, unknown actions usually have a boolean effect -- either you gain or you lose. Bayesian networks are based around "fuzzy logic" values -- an action may give a gain with a given probability. Hence, you don't need a Bayesian network, just a list of "discovered effects" and whether they are good or bad.
No need to eat the Cockatrice again, is there?
All in all it depends how much "knowledge" you want to give the bot as starters. Do you want him to learn everything "the hard way", or will you feed him spoilers 'till he's stuffed?