What do we mean by "controllable actions" in a POMDP?

I have some questions related to POMDPs.
What do we mean by controllable actions in a partially observable Markov decision process (POMDP)? And what does it mean to have no controllable actions, as in hidden Markov states?
When computing policies through value or policy iteration, could we say that the POMDP is an expert system (because we model the environment)? And that, when using Q-learning, it is a more flexible system in terms of intelligence or adaptability to a changing environment?

Actions
Controllable actions are the results of choices that the decision maker makes. In the classic POMDP tiger problem, there is a tiger hidden behind one of two doors. At each time step, the decision maker can choose to listen or to open one of the doors. The actions in this scenario are {listen, open left door, open right door}. The transition function from one state to another depends on both the previous state and the action chosen.
In a hidden Markov model (HMM), there are no actions for the decision maker. In the tiger problem context, this means the participant can only listen without opening doors. In this case, the transition function only depends on the previous state, since there are no actions.
For more details on the tiger problem, see Kaelbling, Littman, and Cassandra's 1998 POMDP paper, Section 5.1. There's also a more introductory walk-through available in this tutorial.
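The contrast between the two transition functions can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the state/action names are made up, and the 0.85 listening accuracy follows the Kaelbling et al. formulation:

```python
import random

# States and actions for the classic tiger problem.
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]

def transition(state, action):
    """POMDP transition: depends on both the state and the action.
    Listening leaves the tiger where it is; opening a door resets
    the problem, placing the tiger behind a random door."""
    if action == "listen":
        return state
    return random.choice(STATES)

def observe(state, accuracy=0.85):
    """Listening reports the correct side with some accuracy
    (0.85 in the Kaelbling et al. formulation)."""
    if random.random() < accuracy:
        return state
    return STATES[1 - STATES.index(state)]

def hmm_transition(state):
    """HMM 'transition': no action argument at all --
    the process evolves on its own, and the decision maker
    can only observe."""
    return state  # the tiger never moves if nobody opens a door
```

Note that `hmm_transition` takes only the state: that is the whole difference the answer describes.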
Adaptability
The basic intuition in your question is correct, but can be refined. POMDPs are a class of models, whereas Q-learning is a solution technique. The basic difference in your question is between model-based and model-free approaches. POMDPs are model-based, although the partial observability allows for additional uncertainty. Reinforcement learning can be applied in a model-free context, with Q-learning. The model-free approach will be more flexible for non-stationary problems. That being said, depending on the complexity of the problem, you could incorporate the non-stationarity into the model itself and treat it as an MDP.
There's a very thorough discussion on these non-stationary modelling trade-offs in the answer to this question.
Lastly, it is correct that POMDPs can be considered expert systems. Mazumdar et al. (2017) have suggested treating Markov decision processes (MDPs) as expert systems.

Related

How to model UNO as a POMDP

I am trying to model the UNO card game as a partially observable Markov decision process (POMDP). I did a little research and came to the conclusion that the states will be the numbers of cards, and the actions will be either to play a card or to pick one from the unseen deck. I am having difficulty formulating the state transition and observation models. I think the observation model will depend on past actions and observations (the history), but for that I would need to relax the Markov assumption. I want to know whether relaxing the Markov assumption is the better choice or not. Additionally, how exactly should I form the state and observation models? Thanks in advance.
I think in a POMDP the states should still be the "full truth" (the position of all the cards), and the transitions are simply the rules of the game (including the strategies of the other players?!). The observations should certainly not depend on any history, only on the state; otherwise you're violating the Markov assumption. The point of a POMDP is that the agent can gain information about the current state by analyzing the history. I'm not really sure if or how this applies to UNO, though. If you know which cards have been played and in what order, can you still gain information by using the history? Probably not. I'm not sure, but maybe it does not make sense to think of this game as a POMDP, even if you use a solution method that was designed for POMDPs.

Cricket as a state machine

I'm creating a cricket manager stats game. I need to create a ball-by-ball simulation of the game. The game/ball outcome will be influenced by player stats and other external factors like weather or chosen tactics.
I've been reading around that most of the games can be implemented as a state machine, which sounds appealing to me, but because I'm a newbie at cricket I'm failing to envision this game as a state machine.
Should the ball be a state machine, or the match, or the player, or all three? I'm also not sure how I will orchestrate these state machines (through events?).
I'm also having a hard time identifying the states and transitions. Any help would be greatly appreciated.
So here's what I understand from your question: your cricket manager game will simulate a match ball by ball, depending on player stats (bowler's skill/experience, batsman's skill/experience, fielding/wicketkeeping stats, and so on) and other related variables. From my understanding, this will be more of an algorithmic engine than a visual representation of a cricket game.
Now, answering your question: first of all, I don't believe you're looking at FSMs the right way. An FSM is a piece of code designed such that at any point in its lifetime, it is in one of many possible states of execution. Each state can have, and usually does have (that's the point of it), a different update routine. Also, each state can transition to another state upon predefined triggers/events. What you need to understand is that states implement different behaviour for the same entity.
Now, "most of the games can be implemented as a state machine" -- not "a" state machine, but rather a whole nest of state machines. Several manager classes in a game, the renderer, gameplay objects, menu systems -- more or less everything works off a state machine of its own. Imagine a game character, say a boxer, for the purpose of this example. Some states you'll find in the 'CBoxer'(?) class would be 'Blocking', 'TakingHit', 'Dodge', 'RightUpper', 'LeftHook' and so on.
Keep in mind, though, that FSMs are more of a design construct -- a way to envision the solution to the problem at hand. You don't necessarily HAVE to use them. You could make a complete game without a state machine (I think :) ). But FSMs make your code design really intuitive and straightforward, and it's frankly difficult not to find one in any decent-sized project.
I suggest you take a look at some code samples of FSMs at work. Once you get the idea behind it, you'll find yourself using them everywhere :)
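As a starting point, here is a minimal sketch of such an FSM in Python, using the boxer example from above (the state and event names are invented). The same table-driven pattern would apply to a ball, a match, or a player object:

```python
class Boxer:
    """Minimal FSM: the current state is a plain string, and a
    transition table maps (state, event) pairs to new states."""

    def __init__(self):
        self.state = "idle"
        # (current state, event) -> next state
        self.transitions = {
            ("idle", "punch_incoming"): "blocking",
            ("blocking", "opening_seen"): "right_upper",
            ("right_upper", "done"): "idle",
        }

    def handle(self, event):
        """Apply an event; unknown events leave the state unchanged."""
        key = (self.state, event)
        if key in self.transitions:
            self.state = self.transitions[key]
        return self.state

b = Boxer()
b.handle("punch_incoming")   # idle -> blocking
b.handle("opening_seen")     # blocking -> right_upper
```

In a full game, each state would also carry its own update routine; orchestration between machines (ball, match, player) then amounts to one machine's transitions emitting events that other machines handle.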
As a first step you should go through the rules of cricket and your model for the ball outcome to summarise how previous balls affect a given ball.
Then identify what you need to keep track of, and whether it is convenient to use a state machine to represent it. For example, statistics are usually not very convenient to keep track of as FSMs.
With that information in mind, you should be able to build a model. Information you need to keep track of might be either state machines or an internal value of a particular state. The interactions between balls will dictate the transitions and the events circulating from one machine to another.

Learning the Structure of a Hierarchical Reinforcement Task

I've been studying hierarchical reinforcement learning problems, and while a lot of papers propose interesting ways of learning a policy, they all seem to assume they know in advance a graph structure describing the actions in the domain. For example, The MAXQ Method for Hierarchical Reinforcement Learning by Dietterich describes a complex graph of actions and sub-tasks for a simple taxi domain, but not how this graph was discovered. How would you learn the hierarchy of this graph, and not just the policy?
In Dietterich's MAXQ, the graph is constructed manually. It's considered to be a task for the system designer, in the same way that coming up with a representation space and reward functions are.
Depending on what you're trying to achieve, you might want to automatically decompose the state space, learn relevant features, or transfer experience from simple tasks to more complex ones.
I'd suggest you just start reading papers that refer to the MAXQ one you linked to. Without knowing exactly what you want to achieve, I can't be very prescriptive (and I'm not really on top of all the current RL research), but you might find relevant ideas in the work of Luo, Bell & McCollum or the papers by Madden & Howley.
This paper describes one approach that is a good starting point:
N. Mehta, S. Ray, P. Tadepalli, and T. Dietterich. Automatic Discovery and Transfer of MAXQ Hierarchies. In International Conference on Machine Learning, 2008.
http://web.engr.oregonstate.edu/~mehtane/papers/hi-mat.pdf
Say there is an agent out there, moving about and doing things. You don't know its internal goals (task graph). How do you infer its goals?
In a way, this is impossible. Just as it is impossible for me to know what goal you had in mind when you put that box down: maybe you were tired, maybe you saw a killer bee, maybe you had to pee...
You are trying to model an agent's internal goal structure. In order to do that, you need some sort of guidance as to what the set of possible goals is and how these are represented by actions. In the research literature this problem has been studied under the term "plan recognition" and also with the use of POMDPs (partially observable Markov decision processes), but both of these techniques assume you do know something about the other agent's goals.
If you don't know anything about its goals, all you can do is either infer one of the above models (this is what we humans do. I assume others have the same goals I do. I never think, "Oh, he dropped his laptop, he must be ready to lay an egg," because he's a human.) or model it as a black box: a simple state-to-actions function, then add internal states as needed (hmm, someone must have written a paper on this, but I don't know who).

Do you use Styrofoam balls to model your systems? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
[Objective-C]
Do you still use Styrofoam balls to model your systems, where each ball represents a class?
Tom Love: We do, actually. We've also done a 3D animation version of it, which we found to be nowhere near as useful as the Styrofoam balls. There's something about a physical, conspicuous structure hanging from the ceiling right in the middle of a development project that's regularly updated to provide not only the structure of the system that you're building, but also the current status of each one of the classes.
We've done it on 19 projects the last time I counted. One of them was 1,856 classes, which is big -- actually, probably bigger than it should be. It was a big commercial project, so it needed to be somewhat big.
Masterminds of Programming
It is the first time I've read or heard about using styrofoam balls to model classes.
Is that a commonly used technique? And, how does that sort of modeling help us to design better the system?
If you have any photos to share which can show us how the classes are represented it'd be great!
Update: So, it seems that the material most people use is paper. Styrofoam balls are actually oddballs, not a commonly used technique.
Notable techniques:
"paper plates and string" modeling, NealB
Post-it Notes on a whiteboard, Jason
Class-Responsibility-Collaboration cards, duffymo
Sheets of ruled paper taped to the wall, AMissico
Thank you all for the very good answers.
I found a couple of styrofoam models for:
Windows 95
and
Lotus Notes
(if that helps)
Actually, here's a Tom Love case study that shows a couple of his models.
This model may represent the least expensive CASE tool on the market -- materials cost $20.35. It was more useful than any CASE tools I have ever used.
We used it in three important ways.
It fixed the number of classes that we would deliver in the finished application, and we did not allow new ones to be added unless existing ones could be removed.
It was a very useful way to publicly document which classes had been code reviewed (blue ribbons) and tested (green ribbons).
It helped everyone understand what was being built, and how much time and effort it takes to do testing, documentation and code reviews.
Edit: photo of object model
http://img686.imageshack.us/img686/82/stryrofoamobjectmodel.jpg
The styrofoam ball model appears to date back to the mid-1990s -- a time when CASE (Computer-Aided Software Engineering) systems were all the rage.
At that time CASE systems promised significant benefits but were dismally slow, buggy, unstable, overextended and downright awkward to use. Basically, long on potential but short on delivery.
I remember having a conversation with an analyst working on a different project from mine. Her team had become so frustrated with their CASE system that they trashed it and resorted to "paper plates and string" modeling. They reserved a meeting room, removed all the furniture, and laid out their process model using labeled paper plates with strings (representing data flows) connecting them. She claimed it was much more useful than the CASE system it replaced.
I suspect that the styrofoam ball model had similar roots.
Using styrofoam balls or paper plates fostered design "buy-in". If a team finds something to rally around, it naturally creates a common design focus. It is simple, concrete and minimal -- using it requires a lot of face-to-face interaction and discussion. And that is where the value comes from. I suspect that if you brought a new person into the project and told them to bring themselves up to speed by reviewing the "model", they would be "dead in the water". However, walk them through the "model" and a real conversation would occur, where all the information needed to perform on the project would be imparted very quickly and efficiently.
Do I think styrofoam balls could become a sustainable modeling tool? No, I don't. They would be a real pain to keep up to date in a changing environment. They convey little information. There are better tools available today. And most importantly, if the team you are working with doesn't "buy" it, and they probably won't, it will look really stupid -- kind of like a sports team mascot, a rallying point only if the team "buys it".
No, we don't do this. And in my 30-odd year history in the IT industry, I've never heard of anyone doing this.
The only way this could help you design better systems is by:
keeping the class count down since it's hard to build the styrofoam model; and
minimising changes, since updating it would be a serious pain in the rear end.
Other than those two dubious features, I can't see this as being very useful. I'd almost conclude it was some sort of prank. Far better to do some real work, I think.
Seriously, if we tried to model our application with styro coffee cups and straws, our bosses would be calling the men in white coats.
Post-it Notes on a whiteboard seem to be popular in the circles I travel in. Objects go on the Post-its, and you rearrange them until you get your relationships the way you want them.
And then there are the Color Modeling people who use a 4-pack of colored Post-Its and assign an archetype to each color. It doesn't sound like this is much of an improvement, but standing across a room looking at it, you can tell where there are missing features or unidentified objects in the system.
There is one application of this that I think we tend to forget: using tools to articulate an architecture comes naturally to us after years in the industry, but there are valuable, albeit less technically-minded, stakeholders who may not grasp vital concepts as readily. It would sometimes be a lifesaver to point to a cluster of balls and say, "This is the Language Processing Model, and if I implement the feature you want, it will have consequences here, here, and here. You can see that there are a lot of balls connected there."
Architects, be they designing buildings or systems, might rely on those tangible models to indoctrinate the check writers into the process.
And I thought that UML was useless. The styrofoam ball model makes UML look positively elegant by comparison.
Ward Cunningham's CRC card idea is more useful, even cheaper, and still retains that tactile quality that Dr. Love was after.
I had never heard of the idea until I read this question. It deserves an up vote for originality. And the "Windows" and "Lotus Notes" pictures are priceless.
Sheets of ruled paper taped to the wall, where each sheet is a component, class, entity, or whatever is needed. Everyone has a pencil.
Everyone can write on them, "fleshing out" the model during the design meetings: meeting notes, implementation notes, new classes, removed classes, reasons why you do not have a particular class, and so on. After the design meeting, the principal designer takes them down and rewrites them, again "fleshing" them out, in pen, as "rough-draft" versions. The designer can then make decisions based on the notes on each sheet, create new sheets for any additional components, generate topics for the next meeting, note any discrepancies, note any design/implementation details needed for coding, or whatever else they need to do.
Repeat the meetings until everyone is satisfied. Pencil is new stuff, pen is previous items. Once everyone is happy, the designer creates the working-draft, and posts where everyone can see and initial, in pen, their acceptance of the "working-draft".
Nothing is final. Pen versions are "latest" versions. Pencil versions are "work-in-progress" or "draft" versions.
Simple, fast, flexible, no wasting time on the computer, with high visibility. The working man's wiki.
No. My team does not do this.
And I am badly tempted to mock with image macros. But I'm contemplating that the idea is silly enough that it is self-mocking.

Any business examples of using Markov chains?

What business cases are there for using Markov chains? I've seen the sort of play-area application of a Markov chain applied to someone's blog to write a fake post. I'd like some practical examples, though -- e.g., uses in business, prediction of the stock market, or the like...
Edit: Thanks to all who gave examples, I upvoted each one as they were all useful.
Edit 2: I selected the answer with the most detail as the accepted answer; I upvoted all the answers.
The obvious one: Google's PageRank.
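PageRank is essentially the stationary distribution of a Markov chain over the link graph, which can be approximated by power iteration. A minimal sketch (the three-page graph is made up; 0.85 is the commonly cited damping factor):

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration for PageRank. `links` maps each page to
    the pages it links to; every page must link somewhere
    (no dangling nodes in this simplified sketch)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start uniform
    for _ in range(iters):
        # Each page gets the baseline (1-d)/n, plus damped shares
        # of the rank of every page that links to it.
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)
            for q in outs:
                new[q] += d * share
        rank = new
    return rank

# Toy graph: a -> b, b -> a and c, c -> a.
r = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```

Here page "a" ends up ranked highest, since both "b" and "c" link to it.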
Hidden Markov models are based on a Markov chain and extensively used in speech recognition and especially bioinformatics.
I've seen spam email that was clearly generated using a Markov chain -- certainly that qualifies as a "business use". :)
There is a class of optimization methods based on Markov Chain Monte Carlo (MCMC) methods. These have been applied to a wide variety of practical problems, for example signal & image processing applications to data segmentation and classification. Speech & image recognition, time series analysis, lots of similar examples come out of computer vision and pattern recognition.
We use log-file chain-analysis to derive and promote secondary and tertiary links to otherwise-unrelated documents in our help-system (a collection of 10m docs).
This is especially helpful in bridging otherwise separate taxonomies. e.g. SQL docs vs. IIS docs.
I know AccessData uses them in their forensic password-cracking tools. It lets you explore the more likely password phrases first, resulting in faster password recovery (on average).
Markov chains are used by search companies like Bing to infer the relevance of documents from the sequence of clicks made by users on the results page. The underlying user behaviour in a typical query session is modeled as a Markov chain, with particular behaviours as state transitions...
For example, if the document is relevant, the user may still examine more documents, but with a smaller probability; if it is not relevant, he may examine more documents with a much larger probability.
There are some commercial ray-tracing systems that implement Metropolis Light Transport (invented by Eric Veach, who basically applied Metropolis-Hastings to ray tracing); Bidirectional and Importance-Sampling Path Tracers also use Markov chains.
These terms are googlable; I omitted further explanation for the sake of this thread.
We plan to use it for predictive text entry on a handheld device for data entry in an industrial environment. In a situation with a reasonable vocabulary size, transitions to the next word can be suggested based on frequency. Our initial testing suggests that this will work well for our needs.
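A sketch of this kind of frequency-based suggestion, using a bigram (first-order Markov) model; the tiny corpus and word lists are invented for illustration:

```python
from collections import Counter, defaultdict

def build_bigrams(corpus):
    """Count word-to-next-word transitions from a training corpus."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def suggest(model, word, k=3):
    """Suggest the k most frequent successors of `word`."""
    return [w for w, _ in model[word.lower()].most_common(k)]

# Toy corpus standing in for a domain-specific vocabulary.
model = build_bigrams([
    "check valve pressure",
    "check valve seal",
    "check pump pressure",
])
```

With a restricted domain vocabulary, even this simple first-order model tends to rank the intended next word near the top of the suggestion list.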
IBM has CELM. Check out this link:
http://www.research.ibm.com/journal/rd/513/labbi.pdf
I recently stumbled on a blog example of using Markov chains for creating test data...
http://github.com/emelski/code.melski.net/blob/master/markov/main.cpp
A Markov model is a way of describing a process that goes through a series of states.
HMMs can be applied in many fields where the goal is to recover a data sequence that is not immediately observable (but that depends on some other observable data).
Common applications include:
Crypt-analysis, Speech recognition, Part-of-speech tagging, Machine translation, Stock Prediction, Gene prediction, Alignment of bio-sequences, Gesture Recognition, Activity recognition, Detecting browsing pattern of a user on a website.
Markov chains can be used to simulate user interaction, e.g. when browsing a service.
A friend of mine wrote his diploma thesis on plagiarism recognition using Markov chains (he said the input data must be whole books for it to succeed).
It may not be very 'business' but Markov Chains can be used to generate fictitious geographical and person names, especially in RPG games.
Markov chains are used in life insurance, particularly in the permanent disability model. There are three states:
0 - The life is healthy
1 - The life becomes disabled
2 - The life dies
In a permanent disability model the insurer may pay some sort of benefit if the insured becomes disabled and/or the life insurance benefit when the insured dies. The insurance company would then likely run a Monte Carlo simulation based on this Markov chain to determine the likely cost of providing such insurance.
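A minimal sketch of such a simulation in Python; the annual transition probabilities and benefit amounts below are invented for illustration, not real actuarial figures:

```python
import random

# 0 = healthy, 1 = disabled (permanent), 2 = dead.
# Each entry lists (next state, probability); rows sum to 1.
P = {
    0: [(0, 0.90), (1, 0.05), (2, 0.05)],
    1: [(1, 0.92), (2, 0.08)],
    2: [(2, 1.0)],  # absorbing state
}

def simulate_cost(years=30, disability_benefit=10_000, death_benefit=100_000):
    """One policy trajectory: pay the disability benefit for each
    year spent disabled, and the death benefit once on death."""
    state, cost = 0, 0.0
    for _ in range(years):
        # Sample the next state from the current row of P.
        r, cum = random.random(), 0.0
        for nxt, p in P[state]:
            cum += p
            if r < cum:
                state = nxt
                break
        if state == 1:
            cost += disability_benefit
        elif state == 2:
            cost += death_benefit
            break
    return cost

random.seed(0)
avg = sum(simulate_cost() for _ in range(10_000)) / 10_000
```

Averaging over many simulated trajectories gives an estimate of the expected cost of the policy, which is the quantity the insurer would price against.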
