Quantification over actions - artificial-intelligence

Looking for examples that show me how to quantify over actions (and perhaps fluents?) in situation calculus (Reiter 2001).
I understand the difference between actions, fluents and situations, but why do they need to be represented in 2nd order logic? Why not use first order? Can you please explain?

All but a few formulas used to encode a dynamic world require Second Order Logic (SOL). In particular,
the [properties of the] Initial State
the Action Preconditions
the Action Effects
the Successor State axioms (used to circumvent the "Frame problem")
can all be expressed with First Order Logic (FOL).
With some domains it may be convenient -and more concise- to use SOL for the above items, but AFAIK, it is always possible to convert such SOL to FOL in the context of the items listed above, for a finite domain, and hence SOL is not necessary (again, for the above items).
Typically the need for SOL with Situation Calculus only comes from some of the "Foundational axioms" such as the axiom to perform induction on Situations.
Furthermore, depending on the particular application, one may not need the SOL-based Foundational axiom(s) and hence the whole world can be exclusively described in terms of FOL.
I'm not a specialist in the field, but I think that in many cases, for the purpose of [Logical] Filtering, Planning, Temporal Projection, we can do away with the need for induction and hence only rely on FOL only.

Related

Reinforcement Learning in Dynamic Environment with large state -action space

I have a 500*500 grid with 7 different penalty values. I need to make an RL agent whose action space contains 11 actions. (Left, Right, Up, Down, 4 Diagonal Directions, Speed Up, Speed Down And Normal Speed). How can I solve this problem?
The Probability of 'action performed' which was chosen is 0.8. Otherwise a random action is selected. Also, the penalty values can change dynamically.
Take a look at this chapter by Sutton incompleteideas.net/sutton/book/ebook/node15.html, especially his experiments in later sections. Your problem seems similar to the N-Armed bandit in that each of the arms returns a normal distribution of reward. While this chapter mostly focuses on exploration, the problem applies.
Another way to look at it, is if your state really returns a normal distribution of penalties, you will need to sufficiently explore the domain to get the mean of the state, action tuple. The mean in these cases is Q*, which will give you the optimal policy.
As a follow up, if the state space is too large or continuous, it may be worth looking into generalization with a function approximator. While the same convergence rules apply, there are cases where function approximations run into issues. I would say that is beyond the scope of this discussion though.

Is the HTM cortical learning algorithm defined by Numenta's paper restricted by Euclidean geometry?

Specifically, their most recent implementation.
http://www.numenta.com/htm-overview/htm-algorithms.php
Essentially, I'm asking whether non-euclidean relationships, or relationships in patterns that exceed the dimensionality of the inputs, can be effectively inferred by the algorithm in its present state?
HTM uses Euclidean geometry to determine "neighborship" when analyzing patterns. Consistently framed input causes the algorithm to exhibit predictive behavior, and sequence length is practically unlimited. This algorithm learns very well - but I'm wondering whether it has the capacity to infer nonlinear attributes from its input data.
For example, if you input the entire set of texts from Project Gutenberg, it's going to pick up on the set of probabilistic rules that comprise English spelling, grammar, and readily apparent features from the subject matter, such as gender associations with words, and so forth. These are first level "linear" relations, and can be easily defined with probabilities in a logical network.
A nonlinear relation would be an association of assumptions and implications, such as "Time flies like an arrow, fruit flies like a banana." If correctly framed, the ambiguity of the sentence causes a predictive interpretation of the sentence to generate many possible meanings.
If the algorithm is capable of "understanding" nonlinear relations, then it would be able to process the first phrase and correctly identify that "Time flies" is talking about time doing something, and "fruit flies" are a type of bug.
The answer to the question is probably a simple one to find, but I can't decide either way. Does mapping down the input into a uniform, 2d, Euclidean plane preclude the association of nonlinear attributes of the data?
If it doesn't prevent nonlinear associations, my assumption would then be that you could simply vary the resolution, repetition, and other input attributes to automate the discovery of nonlinear relations - in effect, adding a "think harder" process to the algorithm.
From what I understand of HTM's, the structure of layers and columns mimics the structure of the neocortex. See appendix B here: http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf
So the short answer would be that since the brain can understand non-linear phenomenon with this structure, so can an HTM.
Initial, instantaneous sensory input is indeed mapped to 2D regions within an HTM. This does not limit HTM's to dealing with 2D representations any more than a one dimensional string of bits is limited to representing only one dimensional things. It's just a way of encoding stuff so that sparse distributed representations can be formed and their efficiencies can be taken advantage of.
To answer your question about Project Gutenberg, I don't think an HTM will really understand language without first understanding the physical world on which language is based and creates symbols for. That said, this is a very interesting sequence for an HTM, since predictions are only made in one direction, and in a way the understanding of what's happening to the fruit goes backwards. i.e. I see the pattern 'flies like a' and assume the phrase applies to the fruit the same way it did to time. HTM's do group subsequent input (words in this case) together at higher levels, so if you used Fuzzy Grouping (perhaps) as Davide Maltoni has shown to be effective, the two halves of the sentence could be grouped together into the same high level representation and feedback could be sent down linking the two specific sentences. Numenta, to my knowledge has not done too much with feedback messages yet, but it's definitely part of the theory.
The software which runs the HTM is called NuPIC (Numenta Platform for Intelligent Computing). A NuPIC region (representing a region of neocortex) can be configured to either use topology or not, depending on the type of data it's receiving.
If you use topology, the usual setup maps each column to a set of inputs which is centred on the corresponding position in the input space (the connections will be selected randomly according to a probability distribution which favours the centre). The spatial pattern recognising component of NuPIC, known as the Spatial Pooler (SP), will then learn to recognise and represent localised topological features in the data.
There is absolutely no restriction on the "linearity" of the input data which NuPIC can learn. NuPIC can learn sequences of spatial patterns in extremely high-dimensional spaces, and is limited only by the presence (or lack of) spatial and temporal structure in the data.
To answer the specific part of your question, yes, NuPIC can learn non-Euclidean and non-linear relationships, because NuPIC is not, and cannot be modelled by, a linear system. On the other hand, it seems logically impossible to infer relationships of a dimensionality which exceeds that of the data.
The best place to find out about HTM and NuPIC, its Open Source implementation, is at NuPIC's community website (and mailing list).
Yes, It can do non-linear. Basically it is multilayer. And all multilayer neural networks can infer non linear relationships. And I think the neighborship is calculated locally. If it is calcualted locally then globally it can be piece wise non linear for example look at Local Linear Embedding.
Yes HTM uses euclidean geometry to connect synapses, but this is only because it is mimicking a biological system that sends out dendrites and creates connections to other nearby cells that have strong activation at that point in time.
The Cortical Learning Algorithm (CLA) is very good at predicting sequences, so it would be good at determining "Time flies like an arrow, fruit flies like a" and predict "banana" if it has encountered this sequence before or something close to it. I don't think it could infer that a fruit fly is a type of insect unless you trained it on that sequence. Thus the T for Temporal. HTMs are sequence association compressors and retrievers (a form of memory). To get the pattern out of the HTM you play in a sequence and it will match the strongest representation it has encountered to date and predict the next bits of the sequence. It seems to be very good at this and the main application for HTMs right now are predicting sequences and anomalies out of streams of data.
To get more complex representations and more abstraction you would cascade a trained HTMs outputs to another HTMs inputs along with some other new sequence based input to correlate to. I suppose you could wire in some feedback and do some other tricks to combine multiple HTMs, but you would need lots of training on primitives first, just like a baby does, before you will ever get something as sophisticated as associating concepts based on syntax of the written word.
ok guys, dont get silly, htms just copy data into them, if you want a concept, its going to be a group of the data, and then you can have motor depend on the relation, and then it all works.
our cortex, is probably way better, and actually generates new images, but a computer cortex WONT, but as it happens, it doesnt matter, and its very very useful already.
but drawing concepts from a data pool, is tricky, the easiest way to do it is by recording an invarient combination of its senses, and when it comes up, associate everything else to it, this will give you organism or animal like intelligence.
drawing harder relations, is what humans do, and its ad hoc logic, imagine a set explaining the most ad hoc relation, and then it slowly gets more and more specific, until it gets to exact motor programs... and all knowledge you have is controlling your motor, and making relations that trigger pathways in the cortex, and tell it where to go, from the blast search that checks all motor, and finds the most successful trigger.
woah that was a mouthful, but watch out dummies, you wont get no concepts from a predictive assimilator, which is what htm is, unless you work out how people draw relations in the data pool, like a machine, and if you do that, its like a program thats programming itself.
no shit.

Structured, factored and atomic representation?

I am currently reading "Artificial Intelligence: A modern Approach". Though the terminology factored, structured and atomic representation is confusing what do these mean exactly?
In relation with programming...
Thanks
I'm not thrilled with the lines that Russell and Norvig draw, but: Generally, when you're using AI techniques to solve a problem, you're going to have a programmed model of the situation. Atomic/factored/structured is a qualitative measure of how much "internal structure" those models have, from least to most.
Atomic models have no internal structure; the state either does or does not match what you're looking for. In a sliding tile puzzle, for instance, you either have the correct alignment of tiles or you do not.
Factored models have more internal structure, although exactly what will depend on the problem. Typically, you're looking at variables or performance metrics of interest; in a sliding puzzle, this might be a simple heuristic like "number of tiles out of place," or "sum of manhatten distances."
Structured models have still more; again, exactly what depends on the problem, but they're often relations either of components of the model to itself, or components of the model to components of the environment.
It is very easy, especially when looking at very simple problems like the sliding tile, to unconsciously do all the hard intelligence work yourself, at a glance, and forget that your model doesn't have all your insight. For example, if you were to make a program to do a graph search technique on the sliding puzzle, you'd probably make some engine that took as input a puzzle state and an action, and generated a new puzzle state from that. The puzzle states are still atomic, but you the programmer are using a much more detailed model to link those inputs and outputs together.
I like explanation given by Novak. My 2 cents is to make clarity on difference between factored vs structured. Here is extract from definitions:
An atomic representation is one in which each state is treated as a
black box.
A factored representation is one in which the states are
defined by set of features.
A structured representation is one in which the states are expressed in form of objects and relations between them. Such knowledge about relations called facts.
Examples:
atomicState == goal: Y/N // Is goal reached?
It is the only question we can ask to black box.
factoredState{18} == goal{42}: N // Is goal reached?
diff( goal{42}, factoredState{18}) = 24 // How much is difference?
// some other questions. the more features => more type of questions
The simplest factored state must have at least one feature (of some type), that gives us ability to ask more questions. Normally it defines quantitative difference between states. The example has one feature of integer type.
11grade#schoolA{John(Math=A-), Marry(Music=A+), Job1(doMath)..} == goal{50% ready for jobs}
The key here - structured representation, allows higher level of formal logical reasoning at the search. See First-Order Logic #berkley for introductory info.
This subject easily confuses a practitioner (especially beginner) but makes great sense for comparing different goal searching algorithms. Such "world" state representation classification logically separates algorithms into different classes. It is very useful to draw lines in academic research and compare apples to apples when reasoning academically.

Adaptive Neuro-Fuzzy Inference System (ANFIS)

Do you have an example or an explanation of ANFIS (Adaptive Neuro-Fuzzy Inference System), I am reading that this could be applied to classify some diseases, What do you think about it?
Usually in order to develop a fuzzy system you have to determine the if-then rules, suitable membership functions, and their parameters. This is not always a trivial task, especially the development of correct if-then rules may be time consuming as we first have to "extract" the expert knowledge somehow.
This is where ANFIS comes into play: Under certain circumstances it can automatically determine suitable parameters for the membership functions. This is the case in particular when we already have a set of input and related output variables and values. Like in an artificial neural network the ANFIS system is able to adapt its nodes and connections between them "automatically".
To your question: you could of course create an ANFIS system for your desease classification, as long as you already have input and output data for system training available. But its not necessarily tied to such systems, you can see ANFIS more an approach usable under the mentioned circumstances, than a tool for a specific problem. It all depends on the requirements for the system you want to create, as well as the known (external) preconditions...
Hope that helps!
As Matthias said ANFIS is not mapped to a particular problem, you can use it on the basis of problem requirement. But where to use ANFIS: You can use it with any problem where something is ambiguous.
Actually this is the property of FIS(Fuzzy Inference System). Adaptive come in role as Matthias explained.
For ex. took famous classification problem, classifying a input to any class is not always perfectly determined, it somewhat ambiguous. So there using ANFIS may give better results then other classification algorithms depending upon whether you are able to model the system correctly or not using ANFIS.
But using ANFIS is computationally expensive as compared to other non-fuzzy approches. As to make FIS to perfect model your problem you will add AN part to it. This only make membership function selection adaptive. What about if-then rules. For that you have to do unsupervised rule selection from the complete possible rule base(this is basically a kind of unsupervised clustering problem, where you are trying to group all the rules whose effect would be same).
So far I have found a university 'Monash' that explains (based on the guide of Matlabs's Fuzzy Logic Toolbox) ANFIS.
The fuzzy inference system that we have considered is a model that maps:
input characteristics to input
membership functions
input
membership function to rules
rules
to a set of output characteristics
output characteristics to output
membership functions
the output membership function to a
single-valued output, or, a decision
associated with the output.
Yes it can be used for Diseases Classification.
Since the idea of ANFIS is combine fuzzy system in architecture of ANN. In this case, ANFIS have two main benefit.
first, you can use fuzzy variable which is support for Linguistic variable and it's fit for Diseases's symptoms that are commonly used as system's input (example of input >> pain levels : low, mid, high).
Second, since the architecture is mapped to ANN layers, ANFIS can do training process which aims to create more accurate result (ex : use Backpropagation method).

Best way to automate testing of AI algorithms?

I'm wondering how people test artificial intelligence algorithms in an automated fashion.
One example would be for the Turing Test - say there were a number of submissions for a contest. Is there any conceivable way to score candidates in an automated fashion - other than just having humans test them out.
I've also seen some data sets (obscured images of numbers/letters, groups of photos, etc) that can be fed in and learned over time. What good resources are out there for this.
One challenge I see: you don't want an algorithm that tailors itself to the test data over time, since you are trying to see how well it does in the general case. Are there any techniques to ensure it doesn't do this? Such as giving it a random test each time, or averaging its results over a bunch of random tests.
Basically, given a bunch of algorithms, I want some automated process to feed it data and see how well it "learned" it or can predict new stuff it hasn't seen yet.
This is a complex topic - good AI algorithms are generally the ones which can generalize well to "unseen" data. The simplest method is to have two datasets: a training set and an evaluation set used for measuring the performances. But generally, you want to "tune" your algorithm so you may want 3 datasets, one for learning, one for tuning, and one for evaluation. What defines tuning depends on your algorithm, but a typical example is a model where you have a few hyper-parameters (for example parameters in your Bayesian prior under the Bayesian view of learning) that you would like to tune on a separate dataset. The learning procedure would already have set a value for it (or maybe you hardcoded their value), but having enough data may help so that you can tune them separately.
As for making those separate datasets, there are many ways to do so, for example by dividing the data you have available into subsets used for different purposes. There is a tradeoff to be made because you want as much data as possible for training, but you want enough data for evaluation too (assuming you are in the design phase of your new algorithm/product).
A standard method to do so in a systematic way from a known dataset is cross validation.
Generally when it comes to this sort of thing you have two datasets - one large "training set" which you use to build and tune the algorithm, and a separate smaller "probe set" that you use to evaluate its performance.
#Anon has the right of things - training and what I'll call validation sets. That noted, the bits and pieces I see about developments in this field point at two things:
Bayesian Classifiers: there's something like this probably filtering your email. In short you train the algorithm to make a probabilistic decision if a particular item is part of a group or not (e.g. spam and ham).
Multiple Classifiers: this is the approach that the winning group involved in the Netflix challenge took, whereby it's not about optimizing one particular algorithm (e.g. Bayesian, Genetic Programming, Neural Networks, etc..) by combining several to get a better result.
As for data sets Weka has several available. I haven't explored other libraries for data sets, but mloss.org appears to be a good resource. Finally data.gov offers a lot of sets that provide some interesting opportunities.
Training data sets and test sets are very common for K-means and other clustering algorithms, but to have something that's artificially intelligent without supervised learning (which means having a training set) you are building a "brain" so-to-speak based on:
In chess: all possible future states possible from the current gameState.
In most AI-learning (reinforcement learning) you have a problem where the "agent" is trained by doing the game over and over. Basically you ascribe a value to every state. Then you assign an expected value of each possible action at a state.
So say you have S states and a actions per state (although you might have more possible moves in one state, and not as many in another), then you want to figure out the most-valuable states from s to be in, and the most valuable actions to take.
In order to figure out the value of states and their corresponding actions, you have to iterate the game through. Probabilistically, a certain sequence of states will lead to victory or defeat, and basically you learn which states lead to failure and are "bad states". You also learn which ones are more likely to lead to victory, and these are subsequently "good" states. They each get a mathematical value associated, usually as an expected reward.
Reward from second-last state to a winning state: +10
Reward if entering a losing state: -10
So the states that give negative rewards then give negative rewards backwards, to the state that called the second-last state, and then the state that called the third-last state and so-on.
Eventually, you have a mapping of expected reward based on which state you're in, and based on which action you take. You eventually find the "optimal" sequence of steps to take. This is often referred to as an optimal policy.
It is true of the converse that normal courses of actions that you are stepping-through while deriving the optimal policy are called simply policies and you are always implementing a certain "policy" with respect to Q-Learning.
Usually the way of determining the reward is the interesting part. Suppose I reward you for each state-transition that does not lead to failure. Then the value of walking all the states until I terminated is however many increments I made, however many state transitions I had.
If certain states are extremely unvaluable, then loss is easy to avoid because almost all bad states are avoided.
However, you don't want to discourage discovery of new, potentially more-efficient paths that don't follow just this-one-works, so you want to reward and punish the agent in such a way as to ensure "victory" or "keeping the pole balanced" or whatever as long as possible, but you don't want to be stuck at local maxima and minima for efficiency if failure is too painful, so no new, unexplored routes will be tried. (Although there are many approaches in addition to this one).
So when you ask "how do you test AI algorithms" the best part is is that the testing itself is how many "algorithms" are constructed. The algorithm is designed to test a certain course-of-action (policy). It's much more complicated than
"turn left every half mile"
it's more like
"turn left every half mile if I have turned right 3 times and then turned left 2 times and had a quarter in my left pocket to pay fare... etc etc"
It's very precise.
So the testing is usually actually how the A.I. is being programmed. Most models are just probabilistic representations of what is probably good and probably bad. Calculating every possible state is easier for computers (we thought!) because they can focus on one task for very long periods of time and how much they remember is exactly how much RAM you have. However, we learn by affecting neurons in a probabilistic manner, which is why the memristor is such a great discovery -- it's just like a neuron!
You should look at Neural Networks, it's mindblowing. The first time I read about making a "brain" out of a matrix of fake-neuron synaptic connections... A brain that can "remember" basically rocked my universe.
A.I. research is mostly probabilistic because we don't know how to make "thinking" we just know how to imitate our own inner learning process of try, try again.

Resources