Vetting Second Hand Knowledge in an Ontology - artificial-intelligence

How would you assign objective certainties to statements asserted by different users in an ontology?
For example, consider User A asserts "Bob's hat is blue" while User B asserts "Bob's hat is red". How would you determine if:
User A and User B are referring to different people named Bob, and may or may not be correct.
Both users are referring to the same person, but User A is right and User B is mistaken (or vice versa).
Both users are referring to the same person, but User A is right and User B is lying (or vice versa).
Both users are referring to the same person, and both uses are either mistaken or lying.
The main difficulty I see is the ontology doesn't have any way to obtain first-hand data (e.g. it can't ask Bob what color his hat is).
I realize there's probably no entirely objective way to resolve this. Are there any heuristics that could be employed? Does this problem have a formal name?

I'm not an expert in this field, but I've worked a bit with uncertainties in ontologies and the Semantic Web. There are, of course, approaches to this problem that have nothing to do with the semantic web, but my knowledge ends there.
Two problems that I feel are connected with your question are the Identity Crisis and the URI crisis. Formal representations of the statements above can be issued in RDF (Resource Description Framework).
If I convert the statementss "Bob's hat is blue/red" into triples, this would be:
Fact 1:
X isA Person
X hasName "Bob"
X possesses H1
H1 isA Hat
H1 hasColor Blue
Fact 2:
Y isA Person
Y hasName "Bob"
Y possesses H2
H2 isA Hat
H2 hasColor Red
The problem here is that X, Y, H1 and H2 are resources, which may or may not be the same. So in your example it is unknown if X and Y are the same person or distinct and you can't know without further information. (Same holds for the hats.)
However, the problem is more complex, because User A and B just stated those things, so they are no "real" facts. RDF offer the method of Reification for this, but I won't write this down here completely, it would be too long. What you basically would do is add an "UserA statesThat (...)" to every above mentioned statement.
If you have all this, you can start reasoning. At the university we once used RACER for this kind of stuff, but that was an old version and I'm not familiar with the current one.
Of Course, you can do that stuff without RDF as well, e.g., in LISP.
Hope it helped.

I have heard this kind of thing being referred to as information fusion, mirroring the idea of data fusion. I don't know much about it but it seems there are conferences on the subject.
I'd also add another difficulty here, that of distinguishing between objective and subjective information. If user A says 'Bob is a nice guy' and user B says 'Bob is not a nice guy' then they can both be right while asserting seemingly opposing statements.

Step 1: Make some assumptions. Otherwise, you have nothing to base anything on. A possible assumption would be, "If bob's hat is red, there is a 90% that User A will say his hat is red."
Step 2: Apply relevant math. To relate a conditional probability to its inverse (i.e. to ask the probability that bob's hat is red knowing what A said based on the assumption I proposed), use Bayes Theorem.

Related

What is mean by Events in Artificial Intelligence

While learning about Artificial intelligence I came across this topic events, mental events, mental objects, etc... I searched on internet but I can't able to understand the meaning can someone explain this what is mean by events in AI ?
Often in AI an event refers to something happening (being true) at a particular point in time.
This often arises in logic contexts. For example, we may have a logic knowledge base stating
for all X: USPresident(X) => USCitizen(X)
(for all objects X who are the president of the United States, X is a US citizen).
However, the knowledge base above is insufficient if you are reasoning about multiple points in time, because then who is the US president or even who is a US citizen depends on the moment you are talking about. In that case, it may be best to represent the rule as
for all X, for all t: USPresident(X, t) => USCitizen(X, t)
(for all objects X and for all time points t, if X is the president of the United States at time t, then X is a US citizen at time t)
In the latter setting a system could potentially reason about the fact that Reagan was president in 1985 but Obama was not, but in 20010 the opposite was true, etc.
This is particularly important in subjects such as Situation Calculus and Event Calculus.
However, "event" is really just an English word and interpreted in many different ways, even within AI. For example, in Natural Language Processing one can work on the "event detection" problem which consists of automatically detecting descriptions of events in English within a document. In any case, those meanings of "event" are normally associated with the idea of something happening at a certain point in time.

Horst (pD*) compared to OWL2-RL

We are using GraphDB 8.4.0 as a triple store for a large data integration project. We want to make use of the reasoning capabilities, and are trying to decide between using HORST (pD*) and OWL2-RL.
However, the literature describing these is quite dense. I think I understand that OWL2-RL is more "powerful", but I am unsure how so. Can anyone provide some examples of the differences between the two reasoners? What kinds of statements are inferred by OWL2-RL, but not HORST (and vice versa)?
Brill, inside there GraphDB there is a single rule engine, which supports different reasoning profiles, depending on the rule-set which was selected. The predefined rule-sets are part of the distribution - look at the PIE files in folder configs/rules. One can also take one of the existing profiles and tailor it to her needs (e.g. remove a rule, which is not necessary).
The fundamental difference between OWL2 RL and what we call OWL-Horst (pD*) is that OWL2RL pushes the limits of which OWL constructs can be supported using this type of entailment rules. OWL Horst is limited to RDFS (subClassOf, subSpropertyOf, domain and range) plus what was popular in the past as OWL Lite: sameAs, equivalentClass, equivalentProperty, SymmetricProperty, TransitiveProperty, inverseOf, FunctionalProperty, InverseFunctionalProperty. There is also partial support for: intersectionOf, someValuesFrom, hasValue, allValuesFrom.
What OWL 2 RL adds on top is support for AsymmetricProperty, IrreflexiveProperty, propertyChainAxiom, AllDisjointProperties, hasKey, unionOf, complementOf, oneOf, differentFrom, AllDisjointClasses and all the property cardinality primitives. It also adds more complete support for intersectionOf, someValuesFrom, hasValue, allValuesFrom. Be aware that there are limitations to the inference supported by OWL 2 RL for some of these properties, e.g. what type of inferences should or should not be done for specific class expressions (OWL restrictions). If you chose OWL 2 RL, check Tables 5-8 in the spec, https://www.w3.org/TR/owl2-profiles/#OWL_2_RL. GraphDB's owl-2-rl data set is fully compliant with it. GraphDB is the only major triplestore with full OWL 2 RL compliance - see the this table (https://www.w3.org/2001/sw/wiki/OWL/Implementations) it appears with its former name OWLIM.
My suggestion would be to go with OWL Horst for a large dataset, as reasoning with OWL 2 RL could be much slower. It depends on your ontology and data patterns, but as a rule of thumb you can expect loading/updates to be 2 times slower with OWL 2 RL, even if you don't use extensively its "expensive" primitives (e.g. property chains). See the difference between loading speeds with RDFS+ and OWL 2 RL benchmarked here: http://graphdb.ontotext.com/documentation/standard/benchmark.html
Finally, I would recommend you to use the "optimized" versions of the pre-defined rule-sets. These versions exclude some RDFS reasoning rules, which are not useful for most of the applications, but add substantial reasoning overheads, e.g. the one that infers that the subject, the predicate and the object of a statement are instances of rdfs:Resource
Id: rdf1_rdfs4a_4b
x a y
-------------------------------
x <rdf:type> <rdfs:Resource>
a <rdf:type> <rdfs:Resource>
y <rdf:type> <rdfs:Resource>
If you want to stay 100% compliant with the W3C spec, you should stay with the non-optimized versions.
If you need further assistance, please, write to support#ontotext.com
In addition to what Atanas (our CEO) wrote and your excellent example, http://graphdb.ontotext.com/documentation/enterprise/rules-optimisations.html provides some ideas how to optimize your rulesets to make them faster.
Two of the ideas are:
ptop:transitiveOver is faster than owl:TransitiveProperty: quadratic vs cubic complexity over the length of transitive chains
ptop:PropChain (a 2-place chain) is faster than general owl:propertyChainAxiom (n-place chain) because it doesn't need to unroll the rdf:List underlying the representation of owl:propertyChainAxiom.
Under some conditions you can translate the standard OWL constructs to these custom constructs, to have both standards compliance and faster speed:
use rule TransitiveUsingStep; if every TransitiveProperty p (eg skos:broaderTransitive) is defined over a step property s (eg skos:broader) and you don't insert p directly
if you use only 2-step owl:propertyChainAxiom then translate them to custom using the following rule, and infer using rule ptop_PropChain:
Id: ptop_PropChain_from_propertyChainAxiom
q <owl:propertyChainAxiom> l1
l1 <rdf:first> p1
l1 <rdf:rest> l2
l2 <rdf:first> p2
l2 <rdf:rest> <rdf:nil>
----------------------
t <ptop:premise1> p1
t <ptop:premise2> p2
t <ptop:conclusion> q
t <rdf:type> <ptop:PropChain>
http://rawgit2.com/VladimirAlexiev/my/master/pubs/extending-owl2/index.html describes further ideas for extended property constructs, and has illustrations.
Let us know if we can help further.
After thinking this for bit, I came up with a concrete example. The Oral Health and Disease Ontology (http://purl.obolibrary.org/obo/ohd.owl) contains three interrelated entities:
a restored tooth
a restored tooth surface that is part of the restored tooth
a tooth restoration procedure that creates the restored tooth surface (e.g., when you have a filling placed in your tooth)
The axioms that define these entities are (using pseudo Manchester syntax):
restored tooth equivalent to Tooth and has part some dental restoration material
restored tooth surface subclass of part of restored tooth
filling procedure has specified output some restored tooth surface
The has specified output relation is a subproperty of the has participant relation, and the has participant relation contains the property chain:
has_specified_input o 'is part of'
The reason for this property chain is for reasoner to infer that if a tooth surface participates in a procedure, then the whole tooth that the surface is part of participates in the procedure, and, furthermore, the patient that the tooth is part of also participates in the procedure (due to the transitivity of part of).
As a concrete example, let define some individuals (using pseudo rdf):
:patient#1 a :patient .
:tooth#1 a :tooth; :part-of :patient#1
:restored-occlusal#1 a :restored-occlusal-surface; :part-of tooth#1 .
:procedure#1 :has-specified-output :restored-occlusal#1 .
Suppose you want to query for all restored teeth:
select ?tooth where {?tooth a :restored-tooth}
RDFS-Plus reasoning will not find any restored teeth b/c it doesn't reason over equivalent classes. But, OWL-Horst and OWL2-RL will find such teeth.
Now, suppose you want to find all patients that underwent (i.e. participated in) a tooth restoration procedure:
select ?patient where {
?patient a patient: .
?proc a tooth_restoration_procedure:; has_participant: ?patient .
}
Again, RDFS-Plus will not find any patients, and neither will OWL-Horst b/c this inference requires reasoning over the has participant property chain. OWL2-RL is needed in order to make this inference.
I hope this example is helpful for the interested reader :).
I welcome comments to improve the example. Please keep any comments within the scope of trying to make the example clearer. Its purpose is to give insight into what these different levels of reasoning do and not to give advice about which reasoner is most appropriate.

parsing text of a yes/no query

I am automating a process which asks questions (via SMS but shouldn't matter) to real people. The questions have yes/no answers, but the person might respond in a number of ways such as: sure, not at this time, yeah, never or in any other way that they might. I would like to attempt to parse this text and determine if it was a yes or no answer (of course it might not always be right).
I figured the ideas and concepts to do this might already exist as it seems like a common task for an AI, but don't know what it might be called so I can't find information on how I might implement it. So my questions is, have algorithms been developed to do this kind of parsing and if so where can I find more information on how to implement them?
This can be viewed as a binary (yes or no) classification task. You could write a rule-based model to classify or a statistics-based model.
A rule-based model would be like if answer in ["never", "not at this time", "nope"] then answer is "no". When spam filters first came out they contained a lot of rules like these.
A statistics-based model would probably be more suitable here, as writing your own rules gets tiresome and does not handle new cases as well.
For this you need to label a training dataset. After a little preprocessing (like lowercasing all the words, removing punctuation and maybe even a little stemming) you could get a dataset like
0 | never in a million years
0 | never
1 | yes sir
1 | yep
1 | yes yes yeah
0 | no way
Now you can run classification algorithms like Naive Bayes or Logistic Regression over this set (after you vectorize the words in either binary, which means is the word present or not, word count, which means the term frequency, or a tfidf float, which prevent bias to longer answers and common words) and learn which words more often belong to which class.
In the above example yes would be strongly correlated to a positive answer (1) and never would be strongly related to a negative answer (0). You could work with n-grams so a not no would be treated as a single token in favor of the positive class. This is called the bag-of-words approach.
To combat spelling errors you can add a spellchecker like Aspell to the pre-processing step. You could use a charvectorizer too, so a word like nno would be interpreted as nn and no and you catch errors like hellyes and you could trust your users to repeat spelling errors. If 5 users make the spelling error neve for the word never then the token neve will automatically start to count for the negative class (if labeled as such).
You could write these algorithms yourself (Naive Bayes is doable, Paul Graham has wrote a few accessible essays on how to classify spam with Bayes Theorem and nearly every ML library has a tutorial on how to do this) or make use of libraries or programs like Scikit-Learn (MultinomialNB, SGDclassifier, LinearSVC etc.) or Vowpal Wabbit (logistic regression, quantile loss etc.).
Im thinking on top of my head, if you get a response which you dont know if its yes / no, you can keep the answers in a DB like unknown_answers and 2 more tables as affirmative_answers / negative_answers, then in a little backend system, everytime you get a new unknown_answer you qualify them as yes or no, and there the system "learns" about it and with time, you will have a very big and good database of affirmative / negative answers.

How do I handle uncertainty/missing data in an Artifical Neural Network?

The context:
I'm experimenting with using a feed-forward artificial neural network to create AI for a video game, and I've run into the problem that some of my input features are dependent upon the existence or value of other input features.
The most basic, simplified example I can think of is this:
feature 1 is the number of players (range 2...5)
feature 2 to ? is the score of each player (range >=0)
The number of features needed to inform the ANN of the scores is dependent on the number of players.
The question: How can I represent this dynamic knowledge input to an ANN?
Things I've already considered:
Simply not using such features, or consolidating them into static input.
I.E using the sum of the players scores instead. I seriously doubt this is applicable to my problem, it would result in the loss of too much information and the ANN would fail to perform well.
Passing in an error value (eg -1) or default value (eg 0) for non-existant input
I'm not sure how well this would work, in theory the ANN could easily learn from this input and model the function appropriately. In practise I'm worried about the sheer number of non-existant input causing problems for the ANN. For example if the range of players was 2-10, if there were only 2 players, 80% of the input data would be non-existant and would introduce weird bias into the ANN resulting in a poor performance.
Passing in the mean value over the training set in place on non-existant input
Again, the amount of non-existant input would be a problem, and I'm worried this would introduce weird problems for discrete-valued inputs.
So, I'm asking this, does anybody have any other solutions I could think about? And is there a standard or commonly used method for handling this problem?
I know it's a rather niche and complicated question for SO, but I was getting bored of the "how do I fix this code?" and "how do I do this in PHP/Javascript?" questions :P, thanks guys.
It sounds like you have multiple data sets (for each number of players) that aren't really compatible with each other. Would lessons learned from a 5-player game really apply to a 2-player game? Try simplifying the problem, such as #1, and see how the program performs. In AI, absurd simplifications can sometimes give you a lot of traction, like bag of words in spam filters.
Try thinking about some model like the following:
Say xi (e.g. x1) is one of the inputs that a variable number of can exist. You can have n of these (x1 to xn). Let y be the rest of the inputs.
On your first hidden layer, pass x1 and y to the first c nodes, x1,x2 and y to the next c nodes, x1,x2,x3 and y to the next c nodes, and so on. This assumes x1 and x3 can't both be active without x2. The model will have to change appropriately if this needs to be possible.
The rest of the network is a standard feed-forward network with all nodes connected to all nodes of the next layer, or however you choose.
Whenever you have w active inputs, disable all but the wth set of c nodes (completely exclude them from training for that input set, don't include them when calculating the value for the nodes they output to, don't update the weights for their inputs or outputs). This will allow most of the network to train, but for the first hidden layer, only parts applicable to that number of inputs.
I suggest c is chosen such that c*n (the number of nodes in the first hidden layer) is greater than (or equal to) the number of nodes in the 2nd hidden layer (and have c be at the very least 10 for a moderately sized network (into the 100s is also fine)) and I also suggest the network have at least 2 other hidden layers (so 3 in total excluding input and output). This is not from experience, but just what my intuition tells me.
This working is dependent on a certain (possibly undefinable) similarity between the different numbers of inputs, and might not work well, if at all, if this similarity doesn't exist. This also probably requires quite a bit of training data for each number of inputs.
If you try it, let me / us know if it works.
If you're interested in Artificial Intelligence discussions, I suggest joining some Linked-In group dedicated to it, there are some that are quite active and have interesting discussions. There doesn't seem to be much happening on stackoverflow when it comes to Artificial Intelligence, or maybe we should just work to change that, or both.
UPDATE:
Here is a list of the names of a few decent Artificial Intelligence LinkedIn groups (unless they changed their policies recently, it should be easy enough to join):
'Artificial Intelligence Researchers, Faculty + Professionals'
'Artificial Intelligence Applications'
'Artificial Neural Networks'
'AGI — Artificial General Intelligence'
'Applied Artificial Intelligence' (not too much going on at the moment, and still dealing with some spam, but it is getting better)
'Text Analytics' (if you're interested in that)

Logic programming with integer or even floating point domains

I am reading a lot about logic programming - ASP (Answer Set Programming) is one example or this. They (logic programs) are usually in the form:
[Program 1]
Rule1: a <- a1, a2, ..., not am, am+1;
Rule2: ...
This set of rules is called the logic program and the s.c. model is the result of such computation - some kind of assignment of True/False values to each of a1, a2, ...
There is lot of research going on - e.g. how such kind of programs (rules) can be integrated with the (semantic web) ontologies to build knowledge bases that contain both - rules and ontologies (some kind of constraints/behaviour and data); there is lot of research about ASP itself - like parallel extensions, extensions for probabilistic logic, for temporal logic and so on.
My question is - is there some kind of research and maybe some proof-of-concept projects where this analysis is extended from Boolean variables to variables with integer and maybe even float domains? Currently I have not found any research that could address the following programs:
[Program 2]
Rule1 a1:=5 <- a2=5, a3=7, a4<8, ...
Rule2 ...
...
[the final assignment of values to a1, a2, etc., is the solution of this program]
Currently - as I understand - if one could like to perform some kind of analysis on Program-2 (e.g. to find if this program is correct in some sense - e.g. if it satisfies some properties, if it terminates, what domains are allowed not to violate some kind of properties and so on), then he or she must restate Program-2 in terms of Program-1 and then proceed in way which seems to be completely unexplored - to my knowledge (and I don't believe that is it unexplored, simply - I don't know some sources or trend). There is constraint logic programming that allow the use of statements with inequalities in Program-1, but it is too focused on Boolean variables as well. Actually - Programm-2 is of kind that can be fairly common in business rules systems, that was the cause of my interest in logic programming.
SO - my question has some history - my practical experience has led me to appreciate business rules systems/engines, especially - JBoss project Drools and it was my intention to do some kind of research of theory underlying s.c. production rules systems (I was and I am planning to do my thesis about them - if I would spot what can be done here), but I can say that there is little to do - after going over the literature (e.g. http://www.computer.org/csdl/trans/tk/2010/11/index.html was excellent IEEE TKDE special issues with some articles about them, one of them was writter by Drools leader) one can see that there is some kind of technical improvements of the decades old Rete algorithm but there is no theory of Drools or other production rule systems that could help with to do some formal analysis about them. So - the other question is - is there theory of production rule systems (for rule engines like Drools, Jess, CLIPS and so on) and is there practical need for such theory and what are the practical issues of using Drools and other systems that can be addressed by the theory of production rule systems.
p.s. I know - all these are questions that should be directed to thesis advisor, but my current position is that there is no (up to my knowledge) person in department where I am enrolled with who could fit to answer them, so - I am reading journals and also conference proceedings (there are nice conference series series of Lecture Notes in Computer Science - RuleML and RR)...
Thanks for any hint in advance!
In a sense the boolean systems already do what you suggest.
to ensure A=5 is part of your solution, consider the rules (I forget my ASP syntax so bear with me)
integer 1..100 //integers 1 to 100 exist
1{A(X) : integer(X)}1 //there is one A(X) that is true, where X is an integer
A(5) //A(5) is true
and I think your clause would require:
integer 1..100 //integers 1 to 100 exist
1{A(X) : integer(X)}1 //A1 can take only one value and must take a value
1{B(X) : integer(X)}1 //A2 ``
1{C(X) : integer(X)}1 //A3 ``
1{D(X) : integer(X)}1 //A4 ``
A(5) :- B(5), C(7), D(8) //A2=5, A3=7, A4=8 ==> A1=5
I hope I've understood the question correctly.
Recent versions of Clojure core.logic (since 0.8) include exactly this kind of support, based on cKanren
See an example here: https://gist.github.com/4229449

Resources