Defining a class restriction using relationship between two data properties in Protege - owl

I am working on building a simple software ontology in Protege v5.2 and I am trying to classify pieces of software (using a reasoner plugin) as CPU intensive if their CPU time is larger than 80% of their physical execution time.
For this reason, each individual has the following data properties filled with float values:
a) hasCPUTime
b) hasPhysicalExecutionTime
I have created a class CPUIntensive and I want to add a restriction that individuals which have hasCPUTime > 0.8 * hasPhysicalExecutionTime belong in this class.
Can this be done in Protege?

OWL 2 doesn't allow arithmetic computations, e.g. multiplication (though some kind of comparison is possible using data ranges).
You need SWRL with builtins:
hasCPUTime (?ind, ?cpu) ^
hasPhysicalTime (?ind, ?phy) ^
swrlb:greaterThan (?cpu, ?mul) ^
swrlb:multiply (?mul, 0.8, ?phy)
-> CPUIntensive(?ind)
The swrlb:multiply builtin is satisfied iff the first argument is equal to the arithmetic product of the second argument through the last argument, and if the first argument is unbound, binds it to the arithmetic product of them, much like Mul is 0.8*Phy. works in Prolog.
Pellet does support those builtins:

Related

Horst (pD*) compared to OWL2-RL

We are using GraphDB 8.4.0 as a triple store for a large data integration project. We want to make use of the reasoning capabilities, and are trying to decide between using HORST (pD*) and OWL2-RL.
However, the literature describing these is quite dense. I think I understand that OWL2-RL is more "powerful", but I am unsure how so. Can anyone provide some examples of the differences between the two reasoners? What kinds of statements are inferred by OWL2-RL, but not HORST (and vice versa)?
Brill, inside there GraphDB there is a single rule engine, which supports different reasoning profiles, depending on the rule-set which was selected. The predefined rule-sets are part of the distribution - look at the PIE files in folder configs/rules. One can also take one of the existing profiles and tailor it to her needs (e.g. remove a rule, which is not necessary).
The fundamental difference between OWL2 RL and what we call OWL-Horst (pD*) is that OWL2RL pushes the limits of which OWL constructs can be supported using this type of entailment rules. OWL Horst is limited to RDFS (subClassOf, subSpropertyOf, domain and range) plus what was popular in the past as OWL Lite: sameAs, equivalentClass, equivalentProperty, SymmetricProperty, TransitiveProperty, inverseOf, FunctionalProperty, InverseFunctionalProperty. There is also partial support for: intersectionOf, someValuesFrom, hasValue, allValuesFrom.
What OWL 2 RL adds on top is support for AsymmetricProperty, IrreflexiveProperty, propertyChainAxiom, AllDisjointProperties, hasKey, unionOf, complementOf, oneOf, differentFrom, AllDisjointClasses and all the property cardinality primitives. It also adds more complete support for intersectionOf, someValuesFrom, hasValue, allValuesFrom. Be aware that there are limitations to the inference supported by OWL 2 RL for some of these properties, e.g. what type of inferences should or should not be done for specific class expressions (OWL restrictions). If you chose OWL 2 RL, check Tables 5-8 in the spec, https://www.w3.org/TR/owl2-profiles/#OWL_2_RL. GraphDB's owl-2-rl data set is fully compliant with it. GraphDB is the only major triplestore with full OWL 2 RL compliance - see the this table (https://www.w3.org/2001/sw/wiki/OWL/Implementations) it appears with its former name OWLIM.
My suggestion would be to go with OWL Horst for a large dataset, as reasoning with OWL 2 RL could be much slower. It depends on your ontology and data patterns, but as a rule of thumb you can expect loading/updates to be 2 times slower with OWL 2 RL, even if you don't use extensively its "expensive" primitives (e.g. property chains). See the difference between loading speeds with RDFS+ and OWL 2 RL benchmarked here: http://graphdb.ontotext.com/documentation/standard/benchmark.html
Finally, I would recommend you to use the "optimized" versions of the pre-defined rule-sets. These versions exclude some RDFS reasoning rules, which are not useful for most of the applications, but add substantial reasoning overheads, e.g. the one that infers that the subject, the predicate and the object of a statement are instances of rdfs:Resource
Id: rdf1_rdfs4a_4b
x a y
-------------------------------
x <rdf:type> <rdfs:Resource>
a <rdf:type> <rdfs:Resource>
y <rdf:type> <rdfs:Resource>
If you want to stay 100% compliant with the W3C spec, you should stay with the non-optimized versions.
If you need further assistance, please, write to support#ontotext.com
In addition to what Atanas (our CEO) wrote and your excellent example, http://graphdb.ontotext.com/documentation/enterprise/rules-optimisations.html provides some ideas how to optimize your rulesets to make them faster.
Two of the ideas are:
ptop:transitiveOver is faster than owl:TransitiveProperty: quadratic vs cubic complexity over the length of transitive chains
ptop:PropChain (a 2-place chain) is faster than general owl:propertyChainAxiom (n-place chain) because it doesn't need to unroll the rdf:List underlying the representation of owl:propertyChainAxiom.
Under some conditions you can translate the standard OWL constructs to these custom constructs, to have both standards compliance and faster speed:
use rule TransitiveUsingStep; if every TransitiveProperty p (eg skos:broaderTransitive) is defined over a step property s (eg skos:broader) and you don't insert p directly
if you use only 2-step owl:propertyChainAxiom then translate them to custom using the following rule, and infer using rule ptop_PropChain:
Id: ptop_PropChain_from_propertyChainAxiom
q <owl:propertyChainAxiom> l1
l1 <rdf:first> p1
l1 <rdf:rest> l2
l2 <rdf:first> p2
l2 <rdf:rest> <rdf:nil>
----------------------
t <ptop:premise1> p1
t <ptop:premise2> p2
t <ptop:conclusion> q
t <rdf:type> <ptop:PropChain>
http://rawgit2.com/VladimirAlexiev/my/master/pubs/extending-owl2/index.html describes further ideas for extended property constructs, and has illustrations.
Let us know if we can help further.
After thinking this for bit, I came up with a concrete example. The Oral Health and Disease Ontology (http://purl.obolibrary.org/obo/ohd.owl) contains three interrelated entities:
a restored tooth
a restored tooth surface that is part of the restored tooth
a tooth restoration procedure that creates the restored tooth surface (e.g., when you have a filling placed in your tooth)
The axioms that define these entities are (using pseudo Manchester syntax):
restored tooth equivalent to Tooth and has part some dental restoration material
restored tooth surface subclass of part of restored tooth
filling procedure has specified output some restored tooth surface
The has specified output relation is a subproperty of the has participant relation, and the has participant relation contains the property chain:
has_specified_input o 'is part of'
The reason for this property chain is for reasoner to infer that if a tooth surface participates in a procedure, then the whole tooth that the surface is part of participates in the procedure, and, furthermore, the patient that the tooth is part of also participates in the procedure (due to the transitivity of part of).
As a concrete example, let define some individuals (using pseudo rdf):
:patient#1 a :patient .
:tooth#1 a :tooth; :part-of :patient#1
:restored-occlusal#1 a :restored-occlusal-surface; :part-of tooth#1 .
:procedure#1 :has-specified-output :restored-occlusal#1 .
Suppose you want to query for all restored teeth:
select ?tooth where {?tooth a :restored-tooth}
RDFS-Plus reasoning will not find any restored teeth b/c it doesn't reason over equivalent classes. But, OWL-Horst and OWL2-RL will find such teeth.
Now, suppose you want to find all patients that underwent (i.e. participated in) a tooth restoration procedure:
select ?patient where {
?patient a patient: .
?proc a tooth_restoration_procedure:; has_participant: ?patient .
}
Again, RDFS-Plus will not find any patients, and neither will OWL-Horst b/c this inference requires reasoning over the has participant property chain. OWL2-RL is needed in order to make this inference.
I hope this example is helpful for the interested reader :).
I welcome comments to improve the example. Please keep any comments within the scope of trying to make the example clearer. Its purpose is to give insight into what these different levels of reasoning do and not to give advice about which reasoner is most appropriate.

Best way to represent part-of (mereological) transitivity for OWL classes?

I have a background in frame-based ontologies, in which classes represent concepts and there's no restriction against assertion of class:class relationships.
I am now working with an OWL2 ontology and trying to determine the best/recommended way to represent "canonical part-of" relationships - conceptually, these are relationships that are true, by definition, of the things represented by each class (i.e., all instances). The part-of relationship is transitive, and I want to take advantage of that so that I'd be able to query the ontology for "all parts of (a canonical) X".
For example, I might want to represent:
"engine" is a part of "car", and
"piston" is a part of "engine"
and then query transitively, using SPARQL, for parts of cars, getting back both engine and piston. Note that I want to be able to represent individual cars later (and be able to deduce their parts by reference to their rdf:type), and of course I want to be able to represent sub-classes of cars as well, so I can't model the above-described classes as individuals - they must be classes.
It seems that I have 3 options using OWL, none ideal. Is one of these recommended (or are any discouraged), and am I missing any?
OWL restriction axioms:
rdfs:subClassOf(engine, someValuesFrom(partOf, car))
rdfs:subClassOf(piston, someValuesFrom(partOf, engine))
The major drawback of the above is that there's no way in SPARQL to query transitively over the partOf relationship, since it's embedded in an OWL restriction. I would need some kind of generalized recursion feature in SPARQL - or I would need the following rule, which is not part of any standard OWL profile as far as I can tell:
antecedent (body):
subClassOf(B, (P some A) ^
subClassOf(C, (P some B) ^
transitiveProperty(P)
consequent (head):
subClassOf(C, (P some A))
OWL2 punning: I could effectively represent the partOf relationships on canonical instances of the classes, by asserting the object-property directly on the classes. I'm not sure that that'd work transparently with SPARQL though, since the partOf relationships would be asserted on instances (via punning) and any subClassOf relationships would be asserted on classes. So if I had, for example, a subclass six_cylinder_engine, the following SPARQL snippet would not bind six_cylinder_engine:
?part (rdfs:subClassOf*/partOf*)+ car
Annotation property: I could assert partOf as an annotation property on the classes, with values that are also classes. I think that would work (minus transitivity, but I could recover that easily enough with SPARQL paths as in the query above), but it seems like an abuse of the intended use of annotation properties.
I think you have performed a good analysis of the problem and the advantages/disadvantages of different approaches. I don't know if any one is discouraged or encouraged. IMHO this problem has not received sufficient attention, and is a bigger problem in some domains than others (I work in bio-ontologies which frequently use partonomies, and hence this is very important).
For 1, your rule is valid and justified by OWL semantics. There are other ways to implement this using OWL reasoners, as well as RDF-level reasoners. For example, using the ROBOT command line wrapper to the OWLAPI, you can run the reason command using an Expression Materializing Reasoner. E.g
robot reason --exclude-tautologies true --include-indirect true -r emr -i engine.owl -o engine-reasoned.owl
This will give you an axiom piston subClassOf partOf some car that can be queried using a non-transitive SPARQL query.
The --exclude-tautologies removes inferences to owl:Thing, and --include-indirect will include transitive inferences.
For your option 2, you have to be careful in that you may introduce incorrect inferences. For example, assume there are some engines without pistons, i.e. engine SubClassOf inverse(part_of) some piston does not hold. However, in your punned shadow world, this would be entailed. This may or may not be a problem depending on your use case.
A variant of your 2 is to introduce different mapping rules for layering OWL T-Tboxes onto RDF, such as described in my OWLStar proposal. With this proposal, existentials would be mapped to direct triples, but there is another mechanism (e.g. reification) to indicate the intended quantification. This allows writing rules that are both safe (no undesired inferences) and complete (for anything expressible in OWL-RL). Here there is no punning (under the alternative RDF to OWL interpretation). You can also use the exact same transitive SPARQL query you wrote to get the desired results.

How to efficiently compute disjointness with OWLAPI?

I'm profiling my application that is OWLAPI based and the only bottleneck I found was about computing disjointness.
I have to check if each class is disjoint from other classes and, if this is asserted or inferred.
It seems to be heavy to compute, because unlike for the equivalence which is based on the Node data structure (and it is efficient to retrieve data), the disjointness is based on the NodeSet in this way I'm forced to perform more loops.
This is the procedure I use:
private void computeDisjointness(OWLClass clazz) {
NodeSet<OWLClass> disjointSetsFromCls = reasoner.getDisjointClasses(clazz);
for (Node<OWLClass> singleDisjoinSet : disjointSetsFromCls) {
for (OWLClass item : singleDisjoinSet) {
for (OWLDisjointClassesAxiom disjAxiom : ontology.getAxioms(AxiomType.DISJOINT_CLASSES)) {
if(disjAxiom.containsEntityInSignature(item))
{
//asserted
}
else
{
//derived
}
}
}
}
As you can see, the bottleneck is given by the 3 for loops that slow down the application; moreover, the procedure computeDisjointness is executed for each class of the ontology.
Is there a more efficient way to get the disjointness and check if the axioms are asserted or derived?
One simple optimization is to move ontology.getAxioms(AxiomType.DISJOINT_CLASSES) to the calling method, then pass it in as a parameter. This method returns a new set on each call, with the same contents every time, since you're not modifying the ontology. So if you have N classes you are creating at least N identical sets; more if many classes are actually disjoint.
Optimization number two: check the size of the disjoint node set. Size 1 means no disjoints, so you can skip the rest of the method.
Optimization 3: keep track of the classes you've already visited. E.g., if you have
A disjointWith B
your code will be called on A and cycle over A and B, then be called on B and repeat the computation.
Keep a set of visited classes, to which you add all elements in the disjoint node set, and when it's B turn you'll be able to skip the reasoner call as well. Speaking of which, I would assume the reasoner call is actually the most expensive call in this method. Do you have profiling data that says otherwise?
Optimization 4: I'm not convinced this code reliably tells you which disjoint axioms are inferred and which ones are asserted. You could have:
A disjointWith B
B disjointWith C
The reasoner would return {A, B, C} in response to asking for disjoints of A. You would find all three elements in the signature of a disjoint axiom, and find out that the reasoner has done no inferences. But the axioms in input are not the same as the axioms in output (many reasoners would in fact run absorption on the input axioms and transform them to an internal representation that is an axiom with three operands).
So, my definition of inferred and asserted would be that the set of nodes returned by the reasoner is the same as the set of operands of one disjoint axiom. To verify this condition, I would take all disjoint axioms, extract the set of operands and keep those sets in a set of sets. Then,
for each class (and keeping in place the optimization to visit each class only once),
obtain the set of disjoint nodes (optimization here to not bother if there's only one class in the set),
transform it to a set of entities,
and check whether the set of sets created earlier contains this new set.
If yes, this is an asserted axiom; if not, it's inferred.

Viola - jones adaboost method

I have read the paper about Viola-Jones method for object detection and am confused by a few things.
1 - For Adaboost does each round mean that we calculate all the 160k features across all images then find the one with the least error (which as i understand is a 'weak classifier? please correct me if I am wrong). If yes then won't this take extremely long to train on a large set of images which can take months ? And also how many rounds will you run it for if this is correct.
2- If the above point is wrong then does it mean that for each feature, we evaluate all the non-face and face images with one feature and compare it to a certain acceptable threshold of error, and if it lies below this acceptable threshold then we take this feature as a weak classifier and then update the weights before using the next feature from the 160k features.
I have tried understanding the MATLAB code in this link http://www.ece301.com/ml-doc/54-face-detect-matlab-1.html but not sure if the way he implemented adaboost was correct.
It would also be a great help if there was a link that would explain adaboost used by Viola-Jones in a simple and clear way.
AdaBoost tries out multiple weak classifier over several rounds , Selecting the best weak classifier in each round and combining the best classifier to create a strong classifier.
Example for adaboost :
Data point Classifier 1 Classifier 2 Classifier 3
x1 Fail pass fail
x2 Pass fail pass
x3 fail pass pass
x4 pass fail pass
AdaBoost can use classifier that are consistently wrong by reversing their decision .
1) Yes. If your parameters are, say, 5000 positive windows and 5000 negative ones (extracted from the set of negative background images you provided initially), then at each stage every feature is evaluated in turn for all 10000 windows and the best one is added to the feature set.
And yes in the original paper of Viola-Jones each feature is a weak classifier.
The process of checking all features will take a long time, but not weeks. As a matter of fact the bottle neck is the gathering of negative widows over the last stages than could require days or also weeks The number of rounds depends on the reaching of the given conditions. Each stage requires a number of round (usually more in final stages). So for example: stage 1 requires 3 features (3 rounds), stage 2 requires 5 features (5 rounds), stage 3 requires 6 features (6 rounds), etc.
2) The above point is true so it doesn’t apply.

ai aggregate association rules

If I extract certain association rules from a sample itemset consisting of let's say:
a, b -> c
c, d -> e
a, c -> d
b, c -> c
Is there a way to combine the found rules into one formula depending on a fixed item count number were all rules are aggregated to get the most likely combination of all association rules combined?
Let's say the fixed item number is four and the above association rules have to be mixed to get the most likely combination. How would I do that? Are there algorithms or programmes for this?
Each association rule has a confidence and support.
For example A --> BC support : 50 % confidence : 50%.
If you combine several association rules, then how you would calculate the support and confidence of the resulting rule ? That would be a problem.
Actually, you could look at CBA: Classification by Associations. This project use association rules to perform classification. Instead of trying to combine association rules, it uses some heuristics to select the rule that is the most appropriate for classifying a new instance. To choose the best rule, it considers the support, the confidence and the size of the left par of the rule. There is other similar works.
But I have not seen any work trying to combine association rules... maybe if you search "association rule clustering" in Google you could find something related to your idea.
By the way, besides confidence and support, some people use other interestingness measures like the lift, J-Measure, etc.

Resources