library(samsort) in SICStus Prolog - sicstus-prolog

In SICStus Prolog there's library(samsort),
a library for generic sorting.
The library exports the predicates samsort/2, samsort/3, samkeysort/2, and others.
I can see the use of these predicates, but I'm somewhat puzzled by the name prefix sam.
What is "sam"? Is it some abbreviation, initialism, or acronym?
What does "sam" mean? Is there a story/history behind the name "sam"?

This is Smooth Applicative Merge sort.
Originally from the DEC-10 Prolog library, I think.
It was described in a technical report, which I cannot find, unfortunately:
O’Keefe, R.: A smooth applicative merge sort. Tech. rep., Department of Artificial Intelligence, University of Edinburgh (1982)

Related

Horst (pD*) compared to OWL2-RL

We are using GraphDB 8.4.0 as a triple store for a large data integration project. We want to make use of the reasoning capabilities, and are trying to decide between using HORST (pD*) and OWL2-RL.
However, the literature describing these is quite dense. I think I understand that OWL2-RL is more "powerful", but I am unsure how so. Can anyone provide some examples of the differences between the two reasoners? What kinds of statements are inferred by OWL2-RL, but not HORST (and vice versa)?
Brill, inside there GraphDB there is a single rule engine, which supports different reasoning profiles, depending on the rule-set which was selected. The predefined rule-sets are part of the distribution - look at the PIE files in folder configs/rules. One can also take one of the existing profiles and tailor it to her needs (e.g. remove a rule, which is not necessary).
The fundamental difference between OWL2 RL and what we call OWL-Horst (pD*) is that OWL2RL pushes the limits of which OWL constructs can be supported using this type of entailment rules. OWL Horst is limited to RDFS (subClassOf, subSpropertyOf, domain and range) plus what was popular in the past as OWL Lite: sameAs, equivalentClass, equivalentProperty, SymmetricProperty, TransitiveProperty, inverseOf, FunctionalProperty, InverseFunctionalProperty. There is also partial support for: intersectionOf, someValuesFrom, hasValue, allValuesFrom.
What OWL 2 RL adds on top is support for AsymmetricProperty, IrreflexiveProperty, propertyChainAxiom, AllDisjointProperties, hasKey, unionOf, complementOf, oneOf, differentFrom, AllDisjointClasses and all the property cardinality primitives. It also adds more complete support for intersectionOf, someValuesFrom, hasValue, allValuesFrom. Be aware that there are limitations to the inference supported by OWL 2 RL for some of these properties, e.g. what type of inferences should or should not be done for specific class expressions (OWL restrictions). If you chose OWL 2 RL, check Tables 5-8 in the spec, https://www.w3.org/TR/owl2-profiles/#OWL_2_RL. GraphDB's owl-2-rl data set is fully compliant with it. GraphDB is the only major triplestore with full OWL 2 RL compliance - see the this table (https://www.w3.org/2001/sw/wiki/OWL/Implementations) it appears with its former name OWLIM.
My suggestion would be to go with OWL Horst for a large dataset, as reasoning with OWL 2 RL could be much slower. It depends on your ontology and data patterns, but as a rule of thumb you can expect loading/updates to be 2 times slower with OWL 2 RL, even if you don't use extensively its "expensive" primitives (e.g. property chains). See the difference between loading speeds with RDFS+ and OWL 2 RL benchmarked here: http://graphdb.ontotext.com/documentation/standard/benchmark.html
Finally, I would recommend you to use the "optimized" versions of the pre-defined rule-sets. These versions exclude some RDFS reasoning rules, which are not useful for most of the applications, but add substantial reasoning overheads, e.g. the one that infers that the subject, the predicate and the object of a statement are instances of rdfs:Resource
Id: rdf1_rdfs4a_4b
x a y
-------------------------------
x <rdf:type> <rdfs:Resource>
a <rdf:type> <rdfs:Resource>
y <rdf:type> <rdfs:Resource>
If you want to stay 100% compliant with the W3C spec, you should stay with the non-optimized versions.
If you need further assistance, please, write to support#ontotext.com
In addition to what Atanas (our CEO) wrote and your excellent example, http://graphdb.ontotext.com/documentation/enterprise/rules-optimisations.html provides some ideas how to optimize your rulesets to make them faster.
Two of the ideas are:
ptop:transitiveOver is faster than owl:TransitiveProperty: quadratic vs cubic complexity over the length of transitive chains
ptop:PropChain (a 2-place chain) is faster than general owl:propertyChainAxiom (n-place chain) because it doesn't need to unroll the rdf:List underlying the representation of owl:propertyChainAxiom.
Under some conditions you can translate the standard OWL constructs to these custom constructs, to have both standards compliance and faster speed:
use rule TransitiveUsingStep; if every TransitiveProperty p (eg skos:broaderTransitive) is defined over a step property s (eg skos:broader) and you don't insert p directly
if you use only 2-step owl:propertyChainAxiom then translate them to custom using the following rule, and infer using rule ptop_PropChain:
Id: ptop_PropChain_from_propertyChainAxiom
q <owl:propertyChainAxiom> l1
l1 <rdf:first> p1
l1 <rdf:rest> l2
l2 <rdf:first> p2
l2 <rdf:rest> <rdf:nil>
----------------------
t <ptop:premise1> p1
t <ptop:premise2> p2
t <ptop:conclusion> q
t <rdf:type> <ptop:PropChain>
http://rawgit2.com/VladimirAlexiev/my/master/pubs/extending-owl2/index.html describes further ideas for extended property constructs, and has illustrations.
Let us know if we can help further.
After thinking this for bit, I came up with a concrete example. The Oral Health and Disease Ontology (http://purl.obolibrary.org/obo/ohd.owl) contains three interrelated entities:
a restored tooth
a restored tooth surface that is part of the restored tooth
a tooth restoration procedure that creates the restored tooth surface (e.g., when you have a filling placed in your tooth)
The axioms that define these entities are (using pseudo Manchester syntax):
restored tooth equivalent to Tooth and has part some dental restoration material
restored tooth surface subclass of part of restored tooth
filling procedure has specified output some restored tooth surface
The has specified output relation is a subproperty of the has participant relation, and the has participant relation contains the property chain:
has_specified_input o 'is part of'
The reason for this property chain is for reasoner to infer that if a tooth surface participates in a procedure, then the whole tooth that the surface is part of participates in the procedure, and, furthermore, the patient that the tooth is part of also participates in the procedure (due to the transitivity of part of).
As a concrete example, let define some individuals (using pseudo rdf):
:patient#1 a :patient .
:tooth#1 a :tooth; :part-of :patient#1
:restored-occlusal#1 a :restored-occlusal-surface; :part-of tooth#1 .
:procedure#1 :has-specified-output :restored-occlusal#1 .
Suppose you want to query for all restored teeth:
select ?tooth where {?tooth a :restored-tooth}
RDFS-Plus reasoning will not find any restored teeth b/c it doesn't reason over equivalent classes. But, OWL-Horst and OWL2-RL will find such teeth.
Now, suppose you want to find all patients that underwent (i.e. participated in) a tooth restoration procedure:
select ?patient where {
?patient a patient: .
?proc a tooth_restoration_procedure:; has_participant: ?patient .
}
Again, RDFS-Plus will not find any patients, and neither will OWL-Horst b/c this inference requires reasoning over the has participant property chain. OWL2-RL is needed in order to make this inference.
I hope this example is helpful for the interested reader :).
I welcome comments to improve the example. Please keep any comments within the scope of trying to make the example clearer. Its purpose is to give insight into what these different levels of reasoning do and not to give advice about which reasoner is most appropriate.

Best way to represent part-of (mereological) transitivity for OWL classes?

I have a background in frame-based ontologies, in which classes represent concepts and there's no restriction against assertion of class:class relationships.
I am now working with an OWL2 ontology and trying to determine the best/recommended way to represent "canonical part-of" relationships - conceptually, these are relationships that are true, by definition, of the things represented by each class (i.e., all instances). The part-of relationship is transitive, and I want to take advantage of that so that I'd be able to query the ontology for "all parts of (a canonical) X".
For example, I might want to represent:
"engine" is a part of "car", and
"piston" is a part of "engine"
and then query transitively, using SPARQL, for parts of cars, getting back both engine and piston. Note that I want to be able to represent individual cars later (and be able to deduce their parts by reference to their rdf:type), and of course I want to be able to represent sub-classes of cars as well, so I can't model the above-described classes as individuals - they must be classes.
It seems that I have 3 options using OWL, none ideal. Is one of these recommended (or are any discouraged), and am I missing any?
OWL restriction axioms:
rdfs:subClassOf(engine, someValuesFrom(partOf, car))
rdfs:subClassOf(piston, someValuesFrom(partOf, engine))
The major drawback of the above is that there's no way in SPARQL to query transitively over the partOf relationship, since it's embedded in an OWL restriction. I would need some kind of generalized recursion feature in SPARQL - or I would need the following rule, which is not part of any standard OWL profile as far as I can tell:
antecedent (body):
subClassOf(B, (P some A) ^
subClassOf(C, (P some B) ^
transitiveProperty(P)
consequent (head):
subClassOf(C, (P some A))
OWL2 punning: I could effectively represent the partOf relationships on canonical instances of the classes, by asserting the object-property directly on the classes. I'm not sure that that'd work transparently with SPARQL though, since the partOf relationships would be asserted on instances (via punning) and any subClassOf relationships would be asserted on classes. So if I had, for example, a subclass six_cylinder_engine, the following SPARQL snippet would not bind six_cylinder_engine:
?part (rdfs:subClassOf*/partOf*)+ car
Annotation property: I could assert partOf as an annotation property on the classes, with values that are also classes. I think that would work (minus transitivity, but I could recover that easily enough with SPARQL paths as in the query above), but it seems like an abuse of the intended use of annotation properties.
I think you have performed a good analysis of the problem and the advantages/disadvantages of different approaches. I don't know if any one is discouraged or encouraged. IMHO this problem has not received sufficient attention, and is a bigger problem in some domains than others (I work in bio-ontologies which frequently use partonomies, and hence this is very important).
For 1, your rule is valid and justified by OWL semantics. There are other ways to implement this using OWL reasoners, as well as RDF-level reasoners. For example, using the ROBOT command line wrapper to the OWLAPI, you can run the reason command using an Expression Materializing Reasoner. E.g
robot reason --exclude-tautologies true --include-indirect true -r emr -i engine.owl -o engine-reasoned.owl
This will give you an axiom piston subClassOf partOf some car that can be queried using a non-transitive SPARQL query.
The --exclude-tautologies removes inferences to owl:Thing, and --include-indirect will include transitive inferences.
For your option 2, you have to be careful in that you may introduce incorrect inferences. For example, assume there are some engines without pistons, i.e. engine SubClassOf inverse(part_of) some piston does not hold. However, in your punned shadow world, this would be entailed. This may or may not be a problem depending on your use case.
A variant of your 2 is to introduce different mapping rules for layering OWL T-Tboxes onto RDF, such as described in my OWLStar proposal. With this proposal, existentials would be mapped to direct triples, but there is another mechanism (e.g. reification) to indicate the intended quantification. This allows writing rules that are both safe (no undesired inferences) and complete (for anything expressible in OWL-RL). Here there is no punning (under the alternative RDF to OWL interpretation). You can also use the exact same transitive SPARQL query you wrote to get the desired results.

How does solr choose labels when using STC algorithm

I am currently trying to use Solr to do clustering. I am using the STC algorithm. However, I do not know how the labels of clusters are generated. I know that the labels of the nodes in the suffix tree are used, but in what way? What suffix(terms) will be chosen? Thank you.
STC is the implementation of Oren Zamir's Suffix Tree Clustering algorithm. For an in-depth description of the algorithm, take a look at Zamir's PhD dissertation.

Logic programming with integer or even floating point domains

I am reading a lot about logic programming - ASP (Answer Set Programming) is one example or this. They (logic programs) are usually in the form:
[Program 1]
Rule1: a <- a1, a2, ..., not am, am+1;
Rule2: ...
This set of rules is called the logic program and the s.c. model is the result of such computation - some kind of assignment of True/False values to each of a1, a2, ...
There is lot of research going on - e.g. how such kind of programs (rules) can be integrated with the (semantic web) ontologies to build knowledge bases that contain both - rules and ontologies (some kind of constraints/behaviour and data); there is lot of research about ASP itself - like parallel extensions, extensions for probabilistic logic, for temporal logic and so on.
My question is - is there some kind of research and maybe some proof-of-concept projects where this analysis is extended from Boolean variables to variables with integer and maybe even float domains? Currently I have not found any research that could address the following programs:
[Program 2]
Rule1 a1:=5 <- a2=5, a3=7, a4<8, ...
Rule2 ...
...
[the final assignment of values to a1, a2, etc., is the solution of this program]
Currently - as I understand - if one could like to perform some kind of analysis on Program-2 (e.g. to find if this program is correct in some sense - e.g. if it satisfies some properties, if it terminates, what domains are allowed not to violate some kind of properties and so on), then he or she must restate Program-2 in terms of Program-1 and then proceed in way which seems to be completely unexplored - to my knowledge (and I don't believe that is it unexplored, simply - I don't know some sources or trend). There is constraint logic programming that allow the use of statements with inequalities in Program-1, but it is too focused on Boolean variables as well. Actually - Programm-2 is of kind that can be fairly common in business rules systems, that was the cause of my interest in logic programming.
SO - my question has some history - my practical experience has led me to appreciate business rules systems/engines, especially - JBoss project Drools and it was my intention to do some kind of research of theory underlying s.c. production rules systems (I was and I am planning to do my thesis about them - if I would spot what can be done here), but I can say that there is little to do - after going over the literature (e.g. http://www.computer.org/csdl/trans/tk/2010/11/index.html was excellent IEEE TKDE special issues with some articles about them, one of them was writter by Drools leader) one can see that there is some kind of technical improvements of the decades old Rete algorithm but there is no theory of Drools or other production rule systems that could help with to do some formal analysis about them. So - the other question is - is there theory of production rule systems (for rule engines like Drools, Jess, CLIPS and so on) and is there practical need for such theory and what are the practical issues of using Drools and other systems that can be addressed by the theory of production rule systems.
p.s. I know - all these are questions that should be directed to thesis advisor, but my current position is that there is no (up to my knowledge) person in department where I am enrolled with who could fit to answer them, so - I am reading journals and also conference proceedings (there are nice conference series series of Lecture Notes in Computer Science - RuleML and RR)...
Thanks for any hint in advance!
In a sense the boolean systems already do what you suggest.
to ensure A=5 is part of your solution, consider the rules (I forget my ASP syntax so bear with me)
integer 1..100 //integers 1 to 100 exist
1{A(X) : integer(X)}1 //there is one A(X) that is true, where X is an integer
A(5) //A(5) is true
and I think your clause would require:
integer 1..100 //integers 1 to 100 exist
1{A(X) : integer(X)}1 //A1 can take only one value and must take a value
1{B(X) : integer(X)}1 //A2 ``
1{C(X) : integer(X)}1 //A3 ``
1{D(X) : integer(X)}1 //A4 ``
A(5) :- B(5), C(7), D(8) //A2=5, A3=7, A4=8 ==> A1=5
I hope I've understood the question correctly.
Recent versions of Clojure core.logic (since 0.8) include exactly this kind of support, based on cKanren
See an example here: https://gist.github.com/4229449

How to determine subject, object and other words?

I'm trying to implement application that can determine meaning of sentence, by dividing it to smaller pieces. So I need to know what words are subject, object etc. so that my program can know how to handle this sentence.
This is an open research problem. You can get an overview on Wikipedia, http://en.wikipedia.org/wiki/Natural_language_processing. Consider phrases like "Time flies like an arrow, fruit flies like a banana" - unambiguously classifying words is not easy.
You should look at the Natural Language Toolkit, which is for exactly this sort of thing.
See this section of the manual: Categorizing and Tagging Words - here's an extract:
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
"Here we see that and is CC, a coordinating conjunction; now and completely are RB, or adverbs; for is IN, a preposition; something is NN, a noun; and different is JJ, an adjective."
I guess there is not "simple" way to do this. You have to build a linguistic analyzer (which is quite possible), however, a language as a lot of exceptional cases. And that is what makes implementing a linguistic analyzer that hard.
The specific problem you mention, the identification of the subject and objects of a clause, is accomplished by syntactic parsing. You can get a good idea of how parsing works by using this demo of parsing software developed by Stanford University.
However, syntactic parsing does not determine the meanining of a sentence, only its structure. Determining meaning (semantics) is a very hard problem in general and there is no technology that can really 'understand' a sentence in the same way that a human would. Although there is no general solution, you may be able to do something in a very restricted subject domain. For example, is the data you want to analyse about a narrow topic with a limited set of 'things' that people talk about?
StompChicken has given the right answer to this question, but I'd like to add that the concepts of subject and object are known as grammatical relations, and that Briscoe and Carroll's RASP is a parser that can go the extra step of deducing a list of relations from the parse.
Here's some example output from their demo page. It's an extract from the output for a sentence that begins "We describe a robust accurate domain-independent approach...":
(|ncsubj| |describe:2_VV0| |We:1_PPIS2| _)
(|dobj| |describe:2_VV0| |approach:7_NN1|)

Resources