BlackBoxExplanation or GlassBoxExplanation, which one should I use? - owl-api

I'm writing a program that, given an OWL ontology, retrieves all the explanations for a query by using Pellet as reasoner.
To do that the OWLAPI provides a class named HSTExplanationGenerator that implements the Hitting Set Tree algorithm to find all the explanations.
When I want to create an instance of HSTExplanationGenerator I should give a class that implements the interface TransactionAwareSingleExpGen, a class that implements this interface should provide a method to compute an explanation.
Now, OWLAPI provides two classes which implement this interface: BlackBoxExplanation and GlassBoxExplanation. I have read the code of the two classes. GlassBoxExplanation gets the explanation from Pellet, prune it and then converts it into a set of OWLAxiom. However, I found it hard to understand what BlackBoxExplanation does. The questions are: which one should I use? Which are the main differences between these two classes?

GlassBoxExplanation is, as far as I can tell, provided by Pellet, not OWLAPI.
The main difference between a black box explanation and a glass box explanation is that the black box explanation cannot know the reasoner's internals - it is limited to what is available through the OWLReasoner interface. In this respect, the definition is no different from black box testing and white box testing in software engineering.
That said, you might want to use the owlexplanation project instead. It is based on laconic explanations, which are a more recent development in OWL entailment explanation than what is available in both OWLAPI and (old versions of) Pellet.
https://github.com/matthewhorridge/owlexplanation

Related

Can I use OWL API to enforce specific subject-predicate-object relationships?

I am working on a project using RDF data and I am thinking about implementing a data cleanup method which will run against an RDF triples dataset and flag triples which do not match a certain pattern, based on a custom ontology.
For example, I would like to enforce that class http://myontology/A must denote http://myontology/Busing the predicate http://myontology/denotes. Any instance of Class A which does not denote an instance of Class B should be flagged.
I am wondering if a tool such as the OWLReasoner from OWL-API would have the capability to accomplish something like this, if I designed a custom axiom for the Reasoner. I have reviewed the documentation here: http://owlcs.github.io/owlapi/apidocs_4/org/semanticweb/owlapi/reasoner/OWLReasoner.html
It seems to me that the methods available with the Reasoner might not be up for the purpose which I would like to use them for, but I'm wondering if anyone has experience using OWL-API for this purpose, or knows another tool which could do the trick.
Generally speaking, OWL reasoning is not well suited to finding information that's missing in the input and flagging it up: for example, if you create a class that asserts that an instance of A has exactly one denote relation to an instance of B, and have an instance of A that does not, under Open World assumption the reasoner will just assume that the missing statement is not available, not that you're in violation.
It would be possible to detect incorrect denote uses - if, instead of relating to an instance of B, the relation was to an instance of a class disjoint with B. But this seems a different use case than the one you're after.
You can implement code with the OWL API to do this check, but it likely wouldn't benefit from the ability to reason, and given that you're working at the RDF level I'd think an API like Apache Jena might actually work better for you (you won't need to worry if your input file is not OWL compliant, for example).

Generate individuals from OWL class defintion

I'm fairly new to ontologies and have the following situation:
Given a class definition, I want to automatically generate individuals based on all possible combinations of a given restriction.
For example:
Let's say a "Pizza" class has the property "hasTopping" which is supposed to be linked to an individual of class "Topping". I want to generate an individual of the class Pizza for each individual existing for a Topping. If there are two Topping individuals, Tomato and Cheese, I want to create one Pizza individual with "hasTopping Tomato" and one with "hasTopping Cheese".
Is there any general way to generate individuals in ontologies like this? (As an alternative to implement it myself.)
Is this "violating" the intent/purpose of ontologies in general? Would this usually be handled in a different way? (I'm not completely familiar with ontologies yet.)
There's no standard method to do this, so I think you'll have to implement it yourself. The Leigh University Benchmark does something similar, so it might provide you with some ideas: http://swat.cse.lehigh.edu/projects/lubm/
I don't think this violates the idea behind ontologies at all - seems quite straightforward. There is no best practice for it, so however you choose to implement it will probably be adequate.

Some questions about Hydra's terms

I'm working on a Hydra documentation generator for Golang. I've been using the demo as an example and I was wondering about the ambiguity in some hydra terms.
What's the difference between hydra:title and rdfs:label? label is used in vocab:User, but hydra:title is used for Resource and Collection, as well as in properties.
Speaking of Resource and Collection, why are they re-described in this ApiDocumentation? Shouldn't they be part of hydra/core?
In many properties, there's both a hydra:title + hydra:description and label + description that contain the same information. Why is that? Can I ignore one and be fine?
Apologies in advance if I failed to spot that in the spec, but I've only recently gained an interest in hypermedia APIs and many concepts are still a bit hazy.
• What's the difference between hydra:title and rdfs:label?
rdfs:label is used for the vocabulary definition itself. hydra:title is used to overwrite that label in Hydra clients (that use it for instance to render forms). This was the first issue that was opend when Hydra's further development was moved into a W3C Community Group: Hydra ISSUE-1
• Speaking of Resource and Collection, why are they re-described in this ApiDocumentation? Shouldn't they be part of hydra/core?
They are part of the Hydra Core Vocabulary. As such, it isn't necessary to re-describe them. It was an implementation shortcut I took.
• In many properties, there's both a hydra:title + hydra:description and label + description that contain the same information. Why is that? Can I ignore one and be fine?
See the answer to the first question. In general, you should prefer the Hydra versions in a Hydra-specific tool but fall back to the rdfs properties.
Btw. there's a dedicated mailing list for Hydra. Join the W3C Community Group if you are interested in influencing the future development of Hydra. You should definitely announce your documentation generator there as well.

What impact does using these facilities have on orthogonality?

I am reading The Pragmatic Programmer: From Journeyman to Master by Andrew Hunt, David Thomas. When I was reading about a term called orthogonality I was thinking that I am getting it right. I was understanding it very well. However, at the end of the chapter a few questions were asked to measure the level of understanding of the subject. While I was trying to answer those questions to myself I realized that I haven't understood it perfectly. So to clarify my understandings I am asking those questions here.
C++ supports multiple inheritance, and Java allows a class to
implement multiple interfaces. What impact does using these facilities
have on orthogonality? Is there a difference in impact between using multiple
inheritance and multiple interfaces?
There are actually three questions bundled up here: (1) What is the impact of supporting multiple inheritance on orthogonality? (2) What is the impact of implementing multiple interfaces on orthogonality? (3) What is the difference between the two sorts of impact?
Firstly, let us get to grips with orthogonality. In The Art of Unix Programming, Eric Raymond explains that "In a purely orthogonal design, operations do not have side effects; each action (whether it's an API call, a macro invocation, or a language operation) changes just one thing without affecting others. There is one and only one way to change each property of whatever system you are controlling."
So, now look at question (1). C++ supports multiple inheritance, so a class in C++ could inherit from two classes that have the same operation but with two different effects. This has the potential to be non-orthogonal, but C++ requires you to state explicitly which parent class has the feature to be invoked. This will limit the operation to only one effect, so orthogonality is maintained. See Multiple inheritance.
And question (2). Java does not allow multiple inheritance. A class can only derive from one base class. Interfaces are used to encode similarities which the classes of various types share, but do not necessarily constitute a class relationship. Java classes can implement multiple interfaces but there is only one class doing the implementation, so there should only be one effect when a method is invoked. Even if a class implements two interfaces which both have a method with the same name and signature, it will implement both methods simultaneously, so there should only be one effect. See Java interface.
And finally question (3). The difference is that C++ and Java maintain orthogonality by different mechanisms: C++ by demanding the the parent is explicitly specified, so there will be no ambiguity in the effect; and Java by implementing similar methods simultaneously so there is only one effect.
Irrespective of any number of interfaces/ classes you extend there will be only one implementation inside that class. Lets say your class is X.
Now orthogonality says - one change should affect only one module.
If you change your implementation of one interface in class X - will it affect other modules/classes using your class X ? Answer is no - because the other modules/classes are coding by interface not implementation.
Hence orthogonality is maintained.

How to automatically excerpt user generated content?

I run a website that allows users to write blog-post, I would really like to summarize the written content and use it to fill the <meta name="description".../>-tag for example.
What methods can I employ to automatically summarize/describe the contents of user generated content?
Are there any (preferably free) methods out there that have solved this problem?
(I've seen other websites just copy the first 100 or so words but this strikes me as a sub-optimal solution.)
Think of the task of summarization as a challenge to 'select the most important sentences' from the document.
The method described in The Automatic Creation of Literature Abstracts by H.P. Luhn (1958) describes a naive method that actually performs quite well. Try giving it a shot.
If your website is in Python coding this algorithm using the NLTK (Natural Language Toolkit) is a fun task.
Make it predictable.
From a users perspective simply using the first paragraph is not bad at all.
Using any automation is bound to fall flat in some cases. So I suggest to display
the first paragraph (maybe truncating at some point) as a summary and offer the ability to override that by an optional field.
I might try using mechanical Turk or any number of other crowdsourcing options.
Another item to check out, a SourceForge project, AutoSummary Semantic Analysis Engine
Not a trivial task... You should look for articles or books on "extractive summarization"
A few starters could be:
Books:
Natural Language Processing with Python
Foundations of Statistical Natural Language Processing
Articles:
Language independent extractive summarization
Extractive summarization: how to identify the gist of a text
Extractive Summarization using Inter- and Intra- Event Relevance
Yahoo has a free API for this:
http://developer.yahoo.com/search/content/V1/termExtraction.html
Apple's patent 6424362 - Auto-summary of document content contains sample code which might be useful...
This borders on artificial intelligence so there's not going to be an "easy" solution out there, but there are products that target this problem.
Check out Copernic Summarizer, for one.
Noun phrases typically tend to be important elements of a sentence. Picking sentence(s) with a high density of noun phrases could yield a good summary. You could get noun phrases using a POS tagger.
For a good summary, it is desirable that it is a meaningful sentence. Reading a broken sentence is slightly jarring.
Alternatively, when the author posts the article, the author can highlight what are the keywords that can be used in the description which can then be automatically put in the meta description tag.

Resources