OWLAPI slow calculating disjoint axioms

My goal: find the disjoint axioms (asserted and inferred) in an ontology which contains around 5000 axioms.
My code:
for (OWLClass clazz1 : ontology.getClassesInSignature()) {
    for (OWLClass clazz2 : ontology.getClassesInSignature()) {
        OWLAxiom axiom = MyModel.factory.getOWLDisjointClassesAxiom(clazz2, clazz1);
        if (!(ontology.containsAxiom(axiom)) && reasoner.isEntailed(axiom)) {
            System.out.println(clazz2.toString() + " disjoint with " + clazz1.toString());
        }
    }
}
The problem: the execution time is extremely slow, I'd say eternal. Even if I reduce the number of comparisons with some if statements, the situation stays the same.
Protege seems very quick to compute those inferred axioms, and it's based on the same API I am using (OWLAPI). So, am I taking the wrong approach?

Profiling the code will very likely reveal that the slow part is
reasoner.isEntailed(axiom)
This form requires the reasoner to recompute entailments for each class pair, including the pairs where clazz1 and clazz2 are equal (you might want to skip that).
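If you want to keep the pairwise structure, a minimal pruning sketch (reusing the names from your snippet) is to visit each unordered pair once, which also skips the reflexive pairs; this halves the number of reasoner calls but is still quadratic:
List<OWLClass> classes = new ArrayList<>(ontology.getClassesInSignature());
for (int i = 0; i < classes.size(); i++) {
    for (int j = i + 1; j < classes.size(); j++) {  // j > i skips i == j and mirror pairs
        OWLAxiom axiom = MyModel.factory.getOWLDisjointClassesAxiom(classes.get(i), classes.get(j));
        if (!ontology.containsAxiom(axiom) && reasoner.isEntailed(axiom)) {
            System.out.println(classes.get(i) + " disjoint with " + classes.get(j));
        }
    }
}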
Alternatively, you can iterate through the classes in signature once and use the reasoner to get all disjoint classes:
Set<OWLClass> visited = new HashSet<>();
for (OWLClass c : ontology.getClassesInSignature()) {
    if (visited.add(c)) {
        NodeSet<OWLClass> set = reasoner.getDisjointClasses(c);
        for (Node<OWLClass> node : set.getNodes()) {
            System.out.println("Disjoint with " + c + ": " + node);
            visited.addAll(node.getEntities());
        }
    }
}
Worst case scenario, this will make one reasoner call per class (because no class is disjoint with any other). Best case scenario, all classes are disjoint with or equivalent to another class, so only one reasoner call is required.
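Depending on the reasoner implementation, it can also help to ask for the relevant inferences to be precomputed up front, so the per-class calls only read cached results. This call is part of the OWLReasoner interface, though not every reasoner honors every InferenceType:
reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY,
        InferenceType.DISJOINT_CLASSES);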


Query on TFP Probabilistic Model

In the TFP tutorial, the model output is a Normal distribution. I noted that the output can be replaced by an IndependentNormal layer. In my model, y_true is a binary class, so I used an IndependentBernoulli layer instead of an IndependentNormal layer.
After building the model, I found that it has two output parameters. That doesn't make sense to me, since the Bernoulli distribution has only one parameter. Do you know what went wrong?
# Define the prior weight distribution as Normal of mean=0 and stddev=1.
# Note that, in this example, the prior distribution is not trainable,
# as we fix its parameters.
def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n), scale_diag=tf.ones(n))
        )
    ])
    return prior_model
# Define variational posterior weight distribution as multivariate Gaussian.
# Note that the learnable parameters for this distribution are the means,
# variances, and covariances.
def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        tfpl.MultivariateNormalTriL(n)
    ])
    return posterior_model
# Create a probabilistic DL model
model = Sequential([
    tfpl.DenseVariational(units=16,
                          input_shape=(6,),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='relu'),
    tfpl.DenseVariational(units=16,
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0],
                          activation='sigmoid'),
    tfpl.DenseVariational(units=tfpl.IndependentBernoulli.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/X_train.shape[0]),
    tfpl.IndependentBernoulli(1, convert_to_tensor_fn=tfd.Bernoulli.logits)
])
model.summary()
[Screenshot: model.summary() output on Google Colab, showing two output parameters for the final layer]
I agree the summary display is confusing, but I think this is an artifact of the way TFP layers are implemented to interact with Keras. During normal operation, there will only be one return value from a DistributionLambda layer. But in some contexts (that I don't fully grok) DistributionLambda.call may return both a distribution and a side result. I think the summary plumbing triggers this for some reason, so it looks like there are 2 outputs, but in practice there will only be one. Try calling your model object on X_train, and you'll see you get a single distribution out (its type is actually something called TensorCoercible, which is a wrapper around a distribution that lets you pass it into tf ops that call tf.convert_to_tensor; the resulting value for that op will be the result of calling your convert_to_tensor_fn on the enclosed distribution).
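For example (a quick check, assuming the model defined above and your training array X_train; the shapes in the comments are what I'd expect for this architecture, not verified output):
y_dist = model(X_train)
print(type(y_dist))        # a TensorCoercible wrapping the IndependentBernoulli
print(y_dist.event_shape)  # (1,): a single Bernoulli event per example
print(y_dist.batch_shape)  # one distribution per row of X_train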
In summary, your distribution layer is fine but the summary is confusing. It could probably be fixed; I'm not keras-knowledgeable enough to opine on how hard it would be.
Side note: you can omit the event_shape=1 parameter -- the default value is (), or "scalar", which will behave the same.
HTH!

What is the difference between deep_copy and gen keeping in Specman?

Can someone tell me what the difference is between copying one transaction (item) to another, as in the examples below (add_method_port_1 and add_method_port_2):
add_method_port_1(added_item: item_s) is {
    var new_item: new_item_s;
    gen new_item keeping {
        it.my_trans_s == added_item.as_a(t_trans_s);
    };
};

add_method_port_2(added_item: item_s) is {
    var new_item: new_item_s = deep_copy(added_item.as_a(t_trans_s));
};
Where new_item_s looks like:
struct new_item_s like item_s {
    %my_trans_s: t_trans_s;
};
Thanks,
Andrija
Actually, the results of the two methods are different even if the assumption mentioned in Rodion's answer does hold.
With the first method, new_item points to the same my_trans_s object as the original added_item, because the constraint it.my_trans_s == added_item.as_a(t_trans_s) means pointer equality.
With the second method, new_item points to a copy of the original my_trans_s, because deep_copy copies everything recursively.
In this specific example, assuming that new_item_s has only one field my_trans_s, there is no difference in outcome.
In practice, the meaning and the goal of gen keeping and deep_copy are quite different:
gen keeping, even with '==' constraints (which are practically assignments), means constrained-random generation of an item by the iGen logic engine. If it is a struct, the pre_generate and post_generate methods are invoked, and all the fields not mentioned in the keeping {} block are also randomly generated according to existing constraints and their type properties. It is usually used to create a new item for which only some properties are known.
deep_copy creates an exact copy (up to some minor nuances) of the given struct, and if it has fields that are themselves structs, a copy of the whole connected graph topology. There is no random generation, no special methods, no logic engine executed. It is usually used to capture the data at some point in time for later analysis.
In other words, if the assumption "new_item_s has only one field my_trans_s" is wrong, the results are going to be very much different.

Incorporating fuzzy search in a matcher object

My task is querying medical texts for institute names using a rule as below:
[{'ENT_TYPE': 'institute_name'}, {'TEXT': 'Hospital'}]
The rule will only identify a match if both terms are present. Thus, it will accept "Mount Sinai Hospital", but not "Mount Sinai". I've tried spaczz, which wraps spaCy and works great for a single term or phrase. However, neither spaCy nor spaczz allows for a fuzzy multi-word rule with more than one typo, as in "Moung Sinai Mospital".
Therefore, I'm trying to rewrite the Matcher object by incorporating a fuzzy similarity algorithm such as RapidFuzz, but I'm having some difficulty with its Cython component.
The Matcher class's __call__ method finds all token sequences matching the supplied patterns in doclike, the document to match over or a Span (type: Doc/Span), returning a list of (match_id, start, end) tuples describing the matches:
matches = find_matches(&self.patterns[0], self.patterns.size(), doclike, length,
                       extensions=self._extensions, predicates=self._extra_predicates)
for i, (key, start, end) in enumerate(matches):
    on_match = self._callbacks.get(key, None)
    if on_match is not None:
        on_match(self, doclike, i, matches)
return matches
find_matches is a Cython function that returns the matches in a doc, given a compiled array of patterns, as a list of (id, start, end) tuples; its main loop matches the doc against the pre-defined patterns:
# Main loop
cdef int nr_predicate = len(predicates)
for i in range(length):
    for j in range(n):
        states.push_back(PatternStateC(patterns[j], i, 0))
    transition_states(states, matches, predicate_cache,
                      doclike[i], extra_attr_values, predicates)
    extra_attr_values += nr_extra_attr
    predicate_cache += len(predicates)
Can you help me locate the actual matching operation (pattern against string) in the Python/C-level objects and attributes? I hope to be able to extend this operation with the fuzzy matching algorithm. You can find the code for the Matcher class, the __call__ method and the find_matches function here.
You can follow a more Pythonic effort to achieve this goal by spaczz here.
I think the easiest way would be to add an additional predicate type called something like FUZZY. Look at how the regex, set, and comparison predicates are defined and do something similar for FUZZY with your custom code to match on strings with small edit differences:
https://github.com/explosion/spaCy/blob/master/spacy/matcher/matcher.pyx#L687-L781
The predicate classes are standard python classes, no cython required. You'll also need to add the predicate to the schema in spacy/matcher/_schemas.py.
Remember that like the rest of the Matcher predicates, it matches over tokens, so your definition of fuzziness will have to be at the token level.
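For concreteness, here is a rough sketch of what such a predicate could look like, modeled on spaCy's _RegexPredicate and using RapidFuzz for the comparison. The constructor boilerplate varies between spaCy versions, so copy the exact signature and key-building from the matcher.pyx linked above; the class name, the FUZZY operator, the 90-point threshold, and comparing against token.text are my own choices, not existing spaCy API:
from rapidfuzz import fuzz

class _FuzzyPredicate:
    operators = ("FUZZY",)

    def __init__(self, i, attr, value, predicate, is_extension=False):
        self.i = i                  # predicate index, used for the predicate cache
        self.attr = attr            # token attribute the pattern targets, e.g. TEXT
        self.value = value          # reference string, e.g. "Hospital"
        self.predicate = predicate  # operator name, here "FUZZY"
        self.is_extension = is_extension
        self.key = (attr, self.predicate, value)

    def __call__(self, token):
        # Token-level fuzziness: accept the token if its text is within a small
        # edit distance of the reference string. A ratio of >= 90 tolerates
        # roughly one typo in a short word, so "Mospital" still matches "Hospital".
        return fuzz.ratio(token.text.lower(), self.value.lower()) >= 90
With that predicate registered, your rule could become [{'ENT_TYPE': 'institute_name'}, {'TEXT': {'FUZZY': 'Hospital'}}], keeping the two-token structure while tolerating a typo in each token.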

Multidimensional data type in Scala

I have a generic class, which is constructed from multi-dimensional data (i.e. n-dimensional Arrays or Vectors).
In this case, I would like the class to be instantiated with one collection type only (e.g. Vector), regardless of its dimensionality (Vector[Vector[T]], but not Vector[Array[T]]).
Having this class signature:
class Foo[T](x: Vector[T], y: Bar[T])
how could I guarantee that T would be, for example, Vector[T] or Vector[Vector[T]] (and so on) but NOT Array[T] or Vector[Array[T]]?
While generic type parameters are wiped out by type erasure, you can rely on the "evidence" mechanism, which is resolved at compile time.
An evidence is a special typeclass whose only goal is to witness the relation between types A and B:
A <:< B is a witness that A is a subtype of B
A =:= B is a witness that A is exactly B
(these two are what the standard library provides; a "superclass" witness can be expressed by flipping the first, i.e. B <:< A)
So you can write something like this:
case class MyContainer[A, B](b: Vector[B])(implicit ev: B <:< Vector[A])
Inside your class, you can simply treat every element of vector b as a Vector[A] by applying the evidence to each item, i.e.:
b flatMap {
    x => ev(x) map { _.toString }
}
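For example (a quick sketch, with Int standing in for your element type), the compiler finds the evidence for nested Vectors but rejects a nested Array at compile time:
val ok = MyContainer[Int, Vector[Int]](Vector(Vector(1, 2), Vector(3)))

// does not compile: no implicit evidence that Array[Int] <:< Vector[Int]
// val ko = MyContainer[Int, Array[Int]](Vector(Array(1, 2), Array(3)))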

Inferring using Jena

InfModel infmodel = ModelFactory.createInfModel(reasoner, m);
Resource vegetarian = infmodel.getResource(source + "Vegetarian");
Resource margherita = infmodel.getResource(source + "Example-Margherita");
if (infmodel.contains(margherita, RDF.type, vegetarian)) {
    System.out.println("Margherita is a memberOf Vegetarian pizza");
}
The example given above is based on the well-known pizza.owl. In this ontology, Example-Margherita is an individual of the Margherita class, so that assertion is already written in the OWL file. However, the reasoner should also infer that Example-Margherita is a vegetarian pizza.
Could anyone please give an example that shows how to find an individual's inferred classes, like in Protege? (Protege correctly infers that Example-Margherita is a Vegetarian pizza; however, I can't obtain that inference programmatically.)
I solved my problem. I think there was an issue with my ontology, so I created another ontology to test inference on individuals. The ontology I created contains Person and subclasses of Person: MalePerson, FemalePerson and MarriedPerson. There are two object properties (hasSpouse, hasSibling) and one datatype property (hasAge).
And, I created 3 individuals.
John - MalePerson - hasAge(20) - hasSibling(Jane)
Jane - FemalePerson - hasSibling(John) - hasSpouse(Bob)
Bob - MalePerson - hasSpouse(Jane)
And I put two restrictions on the MalePerson and FemalePerson classes.
For MalePerson:
hasSpouse max 1
hasSpouse only FemalePerson
For FemalePerson:
hasSpouse max 1
hasSpouse only MalePerson
Lastly, I made MarriedPerson a defined class. Before reasoning, MarriedPerson has no individuals. However, the model should infer that Jane and Bob are married, so after reasoning the MarriedPerson class should have 2 individuals.
When I ran the code below in Java using Jena, I got the 2 inferred individuals.
OntModel ontModel = ModelFactory.createOntologyModel();
InputStream in = FileManager.get().open(inputFileName);
if (in == null) {
    throw new IllegalArgumentException("File: " + inputFileName + " not found");
}
ontModel.read(in, "");

Reasoner reasoner = ReasonerRegistry.getOWLReasoner();
reasoner = reasoner.bindSchema(ontModel);

// Obtain the standard OWL-DL spec and attach the reasoner
OntModelSpec ontModelSpec = OntModelSpec.OWL_DL_MEM;
ontModelSpec.setReasoner(reasoner);

// Create an ontology model with reasoner support
OntModel model = ModelFactory.createOntologyModel(ontModelSpec, ontModel);

// MarriedPerson has no asserted instances.
// However, if an inference engine is used, two of the three
// individuals in the example presented here will be
// recognized as MarriedPersons.
// ns is the ontology's namespace URI
OntClass marPerson = model.getOntClass(ns + "OWLClass_00000003866036241880"); // the URI of the MarriedPerson class
ExtendedIterator married = marPerson.listInstances();
while (married.hasNext()) {
    OntResource mp = (OntResource) married.next();
    System.out.println(mp.getURI());
} // with the reasoner attached, this prints 2 individuals
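To answer the original question directly (listing an individual's inferred classes, like Protege's inferred-types view), here is a minimal sketch against the same reasoner-backed model; the individual name "Jane" follows the test ontology above, and your ontology may use generated URIs instead:
// List all rdf:type statements for one individual; with the reasoner
// attached, this includes inferred classes such as MarriedPerson.
Individual jane = model.getIndividual(ns + "Jane");
ExtendedIterator<Resource> types = jane.listRDFTypes(false); // false = indirect (inferred) types too
while (types.hasNext()) {
    System.out.println(jane.getLocalName() + " rdf:type " + types.next());
}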
