HermiT reasoning inconsistency checking and memory consumption - owl

I use HermiT via Protégé to check the consistency of my OWL model. However, the ontologies that I import into my model seem to make reasoning fail for lack of memory. I run on a 16GB RAM PC, and even when I use the -Xmx14500M -Xms14000M parameters for the Java heap, the reasoning process fails due to memory size while showing the "Building class hierarchy..." message.
Questions:
1. I understand that (at least sometimes) when my model is inconsistent, the inconsistency errors are raised relatively early (e.g. in the first 3-4 minutes), before the "Building class hierarchy..." message. I cannot really confirm that this happens every time. Is it safe to assume that HermiT checks for inconsistencies first and only then tries to construct the model's inferred class hierarchy? If so, does reaching the "Building class hierarchy..." message mean that my model is consistent?
2. While the inferred relationships (e.g. the inferred class hierarchy) could be useful in my use case, checking the consistency of my model is my priority. Could I somehow instruct HermiT to only check for inconsistencies and not try to build the class hierarchy? (See the sketch below.)
3. Can I do something to increase performance? Apart from increasing the Java heap size, is there any kind of customization that could be useful?
4. Would other reasoners typically behave better regarding memory consumption? Should I try an alternative?
Please note that I use HermiT reasoner 1.3.8.413 via Protégé 5.2.
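For question 2: outside Protégé, the OWL API lets you run just the consistency check without requesting any inferred hierarchy. A minimal sketch, assuming OWL API 4.x and HermiT on the classpath; the file name model.owl is a placeholder:

import java.io.File;

import org.semanticweb.HermiT.ReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

public class ConsistencyOnly {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        // Imports are resolved and loaded by the manager.
        OWLOntology ontology = manager.loadOntologyFromOntologyDocument(new File("model.owl"));

        // Create a HermiT reasoner, but never call precomputeInferences(...),
        // so no class hierarchy is built.
        OWLReasoner reasoner = new ReasonerFactory().createReasoner(ontology);
        System.out.println("Consistent: " + reasoner.isConsistent());
        reasoner.dispose();
    }
}

The key point is that isConsistent() runs only the satisfiability check; the expensive "Building class hierarchy..." phase corresponds to precomputing the class hierarchy, which this sketch never requests.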

Related

What could be causing NullPointerException in Camel XSLT processing?

For the last few weeks, I have been trying to deal with an intermittent problem on a Camel route that applies XSLT processing after an aggregation. It is intermittent in the sense that while it frequently raises this exception, I can re-run the failed data extract and processing a few seconds later and it usually succeeds. I have yet to find any data that fails consistently.
I am assuming that the aggregation is causing the problem, but I can't for the life of me understand why. I thought it might be the custom aggregation bean I was using, so I replaced it with XsltAggregationStrategy, but it still intermittently gives this issue, either when further transforming the aggregated XML or when just writing it out to a file.
This is executing in an Apache Karaf environment, and I have the Camel-Saxon 2.21.2 and Apache ServiceMix Saxon-HE 9.8.0.8_1 bundles loaded.
Thanks for looking.
The abridged stack trace is:
...
Caused by: [java.lang.NullPointerException - null]
java.lang.NullPointerException
at net.sf.saxon.dom.DOMNodeWrapper$ChildEnumeration.skipFollowingTextNodes(DOMNodeWrapper.java:1149)
at net.sf.saxon.dom.DOMNodeWrapper$ChildEnumeration.next(DOMNodeWrapper.java:1178)
at net.sf.saxon.tree.util.Navigator$EmptyTextFilter.next(Navigator.java:1078)
at net.sf.saxon.tree.util.Navigator$AxisFilter.next(Navigator.java:1039)
at net.sf.saxon.tree.util.Navigator$AxisFilter.next(Navigator.java:1017)
at net.sf.saxon.expr.parser.ExpressionTool.effectiveBooleanValue(ExpressionTool.java:643)
at net.sf.saxon.expr.Expression.effectiveBooleanValue(Expression.java:532)
at net.sf.saxon.pattern.PatternWithPredicate.matches(PatternWithPredicate.java:141)
at net.sf.saxon.trans.Mode.searchRuleChain(Mode.java:570)
at net.sf.saxon.trans.Mode.getRule(Mode.java:476)
at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1041)
at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:281)
at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:241)
at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:239)
at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1057)
at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:281)
at net.sf.saxon.expr.instruct.ApplyTemplates.process(ApplyTemplates.java:237)
at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:431)
at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:373)
at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:239)
at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1057)
at net.sf.saxon.Controller.transformDocument(Controller.java:2080)
at net.sf.saxon.Controller.transform(Controller.java:1903)
at org.apache.camel.builder.xml.XsltBuilder.process(XsltBuilder.java:141)
at org.apache.camel.impl.ProcessorEndpoint.onExchange(ProcessorEndpoint.java:103)
at org.apache.camel.component.xslt.XsltEndpoint.onExchange(XsltEndpoint.java:138) ...
In 9.8.0.8, the class net.sf.saxon.dom.DOMNodeWrapper has only 1144 lines, so a stack trace showing line 1178 suggests there's some kind of versioning problem.
The class DOMNodeWrapper was first introduced in 9.5 (previously it was called NodeWrapper), and the line numbers are just one off from those in the current 9.5 source, so I suspect what you have loaded is some sub-release of the 9.5 branch. Other line numbers in the stack trace are also consistent with this being 9.5.
That of course doesn't explain the problem, but it might give a clue.
My immediate instinct was that over the years since 9.5 we might have fixed a multi-threading bug. DOM is not thread-safe, so Saxon takes considerable care to synchronize its access. Saxon bug https://saxonica.plan.io/issues/2376 addresses this problem. On the 9.5 branch this was first fixed in maintenance release 9.5.1.11, so it's possible you don't have that patch. I think it would be useful to investigate why you are loading an old version of Saxon, and another useful angle would be to discover exactly which version it is (the static method net.sf.saxon.Version.getProductVersion() will give you this information).
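If it helps, that version check can be as small as the following (only the class name SaxonVersionCheck is invented here; getProductVersion() is the Saxon API mentioned above):

public class SaxonVersionCheck {
    public static void main(String[] args) {
        // Prints the Saxon release actually loaded from the classpath, e.g. "9.5.1.8".
        System.out.println(net.sf.saxon.Version.getProductVersion());
    }
}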
Incidentally, if you are using multi-threaded access to a DOM tree then you should ask yourself whether this is a good idea. Saxon access to DOM is slow at the best of times (compared to JDOM and XOM, let alone to Saxon's native tree model), and the lack of thread safety and the need for synchronisation makes it a pretty poor choice in a multi-threaded application.
Also, note that Saxon can synchronize its own access to the DOM, but it can't synchronize with third-party code that might also be using the DOM.

Can I use OWL API to enforce specific subject-predicate-object relationships?

I am working on a project using RDF data and I am thinking about implementing a data cleanup method which will run against an RDF triples dataset and flag triples which do not match a certain pattern, based on a custom ontology.
For example, I would like to enforce that an instance of class http://myontology/A must denote an instance of http://myontology/B using the predicate http://myontology/denotes. Any instance of class A which does not denote an instance of class B should be flagged.
I am wondering if a tool such as the OWLReasoner from OWL-API would have the capability to accomplish something like this, if I designed a custom axiom for the Reasoner. I have reviewed the documentation here: http://owlcs.github.io/owlapi/apidocs_4/org/semanticweb/owlapi/reasoner/OWLReasoner.html
It seems to me that the methods available on the reasoner might not be suited to the purpose I have in mind, but I'm wondering if anyone has experience using OWL-API for this, or knows another tool which could do the trick.
Generally speaking, OWL reasoning is not well suited to finding information that's missing from the input and flagging it: for example, if you create a class that asserts that an instance of A has exactly one denotes relation to an instance of B, and you have an instance of A that does not, then under the Open World Assumption the reasoner will just assume that the missing statement is not available, not that you're in violation.
It would be possible to detect incorrect denotes uses - if, instead of relating to an instance of B, the relation pointed to an instance of a class disjoint with B. But this seems a different use case from the one you're after.
You can implement code with the OWL API to do this check, but it likely wouldn't benefit from the ability to reason, and given that you're working at the RDF level I'd think an API like Apache Jena might actually work better for you (you won't need to worry if your input file is not OWL compliant, for example).
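In case it helps, here is what that check might look like with Jena and SPARQL - a minimal sketch, assuming Jena 3.x; the input file data.ttl and the exact query shape are illustrative, not a fixed recipe:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class FlagMissingDenotes {
    public static void main(String[] args) {
        // Load the RDF dataset (Turtle, RDF/XML, etc. are auto-detected).
        Model model = RDFDataMgr.loadModel("data.ttl");

        // Closed-world check: instances of A with no denotes link to an
        // instance of B. FILTER NOT EXISTS does the "flagging" query-side.
        String query =
            "PREFIX my: <http://myontology/> " +
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
            "SELECT ?a WHERE { " +
            "  ?a rdf:type my:A . " +
            "  FILTER NOT EXISTS { ?a my:denotes ?b . ?b rdf:type my:B } " +
            "}";

        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(query), model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println("Flagged: " + results.next().get("a"));
            }
        }
    }
}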

Django Models / SQLAlchemy are bloated! Any truly Pythonic DB models out there?

"Make things as simple as possible, but no simpler."
Can we find the solution/s that fix the Python database world?
Update: A 'lustdb' prototype has been written by Alex Martelli - if you know of any somewhat lightweight, high-level database libraries with multiple backends that we could wrap in syntax-sugar honey, please weigh in!
from someAmazingDB import *
# we imported a smart model class and db object which talk to database adapter/s

class Task(model):
    title = ''
    done = False  # native types, not a custom object we have to think about!

db.taskList = []
# or
db.taskList = expandableTypeCollection(Task)  # not sure what this syntax would be

db['taskList'].append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task('Illustrate different syntax modes', True))  # ok, maybe we should just use kwargs
# at this point it should be autosaved to a default db option
# by default we should be able to reload the console and access the default db:

>>> from someAmazingDB import *
>>> print 'Done tasks:'
>>> for task in db.taskList:
...     if task.done:
...         print task.title
'Illustrate different syntax modes'
I'm a fan of Python, web.py and CherryPy, and KISS in general.
We're talking automatic Python to SQL type translation or NoSQL.
We don't have to totally be SQL compatible! Just a scalable subset or ignore it!
Re:model changes, it's ok to ask the developer when they try to change it or have a set of sensible defaults.
Here is the challenge: The above code should work with very little modification or thinking required. Why must we put up with compromise when we know better?
It's 2010, we should be able to code scalable, simple databases in our sleep.
If you think this is important, please upvote!
What you request cannot be done in Python 2.whatever, for a very specific reason. You want to write:
class Task(model):
    title = ''
    isDone = False
In Python 2.anything, whatever model may possibly be, this cannot ever allow you to predict any "ordering" for the two fields, because the semantics of a class statement are:
1. execute the body, thus preparing a dict
2. locate the metaclass and run special methods thereof
Whatever the metaclass may be, step 1 has destroyed any predictability of the fields' order.
Therefore, your desired use of positional parameters, in the snippet:
Task('Illustrate different syntax modes', True)
cannot associate the arguments' values with the model's various fields. (Trying to guess by type association -- hoping no two fields ever have the same type -- would be even more horribly unpythonic than your expressed desire to use db.tasklist and db['tasklist'] indifferently and interchangeably).
One of the backwards-incompatible changes in Python 3 was introduced specifically to deal with situations of this ilk. In Python 3, a custom metaclass can define a __prepare__ function which runs before "step 1" in the above simplified list, and this lets it have more control about the class's body. Specifically, quoting PEP 3115...:
__prepare__ returns a dictionary-like object which is used to store the class member definitions during evaluation of the class body. In other words, the class body is evaluated as a function block (just like it is now), except that the local variables dictionary is replaced by the dictionary returned from __prepare__. This dictionary object can be a regular dictionary or a custom mapping type.
...
An example would be a metaclass that uses information about the ordering of member declarations to create a C struct. The metaclass would provide a custom dictionary that simply keeps a record of the order of insertions.
You don't want to "create a C struct" as in this example, but the order of fields is crucial (to allow the use of positional parameters that you want) and so the custom metaclass (obtained through base model) would have a __prepare__ classmethod returning an ordered dictionary. This removes the specific issue, but, of course, only if you're willing to switch all of your code using this "magic ORM" to Python 3. Would you be?
Once that's settled, the issue is, what database operations do you want to perform, and how. Your example, of course, does not clarify this at all. Is the taskList attribute name special, or should any other attribute assigned to the db object be "autosaved" (by name and, what other characteristic[s]?) and "autoretrieved" upon use? Are there to be ways to remove entities, alter them, locate them (otherwise than by having once been listed in the same attribute of the db object)? How does your sample code know what DB service to use and how to authenticate to it (e.g. by userid and password) if it requires authentication?
The specific tasks you list would not be hard to implement (e.g. on top of Google App Engine's storage service, which does not require authentication nor specification of "what DB service to use"). model's metaclass would introspect the class's fields and generate a GAE Model for the class, the db object would use __setattr__ to set an atexit trigger for storing the final value of an attribute (as an entity in a different kind of Model of course), and __getattr__ to fetch that attribute's info back from storage. Of course without some extra database functionality this all would be pretty useless;-).
Edit: so I did a little prototype (Python 2.6, and based on sqlite) and put it up on http://www.aleax.it/lustdb.zip -- it's a 3K zipfile including the 225-line lustdb.py (too long to post here) and two small test files roughly equivalent to the OP's originals: test0.py is...:
from lustdb import *

class Task(Model):
    title = ''
    done = False

db.taskList = []
db.taskList.append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task(title='Illustrate different syntax modes', done=True))
and test1.py is...:
from lustdb import *

print 'Done tasks:'
for task in db.taskList:
    if task.done:
        print task
Running test0.py (on a machine with a writable /tmp directory -- i.e., any Unix-y OS, or, on Windows, one on which a mkdir \tmp has been run at any previous time;-) has no output; after that, running test1.py outputs:
Done tasks:
Task(done=True, title=u'Illustrate different syntax modes')
Note that these are vastly less "crazily magical" than the OP's examples, in many ways, such as...:
1. no (expletive deleted) redundancy whereby `db.taskList` is a synonym of `db['taskList']`, only the sensible former syntax (attribute-access) is supported
2. no mysterious (and totally crazy) way whereby a `done` attribute magically becomes `isDone` instead midway through the code
3. no mysterious (and utterly batty) way whereby a `print task` arbitrarily (or magically?) picks and prints just one of the attributes of the task
4. no weird gyrations and incantations to allow positional-attributes in lieu of named ones (this one the OP agreed to)
The prototype of course (as prototypes will;-) leaves a lot to be desired in many respects (clarity, documentation, unit tests, optimization, error checking and diagnosis, portability among different back-ends, and especially DB features beyond those implied in the question). The missing DB features are legion (for example, the OP's original examples give no way to identify a "primary key" for a model, or any other kinds of uniqueness constraints, so duplicates can abound; and it only gets worse from there;-). Nevertheless, for 225 lines (190 net of empty lines, comments and docstrings;-), it's not too bad in my biased opinion.
The proper way to continue playing with this project would of course be to initiate a new lustdb open source project on the hosting part of code.google.com (or any other good open source hosting site with issue tracker, wiki, code reviews support, online browsing, DVCS support, etc, etc) - I'd do it myself but I'm close to the limit in terms of number of open source projects I can initiate on code.google.com and don't want to "burn" the last one or two in this way;-).
BTW, the lustdb name for the module is a play on words with the OP's initials (first two letters each of first and last names), in the tradition of awk and friends -- I think it sounds nice (and most other obvious names such as simpledb and dumbdb are taken;-).
I think you should try ZODB. It is an object-oriented database designed for storing Python objects. Its API is quite close to the example you provided in your question; just take a look at the tutorial.
What about using Elixir?
Forget ORM! I like vanilla SQL. The python wrappers like psycopg2 for postgreSQL do automatic type conversion, offer pretty good protection against SQL injection, and are nice and simple.
sql = "SELECT * FROM table WHERE id=%s"
data = (5,)
cursor.execute(sql, data)
The more I think on't the more the Smalltalk model of operation seems more relevant. Indeed the OP may not have reached far enough by using the term "database" to describe a thing which should have no need for naming.
A running Python interpreter has a pile of objects that live in memory. Their inter-relationships can be arbitrarily complex, but namespaces and the "tags" that objects are bound to are very flexible. And as pickle can explicitly serialize arbitrary structures for persistence, it doesn't seem that much of a reach to consider each Python interpreter living in that object space. Why should that object space evaporate with the interpreter's close? Semantically, this could be viewed as an extension of the anydbm tied dictionaries. And since almost everything in Python is dictionary-like, the mechanism is almost already there.
I think this may be the generic model that Alex Martelli was proposing above, it might be nice to say something like:
class Book:
    def __init__(self, attributes):
        self.attributes = attributes
    def __getattr__(....): pass

$ python
>>> import book
>>> my_stuff.library = {'garp':
...     Book({'author': 'John Irving', 'title': 'The World According to Garp',
...           'isbn': '0-525-23770-4', 'location': 'kitchen table',
...           'bookmark': 'page 127'}),
...     ...
... }
>>> exit

[sometime next week]

$ python
>>> import my_stuff
>>> print my_stuff.library['garp'].location
'kitchen table'
# or even
>>> for book in my_stuff.library where book.location.contains('kitchen'):
...     print book.title
I don't know that you'd call the resultant language Python, but it seems like it is not that hard to implement and makes backing store equivalent to active store.
There is a natural tension between the inherent structure imposed - and sometimes desired - by RDBMSs and the rather free-form navel-gazing put forth here, but NoSQL databases are already approaching the content-addressable memory model and probably better approximate how our minds keep track of things. Contrariwise, you wouldn't want to keep all the corporate purchase orders in such a storage system - but then again, perhaps you might.
How about you give an example of how "simple" you want your "dealing with a database" to be, and I then tell you all the stuff that is needed for that "simplicity" to work?
(And it will still be YOU who is required to give the information/config to the database interface engine, somewhere, somehow.)
To name but one example :
If your database management engine is some external machine with which your app interfaces over IP or some such, there is no way around the fact that the IP identity of where that database engine is running will have to be provided by your app's database interface client, somewhere, somehow - regardless of whether that gets explicitly exposed in the code or not.
I've been busy, here it is, released under LGPL:
http://github.com/lukestanley/lustdb
It uses JSON as its backend at the moment.
This is not the same codebase Alex Martelli did; I wanted to make the code more readable and reusable with different backends and such.
Elsewhere I have been working on object-oriented HTML elements accessible in Python in similar ways, AND a library for making web.py more minimalist.
I'm thinking of ways of using all 3 elements together with automatic MVC prototype construction or smart mapping.
While old-fashioned text-based template web programming will be around for a while still, because of legacy systems and because it doesn't require any particular library or implementation, I feel we'll soon have much more efficient ways of building robust, prototype-friendly web apps.
Please see the mailing list for those interested.
If you like CherryPy, you might like the complementary ORMs I wrote: GeniuSQL (which follows a Table Data gateway model) and Dejavu (which is a complete Data Mapper).
There's far too much in this question and all its subcomments to address completely, but one thing I wanted to point out was that GeniuSQL and Dejavu have a very robust system for mapping native Python types to the types that your particular backend is using. There are very sane defaults, which can be overridden as needed, and even extended if you make a new backend or use types from a backend that isn't yet supported. See http://www.aminus.net/geniusql/chrome/common/doc/trunk/advanced.html#custom for more discussion on that.

Using Flyweight Pattern in database-driven application

Can anyone please give me an example of a situation in a database-driven application where I should use the Flyweight pattern?
How can I tell that I should use the Flyweight pattern at a given point in my application?
I have learned the Flyweight pattern, but I am not able to identify an appropriate place to use it in my database-driven business applications.
Except for a very specialized database application, Flyweight might be used by your application, but probably not for any class that represents an entity persisted in your database. Flyweight is used when there might otherwise be a need for so many instantiations of a class that performance would suffer if you instantiated one every discrete time you needed it. So instead, you instantiate a much smaller number of them and reuse them for each required instance by just changing data values for each use. This would be useful in a situation where, for example, you might have to instantiate thousands of such objects each second, which is generally not the case for entities persisted in a database.
You should apply any pattern when it naturally suggests itself as a solution to a concrete problem - not go looking for places in your application where you can apply a given pattern.
Flyweight's purpose is to address memory issues, so it only makes sense to apply it after you have profiled an application and determined that you have a ton of identical instances.
Colors and Brushes from the Base Class Library come to mind as examples.
Since a very important part of Flyweight is that the shared implementation is immutable, good candidates in a data-driven application would be what Domain-Driven Design refers to as Value Objects - but it only becomes relevant if you have a lot of identical values.
[Not a DB guy, so this is my best guess.]
The real bonus of the Flyweight pattern is that you can reuse data if you need to. A classic example is word processing: ideally you would have an object per character in your document, but that would eat up way too much memory, so the Flyweight pattern lets you store only one of each unique value that you need.
A second (and perhaps simplest) way to look at it is like object pooling, only you're pooling on a "per-field" level as opposed to a "per-object" level.
In fact, now that I think about it, it's not unlike using a (comparatively small) chunk of memory in C(++) to store some raw data which you do pointer manipulation on to get stuff out of.
[See this Wikipedia article.]
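To make the word-processing example concrete, here is a minimal Flyweight sketch in Java (all names are illustrative): one shared, immutable Glyph per distinct character, handed out by a factory cache, with the position passed in as extrinsic state.

import java.util.HashMap;
import java.util.Map;

// Intrinsic (shared) state: the character itself. Immutable, so one
// instance can safely be shared across every occurrence in the document.
final class Glyph {
    private final char symbol;

    Glyph(char symbol) { this.symbol = symbol; }

    // Extrinsic state (the position) is supplied at use time.
    void draw(int line, int column) {
        System.out.printf("'%c' at %d:%d%n", symbol, line, column);
    }
}

final class GlyphFactory {
    private static final Map<Character, Glyph> CACHE = new HashMap<>();

    // Returns the one shared Glyph for this character, creating it on demand.
    static Glyph get(char c) {
        return CACHE.computeIfAbsent(c, Glyph::new);
    }
}

public class FlyweightDemo {
    public static void main(String[] args) {
        String text = "aabb";
        for (int i = 0; i < text.length(); i++) {
            GlyphFactory.get(text.charAt(i)).draw(1, i);
        }
        // Four characters were drawn, but only two Glyph objects exist.
    }
}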

Can I get the instances of alive objects of a certain type in C#?

This is a C# 3.0 question. Can I use reflection or memory management classes provided by .net framework to count the total alive instances of a certain type in the memory?
I can do the same thing using a memory profiler, but that requires extra time to dump the memory and involves third-party software. What I want is only to monitor a certain type, with a lightweight method that fits easily into unit tests. The purpose of counting the alive instances is to ensure I don't have any unexpected living instances causing a "memory leak".
Thanks.
To do it entirely within the application you could use an instance counter, but it would need to be explicitly coded and managed inside each class -- there's no silver bullet that I'm aware of to let you query the framework from within the executing code to see how many instances are alive.
What you're asking for is really the domain of a profiler. You can purchase one or build your own, but it requires your application to run as a child process of the profiler. Rolling your own isn't an easy undertaking, by the way.
If you want to consider the instance counter it would have to be something like:
using System;

public class MyClass : IDisposable
{
    private static readonly object _ClassInstancesLock = new object();
    private static int _ClassInstances;

    public MyClass()
    {
        lock (_ClassInstancesLock) { ++_ClassInstances; }
    }

    public void Dispose()
    {
        lock (_ClassInstancesLock) { --_ClassInstances; }
    }

    // Expose the count so tests can assert on it.
    public static int ClassInstances
    {
        get { lock (_ClassInstancesLock) { return _ClassInstances; } }
    }
}
This is just a really rough sample. Even with the counter made thread-safe (critical for this type of approach), it leaves the door wide open for Dispose never to be called, or for Dispose to be called and the counter decremented while the object has not actually been GC'd. To diagnose that bundle of joy you'll need, you guessed it, a professional profiler -- or at least windbg.
Edit: I just noticed the very last line of your question and need to say that my above approach, as shoddy and failure-prone as it is, is almost guaranteed to deceive and lie to you about the true number of instances if you're experiencing a leak. The best tool, IMO, for attacking these problems is ANTS Memory Profiler. Version 5 is a double-edged sword in that they broke the Performance and Memory profilers into two separate SKUs (they used to be bundled together), but Memory Profiler 5.0 is absolutely lightning fast. Profiling these problems used to be slow as molasses, but they've gotten around that somehow.
Unless this is for a personal project with zero intent of redistribution, you should invest the few hundred dollars needed for ANTS -- but by all means use its trial period first. It's a great tool for exactly this kind of analysis.
The only way I see to do this without any form of instrumentation is to use the CLR Profiling API to track object lifetimes. I'm not aware of any APIs available to managed code to do the same thing, and, so far as I know, the CLR doesn't keep a list of live objects anywhere (so even with the profiling API you have to create the data structures for that yourself).
VB.NET has a feature where it lets you track objects in the debugger, but it actually emits additional code specifically for that (which basically registers all created objects in an internal list of weak references). You could do the same, using e.g. PostSharp to post-process your assemblies.
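For what it's worth, the weak-reference registry idea is not platform-specific; here is a minimal sketch of the same shape in Java (all names are invented; in .NET you would use WeakReference the same way). The point is that weak references let you count live instances without keeping them alive:

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TrackedObject {
    // Every created instance is recorded via a WeakReference, so the
    // registry itself never prevents collection.
    private static final List<WeakReference<TrackedObject>> REGISTRY = new ArrayList<>();

    public TrackedObject() {
        synchronized (REGISTRY) {
            REGISTRY.add(new WeakReference<>(this));
        }
    }

    // Counts instances not yet collected; prunes cleared references as it goes.
    public static int liveCount() {
        synchronized (REGISTRY) {
            int alive = 0;
            for (Iterator<WeakReference<TrackedObject>> it = REGISTRY.iterator(); it.hasNext(); ) {
                if (it.next().get() != null) {
                    alive++;
                } else {
                    it.remove();
                }
            }
            return alive;
        }
    }
}

Note that the count still depends on GC timing: an unreachable instance keeps counting as "live" until it is actually collected, so a test would typically hint the collector first before asserting on liveCount().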
