Ontology: Create class connections in an ontology - owl

I have an ontology (a taxonomy, with classes only; no individuals) like this:
owl:Thing
    Parts
        Jackfruit
        Jackfruit_flower
        Grape
    Plant
        Jackfruit_tree
        Grapevine
I want to create a relation, grows_in, that connects the two classes Parts and Plant by mapping each part to the plant it grows on. Ultimately, I want to be able to write SPARQL queries such as "Which parts grow in a specific plant?"
I read somewhere that using an annotation property is the best way to go about this. But it seems that annotation properties don't work with reasoners / SPARQL querying. It should be noted that the ontology I have is huge (>10,000 classes) and I'm using rdflib in Python.
What is the best way to create relations between the classes (I already have the mappings; all that remains is to add the relations) so that I can run SPARQL queries?

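What you should create is an object property: grows_in. Object properties connect individuals to individuals. Since your ontology contains only classes, you can assert grows_in triples directly between the class IRIs; OWL 2 punning allows the same IRI to be used both as a class and as an individual.
Moreover, this object property should have Parts as its domain and Plant as its range, since it points from a part to the plant it grows in.
Annotation properties are not suited to this: they carry no logical meaning, so reasoners ignore them.
For example, a valid SPARQL query would be: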
PREFIX : <YOUR_ONTOLOGY_URL>
SELECT * WHERE {
?parts :grows_in ?plant
}
EDIT: edited following comments

Related

Is (n)Hibernate suitable for a model layer with more than 200 classes arranged in a hierarchy?

My model layer has a pretty huge class hierarchy, i.e. there are around 200 classes. The good/bad thing about the hierarchy is that all of them have the same base class (I am not talking about the Object class here). The maximum distance between the base and leaf classes is 7, and the maximum number of classes at any level of the hierarchy is 80. I am using NHibernate to save/load data from persistent storage.
Problem: the queries generated by NHibernate are pretty inefficient. The reason seems to be that it joins tables that are not really needed in the query.
Has anyone used (n)Hibernate with such class hierarchy?
Please refer to NHibernate: Load base class objects only to see a specific example of one of my problems.
NHibernate has to resort to that monster of a query because it does not know exactly which table contains the needed data. For example, with session.Get<Vehicle>(100), NHibernate does not know which table contains the data for the vehicle with ID 100, so it has to join Vehicle, Car, Truck and Bicycle together. But with session.Get<Truck>(100), NHibernate knows the data can only reside in the Truck table, so the query is much more efficient, with only one join: Vehicle and Truck.
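The difference can be illustrated in plain SQL. This sketch uses Python's built-in sqlite3 module and a hypothetical joined-subclass schema for Vehicle/Car/Truck/Bicycle; the queries are hand-written approximations of what an ORM has to issue, not the exact SQL NHibernate generates.

```python
import sqlite3

# Hypothetical joined-subclass schema: one base table plus one table per
# subclass, each subclass sharing the base table's primary key.
SCHEMA = """
CREATE TABLE Vehicle (id INTEGER PRIMARY KEY, make TEXT);
CREATE TABLE Car     (id INTEGER PRIMARY KEY REFERENCES Vehicle(id), doors INTEGER);
CREATE TABLE Truck   (id INTEGER PRIMARY KEY REFERENCES Vehicle(id), payload_kg INTEGER);
CREATE TABLE Bicycle (id INTEGER PRIMARY KEY REFERENCES Vehicle(id), gears INTEGER);
"""

# Polymorphic load (like session.Get<Vehicle>(100)): the concrete type is
# unknown, so every subclass table must be outer-joined and inspected.
GET_VEHICLE = """
SELECT v.id, v.make, c.doors, t.payload_kg, b.gears,
       CASE WHEN c.id IS NOT NULL THEN 'Car'
            WHEN t.id IS NOT NULL THEN 'Truck'
            WHEN b.id IS NOT NULL THEN 'Bicycle'
            ELSE 'Vehicle' END AS kind
FROM Vehicle v
LEFT JOIN Car c     ON c.id = v.id
LEFT JOIN Truck t   ON t.id = v.id
LEFT JOIN Bicycle b ON b.id = v.id
WHERE v.id = ?
"""

# Typed load (like session.Get<Truck>(100)): only base + Truck are needed.
GET_TRUCK = """
SELECT v.id, v.make, t.payload_kg
FROM Vehicle v
JOIN Truck t ON t.id = v.id
WHERE v.id = ?
"""

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Vehicle VALUES (100, 'Volvo')")
    conn.execute("INSERT INTO Truck VALUES (100, 12000)")
    vehicle = conn.execute(GET_VEHICLE, (100,)).fetchone()
    truck = conn.execute(GET_TRUCK, (100,)).fetchone()
    conn.close()
    return vehicle, truck
```

With 200 classes, the polymorphic form grows by one join per subclass, while the typed form stays at one join per level of depth.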
So, if you cannot change your database schema, I think you have only one option: be as specific as possible when querying, always giving NHibernate the exact type of entity you need.
If you can change your schema, then I think you should try to simplify the model classes and avoid such a large inheritance tree. NHibernate provides many other options to connect classes together.
Another option is to use one-to-one instead of joined-subclass to split the information of one domain model class across multiple classes without forcing them to inherit from each other. I have encountered the same problem with a much smaller inheritance tree (about 20 of 150 classes share the same base class and are mapped using joined-subclass). Once the family reached 20 classes, I stopped adding classes to the hierarchy where possible and used one-to-one instead.

Database design rules to follow for a programmer

We are working on a mapping application that uses the Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M+ records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents the following problems:
1. Every time there's a new property, we have to make changes in the database, the application code and the front-end. This is all fine, but some properties have to be added for all entities, and that's when it becomes a nightmare to go through 50+ different tables and add new properties.
2. There's no way to find all entities that share a given property, e.g. no way to find all schools, colleges or universities that have a geography dept (without querying schools, universities and colleges separately).
3. Removing a property is equally painful.
4. There are no standards for defining properties in individual tables. The same property can exist with a different name or data type in another table.
5. There's no way to link or group points based on their properties (somewhat related to point 2).
We are thinking of redesigning the whole database, but without a DBA's help and with little professional DB design experience, we are really struggling.
Another problem we're facing with the new design is that there are lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals, banks, etc.) share quite a few attributes with universities, for example ATMs, parking, cafeteria, etc.
We don't really want to keep properties in a separate table [and then link them back to entities with foreign keys], as that would require adding/removing them manually. Also, generalizing properties will result in groups containing 50+ attributes, and not all records (i.e. entities) require those properties.
With that in mind, here's what we are thinking about for the new design:
- Have a separate table for each entity containing some basic info, e.g. id, name, etc.
- Have two tables, attribute type and attribute, to store property information.
- Link each entity (or table, if you like) to attribute using a many-to-many relation.
- Store addresses in a separate table called addresses, linked to entities via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
This design, however, will result in an increased number of joins when fetching data, e.g. to display all "attributes" for a given university we might need a query with 20+ joins to fetch all related attributes in a single row.
We would really like to hear opinions on this design approach and any possible flaws in it.
Thanks for your time.
Because your question is generalized without specific examples, it's hard to truly critique your approach. If you'd like a more in-depth analysis, try sketching an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties, and many of these properties overlap, you might be better off using an entity-attribute-value (EAV) model.
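A minimal EAV sketch using Python's built-in sqlite3 module. The table names (entity, attribute, entity_value) and the sample data are hypothetical, and a single entity table with a kind column stands in for the question's per-entity tables to keep it short. It shows how problem (2), finding every entity with a geography department, becomes one query.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,              -- e.g. 'university', 'college', 'hospital'
    name TEXT NOT NULL
);
CREATE TABLE attribute (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,  -- one canonical name per property
    data_type TEXT NOT NULL          -- e.g. 'bool', 'text', 'int'
);
CREATE TABLE entity_value (
    entity_id    INTEGER NOT NULL REFERENCES entity(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    value        TEXT,
    PRIMARY KEY (entity_id, attribute_id)
);
"""

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", [
        (1, "university", "North Uni"),
        (2, "college", "City College"),
        (3, "hospital", "General Hospital"),
    ])
    conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)", [
        (1, "geography_dept", "bool"),
        (2, "parking", "bool"),
    ])
    conn.executemany("INSERT INTO entity_value VALUES (?, ?, ?)", [
        (1, 1, "yes"), (1, 2, "yes"),  # North Uni: geography dept + parking
        (2, 1, "yes"),                 # City College: geography dept
        (3, 2, "yes"),                 # General Hospital: parking
    ])
    return conn

def entities_with(conn, attribute_name, value="yes"):
    """Find every entity, of any kind, that has the given property value."""
    rows = conn.execute("""
        SELECT e.name FROM entity e
        JOIN entity_value ev ON ev.entity_id = e.id
        JOIN attribute a     ON a.id = ev.attribute_id
        WHERE a.name = ? AND ev.value = ?
        ORDER BY e.name
    """, (attribute_name, value)).fetchall()
    return [name for (name,) in rows]
```

Adding a property becomes an INSERT into attribute rather than an ALTER TABLE on 50+ tables. The cost is that values are stored untyped (TEXT here) and reassembling a wide row needs one join or pivot per attribute, which is the join-count concern raised in the question.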
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
For example, my DB has Puppies, Kittens, and Walruses, all with hasFur and furColor attributes. Remove those attributes from the three tables and create a FurryAnimal table that links to each of them.
Of course, the simplest answer is to not touch the data model. Instead, create views over the underlying tables that you can use to address problems (5), (4) and (2).
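A view-based sketch with Python's sqlite3 module, assuming hypothetical existing tables in which the same property already has a different column name in each table (problem 4). The existing tables stay untouched; a UNION ALL view gives the property one name and lets a single query search all of them (problem 2).

```python
import sqlite3

# Hypothetical existing tables, left as they are. The same property has a
# different column name in each one.
SCHEMA = """
CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT, has_geo_dept INTEGER);
CREATE TABLE college    (id INTEGER PRIMARY KEY, name TEXT, geography INTEGER);
CREATE TABLE school     (id INTEGER PRIMARY KEY, name TEXT, geography_department INTEGER);

-- The view exposes the shared property under one name across all three
-- tables; column names come from the first SELECT of the compound query.
CREATE VIEW education_place AS
    SELECT 'university' AS kind, id, name, has_geo_dept AS has_geography_dept FROM university
    UNION ALL
    SELECT 'college', id, name, geography FROM college
    UNION ALL
    SELECT 'school', id, name, geography_department FROM school;
"""

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO university VALUES (?, ?, ?)",
                     [(1, "North Uni", 1), (2, "South Uni", 0)])
    conn.execute("INSERT INTO college VALUES (1, 'City College', 1)")
    conn.execute("INSERT INTO school VALUES (1, 'Hill School', 0)")
    # One query instead of querying universities, colleges and schools separately.
    return conn.execute(
        "SELECT kind, name FROM education_place "
        "WHERE has_geography_dept = 1 ORDER BY name").fetchall()
```

MySQL supports the same CREATE VIEW ... UNION ALL construct, so the idea carries over without schema changes.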
Problem (1) should not be an issue. There should be one place where your objects are defined, and everything else is generated or derived from that. Just refactor your code until this is the case.
Problem (2) is solved by having a metamodel that describes which properties live where. You probably need this for (1) too.
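A toy metamodel sketch in Python with sqlite3. The METAMODEL dictionary and its entity and property names are hypothetical; the point is that the DDL (problem 1) and the "which tables have this property?" question (problem 2) are both derived from one declaration instead of being maintained by hand.

```python
import sqlite3

# Hypothetical metamodel: the single place where entities and their
# properties are declared. Names and SQL types are illustrative only.
METAMODEL = {
    "university": {"name": "TEXT", "geography_dept": "INTEGER", "parking": "INTEGER"},
    "college":    {"name": "TEXT", "geography_dept": "INTEGER"},
    "hospital":   {"name": "TEXT", "parking": "INTEGER"},
}

def create_table_sql(entity):
    """Derive an entity's DDL from the metamodel instead of hand-writing it."""
    cols = ", ".join(f"{col} {typ}" for col, typ in METAMODEL[entity].items())
    return f"CREATE TABLE {entity} (id INTEGER PRIMARY KEY, {cols})"

def entities_having(prop):
    """Which entities carry this property? Answered from the metamodel alone."""
    return sorted(e for e, props in METAMODEL.items() if prop in props)

def find_all_with(conn, prop):
    """Query every table that has `prop`; return (entity, name) pairs where it is set."""
    selects = [f"SELECT '{e}', name FROM {e} WHERE {prop} = 1"
               for e in entities_having(prop)]
    sql = " UNION ALL ".join(selects) + " ORDER BY 2"
    return conn.execute(sql).fetchall()

def demo():
    conn = sqlite3.connect(":memory:")
    for entity in METAMODEL:
        conn.execute(create_table_sql(entity))
    conn.execute("INSERT INTO university (name, geography_dept, parking) VALUES ('North Uni', 1, 1)")
    conn.execute("INSERT INTO college (name, geography_dept) VALUES ('City College', 1)")
    conn.execute("INSERT INTO hospital (name, parking) VALUES ('General Hospital', 1)")
    return find_all_with(conn, "geography_dept")
```

Identifiers are interpolated into SQL here only because they come from the trusted metamodel, never from user input.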
You might want to avoid the problem entirely by programming this in Smalltalk with Seaside on a GemStone object-oriented database. Then you can just have objects with collections and don't need so many joins.

What does inheritance in UML do with your ERD?

I created a class diagram for a system and now I have to model it into a real system. This means converting it to a database.
There is a base class which has just a few attributes, but many classes inherit from it. My checklist for converting says I have to create a table for every class.
I don't know how to handle the inheritance. I can see that associations are done with PKs and FKs, but what about subclasses?
Is there an article that covers this, or can someone explain it to me?
Thanks in advance,
You have three alternatives to translate class hierarchies into relational tables:
- Create only a table for the superclass: all attributes and associations of the subclasses are moved into the superclass table, where they may take a NULL value.
- Create only tables for the subclasses: all attributes and associations of the superclass are repeated in each subclass table.
- Create tables both for the superclass and for each of the subclasses. In this case, the PK of each subclass table is at the same time an FK to the superclass table (this ensures that every identifier in a subclass table corresponds to an existing identifier in the superclass table). A join between both tables recovers the full information of the element.
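The third alternative can be sketched with Python's built-in sqlite3 module. The person/employee tables are hypothetical; the point is that the subclass key is both the primary key and a foreign key, so an orphaned subclass row is rejected and a join recovers the full element. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (            -- superclass table: shared attributes
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE employee (          -- subclass table: extra attributes only
    id     INTEGER PRIMARY KEY REFERENCES person(id),  -- PK and FK at once
    salary INTEGER
);
"""

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person VALUES (1, 'Ada')")
    conn.execute("INSERT INTO employee VALUES (1, 5000)")
    # Reassemble the full subclass object with a join on the shared key.
    full = conn.execute("""
        SELECT p.id, p.name, e.salary
        FROM person p JOIN employee e ON e.id = p.id
    """).fetchall()
    # An employee row without a matching person row violates the FK.
    try:
        conn.execute("INSERT INTO employee VALUES (2, 4000)")
        orphan_rejected = False
    except sqlite3.IntegrityError:
        orphan_rejected = True
    return full, orphan_rejected
```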
The best strategy depends on the problem (for instance, the number of attributes in each class, the number of levels in hierarchy, whether the hierarchy is disjoint or not,...)
If you want to see some examples, you can upload your hierarchy to the UMLtoDB online service http://modeling-languages.com/content/uml2db-full-code-generation-sql-scripts-databases
Drop all that UML nonsense and keep it simple. It just amounts to duplication for no gain. Do Microsoft or Sun publish UML for .NET or Java? Apart from the odd sample, most of these frameworks don't have any official UML anywhere.
Usually, you design your data model (database tables, PKs, FKs, etc.) in parallel with your actual class diagram. After identifying all the candidate classes and the dependencies of each class, you will probably move on to designing the sequence diagram. By that time, your data model should have been finalized.
I may not fully understand your situation here, but IMO the process you are following seems like a bad idea.

Singular data-keys between application and database?

Is there a paradigm in which I can change a data-key name in one place and one place only, and have it properly be dealt with by both the application and database?
I have resorted most recently to using class constants to map to database field names, but I still have to keep those aligned with the raw database keys.
What I mean is, using PHP as an example, right now I might use
$infoToUpdateUser[ User::FIELD_FIRST_NAME ]
This means that when I change it at the constant, I don't have to search through the code to change all references to that field.
Another place this crops up is in referencing fields. Due to some early poor design decisions, I have, for example, these sorts of tables:
( table name : primary_key )
cats : cat_id
dogs : dog_id
parrots : bird_id (remember, poor design, thus the mismatch between parrots / bird_id)
lizards: lizard_id
etc
Then let's say I have a series of form classes that update records.
AnimalForm
DogForm extends AnimalForm
CatForm extends AnimalForm
ParrotForm extends AnimalForm
etc
Now I want to update a record in the SQL database using an update function in the parent class, AnimalForm, so I don't have to replicate code in 20 subclasses.
However, I don't know of a way to generalize the update query, so currently each subclass has an idFieldName member variable, and the parent class inserts that into the query, like
"UPDATE " . $this->table . " SET <data> WHERE " . $this->idFieldName . " = ?"
It seems sloppy to do it this way but I can't think of a better solution at this point.
Is there a design model or paradigm that links together or abstracts data-key names to be shared as a reference by both a database and an application?
What you are looking for is called an Object-Relational Mapping layer.
An ORM separates the concerns of data access from business logic by mapping a relational database into an object model. Since the ORM does all the translation, if you change the name of a database table or column, you only have to tell the ORM once, and it will properly apply that change to all of your code.
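The core idea an ORM automates, declaring each class-to-table mapping in exactly one place, can be illustrated without one. This is a toy sketch in Python with sqlite3 rather than PHP; the AnimalForm/DogForm/ParrotForm names mirror the question, but the update() method is hypothetical and is not a substitute for a real ORM.

```python
import sqlite3

class AnimalForm:
    # Each subclass declares its table and key column exactly once.
    table: str = ""
    id_field: str = ""

    def __init__(self, conn):
        self.conn = conn

    def update(self, record_id, data):
        """Generic UPDATE for any subclass; returns the SQL it ran."""
        # Table and key names come from the class-level mapping; the column
        # names in `data` must be trusted field names (e.g. class constants).
        # Only values are bound as parameters, since identifiers cannot be.
        assignments = ", ".join(f"{col} = ?" for col in data)
        sql = f"UPDATE {self.table} SET {assignments} WHERE {self.id_field} = ?"
        self.conn.execute(sql, (*data.values(), record_id))
        return sql

class DogForm(AnimalForm):
    table, id_field = "dogs", "dog_id"

class ParrotForm(AnimalForm):
    table, id_field = "parrots", "bird_id"  # the legacy mismatch lives here only
```

Renaming a key column then means changing one class attribute; an ORM generalizes the same idea to every column and relationship.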
Since you indicate that you are using PHP, here is a question that addresses ORM libraries in PHP. Additional information about ORM technologies can be found in Wikipedia.

How to get my SQL DB to match my Domain Driven Design

Okay, I'll be straight with you guys: I'm not sure exactly how Domain Driven my Design is, but I did start by building Model objects and ignoring the persistence layer altogether. Now I'm having difficulty deciding the best way to build my tables in SQL Server to match the models.
I'm building a web application in ASP.NET MVC, although I don't think the platform matters that much. I have the following object model hierarchy:
Property - has properties such as Address and Postcode
    which has one or more
        Case - inherits from PropertyObject
        Quote - inherits from PropertyObject
            which have one or more
                Message - simple class that has properties Reference, Text and SentDate
Case and Quote have a lot of similar properties, so I also have a PropertyObject abstract base class that they inherit from. So Property has an Items property of type List<PropertyObject>, which can contain both Case and Quote objects.
So essentially, I can have a Property that has a few Quotes and Cases and a load of Messages that can belong to either of those.
A PropertyObject has a Reference property (and therefore so do Quote and Case), so any Message object can be related back to a Quote OR Case by its Reference property.
I'm thinking of using the Entity Framework to get my Models in and out of the database.
My initial thoughts were to have four tables: Property, Case, Quote and Message.
They'd all have their own sequential IDs, and the Case and Quote would be related back to Property by a PropertyID field.
The only way I can think of to relate a Message table back to the Case and Quote tables is to have both a RelationID and a RelationType field, but there's no obvious way to tell SQL Server how that relationship works, so I won't have any referential integrity.
Any ideas, suggestions, help?
Thanks,
Anthony
I am assuming Property doesn't also inherit from PropertyObject.
Given that, these tables (Property, Case, Quote and Message) amount to a Table per Concrete Class (TPC) inheritance strategy, which I generally don't recommend.
My recommendation is that you use either:
Table per Hierarchy or TPH - Case and Quote are stored in the same table with one column used as a discriminator, with nullable columns for properties that are not shared.
Table per Type or TPT - add a PropertyObject table with the shared fields and Case and Quote tables with just the extra fields for those types
Both of these strategies will allow you to maintain referential integrity and are supported by most ORMs.
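A Table per Hierarchy sketch with Python's sqlite3 module shows how this restores referential integrity for Message. Table and column names are hypothetical (case_status and quote_amount stand in for the type-specific properties), and Message links to its parent through a foreign key instead of the Reference string.

```python
import sqlite3

SCHEMA = """
CREATE TABLE property (
    id       INTEGER PRIMARY KEY,
    address  TEXT,
    postcode TEXT
);
-- Table per Hierarchy: Case and Quote rows share one table.
CREATE TABLE property_object (
    id           INTEGER PRIMARY KEY,
    property_id  INTEGER NOT NULL REFERENCES property(id),
    kind         TEXT NOT NULL CHECK (kind IN ('Case', 'Quote')),  -- discriminator
    reference    TEXT NOT NULL UNIQUE,
    case_status  TEXT,   -- hypothetical Case-only column, NULL for quotes
    quote_amount REAL    -- hypothetical Quote-only column, NULL for cases
);
-- Message needs only one FK, whatever the concrete type of its parent.
CREATE TABLE message (
    id                 INTEGER PRIMARY KEY,
    property_object_id INTEGER NOT NULL REFERENCES property_object(id),
    message_text       TEXT,
    sent_date          TEXT
);
"""

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO property VALUES (1, '1 High St', 'AB1 2CD')")
    conn.execute("INSERT INTO property_object VALUES (10, 1, 'Case', 'C-001', 'open', NULL)")
    conn.execute("INSERT INTO property_object VALUES (11, 1, 'Quote', 'Q-001', NULL, 250.0)")
    conn.execute("INSERT INTO message VALUES (100, 10, 'About the case', '2010-01-01')")
    conn.execute("INSERT INTO message VALUES (101, 11, 'About the quote', '2010-01-02')")
    rows = conn.execute("""
        SELECT m.id, po.kind, po.reference
        FROM message m JOIN property_object po ON po.id = m.property_object_id
        ORDER BY m.id
    """).fetchall()
    # A message pointing at a nonexistent Case/Quote is rejected by the FK.
    try:
        conn.execute("INSERT INTO message VALUES (102, 99, 'Orphan', '2010-01-03')")
        orphan_rejected = False
    except sqlite3.IntegrityError:
        orphan_rejected = True
    return rows, orphan_rejected
```

The same Message FK works under Table per Type as well, by pointing it at a shared PropertyObject table that Case and Quote tables key into.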
See this for more: Tip 12 - How to choose an inheritance strategy
Hope this helps
Alex
Ahhh... Abstraction.
The trick with DDD is to recognize that abstraction is not always your friend. In some cases, too much abstraction leads to a too-complex relational model.
You don't always need inheritance. Indeed, the major purpose of inheritance is to reuse code. Reusing a structure can be important, but less so.
You have a prominent is-a pair of relationships: Case IS-A Property and Quote IS-A Property.
You have several ways to implement class hierarchies and "is-a" relationships.
As you've suggested, with type discriminators to show which subclass a row really is. This works when you often have to produce a union of the various subclasses: if you need all properties (a union of CaseProperty and QuoteProperty), then this can work out.
You do not have to rely on inheritance; you can have disjoint tables for each set of relationships: CaseProperty and QuoteProperty. To follow the distinction forward, you'd also have CaseMessage and QuoteMessage.
You can have common features in a common table, and separate features in a separate table, and do a join to reconstruct a single object. So you might have a Property table with common features of all properties, plus CaseProperty and QuoteProperty with unique features of each subclass of Property. This is similar to what you're proposing with Case and Quote having foreign keys to Property.
You can flatten a polymorphic class hierarchy into a single table and use a type discriminator and NULLs. A master Property table has a type discriminator for Case and Quote. Attributes of Case are NULL for rows that are supposed to be a Quote; similarly, attributes of Quote are NULL for rows that are supposed to be a Case.
Your question "[how] to relate a Message table back to the Case and Quote tables" stems from a polymorphic set of subclasses. In this case, the best solution might be the following:
Message has an FK reference to Property.
Property has a type discriminator to separate Quote from Case. The Quote and Case class definitions both map to Property, but rely on a type discriminator, and (usually) different sets of columns.
The point is that the responsibility for Property, CaseProperty and QuoteProperty belongs to that class hierarchy, and not Message.
This is where the DDD concept of Services comes in. The Repository for each of your concrete classes persists only that entity, not the related objects.
So you have Property(), which is the base for your CaseProperty() : Property(). This special entity is accessed via CasePropertyService(). That is where you would do your JOINs and such against the related tables in order to build your CaseProperty() special entity (which is not really Case() and Property() on their own, but a combination).
OT: Because .NET doesn't let a class inherit from multiple classes, this is my workaround. DDD is meant to be a guideline to the overall understanding of your domain. I often give my DDD outline to friends and have them try to figure out what it does/represents. If it looks clean and they figure it out, it's clean. If your friends look at it and say, "I have no idea what you are trying to persist here," then go back to the drawing board.
But there's a catch with using any ORM to persist DDD objects (LINQ, Entity Framework, etc.). Have a look at my answer over here:
Stackoverflow: Question about Repositories and their Save methods for domain objects
The catch is that, for an ORM, all objects must have an identity in the database. So this helps you plan your DB structure.
I have recently moved away from using an ORM for direct access and now just have a clean DDD layer. I let my repositories and services control access to the DB layer, and use Velocity to entity-cache my objects. This actually works very well for 1) DB performance: you can design the DB however is most efficient, without being coupled to your domain objects through a direct ORM representation, and 2) a cleaner domain model, with no forced identities on Value Objects and such. Free!
