How to restrict relations in an OWL ontology - data-modeling

I have what I think should be a common problem in OWL ontology design, but it proves a bit hard to get direct advice on this.
Summary: I need to restrict a ternary relation of 3 taxonomically rich class structures so that the range of the relation is dependent on the respective domain. And I don't absolutely have to model this as schema, I just want to store the information "this subclass is connected to that one" somehow. I see as best options object property restrictions or storing the information as instances, which directly realize the relations I need as instances of a general relation.
Situation:
I have a ternary relation that I modeled by an additional class
There are two taxonomically rather rich classes A,B with respectively many subclasses and a third class C with some subclasses that connects to the relationship between subclasses of A and B
I introduced class D which connects to subclasses of A,B,C
An obfuscated example is: A are herbivores, B are carnivores and C are climatological settings. Now I want to model that certain carnivores hunt certain herbivores and, in addition, that certain climatological settings affect the hunting behaviour. But not all climatological settings affect all pairs (a,b), so there's interesting information to be stored.
Since I can't point from climatological settings directly to a pair (a,b), I introduce D, the class of hunting habits, which relate to carnivores as hunting participants, to herbivores as hunted participants and to climatological settings as modifying setting.
Problem:
I now have a relation from A to D which is general, yet I don't want every hunting habit (pair of hunting and hunted) to be affected by the same climatological settings
For example: Maybe I want rain to affect only the hunting habits between macroscopic animals, or I want volcano eruptions to only affect animals that live near volcanoes
Solution Options:
introduce object property restrictions, which are like virtual superclasses (e.g. the class of animals that participate in hunting habits which are affected by volcano eruptions)
directly introduce all the information in the instance level: create instances of all the animals I want to make assertions about, then also create instances of the climatological settings and hunting habits, then link those instances together
create a bunch of sub-relations that have domain and range the specific animals, hunting habits and climatological settings
I suspect the answer will be "It depends". In that case, I am really thankful to understand some of the advantages and disadvantages of the options. Of course there might be a really good solution that I am missing.
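To make the second option (everything on the instance level) concrete, here is a minimal sketch in plain Python rather than OWL. It only illustrates the reification idea from the question: each hunting habit is its own node that links a hunter, a hunted animal and the settings that modify it. All names (`Individual`, `HuntingHabit`, `settings_affecting`, the animals) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Individual:
    """A named individual together with the class it belongs to."""
    name: str
    kind: str


@dataclass
class HuntingHabit:
    """Reified relation D: one node linking hunter, hunted and settings."""
    hunter: Individual
    hunted: Individual
    modified_by: frozenset = field(default_factory=frozenset)


lion = Individual("lion", "Carnivore")
fox = Individual("fox", "Carnivore")
zebra = Individual("zebra", "Herbivore")
rabbit = Individual("rabbit", "Herbivore")
rain = Individual("rain", "ClimateSetting")
eruption = Individual("volcano eruption", "ClimateSetting")

# Each habit carries only the settings that apply to that specific pair.
habits = [
    HuntingHabit(lion, zebra, frozenset({rain})),
    HuntingHabit(fox, rabbit, frozenset({rain, eruption})),
]


def settings_affecting(hunter, hunted):
    """Return the sorted names of settings attached to the (hunter, hunted) habit."""
    for habit in habits:
        if habit.hunter == hunter and habit.hunted == hunted:
            return sorted(s.name for s in habit.modified_by)
    return []
```

The point of the sketch is that the "which setting affects which pair" information lives on the habit node, so nothing has to be stated about all carnivores or all herbivores at once.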

Related

How to represent in UML deletion consequences (nullify or cascading) in non-(whole / part) relations?

I think we agree that there is a correspondence between composition and delete cascading on one side and aggregation and nullify on delete on the other, in case we delete the whole instance in a whole / part relationship.
But what if there is no whole / part relationship between two classes:
I understand that we can only use composition and aggregation in cases where the whole / part hierarchy occurs (Car - Wheels, Apartment - Rooms) and not in cases where this hierarchy does not occur (e.g. Car - Driver classes).
So, how should we represent in UML this situation where there are deletion consequences in the database (nullify or cascading) but no "whole / part" relation?
Do we agree on the initial assumption?
The UML literature frequently refers to part-whole relationships regarding aggregation/composition. However, the definitions in the UML standard have evolved (see UML 2.5.1):
Sometimes a Property is used to model circumstances in which one instance is used to group together a set of instances; this is called aggregation. (...)
Shared: Indicates that the Property has shared aggregation semantics. Precise semantics of shared aggregation varies by application area and modeler.
Composite: Indicates that the Property is aggregated compositely, i.e., the composite object has responsibility for the existence and storage of the composed objects.
Composite aggregation is a strong form of aggregation that requires a part object be included in at most one composite object at a time. If a composite object is deleted, all of its part instances that are objects are deleted with it.
In other words, there is no precise semantic specified for the "aggregation" (i.e. shared aggregation) that would make a difference from a simple association: shared aggregation is a modeling placebo.
The relationship between database constraints and UML modeling is therefore not as straightforward as you would assume.
Close match?
Moreover, there is no general one-to-one mapping between a database schema and a UML model. More than one database schema could be used to implement the same UML class diagram. And conversely, more than one UML diagram may represent the design that is implemented by a given database schema. So the best we can do here is to consider close matches.
In your database, the table with the FOREIGN KEY constraint would correspond to a potential component in a composition, or an element of a shared aggregation, or an associated instance in a simple association:
An ON DELETE CASCADE could help to implement a composite aggregation: it's the only way in SQL to implement the kind of lifecycle management that you would expect in a composition, where the components are deleted when the composite is. It could as well implement an ordinary association, if some business rules/contracts (e.g. UML postconditions) require such a related deletion.
An ON DELETE SET NULL could help to implement a shared aggregation, if its semantics are defined as you mean: if the aggregate is deleted, its elements are not deleted, and can therefore be shared. But it could as well implement any ordinary association, since the deletion of an associated instance would not trigger a deletion either, and the constraint keeps referential integrity clean.
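The two referential actions can be tried out with a small sketch using Python's built-in `sqlite3` module. The tables `car`, `wheel` and `driver` are hypothetical and only borrow the Car - Wheels and Car - Driver examples from the question. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FK actions unless enabled
conn.executescript("""
CREATE TABLE car (id INTEGER PRIMARY KEY);
-- Composition-like: a wheel cannot outlive its car.
CREATE TABLE wheel (
    id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL REFERENCES car(id) ON DELETE CASCADE
);
-- Association-like: a driver survives, only the link is cleared.
CREATE TABLE driver (
    id INTEGER PRIMARY KEY,
    car_id INTEGER REFERENCES car(id) ON DELETE SET NULL
);
INSERT INTO car (id) VALUES (1);
INSERT INTO wheel (id, car_id) VALUES (10, 1), (11, 1);
INSERT INTO driver (id, car_id) VALUES (20, 1);
""")

# Deleting the car triggers both referential actions.
conn.execute("DELETE FROM car WHERE id = 1")

wheel_count = conn.execute("SELECT COUNT(*) FROM wheel").fetchone()[0]
driver_car = conn.execute("SELECT car_id FROM driver WHERE id = 20").fetchone()[0]
```

After the delete, the wheel rows are gone while the driver row remains with an empty `car_id`. Nothing in the schema says whether the UML model behind it used composition, shared aggregation or a plain association, which is the "close match" point made above.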
I agree, that composition means cascading delete, because according to UML the whole is responsible for the existence of the parts. A normal association means, you can delete any object without affecting any other objects that might have a link to it. UML doesn't define semantics for aggregation, so they will behave in the same way. But even if we take into account domain specific semantics for aggregation, I don't think there are examples where this is changed.
However, if you have an association with a multiplicity of 1 on one end, you cannot delete the object on this end, because the objects that have been linked to it would be invalid afterwards. This has nothing to do with composition or aggregation.
So, the remaining question is: how to express cascading delete if there is no whole-part relationship? Are there really examples where this happens? I don't see why Car - Driver could not be in a whole-part relationship. Please bear in mind that we are not talking about real cars or real people. We are talking about a software system in which we want to represent knowledge about the real world for a specific purpose. And if the purpose is to issue boarding cards for cars and their drivers on a ferry, it makes perfect sense to view them as a composition.

Ontology Modelling: Object Property or Data Property?

I am modeling an ontology that should be used to represent knowledge about restaurants, their served dishes, prices and cuisines types.
One of the functionalities of this system will be to allow users browsing for places to eat some specific kind of Dishes or to search restaurants that are specialized in some cuisines.
With that in mind, I have modeled the first version of my ontology, but a question came up.
To represent the specialty of a Restaurant: (a) should I do it as an Object Property, having a class Cuisine, or (b) just as a data property, i.e. being a simple attribute of the Restaurant Class?
Which are the implications of choosing a or b?
In principle, the purpose of an ontology is to describe knowledge about a certain topic. An ontology should partially answer the question "What is a [NameOfTheConceptYouWantToDefine]?". In OWL, the question is answered by providing categories (OWL classes) and binary relations between objects of the categories (OWL individuals) or between an object and a data value (literals). For instance, ask yourself the question "what is an ingredient?". If your answer is "an ingredient is a finite sequence of unicode characters", then you'll need a datatype property to relate something to an ingredient. If you believe that an ingredient is a date or a number, same.
However, if you think an ingredient is an entity that cannot be digitally encoded in a data structure, then you may need a specific class for it, and object properties to relate things to it.
However, ontologies may also be used as a guide to data structures about the things you describe. Sometimes, it is convenient to use a character string as a description of the thing rather than talk about the thing itself. For instance, one may use a string to describe the ingredients of a recipe. This string should not be confused with the ingredient itself. To make this distinction explicit, you can use datatype properties with a clear name like ex:ingredientDescription.
Now, ask yourself "what is a cuisine?". Is it a string, a number, a date? Do you need to describe further the cuisines or do you just need a string-based cuisine description?
The motto of Semantic Web is “Things, not strings”. This is what makes RDF/RDFS/OWL different from other modelling frameworks.
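In OWL 2, object properties can be given several characteristics that data properties cannot: an object property can be declared functional, inverse functional, transitive, symmetric, asymmetric, reflexive or irreflexive, whereas a data property can only be declared functional.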
Also, data properties can not be parts of property chains. All these restrictions are due to decidability reasons.
There exist quite a few cuisines, they can have their own attributes (at least, detailed descriptions) and relations, so I'd suggest to use object properties.
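The practical difference between the two variants can be sketched with plain Python tuples standing in for RDF triples. This is not an OWL toolkit, and every identifier here (`ex:LaPiazza`, `ex:hasSpecialty`, `ex:cuisineName`, `ex:originRegion`, `restaurants_by_region`) is hypothetical. It only shows that a cuisine modeled as a thing can carry its own statements, while a cuisine modeled as a string cannot.

```python
# Variant (b): the cuisine is only a string attached to the restaurant.
data_property_triples = {
    ("ex:LaPiazza", "ex:cuisineName", "Italian"),
    ("ex:Sakura", "ex:cuisineName", "Japanese"),
}

# Variant (a): the cuisine is an individual that can be described further.
object_property_triples = {
    ("ex:LaPiazza", "ex:hasSpecialty", "ex:ItalianCuisine"),
    ("ex:Sakura", "ex:hasSpecialty", "ex:JapaneseCuisine"),
    ("ex:ItalianCuisine", "ex:originRegion", "ex:Europe"),
    ("ex:JapaneseCuisine", "ex:originRegion", "ex:Asia"),
}


def restaurants_by_region(triples, region):
    """Find restaurants whose specialty cuisine originates in the given region."""
    cuisines = {s for (s, p, o) in triples
                if p == "ex:originRegion" and o == region}
    return sorted(s for (s, p, o) in triples
                  if p == "ex:hasSpecialty" and o in cuisines)
```

With variant (a) the region query finds `ex:Sakura` for `ex:Asia`. With variant (b) the same question cannot be answered from the data, because there is nothing to attach a region to except the string itself.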

Data Modeling for consumer goods

A company is trying to build a system that breaks down consumer goods (soft drinks, detergents, beauty products, etc.) down to the very basic components. The aim is to be able to break down all the characteristics of a product into as many enumerable quantities as possible. For instance, a soft drink will have the properties flavor, calories, color, cost, etc. Do note that the products will come from a huge variety of segments and not all properties will be applicable to all products (detergents don't have calories) and similarly sounding properties are not similar (detergents with a lime fragrance is different from a lime flavored soft drink). Also, search is expected to be fast and the database needs to understand relationships between products. Suggest only a data model for the same.
The feature you highlight, that not all properties describe all products, is a classic feature of a class/subclass situation. Or, if you prefer, type/subtype.
Dealing with just that feature of the problem, I'm going to call your attention to the EER (Extended Entity Relationship) model if you want to model your understanding of the subject matter. The EER has a way of depicting what it calls a generalization/specialization pattern. That's a good search term to find detailed descriptions of it. This will adequately depict what you've said you're after.
A word of caution, however. The majority of ER models you'll see here in SO are design models, not conceptual models. That is, they reflect the intent of designing tables made up of columns and rows, with keys and foreign keys, to contain the relevant data.
What I'm recommending is the EER model for a very different purpose. It's to depict the way the data looks to the subject matter expert, not the way the data looks to the database designer. That distinction is lost on those who have never learned the difference between analysis and design.
If your project is a major one, it's worth spending an appropriate amount of time on a detailed analysis of the subject matter before moving on to design. Understanding the problem before you try to solve it is key to successful work on big projects.
Once you have a good conceptual model that captures the analysis, the choice of a data model to reflect the design will depend on what kind of database you've decided to build. It might be relational, it might be multidimensional, it might be unstructured. It depends. The analysis, however, will be more useful if it's implementation independent.

Difference between total specialization and disjoint rule in dbms?

Both look the same to me. I am not getting the exact difference. I searched different forums and sites but it is still not clear. What is the difference between them?
The exact difference is as follows. You have to first separate the total/partial participation constraints to understand this better and we'll take them into account later on.
Disjoint Constraint
Any instance can map to at most one subclass, not more. E.g. a Bank Account can be either a 'Savings Account' or a 'Current Account', not both. So when the database is operational, no instance will be mapped to more than one subclass defined under the superclass. Another example would be a meal, which is mapped to either Veg or Non-veg. It can't be both.
Overlap Constraint
Any instance may map to multiple subclasses of a given superclass. This usually happens when an instance plays multiple roles and is not limited to a single one. E.g. an Employee may map to supervisor, manager or both. This means an employee can play both the roles of a manager and a supervisor. Another example would be a musician who may map to violin player, guitar player, flutist, saxophonist or all of them.
Note:
So when you specify an 'ISA' relationship, your subclasses behave in either a disjoint or an overlapping way. They can't be both, meaning that the disjoint constraint is the exact opposite of the overlap constraint.
Now let's focus on Total and Partial constraints.
Regardless of the overlapping/disjoint constraints, total/partial mean 'do all the instances support the specialization?' question. So when the database is operational and if your ISA relationship is total, any instance coming will be directed to one of the sub classes and nothing will stay in the super class. Conversely, if it's partial, some instances may not have an appropriate subclass so they will stay in the super class.
This brings up the interesting notions as follows.
Total-Disjoint: every instance maps to exactly one subclass and is not shared among other subclasses.
Partial-Disjoint: every instance either stays with the superclass or maps to exactly one subclass.
Total-Overlap: every instance maps to one or more subclasses.
Partial-Overlap: every instance either stays with the superclass or maps to one or more subclasses.
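The four combinations above can be expressed as two independent checks. The following is a minimal sketch, assuming each instance is described simply by the set of subclasses it belongs to. The function `violations` and the sample `employees` data are hypothetical.

```python
def violations(memberships, total, disjoint):
    """List constraint violations.

    memberships maps an instance name to the set of subclasses it belongs to.
    total=True    -> every instance needs at least one subclass.
    disjoint=True -> no instance may have more than one subclass.
    """
    problems = []
    for instance, subclasses in sorted(memberships.items()):
        if total and not subclasses:
            problems.append(f"{instance}: total requires at least one subclass")
        if disjoint and len(subclasses) > 1:
            problems.append(f"{instance}: disjoint allows at most one subclass")
    return problems


employees = {
    "ann": {"Manager"},
    "bob": {"Manager", "Supervisor"},  # overlaps two subclasses
    "cy": set(),                       # stays in the superclass only
}
```

Under partial-overlap (`total=False, disjoint=False`) this data is valid. Under total-disjoint it fails twice: `bob` breaks the disjoint rule and `cy` breaks the total rule, which shows that the two rules constrain different things.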
Total Participation vs Partial Participation.
In total participation, patient must be an outpatient or resident patient, it can not simply be the superclass patient type. Partial participation allows you to have a patient be just a patient.
Under Total specialization, there can be no entities that are of a superclass but are not of any of the subclasses. This is represented by the double line drawn from patient
Disjoint means a patient can be either an outpatient or a resident patient but not both. An instance of the superclass can belong to only one of the subclasses.
So in both diagrams a patient must be one of the subclasses, but disjoint means it cannot be both subclasses.
When you use a total specialization, in the example shown, a patient must be either an outpatient or a resident patient, which means that all patients need to be one of the subtypes (outpatient or resident).
The disjoint rule is different in that a patient needs to be in only one subtype.
Basically, as I understand it, the difference is that total specialization says a supertype instance needs to be in a subtype, and disjoint says it needs to be in only one subtype.
I hope this helps.
There is a link that you can read about all these types and rules:
http://www.tomjewett.com/dbdesign/dbdesign.php?page=subclass.php
There are 2 different decisions {Total participation vs Partial participation} and {Disjoint vs Overlap}.
Participation -> ∈ {subclass1,..,subclassN} vs ∈ {subclass1,..,subclassN, superclass}
Subclass Type -> ∈ {S1 xor S2 xor ... xor SN} vs ∈ {S1 or S2 or ... or SN}
where S1, ..., SN are the subclass entities.

How to get my SQL DB to match my Domain Driven Design

Okay, I'll be straight with you guys: I'm not sure exactly how Domain Driven my Design is, but I did start by building Model objects and ignoring the persistence layer altogether. Now I'm having difficulty deciding the best way to build my tables in SQL Server to match the models.
I'm building a web application in ASP.NET MVC, although I don't think the platform matters that much. I have the following object model hierarchy:
Property - has properties such as Address and Postcode
which have one or more
Case - inherits from PropertyObject
Quote - inherits from PropertyObject
which have one or more
Message - simple class that has properties Reference, Text and SentDate
Case and Quote have a lot of similar properties, so I also have a PropertyObject abstract base class that they inherit from. So Property has an Items property of type List which can contain both Case and Quote objects.
So essentially, I can have a Property that has a few Quotes and Cases and a load of Messages that can belong to either of those.
A PropertyObject has a Reference property (and therefore so do Quote and Case), so any Message object can be related back to a Quote OR Case by its Reference property.
I'm thinking of using the Entity Framework to get my Models in and out of the database.
My initial thoughts were to have four tables: Property, Case, Quote and Message.
They'd all have their own sequential IDs, and the Case and Quote would be related back to Property by a PropertyID field.
The only way I can think of to relate a Message table back to the Case and Quote tables is to have both a RelationID and RelationType field, but there's no obvious way to tell SQL server how that relationship works, so I won't have any referential integrity.
Any ideas, suggestions, help?
Thanks,
Anthony
I am assuming Property doesn't also inherit from PropertyObject.
Given that, these tables (Property, Case, Quote and Message) lead to a Table per Concrete Class (TPC) inheritance strategy, which I generally don't recommend.
My recommendation is that you use either:
Table per Hierarchy or TPH - Case and Quote are stored in the same table with one column used as a discriminator, with nullable columns for properties that are not shared.
Table per Type or TPT - add a PropertyObject table with the shared fields and Case and Quote tables with just the extra fields for those types
Both of these strategies will allow you to maintain referential integrity and are supported by most ORMs.
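As an illustration of the Table per Type option, here is a minimal sketch using Python's built-in `sqlite3` module in place of SQL Server and Entity Framework. Table and column names follow the question where possible. The key columns (`Id`, `PropertyId`, `PropertyObjectId`) and the sample rows are assumptions. Message points at the shared PropertyObject table, so a single ordinary foreign key covers both Case and Quote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Property (
    Id INTEGER PRIMARY KEY, Address TEXT, Postcode TEXT
);
-- Shared fields of Case and Quote live here (TPT base table).
CREATE TABLE PropertyObject (
    Id INTEGER PRIMARY KEY,
    PropertyId INTEGER NOT NULL REFERENCES Property(Id),
    Reference TEXT NOT NULL UNIQUE
);
-- Subtype tables share the base table's key; CASE is a keyword, hence the quotes.
CREATE TABLE "Case" (Id INTEGER PRIMARY KEY REFERENCES PropertyObject(Id));
CREATE TABLE Quote (Id INTEGER PRIMARY KEY REFERENCES PropertyObject(Id));
CREATE TABLE Message (
    Id INTEGER PRIMARY KEY,
    PropertyObjectId INTEGER NOT NULL REFERENCES PropertyObject(Id),
    Text TEXT, SentDate TEXT
);
""")

conn.execute("INSERT INTO Property (Id, Address, Postcode) VALUES (1, '1 High St', 'AB1 2CD')")
conn.execute("INSERT INTO PropertyObject (Id, PropertyId, Reference) VALUES (1, 1, 'Q-001')")
conn.execute("INSERT INTO Quote (Id) VALUES (1)")
conn.execute("INSERT INTO Message (PropertyObjectId, Text) VALUES (1, 'hello')")

# A message that points at no existing Case or Quote is rejected by the database.
try:
    conn.execute("INSERT INTO Message (PropertyObjectId, Text) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Compared with the RelationID plus RelationType idea from the question, the database itself now refuses orphaned messages.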
see this for more: Tip 12 - How to choose an inheritance strategy
Hope this helps
Alex
Ahhh... Abstraction.
The trick with DDD is to recognize that abstraction is not always your friend. In some cases, too much abstraction leads to a too-complex relational model.
You don't always need inheritance. Indeed, the major purpose of inheritance is to reuse code. Reusing a structure can be important, but less so.
You have a prominent is-a pair of relationships: Case IS-A Property and Quote IS-A Property.
You have several ways to implement class hierarchies and "is-a" relationships.
As you've suggested with type discriminators to show which subclass this really is. This works when you often have to produce a union of the various subclasses. If you need all properties -- a union of CaseProperty and QuoteProperty, then this can work out.
You do not have to rely on inheritance; you can have disjoint tables for each set of relationships. CaseProperty and QuoteProperty. You'd have CaseMessage and QuoteMessage also, to follow the distinction forward.
You can have common features in a common table, and separate features in a separate table, and do a join to reconstruct a single object. So you might have a Property table with common features of all properties, plus CaseProperty and QuoteProperty with unique features of each subclass of Property. This is similar to what you're proposing with Case and Quote having foreign keys to Property.
You can flatten a polymorphic class hierarchy into a single table and use a type discriminator and NULL's. A master Property table has type discriminator for Case and Quote. Attributes of Case are nulled for rows that are supposed to be a Quote. Similarly, attributes of Quote are nulled for rows that are supposed to be a Case.
Your question "[how] to relate a Message table back to the Case and Quote tables" stems from a polymorphic set of subclasses. In this case, the best solution might be this.
Message has an FK reference to Property.
Property has a type discriminator to separate Quote from Case. The Quote and Case class definitions both map to Property, but rely on a type discriminator, and (usually) different sets of columns.
The point is that the responsibility for Property, CaseProperty and QuoteProperty belongs to that class hierarchy, and not Message.
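The single-table layout just described can be sketched with Python's built-in `sqlite3` module. It follows this answer's naming, where one Property table holds both Case and Quote rows. The columns `Kind`, `CaseOnlyNote` and `QuoteOnlyAmount` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Property (
    Id INTEGER PRIMARY KEY,
    Kind TEXT NOT NULL CHECK (Kind IN ('Case', 'Quote')),  -- type discriminator
    Reference TEXT NOT NULL UNIQUE,
    CaseOnlyNote TEXT,      -- hypothetical Case-only column, NULL for quotes
    QuoteOnlyAmount REAL    -- hypothetical Quote-only column, NULL for cases
);
CREATE TABLE Message (
    Id INTEGER PRIMARY KEY,
    PropertyId INTEGER NOT NULL REFERENCES Property(Id),
    Text TEXT
);
INSERT INTO Property (Id, Kind, Reference, CaseOnlyNote)
    VALUES (1, 'Case', 'C-001', 'urgent');
INSERT INTO Property (Id, Kind, Reference, QuoteOnlyAmount)
    VALUES (2, 'Quote', 'Q-001', 250.0);
INSERT INTO Message (PropertyId, Text) VALUES (1, 'case opened'), (2, 'quote sent');
""")

# Message never needs to know the subtype; the discriminator is read via the join.
rows = conn.execute("""
    SELECT p.Kind, p.Reference, m.Text
    FROM Message AS m JOIN Property AS p ON p.Id = m.PropertyId
    ORDER BY p.Id
""").fetchall()
```

Message holds one foreign key and stays unaware of the Case/Quote split, which matches the point that the subtype distinction is the hierarchy's responsibility.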
This is where the DDD concept of Services would come in. The Repository for each of your concrete classes only persists that entity, not the related objects.
So you have Property(), which is the base for your CaseProperty() : Property(). This special entity is accessed via CasePropertyService(). This is where you would do your JOINs and such to the related tables in order to generate your CaseProperty() special entity (which is not really Case() and Property() on its own, but a combination).
OT: Because .NET does not let a class inherit from multiple classes, this is my workaround. DDD is meant to be a guideline to the overall understanding of your domain. I often give my DDD outline to friends, and have them try to figure out what it does or represents. If it looks clean and they figure it out, it's clean. If your friends look at it and say, "I have no idea what you are trying to persist here," then go back to the drawing board.
But, there's a catch about using any ORM to persist storage of DDD objects (linq, EntityFramework, etc). Have a look at my answer over here:
Stackoverflow: Question about Repositories and their Save methods for domain objects
The catch is all objects must have an identity in the database for ORM. So, this helps you plan your DB structure.
I have recently moved away from using an ORM to control direct access, and just have a clean DDD layer. I let my repositories and services control access to the DB layer, and use Velocity to entity-cache my objects. This actually works very well for: 1) DB performance, since you can design the database however is most efficient without coupling it to your domain objects through a direct ORM representation, and 2) your domain model, which becomes much cleaner with no forced identities on Value Objects and such. Free!
