I am new to ontologies. After some study, I still do not see what advantage an ontology offers in an application.
I already know that an ontology can provide a more meaningful query interface than a database, and that a reasoner can derive hidden information from it to give better results.
However, by adding a boolean column to a database table to represent each new concept for every instance, or by using a simple if-else rule engine, we can get the same result as an ontology with better performance.
So what exactly is the most important reason for using an ontology in an application?
Please refer to Databases vs Ontologies by Ian Horrocks.
In short:
Databases make the closed-world assumption; ontologies make the open-world assumption.
In databases each individual has a single unique name, but in ontologies an individual might have more than one name.
You can infer implicit information from an ontology; a plain database returns only what is explicitly stored.
The schema of an ontology is typically large and complex, whereas database schemas are smaller and simpler. In other words, the focus on formal semantics is much stronger in ontologies than in databases, because the aim of an ontology is to represent meaning rather than data. Please refer to Ontologies and DB Schema: What's the Difference by Mike Uschold.
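The closed-world versus open-world distinction can be illustrated with a minimal sketch. The fact base and both function names are hypothetical, and no real reasoner is involved; the only point is that a missing fact reads as "false" under one assumption and as "unknown" under the other.

```python
# Hypothetical fact base: a set of (subject, predicate, object) triples.
FACTS = {("alice", "worksFor", "acme")}


def closed_world_holds(fact):
    """Database-style answer: anything that is not stored is treated as false."""
    return fact in FACTS


def open_world_holds(fact):
    """Ontology-style answer: a missing fact is unknown (None), not false."""
    return True if fact in FACTS else None


missing = ("bob", "worksFor", "acme")
print(closed_world_holds(missing))  # the database says "no"
print(open_world_holds(missing))    # the ontology says "don't know"
```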
As part of my research, I have been working on a DMN table and an OWL ontology. I must map rules from the DMN table into an OCQA web ontology file and later work with SHACL to infer some rules.
As an initial step, my supervisor recently gave me a hint about mapping the DMN table to OWL.
His message:
"https://www.omg.org/spec/DMN/1.2/About-DMN
You find an rdf specification on the bottom of the page.
First, you have to define the dmn schema in owl. Then you can directly define or import a dmn table into owl. Or you use the rdf schema."
I know very little about semantic web technology, so I studied some documents and tutorials and understood that an ontology is made of classes, individuals, and relationships (object properties and data properties). I also practiced by developing an example in Protege.
After his reply, I started to read about what an RDF schema or OWL schema is, but unfortunately I did not get any further.
Can someone help me understand the schema, or help me define a suitable schema for the DMN table, as my supervisor suggested?
What should I do to define a schema? Is it just a matter of creating classes and individuals with relations, or of creating some placeholders and then filling them from the DMN mapping? I am clueless. Should I use the RDFPY library to do all of this?
Stack Overflow does not support uploading a DMN file; it is nothing but data in XML. Below is the picture:
[Image: DMN table for quality checks of a concrete curing method]
Please, someone, help me to go further.
I would suggest building your schema in Protege. Your schema (ontology) will have a set of classes and the properties between them. Some ontologies include named individuals; others keep the named individuals separate from the ontology. This is generally the A-Box versus T-Box distinction.
Personally, I use rdflib for Python development, but its performance isn't great. For better performance you can use the Redland Python bindings, but they are a pain in the ass to work with.
As for your table: this is mostly an exercise in mapping from a relational data structure to a graph data structure. Each row represents a node of type 'Inspection' and has relations to the column values. For example, consider this pseudo-RDF:
Inspection_A rdf:type Inspection
Inspection_A hasActivity "Curing"
Inspection_A hasApprover "Site Engineer"
tl;dr: Your ontology should have classes and relations (T-Box). Then, in a separate file, create instances of the classes (A-Box). The T-Box will most likely be an .owl file, while the A-Box is Turtle, N-Quads, JSON-LD, etc. When you load the two together into a graph database, it should be able to take the definitions in your T-Box and reason over your A-Box. You can use rdflib for working with RDF programmatically. Your task is to map a table to a graph: the columns are relations, and the table name is a class.
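The row-to-node mapping described above can be sketched in plain Python before reaching for an RDF library. The rows, the "id" column, the second inspection and the has<Column> naming scheme are all assumptions made for illustration; the real DMN table will have its own columns.

```python
# Hypothetical rows standing in for the DMN table; column names are assumptions.
ROWS = [
    {"id": "Inspection_A", "Activity": "Curing", "Approver": "Site Engineer"},
    {"id": "Inspection_B", "Activity": "Curing", "Approver": "Quality Manager"},
]


def rows_to_triples(rows, class_name, id_column="id"):
    """Turn each table row into one typed node plus one triple per column."""
    triples = []
    for row in rows:
        subject = row[id_column]
        # The table name becomes the class of every row (A-Box typing).
        triples.append((subject, "rdf:type", class_name))
        for column, value in row.items():
            if column != id_column:
                # Each column becomes a relation named has<Column>.
                triples.append((subject, f"has{column}", value))
    return triples


triples = rows_to_triples(ROWS, "Inspection")
for s, p, o in triples:
    print(s, p, o)
```

The resulting tuples correspond one-to-one to the pseudo-RDF lines above, so they could later be handed to a library such as rdflib to be serialized as Turtle.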
My question is: is it mandatory to follow an ontology methodology while developing an ontology?
As per my understanding:
You can develop an ontology without following any specific methodology.
You can strictly follow an ontology methodology according to the need/context of your ontology/project.
Instead of following it strictly, you can partially/loosely follow an ontology methodology according to the need/context of your ontology/project.
You can even merge steps from multiple methodologies according to the need/context of your ontology/project.
One cannot say that one methodology (e.g. the NeOn Methodology) is better than another; you can select any methodology according to your need.
Ontology development guidelines and ontology development methodologies are the same thing.
Please comment/guide me point by point. Thanks.
Here are some examples of scientific papers about ontology construction. These are not "mandatory" but are good guidelines indeed.
Kolas, D., Dean, M., & Hebeler, J. (2006). Geospatial Semantic Web: Architecture of Ontologies (pp. 1-10). IEEE. https://doi.org/10.1109/AERO.2006.1656068
Denaux, R., Dolbear, C., Hart, G., Dimitrova, V., & Cohn, A. G. (2011). Supporting domain experts to construct conceptual ontologies: A holistic approach. Web Semantics: Science, Services and Agents on the World Wide Web, 9, 113-127. https://doi.org/10.1016/j.websem.2011.02.001
Tan, H., Adlemo, A., Tarasov, V., & Johansson, M. E. (2017). Evaluation of an Application Ontology. In CEUR Workshop Proceedings (Vol. 2050). Bolzano, Italy.
There is nothing mandatory about how to develop an ontology. However, people have found pitfalls and repeating patterns, hence some methodologies have been developed.
Which one is best depends heavily on your objectives. There can be no absolute, general rule.
It depends on the user which ontology methodology they want to apply. As long as there is no duplication and data quality is maintained, the result is fine to build and deploy.
Since the general motivation for using ontologies is to eliminate differences in the meaning of terminology among different stakeholders, it is good practice to follow a proven ontology development methodology in order to eliminate ontological inadequacies. One way to validate the ontological adequacy of taxonomic relationships is to apply the OntoClean methodology, which was one of the first attempts to formalize notions of ontological analysis for information systems. It is based on general ontological notions drawn from philosophy, such as essence, identity, unity, rigidity, and dependence. These notions are used to characterize relevant aspects of the intended meaning of the properties, classes, and relations that make up an ontology, and they impose constraints on its taxonomic structure.
A common mistake while developing ontologies is the 'misuse' of the IS_A relation, commonly known as the IS_A overloading problem, which can be detected and prevented by applying the OntoClean method.
So yes, it is good practice to use ontology validation methodologies such as OntoClean to check the ontological adequacy of taxonomic relationships.
My question is about subsumption in databases (versus ontologies). My understanding is that if I have instances that belong to Class B, then Class A, which is the superclass of Class B, will also have these instances.
Ontologies provide built-in subsumption inference through various reasoners (e.g. RDFS++, Pellet, etc.). I would like to know whether it is possible to achieve a similar task in database systems. If so, how flexible or easy is it to implement? Does the database implementation have any advantages over the ontology-based approach?
To clarify, an ontology doesn't perform reasoning. An ontology is the set of logical axioms that a reasoner uses to answer queries with the inferred (and explicit) information in your knowledge base.
There are a number of existing open-source and commercial systems which perform reasoning and could be considered a database, as opposed to something that is purely for reasoning, like Pellet/Fact/Hermit. Examples include AllegroGraph, GraphDB, and Stardog. So obviously, yes, it's possible. There are a couple of different ways to approach the implementation, so you have some flexibility in how you design the system based on your preferred use case.
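As one possible approach in a plain relational database, subsumption can be computed at query time with a recursive common table expression. The sketch below uses Python's built-in sqlite3 module; the table names, column names and sample data are hypothetical, and this is not how the systems named above are implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subclass_of (child TEXT, parent TEXT);
    CREATE TABLE instance_of (individual TEXT, class TEXT);
    -- Hypothetical hierarchy: C is a subclass of B, B is a subclass of A.
    INSERT INTO subclass_of VALUES ('B', 'A'), ('C', 'B');
    INSERT INTO instance_of VALUES ('x', 'B'), ('y', 'C'), ('z', 'A');
""")


def instances_of(connection, cls):
    """Return individuals asserted in cls or in any of its subclasses."""
    query = """
        WITH RECURSIVE descendants(name) AS (
            SELECT ?
            UNION
            SELECT child FROM subclass_of
            JOIN descendants ON subclass_of.parent = descendants.name
        )
        SELECT individual FROM instance_of
        WHERE class IN (SELECT name FROM descendants)
        ORDER BY individual
    """
    return [row[0] for row in connection.execute(query, (cls,))]


print(instances_of(conn, "A"))
```

This covers only class subsumption. Anything beyond that (property hierarchies, domain and range, OWL constructs) would need more rules, which is where a dedicated reasoner earns its keep.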
It's not hard to create a toy reasoner that will parse an ontology and do some basic reasoning like subsumption. But if you want to support (correctly) one of the OWL fragments, and you want to do it at scale, it's not easy.
Go look at how Jena and its reasoners are implemented; that will be enough to get you going.
Sesame also has an RDFS reasoner, so that would be another source for you to review.
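To give a feel for what a toy subsumption reasoner looks like, here is a minimal forward-chaining sketch. It is not taken from Jena or Sesame; the function name and the sample classes are hypothetical, and it handles only the rule "if x is a C and C is a subclass of D, then x is a D".

```python
def infer_types(subclass_of, types):
    """Forward-chain the subclass rule until no new type assertion appears.

    subclass_of: set of (subclass, superclass) pairs.
    types: set of (individual, class) pairs asserted explicitly.
    """
    inferred = set(types)
    changed = True
    while changed:
        changed = False
        for individual, cls in list(inferred):
            for sub, sup in subclass_of:
                if sub == cls and (individual, sup) not in inferred:
                    inferred.add((individual, sup))
                    changed = True
    return inferred


# Hypothetical T-Box and A-Box.
hierarchy = {("Car", "Vehicle"), ("Vehicle", "Thing")}
asserted = {("myFiat", "Car")}
result = infer_types(hierarchy, asserted)
print(sorted(result))
```

The loop is quadratic and rescans everything on each pass, which is fine for a toy but is exactly the part that becomes hard at scale.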
RDF is defined as a way of representing information for the semantic web and for exchanging information on the web, but it is also widely used as a database. So what exactly is RDF about?
OWL is similar to RDF, so why is only RDF used as a database and not OWL?
Asking what RDF is about is far too broad a question; there is a lot to say in that regard. So I'll attempt to briefly answer the specific question.
An RDF database, which is really just a graph database, stores an RDF graph that you can then query with SPARQL. RDF isn't the database, it's the data model.
OWL has a mapping to RDF, but it is generally used to define logical constructs that a reasoner can use to infer new information from existing data. Some RDF databases include a reasoner that can take advantage of OWL, serialized as RDF, to perform reasoning either at query time or eagerly during data updates, and to expose this new, inferred information to users via SPARQL.
What other types of database systems are out there? I recently came across CouchDB, which handles data in a non-relational way. It got me thinking about what other models people are using.
So, I want to know what other types of data model are out there. (I'm not looking for any specifics; I just want to see how other people handle data storage. My interest is purely academic.)
The ones I already know are:
RDBMS (MySQL, PostgreSQL, etc.)
Document-based approach (CouchDB, Lotus Notes)
Key/value pair (BerkeleyDB)
db4o
Quote from the "about" page:
db4o is the open source object database that enables Java and .NET developers to store and retrieve any application object with only one line of code, eliminating the need to predefine or maintain a separate, rigid data model.
Older non-relational databases:
Network Database
Hierarchical Database
Both mostly went out of style when relational became feasible.
Column-oriented databases are also a bit of a different animal. Many of them do support standard relational database SQL though. These are generally used for data warehouse type applications.
Semantic Web is also a non-relational data storage paradigm. There are no relations, all metadata is stored in the same way as data, and every entity has potentially its own unique set of attributes. Open-source projects that implement RDF, a Semantic Web standard, include Jena and Sesame.
Isn't Amazon's SimpleDB non-relational?
db4o, as mentioned by Eric, is an Object-Oriented database management system (OODBMS).
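There are object-based databases (GemStone, for example). I am not sure how you would categorize Google's Bigtable and Amazon's Simple Storage: Bigtable is a distributed wide-column store that is commonly used together with MapReduce, and Amazon's Simple Storage Service is a key-addressed object store.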
A non-relational document oriented database we have been looking at is Apache CouchDB.
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.
Our interest was in providing a distributed user-preferences store that would be immune to shape changes, to which we could serialize preference objects from Java and which we could access just as easily with JavaScript from a XULRunner-based client application.
I'd like to expand on Bill Karwin's answer about the semantic web and triplestores, since it's what I am working on at the moment and I have something to say about it.
The idea behind a triplestore is to store a graph-based database whose data model is rooted in RDF. With RDF, you describe nodes and associations among nodes (in other words, edges). Data is organized in triples:
start node ----relation----> end node
(in RDF terminology: subject --predicate--> object). With this very simple data model, any data network can be represented by adding more and more triples, provided you give a meaning to nodes and relations.
RDF is a very general graph-based data model, well suited to searches for all triples that match a particular combination of subject, predicate, and object. Through a query language called SPARQL you can also perform more complex queries, an operation that boils down to matching a graph pattern against the stored graph, both in terms of topology and in terms of node and edge meaning (we'll see this in a moment). The original SPARQL 1.0 query language offers only SELECT (and similar read-only) queries: no DELETE, no INSERT, no UPDATE. Write operations were added later by SPARQL 1.1 Update. The information you query (e.g. the specific nodes you are interested in) is mapped into a table, which is what you get as the result of your query.
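The basic lookup described above, finding all triples that match some combination of subject, predicate and object, can be sketched in a few lines. The match function and the ns:MyBike node are hypothetical; a real triplestore would use indexes rather than a linear scan.

```python
# A tiny in-memory "triplestore"; ns:MyBike is a made-up extra node.
TRIPLES = [
    ("ns:MyFiat", "rdf:type", "ns:Vehicle"),
    ("ns:MyFiat", "ns:numberOfWheels", 4),
    ("ns:MyBike", "rdf:type", "ns:Vehicle"),
    ("ns:MyBike", "ns:numberOfWheels", 2),
]


def match(triples, s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# All vehicles: fix predicate and object, leave the subject open.
vehicles = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="ns:Vehicle")]
print(vehicles)
```

A SPARQL query with several triple patterns is, loosely speaking, a join over several such lookups that share variables.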
Now, topology in itself does not mean a lot. For this reason, schema languages have been invented; there is more than one, and calling them schema languages is, in some cases, very limiting. The best known and most used today are RDF Schema and OWL (Lite, DL and Full), which descend from the now-obsolete DAML+OIL. The point of these languages is, boiled down, to give a meaning to nodes (by granting them a type, itself described as a triple) and to relationships (edges). You can also define the "domain" and "range" of these relationships, in other words the type of the start node and the type of the end node: you can say, for example, that the property "numberOfWheels" can only be applied to connect a node of type Vehicle to a non-zero integer value.
ns:MyFiat --rdf:type--> ns:Vehicle
ns:MyFiat --ns:numberOfWheels-> 4
Now, you can use these ontologies in two directions: validation and inference. Validation is not that fancy today, but I've seen instances of its use. Inference is what is cool today, because it allows reasoning. Inference basically takes an RDF graph containing a set of triples, takes an ontology, loads them into a triplestore that contains an "inference engine", and like magic the inference engine derives new triples according to your ontological description. Example: suppose you store just this information in the database
ns:MyFiat --ns:numberOfWheels--> 4
and nothing else. No type is specified about this node, but the inference engine will add automatically a triple saying that
ns:MyFiat --rdf:type--> ns:Vehicle
because you said in your ontology that only objects of type Vehicle can be described by a property numberOfWheels.
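The inference step in this example can be sketched as a single rule: whenever a property with a declared domain is used, add the corresponding type triple. The DOMAINS mapping and the function name are hypothetical stand-ins for an ontology and an inference engine.

```python
# Hypothetical ontology fragment: property -> class its subjects must belong to.
DOMAINS = {"ns:numberOfWheels": "ns:Vehicle"}


def infer_from_domain(triples, domains):
    """Add an rdf:type triple for every subject that uses a property with a domain."""
    inferred = set(triples)
    for s, p, _ in triples:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))
    return inferred


stored = {("ns:MyFiat", "ns:numberOfWheels", 4)}
graph = infer_from_domain(stored, DOMAINS)
print(sorted(graph, key=str))
```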
Conversely, you can use the inference engine to validate your data against the ontology, so as to reject non-compliant data (somewhat like XML Schema for XML). In this case, you will need both triples for your data to be accepted by the triplestore.
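Used for validation, the same domain declaration becomes a check instead of a rule. In this minimal sketch (hypothetical names, no real triplestore), a triple is reported as a violation when its subject lacks the type required by the property's domain.

```python
# Hypothetical ontology fragment: property -> class its subjects must belong to.
DOMAINS = {"ns:numberOfWheels": "ns:Vehicle"}


def domain_violations(triples, domains):
    """Return triples whose subject is not explicitly typed with the property's domain."""
    typed = {(s, o) for s, p, o in triples if p == "rdf:type"}
    return [
        (s, p, o) for s, p, o in triples
        if p in domains and (s, domains[p]) not in typed
    ]


incomplete = [("ns:MyFiat", "ns:numberOfWheels", 4)]
complete = incomplete + [("ns:MyFiat", "rdf:type", "ns:Vehicle")]
print(domain_violations(incomplete, DOMAINS))  # one violation: the type triple is missing
print(domain_violations(complete, DOMAINS))    # no violations
```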
Additional characteristics of triplestores are formulas and context-aware storage. Formulas are statements (as usual, subject-predicate-object triples) that describe something hypothetical. I have never used formulas, so I won't go into more detail about something I don't know. Contexts are basically subgraphs: the problem with storing bare triples is that nothing records where they come from. Suppose you have two dealers that describe the price of the same component. One says that the price is 5.99 and the other 4.99. If you just store both triples in a database, you no longer know who stated which piece of information. There are two ways to solve this problem.
One is reification. Reification means that you store additional triples to describe another triple. It's wasteful, and it makes life hell because you have to reify each and every triple you store. The alternative is context awareness. Having context-aware storage is like being able to box a bunch of triples into a container with a label on it (the context identifier). You can now use this identifier as the subject of additional statements, hence describing a bunch of triples in a single action.
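The dealer example can be sketched by storing a fourth element, the context, alongside each triple. All identifiers below (the ctx: labels, ns:publishedBy, the dealer names) are hypothetical; the point is only that one statement about a context describes every triple boxed inside it.

```python
# Each entry is (subject, predicate, object, context).
QUADS = [
    ("ns:Component", "ns:price", 5.99, "ctx:DealerA"),
    ("ns:Component", "ns:price", 4.99, "ctx:DealerB"),
    # Statements about the contexts themselves, kept in their own context.
    ("ctx:DealerA", "ns:publishedBy", "ns:FirstDealer", "ctx:Provenance"),
    ("ctx:DealerB", "ns:publishedBy", "ns:SecondDealer", "ctx:Provenance"),
]


def contexts_stating(quads, s, p, o):
    """Return the context labels in which the given triple was stated."""
    return [c for (qs, qp, qo, c) in quads if (qs, qp, qo) == (s, p, o)]


def publishers_of(quads, s, p, o):
    """Follow the context label to find out who stated the triple."""
    publishers = []
    for ctx in contexts_stating(quads, s, p, o):
        publishers += [
            qo for (qs, qp, qo, _) in quads
            if qs == ctx and qp == "ns:publishedBy"
        ]
    return publishers


print(publishers_of(QUADS, "ns:Component", "ns:price", 4.99))
```

With reification, the same provenance would need several extra triples per price statement instead of one statement per context.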
4. Navigational. Includes Tree/Hierarchy and Graph/Network.
File systems, the semantic web, XML, Object databases, CODASYL, and many others all fit into this category.
Those 4 are pretty much it.
There is also what is referred to as an "inverted index" or "inverted list" database. Software AG's Adabas product would be an example. As with hierarchical databases, these continue to be used in large corporate or university environments because of legacy considerations or because of a performance advantage in certain situations (typically high-end transactional applications).
There are BASE systems (Basically Available, Soft State, Eventually consistent) and they work well with simple data models holding vast volumes of data. Google's BigTable, Dojo's Persevere, Amazon's Dynamo, Facebook's Cassandra are some examples.
The illuminate Correlation Database is a new non-relational database. The Correlation Database Management System (CDBMS) is data-model independent and designed to efficiently handle unplanned, ad hoc queries in an analytical system environment. Unlike relational database management systems or column-oriented databases, a correlation database uses a value-based storage (VBS) architecture in which each unique data value is stored only once and an auto-generated indexing system maintains the context for all values (data is 100% indexed). Queries are performed using natural language instead of SQL (NoSQL).
Learn more at: www.datainnovationsgroup.com