Tools to Generate an Ontology from a Database Schema and Content

Is there a tool that can infer an ontology from information contained in both a database schema and the content in that schema? Let's say that there are tables in the database defining the following:
The types of entities that can exist
Instances of those entities linked to type
The types of relationships that can exist
Instances of those relationships linked to type and the entities concerned
I feel that looking at the schema alone is going to give a much more general ontology than I would like.

Revelytix has a partnership with Global IDs, a data governance and MDM (master data management) company. One of Global IDs' data profiling tools can export RDF that we use to bootstrap a good portion of the ontology, even when that spans multiple databases. The Revelytix technologies then use the resulting ontology to federate data and manage distributed information sources, without having to move the data.
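As a rough, hypothetical illustration of why the content adds something the schema alone cannot, here is a minimal sketch. It assumes the four tables from the question are named entity_type, entity, relationship_type and relationship (hypothetical names, not from any real tool) and uses an in-memory SQLite database. The schema only says "an entity links to an entity"; reading the instance rows lets you derive a specific rdfs:domain and rdfs:range for each relationship type.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables mirroring the four kinds of data described in the question.
SCHEMA = """
CREATE TABLE entity_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT,
                     type_id INTEGER REFERENCES entity_type (id));
CREATE TABLE relationship_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE relationship (
    type_id INTEGER REFERENCES relationship_type (id),
    source_id INTEGER REFERENCES entity (id),
    target_id INTEGER REFERENCES entity (id)
);
"""

def infer_ontology(conn):
    """Return (classes, props): classes are the entity type names; props maps
    each relationship type to the sets of entity types observed as its source
    (domain) and target (range). Relationship types with no rows are skipped."""
    classes = sorted(name for (name,) in conn.execute("SELECT name FROM entity_type"))
    props = defaultdict(lambda: (set(), set()))
    rows = conn.execute("""
        SELECT rt.name, st.name, tt.name
        FROM relationship r
        JOIN relationship_type rt ON rt.id = r.type_id
        JOIN entity s ON s.id = r.source_id
        JOIN entity_type st ON st.id = s.type_id
        JOIN entity t ON t.id = r.target_id
        JOIN entity_type tt ON tt.id = t.type_id
    """)
    for rel, src_type, tgt_type in rows:
        domain, rng = props[rel]
        domain.add(src_type)
        rng.add(tgt_type)
    return classes, dict(props)

def to_turtle(classes, props):
    """Emit Turtle-like lines (prefix declarations omitted)."""
    lines = [f":{c} a owl:Class ." for c in classes]
    for rel, (domain, rng) in sorted(props.items()):
        lines.append(f":{rel} a owl:ObjectProperty .")
        # Only state a domain/range when the data shows a single type;
        # several types would need a union class, which this sketch leaves out.
        if len(domain) == 1:
            lines.append(f":{rel} rdfs:domain :{next(iter(domain))} .")
        if len(rng) == 1:
            lines.append(f":{rel} rdfs:range :{next(iter(rng))} .")
    return "\n".join(lines)
```

With one Person row linked to one Company row through a worksFor relationship, to_turtle(*infer_ontology(conn)) declares both classes and gives worksFor the domain Person and the range Company.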

Related

Mapping a DMN table to an OWL ontology

As part of my research, I have been working on a DMN table and an OWL ontology. I must map rules from the DMN table into an OCQA web ontology file and later work with SHACL to infer some rules.
As a first step, my supervisor recently gave me a hint to map the DMN table to OWL.
His message:
"https://www.omg.org/spec/DMN/1.2/About-DMN
You will find an RDF specification at the bottom of the page.
First, you have to define the DMN schema in OWL. Then you can directly define or import a DMN table into OWL. Or you can use the RDF schema."
I know very little about semantic web technology, so I studied some documents and tutorials and learned that classes, individuals, and relationships (object properties, data properties) make up an ontology. Along those lines, I also practiced by developing an example in Protege.
After his reply, I started reading about what an RDF schema or OWL schema is, but unfortunately I didn't get much further.
Can someone help me to understand the schema or help me define a relevant schema for the DMN table, as my supervisor suggested?
What should I do to define a schema? Is it simply a matter of creating classes and individuals with relations, or of creating some placeholders and then filling them from the DMN mapping? I am clueless. Should I use the RDFPY library for all of this?
"Stack Overflow does not support uploading DMN files. A DMN file is just XML data; below is a picture of the table."
[Image: DMN table for quality checks of a concrete curing method]
Please, someone, help me to go further.
I would suggest building your schema in Protege. Your schema (ontology) will have a set of classes and properties between them. Some ontologies include named individuals; others keep the named individuals separate from the ontology. This is generally referred to as the A-Box vs. T-Box distinction.
Personally, I use rdflib for Python development, but its performance isn't great. For better performance you can use the Redland Python bindings, but they are a real pain to work with.
As for your table: this is mostly an exercise in mapping a relational data structure to a graph data structure. Each row represents a node of type 'Inspection' with relations to the values in its columns. For example, consider this pseudo-RDF:
Inspection_A rdf:type Inspection
Inspection_A hasActivity "Curing"
Inspection_A hasApprover "Site Engineer"
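The row-to-node mapping above can be sketched in plain Python without any RDF library. This is a minimal sketch under stated assumptions: individuals are named by row index (Inspection_1 rather than the Inspection_A used above), each column becomes a has<Column> property, and every cell is written as a plain string literal. These naming choices are hypothetical, not part of DMN or any standard.

```python
import json

def local_name(column):
    """Turn a column header such as 'Check method' into 'CheckMethod'."""
    return "".join(part[:1].upper() + part[1:] for part in column.split())

def row_to_triples(table_class, index, row):
    """Map one table row to (subject, predicate, object) triples: the row
    becomes an individual of table_class, and each column becomes a
    has<Column> property whose object is the cell value as a string literal."""
    subject = f":{table_class}_{index}"
    triples = [(subject, "rdf:type", f":{table_class}")]
    for column, value in row.items():
        # JSON string escaping is also valid escaping for a double-quoted
        # Turtle string literal, so json.dumps gives a safe literal.
        literal = json.dumps(str(value), ensure_ascii=False)
        triples.append((subject, f":has{local_name(column)}", literal))
    return triples

def to_turtle(triples):
    """Serialize triples as Turtle statements (prefix declarations omitted)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

# Hypothetical rows; real ones would be read from the DMN XML.
rows = [{"Activity": "Curing", "Approver": "Site Engineer"}]
for i, row in enumerate(rows, start=1):
    print(to_turtle(row_to_triples("Inspection", i, row)))
```

The output of this step is A-Box data; the class and property definitions belong in a separate T-Box file.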
tl;dr: Your ontology should have classes and relations (T-Box). Then, in a separate file, create instances of the classes (A-Box). The T-Box will most likely be an .owl file, while the A-Box is Turtle, N-Quads, JSON-LD, etc. When you load the two together into a graph database, it should be able to take the definitions in your T-Box and reason over your A-Box. You can use rdflib to work with RDF programmatically. Your task is to map a table to a graph: the table name is a class, and the columns are relations.
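The "table name is a class, columns are relations" rule can be turned into a small T-Box generator. A minimal sketch, assuming a hypothetical namespace http://example.org/inspection#, the has<Column> naming convention, and xsd:string ranges for every column; adjust these to your actual data. The result would go into its own file, separate from the A-Box instances.

```python
PREFIXES = """@prefix : <http://example.org/inspection#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

def table_to_tbox(table_name, columns):
    """Build a Turtle T-Box: the table becomes an owl:Class and every column
    a datatype property whose domain is that class (range assumed xsd:string)."""
    lines = [PREFIXES, f":{table_name} a owl:Class ."]
    for column in columns:
        prop = ":has" + "".join(p[:1].upper() + p[1:] for p in column.split())
        lines.append(
            f"{prop} a owl:DatatypeProperty ;\n"
            f"    rdfs:domain :{table_name} ;\n"
            f"    rdfs:range xsd:string ."
        )
    return "\n".join(lines)

print(table_to_tbox("Inspection", ["Activity", "Approver"]))
```

Declaring rdfs:domain here is what lets a reasoner conclude that anything with hasActivity is an Inspection once the T-Box and A-Box are loaded together.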

Migrate from one Graph Database to another

How can you migrate your data from one graph database (Neo4j, TigerGraph, etc.) to another?
Background:
I have to decide between the W3C standards (RDF(S), OWL) and property graph databases (Neo4j, TigerGraph, etc.).
I know that all triple stores that support the W3C standards also make it possible to simply "pull out" the data and import it into another triple store.
For relational databases there is also a standard, SQL (and its dialects), so with a little effort you can move the data from one relational database to another.
But I can't think of such a solution for graph databases.
As someone already mentioned, there is currently no defined standard for property graphs. There is an ongoing effort to build one, called GQL: https://www.gqlstandards.org/
However, for importing RDF data into property graphs, both TigerGraph and Neo4j provide options to load RDF data into their platforms. This may not give you a complete switch-over from RDF to property graphs, but it can help in certain scenarios.
For moving data between property graph databases, you may have to re-create the schema when you switch platforms. For loading the data itself, most property graph databases support importing from CSV files.
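Since CSV is the common denominator, a migration usually means dumping nodes and edges into two CSV files and loading them with the target's bulk loader. A minimal sketch, assuming the source graph has already been read into plain Python dicts (the field names id, label, source, target and type are hypothetical). Each target database documents its own CSV header conventions, so the headers would need adapting before import.

```python
import csv
import io

def export_graph_to_csv(nodes, edges):
    """nodes: dicts with 'id', 'label' and arbitrary properties.
    edges: dicts with 'source', 'target', 'type' and arbitrary properties.
    Returns (nodes_csv, edges_csv) as strings. Property columns are the union
    of all keys, so nodes/edges lacking a property get an empty cell."""
    def write(rows, fixed):
        extra = sorted({key for row in rows for key in row} - set(fixed))
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(fixed) + extra,
                                restval="", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    return write(nodes, ["id", "label"]), write(edges, ["source", "target", "type"])

nodes = [
    {"id": 1, "label": "Person", "name": "Alice"},
    {"id": 2, "label": "Company", "name": "Acme", "founded": 1999},
]
edges = [{"source": 1, "target": 2, "type": "WORKS_FOR"}]
nodes_csv, edges_csv = export_graph_to_csv(nodes, edges)
print(nodes_csv)
print(edges_csv)
```

Flattening properties into columns loses type information (everything becomes text), which is one reason the schema often has to be re-created on the target side.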

What database to choose for the hierarchical content with relations?

I want to build a review-style website that also hosts other types of content besides reviews. The design combines a hierarchical structure (each content object/record/entity has a parent, a kind of container) with relations: each content object/record/entity has a number of related objects:
an author of the content (i.e. user)
related comments (with their own relations, particularly authors)
item being reviewed as a separate record in DB
images from the gallery
One of the most important things is performance. Relations used to be inefficient in NoSQL databases, as I've read on the net and already found out in other projects. On the other hand, apart from the relations mentioned, the general design has an obvious content-repository-like structure, which directly reflects the hierarchical arrangement of objects (documents, articles, reviews) in which websites are organized. I also really like the loose structure of records in NoSQL. However, I don't care about (or use) versioning and other such NoSQL features.
So I want to combine both worlds, hierarchical and relational, within one project, or rather within its model. Apart from that, I want the project to be RESTful, so that mobile apps can use the same content through the API. Another requirement is that the content should be searchable.
What type of storage would you choose for a project like this?
I decided to go with graph databases. Here's why I rejected the others:
I don't want to use NoSQL (document) databases, since relations are hard to maintain and often require extra (often custom) code infrastructure to handle them; see, e.g., Diaspora's NoSQL problems.
I don't want to use an RDBMS, since structure-based databases impose well-known limitations and don't reflect the domain.
I rejected key-value and BigTable-style databases, as they have very specific use cases.
Graph databases have been used in a number of content-oriented projects and appear to do the job surprisingly well.
You can easily model a hierarchical data structure in SQL with the following (using PostgreSQL):
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES comments (id), -- NULL for top-level comments
    content VARCHAR(1024)
);
Where parent refers to the id of the parent comment.
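Reading a whole comment thread back out of such a table is done with a recursive common table expression. The sketch below runs it against an in-memory SQLite database so it is self-contained; PostgreSQL accepts the same WITH RECURSIVE query, except that its drivers use a different parameter placeholder than `?`. The sample comments are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES comments (id),
    content VARCHAR(1024)
);
INSERT INTO comments VALUES (1, NULL, 'Great review');
INSERT INTO comments VALUES (2, 1, 'Agreed');
INSERT INTO comments VALUES (3, 2, 'Me too');
INSERT INTO comments VALUES (4, NULL, 'Unrelated thread');
""")

def subtree(conn, root_id):
    """Return (id, depth, content) for root_id and all of its descendants."""
    return conn.execute("""
        WITH RECURSIVE thread(id, depth, content) AS (
            SELECT id, 0, content FROM comments WHERE id = ?
            UNION ALL
            -- Repeatedly add the children of rows already in the thread.
            SELECT c.id, t.depth + 1, c.content
            FROM comments c JOIN thread t ON c.parent = t.id
        )
        SELECT id, depth, content FROM thread ORDER BY depth, id
    """, (root_id,)).fetchall()

for comment_id, depth, content in subtree(conn, 1):
    print("  " * depth + content)
```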
If you are after a NoSQL database that exposes a RESTful interface, you could consider CouchDB.
You can then replicate CouchDB to Elasticsearch for more robust searching.
But if your data is relational then I would very much recommend you consider a SQL database like PostgreSQL first.

Is there such a thing as a schema in a graph database?

Is there such a thing as a schema in a graph database? For example, can you specify which types of node can have relationships with which other types of node?
What does such a schema look like?
Graph databases differ a lot in this area, just like das_weezul says. In the general case I think graph databases which are closer to object databases (OODB) also have built-in schema support. One nice thing about graph databases is that they're very well suited for mixing data and metadata. So a common approach for both dealing with schema support and security is to store this kind of metadata in a (sometimes hidden) part of the very same graph.
When it comes to Neo4j (where I'm on the team), there are currently at least two approaches in use for defining schemas:
Defining the schema in annotations, for example using Spring Data Graph.
Using a meta-model layer on top of the database.
You'll find some more reading on this topic over at myNoSQL.
Yes. Schemas are useful for choosing vertex labels, which are part of both Neo4j 2 and TinkerPop 3. I think writing down the schema helps clarify how the graph should be used, although most databases don't support validation against a schema.
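Because most databases won't validate against such a schema, one option is to keep the schema as data and check edges in application code, in the spirit of the meta-model approach mentioned above. A minimal sketch, assuming a hypothetical schema expressed as allowed (source label, relationship type, target label) combinations; the labels are made up for illustration.

```python
# Hypothetical schema: which label combinations each relationship may connect.
ALLOWED_EDGES = {
    ("Person", "WROTE", "Review"),
    ("Person", "WROTE", "Comment"),
    ("Review", "ABOUT", "Product"),
}

def schema_violations(nodes, edges, allowed):
    """nodes maps node id -> label; edges is a list of (source, type, target).
    Returns the edges whose (source label, type, target label) combination
    is not declared in the schema."""
    return [
        (src, rel, dst)
        for src, rel, dst in edges
        if (nodes[src], rel, nodes[dst]) not in allowed
    ]

nodes = {"alice": "Person", "r1": "Review", "p1": "Product"}
edges = [("alice", "WROTE", "r1"), ("r1", "ABOUT", "p1"), ("p1", "WROTE", "alice")]
print(schema_violations(nodes, edges, ALLOWED_EDGES))
```

Here the last edge is reported, because a Product writing a Person is not declared in the schema.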
I have a longer post on how to draw the schema as a graph. http://lambdazen.blogspot.com/2014/01/do-property-graphs-have-schemas.html
A graph database will always have a rudimentary schema consisting of (at least) Vertex and Edge objects, where an Edge can contain data about a particular relationship. The degree to which you can add to this schema varies widely across implementations. You may be able to customize the schema by inheriting from Edge and/or Vertex objects, for instance.
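The "inherit from Vertex and Edge" idea can be illustrated with plain Python classes. This is not any particular database's API; Vertex, Edge, Person, Company and WorksFor are hypothetical names. Each Edge subclass narrows which vertex types it may connect, and the check runs when the edge is created.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: Vertex
    target: Vertex
    properties: dict = field(default_factory=dict)
    # Plain class attributes (not dataclass fields): subclasses narrow them
    # to restrict which vertex types this kind of edge may connect.
    source_type = Vertex
    target_type = Vertex

    def __post_init__(self):
        if not isinstance(self.source, self.source_type):
            raise TypeError(f"{type(self).__name__} needs a {self.source_type.__name__} "
                            f"source, got {type(self.source).__name__}")
        if not isinstance(self.target, self.target_type):
            raise TypeError(f"{type(self).__name__} needs a {self.target_type.__name__} "
                            f"target, got {type(self.target).__name__}")

# The "schema" is expressed through subclassing.
class Person(Vertex):
    pass

class Company(Vertex):
    pass

class WorksFor(Edge):
    source_type = Person
    target_type = Company

ok = WorksFor(Person("alice"), Company("acme"))
```

Calling WorksFor(Company("acme"), Person("alice")) raises a TypeError, which is exactly the kind of node-type restriction the question asks about.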
If the graph database uses an underlying RDBMS or ODBMS then you may have access to more powerful schema creation and manipulation capabilities.

Database system that is not relational

What other types of database systems are out there? I recently came across CouchDB, which handles data in a non-relational way. It got me thinking about what other models people are using.
So I want to know what other data models are out there. (I'm not looking for specifics; I just want to see how other people handle data storage. My interest is purely academic.)
The ones I already know are:
RDBMS (MySQL, PostgreSQL, etc.)
Document-based approach (CouchDB, Lotus Notes)
Key/value pair (BerkeleyDB)
db4o
Quote from the "about" page:
db4o is the open source object database that enables Java and .NET developers to store and retrieve any application object with only one line of code, eliminating the need to predefine or maintain a separate, rigid data model.
Older non-relational databases:
Network Database
Hierarchical Database
Both mostly went out of style when relational became feasible.
Column-oriented databases are also a bit of a different animal. Many of them do support standard relational database SQL though. These are generally used for data warehouse type applications.
Semantic Web is also a non-relational data storage paradigm. There are no relations, all metadata is stored in the same way as data, and every entity has potentially its own unique set of attributes. Open-source projects that implement RDF, a Semantic Web standard, include Jena and Sesame.
Isn't Amazon's SimpleDB non-relational?
db4o, as mentioned by Eric, is an Object-Oriented database management system (OODBMS).
There are object-based databases (GemStone, for example). I am not sure how you would categorize Google's Bigtable and Amazon's Simple Storage Service, but neither is relational.
A non-relational document oriented database we have been looking at is Apache CouchDB.
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.
We wanted a distributed user-preferences store that would be immune to changes in the shape of the data, to which we could serialize preference objects from Java and then read them just as easily with JavaScript from a XULRunner-based client application.
I'd like to detail more on Bill Karwin's answer about semantic web and triplestores, since it's what I am working on at the moment, and I have something to say on it.
The idea behind a triplestore is to store a graph-based database whose data model is rooted in RDF. With RDF, you describe nodes and associations among nodes (in other words, edges). Data is organized in triples:
start node ----relation----> end node
(in RDF terms: subject --predicate--> object). With this very simple data model, any data network can be represented by adding more and more triples, provided you give a meaning to nodes and relations.
RDF is very general, and as a graph-based data model it is well suited to searches for all triples with a particular subject, predicate, or object, in any combination. Through a query language called SPARQL you can also perform more complex queries, which boil down to matching a subgraph pattern against the graph, both in terms of topology and in terms of node-edge meaning (we'll see this in a moment). The original SPARQL (1.0) allowed only SELECT (and similar) queries: no DELETE, no INSERT, no UPDATE; SPARQL 1.1 Update later added INSERT and DELETE operations. The information you query for (e.g. the specific nodes you are interested in) is mapped into a table, which is what you get as the result of your query.
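To make "pattern matching over triples" concrete, here is a toy evaluator for what SPARQL calls a basic graph pattern: a list of triple patterns in which terms starting with "?" are variables. It is only a minimal sketch of the matching idea (no SPARQL parser, no indexes, no FILTER or OPTIONAL), and the sample triples reuse the vehicle example from this answer.

```python
def match(triples, pattern, binding):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind an unbound variable, or check that a bound one agrees.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new

def query(triples, patterns):
    """Evaluate a basic graph pattern: join the matches of all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(triples, pattern, b)]
    return bindings

triples = [
    ("ns:MyFiat", "rdf:type", "ns:Vehicle"),
    ("ns:MyFiat", "ns:numberOfWheels", "4"),
    ("ns:MyBike", "rdf:type", "ns:Vehicle"),
    ("ns:MyBike", "ns:numberOfWheels", "2"),
]
# Roughly: SELECT ?v WHERE { ?v rdf:type ns:Vehicle . ?v ns:numberOfWheels 4 }
print(query(triples, [("?v", "rdf:type", "ns:Vehicle"), ("?v", "ns:numberOfWheels", "4")]))
```

Each result is one row of the result table: a mapping from variable names to the nodes they matched.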
Now, topology by itself does not mean a lot. For this, a schema language was invented. Actually, more than one, and calling them schema languages is, in some cases, very limiting. The best known and most widely used today are RDF Schema and OWL (Lite, DL, and Full), which descend from the now-obsolete DAML+OIL. The point of these languages is, to boil it down, to give meaning to nodes (by granting them a type, which is also described as a triple) and to relationships (edges). You can also define the "range" and "domain" of these relationships, in other words the type of the start node and the type of the end node: for example, you can say that the property "numberOfWheels" can be applied only to connect a node of type Vehicle to a non-zero integer value.
ns:MyFiat --rdf:type--> ns:Vehicle
ns:MyFiat --ns:numberOfWheels--> 4
Now, you can use these ontologies in two directions: validation and inference. Validation is not that fancy today, but I've seen it used. Inference is what is cool today, because it allows reasoning. Inference basically takes an RDF graph containing a set of triples and an ontology, combines them in a triplestore that contains an "inference engine", and, like magic, the inference engine derives new triples according to your ontological description. Example: suppose you store only this information in the database
ns:MyFiat --ns:numberOfWheels--> 4
and nothing else. No type is specified for this node, but the inference engine will automatically add a triple saying that
ns:MyFiat --rdf:type--> ns:Vehicle
because you said in your ontology that only objects of type Vehicle can be described by a property numberOfWheels.
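The inference just described is the RDFS domain rule: from (p rdfs:domain C) and (s p o), conclude (s rdf:type C). A minimal sketch that applies only this one rule until nothing new appears; a real reasoner also handles subclasses, subproperties, ranges and much more.

```python
def infer_domain_types(triples):
    """Apply the RDFS domain rule until a fixed point is reached and return
    the original triples plus the inferred rdf:type triples."""
    result = set(triples)
    while True:
        domains = {(p, c) for p, rel, c in result if rel == "rdfs:domain"}
        new = {
            (s, "rdf:type", c)
            for s, p, _ in result
            for dp, c in domains
            if p == dp
        } - result
        if not new:
            return result
        result |= new

data = {
    ("ns:numberOfWheels", "rdfs:domain", "ns:Vehicle"),  # from the ontology
    ("ns:MyFiat", "ns:numberOfWheels", "4"),             # the only stored fact
}
print(infer_domain_types(data) - data)
```

The only new triple is ns:MyFiat rdf:type ns:Vehicle, matching the example above.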
Conversely, you can use the inference engine to validate your data against the ontology, so as to reject non-compliant data (sort of like XML Schema for XML). In this case, you will need both triples for your data to be accepted by the triplestore.
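The validation reading treats the domain declaration as a constraint instead of a source of new facts. Note that standard RDFS semantics would infer the missing type rather than reject the data; rejecting it is a closed-world check, which is the kind of validation the W3C's SHACL standard addresses. A minimal sketch of that closed-world check, using the same vehicle example:

```python
def domain_violations(triples):
    """Closed-world check: report (subject, property) pairs where the subject
    uses a property with a declared rdfs:domain but lacks an explicit
    rdf:type of that domain class."""
    triples = set(triples)
    domains = {(p, c) for p, rel, c in triples if rel == "rdfs:domain"}
    return sorted({
        (s, p)
        for s, p, _ in triples
        for dp, c in domains
        if p == dp and (s, "rdf:type", c) not in triples
    })

untyped = {
    ("ns:numberOfWheels", "rdfs:domain", "ns:Vehicle"),
    ("ns:MyFiat", "ns:numberOfWheels", "4"),
}
typed = untyped | {("ns:MyFiat", "rdf:type", "ns:Vehicle")}
print(domain_violations(untyped))  # the Fiat is missing its type
print(domain_violations(typed))    # both triples present, so nothing to report
```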
Additional characteristics of triplestores are formulas and context-aware storage. Formulas are statements (as usual, subject-predicate-object triples) that describe something hypothetical. I have never used formulas, so I won't go into details of something I don't know. Contexts are basically subgraphs: the problem with storing plain triples is that nothing tells you where they came from. Suppose two dealers state a price for the same component. One says the price is 5.99 and the other 4.99. If you just store both triples in a database, you no longer know who stated which. There are two ways to solve this problem.
One is reification. Reification means that you store additional triples to describe another triple. It's wasteful and makes life hell, because you have to reify each and every triple you store. The alternative is context awareness. Having context-aware storage is like being able to put a bunch of triples into a box with a label on it (the context identifier). You can then use this identifier as the subject of additional statements, hence describing a bunch of triples in a single action.
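In current RDF terminology these contexts are usually called named graphs, and a triple plus its context is a quad (as in the N-Quads format). A minimal sketch of the dealer example with plain tuples; the ctx: and ns: identifiers and the ns:statedBy property are hypothetical.

```python
# Each statement is a quad: (subject, predicate, object, context).
quads = [
    ("ns:Component42", "ns:price", "5.99", "ctx:dealerA"),
    ("ns:Component42", "ns:price", "4.99", "ctx:dealerB"),
    # Statements *about* the contexts, kept in a default context.
    ("ctx:dealerA", "ns:statedBy", "ns:DealerA", "ctx:default"),
    ("ctx:dealerB", "ns:statedBy", "ns:DealerB", "ctx:default"),
]

def prices_with_source(quads, component):
    """Return (price, source) pairs by looking up who stated each context."""
    sources = {s: o for s, p, o, _ in quads if p == "ns:statedBy"}
    return sorted(
        (o, sources.get(g))
        for s, p, o, g in quads
        if s == component and p == "ns:price"
    )

print(prices_with_source(quads, "ns:Component42"))
```

One statement about ctx:dealerA covers every triple in that context, whereas reification would need extra triples for each price statement.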
4. Navigational. Includes Tree/Hierarchy and Graph/Network.
File systems, the semantic web, XML, Object databases, CODASYL, and many others all fit into this category.
Those 4 are pretty much it.
There is also what is referred to as an "inverted index" or "inverted list" database. Software AG's Adabas product would be an example. As with hierarchical databases, these continue to be used in large corporate or university environments because of legacy considerations or due to a performance advantage in certain situations (typically high-end transactional applications).
There are BASE systems (Basically Available, Soft State, Eventually consistent) and they work well with simple data models holding vast volumes of data. Google's BigTable, Dojo's Persevere, Amazon's Dynamo, Facebook's Cassandra are some examples.
The illuminate Correlation Database is a new, revolutionary non-relational database. The Correlation Database Management System (CDBMS) is data-model independent and designed to efficiently handle unplanned, ad hoc queries in an analytical system environment. Unlike relational database management systems or column-oriented databases, a correlation database uses a value-based storage (VBS) architecture in which each unique data value is stored only once and an auto-generated indexing system maintains the context for all values (data is 100% indexed). Queries are performed using natural language instead of SQL (NoSQL).
Learn more at: www.datainnovationsgroup.com
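The value-based storage idea (each distinct value stored once, with an index recording every place it occurs) resembles the inverted-list databases mentioned earlier. The following is a generic toy sketch of that idea only, not how the product above is actually implemented; the ValueStore class and its methods are hypothetical.

```python
from collections import defaultdict

class ValueStore:
    """Stores each distinct value once; an index maps every value to the
    (record id, field) positions where it occurs, so every value is indexed."""

    def __init__(self):
        self.values = []                # value id -> value (stored once)
        self.value_ids = {}             # value -> value id
        self.index = defaultdict(set)   # value id -> {(record_id, field)}

    def insert(self, record_id, record):
        for field, value in record.items():
            vid = self.value_ids.get(value)
            if vid is None:
                vid = len(self.values)
                self.values.append(value)
                self.value_ids[value] = vid
            self.index[vid].add((record_id, field))

    def find(self, value):
        """Return every (record id, field) where value occurs, in any field."""
        vid = self.value_ids.get(value)
        return sorted(self.index[vid]) if vid is not None else []

store = ValueStore()
store.insert(1, {"city": "Paris", "name": "Ann"})
store.insert(2, {"city": "Paris", "name": "Bob"})
print(store.find("Paris"))
```

Because lookups go by value rather than by a predefined column, an ad hoc query such as "where does Paris appear?" needs no index to be planned in advance.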
