I am modeling an ontology in Protégé that represents a manufacturing process, and I want to create instances from my dataset, which comes in different formats (CSV, XLS, CAD, etc.). My question is whether this can be solved with Apache Jena.
Related
As part of my research, I have been working with a DMN table and an OWL ontology. I must map rules from the DMN table into an OCQA web ontology file and later work with SHACL to infer some rules.
As an initial step in this process, my supervisor recently gave me a hint about mapping the DMN table to OWL.
His message:
"https://www.omg.org/spec/DMN/1.2/About-DMN
You will find an RDF specification at the bottom of the page.
First, you have to define the DMN schema in OWL. Then you can directly define or import a DMN table into OWL. Or you use the RDF schema."
I have very little experience with semantic web technology, so I studied some documents and tutorials and understood that an ontology is made up of classes, individuals, and relationships (object properties and data properties). I also practiced developing an example ontology in the Protégé software.
After his reply, I started to read about what an RDF schema or OWL schema is, but unfortunately I didn't get much further.
Can someone help me understand what such a schema is, or help me define a relevant schema for the DMN table, as my supervisor suggested?
What should I do to define a schema? Is it just a matter of making classes and individuals with relations, or of creating some placeholders and then filling them from the DMN mapping? I am clueless. Should I use a library such as RDFLib to do all of this?
Stack Overflow does not support uploading a DMN file; it is nothing but data in XML. Below is a picture of the table.
[Image: DMN table for quality checks of a concrete curing method]
Please, can someone help me move forward?
I would suggest making your schema in Protégé. Your schema (ontology) will have a set of classes and the properties between them. Some ontologies include named individuals; others keep the named individuals separate from the ontology. This is, broadly, the A-Box vs. T-Box distinction.
Personally, I use rdflib for Python development, but its performance isn't great. For better performance you can use the Redland Python bindings, but they are a pain to work with.
As for your table: this is mostly an exercise in mapping from a relational data structure to a graph data structure. Each row represents a node of type 'Inspection' and has relations to the column values. For example, consider the pseudo-RDF:
Inspection_A rdf:type Inspection
Inspection_A hasActivity "Curing"
Inspection_A hasApprover "Site Engineer"
tldr; Your ontology should have classes and relations (T-Box). Then, in a separate file, create instances of the classes (A-Box). The T-Box will most likely be an .owl file, while the A-Box is Turtle, N-Quads, JSON-LD, etc. When you load the two together in a graph database, it should be able to take the definitions in your T-Box and reason over your A-Box. You can use rdflib for programmatically working with RDF. Your task is to map a table to a graph: the columns are relations, and the table name is a class.
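Since the question that started this thread asked about Apache Jena, here is a minimal Jena (Java) sketch of that row-to-triples mapping, just to make the idea concrete; the http://example.org/quality# namespace and the class/property names are illustrative placeholders, and rdflib has equivalent calls if you prefer Python.

import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDF;

public class InspectionMapping {
    public static void main(String[] args) {
        // Hypothetical namespace for the illustration; replace with your own.
        String ns = "http://example.org/quality#";

        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("q", ns);

        // T-Box terms (class and properties) that the instance data will use.
        Resource inspectionClass = model.createResource(ns + "Inspection");
        Property hasActivity = model.createProperty(ns, "hasActivity");
        Property hasApprover = model.createProperty(ns, "hasApprover");

        // One table row becomes one node of type Inspection,
        // with each column value attached as a property.
        model.createResource(ns + "Inspection_A")
             .addProperty(RDF.type, inspectionClass)
             .addProperty(hasActivity, "Curing")
             .addProperty(hasApprover, "Site Engineer");

        // Write the resulting A-Box out as Turtle.
        model.write(System.out, "TURTLE");
    }
}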
I've been struggling to understand when these technologies are useful from a practical standpoint, and how they are different from each other. Could an expert check my understanding?
Graph databases: These are easier to understand and manage than relational databases when relationships are complex, inherited, inferred with varying degrees of confidence, and likely to change. Some examples: a user doesn't know how much depth they'll need in a hierarchy; is inferring relationships from social media with varying degrees of confidence in ID resolution, topic resolution, and the strength of a relationship; or doesn't know what kinds of call-center data they're going to want to store. All of these can be stored in relational databases, but the schemas will need constant updates. Graph databases are also more performant for certain tasks.
Ontologies: These formal and standardized representations of knowledge are used to break down data silos. For instance, let's say a B2B sales company derives revenue from several different lines of business, which take one-time payments, subscriptions, sales of IP, and consulting services. The revenue data is stored in many different databases with lots of idiosyncrasies. An ontology allows the user to define a "customer payment" as anything that "creates or refunds revenue," so that subject matter experts can appropriately label the payments in their databases. Ontologies can be used with either graph databases or relational databases, but the emphasis on class inheritance makes them far easier to implement in a graph database, where the taxonomy of classes can be easily modeled.
Knowledge graph: A knowledge graph is a graph database where the language (meaning, the entity and relationship taxonomies) is governed by an ontology. So in our B2B example, "customer payment" edges have one-time payment, subscription, etc. subtypes, and connect "customer" classes to "line of business" classes.
Is that basically correct?
A graph database (GD) is a database that can store graph data, which primarily has three types of elements: nodes, edges, and properties. Two popular types of graph databases are (1) Resource Description Framework (RDF)-based graph databases, e.g. Blazegraph, and (2) labeled property graph (LPG)-based graph databases, e.g. Neo4j. RDF represents knowledge in the form of subject–predicate–object (S-P-O) triples, such as John livesIn London; because its nodes and edges cannot hold properties, additional nodes or literals need to be added to represent them. An LPG represents knowledge as nodes and edges where both can hold properties in the form of key:value pairs. E.g., a node can have the label Person and the properties name: Tom Hanks, born: 1956.
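To make the RDF side concrete, here is a small Jena (Java) sketch of the John livesIn London statement; since RDF nodes and edges cannot carry key:value properties, the born attribute becomes one more triple with a literal object. The example.org URIs are made up for the illustration.

import org.apache.jena.rdf.model.*;

public class TripleExample {
    public static void main(String[] args) {
        String ns = "http://example.org/";  // illustrative namespace
        Model m = ModelFactory.createDefaultModel();

        Resource john = m.createResource(ns + "John");
        Resource london = m.createResource(ns + "London");
        Property livesIn = m.createProperty(ns, "livesIn");
        Property born = m.createProperty(ns, "born");

        // The S-P-O statement: John livesIn London.
        m.add(john, livesIn, london);

        // What an LPG would store as a node property becomes another triple
        // whose object is a literal.
        m.addLiteral(john, born, 1956);

        m.write(System.out, "TURTLE");
    }
}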
An ontology is a description of concepts and their relationships, using instances of concepts, attributes of instances (and classes), restrictions on classes, and rules (if-then statements). These rules describe the logical inferences that can be drawn from the assertions/axioms that comprise the overall theory the ontology describes. An upper-level ontology (e.g. DOLCE) describes general concepts and relations, whereas a domain ontology (e.g. the Gene Ontology) describes concepts and relations in a particular domain. A graph database may have an ontology at its schema level for logical consistency checking.
Generally, a knowledge graph (KG) is an organization of a knowledge base as a graph with nodes and links between the nodes. An example of an early KG is WordNet, which captures semantic relationships between words and their meanings. Later, Google developed its Google Knowledge Graph (GKG), building on DBpedia and Freebase and using RDFa, Microdata, and JSON-LD content extracted from indexed web pages, with the schema.org vocabulary used to organize the nodes. Google reported that the GKG held around 70 billion facts.
Graph databases support queries, but not the logical inference that requires an ontology. If the connections within the data are the primary focus (e.g. friends of a friend), retrieval is more important than storage, and the data model changes often, then a graph database would be a good fit.
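As a sketch of the query side, the friends-of-a-friend case can be expressed as a SPARQL query and run with Jena's ARQ engine over an in-memory graph; every name and URI below is invented for the example.

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class FriendOfFriendQuery {
    public static void main(String[] args) {
        String ns = "http://example.org/";  // illustrative namespace
        Model m = ModelFactory.createDefaultModel();
        Property knows = m.createProperty(ns, "knows");
        m.add(m.createResource(ns + "Alice"), knows, m.createResource(ns + "Bob"));
        m.add(m.createResource(ns + "Bob"), knows, m.createResource(ns + "Carol"));

        // Friends of Alice's friends.
        String sparql =
            "PREFIX ex: <http://example.org/> " +
            "SELECT ?fof WHERE { ex:Alice ex:knows ?friend . ?friend ex:knows ?fof }";

        try (QueryExecution qe = QueryExecutionFactory.create(sparql, m)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().get("fof"));  // ex:Carol
            }
        }
    }
}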
An ontology is used when we need to infer new knowledge from the given knowledge. For example, given (1) Socrates is a Man and (2) all Men are Mortal, the reasoner or inference engine attached to an ontology can infer the new fact (3) Socrates is Mortal. This is made possible by the description logic axioms that the Web Ontology Language (OWL) uses to describe resources. OWL is serialized using the Resource Description Framework (RDF).
An ontology is also used when we need to check the consistency of the data model. For example, if an axiom says Human and Sponge are disjoint classes and we make John (a human) an instance of both the Human and Sponge classes, the model will fail the consistency test.
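Both points can be reproduced with the OWL rule reasoner that ships with Jena. The sketch below is illustrative and assumes that reasoner's validity report flags the disjointness violation; all URIs are invented for the example.

import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.*;
import org.apache.jena.vocabulary.*;

public class ReasoningExample {
    public static void main(String[] args) {
        String ns = "http://example.org/";  // illustrative namespace
        Model m = ModelFactory.createDefaultModel();

        // T-Box: every Man is Mortal.  A-Box: Socrates is a Man.
        Resource man = m.createResource(ns + "Man");
        Resource mortal = m.createResource(ns + "Mortal");
        Resource socrates = m.createResource(ns + "Socrates");
        m.add(man, RDFS.subClassOf, mortal);
        m.add(socrates, RDF.type, man);

        // Consistency-test data: Human and Sponge are disjoint, yet John is both.
        Resource human = m.createResource(ns + "Human");
        Resource sponge = m.createResource(ns + "Sponge");
        Resource john = m.createResource(ns + "John");
        m.add(human, OWL.disjointWith, sponge);
        m.add(john, RDF.type, human);
        m.add(john, RDF.type, sponge);

        // Wrap the data in an inference model backed by Jena's OWL rule reasoner.
        InfModel inf = ModelFactory.createInfModel(ReasonerRegistry.getOWLReasoner(), m);

        // Inferred triple: Socrates rdf:type Mortal.
        System.out.println("Socrates is Mortal: " + inf.contains(socrates, RDF.type, mortal));

        // The disjointness violation should show up in the validity report.
        ValidityReport report = inf.validate();
        System.out.println("Consistent: " + report.isValid());
    }
}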
A taxonomy is the IS-A class hierarchy that forms the backbone of an ontology.
Knowledge graphs are often associated with linked open data (LOD) projects built upon standard Web technologies such as HTTP, RDF, URIs, and SPARQL. A KG may use ontologies for reasoning and graph databases to store the knowledge. Several large organizations have introduced their own KGs.
I'm using PowerDesigner 16.1 to create CDM and PDM models, and Bizagi to create business process models. I want to exchange data and models between the two, so I need a file format that is supported by both tools.
It's my first time writing here, but I'm really stuck with a problem:
Is it possible to use the Jena reasoner on a NoSQL database, like Neo4j, that is already filled with data?
I have a Neo4j graph representing a bunch of triples, and I would like to use the Jena API and the Jena reasoner on them. I thought about using the SDB/TDB component of Jena, but I don't understand how to actually load the data into my model, since SDB seems to work only with SQL databases, and going through the whole TDB Javadoc seems to be a bit too much.
Should I define some kind of configuration file for the TDB model too?
Thanks very much for the help.
You should have a look at this link, which describes the connection between Neo4j and triple stores, or at least the possible connections.
The Neo4j model is very different from the RDF model, which Jena uses. RDF is composed of triples, meaning subjects, predicates, and objects. In a graph composed of triples, note the use of URIs for identifying resources, and note that the nodes are typically atomic data values: a URI, a simple number, a string, and so on.
In Neo4j, nodes are "property containers", meaning that they're not just URIs; they're bundles of information. Relationships connect nodes. So RDF predicates are somewhat like Neo4j relationships, but Neo4j nodes are not like RDF resources and literals.
Your main task, if you want to use reasoners over a Neo4j database, is going to be to pull the data out of Neo4j and format it as a set of RDF triples. You can then put those RDF triples into a Jena Model. Once you have that Jena model in memory, you can use the existing Jena APIs to run reasoners over it.
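A rough sketch of that pipeline with the Jena API, assuming the extraction step (a Cypher query via the Neo4j driver, or an export file) has already produced plain subject/predicate/object strings; the example.org URIs are placeholders.

import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.*;

public class Neo4jToJena {
    public static void main(String[] args) {
        // Stand-in for whatever your Neo4j extraction produces:
        // each row is a subject URI, a predicate URI, and an object URI.
        String[][] extracted = {
            {"http://example.org/Inspection_A", RDF.type.getURI(), "http://example.org/Inspection"},
            {"http://example.org/Inspection", RDFS.subClassOf.getURI(), "http://example.org/Activity"}
        };

        // Load the extracted triples into an in-memory Jena Model.
        Model model = ModelFactory.createDefaultModel();
        for (String[] t : extracted) {
            model.add(model.createResource(t[0]),
                      model.createProperty(t[1]),
                      model.createResource(t[2]));
        }

        // Attach a reasoner and query the inferred closure.
        InfModel inf = ModelFactory.createInfModel(ReasonerRegistry.getRDFSReasoner(), model);
        System.out.println(inf.contains(
                model.createResource("http://example.org/Inspection_A"),
                RDF.type,
                model.createResource("http://example.org/Activity")));  // true via rdfs:subClassOf
    }
}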
I am in the process of creating a Neo4j implementation of the Jena API. For this, I am subclassing ObjectProperty, Individual, and OntClass and implementing queries against the Neo4j endpoint.
The main problem is that, for reasoning, the whole database must be loaded into memory in order to use Jena's in-memory reasoning. My solution at the moment is to use a "reasoning" server to process this and write the new results back to the main persistence layer. This, of course, is only suitable for long-term recommendation systems and not for UI interactions.
Have a look here for the current state of the project:
https://github.com/uzuzjmd/Wissensmodellierung
Path:
competence-database\src\main\scala\uzuzjmd\competence\persistence\neo4j
Anyone interested in participating in this open-source project, feel free to contact me.
I'm a bit late to the party, but you can use https://github.com/neo4j-labs/neosemantics to export the Neo4j data as triples and read them into a Jena Model.
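For completeness, a minimal sketch of the Jena side, assuming neosemantics has already written the graph out as an RDF file; the file names below are placeholders.

import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.ReasonerRegistry;

public class LoadNeosemanticsExport {
    public static void main(String[] args) {
        // "neo4j-export.ttl" stands in for whatever RDF file neosemantics produced.
        Model data = ModelFactory.createDefaultModel();
        data.read("neo4j-export.ttl", "TURTLE");

        // Optionally combine the data with your ontology (T-Box) and a reasoner.
        Model schema = ModelFactory.createDefaultModel();
        schema.read("ontology.owl");  // placeholder file name

        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner(), schema, data);
        System.out.println("Triples after inference: " + inf.size());
    }
}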
Is there a tool that can infer an ontology from information contained in both a database schema and the content in that schema? Let's say that there are tables in the database defining the following:
The types of entities that can exist
Instances of those entities linked to type
The types of relationships that can exist
Instances of those relationships linked to type and the entities concerned
I feel that looking at the schema alone is going to give a much more general ontology than I would like.
Revelytix has a partnership with Global IDs, a data governance and MDM company. One of Global IDs' data profiling tools can export RDF to us to bootstrap a good portion of the ontology, even when it spans multiple databases. The Revelytix technologies then use the resulting ontology to federate data and manage distributed information sources, without having to move the data.
Good Luck
Greg