When an ontology is created from text consisting of a set of sentences, it can be useful to bind any given concept to all the sentences in which it appears. But that inevitably leads to nasty duplication of sentences when the usual Annotation is used for storing the related text.
E.g., the sentence "Attributive language is the base language which allows: Atomic negation (negation of concept names that do not appear on the left-hand side of axioms), Concept intersection, Universal restrictions, Limited existential quantification." would need to be copied as an Annotation to the Entities: Attributive language, Language, Atomic negation, Negation, Concept names, Axiom, Concept intersection, Universal restriction, Limited existential quantification.
What is, in your opinion, a good way to avoid copying the same sentence to several locations while still keeping traces from each Entity to the relevant sentences?
I would create a named individual with an IRI and attach the sentence to it, then add a relationship from the concepts to the individual.
The individual might have a type, e.g., Sentence, but this is not necessary. The properties used can be annotation properties or data/object properties.
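For concreteness, here is a minimal sketch of that pattern using Python's rdflib; the namespace, the Sentence class, and the text and mentionedIn properties are all invented for illustration:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/onto#")  # hypothetical namespace
    g = Graph()

    # One named individual per sentence; the text is stored exactly once.
    g.add((EX.sentence_42, RDF.type, EX.Sentence))  # typing is optional
    g.add((EX.sentence_42, EX.text, Literal("Attributive language is the base language which allows: ...")))

    # Each concept mentioned in the sentence points to the same individual.
    for concept in (EX.AttributiveLanguage, EX.AtomicNegation, EX.ConceptIntersection):
        g.add((concept, EX.mentionedIn, EX.sentence_42))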
From what I've understood from reading the book Foundations of Semantic Web Technologies concerning OWL formal semantics, Hitzler et al. have put forward two kinds of model-theoretic semantics for SROIQ: one is a model-checking-like approach (where we check different interpretations to find the models of our KB) and the other is via predicate logic. In the latter approach, the book simply translates SROIQ into predicate logic.
However, the book is a bit confusing for me and I do not know if I have gotten some points right, so here are my questions:
Is model-checking a kind of model-theoretic semantics?
Is translating your SROIQ into predicate logic also a model-theoretic semantics?
How is translating SROIQ into predicate logic a kind of "semantics"? Is that because after the conversion, we can pick up FOL semantics and algorithms?
Thanks!
P.S. This is a link to the book! Just in case!
Model-theoretic semantics is how you determine the meaning of axioms - i.e., what rules are available to build a model or to check that a given one is valid. Two examples: OWL semantics and RDF semantics. They have a lot of overlap but are not identical.
Model checking does not define semantics; it applies the semantic rules defined by a model theory to actual knowledge bases. Translation to another formalism, e.g., predicate logic, might maintain the same semantics (i.e., all models stay the same in both formalisms), but this depends on the formalisms involved.
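To make the distinction concrete, here is a toy sketch in plain Python (all names invented): the semantics says an interpretation satisfies A SubClassOf B exactly when the set interpreting A is a subset of the set interpreting B; model checking merely applies that rule to a particular interpretation.

    # A candidate interpretation: each class name maps to a set of domain elements.
    interpretation = {
        "Student": {"alice", "bob"},
        "Person": {"alice", "bob", "carol"},
    }

    def satisfies_subclass(interp, sub, sup):
        # Semantic rule for 'sub SubClassOf sup': the extension of sub
        # must be a subset of the extension of sup.
        return interp[sub] <= interp[sup]

    # Model checking: apply the rule to this particular interpretation.
    print(satisfies_subclass(interpretation, "Student", "Person"))  # True -> a model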
Please share the difference between homonyms and synonyms in data science with examples.
Synonyms for concepts:
When you determine that two concepts are synonyms (say, sofa and couch), you use the property owl:equivalentClass. The entailment here is that any instance that was a member of the class sofa is now also a member of the class couch, and vice versa. One of the nice things about this approach is that the "context" of this equivalence is automatically scoped to the ontology in which you make the equivalence statement. If you had a very small mapping ontology between a furniture ontology and an interior decorating ontology, you could say in the map that these two are equivalent. In another situation, if you needed to retain the (subtle) difference between a couch and a sofa, you could do that by merely not including the mapping ontology that declared them equivalent.
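A minimal sketch of such a mapping ontology with Python's rdflib (the IRIs are hypothetical):

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL

    FURN = Namespace("http://example.org/furniture#")    # hypothetical
    DECOR = Namespace("http://example.org/decorating#")  # hypothetical

    mapping = Graph()  # a small, separate mapping ontology
    mapping.add((FURN.Sofa, OWL.equivalentClass, DECOR.Couch))
    # Load this graph only when the two classes should be treated as one;
    # leave it out to retain the (subtle) distinction.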
Homonyms for concepts:
As Led Zeppelin says, "and you know sometimes words have two meanings…" What happens when a "word" has two meanings is that we have what WordNet would call "word senses." In a particular language, a set of characters may represent more than one concept. One example is the English word "mole," for which WordNet has 6 word senses. The Semantic Web approach is to give each sense its own namespace; for instance, I might refer to the counterspy mole as cia:mole and the burrowing rodent as mammal:mole. (These are shortened qnames for what would be full namespace names.) The nice thing about this is that, if the CIA ever needed to refer to the rodent, they could unambiguously refer to mammal:mole.
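Sketched with rdflib, using made-up namespace IRIs:

    from rdflib import Namespace

    CIA = Namespace("http://example.org/cia#")        # hypothetical
    MAMMAL = Namespace("http://example.org/mammal#")  # hypothetical

    # Two distinct IRIs, so the two senses of "mole" can never be confused.
    counterspy = CIA.mole  # http://example.org/cia#mole
    rodent = MAMMAL.mole   # http://example.org/mammal#mole
    print(counterspy == rodent)  # False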
1. Homonyms - words that have the same sound but differ in meaning.
2. Synonyms - words that have the same or almost the same meaning.
Homonyms
Machine learning algorithms are now the subject of ethical debate. Bias, in layman's terms, is a pre-formed view created before the facts are known. In machine learning and data mining, it refers to an estimating procedure's tendency to produce estimates or predictions that are, on average, off target.
In machine learning, a policy's or rule's strength can be measured in a variety of ways, one of which is called confidence. "Decision trees" here are diagrams that show how decisions are being made and what outcomes are available. To normalise a statistic is to rescale it to match the scale of the other variables in the model.
In statistics, confidence is a metric for determining how reliable a sample is (we are 95 percent confident that the average blood sugar in the group lies between X and Y, based on a sample of N patients). Decision tree algorithms are methods that repeatedly divide the data into pieces that become more and more homogeneous in terms of the outcome measure.
To a statistician, a graph is a graphical representation of data - a plot or chart. To a computer programmer, a graph is a data structure that captures the ties and links among items. Likewise, in databases, normalisation is the act of organising relational tables and their columns so that table relationships are consistent.
Synonyms
The most common non-time-series data format is the spreadsheet or table, in which each column is a variable and each row is a record. What a statistician calls a record is also called an instance, sample, or example; what a statistician calls a variable is, in computer science and machine learning, called an attribute, input variable, or feature.
The term "estimation" is sometimes used for prediction, though generally only for numeric outcomes; in statistics, estimation more often refers to the use of a sample statistic to measure some population quantity.
Finally, modelling in machine learning and artificial intelligence often begins with very low-level predictor data, and predictive modelling involves developing aggregations of these low-level predictors into more informative "features".
I am trying to design a database to act as a language dictionary where each word is associated not only with its definition but also with its grammatical "taxon". E.g., it should look something like this:
"eat": verb.imperative
"eat": verb.present
"ate": verb.past
"he": pronoun.masculine.singular
"she": pronoun.feminine.singular
"heiress": noun.feminine.singular
"heirs": noun.masculine.plural
"therefore": adverb
"but": conjunction
It seems that a natural data structure to hold such a grammatical "taxonomy" should be some kind of tree or graph. Although I haven't thought it through, I presume that should make it easier to perform queries of the type
plural OF masculine OF "heiress" -> "heirs"
At this point, however, I am just trying to come up with the least ineffective way to store such a dictionary in a regular relational database (namely LibreOffice Base). What do you suggest the data schema should be like? Is there something more efficient than the brute-force method where I'd have as many boolean columns as there are grammatical types and sub-types? E.g., "she" would be true for the columns pronoun, feminine, and singular, but false for all other columns (verb, adverb, conjunction, etc.)?
This is a really wide-open question, and there are many applications and much related research. Let me give some pointers based on software I have used.
One column would be the lexeme, for example "eat." A second column would give the part of speech, which in your data above would be a string or other identifier that shows whether it is a verb, pronoun, noun, adverb or conjunction.
It might make sense to create another table for verb information. For example, tense, aspect and mood might each be separate columns. But these columns would only make sense for verbs. For the nouns table, the columns would include number (singular, plural) and gender, and perhaps whether it is a count or mass noun. Pronouns would also include person (first, second or third person).
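A sketch of such a schema, shown here through Python's sqlite3 so it is self-contained (LibreOffice Base uses its own SQL dialect, and every table and column name below is illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE lexeme (
        id   INTEGER PRIMARY KEY,
        form TEXT NOT NULL,  -- e.g. 'eat', 'heir'
        pos  TEXT NOT NULL   -- 'verb', 'noun', 'pronoun', ...
    );
    CREATE TABLE verb_info (  -- columns that only make sense for verbs
        lexeme_id INTEGER REFERENCES lexeme(id),
        tense TEXT, aspect TEXT, mood TEXT
    );
    CREATE TABLE noun_info (  -- columns that only make sense for nouns
        lexeme_id INTEGER REFERENCES lexeme(id),
        number TEXT, gender TEXT, countable INTEGER
    );
    """)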
Do you plan to include every form of every word? For example, will this database store "eats" and "eating" as well as "jumps" and "jumping?" It is much more efficient to store rules like "-s" for present singular and "-ing" for progressive. Then if there are exceptions, for example "ate," it can be described as having the underlying form of "eat" + "-ed." This rule would go under the "eat" lexeme, and there would be no separate "ate" entry.
Also, there are rules such as the plural changing words that end in -y to -ies. This would go under the plural noun suffix ("-s"), not individual nouns.
With these things in mind, I offer a more specific answer to your question: No, I do not think this data is best described hierarchically, nor with a tree or graph, but rather analytically and relationally. LibreOffice Base would be a reasonable choice for a fairly simple project of this type, using macros to help with the processing.
So for:
"heiress" -> masculine plural = "heirs"
The first thing to do would be to analyze "heiress" as "heir" + feminine. Then compose the desired wordform by combining "heir" and "-s."
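A toy sketch of that analyze-then-compose step in Python (the one-entry lexicon and the suffix rule stand in for real table lookups):

    # Hypothetical lexicon: surface form -> (stem, features)
    LEXICON = {"heiress": ("heir", {"gender": "feminine"})}

    def analyze(form):
        return LEXICON[form]

    def compose(stem, number):
        if number == "plural":
            # the '-y' -> '-ies' rule lives with the suffix, not with each noun
            return stem[:-1] + "ies" if stem.endswith("y") else stem + "s"
        return stem

    stem, features = analyze("heiress")  # 'heir' + feminine
    print(compose(stem, "plural"))       # -> 'heirs'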
I was going to add a list of related software such as Python NLTK, but for one thing, the list of available software is nearly endless, and for another, software recommendations are off-topic for stackoverflow.
In the W3C OWL specification the properties of individuals are divided into two groups: datatype properties and object properties. Object properties are defined (as one article I found put it):
"Object properties (owl:ObjectProperty) relates individuals (instances) of two OWL classes.
So in essence, object properties could also be called "individual properties", because they don't just point to generic objects of any sort, they point specifically to individuals.
Now, if this was just some random spec I would assume the authors simply chose their names poorly, but this is a W3 spec, and one specifically on the storage of knowledge no less; I have to assume people thought about the names of things!
Therefore, I'm hoping someone here can explain this seemingly strange naming choice. After all, you can call damn near anything in any spec an "ObjectFoo", because Object is a super-generic term, but normally people use the most specific term possible, not the least, when they name things.
Is there perhaps some other case where an ObjectProperty can refer to something other than an individual, or anything else I'm missing that might explain this?
The term "ObjectProperty" was (most probably) coined to distinguish it from "DatatypeProperty", in the sense that the latter can only have (datatyped) literal values, as opposed to full objects. And yes, it's not just individuals that can be the value of an ObjectProperty, classes can be values of them too - although if you do that, your ontology is no longer valid OWL DL and becomes OWL Full instead. But it's valid from a modeling perspective.
My answer is also a comment on Jeen's. OWL DL and OWL Full are two distinct languages with different (even disjoint) abstract syntaxes. OWL DL syntax is defined in terms of its structural specification. OWL Full syntax is RDF. The structural specification of OWL does not even contain any RDF triples.
Now, in OWL DL, it is invalid to relate two classes with an object property. Object properties can only relate instances of owl:Thing. They cannot relate literals, properties, ontologies, or datatypes either. If you call the notion IndividualProperty, you create an inconsistency in naming, because DatatypeProperty does not mean a property that relates datatypes - it is a property whose values are literals, named after the datatype of its range. Following that pattern, the parallel name would be ClassProperty, since an object property's range is a class. Otherwise you would have to change both names, turning DatatypeProperty and ObjectProperty into LiteralProperty and IndividualProperty.
All in all, there are various ways to deal with this, and the working group chose the one that gathered the most votes. That's how it always works in a standardisation group.
The name comes from "Subject Predicate Object". Object properties are those that link the Subject with the Object, as opposed to Datatype Properties, which are merely attributes.
Some scripting languages, such as Python and Javascript, have arrays (aka lists) as a separate datatype from hash tables (aka dictionaries, maps, objects). In other scripting languages, such as PHP and Lua, an array is merely a hash table whose keys happen to be integers. (The implementation may be optimized for that special case, as is done in the current version of Lua, but that's transparent to the language semantics.)
Which is the better approach?
The unified approach is more elegant in the sense of having one thing rather than two, though the gain isn't quite as large as it might seem at first glance, since you still need to have the notion of iterating over the numeric keys specifically.
The unified approach is arguably more flexible. You can start off with nested arrays, find you need to annotate them with other stuff, and just add the annotations, without having to rework the data structures to interleave the arrays with hash tables.
In terms of efficiency, it seems to be pretty much a wash (provided the implementation optimizes for the special case, as Lua does).
What am I missing? Does the separate approach have any advantages?
Having separate types means that you can make guarantees about performance, and you know that you will have "normal" semantics for things like array slicing. If you have a unified system, you need to work out what all operations, such as slicing, mean on sparse arrays.
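Python happens to have both types, which makes the slicing point easy to illustrate; a unified language would have to pick one of several plausible meanings for slicing the second value below:

    dense = ["a", "b", "c"]
    print(dense[0:2])  # ['a', 'b'] -- well-defined, contiguous

    sparse = {0: "a", 2: "c"}  # a "sparse array": integer keys with a gap
    # sparse[0:2] raises TypeError in Python; a unified language must decide:
    # should slicing skip the hole, insert a nil/None, or renumber the keys?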
An array is more than a table intentionally restricted to consecutive integer keys. It's a sequence, a collection of n items (not key-value pairs, just the values) with a well-defined order. This is, in my opinion, a data structure that has no place for additional data in the form of non-integer keys. It's conceptually simpler.
Also, implementing the two separately may be simpler, especially when considering the addition of an optimization (which is apparently obscure enough that a performance-oriented language like Lua didn't implement it for many, many years) that makes arrays perform well.
Also, the flexibility point is arguable. If the need for more complex annotation arises, it's quite possible that you'll soon also need polymorphism, in which case you should just switch to objects with an array among other attributes.
As mentioned, there are speed and complexity issues involved in having two separate types. However, one of the things that I find important about having two types is that it expresses the intent of the datastore.
A list is an ordered collection of items. The items and their order ARE the data; the keys exist only conceptually, to describe the order of the items.
A map is a mapping of keys to values. The keys and the values they represent ARE the data.
The point to note is that the keys are part of the data for a map, but they're not for a list... conceptually. When you choose one data type over the other, you're specifying your intent.
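A small Python illustration of that difference in intent (the data is made up):

    # List: the values and their order ARE the data; indices are incidental.
    waypoints = ["home", "office", "gym"]  # reordering changes the meaning

    # Map: the keys and the values they point to ARE the data; order is incidental.
    capitals = {"France": "Paris", "Japan": "Tokyo"}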
I'll add as an aside that every language that shares a data type for lists and maps has certain... annoyances that come along with it. There are always certain concessions that need to be made to allow the combination, and they can bite you sometimes. It's generally not a big deal, but it can be annoying.