I have a language that describes a domain by predicates of arbitrary arity.
For example, in first-order logic I can write the following rule:
PersonLivesInCountry(x,y) <-> Person(x), PersonLivesInState(x,z), StateIsInCountry(z,y)
In this example, the pairs in PersonLivesInCountry are exactly those that satisfy the other restrictions. The key point here is expressing the join between those restrictions, for example the z shared by the PersonLivesInState and StateIsInCountry predicates.
In my language I always have this structure, so I need to write a grammar that expresses this join condition.
Is it possible to ensure that each exit index of a relation is in an (inner) join with its successor?
Context-free grammars cannot express name agreement (unless the set of possible names is finite, and even then it's a pain).
That sort of analysis generally falls into the category of "semantic analysis", which is usually easier to write as a (series of) walks over the abstract syntax tree (AST). In that model, the parser just builds the AST.
For simple parsers, semantic analysis can be done during the parse, by attaching semantic actions to each syntactic rule. However, this often leads to badly-factored code which is harder to maintain and modify in the long run.
That said, it is not uncommon to see semantic actions used not just to build an AST but also a symbol table. I'm not as fond of this as I used to be; these days I limit symbol analysis in the parser to entering the symbol in a table of interned symbols, which simplifies memory management and speeds up symbol lookup during semantic analysis. YMMV. Good luck with the project.
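To make the AST-walk idea concrete, here is a minimal sketch of what the join check could look like, assuming the parser has already produced a list of body atoms, each carrying a predicate name and its argument variables. The `Atom` class and `check_joins` function are hypothetical names, not from any existing library:

```python
from dataclasses import dataclass

@dataclass
class Atom:
    predicate: str
    args: tuple   # variable names, e.g. ("x", "z")

def check_joins(body):
    """Check that the exit (last) variable of each atom is joined
    with -- i.e., reappears in -- the following atom."""
    errors = []
    for current, nxt in zip(body, body[1:]):
        exit_var = current.args[-1]
        if exit_var not in nxt.args:
            errors.append(
                f"exit variable '{exit_var}' of {current.predicate} "
                f"is not joined with {nxt.predicate}{nxt.args}")
    return errors

# Person(x), PersonLivesInState(x, z), StateIsInCountry(z, y)
body = [Atom("Person", ("x",)),
        Atom("PersonLivesInState", ("x", "z")),
        Atom("StateIsInCountry", ("z", "y"))]
assert check_joins(body) == []
```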
From what I've understood reading the book Foundations of Semantic Web Technologies on OWL formal semantics, Hitzler et al. put forward two kinds of model-theoretic semantics for SROIQ: one is a model-checking-like approach (where we check different interpretations to find the models of our KB) and the other goes via predicate logic. In the latter approach, the book simply translates SROIQ into predicate logic.
However, the book is a bit confusing for me and I do not know if I have gotten some points right, so here are my questions:
Is model-checking a kind of model-theoretic semantics?
Is translating your SROIQ into predicate logic also a model-theoretic semantics?
How is translating SROIQ into predicate logic a kind of "semantics"? Is that because after the conversion, we can pick up FOL semantics and algorithms?
Thanks!
P.S. This is a link to the book! Just in case!
Model-theoretic semantics is how you determine the meaning of axioms, i.e., what rules are available to build a model or to check that one is valid. Two examples: OWL semantics and RDF semantics. They have a lot of overlap but are not identical.
Model checking does not define semantics; it applies already-defined semantic rules to actual knowledge bases. Translation to another formalism, e.g., predicate logic, might preserve the semantics (i.e., all models stay the same in both formalisms), but this depends on the formalisms involved.
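For illustration, here is the standard textbook translation that maps SROIQ concepts and axioms to first-order formulas; roughly, the interpretations satisfying a DL axiom correspond exactly to the FOL models of its translation, which is why the translation counts as giving a semantics:

$$
\begin{aligned}
\pi_x(A) &= A(x) && \text{for an atomic concept } A\\
\pi_x(C \sqcap D) &= \pi_x(C) \land \pi_x(D)\\
\pi_x(\exists R.C) &= \exists y\,(R(x,y) \land \pi_y(C))\\
\pi_x(\forall R.C) &= \forall y\,(R(x,y) \to \pi_y(C))\\
C \sqsubseteq D &\;\leadsto\; \forall x\,(\pi_x(C) \to \pi_x(D))
\end{aligned}
$$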
Please share the difference between homonyms and synonyms in data science with examples.
Synonyms for concepts:
When you determine that two concepts are synonyms (say, sofa and couch), you use the property owl:equivalentClass. The entailment is that any instance that was a member of class sofa is now also a member of class couch, and vice versa. One of the nice things about this approach is that the "context" of the equivalence is automatically scoped to the ontology in which you make the equivalence statement. If you had a very small mapping ontology between a furniture ontology and an interior decorating ontology, you could say in the mapping that these two are equivalent. In another situation, if you needed to retain the (subtle) difference between a couch and a sofa, you would do that by merely not including the mapping ontology that declared them equivalent.
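As a concrete sketch of that pattern using Python's rdflib (the two namespaces are made up for illustration):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF

FURN = Namespace("http://example.org/furniture#")   # furniture ontology
DECO = Namespace("http://example.org/decorating#")  # decorating ontology

mapping = Graph()  # the small mapping ontology
mapping.add((FURN.Sofa, RDF.type, OWL.Class))
mapping.add((DECO.Couch, RDF.type, OWL.Class))
mapping.add((FURN.Sofa, OWL.equivalentClass, DECO.Couch))

# Load this graph only when the two classes should be merged;
# leave it out to retain the (subtle) sofa/couch distinction.
print(mapping.serialize(format="turtle"))
```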
Homonyms for concepts:
As Led Zeppelin says, "and you know sometimes words have two meanings..." What happens when a "word" has two meanings is that we have what WordNet would call "word senses." In a particular language, a set of characters may represent more than one concept. One example is the English word "mole," for which WordNet has 6 word senses. The Semantic Web approach is to give each sense its own namespace; for instance, I might refer to the counterspy mole as cia:mole and the burrowing mammal as mammal:mole. (These are shortened qnames for what would be full namespace names.) The nice thing about this is that if the CIA ever needed to refer to the animal, they could unambiguously refer to mammal:mole.
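In rdflib terms, the two word senses simply become two IRIs in different (here invented) namespaces, so they can never collide:

```python
from rdflib import Namespace

CIA = Namespace("http://example.org/cia#")
MAMMAL = Namespace("http://example.org/mammal#")

# Same local name "mole", two distinct identifiers:
print(CIA.mole)     # http://example.org/cia#mole
print(MAMMAL.mole)  # http://example.org/mammal#mole
```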
1. Homonyms are words that sound the same but differ in meaning.
2. Synonyms are words that have the same or nearly the same meaning.
Homonyms
Machine learning algorithms are now the subject of ethical debate. Bias, in layman's terms, is a pre-formed view held before the facts are known. In machine learning and data mining, it refers to an estimating procedure's tendency to produce estimates or predictions that are, on average, off target.
The strength of a rule can be measured in a variety of ways, including confidence. "Decision trees" are diagrams that show how decisions are made and what outcomes are available. To normalise a statistic is to rescale it to match the scale of other variables in the model.
To a statistician, confidence is a metric for how reliable a sample is (we are 95 percent confident that the average blood sugar in the group lies between X and Y, based on a sample of N patients). Decision tree algorithms are methods that divide data into pieces that become more and more homogeneous in terms of the outcome measure as they go.
A graph, to a statistician, is a graphical representation of data: plots and charts. To a computer programmer, a graph is an information structure that holds the ties and links among items. And normalisation is the act of arranging relational database tables and their columns so that table relationships are consistent.
Synonyms
What statisticians call a record (a row of data), computer science and machine learning call an instance, sample, or example; what statisticians call a variable is called an attribute, input variable, or feature. The spreadsheet format, in which each column is a variable and each row is a record, is perhaps the most common non-time-series data format.
Similarly, what machine learning calls prediction, statistics tends to call estimation, although in statistics estimation more often refers to the use of a sample statistic to measure something, and the term is generally limited to numeric outcomes. Predictive modelling in machine learning and artificial intelligence often begins with very low-level prediction data and involves developing aggregations of those low-level predictors into more informative "features".
I am trying to design a database to act as a language dictionary where each word is associated not only with its definition but also with its grammatical "taxon". E.g., it should look something like this:
"eat": verb.imperative
"eat": verb.present
"ate": verb.past
"he": pronoun.masculine.singular
"she": pronoun.feminine.singular
"heiress": noun.feminine.singular
"heirs": noun.masculine.plural
"therefore": adverb
"but": conjunction
It seems that a natural data structure to hold such a grammatical "taxonomy" should be some kind of tree or graph. Although I haven't thought it through, I presume that should make it easier to perform queries of the type
plural OF masculine OF "heiress" -> "heirs"
At this point, however, I am just trying to come up with the least ineffective way to store such a dictionary in a regular relational database (namely LibreOffice Base). What do you suggest the data schema should look like? Is there something more efficient than the brute-force method where I'd have as many boolean columns as there are grammatical types and sub-types? E.g., "she" would be true for the columns pronoun, feminine, and singular, but false for all other columns (verb, adverb, conjunction, etc.)?
This is a really wide-open question, and there are many applications and much related research. Let me give some pointers based on software I have used.
One column would be the lexeme, for example "eat." A second column would give the part of speech, which in your data above would be a string or other identifier that shows whether it is a verb, pronoun, noun, adverb or conjunction.
It might make sense to create another table for verb information. For example, tense, aspect and mood might each be separate columns. But these columns would only make sense for verbs. For the nouns table, the columns would include number (singular, plural) and gender, and perhaps whether it is a count or mass noun. Pronouns would also include person (first, second or third person).
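A minimal sketch of that layout using Python's built-in sqlite3 (table and column names are only suggestions; the equivalent SQL DDL would work in LibreOffice Base):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lexeme (
    id   INTEGER PRIMARY KEY,
    form TEXT NOT NULL,   -- e.g. 'eat', 'heir'
    pos  TEXT NOT NULL    -- verb, noun, pronoun, adverb, conjunction, ...
);
-- Verb-only information lives in its own table.
CREATE TABLE verb_info (
    lexeme_id INTEGER REFERENCES lexeme(id),
    tense  TEXT,
    aspect TEXT,
    mood   TEXT
);
-- Noun-only information likewise.
CREATE TABLE noun_info (
    lexeme_id INTEGER REFERENCES lexeme(id),
    number        TEXT,   -- singular / plural
    gender        TEXT,
    count_or_mass TEXT
);
""")
```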
Do you plan to include every form of every word? For example, will this database store "eats" and "eating" as well as "jumps" and "jumping?" It is much more efficient to store rules like "-s" for present singular and "-ing" for progressive. Then if there are exceptions, for example "ate," it can be described as having the underlying form of "eat" + "-ed." This rule would go under the "eat" lexeme, and there would be no separate "ate" entry.
There are also rules such as the plural changing words that end in -y to -ies. This rule would go under the plural suffix ("-s"), not under individual nouns.
With these things in mind, I offer a more specific answer to your question: No, I do not think this data is best described hierarchically, nor with a tree or graph, but rather analytically and relationally. LibreOffice Base would be a reasonable choice for a fairly simple project of this type, using macros to help with the processing.
So for:
"heiress" -> masculine plural = "heirs"
The first thing to do would be to analyze "heiress" as "heir" + feminine. Then compose the desired wordform by combining "heir" and "-s."
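A tiny illustrative sketch of those two steps in Python (the analysis table and rules here are deliberately minimal and hypothetical):

```python
# Analysis step: map a marked form back to its base lexeme.
FEMININE_FORMS = {"heiress": "heir"}

def pluralize(base):
    """Apply the plural suffix rule, including the y -> -ies adjustment."""
    if base.endswith("y"):
        return base[:-1] + "ies"
    return base + "s"

def masculine_plural(form):
    base = FEMININE_FORMS.get(form, form)  # "heiress" -> "heir"
    return pluralize(base)                 # "heir"    -> "heirs"

assert masculine_plural("heiress") == "heirs"
assert pluralize("lady") == "ladies"
```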
I was going to add a list of related software such as Python's NLTK, but for one thing the list of available software is nearly endless, and for another, software recommendations are off-topic for Stack Overflow.
When an ontology is created from a text consisting of a set of sentences, it can be useful to bind any given concept to all the sentences in which it occurs. But that inevitably leads to nasty duplication of sentences when the usual Annotation is used for storing the related text.
E.g., the sentence "Attributive language is the base language which allows: Atomic negation (negation of concept names that do not appear on the left hand side of axioms), Concept intersection, Universal restrictions, Limited existential quantification." would need to be copied as an Annotation to the entities: Attributive language, Language, Atomic negation, Negation, Concept names, Axiom, Concept intersection, Universal restriction, Limited existential quantification.
What is in your opinion a good way to avoid copying the same sentence to several locations and yet to have traces from the Entity to the relevant sentences?
I would create a named individual with an IRI and attach the sentence to it, then add a relationship from the concepts to the individual.
The individual might have a type, e.g., Sentence, but this is not necessary. Properties used can be annotation properties or data/object properties.
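A minimal rdflib sketch of this pattern (all IRIs and property names below are placeholders):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/onto#")
g = Graph()

sentence = EX.sentence_42  # one named individual per sentence
g.add((sentence, RDF.type, EX.Sentence))  # optional typing
g.add((sentence, EX.text,
       Literal("Attributive language is the base language which allows: ...")))

# Each concept points at the shared individual; the text is stored once.
for concept in (EX.AttributiveLanguage, EX.AtomicNegation, EX.Axiom):
    g.add((concept, EX.mentionedIn, sentence))
```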
At first this problem seems trivial: given two ontologies, find the term in ontology A that best refers to a term in ontology B.
But its simplicity is deceptive: this problem is extremely hard and has so far led to thousands of academic publications, without any consensus on how to solve it.
Naively, one would expect that simply looking at the term "Heart Attack" in both ontologies would suffice.
However, ontologies almost never encode the same phrase.
In simple cases "Heart Attack" might be coded as "Heart Attacks", or "Heart attack (non-fatal)", but in more complicated cases it might only be coded as "Myocardial infarction".
In other cases it is even more complicated, for example when dealing with compound (composed) terms.
More importantly, simply matching the term (or string) ignores the "ontological structure".
What if "Heart Attack" in ontology A is coded as caused-by high blood pressure, whereas in ontology B it might be coded as withdrawl-from-trial-non-fatal.
In this case it might be valid to match the two terms, but not trivially so.
And this assumes the equivalent term exists at all.
It's a classical problem called Semantic/Ontology Matching, Alignment, or Harmonization. The research out there involves lexical similarity, term usage in free text, graph homomorphisms, curated mappings (like MeSH/WordNet), topic modeling, and logical inference (first- or higher-order logic). But which is the most user-friendly and production-ready solution that can be integrated into a Java(/Clojure) or Python app? I've looked at Ontology matching: A literature review, but they don't seem to recommend anything ... any suggestions or experiences?
Have a look at http://oaei.ontologymatching.org/2014/results/. Several tracks were open for matchers to be submitted and evaluated, and not every matcher participates in every track, so you might want to read the track descriptions and pick the one that seems most similar to your problem. For example, if you don't have to deal with multiple languages, you probably don't have to check the MultiFarm track. After that, check the results by looking at precision, recall, and F-measure, and decide for yourself. You might also want to check out some earlier years.
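For reference, precision, recall, and F-measure over an alignment come down to simple set arithmetic; here is a small sketch, with correspondences modelled as plain tuples (the term names are invented):

```python
def prf(system, reference):
    """Precision, recall and F1 of a system alignment
    against a reference alignment."""
    system, reference = set(system), set(reference)
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(prf({("a:HeartAttack", "b:MyocardialInfarction")},
          {("a:HeartAttack", "b:MyocardialInfarction"),
           ("a:Stroke", "b:CVA")}))
# -> (1.0, 0.5, 0.666...)
```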