Difference between homonyms and synonyms in data science with examples - analytics

Please share the difference between homonyms and synonyms in data science with examples.

Synonyms for concepts:
When you determine that two concepts are synonyms (say, sofa and couch), you use the class expression owl:equivalentClass. The entailment here is that any instance that was a member of class sofa is now also a member of class couch and vice versa. One of the nice things about this approach is that "context" of this equivalence is automatically scoped to the ontology in which you make the equivalence statement. If you had a very small mapping ontology between a furniture ontology and an interior decorating ontology, you could say in the map that these two are equivalent. In another situation if you needed to retain the (subtle) difference between a couch and a sofa, you do that by merely not including the mapping ontology that declared them equivalent.
Homonyms for concepts:
As Led Zeppelin says, "and you know sometimes words have two meaningsā€¦" What happens when a "word" has two meanings is that we have what WordNet would call "word senses." In a particular language, a set of characters may represent more than one concept. One example is the English word "mole," for which WordNet has 6 word senses. The Semantic Web approach is to give each its own namespace; for instance, I might refer to the counterspy mole as cia:mole and the burrowing rodent as the mammal:mole. (These are shortened qnames for what would be full namespace names.) The nice thing about this is, if the CIA ever needed to refer to the rodent they could unambiguously refer to mammal:mole.
Credit

Homonyms- are words that have the same sound but have different in meaning.
2. Synonyms- are words that have the same or almost the same meaning.

Homonyms
Machine learning algorithms are now the subject of ethical debate. Bias, in layman's terms, is a pre-formed view created before facts are known. It applies to an estimating procedure's proclivity to provide estimations or predictions that are, on average, off goal in machine learning and data mining.
A policy's strength can be measured in a variety of ways, including confidence. "Decision trees" are diagrams that show how decisions are being made and what consequences are available. Rescale a statistic to match the scale of other variables in the model to normalise it.
Confidence is a statistician's metric for determining how reliable a sample is (we are 95 percent confident that the average blood sugar in the group lies between X and Y, based on a sample of N patients). Decision tree algorithms are methods that divide data across pieces that are becoming more and more homogeneous in terms of the outcome measure as they advance.
A graph is a graphical representation of data that statisticians call plots and charts. A graph seems to be an information structure that contains the ties and links among items, according to computer programmers. The act of arranging relational databases and their columns such that table relationships are consistent is known as normalisation.
Synonyms
Statisticians use the terms record, instance, sample, or example to describe their data. In computer science and machine learning, this can be called an attribute, input variable, or feature. The term "estimation" is also used, though its use is generally limited to numeric outcomes.
Statisticians call the non-time-series data format a record, or record. In statistics, estimation more often refers to the use of a sample statistic to measure something. Predictive modelling involves developing aggregations of low-level predictors into more informative "features".
The spreadsheet format, in which each column is still a variable, so each row is a record, is perhaps the most common non-time-series data type. Modeling in machine learning and artificial intelligence often begins with some very low-level prediction data.

Related

short text syntactic classification

I am newbie at machine learning and data mining. Here's the problem: I have one input variable currently which is a small text comprises of non-standard nouns and want to classify in target category. I have about 40% of total training data from entire dataset. Rest 60% we would like to classify as accurately as possible. Followings are some input variables across multiple observations those are assigned 'LEAD_GENERATION_REPRESENTATIVE' title.
"Business Development Representative MFG"
"Business Development Director Retail-KK"
"Branch Staff"
"Account Development Rep"
"New Business Rep"
"Hong Kong Cloud"
"Lead Gen, New Business Development"
"Strategic Alliances EMEA"
"ENG-BDE"
I think above give idea what I mean by non-standard nouns. I can see here few tokens that are meaningful like 'development','lead','rep' Others seems random without any semantic but they may be appearing multiple times in data. Another thing is some tokens like 'rep','account' can appear for multiple category. I think that will make weighting/similarity a challenging task.
My first question is "is it worth automating this kind of classification?"
Second : "is it a good problem to learn machine learning classification?". There are only 30k such entries and handful of target categories. I can find someone to manually do that which will also be more accurate.
here's my take on this problem so far:
Full-text engine: like solr to build index and query rules that draws matches based on tokens - word, phrase, synonyms, acronyms, descriptions. I can get someone to define detail taxonomy for each category. Use boosting, use pluggable scoring lib
Machine learning:
Naive Bayes classification
Decision tree
SVM
I have tried out Solr for this with revers lookup though since I don't have taxonomy available at moment. It seems like I can get about 80% true positives (I'll have to dig more into confusion matrix to reduce false positives). My query is bunch of booleans terms and phrases with proximity and boosts; negations to reduce errors. I'm afraid this approach may lead to overfit and wont scale.
I am aware that people usually tries multiple modeling techniques to achieve which one works best or derives combination of techniques. I want to understand this problem with feasibility and complexity point of view. If its too broad question please just comment on feasibility of solution.

Harmonizing terms in two different RDF ontologies

At first this problem seems trivial: given two ontologies, which term in ontology A best refers to a term in ontology B.
But its simplicity is deceptive: this problem is extremely hard and has currently lead to thousands of academic publications, without any consensus on how to solve this problem.
Naively, one would expect that simply looking at the term "Heart Attack" in both ontologies would suffice.
However, ontologies almost never encode the same phrase.
In simple cases "Heart Attack" might be coded as "Heart Attacks", or "Heart attack (non-fatal)", but in more complicated cases it might only be coded as "Myocardial infarction".
In other cases it is even more complicated, for example dealing with compound (composed) terms.
More importantly, simply matching the term (or string) ignores the "ontological structure".
What if "Heart Attack" in ontology A is coded as caused-by high blood pressure, whereas in ontology B it might be coded as withdrawl-from-trial-non-fatal.
In this case it might be valid to match the two terms, but not trivially so.
And this assumes the equivalent term exists at all.
It's a classical problem called Semantic/Ontology Matching, Alignment, or Harmonization. The research out there involves lexical similarity, term usage in free text, graph homomorphisms, curated mappings (like MeSH/WordNet), topic modeling, and logical inference (first- or higher-order logic). But which is the most user friendly and production ready solution, that can be integrated into a Java(/Clojure) or Python app? I've looked at Ontology matching: A literature review but they don't seem to recommend anything ... any suggestions or experiences?
Have a look at http://oaei.ontologymatching.org/2014/results/ . There were several tracks open for matchers to be sent in and be evaluated. Not every matcher participates in every track. So you might want to read the track descriptions and pick one that seems to be the most similar to your problem. For example if you don't have to deal with multiple languages you probably don't have to check the MultiFarm track. After that check the results by having a look at Recall, Precision and F-Measure and decide for yourself. You also might want to check out some earlier years.

Is the HTM cortical learning algorithm defined by Numenta's paper restricted by Euclidean geometry?

Specifically, their most recent implementation.
http://www.numenta.com/htm-overview/htm-algorithms.php
Essentially, I'm asking whether non-euclidean relationships, or relationships in patterns that exceed the dimensionality of the inputs, can be effectively inferred by the algorithm in its present state?
HTM uses Euclidean geometry to determine "neighborship" when analyzing patterns. Consistently framed input causes the algorithm to exhibit predictive behavior, and sequence length is practically unlimited. This algorithm learns very well - but I'm wondering whether it has the capacity to infer nonlinear attributes from its input data.
For example, if you input the entire set of texts from Project Gutenberg, it's going to pick up on the set of probabilistic rules that comprise English spelling, grammar, and readily apparent features from the subject matter, such as gender associations with words, and so forth. These are first level "linear" relations, and can be easily defined with probabilities in a logical network.
A nonlinear relation would be an association of assumptions and implications, such as "Time flies like an arrow, fruit flies like a banana." If correctly framed, the ambiguity of the sentence causes a predictive interpretation of the sentence to generate many possible meanings.
If the algorithm is capable of "understanding" nonlinear relations, then it would be able to process the first phrase and correctly identify that "Time flies" is talking about time doing something, and "fruit flies" are a type of bug.
The answer to the question is probably a simple one to find, but I can't decide either way. Does mapping down the input into a uniform, 2d, Euclidean plane preclude the association of nonlinear attributes of the data?
If it doesn't prevent nonlinear associations, my assumption would then be that you could simply vary the resolution, repetition, and other input attributes to automate the discovery of nonlinear relations - in effect, adding a "think harder" process to the algorithm.
From what I understand of HTM's, the structure of layers and columns mimics the structure of the neocortex. See appendix B here: http://www.numenta.com/htm-overview/education/HTM_CorticalLearningAlgorithms.pdf
So the short answer would be that since the brain can understand non-linear phenomenon with this structure, so can an HTM.
Initial, instantaneous sensory input is indeed mapped to 2D regions within an HTM. This does not limit HTM's to dealing with 2D representations any more than a one dimensional string of bits is limited to representing only one dimensional things. It's just a way of encoding stuff so that sparse distributed representations can be formed and their efficiencies can be taken advantage of.
To answer your question about Project Gutenberg, I don't think an HTM will really understand language without first understanding the physical world on which language is based and creates symbols for. That said, this is a very interesting sequence for an HTM, since predictions are only made in one direction, and in a way the understanding of what's happening to the fruit goes backwards. i.e. I see the pattern 'flies like a' and assume the phrase applies to the fruit the same way it did to time. HTM's do group subsequent input (words in this case) together at higher levels, so if you used Fuzzy Grouping (perhaps) as Davide Maltoni has shown to be effective, the two halves of the sentence could be grouped together into the same high level representation and feedback could be sent down linking the two specific sentences. Numenta, to my knowledge has not done too much with feedback messages yet, but it's definitely part of the theory.
The software which runs the HTM is called NuPIC (Numenta Platform for Intelligent Computing). A NuPIC region (representing a region of neocortex) can be configured to either use topology or not, depending on the type of data it's receiving.
If you use topology, the usual setup maps each column to a set of inputs which is centred on the corresponding position in the input space (the connections will be selected randomly according to a probability distribution which favours the centre). The spatial pattern recognising component of NuPIC, known as the Spatial Pooler (SP), will then learn to recognise and represent localised topological features in the data.
There is absolutely no restriction on the "linearity" of the input data which NuPIC can learn. NuPIC can learn sequences of spatial patterns in extremely high-dimensional spaces, and is limited only by the presence (or lack of) spatial and temporal structure in the data.
To answer the specific part of your question, yes, NuPIC can learn non-Euclidean and non-linear relationships, because NuPIC is not, and cannot be modelled by, a linear system. On the other hand, it seems logically impossible to infer relationships of a dimensionality which exceeds that of the data.
The best place to find out about HTM and NuPIC, its Open Source implementation, is at NuPIC's community website (and mailing list).
Yes, It can do non-linear. Basically it is multilayer. And all multilayer neural networks can infer non linear relationships. And I think the neighborship is calculated locally. If it is calcualted locally then globally it can be piece wise non linear for example look at Local Linear Embedding.
Yes HTM uses euclidean geometry to connect synapses, but this is only because it is mimicking a biological system that sends out dendrites and creates connections to other nearby cells that have strong activation at that point in time.
The Cortical Learning Algorithm (CLA) is very good at predicting sequences, so it would be good at determining "Time flies like an arrow, fruit flies like a" and predict "banana" if it has encountered this sequence before or something close to it. I don't think it could infer that a fruit fly is a type of insect unless you trained it on that sequence. Thus the T for Temporal. HTMs are sequence association compressors and retrievers (a form of memory). To get the pattern out of the HTM you play in a sequence and it will match the strongest representation it has encountered to date and predict the next bits of the sequence. It seems to be very good at this and the main application for HTMs right now are predicting sequences and anomalies out of streams of data.
To get more complex representations and more abstraction you would cascade a trained HTMs outputs to another HTMs inputs along with some other new sequence based input to correlate to. I suppose you could wire in some feedback and do some other tricks to combine multiple HTMs, but you would need lots of training on primitives first, just like a baby does, before you will ever get something as sophisticated as associating concepts based on syntax of the written word.
ok guys, dont get silly, htms just copy data into them, if you want a concept, its going to be a group of the data, and then you can have motor depend on the relation, and then it all works.
our cortex, is probably way better, and actually generates new images, but a computer cortex WONT, but as it happens, it doesnt matter, and its very very useful already.
but drawing concepts from a data pool, is tricky, the easiest way to do it is by recording an invarient combination of its senses, and when it comes up, associate everything else to it, this will give you organism or animal like intelligence.
drawing harder relations, is what humans do, and its ad hoc logic, imagine a set explaining the most ad hoc relation, and then it slowly gets more and more specific, until it gets to exact motor programs... and all knowledge you have is controlling your motor, and making relations that trigger pathways in the cortex, and tell it where to go, from the blast search that checks all motor, and finds the most successful trigger.
woah that was a mouthful, but watch out dummies, you wont get no concepts from a predictive assimilator, which is what htm is, unless you work out how people draw relations in the data pool, like a machine, and if you do that, its like a program thats programming itself.
no shit.

Structured, factored and atomic representation?

I am currently reading "Artificial Intelligence: A modern Approach". Though the terminology factored, structured and atomic representation is confusing what do these mean exactly?
In relation with programming...
Thanks
I'm not thrilled with the lines that Russell and Norvig draw, but: Generally, when you're using AI techniques to solve a problem, you're going to have a programmed model of the situation. Atomic/factored/structured is a qualitative measure of how much "internal structure" those models have, from least to most.
Atomic models have no internal structure; the state either does or does not match what you're looking for. In a sliding tile puzzle, for instance, you either have the correct alignment of tiles or you do not.
Factored models have more internal structure, although exactly what will depend on the problem. Typically, you're looking at variables or performance metrics of interest; in a sliding puzzle, this might be a simple heuristic like "number of tiles out of place," or "sum of manhatten distances."
Structured models have still more; again, exactly what depends on the problem, but they're often relations either of components of the model to itself, or components of the model to components of the environment.
It is very easy, especially when looking at very simple problems like the sliding tile, to unconsciously do all the hard intelligence work yourself, at a glance, and forget that your model doesn't have all your insight. For example, if you were to make a program to do a graph search technique on the sliding puzzle, you'd probably make some engine that took as input a puzzle state and an action, and generated a new puzzle state from that. The puzzle states are still atomic, but you the programmer are using a much more detailed model to link those inputs and outputs together.
I like explanation given by Novak. My 2 cents is to make clarity on difference between factored vs structured. Here is extract from definitions:
An atomic representation is one in which each state is treated as a
black box.
A factored representation is one in which the states are
defined by set of features.
A structured representation is one in which the states are expressed in form of objects and relations between them. Such knowledge about relations called facts.
Examples:
atomicState == goal: Y/N // Is goal reached?
It is the only question we can ask to black box.
factoredState{18} == goal{42}: N // Is goal reached?
diff( goal{42}, factoredState{18}) = 24 // How much is difference?
// some other questions. the more features => more type of questions
The simplest factored state must have at least one feature (of some type), that gives us ability to ask more questions. Normally it defines quantitative difference between states. The example has one feature of integer type.
11grade#schoolA{John(Math=A-), Marry(Music=A+), Job1(doMath)..} == goal{50% ready for jobs}
The key here - structured representation, allows higher level of formal logical reasoning at the search. See First-Order Logic #berkley for introductory info.
This subject easily confuses a practitioner (especially beginner) but makes great sense for comparing different goal searching algorithms. Such "world" state representation classification logically separates algorithms into different classes. It is very useful to draw lines in academic research and compare apples to apples when reasoning academically.

Pruning Deductions in Expert Systems

In a rule system, or any reasoning system that deduces facts via forward-chaining inference rules, how would you prune "unnecessary" branches? I'm not sure what the formal terminology is, but I'm just trying to understand how people are able to limit their train-of-thought when reasoning over problems, whereas all semantic reasoners I've seen appear unable to do this.
For example, in John McCarthy's paper An Example for Natural Language Understanding and the AI Problems It Raises, he describes potential problems in getting a program to intelligently answer questions about a news article in the New York Times. In section 4, "The Need For Nonmonotonic Reasoning", he discusses the use of Occam's Razer to restrict the inclusion of facts when reasoning about the story. The sample story he uses is one about robbers who victimize a furniture store owner.
If a program were asked to form a "minimal completion" of the story in predicate calculus, it might need to include facts not directly mentioned in the original story. However, it would also need some way of knowing when to limit its chain of deduction, so as not to include irrelevant details. For example, it might want to include the exact number of police involved in the case, which the article omits, but it won't want to include the fact that each police officer has a mother.
Good Question.
From your Question i think what you refer to as 'pruning' is a model-building step performed ex ante--ie, to limit the inputs available to the algorithm to build the model. The term 'pruning' when used in Machine Learning refers to something different--an ex post step, after model construction and that operates upon the model itself and not on the available inputs. (There could be a second meaning in the ML domain, for the term 'pruning.' of, but i'm not aware of it.) In other words, pruning is indeed literally a technique to "limit its chain of deduction" as you put it, but it does so ex post, by excision of components of a complete (working) model, and not by limiting the inputs used to create that model.
On the other hand, isolating or limiting the inputs available for model construction--which is what i think you might have had in mind--is indeed a key Machine Learning theme; it's clearly a factor responsible for the superior performance of many of the more recent ML algorithms--for instance, Support Vector Machines (the insight that underlies SVM is construction of the maximum-margin hyperplane from only a small subset of the data, i.e, the 'support vectors'), and Multi-Adaptive Regression Splines (a regression technique in which no attempt is made to fit the data by "drawing a single continuous curve through it", instead, discrete section of the data are fit, one by one, using a bounded linear equation for each portion, ie., the 'splines', so the predicate step of optimal partitioning of the data is obviously the crux of this algorithm).
What problem is solving by pruning?
At least w/r/t specific ML algorithms i have actually coded and used--Decision Trees, MARS, and Neural Networks--pruning is performed on an initially over-fit model (a model that fits the training data so closely that it is unable to generalize (accurately predict new instances). In each instance, pruning involves removing marginal nodes (DT, NN) or terms in the regression equation (MARS) one by one.
Second, why is pruning necessary/desirable?
Isn't it better to just accurately set the convergence/splitting criteria? That won't always help. Pruning works from "the bottom up"; the model is constructed from the top down, so tuning the model (to achieve the same benefit as pruning) eliminates not just one or more decision nodes but also the child nodes that (like trimming a tree closer to the trunk). So eliminating a marginal node might also eliminate one or more strong nodes subordinate to that marginal node--but the modeler would never know that because his/her tuning eliminated further node creation at that marginal node. Pruning works from the other direction--from the most subordinate (lowest-level) child nodes upward in the direction of the root node.

Resources