My Data base professor uses this for the Conceptual design:
(I had to fix the words cause it wasn't in English, sorry)
Due to a few personal reasons I couldn't watch his classes, and the book I'm using won't tell me why Teacher has this double line connection with Teaches.
It is all over the place on the notes of his, and I can't study properly like this.
Does anyone care to explain it to me?
This double line says that a teacher must teach at least 1 subject. Double line shows the full participation and single line shows partial participation and it is also known as minimum cardinality.
Related
What i mean is that can there be multiple different forms of diagram of the same language? Can it be drawn with multiple solutions? Or each language has only one solution in DFA? I attended a pop quiz today. Drew a solution and tried multiple strings. Each of those were accepted but i didn't get any points for it. Didn't get any feedback from my TA as why it was considered wrong.
The question was. Let L = {w | w contains an odd number of 0s or at least two 1s}.
This is what i did (sorry had to use ms paint).
If you notice a bit more carefully then 0101 is a string in your language but it is not accepted by your automata. Also to answer your other question, yes, there can be multiple DFAs which accept the same language. A trivial example would be the language 0* (Think about it if you are still interested, haha!).
P.S. - Just noticed a comment which pointed out the counter-example but I still went ahead. Sorry!
I am automating a process which asks questions (via SMS but shouldn't matter) to real people. The questions have yes/no answers, but the person might respond in a number of ways such as: sure, not at this time, yeah, never or in any other way that they might. I would like to attempt to parse this text and determine if it was a yes or no answer (of course it might not always be right).
I figured the ideas and concepts to do this might already exist as it seems like a common task for an AI, but don't know what it might be called so I can't find information on how I might implement it. So my questions is, have algorithms been developed to do this kind of parsing and if so where can I find more information on how to implement them?
This can be viewed as a binary (yes or no) classification task. You could write a rule-based model to classify or a statistics-based model.
A rule-based model would be like if answer in ["never", "not at this time", "nope"] then answer is "no". When spam filters first came out they contained a lot of rules like these.
A statistics-based model would probably be more suitable here, as writing your own rules gets tiresome and does not handle new cases as well.
For this you need to label a training dataset. After a little preprocessing (like lowercasing all the words, removing punctuation and maybe even a little stemming) you could get a dataset like
0 | never in a million years
0 | never
1 | yes sir
1 | yep
1 | yes yes yeah
0 | no way
Now you can run classification algorithms like Naive Bayes or Logistic Regression over this set (after you vectorize the words in either binary, which means is the word present or not, word count, which means the term frequency, or a tfidf float, which prevent bias to longer answers and common words) and learn which words more often belong to which class.
In the above example yes would be strongly correlated to a positive answer (1) and never would be strongly related to a negative answer (0). You could work with n-grams so a not no would be treated as a single token in favor of the positive class. This is called the bag-of-words approach.
To combat spelling errors you can add a spellchecker like Aspell to the pre-processing step. You could use a charvectorizer too, so a word like nno would be interpreted as nn and no and you catch errors like hellyes and you could trust your users to repeat spelling errors. If 5 users make the spelling error neve for the word never then the token neve will automatically start to count for the negative class (if labeled as such).
You could write these algorithms yourself (Naive Bayes is doable, Paul Graham has wrote a few accessible essays on how to classify spam with Bayes Theorem and nearly every ML library has a tutorial on how to do this) or make use of libraries or programs like Scikit-Learn (MultinomialNB, SGDclassifier, LinearSVC etc.) or Vowpal Wabbit (logistic regression, quantile loss etc.).
Im thinking on top of my head, if you get a response which you dont know if its yes / no, you can keep the answers in a DB like unknown_answers and 2 more tables as affirmative_answers / negative_answers, then in a little backend system, everytime you get a new unknown_answer you qualify them as yes or no, and there the system "learns" about it and with time, you will have a very big and good database of affirmative / negative answers.
I'm wondering is there an algorithm or a library which helps me identify the components in an English which has no meaning? e.g., very serious grammar error? If so, could you explain how it works, because I would really like to implement that or use that for my own projects.
Here's a random example:
In the sentence: "I closed so etc page hello the door."
As a human, we can quickly identify that [so etc page hello] does not make any sense. Is it possible for a machine to point out that the string does not make any sense and also contains grammar errors?
If there's such a solution, how precise can that be? Is it possible, for example, given a clip of an English sentence, the algorithm returns a measure, indicating how meaningful, or correct that clip is? Thank you very much!
PS: I've looked at CMU's link grammar as well as the NLTK library. But still I'm not sure how to use for example link grammar parser to do what I would like to do as the if the parser doesn't accept the sentence, I don't know how to tweak it to tell me which part it is not right.. and I'm not sure whether NLTK supported that.
Another thought I had towards solving the problem is to look at the frequencies of the word combination. Since I'm currently interested in correcting very serious errors only. If I define the "serious error" to be the cases where words in a clip of a sentence are rarely used together, i.e., the frequency of the combo should be much lower than those of the other combos in the sentence.
For instance, in the above example: [so etc page hello] these four words really seldom occur together. One intuition of my idea comes from when I type such combo in Google, no related results jump out. So is there any library that provides me such frequency information like Google does? Such frequencies may give a good hint on the correctness of the word combo.
I think that what you are looking for is a language model. A language model assigns a probability to each sentence of k words appearing in your language. The simplest kind of language models are n-grams models: given the first i words of your sentence, the probability of observing the i+1th word only depends on the n-1 previous words.
For example, for a bigram model (n=2), the probability of the sentence w1 w2 ... wk is equal to
P(w1 ... wk) = P(w1) P(w2 | w1) ... P(wk | w(k-1)).
To compute the probabilities P(wi | w(i-1)), you just have to count the number of occurrence of the bigram w(i-1) wi and of the word w(i-1) on a large corpus.
Here is a good tutorial paper on the subject: A Bit of Progress in Language Modeling, by Joshua Goodman.
Yes, such things exist.
You can read about it on Wikipedia.
You can also read about some of the precision issues here.
As far as determining which part is not right after determining the sentence has a grammar issue, that is largely impossible without knowing the author's intended meaning. Take, for example, "Over their, dead bodies" and "Over there dead bodies". Both are incorrect, and could be fixed either by adding/removing the comma or swapping their/there. However, these result in very different meanings (yes, the second one would not be a complete sentence, but it would be acceptable/understandable in context).
Spell checking works because there are a limited number of words against which you can check a word to determine if it is valid (spelled correctly). However, there are infinite sentences that can be constructed, with infinite meanings, so there is no way to correct a poorly written sentence without knowing what the meaning behind it is.
I think what you are looking for is a well-established library that can process natural language and extract the meanings.
Unfortunately, there's no such library. Natural language processing, as you probably can imagine, is not an easy task. It is still a very active research field. There are many algorithms and methods in understanding natural language, but to my knowledge, most of them only work well for specific applications or words of specific types.
And those libraries, such as the CMU one, seems to be still quite rudimental. It can't do what you want to do (like identifying errors in English sentence). You have to develop algorithm to do that using the tools that they provide (such as sentence parser).
If you want to learn about it check out ai-class.com. They have some sections that talks about processing language and words.
Tommy, jill and travelor belong to the Sc club.Every member of sc club is either a surfer or a bike rider or both.No bike rider likes a rainy day and all the surfers like a sunny day.Jill like whatever Tommy likes and likes whatever tommy dislikes.Tommy likes a rainy day and a sunny day.
I want to represent the above information in first order predicate logic in such a way that I can represent the question " who is a member of SC club who is a bike rider but not a surfer?" as a predicate logic expression.
What first order inference rule I should pick- forward chaining, backward chaining, or resolution refutation.??
First off this question sounds like it is being asked directly out of a book. If that is the case, it might help if you reference the book in your question. If you are truly stuck after trying to work it out, then ask yourself this...
How does each inference rule work, and what purpose does it serve toward finding solutions in first order logic problems? Once you know that, either...
you wont understand it, but you will have a better question to ask about a particular technique
the obvious answer will jump out at you
you will realize which of those techniques can work for your problem and just choose one
Showing that you have taken some time to try and figure out the problem before posting a book style question on stackoverflow will make other people more likely to help you. You will also have questions that show your lack of conceptual understanding, which is a very good reason to post a question here, as opposed to "answer my homework" sounding questions such as this.
I'm trying to implement application that can determine meaning of sentence, by dividing it to smaller pieces. So I need to know what words are subject, object etc. so that my program can know how to handle this sentence.
This is an open research problem. You can get an overview on Wikipedia, http://en.wikipedia.org/wiki/Natural_language_processing. Consider phrases like "Time flies like an arrow, fruit flies like a banana" - unambiguously classifying words is not easy.
You should look at the Natural Language Toolkit, which is for exactly this sort of thing.
See this section of the manual: Categorizing and Tagging Words - here's an extract:
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
"Here we see that and is CC, a coordinating conjunction; now and completely are RB, or adverbs; for is IN, a preposition; something is NN, a noun; and different is JJ, an adjective."
I guess there is not "simple" way to do this. You have to build a linguistic analyzer (which is quite possible), however, a language as a lot of exceptional cases. And that is what makes implementing a linguistic analyzer that hard.
The specific problem you mention, the identification of the subject and objects of a clause, is accomplished by syntactic parsing. You can get a good idea of how parsing works by using this demo of parsing software developed by Stanford University.
However, syntactic parsing does not determine the meanining of a sentence, only its structure. Determining meaning (semantics) is a very hard problem in general and there is no technology that can really 'understand' a sentence in the same way that a human would. Although there is no general solution, you may be able to do something in a very restricted subject domain. For example, is the data you want to analyse about a narrow topic with a limited set of 'things' that people talk about?
StompChicken has given the right answer to this question, but I'd like to add that the concepts of subject and object are known as grammatical relations, and that Briscoe and Carroll's RASP is a parser that can go the extra step of deducing a list of relations from the parse.
Here's some example output from their demo page. It's an extract from the output for a sentence that begins "We describe a robust accurate domain-independent approach...":
(|ncsubj| |describe:2_VV0| |We:1_PPIS2| _)
(|dobj| |describe:2_VV0| |approach:7_NN1|)