I am implementing and analysing the Find-S Algorithm (which I understood quite well). However, for the testing part, I am not sure whether the order of the examples in the training set affect the output.
Is this known or still unproven?
The order of examples will not affect the output if the function which expands the hypothesis is associative -- that is, if f(f(h0),x1),x2) = f(f(h0,x2),x1) for all h0,x1,x2.
The order of instances will affect your output because when FIND-S try to maximally specific hypothesis, it looks attributes and their values. It is discussed in Tom Mitchell's Machine Learning Book under title of '2.4 FIND-S FINDING A MAXIMALLY SPECIFIC HYPOTHESIS'.
Related
I want to find atoms, within a pre-defined set of atoms, that are in all possible optimal solutions of an ASP problem.
I'll explain what i want to do. I have an optimization problem where multiple optimal answer sets can exist. An answer set consists of atoms that characterizes a solution, for example node(0), edge(1,2) , as well as some that are not relevant, like color(0).
I want to compute all optimal answer sets, only show atoms of the type node() and edge(), then find the specific relevant atoms (like node(0) and edge(1,0)) that exist in all optimal answer sets.
I know i can limit the output of certain atoms with the #show directive. I can also add the flags --opt-mode=optN to compute all optimal answer and limit the output to only those with --quiet=1.
As per guide, using --enum-mode cautious computes the intersection of all answer sets.
An example:
If i have two optimal answer sets:
Answer set 1: node(1) node(0) edge(1,0) edge(0,1) color(1)
Answer set 2: node(1) node(0) edge(1,0) color(0)
I want to the result node(1) node(0) edge(1,0)
I can run clingo with --opt-mode=optN --quiet=1 then manually search for all node() or edge() atoms that all answer sets have in common.
I can add #show node/1 #show edge/2 to my encoding then run clingo with --opt-mode=optN --quiet=1 --enum-mode cautious.
Are all those two options logically equivalent?
Is there a more efficient way of achieving what i want?
I'm unsure about the interplay between the #show directive and the --opt-mode, --quiet and --enum-mode flags. I have a problem where answer sets can be very large, so i would prefer using option 2 since it would use the least disk space.
Thank you in advance.
You are definitely on the right track. I'm unsure about the combination of options though, especially --quiet as it only prevents showing the non-optimal models, but they will still be counted as models for the --enum-mode=cautious thingy.
You should simply use the clingo python API to achieve your goal. First, compute one optimal solution. Then you do have the bound that you can give as an additional argument to the opt-mode and your approach should work.
https://potassco.org/clingo/python-api/current/clingo/
Please explain the differences about Traditional and Fuzzy Logics(FLS).
It will help understand about those systems to beginners(Like me).
Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of FL imitates the way of decision making in humans that involves all intermediate possibilities between digital values YES and NO.
The conventional logic block that a computer can understand takes precise input and produces a definite output as TRUE or FALSE, which is equivalent to a human’s YES or NO.
Traditional logic: a system of formal logic mainly concerned with the syllogistic forms of deduction that is based on Aristotle and includes some of the changes and elaborations made by the Stoics and the Scholastics: Aristotelian logic — compare immediate inference, opposition, subject-predicate, syllogism, symbolic logic
FOR MORE INFO ABOUT THE FUZZY lOGIC SEE THIS :
https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_fuzzy_logic_systems.htm
I hope that I could help you to understand the difference.
I'm not sure what exactly I'm trying to ask. I want to be able to make some code that can easily take an initial and final state and some rules, and determine paths/choices to get there.
So think, for example, in a game like Starcraft. To build a factory I need to have a barracks and a command center already built. So if I have nothing and I want a factory I might say ->Command Center->Barracks->Factory. Each thing takes time and resources, and that should be noted and considered in the path. If I want my factory at 5 minutes there are less options then if I want it at 10.
Also, the engine should be able to calculate available resources and utilize them effectively. Those three buildings might cost 600 total minerals but the engine should plan the Command Center when it would have 200 (or w/e it costs).
This would ultimately have requirements similar to 10 marines # 5 minutes, infantry weapons upgrade at 6:30, 30 marines at 10 minutes, Factory # 11, etc...
So, how do I go about doing something like this? My first thought was to use some procedural language and make all the decisions from the ground up. I could simulate the system and branching and making different choices. Ultimately, some choices are going quickly make it impossible to reach goals later (If I build 20 Supply Depots I'm prob not going to make that factory on time.)
So then I thought weren't functional languages designed for this? I tried to write some prolog but I've been having trouble with stuff like time and distance calculations. And I'm not sure the best way to return the "plan".
I was thinking I could write:
depends_on(factory, barracks)
depends_on(barracks, command_center)
builds_from(marine, barracks)
build_time(command_center, 60)
build_time(barracks, 45)
build_time(factory, 30)
minerals(command_center, 400)
...
build(X) :-
depends_on(X, Y),
build_time(X, T),
minerals(X, M),
...
Here's where I get confused. I'm not sure how to construct this function and a query to get anything even close to what I want. I would have to somehow account for rate at which minerals are gathered during the time spent building and other possible paths with extra gold. If I only want 1 marine in 10 minutes I would want the engine to generate lots of plans because there are lots of ways to end with 1 marine at 10 minutes (maybe cut it off after so many, not sure how you do that in prolog).
I'm looking for advice on how to continue down this path or advice about other options. I haven't been able to find anything more useful than towers of hanoi and ancestry examples for AI so even some good articles explaining how to use prolog to DO REAL THINGS would be amazing. And if I somehow can get these rules set up in a useful way how to I get the "plans" prolog came up with (ways to solve the query) other than writing to stdout like all the towers of hanoi examples do? Or is that the preferred way?
My other question is, my main code is in ruby (and potentially other languages) and the options to communicate with prolog are calling my prolog program from within ruby, accessing a virtual file system from within prolog, or some kind of database structure (unlikely). I'm using SWI-Prolog atm, would I be better off doing this procedurally in Ruby or would constructing this in a functional language like prolog or haskall be worth the extra effort integrating?
I'm sorry if this is unclear, I appreciate any attempt to help, and I'll re-word things that are unclear.
Your question is typical and very common for users of procedural languages who first try Prolog. It is very easy to solve: You need to think in terms of relations between successive states of your world. A state of your world consists for example of the time elapsed, the minerals available, the things you already built etc. Such a state can be easily represented with a Prolog term, and could look for example like time_minerals_buildings(10, 10000, [barracks,factory])). Given such a state, you need to describe what the state's possible successor states look like. For example:
state_successor(State0, State) :-
State0 = time_minerals_buildings(Time0, Minerals0, Buildings0),
Time is Time0 + 1,
can_build_new_building(Buildings0, Building),
building_minerals(Building, MB),
Minerals is Minerals0 - MB,
Minerals >= 0,
State = time_minerals_buildings(Time, Minerals, Building).
I am using the explicit naming convention (State0 -> State) to make clear that we are talking about successive states. You can of course also pull the unifications into the clause head. The example code is purely hypothetical and could look rather different in your final application. In this case, I am describing that the new state's elapsed time is the old state's time + 1, that the new amount of minerals decreases by the amount required to build Building, and that I have a predicate can_build_new_building(Bs, B), which is true when a new building B can be built assuming that the buildings given in Bs are already built. I assume it is a non-deterministic predicate in general, and will yield all possible answers (= new buildings that can be built) on backtracking, and I leave it as an exercise for you to define such a predicate.
Given such a predicate state_successor/2, which relates a state of the world to its direct possible successors, you can easily define a path of states that lead to a desired final state. In its simplest form, it will look similar to the following DCG that describes a list of successive states:
states(State0) -->
( { final_state(State0) } -> []
; [State0],
{ state_successor(State0, State1) },
states(State1)
).
You can then use for example iterative deepening to search for solutions:
?- initial_state(S0), length(Path, _), phrase(states(S0), Path).
Also, you can keep track of states you already considered and avoid re-exploring them etc.
The reason you get confused with the example code you posted is essentially that build/1 does not have enough arguments to describe what you want. You need at least two arguments: One is the current state of the world, and the other is a possible successor to this given state. Given such a relation, everything else you need can be described easily. I hope this answers your question.
Caveat: my Prolog is rusty and shallow, so this may be off base
Perhaps a 'difference engine' approach would be appropriate:
given a goal like 'build factory',
backwards-chaining relations would check for has-barracks and tell you first to build-barracks,
which would check for has-command-center and tell you to build-command-center,
and so on,
accumulating a plan (and costs) along the way
If this is practical, it may be more flexible than a state-based approach... or it may be the same thing wearing a different t-shirt!
I'm wondering is there an algorithm or a library which helps me identify the components in an English which has no meaning? e.g., very serious grammar error? If so, could you explain how it works, because I would really like to implement that or use that for my own projects.
Here's a random example:
In the sentence: "I closed so etc page hello the door."
As a human, we can quickly identify that [so etc page hello] does not make any sense. Is it possible for a machine to point out that the string does not make any sense and also contains grammar errors?
If there's such a solution, how precise can that be? Is it possible, for example, given a clip of an English sentence, the algorithm returns a measure, indicating how meaningful, or correct that clip is? Thank you very much!
PS: I've looked at CMU's link grammar as well as the NLTK library. But still I'm not sure how to use for example link grammar parser to do what I would like to do as the if the parser doesn't accept the sentence, I don't know how to tweak it to tell me which part it is not right.. and I'm not sure whether NLTK supported that.
Another thought I had towards solving the problem is to look at the frequencies of the word combination. Since I'm currently interested in correcting very serious errors only. If I define the "serious error" to be the cases where words in a clip of a sentence are rarely used together, i.e., the frequency of the combo should be much lower than those of the other combos in the sentence.
For instance, in the above example: [so etc page hello] these four words really seldom occur together. One intuition of my idea comes from when I type such combo in Google, no related results jump out. So is there any library that provides me such frequency information like Google does? Such frequencies may give a good hint on the correctness of the word combo.
I think that what you are looking for is a language model. A language model assigns a probability to each sentence of k words appearing in your language. The simplest kind of language models are n-grams models: given the first i words of your sentence, the probability of observing the i+1th word only depends on the n-1 previous words.
For example, for a bigram model (n=2), the probability of the sentence w1 w2 ... wk is equal to
P(w1 ... wk) = P(w1) P(w2 | w1) ... P(wk | w(k-1)).
To compute the probabilities P(wi | w(i-1)), you just have to count the number of occurrence of the bigram w(i-1) wi and of the word w(i-1) on a large corpus.
Here is a good tutorial paper on the subject: A Bit of Progress in Language Modeling, by Joshua Goodman.
Yes, such things exist.
You can read about it on Wikipedia.
You can also read about some of the precision issues here.
As far as determining which part is not right after determining the sentence has a grammar issue, that is largely impossible without knowing the author's intended meaning. Take, for example, "Over their, dead bodies" and "Over there dead bodies". Both are incorrect, and could be fixed either by adding/removing the comma or swapping their/there. However, these result in very different meanings (yes, the second one would not be a complete sentence, but it would be acceptable/understandable in context).
Spell checking works because there are a limited number of words against which you can check a word to determine if it is valid (spelled correctly). However, there are infinite sentences that can be constructed, with infinite meanings, so there is no way to correct a poorly written sentence without knowing what the meaning behind it is.
I think what you are looking for is a well-established library that can process natural language and extract the meanings.
Unfortunately, there's no such library. Natural language processing, as you probably can imagine, is not an easy task. It is still a very active research field. There are many algorithms and methods in understanding natural language, but to my knowledge, most of them only work well for specific applications or words of specific types.
And those libraries, such as the CMU one, seems to be still quite rudimental. It can't do what you want to do (like identifying errors in English sentence). You have to develop algorithm to do that using the tools that they provide (such as sentence parser).
If you want to learn about it check out ai-class.com. They have some sections that talks about processing language and words.
I'm trying to implement application that can determine meaning of sentence, by dividing it to smaller pieces. So I need to know what words are subject, object etc. so that my program can know how to handle this sentence.
This is an open research problem. You can get an overview on Wikipedia, http://en.wikipedia.org/wiki/Natural_language_processing. Consider phrases like "Time flies like an arrow, fruit flies like a banana" - unambiguously classifying words is not easy.
You should look at the Natural Language Toolkit, which is for exactly this sort of thing.
See this section of the manual: Categorizing and Tagging Words - here's an extract:
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
"Here we see that and is CC, a coordinating conjunction; now and completely are RB, or adverbs; for is IN, a preposition; something is NN, a noun; and different is JJ, an adjective."
I guess there is not "simple" way to do this. You have to build a linguistic analyzer (which is quite possible), however, a language as a lot of exceptional cases. And that is what makes implementing a linguistic analyzer that hard.
The specific problem you mention, the identification of the subject and objects of a clause, is accomplished by syntactic parsing. You can get a good idea of how parsing works by using this demo of parsing software developed by Stanford University.
However, syntactic parsing does not determine the meanining of a sentence, only its structure. Determining meaning (semantics) is a very hard problem in general and there is no technology that can really 'understand' a sentence in the same way that a human would. Although there is no general solution, you may be able to do something in a very restricted subject domain. For example, is the data you want to analyse about a narrow topic with a limited set of 'things' that people talk about?
StompChicken has given the right answer to this question, but I'd like to add that the concepts of subject and object are known as grammatical relations, and that Briscoe and Carroll's RASP is a parser that can go the extra step of deducing a list of relations from the parse.
Here's some example output from their demo page. It's an extract from the output for a sentence that begins "We describe a robust accurate domain-independent approach...":
(|ncsubj| |describe:2_VV0| |We:1_PPIS2| _)
(|dobj| |describe:2_VV0| |approach:7_NN1|)