Hard to pin non-regular language with pumping lemma - theory

I'm having trouble proving a particular language is non-regular. The language is defined as
La = { wz: w,z ∈ {0,1}* and |w| > |z|}
I don't know how to approach this one. No matter what string I choose, I always run into the issue where w and z are moving targets for me; I haven't been able to create a string which couldn't be pumped or otherwise contradicted. Any thoughts on the right direction for this one?

This problem was part of a homework set, and apparently this question wasn't worded properly and is in fact regular.

Related

constructing a non deterministic turing machine

Draw the diagram of a two tape Non deterministic Turing Machine M that decides the language
L={w∈Σ* | w=uuu ∈Σ* }
if i could get help explaining the steps how to construct the NDTM (linguistically), I believe I could draw the diagram but I couldnt come out with an answer..
thank you
By u*u*u (viewed in the edit history), I presume what you intend is the language of all words of the form u^3 (u repeated three times) where u is any string over the alphabet.
Our NDTM needs to accept strings in the language in at least one way, and it must never accept anything not in the language. In particular, the key is that an NDTM can reject strings in the language, as long as some path through the NDTM does accept every string in the language.
Given that, our first step can be do guess about the length of u. The NDTM can mark three tape symbols (say, by writing versions of the symbols that are underlined) by nondeterministically transitioning from state q0 to q1 then q2 at arbitrary points while scanning right. Then, we can reset the tape head and use a deterministic TM to answer the question: did the split we guessed in the first step result in a string of the form u^3?
This is deterministic since we know the delineation of parts. We can check the first two parts (say, by bouncing back ad forth and marking symbols we've already processed), and then the second two parts (using the same technique, but applied to the 2nd and 3rd parts).
We have reduced the problem to that of checking whether a string is of the form w|w where we know the split. This deterministic TM is easier to come up with. When we put it after the NDTM that guesses about how to split up the initial input, we get a NDTM that can (and for exactly one guess, does) accept any string of the form u^3, but cannot possibly accept anything else. This is what we were after and we are done.

Does this proof really prove undecidability of halting?

I want to ask a couple questions about the following proof. The proof originally came from a textbook and then a question on stackoverflow below.
How does this proof, that the halting problem‍​ is undecidable, work?
Question 1:
Does the proof below essentially make H a simulator for its input machine?
In other words, is there an important difference between saying H = M and the following description from the proof?
H([M,w]) = {accept if M accepts w}
= {reject if M does not accept w.}
Question 2:
How is my following comments correct or incorrect?
I thought the halting problem was the problem of deciding if a given machine will halt regardless of its output(accept/reject). If a solution exists for a halting problem, it has to be something that analyses source code like a compiler/decompiler/disassembler instead of actually running it. If it needed to run it, obviously it would never determine on a "no" answer.
Noticing that apparent problem in the proof, the whole proof seems not to show undecidability of the halting problem.
The proof instead seems to show this:
The following algorithm will not halt:
boolean D()
{
return not D();
}
Following is the proof in question retyped from Intro to the Theory of Computation by Sipser.
THE HALTING PROBLEM IS UNDECIDABLE
Now we are ready to prove Theorem 4.11, the undecidability of the language
ATM = {[M,w] | M is a TM and M accepts w}.
PROOF:
We assume that ATM is decidable and obtain a contradiction. Suppose that H is a decider for ATM. On input , where M is a TM and w is a string, H halts and accepts if M accepts w. Furthermore, H halts and rejects if M fails to accept w. In other words, we assume that H is a TM, where
H([M,w]) = {accept if M accepts w}
= {reject if M does not accept w.}
Now we construct a new Turing machine D with H as a subroutine. This new TM calls H to determine what M does when the input to M is its own description . Once D has determined this information, it does the opposite. That is, it rejects if M accepts and accepts if M does not accept. The following is a description of D.
D = "On input [M], where M is a TM:
1. Run H on input [M, [M]].
2. Output the opposite of what H outputs; that is, if H accepts, reject and if H rejects, accept."
Don't be confused by the idea of running a machine on its own description! That is similar to running a program with itself as input, something that does occasionally occer in practice. For example, a compiler is a program that translates other programs. A compiler for the language Pascal may itself be written in Pascal, so running that program on itself would make sense. In summary,
D([M]) = { accept if M does not accept [M]
= { reject if M accepts [M]
What happens when we run D with its own description as input> In that case we get:
D([D]) = {accept if D does not accept [D]
= {reject if D accepts [D]
No matter what D does, it is forces to do the opposite, which is obviously a contradiction. Thus neither TM D nor TM H can exist.
In other words, is there an important difference between saying H = M and the following description from the proof?
The H machine is called Universal Turing Machine (UTM) and is able to simulate any other Turing Machine, including itself.
If M is an Universal Turing Machine like H, it is ok to say H = M, otherwise this would be weird.
I thought the halting problem was the problem of deciding if a given machine will halt regardless of its output(accept/reject). If a solution exists for a halting problem, it has to be something that analyses source code like a compiler/decompiler/disassembler instead of actually running it. If it needed to run it, obviously it would never determine on a "no" answer.
That is why the proof works based on contradiction and it is kind hard to understand.
Basically it assumes first that exists such a machine that answers "yes" or "no" to any given input. [Hypothesis]
Let's call this machine Q.
Assuming Q is valid and it is an UTM, it can simulate another machine S that works following the steps below:
S reads an input (a program and its input)
S duplicates the input it just read
S calls Q passing the copied input
S waits for Q to answer (and based on our hypothesis it always will)
Let's imagine now the input Q(S, S). Q will receive the program S and the argument of S is S itself. This input will make S call Q indefinitely and will never stop.
Since Q and S were legal programs but there is a kind of input that makes Q never stop, Q is a machine impossible to built and therefore it is impossible to decide if a program S stops or not.
Therefore we have the proof that the halting problem is undecidable.
Sipser explains it well. Read it again now and see if you catch the idea :)
Now, on to your question again. The Turing Machine is our most powerful machine for representing problems. As a recognition machine, it has to go through the input and run the algorithm to determine if it is valid or not. It is impossible to know the output of an algorithm without running it.
The compiler is just a translator of syntax and little semantics. It cannot foresee how one will use the program and what the output will be.

Suggestions for writing a Parser that reads a CFG and removes left recursion

I need some advice about writing a parser (using C and Lex) that accepts a CFG and removes left recursion. Since the parser, needs to accept any string and a grammer, I am clueless about how to start. Although I am familiar with the algorithm to remove left recursoin (as mentioned here, I am clueless about how to start, and what will be the data structures involved. What is the best way to store the grammar, and the string. And how can I efficiently apply the algorithm. Please suggest. (Since this is homework, please do not provide me the code, instead any other help/pseudocode shall do) :)
You need to define a grammar P that describes grammars. This should be pretty easy.
You need to build a parser for P. It needn't be complicated, because P won't be complicated. Trust me. You can use this method to hand-code a recursive-descent parser: https://stackoverflow.com/a/2336769/120163
As P parses, it build up a data structure representing the grammar.
Such a data structure needs to record, for each rule, its left side, its length, and the tokens the comprise the rule body.
At this point, you've collected the rules in the data structure, and now you need to apply your left recursion algorithm by adjusting the rules.
Finally you need to print out the rules; they should reappear in the syntax you chose for P.

A detail on the Pumping Lemma for regular languages

I have one small question about the pumping lemma for regular languages - is it good enough to show that if a specific string belonging to a language L can't be pumped, then the language is irregular? For example - if I choose language L1 being of the form a^nb^n (ab, aabb, aaabbb ...) and I show that the string aabb can't be pumped and still be a part of L1, then is it valid for me to immediately conclude that L1 is irregular?
Cheers.
Yes, that's how the pumping lemma works. It's only useful for proving languages to not be regular. Satisfying the pumping lemma is only a necessary but not a sufficient condition for a language being regular.
(Nota bene: Likewise for context-free languages and the respective pumping lemma there)
It's not quite sufficient to demonstrate that a single, finite-length string does not pump. For a rigorous argument, you'd also have to prove that length of your example string is greater than the pumping length of the language. Usually you assume that some pumping length
p exists, then construct a string longer than p that cannot be pumped.
Yes, this is how it works, but be careful in showing how a string "cannot be pumped"
To do that you have to break a string w in L, into substrings xyz and show that some versions of xy^1z, for int i i>=0 lead to strings not in L, but are still accepted by DFA M (for M built to accept L), arriving at a contradiction.
Note that you cannot pick the location of y and therefore must consider 3 possible positions of it. That's the key, in my opinion.
The pumping lemma says:
If a language A is regular => there is a number p (pumping length) where, if s is any string in L such that |s| >= p, then s may be divided into three pieces s=xyz, satisfying the following condition:
xyiz is in L for each i>=0
|y|>=0
p>=|xy|
The right way to show that a certain language L is not regular is to suppose L regular and try to reach a contradiction.
Lets try to demonstrate that L = {0n1n}|n>=0} is not regular.
We start assuming to the contrary that L is regular.
You can think about this kind of demonstration as a game:
Challenger: He choose the pumping length p. You cannot do any presumption on it.
You: Now it is your turn: choose the "kind" of string that represents the irregularity of the language.
Lets say that the string is in the form 0p1p.
A good tip in this step is to try to limit the adversary next move.
Challenger: He presents to you a string s in the form 0p1p.
You: It's time to pump! If you chose correctly the form of the string in your previous move, you can do some assumption. In our case, for example, we know that the substring y consists only of 0s (at least one 0 because |y|>0), because |xy|<=p and first p-elements are 0s.
Now we show that it exists i>=0 such that xyiz is not in L. For example, for i=2 the string xyyz has more 0s than 1s and so is not a member of L. This case is a contradiction. => L is not regular.
Never forget to demonstrate why the pumped string cannot be a member of L.
Probably it is late to help you, but someone else may need this kind of explanation...maybe ^^
Cheers.

What is the Pumping Lemma in Layman's terms?

I saw this question, and was curious as to what the pumping lemma was (Wikipedia didn't help much).
I understand that it's basically a theoretical proof that must be true in order for a language to be in a certain class, but beyond that I don't really get it.
Anyone care to try to explain it at a fairly granular level in a way understandable by non mathematicians/comp sci doctorates?
The pumping lemma is a simple proof to show that a language is not regular, meaning that a Finite State Machine cannot be built for it. The canonical example is the language (a^n)(b^n). This is the simple language which is just any number of as, followed by the same number of bs. So the strings
ab
aabb
aaabbb
aaaabbbb
etc. are in the language, but
aab
bab
aaabbbbbb
etc. are not.
It's simple enough to build a FSM for these examples:
This one will work all the way up to n=4. The problem is that our language didn't put any constraint on n, and Finite State Machines have to be, well, finite. No matter how many states I add to this machine, someone can give me an input where n equals the number of states plus one and my machine will fail. So if there can be a machine built to read this language, there must be a loop somewhere in there to keep the number of states finite. With these loops added:
all of the strings in our language will be accepted, but there is a problem. After the first four as, the machine loses count of how many as have been input because it stays in the same state. That means that after four, I can add as many as as I want to the string, without adding any bs, and still get the same return value. This means that the strings:
aaaa(a*)bbbb
with (a*) representing any number of as, will all be accepted by the machine even though they obviously aren't all in the language. In this context, we would say that the part of the string (a*) can be pumped. The fact that the Finite State Machine is finite and n is not bounded, guarantees that any machine which accepts all strings in the language MUST have this property. The machine must loop at some point, and at the point that it loops the language can be pumped. Therefore no Finite State Machine can be built for this language, and the language is not regular.
Remember that Regular Expressions and Finite State Machines are equivalent, then replace a and b with opening and closing Html tags which can be embedded within each other, and you can see why it is not possible to use regular expressions to parse Html
It's a device intended to prove that a given language cannot be of a certain class.
Let's consider the language of balanced parentheses (meaning symbols '(' and ')', and including all strings that are balanced in the usual meaning, and none that aren't). We can use the pumping lemma to show this isn't regular.
(A language is a set of possible strings. A parser is some sort of mechanism we can use to see if a string is in the language, so it has to be able to tell the difference between a string in the language or a string outside the language. A language is "regular" (or "context-free" or "context-sensitive" or whatever) if there is a regular (or whatever) parser that can recognize it, distinguishing between strings in the language and strings not in the language.)
LFSR Consulting has provided a good description. We can draw a parser for a regular language as a finite collection of boxes and arrows, with the arrows representing characters and the boxes connecting them (acting as "states"). (If it's more complicated than that, it isn't a regular language.) If we can get a string longer than the number of boxes, it means we went through one box more than once. That means we had a loop, and we can go through the loop as many times as we want.
Therefore, for a regular language, if we can create an arbitrarily long string, we can divide it into xyz, where x is the characters we need to get to the start of the loop, y is the actual loop, and z is whatever we need to make the string valid after the loop. The important thing is that the total lengths of x and y are limited. After all, if the length is greater than the number of boxes, we've obviously gone through another box while doing this, and so there's a loop.
So, in our balanced language, we can start by writing any number of left parentheses. In particular, for any given parser, we can write more left parens than there are boxes, and so the parser can't tell how many left parens there are. Therefore, x is some amount of left parens, and this is fixed. y is also some number of left parens, and this can increase indefinitely. We can say that z is some number of right parens.
This means that we might have a string of 43 left parens and 43 right parens recognized by our parser, but the parser can't tell that from a string of 44 left parens and 43 right parens, which isn't in our language, so the parser can't parse our language.
Since any possible regular parser has a fixed number of boxes, we can always write more left parens than that, and by the pumping lemma we can then add more left parens in a way that the parser can't tell. Therefore, the balanced parenthesis language can't be parsed by a regular parser, and therefore isn't a regular expression.
Its a difficult thing to get in layman's terms, but basically regular expressions should have a non-empty substring within it that can be repeated as many times as you wish while the entire new word remains valid for the language.
In practice, pumping lemmas are not sufficient to PROVE a language correct, but rather as a way to do a proof by contradiction and show a language does not fit in the class of languages (Regular or Context-Free) by showing the pumping lemma does not work for it.
Basically, you have a definition of a language (like XML), which is a way to tell whether a given string of characters (a "word") is a member of that language or not.
The pumping lemma establishes a method by which you can pick a "word" from the language, and then apply some changes to it. The theorem states that if the language is regular, these changes should yield a "word" that is still from the same language. If the word you come up with isn't in the language, then the language could not have been regular in the first place.
The simple pumping lemma is the one for regular languages, which are the sets of strings described by finite automata, among other things. The main characteristic of a finite automation is that it only has a finite amount of memory, described by its states.
Now suppose you have a string, which is recognized by a finite automaton, and which is long enough to "exceed" the memory of the automation, i.e. in which states must repeat. Then there is a substring where the state of the automaton at the beginning of the substring is the same as the state at the end of the substring. Since reading the substring doesn't change the state it may be removed or duplicated an arbitrary number of times, without the automaton being the wiser. So these modified strings must also be accepted.
There is also a somewhat more complicated pumping lemma for context-free languages, where you can remove/insert what may intuitively be viewed as matching parentheses at two places in the string.
By definition regular languages are those recognized by a finite state automaton. Think of it as a labyrinth : states are rooms, transitions are one-way corridors between rooms, there's an initial room, and an exit (final) room. As the name 'finite state automaton' says, there is a finite number of rooms. Each time you travel along a corridor, you jot down the letter written on its wall. A word can be recognized if you can find a path from the initial to the final room, going through corridors labelled with its letters, in the correct order.
The pumping lemma says that there is a maximum length (the pumping length) for which you can wander through the labyrinth without ever going back to a room through which you have gone before. The idea is that since there are only so many distinct rooms you can walk in, past a certain point, you have to either exit the labyrinth or cross over your tracks. If you manage to walk a longer path than this pumping length in the labyrinth, then you are taking a detour : you are inserting a(t least one) cycle in your path that could be removed (if you want your crossing of the labyrinth to recognize a smaller word) or repeated (pumped) indefinitely (allowing to recognize a super-long word).
There is a similar lemma for context-free languages. Those languages can be represented as word accepted by pushdown automata, which are finite state automata that can make use of a stack to decide which transitions to perform. Nonetheless, since there is stilla finite number of states, the intuition explained above carries over, even through the formal expression of the property may be slightly more complex.
In laymans terms, I think you have it almost right. It's a proof technique (two actually) for proving that a language is NOT in a certain class.
Fer example, consider a regular language (regexp, automata, etc) with an infinite number of strings in it. At a certain point, as starblue said, you run out of memory because the string is too long for the automaton. This means that there has to be a chunk of the string that the automaton can't tell how many copies of it you have (you're in a loop). So, any number of copies of that substring in the middle of the string, and you still are in the language.
This means that if you have a language that does NOT have this property, ie, there is a sufficiently long string with NO substring that you can repeat any number of times and still be in the language, then the language isn't regular.
For example, take this language L = anbn.
Now try to visualize finite automaton for the above language for some n's.
if n = 1, the string w = ab. Here we can make a finite automaton with out looping
if n = 2, the string w = a2b2. Here we can make a finite automaton with out looping
if n = p, the string w = apbp. Essentially a finite automaton can be assumed with 3 stages.
First stage, it takes a series of inputs and enter second stage. Similarly from stage 2 to stage 3. Let us call these stages as x, y and z.
There are some observations
Definitely x will contain 'a' and z will contain 'b'.
Now we have to be clear about y:
case a: y may contain 'a' only
case b: y may contain 'b' only
case c: y may contain a combination of 'a' and 'b'
So the finite automaton states for stage y should be able to take inputs 'a' and 'b' and also it should not take more a's and b's which cannot be countable.
If stage y is taking only one 'a' and one 'b', then there are two states required
If it is taking two 'a' and one 'b', three states are required with out loops
and so on....
So the design of stage y is purely infinite. We can only make it finite by putting some loops and if we put loops, the finite automaton can accept languages beyond L = anbn. So for this language we can't construct a finite automaton. Hence it is not regular.
This is not an explanation as such but it is simple.
For a^n b^n our FSM should be built in such a way that b must know the number of a's already parsed and will accept the same n number of b's. A FSM can not simply do stuff like that.

Resources