Show a language is infinite - infinite

Given a DFA M with n states defined over an alphabet ∑, and a string x∈L(M) such that |x|>n, I must show that the L(M) is an infinite language.
Is there any way I can prove this using the pumping lemma?

I'm not immediately sure why the pumping lemma needs to be involved directly. The proof idea is the same one behind the proof of the pumping lemma, and we can explain the relationship. Possibly, with that explanation in hand, you can figure out a way to work out a proof that relies on the pumping lemma directly. I think that should in theory be possible to do without too much mental gymnastics.
Imagine a DFA with three states that accepts a string of length four. This is a concrete instance of the general claim you've been asked to prove. If we can understand why this is specifically true here, it will be easier to generalize to DFAs with an arbitrary number of states.
A DFA executes one transition for every symbol of input. Each transition takes the DFA to some state. Since the string of length four is accepted, we execute four transitions - beginning with the initial state - and we end up in some accepting state. Since each transition takes the DFA to some state, we move into states four times. Since there are only three states in the DFA, that means one of the states was transitioned into more than once: we cannot transition four times around three states without visiting some state more than once. The general principle here is referred to as the Pigeonhole principle: if there are n pigeons in m holes, and n > m, then some hole has more than one pigeon. This is the same reasoning behind why the pumping lemma works, by the way.
Now, consider what this observation means. We accepted a string and, while accepting that string, some state was visited twice. This means that there is a loop in our DFA somewhere - a way to get to an earlier state from a later state - and, furthermore, it is possible to accept strings after taking the loop. If the loop can be taken once:
it might not have been taken at all, so there is definitely a shorter string accepted by the DFA.
it might be taken twice, so there is definitely a longer string accepted by the DFA.
indeed, it might be taken over and over again an arbitrary number of times, so infinitely many longer strings must be in the language.
When using the pumping lemma, you typically assume a language is regular and conclude that what the pumping lemma says about regular languages - what we have said above - is true. Then, you provide a counterexample - a string of length greater than the pumping length p (more on what p is in a moment) - and conclude the assumption is false, that is, that the language is not regular.
The pumping length is essentially the number of states in some DFA for the language. If the language is regular, there are infinitely many DFAs that accept it. For the purposes of using the pumping lemma, it doesn't really matter which DFA for the language you use, since any DFA will visit some state more than once if it processes an input whose length exceeds the number of states in the DFA. Perhaps it is typical to imagine a minimal DFA for the language, which is guaranteed to exist and be unique up to isomorphism.

Related

How expressive can we be with arrays in Z3(Py)? An example

My first question is whether I can express the following formula in Z3Py:
Exists i::Integer s.t. (0<=i<|arr|) & (avg(arr)+t<arr[i])
This means: whether there is a position i::0<i<|arr| in the array whose value a[i] is greater than the average of the array avg(arr) plus a given threshold t.
I know this kind of expressions can be queried in Dafny and (since Dafny uses Z3 below) I guess this can be done in Z3Py.
My second question is: how expressive is the decidable fragment involving arrays in Z3?
I read this paper on how the full theory of arrays is not decidable (http://theory.stanford.edu/~arbrad/papers/arrays.pdf), but only a concrete fragment, the array property fragment.
Is there any interesting paper/tutorial on what can and cannot be done with arrays+quantifiers+functions in Z3?
You found the best paper to read regarding reasoning with Array's, so I doubt there's a better resource or a tutorial out there for you.
I think the sequence logic (not yet officially supported by SMTLib, but z3 supports it), is the right logic to use for reasoning about these sorts of problems, see: https://microsoft.github.io/z3guide/docs/theories/Sequences/
Having said that, most properties about arrays/sequences of "arbitrary size" require inductive proofs. This is because most interesting functions on them are essentially recursive (or iterative), and induction is the only way to prove properties for such programs. While SMT solvers improved significantly regarding support for recursive definitions and induction, they still don't perform anywhere near well compared to a traditional theorem prover. (This is, of course, to be expected.)
I'd recommend looking at the sequence logic, and playing around with recursive definitions. You might get some mileage out of that, though don't expect proofs for anything that require induction, especially if the inductive-hypothesis needs some clever invariant to be specified.
Note that if you know the length of your array concretely (i.e., 10, 15, or some other hopefully not too large a number), then it's best to allocate the elements symbolically yourself, and not use arrays/sequences at all. (And you can repeat your proof for lenghts 0, 1, 2, .. upto some fixed number.) But if you want proofs that work for arbitrary lengths, your best bet is to use sequences in z3, not arrays; with all the caveats I mentioned above.

How to call a structured language that cannot loop or a functional language that cannot return

I created a special-purpose "programming language" that deliberately (by design) cannot evaluate the same piece of code twice (ie. it cannot loop). It essentially is made to describe a flowchart-like process where each element in the flowchart is a conditional that performs a different test on the same set of data (without being able to modify it). Branches can split and merge, but never in a circular fashion, ie. the flowchart cannot loop back onto itself. When arriving at the end of a branch, the current state is returned and the program exits.
When written down, a typical program superficially resembles a program in a purely functional language, except that no form of recursion is allowed and functions can never return anything; the only way to exit a function is to call another function, or to invoke a general exit statement that returns the current state. A similar effect could also be achieved by taking a structured programming language and removing all loop statements, or by taking an "unstructured" programming language and forbidding any goto or jmp statement that goes backwards in the code.
Now my question is: is there a concise and accurate way to describe such a language? I don't have any formal CS background and it is difficult for me to understand articles about automata theory and formal language theory, so I'm a bit at a loss. I know my language is not Turing complete, and through great pain, I managed to assure myself that my language probably can be classified as a "regular language" (ie. a language that can be evaluated by a read-only Turing machine), but is there a more specific term?
Bonus points if the term is intuitively understandable to an audience that is well-versed in general programming concepts but doesn't have a formal CS background. Also bonus points if there is a specific kind of machine or automaton that evaluates such a language. Oh yeah, keep in mind that we're not evaluating a stream of data - every element has (read-only) access to the full set of input data. :)
I believe that your language is sufficiently powerful to encode precisely the star-free languages. This is a subset of that regular languages in which no expression contains a Kleene star. In other words, it's the language of the empty string, the null set, and individual characters that is closed under concatenation and disjunction. This is equivalent to the set of languages accepted by DFAs that don't have any directed cycles in them.
I can attempt a proof of this here given your description of your language, though I'm not sure it will work precisely correctly because I don't have full access to your language. The assumptions I'm making are as follows:
No functions ever return. Once a function is called, it will never return control flow to the caller.
All calls are resolved statically (that is, you can look at the source code and construct a graph of each function and the set of functions it calls). In other words, there aren't any function pointers.
The call graph is acyclic; for any functions A and B, then exactly one of the following holds: A transitively calls B, B transitively calls A, or neither A nor B transitively call one another.
More generally, the control flow graph is acyclic. Once an expression evaluates, it never evaluates again. This allows us to generalize the above so that instead of thinking of functions calling other functions, we can think of the program as a series of statements that all call one another as a DAG.
Your input is a string where each letter is scanned once and only once, and in the order in which it's given (which seems reasonable given the fact that you're trying to model flowcharts).
Given these assumptions, here's a proof that your programs accept a language iff that language is star-free.
To prove that if there's a star-free language, there's a program in your language that accepts it, begin by constructing the minimum-state DFA for that language. Star-free languages are loop-free and scan the input exactly once, and so it should be easy to build a program in your language from the DFA. In particular, given a state s with a set of transitions to other states based on the next symbol of input, you can write a function that
looks at the next character of input and then calls the function encoding the state being transitioned to. Since the DFA has no directed cycles, the function calls have no directed cycles, and so each statement will be executed exactly once. We now have that (∀ R. is a star-free language → &Exists; a program in your language that accepts it).
To prove the reverse direction of implication, we essentially reverse this construction and create an ε-NFA with no cycles that corresponds to your program. Doing a subset construction on this NFA to reduce it to a DFA will not introduce any cycles, and so you'll have a star-free language. The construction is as follows: for each statement si in your program, create a state qi with a transition to each of the states corresponding to the other statements in your program that are one hop away from that statement. The transitions to those states will be labeled with the symbols of input consumed making each of the decisions, or ε if the transition occurs without consuming any input. This shows that (∀ programs P in your language, &exists; a star-free language R the accepts just the strings accepted by your language).
Taken together, this shows that your programs have identically the power of the star-free languages.
Of course, the assumptions I made on what your programs can do might be too limited. You might have random-access to the input sequence, which I think can be handled with a modification of the above construction. If you can potentially have cycles in execution, then this whole construction breaks. But, even if I'm wrong, I still had a lot of fun thinking about this, and thank you for an enjoyable evening. :-)
Hope this helps!
I know this question is somewhat old, but for posterity, the phrase you are looking for is "decision tree". See http://en.wikipedia.org/wiki/Decision_tree_model for details. I believe this captures exactly what you have done and has a pretty descriptive name to boot!

What is the best way of determining a loop invariant?

When using formal aspects to create some code is there a generic method of determining a loop invariant or will it be completely different depending on the problem?
It has already been pointed out that one same loop can have several invariants, and that Calculability is against you. It doesn't mean that you cannot try.
You are, in fact, looking for an inductive invariant: the word invariant may also be used for a property that is true at each iteration but for which is it not enough to know that it hold at one iteration to deduce that it holds at the next. If I is an inductive invariant, then any consequence of I is an invariant, but may not be an inductive invariant.
You are probably trying to get an inductive invariant to prove a certain property (post-condition) of the loop in some defined circumstances (pre-conditions).
There are two heuristics that work quite well:
start with what you have (pre-conditions), and weaken until you have an inductive invariant. In order to get an intuition how to weaken, apply one or several forward loop iterations and see what ceases to be true in the formula you have.
start with what you want (post-conditions) and strengthen until you have an inductive invariant. To get the intuition how to strengthen, apply one or several loop iterations backwards and see what needs to be added so that the post-condition can be deduced.
If you want the computer to help you in your practice, I can recommend the Jessie deductive verification plug-in for C programs of Frama-C. There are others, especially for Java and JML annotations, but I am less familiar with them. Trying out the invariants you think of is much faster than working out if they work on paper. I should point out that verifying that a property is an inductive invariant is also undecidable, but modern automatic provers do great on many simple examples. If you decide to go that route, get as many as you can from the list: Alt-ergo, Simplify, Z3.
With the optional (and slightly difficult to install) library Apron, Jessie can also infer some simple invariants automatically.
It's actually trivial to generate loop invariants. true is a good one for instance. It fulfills all three properties you want:
It holds before loop entry
It holds after each iteration
It holds after loop termination
But what you're after is probably the strongest loop invariant. Finding the strongest loop invariant however, is sometimes even an undecidable task. See article Inadequacy of Computable Loop Invariants.
I don't think it's easy to automate this. From wiki:
Because of the fundamental similarity of loops and recursive programs, proving partial correctness of loops with invariants is very similar to proving correctness of recursive programs via induction. In fact, the loop invariant is often the inductive property one has to prove of a recursive program that is equivalent to a given loop.
I've written about writing loop invariants in my blog, see Verifying Loops Part 2. The invariants needed to prove a loop correct typically comprise 2 parts:
A generalisation of the state that is intended when the loop terminates.
Extra bits needed to ensure that the loop body is well-formed (e.g. array indices in bounds).
(2) is straightforward. To derive (1), start with a predicate expressing the desired state after termination. Chances are it contains a 'forall' or 'exists' over some range of data. Now change the bounds of the 'forall' or 'exists' so that (a) they depend on variables modified by the loop (e.g. loop counters), and (b) so that the invariant is trivially true when the loop is first entered (usually by making the range of the 'forall' or 'exists' empty).
There are a number of heuristics for finding loop invariants. One good book about this is "Programming in the 1990s" by Ed Cohen. It's about how to find a good invariant by manipulating the postcondition by hand. Examples are: replace a constant by a variable, strengthen invariant, ...

General proof of equivalence of two FSMs in finite time?

Does a general proof exist for the equivalence of two (deterministic) finite state machines that always takes finite time? That is, given two FSMs, can you prove that given the same inputs they will always produce the same outputs without actually needing to execute the FSMs (which could be non-terminating?). If such a proof does exist, what is the time complexity?
There is a proof, though I don't know it. Look for Sipser's textbook on the subject, that's where I know of it from.
Scrounging my memory: basically, there is a unique minimal DFA for a given DFA, and there is a minimization algorithm that always terminates. Minimize both A and B, and see if they have the same minimal DFA. I don't know the complexity of the minimization, though its not too bad (I think its polynomial). Graph isomorphism is pretty hard to compute, but because there's a special starting node, it may hopefully be somewhat easier. You may not even require graph isomorphism, to be honest.
But no, you don't ever need to actually run the DFAs, just analyze their structure.
Suppose you have two FSMs with O(n) states. Then you can make an FSM of size O(n2) that recognizes only the symmetric difference of their accept languages. (Make an FSM that has states that correspond to a pair of states, one from each FSM. Then on each step, update each part of the pair simultaneously. A state in the new FSM is an accept state iff exactly one of the pair was an accept state.) Now minimize this FSM and see if it is the same as the trivial one-state FSM that rejects everything. Minimizing an FSM with m states takes time O(m log m), so overall you can do everything in time O(n2 log n).
#Agor correctly states that Sipser is a good reference for these sorts of things. The key point of my answer is that you can do this in polynomial time, even with a small exponent.

Halting in non-Turing-complete languages

The halting problem cannot be solved for Turing complete languages and it can be solved trivially for some non-TC languages like regexes where it always halts.
I was wondering if there are any languages that has both the ability to halt and not halt but admits an algorithm that can determine whether it halts.
The halting problem does not act on languages. Rather, it acts on machines
(i.e., programs): it asks whether a given program halts on a given input.
Perhaps you meant to ask whether it can be solved for other models of
computation (like regular expressions, which you mention, but also like
push-down automata).
Halting can, in general, be detected in models with finite resources (like
regular expressions or, equivalently, finite automata, which have a fixed
number of states and no external storage). This is easily accomplished by
enumerating all possible configurations and checking whether the machine enters
the same configuration twice (indicating an infinite loop); with finite
resources, we can put an upper bound on the amount of time before we must see
a repeated configuration if the machine does not halt.
Usually, models with infinite resources (unbounded TMs and PDAs, for instance),
cannot be halt-checked, but it would be best to investigate the models and
their open problems individually.
(Sorry for all the Wikipedia links, but it actually is a very good resource for
this kind of question.)
Yes. One important class of this kind are primitive recursive functions. This class includes all of the basic things you expect to be able to do with numbers (addition, multiplication, etc.), as well as some complex classes like #adrian has mentioned (regular expressions/finite automata, context-free grammars/pushdown automata). There do, however, exist functions that are not primitive recursive, such as the Ackermann function.
It's actually pretty easy to understand primitive recursive functions. They're the functions that you could get in a programming language that had no true recursion (so a function f cannot call itself, whether directly or by calling another function g that then calls f, etc.) and has no while-loops, instead having bounded for-loops. A bounded for-loop is one like "for i from 1 to r" where r is a variable that has already been computed earlier in the program; also, i cannot be modified within the for-loop. The point of such a programming language is that every program halts.
Most programs we write are actually primitive recursive (I mean, can be translated into such a language).
The short answer is yes, and such languages can even be extremely useful.
There was a discussion about it a few months ago on LtU:
http://lambda-the-ultimate.org/node/2846

Resources