How to call a structured language that cannot loop or a functional language that cannot return - theory

I created a special-purpose "programming language" that deliberately (by design) cannot evaluate the same piece of code twice (ie. it cannot loop). It essentially is made to describe a flowchart-like process where each element in the flowchart is a conditional that performs a different test on the same set of data (without being able to modify it). Branches can split and merge, but never in a circular fashion, ie. the flowchart cannot loop back onto itself. When arriving at the end of a branch, the current state is returned and the program exits.
When written down, a typical program superficially resembles a program in a purely functional language, except that no form of recursion is allowed and functions can never return anything; the only way to exit a function is to call another function, or to invoke a general exit statement that returns the current state. A similar effect could also be achieved by taking a structured programming language and removing all loop statements, or by taking an "unstructured" programming language and forbidding any goto or jmp statement that goes backwards in the code.
Now my question is: is there a concise and accurate way to describe such a language? I don't have any formal CS background and it is difficult for me to understand articles about automata theory and formal language theory, so I'm a bit at a loss. I know my language is not Turing complete, and through great pain, I managed to assure myself that my language probably can be classified as a "regular language" (ie. a language that can be evaluated by a read-only Turing machine), but is there a more specific term?
Bonus points if the term is intuitively understandable to an audience that is well-versed in general programming concepts but doesn't have a formal CS background. Also bonus points if there is a specific kind of machine or automaton that evaluates such a language. Oh yeah, keep in mind that we're not evaluating a stream of data - every element has (read-only) access to the full set of input data. :)

I believe that your language is sufficiently powerful to encode precisely the star-free languages. This is a subset of that regular languages in which no expression contains a Kleene star. In other words, it's the language of the empty string, the null set, and individual characters that is closed under concatenation and disjunction. This is equivalent to the set of languages accepted by DFAs that don't have any directed cycles in them.
I can attempt a proof of this here given your description of your language, though I'm not sure it will work precisely correctly because I don't have full access to your language. The assumptions I'm making are as follows:
No functions ever return. Once a function is called, it will never return control flow to the caller.
All calls are resolved statically (that is, you can look at the source code and construct a graph of each function and the set of functions it calls). In other words, there aren't any function pointers.
The call graph is acyclic; for any functions A and B, then exactly one of the following holds: A transitively calls B, B transitively calls A, or neither A nor B transitively call one another.
More generally, the control flow graph is acyclic. Once an expression evaluates, it never evaluates again. This allows us to generalize the above so that instead of thinking of functions calling other functions, we can think of the program as a series of statements that all call one another as a DAG.
Your input is a string where each letter is scanned once and only once, and in the order in which it's given (which seems reasonable given the fact that you're trying to model flowcharts).
Given these assumptions, here's a proof that your programs accept a language iff that language is star-free.
To prove that if there's a star-free language, there's a program in your language that accepts it, begin by constructing the minimum-state DFA for that language. Star-free languages are loop-free and scan the input exactly once, and so it should be easy to build a program in your language from the DFA. In particular, given a state s with a set of transitions to other states based on the next symbol of input, you can write a function that
looks at the next character of input and then calls the function encoding the state being transitioned to. Since the DFA has no directed cycles, the function calls have no directed cycles, and so each statement will be executed exactly once. We now have that (∀ R. is a star-free language → ∃ a program in your language that accepts it).
To prove the reverse direction of implication, we essentially reverse this construction and create an ε-NFA with no cycles that corresponds to your program. Doing a subset construction on this NFA to reduce it to a DFA will not introduce any cycles, and so you'll have a star-free language. The construction is as follows: for each statement si in your program, create a state qi with a transition to each of the states corresponding to the other statements in your program that are one hop away from that statement. The transitions to those states will be labeled with the symbols of input consumed making each of the decisions, or ε if the transition occurs without consuming any input. This shows that (∀ programs P in your language, &exists; a star-free language R the accepts just the strings accepted by your language).
Taken together, this shows that your programs have identically the power of the star-free languages.
Of course, the assumptions I made on what your programs can do might be too limited. You might have random-access to the input sequence, which I think can be handled with a modification of the above construction. If you can potentially have cycles in execution, then this whole construction breaks. But, even if I'm wrong, I still had a lot of fun thinking about this, and thank you for an enjoyable evening. :-)
Hope this helps!

I know this question is somewhat old, but for posterity, the phrase you are looking for is "decision tree". See http://en.wikipedia.org/wiki/Decision_tree_model for details. I believe this captures exactly what you have done and has a pretty descriptive name to boot!

Related

Show a language is infinite

Given a DFA M with n states defined over an alphabet ∑, and a string x∈L(M) such that |x|>n, I must show that the L(M) is an infinite language.
Is there any way I can prove this using the pumping lemma?
I'm not immediately sure why the pumping lemma needs to be involved directly. The proof idea is the same one behind the proof of the pumping lemma, and we can explain the relationship. Possibly, with that explanation in hand, you can figure out a way to work out a proof that relies on the pumping lemma directly. I think that should in theory be possible to do without too much mental gymnastics.
Imagine a DFA with three states that accepts a string of length four. This is a concrete instance of the general claim you've been asked to prove. If we can understand why this is specifically true here, it will be easier to generalize to DFAs with an arbitrary number of states.
A DFA executes one transition for every symbol of input. Each transition takes the DFA to some state. Since the string of length four is accepted, we execute four transitions - beginning with the initial state - and we end up in some accepting state. Since each transition takes the DFA to some state, we move into states four times. Since there are only three states in the DFA, that means one of the states was transitioned into more than once: we cannot transition four times around three states without visiting some state more than once. The general principle here is referred to as the Pigeonhole principle: if there are n pigeons in m holes, and n > m, then some hole has more than one pigeon. This is the same reasoning behind why the pumping lemma works, by the way.
Now, consider what this observation means. We accepted a string and, while accepting that string, some state was visited twice. This means that there is a loop in our DFA somewhere - a way to get to an earlier state from a later state - and, furthermore, it is possible to accept strings after taking the loop. If the loop can be taken once:
it might not have been taken at all, so there is definitely a shorter string accepted by the DFA.
it might be taken twice, so there is definitely a longer string accepted by the DFA.
indeed, it might be taken over and over again an arbitrary number of times, so infinitely many longer strings must be in the language.
When using the pumping lemma, you typically assume a language is regular and conclude that what the pumping lemma says about regular languages - what we have said above - is true. Then, you provide a counterexample - a string of length greater than the pumping length p (more on what p is in a moment) - and conclude the assumption is false, that is, that the language is not regular.
The pumping length is essentially the number of states in some DFA for the language. If the language is regular, there are infinitely many DFAs that accept it. For the purposes of using the pumping lemma, it doesn't really matter which DFA for the language you use, since any DFA will visit some state more than once if it processes an input whose length exceeds the number of states in the DFA. Perhaps it is typical to imagine a minimal DFA for the language, which is guaranteed to exist and be unique up to isomorphism.

What is the meaning of 'construct' in programming languages

I see the term 'construct' come up very often in programming readings. The current book I am reading, "Programming in C" by Stephen Koching has used it a few times throughout the book. One example is in the chapter on looping, which says:
"When developing programs, it sometimes becomes desirable to have the
test made at the end of the loop rather than at the beginning.
Naturally, the C language provides a special language construct to
handle such a situation. This looping statement is known as the do
statement."
In this case what does the term 'construct' mean, and does the word 'construct' have any relation to an object 'constructor' in other languages?
In this case you can replace the word construct with syntax.
does the word 'construct' have an relation to an object 'constructor' in other languages?
No. These two terms are different. There is nothing like constructor in C
It's a generic term that normally refers to some particular syntax included in the language to perform some task (like a loop with the condition at the end). It has no relation at all with constructors.1
Well, besides the fact that constructors are a particular language construct in many OO languages.
does the word 'construct' have an relation to an object 'constructor' in other languages?
The sentence uses the noun, not a verb, meaning of the word "construct":
construct (n) - something (such as an idea or a theory) that is formed in people's minds.
In this case, "construct" refers to an abstract way of describing something (namely, a loop) in terms of the syntax of that particular language. "Language construct" means "a way to do [something] with that language".
A construct is simply a concept implementation mechanism used by a given programming language - the language's syntax.
In your case, the concept here is a loop and its construct is the manner in which it is implemented by the C programming language.
Programming languages provide constructs for various programming concepts that define how these programming concepts are implemented in that language.
Does the word 'construct' have an relation to an object 'constructor' in other languages?
The two terms are different, a constructor is used in Object Oriented Languages such as java, it is not available in the C programming language.
first of all, remember that c-language, not an Object Oriented Programming language. the constructor is an OOP terminology. So there is Construct refers to syntax and pre-defined keywords like do and while in this case.
The word “construct” as used in computer programming has very broad or general meaning. I hope these thoughts will help to explain what is meant by the word “construct” in programming.
A computer program is a list of instructions that the computer is able to (a) understand and (b) execute (perform or carry out). The simplest program would be a list of - let’s call them statements - that the computer would execute in sequence, one after the other, from the first to the last, and then end. But that would be incredibly limiting - so limiting in fact that I don’t think computers would ever have become much more than simple calculators. One of the fundamental differences between a simple calculator and a computer is that the statements do not have to be executed in sequence. The sequence can be interrupted by “special” instructions (statements) which can divert the flow of execution from one stream to a totally different stream which has a completely different agenda.
The first obvious way this is done is with methods (functions or procedures). When a method is called, the flow of execution is diverted from one stream of statements to a totally different stream of statements, often unrelated to the stream from which it came. If that concept is accepted, then I think that an instruction that calls a method could also be regarded as a “construct”.
Let’s divert this discussion for a moment to talk about “blocks” of code.
Programmers who work in languages like C, C++ or Java know that pairs of opening and closing braces (curly brackets), are used to identify blocks of code. And it’s blocks of code that divide a program up into different processes or procedures. A block of code that is headed by say a while() loop is just as valid as a method, in that it interrupts the otherwise unimpeded flow of execution through a program. The same applies to the many categories of operators. “new” will divert the flow of statement execution to a constructor method. So we have all these various syntactical expressions that have one thing in common - they divert the flow of execution which, left to its own devices - would happily proceed executing statements of code in sequence one after the other, without any interruption.
So I am suggesting that the word “construct” is a useful collective noun that embraces all of these different and diverse syntactical expressions e.g. if() for() switch() etc. that use the different categories of operators to perform functions that are defined in their respective blocks of code. Would love to hear other opinions.
In short, all in-built features(like an array, loop, if else statements) of the programming language are language constructs.

Is C an imperative or declarative programming language

It is quite confusing to know difference between Imperative and Declarative programming can any one explain difference between both in real world terms?
Kindly clarify whether C is an Imperative or Declarative Language?
C is an imperative programming language.
A one line difference between the two would be Declarative programming is when you say what you want, and imperative language is when you say how to get what you want. In Declarative programming the focus is on what the computer should do rather than how it should do it (ex. SQL) whereas in the Imperative programming the focus is on what steps the computer should take rather than what the computer will do (ex. C, C++, Java).
Imperative programming is a programming paradigm that describes computation in terms of statements that change a program state
Declarative programming is a programming paradigm, a style of building the structure and elements of computer programs, that expresses the logic of a computation without describing its control flow
Many imperative programming languages (such as Fortran, BASIC and C) are abstractions of assembly language.
The wiki says:-
As an imperative language, C uses statements to specify actions. The
most common statement is an expression statement, consisting of an
expression to be evaluated, followed by a semicolon; as a side effect
of the evaluation, functions may be called and variables may be
assigned new values. To modify the normal sequential execution of
statements, C provides several control-flow statements identified by
reserved keywords. Structured programming is supported by if(-else)
conditional execution and by do-while, while, and for iterative
execution (looping). The for statement has separate initialization,
testing, and reinitialization expressions, any or all of which can be
omitted. break and continue can be used to leave the innermost
enclosing loop statement or skip to its reinitialization. There is
also a non-structured goto statement which branches directly to the
designated label within the function. switch selects a case to be
executed based on the value of an integer expression.
Caveat
I am writing with a lot of generalities, so please bear with me.
In Theory
C is imperative, because code reads like a recipe for how to do something. However, if you use a lot of well-named functions and function pointers for polymorphism, it's possible to make C code look like a declarative language.
In imperative languages, you are focused on the algorithm/implementation. Engineering is inherently imperative, because you are focused on efficiency of a process: the cost of doing something in terms of time or money (or memory in CS) required.
In contrast, Mathematics is generally declarative (but writing a proof tends to be more imperative). In math, you care more about correctness and defining invariant relationships/operations, as opposed to how quickly you can get the answer.
Note that many functional languages tend to be declarative in nature (eg R, Lisp).
What does z = x + y mean? (Semantics)
In an imperative language, it means read from memory locations x and y, add those values together, and put the result into memory location z, and do it right now. If you assign a different value to x, you will have to use the z = x + y statement again to recalculate z.
In a declarative (lazy) language, it means z is a variable whose value is the sum of the values of two other variables x and y. The addition operation isn't executed until you try to read the value of z. What's the implication? If you read from z, the value will always be the sum of x and y at that moment in time; you do not need to reissue the statement. In pure declarative languages where there are no variables, a reissue can actually be caught as an error!!!
Keep this example in mind and you will see why mathematicians tend to prefer declarative languages. For example, I can define hypotenuse = sqrt( height^2 + length^2 ) and never worry about having to reissue that statement. The relationship is an invariant that will always hold, just like a mathematical truth always holds.
In Real Life (and why should I care?)
Proponents of declarative languages claim: an efficient solution that is wrong (buggy) is useless. They want bug-free, state-less functions without side-effects that can be reused without modification.
Proponents of imperative languages claim: a correct solution that takes forever to run is also useless. They want control over the memory/speed tradeoff. They want to be able to optimised based on physical and time constraints.
Of course, nothing is 100% imperative or declarative. Imperative code that is correct and well-written implies certain relationships. OTOH, declarative code, in sufficient depth and in conjunction with the language specifications, describes those relationships well enough for the compiler/interpreter to turn your code into a series of CPU instructions.
Because we are dealing with computers, a declarative compiler/interpreter must be smart enough to make time vs memory tradeoffs, whereas in an imperative language, it is up to the programmer to make those decisions more explicitly.
So a declarative language requires that the programmer focus on defining relationships between variables and other invariants. It is up to the compiler/interpreter to turn those relationships into a series of instructions/operations for the CPU. Most declarative compilers/interpreters are smart enough to handle most real-world cases, but may have trouble with edge cases. Unfortunately, in those situations you will have to coax the compiler/interpreter.
Which one is better?
Proponents of declarative languages claim that such languages allow the programmers to focus on the domain and to write code that reads easier for non-programmers. It is easier to write correct code, claim the advocates. However, the trade-off is, coaxing the compiler/interpreter to make the correct memory vs speed tradeoff can require some intricate knowledge of the language. You will understand this problem if you use a declarative language like R or SQL or LISP. It is certainly possible to define a new declarative language which has nothing to do with computers (but doing so may make it harder for the writer of the interpreter/compiler). Many mathematicians and pure CS researchers like declarative languages.
Imperative languages tend to give you finer grained control over the machine. There is no question that you are programming a computer. The trap is, we can end up pre-maturely focusing on unnecessary speed optimisations that hurt code maintenance and readability. In the early days of computing where speed or memory were severely limited, you needed to have imperative languages to get useful work done, optimised correctly for your situation. Engineers and tinkerers tend to gravitate towards imperative languages.
C is an imperative language.
An imperative language specifies how to do what you want. A declarative language specifies what you want, but not how to do it; the language works out how to do it. Prolog is an example of a declarative language.
I would like to comment that some aspects of the C language would be, in the absence of explicit rules, declarative...
int i = 4;
int j = 5;
float f = i/j;
would seem to mean that you intend float to be .80 (and in a declarative language it would be, most likely)... but since there are well defined procedures int/int evaluates to an int using integer division (which in C is floor division).
it is the aspect of the explicitly defined behavior that makes C Imperative.
there is the secret under layer of C where optimizations can be made, as long as they have a guarantee to not change the output of the program, that makes the compiler have some declarative behavior, where the declaration is the behavior of the input C program, but the end result can be anything that matches that C program in functionality
§5.1.2.3 part 10:
Alternatively, an implementation might perform various optimizations
within each translation unit, such that the actual semantics would
agree with the abstract semantics only when making function calls
across translation unit boundaries. In such an implementation, at the
time of each function entry and function return where the calling
function and the called function are in different translation units,
the values of all externally linked objects and of all objects
accessible via pointers therein would agree with the abstract
semantics. Furthermore, at the time of each such function entry the
values of the parameters of the called function and of all objects
accessible via pointers therein would agree with the abstract
semantics. In this type of implementation, objects referred to by
interrupt service routines activated by the signal function would
require explicit specification of volatile storage, as well as other
implementation-defined restrictions.
and a concrete example from the next part:
EXAMPLE 2 In executing the fragment
char c1, c2; /* ... */
c1 = c1 + c2;
the ‘‘integer promotions’’ require that the abstract machine promote
the value of each variable to int size and then add
the two ints and truncate the sum. Provided the addition of two chars
can be done without overflow, or with overflow wrapping silently to
produce the correct result, the actual execution need only produce the
same result, possibly omitting the promotions.
-> Imperative programming: telling the "machine" how to do something, and as a result what you want to happen will happen.
-> Declarative programming: telling the "machine" what you would like to happen, and let the computer figure out how to do it.
So We can say C is an imperative Language.

curious about how "loop = loop" is evaluated in Haskell

I thought expressions like this would cause Haskell to evaluate forever. But the behaviors in both GHCi and the compiled program surprised me.
For example, in GHCi, these expressions blocked until I Control+C, but consumed no CPU. Looked like it was sleeping.
let loop = loop
let loop = 1 + loop
I tried compiling these programs with GHC:
main = print loop
where loop = 1 + loop
main = print loop
where loop = if True then loop else 1
What was printed was:
Main: <<loop>>
So my question is: Obviously these expressions are compiled to something different than loops or recursive calls in imperative languages. What are they compiled to? Is this a special rule to handle 0-arg functions that have themselves in the right hand side, or it's a special case of something more general that I don't know?
[EDIT]:
One more question: If this happens to be a special handling from the compiler, what is the reason behind doing this when it's impossible to check for all infinite loops? 'Familiar' languages don't care about cases like while (true); or int f() { return f();}, right?
Many thanks.
GHC implements Haskell as a graph reduction machine. Imagine your program as a graph with each value as a node, and lines from it to each value that value depends on. Except, we're lazy, so you really start with just one node -- and to evaluate that node, GHC has to "enter" it and open it up to a function with arguments. It then replaces the function call with the body of the function, and attempts to reduce it enough to get it into head normal form, etc.
The above being very handwavy and I'm sure eliding some necessary detail in the interest of brevity.
In any case, when GHC enters a value, it generally replaces it with a black hole while the node is being evaluated (or, depending on your terminology, while the closure is being reduced) This has a number of purposes. First, it plugs a potential space leak. If the node references a value which is used nowhere else, the black hole allows that value to be garbage-collected even while the node is being evaluated. Second, this prevents certain types of duplicate work, since in a multi-threaded environment, two threads may attempt to enter the same value. The black-hole will cause the second thread to block rather than evaluate the value already being evaluated. Finally, this happens to allow for a limited form of loop detection, since if a thread attempts to re-enter its own black hole, we can throw an exception.
Here's a bit of a more metaphorical explanation. If I have a series of instructions that moves a turtle (in logo) around the screen, there's no one way to tell what shape they will produce, or whether that shape terminates without running them. But if, while running them, I notice that the path of the turtle has crossed itself, I can indicate to the user "aha! the turtle has crossed its path!" So I know that the turtle has reached a spot it has been before -- if the path is a circuit through evaluating the nodes of a graph, then that tells us we're in a loop. However, the turtle can also go in, for example, an expanding spiral. And it will never terminate, but it will also never cross its prior path.
So, because of the use of black holes, for multiple reasons, we have some notion of a marked "path" that evaluation has followed. And if the path crosses itself, we can tell and throw an exception. However, there are a million ways for things to diverge that don't involve the path crossing itself. And in those cases, we can't tell, and don't throw an exception.
For super-geeky technical detail about the current implementation of black holes, see Simon Marlow's talk from the recent Haskell Implementors Workshop, "Scheduling Lazy Evaluation on Multicore" at the bottom of http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010.
In some, limited cases, the compiler can determine such a loop exists as part of its other control flow analyses, and at that point replaces the looping term with code that throws an appropriate exception. This cannot be done in all cases, of course, but only in some of the more obvious cases, where it falls out naturally from other work the compiler is doing.
As for why Haskell finds this more often than other languages:
These cases do not occur in languages which are strict such as C. These loops specifically happen when a lazy variable's computation depends on its own value.
Languages such as C have very specific semantics in loops; ie, what order to do what in. As such, they are forced to actually execute the loop. Haskell, however defines a special value _|_ ("the bottom"), which is used to represent erroneous values. Values which are strict on themselves - ie, they depend on their own value to compute - are _|_. The result of pattern-matching on _|_ can either be an infinite loop or an exception; your compiler is choosing the latter here.
The Haskell compiler is very interested in performing strictness analysis - ie, proving that a certain expression depends on certain other expressions - in order to perform certain optimizations. This loop analysis falls out naturally as an edge case in the strictness analyzer which must be handled in one way or another.

Halting in non-Turing-complete languages

The halting problem cannot be solved for Turing complete languages and it can be solved trivially for some non-TC languages like regexes where it always halts.
I was wondering if there are any languages that has both the ability to halt and not halt but admits an algorithm that can determine whether it halts.
The halting problem does not act on languages. Rather, it acts on machines
(i.e., programs): it asks whether a given program halts on a given input.
Perhaps you meant to ask whether it can be solved for other models of
computation (like regular expressions, which you mention, but also like
push-down automata).
Halting can, in general, be detected in models with finite resources (like
regular expressions or, equivalently, finite automata, which have a fixed
number of states and no external storage). This is easily accomplished by
enumerating all possible configurations and checking whether the machine enters
the same configuration twice (indicating an infinite loop); with finite
resources, we can put an upper bound on the amount of time before we must see
a repeated configuration if the machine does not halt.
Usually, models with infinite resources (unbounded TMs and PDAs, for instance),
cannot be halt-checked, but it would be best to investigate the models and
their open problems individually.
(Sorry for all the Wikipedia links, but it actually is a very good resource for
this kind of question.)
Yes. One important class of this kind are primitive recursive functions. This class includes all of the basic things you expect to be able to do with numbers (addition, multiplication, etc.), as well as some complex classes like #adrian has mentioned (regular expressions/finite automata, context-free grammars/pushdown automata). There do, however, exist functions that are not primitive recursive, such as the Ackermann function.
It's actually pretty easy to understand primitive recursive functions. They're the functions that you could get in a programming language that had no true recursion (so a function f cannot call itself, whether directly or by calling another function g that then calls f, etc.) and has no while-loops, instead having bounded for-loops. A bounded for-loop is one like "for i from 1 to r" where r is a variable that has already been computed earlier in the program; also, i cannot be modified within the for-loop. The point of such a programming language is that every program halts.
Most programs we write are actually primitive recursive (I mean, can be translated into such a language).
The short answer is yes, and such languages can even be extremely useful.
There was a discussion about it a few months ago on LtU:
http://lambda-the-ultimate.org/node/2846

Resources