curious about how "loop = loop" is evaluated in Haskell - loops

I thought expressions like this would cause Haskell to evaluate forever. But the behaviors in both GHCi and the compiled program surprised me.
For example, in GHCi, these expressions blocked until I pressed Control+C, but consumed no CPU. It looked like the program was sleeping.
let loop = loop
let loop = 1 + loop
I tried compiling these programs with GHC:
main = print loop
  where loop = 1 + loop

main = print loop
  where loop = if True then loop else 1
What was printed was:
Main: <<loop>>
So my question is: obviously these expressions are compiled to something different from loops or recursive calls in imperative languages. What are they compiled to? Is this a special rule for handling zero-argument bindings that refer to themselves on the right-hand side, or is it a special case of something more general that I don't know about?
[EDIT]:
One more question: if this happens to be special handling by the compiler, what is the reason for doing it when it's impossible to check for all infinite loops? 'Familiar' languages don't care about cases like while (true); or int f() { return f(); }, right?
Many thanks.

GHC implements Haskell as a graph reduction machine. Imagine your program as a graph with each value as a node, and lines from it to each value that value depends on. Except, we're lazy, so you really start with just one node -- and to evaluate that node, GHC has to "enter" it and open it up; if it is a function applied to arguments, it replaces the function call with the body of the function, and attempts to reduce it enough to get it into head normal form, and so on.
The above is very handwavy, and I'm sure it elides some necessary detail in the interest of brevity.
In any case, when GHC enters a value, it generally replaces it with a black hole while the node is being evaluated (or, depending on your terminology, while the closure is being reduced). This has a number of purposes. First, it plugs a potential space leak: if the node references a value which is used nowhere else, the black hole allows that value to be garbage-collected even while the node is being evaluated. Second, it prevents certain types of duplicate work, since in a multi-threaded environment two threads may attempt to enter the same value; the black hole will cause the second thread to block rather than re-evaluate the value that is already being evaluated. Finally, this happens to allow a limited form of loop detection, since if a thread attempts to re-enter its own black hole, we can throw an exception.
Here's a bit of a more metaphorical explanation. If I have a series of instructions that moves a turtle (in Logo) around the screen, there's no general way to tell what shape they will produce, or whether that shape terminates, without running them. But if, while running them, I notice that the path of the turtle has crossed itself, I can indicate to the user "aha! the turtle has crossed its path!" So I know that the turtle has reached a spot it has been before -- and if the path is a circuit through the nodes of a graph we are evaluating, then that tells us we're in a loop. However, the turtle can also go in, for example, an expanding spiral. It will never terminate, but it will also never cross its prior path.
So, because of the use of black holes, for multiple reasons, we have some notion of a marked "path" that evaluation has followed. And if the path crosses itself, we can tell and throw an exception. However, there are a million ways for things to diverge that don't involve the path crossing itself. And in those cases, we can't tell, and don't throw an exception.
For super-geeky technical detail about the current implementation of black holes, see Simon Marlow's talk from the recent Haskell Implementors Workshop, "Scheduling Lazy Evaluation on Multicore" at the bottom of http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010.

In some, limited cases, the compiler can determine such a loop exists as part of its other control flow analyses, and at that point replaces the looping term with code that throws an appropriate exception. This cannot be done in all cases, of course, but only in some of the more obvious cases, where it falls out naturally from other work the compiler is doing.
As for why Haskell finds this more often than other languages:
These cases do not occur in strict languages such as C. These loops specifically happen when a lazy variable's computation depends on its own value.
Languages such as C have very specific semantics for loops; i.e., what order to do what in. As such, they are forced to actually execute the loop. Haskell, however, defines a special value _|_ ("bottom"), which is used to represent erroneous values. Values which are strict on themselves - i.e., which depend on their own value in order to be computed - are _|_. The result of pattern-matching on _|_ can either be an infinite loop or an exception; your compiler is choosing the latter here.
The Haskell compiler is very interested in performing strictness analysis - ie, proving that a certain expression depends on certain other expressions - in order to perform certain optimizations. This loop analysis falls out naturally as an edge case in the strictness analyzer which must be handled in one way or another.
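To make the contrast with a strict language concrete, here is a hypothetical C analogue of loop = 1 + loop (essentially the question's own int f() { return f(); } example, with the addition made explicit). Nothing detects the cycle; the recursion is simply executed:

/* Hypothetical C analogue of "loop = 1 + loop": the recursive call must
 * finish before the addition can happen, so a strict language just keeps
 * recursing.  There is no black hole and no <<loop>> exception; typically
 * the program crashes with a stack overflow (or spins forever, if the
 * compiler turns the recursion into a jump). */
int loop(void)
{
    return 1 + loop();
}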

Related

Why are the variables "i" and "j" considered dead in the control flow graph?

I was going through the topic of induction variable elimination in the red dragon book, where I came across the following example.
Consider the control flow graph below :
Fig. 1: Original Control Flow Graph
Now the authors apply strength reduction to the above graph to obtain the graph below:
Fig. 2: Control flow graph after applying strength reduction
Example 10.4. After reduction in strength is applied to the inner loops around B2 and B3, the only use of i and j is to determine the outcome of the test in block B4. We know that the values of i and t2 satisfy the relationship t2 = 4*i, while those of j and t4 satisfy the relationship t4 = 4*j, so the test t2 >= t4 is equivalent to i >= j. Once this replacement is made, i in block B2 and j in block B3 become dead variables and the assignments to them in these blocks become dead code that can be eliminated, resulting in the flow graph shown in Fig. 3 below. □
Fig. 3: Flow graph after induction variable elimination
What I do not get is the claim that "i in block B2 and j in block B3 become dead variables". Consider the path shown in green in Fig. 2:
The variables i and j still seem to be alive in blocks B2 and B3 respectively: if we follow the green path and look for uses of i and j in their respective blocks, each one is used on the right-hand side of its own assignment. Doesn't that particular use keep them live?
The variables are no longer live because they have no observable effect.
They are incremented and decremented, but the values are never consulted for any purpose. They are not printed out. No control flow depends on them. No other variable is computed using their values. If they weren't incremented and decremented, nobody would notice.
Eliminating them will not affect program output in any way. So they should be eliminated.
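A simplified C sketch of the situation (not the book's exact flow graph; array accesses and the swap step are elided) may make this concrete. Once the test in B4 is rewritten to use t2 and t4, the only remaining "uses" of i and j are the assignments to i and j themselves:

/* After strength reduction and the test rewrite, i and j are only ever
 * self-updated; nothing observable depends on them, so both assignments
 * (and the variables) are dead and can be removed. */
void after_strength_reduction(int *t2, int *t4)
{
    int i = 0, j = 100;      /* candidates for elimination */
    do {
        i = i + 1;           /* dead: the new value of i is never consulted */
        *t2 = *t2 + 4;       /* induction variable that replaced 4*i */
        j = j - 1;           /* dead: the new value of j is never consulted */
        *t4 = *t4 - 4;       /* induction variable that replaced 4*j */
    } while (*t2 < *t4);     /* formerly: while (i < j) */
}

Deleting the two marked assignments (and then i and j entirely) changes nothing the caller can observe, which is exactly the sense in which they are dead.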
As a more formal definition of liveness, we can start with the following:
A variable is live (at a point in the program) if its value at that point will become observable (by being made visible outside of the execution of the program; see below).
A variable is also live if its current value is used in the computation of a live value.
That recursive definition excludes the use of a not-otherwise-used variable only for the computation of the value of itself or of other variables which are not live. It's simply a more precise way of saying what I said in the first part of the answer: an assignment is irrelevant if eliminating it would make no observable difference in the execution of the program.
The precise definition of "observable effect" will vary according to computation model, but it basically means that the value is in some way communicated to the world outside of the program execution. For example a value is live if it is printed on the console or written to a file (including being used as the name of a file to be created, because file directories are also files). It's live if it is stored in a database, or causes a light to blink. The C standard includes in the category of observable behaviour reading and writing volatile memory, which is a way of encapsulating CPUs which use loads and stores of specific memory addresses as a way of sending and receiving data from peripherals.
There's an old philosophical riddle: if a tree falls in an uninhabited forest, does it make a sound? If we ignore the anthropocentricity of that question, it seems reasonable to answer "No", as did many 19th-century scientists. "Sound", they said, is not just a vibration of the air, but rather the result of the atmospheric vibration causing a neural reaction in an ear. (Certainly, it is possible to imagine a forest without any animate life at all, not just human life, so the philosopher can take refuge in that defense.) And that's basically where this model of computational liveness ends up: a computation is observable if it could be observed by someone. [Note 1]
Now, that's still open to interpretation because someone might, for example, "observe" a computation by measuring the amount of time that the computation took. In that sense, all optimisations should be observable, because they are pointless if they don't shorten computation time.
If we take that to be part of observable behaviour, then no useful optimisation is possible. So in most cases, this is not a particularly useful definition of observability. But there are a very few use cases in which preserving the amount of time a computation uses is necessary. The classic such case is countering security attacks which deduce the value of what should be secret variables by timing various different uses of the value. If you were writing code designed to maintain a highly-confidential secret -- say, the password keys required to access a bank account -- then you might want to include loops in some control flows which have no computational purpose whatsoever, but rather are intended to take exactly the same amount of time as a different control flow which uses the same secret value.
For a more playful example, when I was much younger and computers used much more electricity to do much slower computations, we noticed that you could "listen" to the execution of a program by tuning a radio to pick up the electromagnetic vibrations produced by the CPU. Then you could write different kinds of pointless loops to make different musical notes or rhythmic artefacts. This same kind of pointless loop can be used in a microcontroller in order to produce a blinking display, or even to directly drive an audio speaker. So there are definitely cases where you would want the compiler to not eliminate "useless" code.
Despite that, it is probably not a good idea to reject all optimisation techniques in order to enable predictable execution times. Most of the time, we would really prefer our programs to work as fast as possible, or to consume the minimum amount of non-renewable energy; in other words, to avoid doing unnecessary work. But since there are use cases where optimisation can affect behaviour which is not normally considered observable, the compiler needs to provide the programmer with a mechanism to turn optimisation off in particular pieces of code. Those are not the cases being discussed by Aho et al., and with good reason.
Notes:
George Berkeley, writing in 1710:
… it seems no less evident that the various Sensations or Ideas imprinted on the Sense, however blended or combined together (that is, whatever Objects they compose) cannot exist otherwise than in a Mind perceiving them…
Some philosophers of the time posited the necessity of the existence of an omniscient God, in order to avoid the chaos which Berkeley summons up in which the objects in his writing studio suddenly cease to exist when he closes his eyes and are recreated in a blink when he opens them again. In this argument, God, who continually sees all, guarantees the continuity of existence of the objects in Bishop Berkeley's studio. That has always struck me as a peculiarly menial purpose for a deity. (Surely She could delegate such a mundane task to a subordinate.) But to each their own.
For more references and a little discussion, you can start here on Wikipedia. Or just listen to Bruce Cockburn's beautiful environmental anthem.

Does memory dependence speculation prevent BN_consttime_swap from being constant-time?

Context
The function BN_consttime_swap in OpenSSL is a thing of beauty. In this snippet, condition has been computed as 0 or (BN_ULONG)-1:
#define BN_CONSTTIME_SWAP(ind) \
  do { \
    t = (a->d[ind] ^ b->d[ind]) & condition; \
    a->d[ind] ^= t; \
    b->d[ind] ^= t; \
  } while (0)
…
BN_CONSTTIME_SWAP(9);
…
BN_CONSTTIME_SWAP(8);
…
BN_CONSTTIME_SWAP(7);
The intention is that, in order to ensure that higher-level bignum operations take constant time, this function either swaps two bignums or leaves them in place, in constant time. When it leaves them in place, it actually reads each word of each bignum, computes a new word that is identical to the old word, and writes that result back to the original location.
The intention is that this will take the same time as if the bignums had effectively been swapped.
In this question, I assume a modern, widespread architecture such as those described by Agner Fog in his optimization manuals. Straightforward translation of the C code to assembly (without the C compiler undoing the efforts of the programmer) is also assumed.
Question
I am trying to understand whether the construct above qualifies as a “best effort” sort of constant-time execution, or as perfect constant-time execution.
In particular, I am concerned about the scenario where bignum a is already in the L1 data cache when the function BN_consttime_swap is called, and the code just after the function returns starts working on the bignum a right away. On a modern processor, enough instructions can be in flight at the same time for the copy not to be technically finished when the bignum a is used. The mechanism allowing the instructions after the call to BN_consttime_swap to work on a is memory dependence speculation. Let us assume naive memory dependence speculation for the sake of the argument.
What the question seems to boil down to is this:
When the processor finally detects that the code after BN_consttime_swap read from memory that had, contrary to speculation, been written to inside the function, does it cancel the speculative execution as soon as it detects that the address had been written to, or does it allow itself to keep the speculated results when it notices that the value that was written is the same as the value that was already there?
In the first case, BN_consttime_swap looks like it implements perfect constant-time. In the second case, it is only best-effort constant-time: if the bignums were not swapped, execution of the code that comes after the call to BN_consttime_swap will be measurably faster than if they had been swapped.
Even in the second case, this is something that looks like it could be fixed for the foreseeable future (as long as processors remain naive enough) by, for each word of each of the two bignums, writing a value different from the two possible final values before writing either the old value again or the new value. The volatile type qualifier may need to be involved at some point to prevent an ordinary compiler from over-optimizing the sequence, but it still sounds possible.
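Here is a rough C sketch of that idea, purely illustrative and not code from OpenSSL: each word is first overwritten with a scrub value before the final value is stored, with volatile used to keep the compiler from merging or dropping the intermediate stores. Whether this actually defeats value-based speculation on real hardware is exactly the open question.

#include <stddef.h>
#include <stdint.h>

typedef uint64_t WORD;   /* stand-in for BN_ULONG in this sketch */

/* condition is 0 or ~(WORD)0, as in the original macro. */
static void consttime_swap_scrub(volatile WORD *a, volatile WORD *b,
                                 size_t n, WORD condition)
{
    for (size_t i = 0; i < n; i++) {
        WORD av = a[i], bv = b[i];
        WORD t  = (av ^ bv) & condition;
        a[i] = ~av;        /* scrub store before the real store           */
        b[i] = ~bv;        /* (in the unlucky case bv == ~av the scrub    */
        a[i] = av ^ t;     /* equals the other word's value; a real       */
        b[i] = bv ^ t;     /* implementation would choose it better).     */
    }
}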
NOTE: I know about store forwarding, but store forwarding is only a shortcut. It does not prevent a read being executed before the write it is supposed to come after. And in some circumstances it fails, although one would not expect it to in this case.
Straightforward translation of the C code to assembly (without the C compiler undoing the efforts of the programmer) is also assumed.
I know it's not the thrust of your question, and I know that you know this, but I need to rant for a minute. This does not even qualify as a "best effort" attempt to provide constant-time execution. A compiler is licensed to check the value of condition, and skip the whole thing if condition is zero. Obfuscating the setting of condition makes this less likely to happen, but is no guarantee.
Purportedly "constant-time" code should not be written in C, full stop. Even if it is constant time today, on the compilers that you test, a smarter compiler will come along and defeat you. One of your users will use this compiler before you do, and they will not be aware of the risk to which you have exposed them. There are exactly three ways to achieve constant time that I am aware of: dedicated hardware, assembly, or a DSL that generates machine code plus a proof of constant-time execution.
Rant aside, on to the actual architecture question at hand: assuming a stupidly naive compiler, this code is constant time on the µarches with which I am familiar enough to evaluate the question, and I expect it to be broadly true for one simple reason: power. I expect that checking in a store queue or cache whether a value being stored matches the value already present, and conditionally short-circuiting the store or avoiding dirtying the cache line, consumes more energy than would be saved on the rare occasions when you get to avoid some work. However, I am not a CPU designer, and do not presume to speak on their behalf, so take this with several tablespoons of salt, and please consult one before assuming this to be true.
This blog post, and the comments made by the author, Henry, on the subject of this question, should be considered as authoritative as anyone should be allowed to expect. I will reproduce the latter here for archival:
I didn’t think the case of overwriting a memory location with the same value had a practical use. I think the answer is that in current processors, the value of the store is irrelevant, only the address is important.
Out here in academia, I’ve heard of two approaches to doing memory disambiguation: Address-based, or value-based. As far as I know, current processors all do address-based disambiguation.
I think the current microbenchmark has some evidence that the value isn’t relevant. Many of the cases involve repeatedly storing the same value into the same location (particularly those with offset = 0). These were not abnormally fast.
Address-based schemes use a store queue and a load queue to track outstanding memory operations. Loads check the store queue for an address match (Should this load do store-to-load forwarding instead of reading from cache?), while stores check the load queue (Did this store clobber the location of a later load I allowed to execute early?). These checks are based entirely on addresses (whether a store and a load collided). One advantage of this scheme is that it’s a fairly straightforward extension on top of store-to-load forwarding, since the store queue search is also used there.
Value-based schemes get rid of the associative search (i.e., faster, lower power, etc.), but require a better predictor to do store-to-load forwarding (now you have to guess whether and where to forward, rather than searching the SQ). These schemes check for ordering violations (and incorrect forwarding) by re-executing loads at commit time and checking whether their values are correct. In these schemes, if you have a conflicting store (or made some other mistake) that still resulted in the correct result value, it would not be detected as an ordering violation.
Could future processors move to value-based schemes? I suspect they might. They were proposed in the mid-2000s(?) to reduce the complexity of the memory execution hardware.
The idea behind constant-time implementation is not to actually perform everything in constant time. That will never happen on an out-of-order architecture.
The requirement is that no secret information can be revealed by timing analysis.
To prevent this there are basically two requirements:
a) Do not use anything secret as a stop condition for a loop, or as a predicate to a branch. Violating this opens you up to a branch prediction attack: https://eprint.iacr.org/2006/351.pdf
b) Do not use anything secret as an index into memory. Violating this leads to cache timing attacks: http://www.daemonology.net/papers/htt.pdf
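For rule (a), the usual branch-free idiom looks something like the following generic constant-time select (an illustrative sketch, not code from OpenSSL): the secret bit is turned into an all-zeros or all-ones mask, and both inputs are always read.

#include <stdint.h>

/* bit must be 0 or 1; returns x when bit == 1 and y when bit == 0,
 * with no branch or memory access that depends on the secret bit. */
static uint32_t ct_select(uint32_t x, uint32_t y, uint32_t bit)
{
    uint32_t mask = (uint32_t)0 - bit;   /* 0 -> 0x00000000, 1 -> 0xFFFFFFFF */
    return (x & mask) | (y & ~mask);
}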
As for your code: assuming that your secret is "condition" (and possibly the contents of a and b), the code is perfectly constant time in the sense that its execution does not depend on the actual contents of a, b, and condition. Of course, the locality of a and b in memory will affect the execution time of the loop, but the contents, which are the secret, will not.
That is assuming of course condition was computed in a constant time manner.
As for C optimizations: the compiler can only optimize code based on information it knows. If "condition" is truly secret, the compiler should not be able to discern its contents and optimize. If it can be deduced from your code, then the compiler will most likely optimize for the 0 case.

What does "loops must be folded to ensure termination" mean?

I came across "loops must be folded to ensure termination" in a paper on formal methods (abstract interpretation, to be precise). I am clear on what termination means, but I do not know what a folded loop is, nor how to perform folding on a loop.
Could someone please explain to me what a folded loop is? And if it is not implicit in, or does not follow immediately from, the definition of a folded loop, how does this ensure termination?
Thanks
Folding a loop is the opposite action from the better-known loop unfolding, which itself is better known as loop unrolling. Given a loop, to unfold it means to repeat the body several times, so that the loop test is executed less often. When the number of executions of the loop is known in advance, the loop can be completely unfolded, leaving a simple sequence of instructions. For example, this loop
for i := 1 to 4 do
  writeln(i);
can be unfolded to
writeln(1);
writeln(2);
writeln(3);
writeln(4);
See C++ loop unfolding, bounds for another example with partial unfolding.
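For a sense of what partial unfolding looks like in a C-family language, here is a hypothetical sketch of the same kind of loop unrolled by a factor of 2: the loop test runs roughly half as often, and a cleanup step handles an odd iteration count.

#include <stdio.h>

void print_1_to_n(int n)
{
    int i = 1;
    for (; i + 1 <= n; i += 2) {   /* two copies of the body per test */
        printf("%d\n", i);
        printf("%d\n", i + 1);
    }
    if (i <= n)                    /* leftover iteration when n is odd */
        printf("%d\n", i);
}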
The metaphor is that the program is folded on itself many times over; unfolding means removing some or all of these folds.
In some static analysis techniques, it is difficult to deal with loops, because finding a precondition for the entry point of the loop (i.e. finding a loop invariant) requires a fixpoint computation which is unsolvable in general. Therefore some analyses unfold loops; this requires having a reasonable bound on the number of iterations, which limits the scope of programs that can be analyzed.
In other static analysis techniques, finding a good invariant is a critical part of the analysis. It doesn't help to unfold the loop, in fact partially unfolding the loop would make the loop body larger and so would make it more difficult to determine a good invariant; and completely unfolding the loop would be impractical or impossible if the number of iterations was large or unbounded. Without seeing the paper, I find the statement a bit surprising, because the code could have been written with the unfolded form, but there can be programs that the analysis only works on in a more folded form.
I have no knowledge of abstract interpretation, so I'll take the functional programming approach to folding. :-)
In functional programming, a fold is an operation applied to a list to do something with each element, updating a value each iteration. For example, you can implement map this way (in Scheme):
(define (map1 func lst)
  (fold-right (lambda (elem result)
                (cons (func elem) result))
              '() lst))
What that does is that it starts with an empty list (let's call that the result), and then for each element of the list, from the right-hand-side moving leftward, you call func on the element and cons its result onto the result list.
The key here, as far as termination goes, is that with a fold, the loop is guaranteed to terminate as long as the list is finite, since you're iterating to the next element of the list each time, and if the list is finite, then eventually there will be no next element.
Contrast this with a more C-style for loop that doesn't operate on a list, but instead has the form for (init; test; update). The test is not guaranteed to ever return false, and so the loop is not guaranteed to terminate.

How to call a structured language that cannot loop or a functional language that cannot return

I created a special-purpose "programming language" that deliberately (by design) cannot evaluate the same piece of code twice (ie. it cannot loop). It essentially is made to describe a flowchart-like process where each element in the flowchart is a conditional that performs a different test on the same set of data (without being able to modify it). Branches can split and merge, but never in a circular fashion, ie. the flowchart cannot loop back onto itself. When arriving at the end of a branch, the current state is returned and the program exits.
When written down, a typical program superficially resembles a program in a purely functional language, except that no form of recursion is allowed and functions can never return anything; the only way to exit a function is to call another function, or to invoke a general exit statement that returns the current state. A similar effect could also be achieved by taking a structured programming language and removing all loop statements, or by taking an "unstructured" programming language and forbidding any goto or jmp statement that goes backwards in the code.
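As a hypothetical illustration (in C, since the language itself isn't shown here), a program in this style is just a chain of tests in which control only ever moves forward until a final state is reported:

#include <stdio.h>

/* Each "flowchart element" inspects the read-only data and either reports a
 * final state or hands off to an element further down the chart.  The C
 * functions do technically return, but nothing runs afterwards, so control
 * effectively flows forward only and no code is ever evaluated twice. */
static void report(const char *state)  { printf("final state: %s\n", state); }

static void check_size(int x) { report(x > 10 ? "big" : "small"); }
static void check_sign(int x) { if (x < 0) report("negative"); else check_size(x); }

int main(void)
{
    check_sign(7);   /* entry point of the "flowchart" */
    return 0;
}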
Now my question is: is there a concise and accurate way to describe such a language? I don't have any formal CS background and it is difficult for me to understand articles about automata theory and formal language theory, so I'm a bit at a loss. I know my language is not Turing complete, and through great pain, I managed to assure myself that my language probably can be classified as a "regular language" (ie. a language that can be evaluated by a read-only Turing machine), but is there a more specific term?
Bonus points if the term is intuitively understandable to an audience that is well-versed in general programming concepts but doesn't have a formal CS background. Also bonus points if there is a specific kind of machine or automaton that evaluates such a language. Oh yeah, keep in mind that we're not evaluating a stream of data - every element has (read-only) access to the full set of input data. :)
I believe that your language is sufficiently powerful to encode precisely the star-free languages. This is a subset of the regular languages in which no expression contains a Kleene star. In other words, it's the class of languages built from the empty string, the null set, and individual characters that is closed under concatenation and disjunction. This is equivalent to the set of languages accepted by DFAs that don't have any directed cycles in them.
I can attempt a proof of this here given your description of your language, though I'm not sure it will work precisely correctly because I don't have full access to your language. The assumptions I'm making are as follows:
No functions ever return. Once a function is called, it will never return control flow to the caller.
All calls are resolved statically (that is, you can look at the source code and construct a graph of each function and the set of functions it calls). In other words, there aren't any function pointers.
The call graph is acyclic; for any functions A and B, exactly one of the following holds: A transitively calls B, B transitively calls A, or neither transitively calls the other.
More generally, the control flow graph is acyclic. Once an expression evaluates, it never evaluates again. This allows us to generalize the above so that instead of thinking of functions calling other functions, we can think of the program as a series of statements that all call one another as a DAG.
Your input is a string where each letter is scanned once and only once, and in the order in which it's given (which seems reasonable given the fact that you're trying to model flowcharts).
Given these assumptions, here's a proof that your programs accept a language iff that language is star-free.
To prove that for every star-free language there's a program in your language that accepts it, begin by constructing the minimum-state DFA for that language. Star-free languages are loop-free and scan the input exactly once, and so it should be easy to build a program in your language from the DFA. In particular, given a state s with a set of transitions to other states based on the next symbol of input, you can write a function that looks at the next character of input and then calls the function encoding the state being transitioned to. Since the DFA has no directed cycles, the function calls have no directed cycles, and so each statement will be executed exactly once. We now have that for every star-free language, there exists a program in your language that accepts it.
To prove the reverse direction of the implication, we essentially reverse this construction and create an ε-NFA with no cycles that corresponds to your program. Doing a subset construction on this NFA to reduce it to a DFA will not introduce any cycles, and so you'll have a star-free language. The construction is as follows: for each statement si in your program, create a state qi with a transition to each of the states corresponding to the other statements in your program that are one hop away from that statement. The transitions to those states will be labeled with the symbols of input consumed in making each of the decisions, or ε if the transition occurs without consuming any input. This shows that for every program P in your language, there exists a star-free language R consisting of exactly the strings accepted by your program.
Taken together, this shows that your programs have identically the power of the star-free languages.
Of course, the assumptions I made on what your programs can do might be too limited. You might have random-access to the input sequence, which I think can be handled with a modification of the above construction. If you can potentially have cycles in execution, then this whole construction breaks. But, even if I'm wrong, I still had a lot of fun thinking about this, and thank you for an enjoyable evening. :-)
Hope this helps!
I know this question is somewhat old, but for posterity, the phrase you are looking for is "decision tree". See http://en.wikipedia.org/wiki/Decision_tree_model for details. I believe this captures exactly what you have done and has a pretty descriptive name to boot!

Halting in non-Turing-complete languages

The halting problem cannot be solved for Turing complete languages and it can be solved trivially for some non-TC languages like regexes where it always halts.
I was wondering if there are any languages that have both the ability to halt and not to halt, but admit an algorithm that can determine whether a given program halts.
The halting problem does not act on languages. Rather, it acts on machines (i.e., programs): it asks whether a given program halts on a given input. Perhaps you meant to ask whether it can be solved for other models of computation (like regular expressions, which you mention, but also like push-down automata).
Halting can, in general, be detected in models with finite resources (like regular expressions or, equivalently, finite automata, which have a fixed number of states and no external storage). This is easily accomplished by enumerating all possible configurations and checking whether the machine enters the same configuration twice (indicating an infinite loop); with finite resources, we can put an upper bound on the amount of time before we must see a repeated configuration if the machine does not halt.
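That pigeonhole argument can be turned into a small sketch (hypothetical code, assuming a deterministic machine whose configurations are numbered 0 .. n_states-1): run the machine for more steps than it has configurations, and it must either halt or repeat a configuration.

#include <stdbool.h>

/* step() is the machine's deterministic transition function. */
static bool halts(int start, int n_states, int (*step)(int), int halt_state)
{
    int s = start;
    for (int i = 0; i <= n_states; i++) {  /* n_states + 1 visits */
        if (s == halt_state)
            return true;
        s = step(s);
    }
    /* More visits than configurations: some configuration repeated,
     * so the machine is in a cycle and will never halt. */
    return false;
}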
Usually, models with infinite resources (unbounded TMs and PDAs, for instance) cannot be halt-checked, but it would be best to investigate the models and their open problems individually.
(Sorry for all the Wikipedia links, but it actually is a very good resource for this kind of question.)
Yes. One important class of this kind are the primitive recursive functions. This class includes all of the basic things you expect to be able to do with numbers (addition, multiplication, etc.), as well as some of the more complex classes @adrian has mentioned (regular expressions/finite automata, context-free grammars/pushdown automata). There do, however, exist functions that are not primitive recursive, such as the Ackermann function.
It's actually pretty easy to understand primitive recursive functions. They're the functions that you could get in a programming language that had no true recursion (so a function f cannot call itself, whether directly or by calling another function g that then calls f, etc.) and has no while-loops, instead having bounded for-loops. A bounded for-loop is one like "for i from 1 to r" where r is a variable that has already been computed earlier in the program; also, i cannot be modified within the for-loop. The point of such a programming language is that every program halts.
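A hypothetical C fragment written in that style: the loop bound is computed before the loop starts, the counter isn't modified in the body, and there is no recursion, so the program is guaranteed to halt.

/* Exponentiation using only a bounded for-loop: primitive recursive. */
unsigned long power(unsigned long base, unsigned exp)
{
    unsigned long r = 1;
    for (unsigned i = 1; i <= exp; i++)   /* bound exp is fixed before entry */
        r = r * base;
    return r;
}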
Most programs we write are actually primitive recursive (I mean, can be translated into such a language).
The short answer is yes, and such languages can even be extremely useful.
There was a discussion about it a few months ago on LtU:
http://lambda-the-ultimate.org/node/2846
