Is a SQL Server recursive CTE considered a loop? - sql-server

I was under the impression that recursive CTEs were set based, but in a recent SO post someone mentioned that they are loops.
Are recursive CTEs set based? Am I wrong to assume that a set based operation cannot be a loop?

If it is recursive, it is still considered a loop.
Although each statement is set based, calling it over and over can be considered a loop. This is really an argument about definitions and wording, depending on the context. Recursive CTEs are set-based statements, but in simple terms their processing is a looping process.
For those interested, here is a nice little write-up about performance with CTEs:
http://explainextended.com/2009/11/18/sql-server-are-the-recursive-ctes-really-set-based/

They are set based. Recursive sets are still sets.
But all set operations are, if you look through a powerful enough magnifying glass, loops. Ultimately the code runs on CPUs, and CPUs execute a serial stream of instructions that operate on discrete regions of memory. In other words, there is no set-oriented hardware. Being 'set oriented' is a logical concept. The fact that all SQL operations are ultimately implemented using some form of loop is an implementation detail.

I think the distinction that needs to be made is "tail recursion" versus "general recursion".
All tail recursions can be implemented as loops, without the need for a stack.
General recursion can also be implemented as a loop, but with a stack.
Recursive CTEs are tail recursions and hence essentially a loop. The only difference is that the terminating condition is handled by the SQL semantics/execution engine, and the output of each loop iteration is appended to the result with UNION ALL (the set operation a recursive CTE requires between its anchor and recursive members).
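To make the loop view concrete, here is a minimal C sketch of how a recursive CTE can be evaluated: the anchor member seeds a working value, and the recursive member is applied to it over and over until it yields no rows. This models a CTE counting from 1 to 10; step() and the single-value "working table" are illustrative stand-ins, not SQL Server's actual execution machinery.

#include <stdio.h>

/* Illustrative stand-in for the recursive member of a CTE such as
   "SELECT n + 1 FROM cte WHERE n < 10": given the previous row,
   produce the next one, or report an empty result. */
static int step(int n, int *out)
{
    if (n >= 10)
        return 0;        /* empty result set: the recursion stops */
    *out = n + 1;
    return 1;
}

int main(void)
{
    int working = 1;     /* the anchor member produces the seed row */
    printf("%d\n", working);
    int next;
    /* each pass of this loop is one "recursive" step; every row it
       produces is appended (UNION ALL) to the final result */
    while (step(working, &next)) {
        printf("%d\n", next);
        working = next;
    }
    return 0;
}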

Related

Is there any use for recursions?

I've recently learned about tail recursion as a way to write a recursive function that doesn't crash when you give it too big a number to work with. I realised that I could easily rewrite a tail recursion as a while loop and have it do basically exactly the same thing, which led me to wonder: is there any use for recursion when you can do everything with a normal loop?
Yes, recursive code looks smaller and is easier to understand, but it also has a chance of crashing by exhausting the stack, while a simple loop cannot crash doing the same task.
I'll take as an example the Haskell language, which is purely functional:
Every function in Haskell is a function in the mathematical sense (i.e., "pure"). Even side-effecting IO operations are but a description of what to do, produced by pure code. There are no statements or instructions, only expressions which cannot mutate variables (local or global) nor access state like time or random numbers.
So, in Haskell a recursive function is tail recursive if the final result of the recursive call is the final result of the function itself. If the result of the recursive call must be further processed (say, by adding 1 to it, or consing another element onto the beginning of it), it is not tail recursive. (see here)
On the other hand, in many programming languages calling a function uses stack space, so even a tail-recursive function can build up a large stack of calls to itself, which wastes memory. But since in a tail call the containing function is about to return anyway, its environment can be discarded and the recursive call can be entered without creating a new stack frame. This trick is called tail call elimination or tail call optimisation, and it allows tail-recursive functions to recur indefinitely.
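To see the difference in C terms, here is a small sketch. Nothing here is guaranteed by the C standard; tail call elimination is an optimization that compilers such as GCC and Clang typically apply at -O2.

#include <stdio.h>

/* Not a tail call: the result of sum(n - 1) is further processed by
   the addition, so each call's stack frame must survive the call. */
long sum(long n)
{
    if (n == 0)
        return 0;
    return n + sum(n - 1);
}

/* A tail call: the recursive call is the very last thing done, so the
   current frame can be discarded and the call turned into a jump. */
long sum_acc(long n, long acc)
{
    if (n == 0)
        return acc;
    return sum_acc(n - 1, acc + n);
}

int main(void)
{
    printf("%ld %ld\n", sum(100), sum_acc(100, 0));
    return 0;
}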
It's been a long while since I posted this question and my opinion on the topic has changed. Here's why:
I learned Haskell, and it's a language that fixes everything bad about recursion - recursive definitions and algorithms are turned into normal looping algorithms and most of the time you don't even use recursion directly and instead use map, fold, filter, or a combination of those. And with everything bad removed, the good sides of functional programming start to shine through - everything is closer to its mathematical definition, not obscured by clunky loops and variables.
To someone else who is struggling to understand why recursion is great: go learn Haskell. It has a lot of other very interesting features, like being lazy (values are evaluated only when they're requested), immutable (variables can never be modified), pure (functions cannot do anything other than take input and return output, so no printing to the console), strongly typed with a very expressive type system, and filled with mind-blowing abstractions like Functor, Monad, State, and much more. I can almost say it's life-changing.

Are variables in a parallel do loop guaranteed to be updated?

I have read this article: Parallel Programming in Fortran 95 using OpenMP
where it reads on pages 11 and 12 that:
real(8) :: A(1000), B(1000)
!$OMP PARALLEL DO
do i = 1, 1000
  B(i) = 10 * i
  A(i) = A(i) + B(i)
end do
!$OMP END PARALLEL DO
might not work, since matrix B's values are not guaranteed until the !$OMP END PARALLEL DO. To me this is crucial. I have some loops with many statements that depend on previous statements within the same do loop, and I thought this was natural. I get that B(j) cannot be relied on in iteration i when i /= j, but within the same iteration I thought it was a given. Am I correct, or have I misunderstood? If it is this way, is there a directive to ensure that, at least within an iteration, the values of variables are updated by each statement before the next?
I have tried some simple loops that seem to work just as if they were serial code, but I have some other code where the behaviour seems a bit more random: it works with /O3 but not /O0. That code is quite large and a bit hard to read, so I won't post it here.
That claim looks very strange. If it were true, most of the OpenMP code you will see would be non-conforming. You will see things like this all over my codebase, and I believe the claim is bogus. Unfortunately, the article gives no direct citation of the relevant piece of the specification, so it is hard to guess what the author had in mind.
I would even say that features like atomic and critical sections would lose their purpose if things worked as the author claims.
Without seeing the code that behaves randomly for you, we can't say anything more; it may be better not to mention it at all if you do not plan to show it.
The statement in the referenced article is wrong.
Have a look at the paper "The OpenMP Memory Model", which explains the OpenMP memory model quite well.
Every thread is allowed to have its own "temporary view" of the shared part of the memory, and the flow in both directions between that "view" and the "memory" may be delayed (although an update can be forced by flush calls etc.). But there are no restrictions within the same view. And since every iteration is guaranteed to be executed by exactly one thread, you can expect normal behavior within a single iteration. So the given example is guaranteed to work as expected.
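To illustrate in C (a sketch of the same loop; compile with gcc -fopenmp or equivalent): the write to B[i] and the read of B[i] happen in the same iteration, hence on the same thread and in its same temporary view, so no flush is needed. Only reading B[j] from a different iteration would require synchronization.

#include <stdio.h>

#define N 1000

int main(void)
{
    static double A[N], B[N];   /* zero-initialized, shared arrays */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        B[i] = 10.0 * (i + 1);
        A[i] = A[i] + B[i];     /* same thread, same iteration: sees the new B[i] */
    }
    printf("%f %f\n", A[0], A[N - 1]);
    return 0;
}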

Simple vs. Nested

Are simple loops as powerful as nested loops in terms of Turing completeness?
In terms of Turing completeness, yes they are.
Proof: It's possible to write a Brainf*** interpreter using a simple loop, for example here:
http://www.hevanet.com/cristofd/brainfuck/sbi.c
For loops with a fixed number of steps (LOOP, FOR and similar): imagine the whole purpose of a loop is to count to n. Why should it make a difference whether I loop i times in an outer loop and j times in an inner loop, as opposed to n = i * j times in just a single loop?
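A C sketch of that equivalence (purely illustrative): the nested version and the flat version visit exactly the same index pairs.

#include <stdio.h>

#define I 3
#define J 4

int main(void)
{
    /* nested: i * j iterations */
    for (int i = 0; i < I; i++)
        for (int j = 0; j < J; j++)
            printf("nested %d %d\n", i, j);

    /* flat: one loop of n = I * J steps; the two indices are
       recovered with division and modulo */
    for (int k = 0; k < I * J; k++)
        printf("flat   %d %d\n", k / J, k % J);
    return 0;
}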
Assume that no WHILE, GOTO or similar constructs are allowed in a program (just assignment, IF, and fixed loops). Then all these programs end after a finite number of steps.
The next step towards more expressiveness is to allow loops where the number of iterations is determined by, e.g., a condition, and it is not certain whether this condition is ever satisfied (e.g. WHILE). Then it may happen that a program won't halt. (This kind of expressiveness is what Turing-completeness refers to.)
Corresponding to these two forms of programs are two kinds of functions, which were historically developed around the same time and which are called primitive recursive functions and μ-recursive functions.
The number of nestings doesn't play a role in this.

What is the best way of determining a loop invariant?

When using formal aspects to create some code is there a generic method of determining a loop invariant or will it be completely different depending on the problem?
It has already been pointed out that one and the same loop can have several invariants, and that computability is against you. That doesn't mean you cannot try.
You are, in fact, looking for an inductive invariant: the word invariant may also be used for a property that is true at each iteration, but for which it is not enough to know that it holds at one iteration in order to deduce that it holds at the next. If I is an inductive invariant, then any consequence of I is an invariant, but it may not be an inductive invariant.
You are probably trying to get an inductive invariant to prove a certain property (post-condition) of the loop in some defined circumstances (pre-conditions).
There are two heuristics that work quite well:
start with what you have (pre-conditions), and weaken until you have an inductive invariant. To get an intuition for how to weaken, apply one or several loop iterations forwards and see what ceases to be true in the formula you have.
start with what you want (post-conditions), and strengthen until you have an inductive invariant. To get an intuition for how to strengthen, apply one or several loop iterations backwards and see what needs to be added so that the post-condition can be deduced (a small worked sketch follows this list).
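Here is the second heuristic worked through on a plain summation loop, as a hypothetical C sketch (the holds() helper exists only to make the invariant executable): the post-condition "s equals the sum of a[0..n)" becomes inductive once the constant bound n is replaced by the loop counter i.

#include <assert.h>
#include <stdio.h>

/* The candidate invariant, as runnable code: s == sum of a[0..i). */
static int holds(const int *a, int i, int s)
{
    int t = 0;
    for (int k = 0; k < i; k++)
        t += a[k];
    return t == s;
}

int sum_array(const int *a, int n)
{
    int s = 0;
    assert(holds(a, 0, s));          /* trivially true: empty range */
    for (int i = 0; i < n; i++) {
        s += a[i];
        assert(holds(a, i + 1, s));  /* preserved by the loop body */
    }
    /* invariant plus i == n yields the post-condition:
       s equals the sum of a[0..n) */
    return s;
}

int main(void)
{
    int a[] = {1, 2, 3, 4};
    printf("%d\n", sum_array(a, 4));
    return 0;
}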
If you want the computer to help you in your practice, I can recommend the Jessie deductive verification plug-in for C programs of Frama-C. There are others, especially for Java and JML annotations, but I am less familiar with them. Trying out the invariants you think of is much faster than working out if they work on paper. I should point out that verifying that a property is an inductive invariant is also undecidable, but modern automatic provers do great on many simple examples. If you decide to go that route, get as many as you can from the list: Alt-ergo, Simplify, Z3.
With the optional (and slightly difficult to install) library Apron, Jessie can also infer some simple invariants automatically.
It's actually trivial to generate loop invariants. true is a good one for instance. It fulfills all three properties you want:
It holds before loop entry
It holds after each iteration
It holds after loop termination
But what you're after is probably the strongest loop invariant. Finding the strongest loop invariant, however, is sometimes an undecidable task. See the article Inadequacy of Computable Loop Invariants.
I don't think it's easy to automate this. From wiki:
Because of the fundamental similarity of loops and recursive programs, proving partial correctness of loops with invariants is very similar to proving correctness of recursive programs via induction. In fact, the loop invariant is often the inductive property one has to prove of a recursive program that is equivalent to a given loop.
I've written about writing loop invariants in my blog, see Verifying Loops Part 2. The invariants needed to prove a loop correct typically comprise 2 parts:
A generalisation of the state that is intended when the loop terminates.
Extra bits needed to ensure that the loop body is well-formed (e.g. array indices in bounds).
(2) is straightforward. To derive (1), start with a predicate expressing the desired state after termination. Chances are it contains a 'forall' or 'exists' over some range of data. Now change the bounds of the 'forall' or 'exists' so that (a) they depend on variables modified by the loop (e.g. loop counters), and (b) the invariant is trivially true when the loop is first entered (usually by making the range of the 'forall' or 'exists' empty).
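A minimal sketch of that recipe for a maximum-of-array loop (illustrative C; the invariant is kept in comments, with parts numbered as above):

#include <stdio.h>

/* Post-condition: m >= a[k] for all k in [0, n).
   Changing the bound n to the loop counter i gives part (1);
   part (2) is the well-formedness bit 1 <= i <= n. */
int max_array(const int *a, int n)
{
    int m = a[0];
    /* invariant: 1 <= i <= n  and  m >= a[k] for all k in [0, i)
       (established before the loop: i == 1 and m == a[0]) */
    for (int i = 1; i < n; i++)
        if (a[i] > m)
            m = a[i];
    /* on exit i == n, so the invariant gives the post-condition */
    return m;
}

int main(void)
{
    int a[] = {3, 9, 2, 7};
    printf("%d\n", max_array(a, 4));
    return 0;
}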
There are a number of heuristics for finding loop invariants. One good book about this is "Programming in the 1990s" by Ed Cohen. It's about how to find a good invariant by manipulating the postcondition by hand. Examples include: replacing a constant by a variable, strengthening the invariant, ...

Which is faster for large "for" loop: function call or inline coding?

I have written embedded software (using C, of course) and now I'm considering ways to improve the running time of the system. The most important single module in my system is one very large nested for loop module.
That module consists of two nested for loops that iterate at most 122500 times. That's not very much yet, but the problem is that inside the nested loop I call a function that lives in another source file. That function itself consists mostly of another two nested for loops which always iterate 22500 times. So I end up making that function call 122500 times.
I have made the called function a lot lighter and shorter (it still works as it should), and now I have started to wonder: would it be faster to remove the function call and write its body directly inside the first two for loops?
The processor in the system is an ARM7TDMI running at 55 MHz. The system itself isn't very time critical, so it doesn't have to be real-time capable; however, the faster it can process its duties, the better.
Also, would it be faster to use while loops instead of for loops? Any advice on how to improve the running time is appreciated.
-zaplec
TRY IT AND SEE!!
It'll almost certainly make a difference. Function call overhead isn't usually that much of an issue, but at over 100K repetitions it starts to add up.
...But whether or not it makes any real-world difference is something only you can answer, after trying it and timing the results.
As for for vs while... it shouldn't matter unless you actually change the behavior when changing the loop. If in doubt, make your compiler spit out assembler code for both and compare... or just change it and time it.
You need to be careful with the optimizations you make, because you aren't always clear on which optimizations the compiler is making for you. Premature optimization is a common mistake people make. Is it more important that your code is readable and easily maintained, or that it is slightly faster? Like others have suggested, the best approach is to benchmark the different ways and see if there is a noticeable difference.
If you don't believe your compiler does much in the way of optimization I would look at some older concepts in optimizing C (searches on SO or google should provide some good links).
The ARM processor has an instruction pipeline. When the processor encounters a branch (call) instruction, it must flush and reload the pipeline, thus wasting some time. One objective when optimizing for speed is to reduce the number of reloads of the instruction pipeline. This means reducing branch instructions.
As others have stated in SO, compile your code with optimization set for speed, and profile. I prefer to look at the assembly language listing as well (either printed from the compiler or displayed interwoven in the debugger). Use this as a baseline. If you can't profile, you can use assembly instruction counting as a rough estimate.
The next step is to reduce the number of branches, or the number of times a branch is taken. Unrolling loops helps to reduce the number of times a branch is taken. Inlining helps reduce the number of branches. Before applying these fine-tuning techniques, review the design and code implementation to see if branches can be reduced. For example, reduce the number of "if" statements by using Boolean arithmetic or Karnaugh maps. My favorite is reducing requirements and eliminating code that doesn't need to be executed.
In the code implementation, move work that doesn't change outside of the for or while loops. Some loops may be reduced to equations (for example, replacing a loop of additions with a multiplication). Also reduce the number of iterations, by asking "does this loop really need to execute this many times?".
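Both ideas in a small C sketch (scale() is a hypothetical stand-in for some loop-invariant computation):

#include <stdio.h>

static double scale(double x) { return x * 0.5; }

int main(void)
{
    double c = scale(3.0);            /* hoisted: computed once, not per iteration */
    double total = 0.0;
    for (int i = 0; i < 10; i++)
        total += c;                   /* c does not change inside the loop */

    /* a loop of additions 1 + 2 + ... + n reduced to an equation */
    int n = 100;
    long sum_loop = 0;
    for (int i = 1; i <= n; i++)
        sum_loop += i;
    long sum_formula = (long)n * (n + 1) / 2;

    printf("%f %ld %ld\n", total, sum_loop, sum_formula);
    return 0;
}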
Another technique is to optimize for Data Oriented Design. Also check this reference.
Just remember to set a limit for optimizing. This is where you decide any more optimization is not generating any ROI or customer satisfaction. Also, apply optimizations in stages; which will allow you to have a deliverable when your manager asks for one.
Run a profiler on your code. If you are just guessing at where you are spending your time, you are probably wrong. A profiler will show which function is taking the most time, and you can focus on that. You could be doing something in the function that takes longer than the function call itself. Did you check whether you can change floating-point operations to integer operations, or integer math to shifts? You can spend a lot of time fiddling with things that don't make much difference. Run a profiler on your code and know for sure that the things you are changing will make a difference.
For function vs. inline, unfortunately there is no easy answer. I.e. it depends. See this FAQ. For "for" vs. "while", I wouldn't think there is any significant difference in performance.
In general, a function call has more overhead than inlined code. You really should profile, however, as this can be affected quite a bit by your compiler (especially the compile/optimization settings). Some compilers will automatically inline code, for example.
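If profiling does show the call dominating, a middle ground is to keep the function for readability but let the compiler inline it. A minimal C sketch (process() is a hypothetical stand-in for the real per-cell work, and inline is only a hint, so check the generated code or re-profile):

#include <stdio.h>

/* Defining the hot function "static inline" in the same translation
   unit (or a shared header) lets the compiler expand it into the
   loops instead of branching out to another source file. */
static inline long process(int x, int y)
{
    return (long)x * y + 1;
}

long run(int rows, int cols)
{
    long acc = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            acc += process(r, c);   /* candidate for inlining */
    return acc;
}

int main(void)
{
    printf("%ld\n", run(350, 350)); /* 122500 iterations, as in the question */
    return 0;
}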
