I'm writing a small specialized C99 library for graphs and I often get loops of the form:
for(int i = 0; i < graph->nvertices; ++i) {
// ...
}
I'm wondering if it's good practice, especially in the case of tight loops. At first I thought the compiler would be smart enough to look at 'graph->nvertices' only once instead of at every iteration, but that seems impossible, as graph->nvertices could change inside the loop. Is it smarter and faster to write:
const int N = graph->nvertices;
for(int i = 0; i < N; ++i) {
// ...
}
It seems faster as it doesn't need to look at the pointer more than once, but it does require creating a new variable.
Note: I guess this is the kind of situation where it's nice to be able to read a little assembly code to see what the compiler is actually doing, if someone has a nice reference I'm open to suggestions.
Try using higher optimization settings, some compilers should be able to optimize that away for you. You can also iterate backwards, only initializing the counter with the expression:
for (int i = graph->nvertices - 1; i >= 0; --i)
..
However, iterating backwards may hurt cache and prefetch performance on some systems. I think the method you suggested is the most straightforward, which lends itself to being clear both to the compiler and to the next person reading your code.
I tend to do these optimizations myself. Sometimes the compiler can infer that nvertices doesn't change over the whole loop, but what if you have calls to other functions that may change the value? The compiler cannot infer that and may not optimize the code.
Also, the best approach is always to profile your code and compare the two versions.
I usually use this:
const int N = graph->nvertices;
int i = 0;
for(; i < N; ++i) {
// ...
}
The answer may depend on your compiler, but most compilers will produce an assembly listing for you that you can study. Just make sure you generate the listing from a release (optimized) build.
However, if I were really concerned about every bit of performance, I might create the separate count variable as you suggest.
In the end, I doubt it would make a noticeable difference.
You asked for a way to look at the assembler code. For this you can use the objdump program like this:
objdump -d executable
To filter out the main-function, use:
objdump -d executable | sed -n '/<main>/,/^$/p'
This seems a fairly easy thing to test, simply by looking at the compiler's output. Whole-program optimization usually catches these low-level optimizations.
On the other hand, I usually don't make these kinds of optimizations even when there is a slight increase in speed, since I find the first form easier to read and maintain.
I'd go for localizing the scope of the variable. The optimizer will then figure out all by itself:
for(size_t i = 0, n = graph->nvertices; i < n; ++i) {
// ...
}
Related
In a lot of for loops that work with arrays I see i < arr.length() as the 2nd statement
Would this:
for(i = 0; i < arr.length(); i++){
//do something
}
be LESS efficient than this:
size = arr.length();
for(i = 0; i < size; i++){
//do something
}?
Or is the difference so small I shouldn't care?
This question only applies to languages that need to call a function to find the length of an array/list, unlike Java, for example, which is object-oriented and whose arrays have a length property, arr.length.
The first version can be slower depending on the language, the compiler and its configuration, and the loop condition.
Indeed, in C, C++ and Java, for example, the function is called at each iteration, so the resulting program can be slower when optimizations are disabled. Other languages, such as Python, have a different kind of loop that walks through a given iterable or range (e.g. lists, generators), where the length is evaluated once.
In your case, no performance difference should be seen if optimizations are enabled, since arr.length() is effectively constant in the context of the loop execution and most compilers are good enough to detect this and produce equally fast code (you can see a simple example in C++ here). However, note that interpreters generally do not perform such optimizations.
One of my tutors at university suggests using macros to reduce repetition in c99 code, like this.
#define foreach(a, b, c) for (int a = b; a < c; a++)
#define for_i foreach(i, 0, n)
#define for_j foreach(j, 0, n)
#define for_ij for_i for_j
Which can be used like this:
for_ij { /*do stuff*/; }
for_i { /*do stuff*/; }
Another tutor with an industrial background discourages its use, claiming it was seen as an anti-pattern at his former employer (though he does not know the reason behind this). In fact, grepping through the source code of large projects, one rarely finds these constructs outside of short academic examples.
My question is: why is this construct so rarely used in practice? Is it dangerous to some extent?
This is such a perfect illustration of the gap between academia and the real world, it is hard to believe. But it looks too weird to be made up.
To answer your question: NO this kind of construction is not used in practice and it is risky as it hides information, using implicit variables and conditions.
Here are some questions the casual reader will ponder about:
what is the upper bound? It is not so obvious it should be n.
what is the actual range for i?
is it 0-based?
does it include n or stop before?
C programmers are very good at reading idiomatic constructions such as
for (int i = 0; i < n; i++) {
...
}
Hiding the details of this loop logic saves nothing; it is counter-productive, error prone and should be outlawed by local coding conventions. The only reason it isn't on my teams is that nobody ever came up with such a weird idea.
You may need to use caution with the university tutor who wants these abbreviations used (possibly an APL nostalgic) and avoid conflict. You might want to play some code golf with him; there is a Stack Exchange site dedicated to that, and he will love how people waste countless hours shaving bytes off their source code...
But you should follow the other tutor's advice and write clear and explicit code, carefully indented and spaced out, with meaningful names where it matters and short ones where their use is obvious. Favor idiomatic constructions where possible as it makes the code more readable, less error prone and better fit for optimizing compilers.
The problem is that the reduction in repetition (and thus the improvement in readability) is fairly trivial. You're reducing
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
to
for_ij
which is not much of an improvement, as the variable names i, j, and n are all short so repeating them is not an issue.
At the same time, you're hiding the parameter n: the latter loop is implicitly dependent on n, even though n appears nowhere in for_ij or (likely) in the body of the for loop either. Hiding important information makes code harder to read; in order to read this, you need to know the critical importance of n, which you can only see by searching through two levels of macro indirection.
If I am looping through an array, should I use int or uint8_t?
for (int i = 0; i < 100; i++) {}
or
for (uint8_t i = 0; i < 100; i++) {}
Which is better when working with embedded systems?
The first choice is much better:
for (int i = 0; i < 100; i++) { }
The alternative is sometimes encouraged by local coding rules, it is a bad recommendation:
it tends to generate more code if you use i in the loop body as an index or in an expression.
it can also lead to more bugs, especially if you later modify your code so that the upper bound exceeds 255 or is read from an unsigned variable.
a good embedded compiler will take advantage of the small range if it is pertinent anyway.
Embedded systems have plenty of resources nowadays, but if you are targeting a very limited chipset, you may want to check both alternatives and make an informed decision.
Use int. Just use int. That's what it's for. If you had to think about and pick the size in bits for every variable, that'd be a nuisance, that'd indicate you were using a poor language, that wasn't doing its job. It would be as bad as if you had to pick the machine instructions to evaluate every expression you wrote. But C is a good language. Let it generate your code efficiently. Let it worry about the exact sizes of variables when you don't care, which ought to be most of the time. Just use int.
Depends on the chip you are targeting.
On an 8-bit chip, using an 8-bit type could make a difference; however, even 8-bit chips often have at least one register of a bigger size.
In some cases it might help with stack usage.
In general I suggest using an unsigned type for a loop anyway, but it's up to you to make an informed choice about the bit width you need.
Use a pointer into the array and post-increment it each iteration. Define the limit as a constant, or derive it from sizeof() on the array, and store the starting address once at initialization.
If you are overthinking it, you may as well go straight to assembly.
Or you can apply the Unix philosophy and just get it working. Optimization is a separate step.
Throughout all the tutorials and books I've read while learning programming, the practice for iterating through arrays has always been a for loop like:
int len = array.length();
for(int i = 0; i < len; i++) {//loop code here}
Is there a reason why people don't use
for(int i = length(); i > -1; i--) {//loop code here}
From what I can see the code is shorter, easier to read, and doesn't create an unnecessary variable. I can see that iterating through arrays from 0 to the end may be needed in some situations, but direction doesn't make a difference in most cases.
direction doesn't make a difference in most cases
True, in many cases you'll get the same result in the end, assuming there aren't any exceptions.
But in terms of thinking about what the code does, it's usually a lot simpler to think about it from start to finish. That's what we do all the time in the rest of our lives.
Oh, and you've got a bug in your code - you almost certainly don't want length() as the initial value - you probably want array.length() - 1 as otherwise it starts off being an invalid index into the array (and that's only after fixing length() to array.length()). The fact that you've got this bug demonstrates my point: your code is harder to reason about quickly than the code you dislike.
Making code easier to read and understand is much, much more important in almost every case than the tiny, almost-always-insignificant cost of an extra variable. (You haven't specified the language, but in many cases in my own code I'd just call array.length() on every iteration and let optimization sort it out.)
The reason is readability. Enumerating from 0 and up makes the most sense to most of us, therefore it is the default choice for most developers, unless the algorithm explicitly needs reverse iteration.
Declaring an extra integer variable costs virtually nothing in all common programming platforms today.
Depends on the platform you use. In Java for instance, compiler optimization is so good, I wouldn't be surprised if it doesn't make any noticeable difference. I used to benchmark all kinds of different tricks to see what really is faster and what isn't. Rule of thumb is, if you don't have strong evidence that you are wasting resources, don't try to outsmart Java.
The first reason is how you think about it. Generally, as humans, we iterate from the beginning to the end, not the other way around.
The second reason is that this syntax stems from languages like C. In such languages, going from the end to the beginning was downright impossible in some cases, the most obvious example being the string (char*), for which you were usually supplied only the start address and had to figure out the length by scanning forward until you found 0.
Given a situation where some task needs to be solved with a loop, and it can be achieved with any kind of loop: how can we determine the difference between the loops that could be used, in terms of speed, efficiency, and memory usage? For example, take these loops:
for(int i=0;i<10;i++) {
array[i] = i+1;
}
int i = 0;
while(i<10) {
array[i] = i+1;
i++;
}
The above examples have the same output, and of course you cannot see a difference between them when executed, since this is just a small piece of work. But what if you are running an enormous loop that's eating up your memory? Which loop is better to use? Is there a proper measure of when to use which loop?
Edit:
In response to pakore's answer:
From your answer I can safely say that if I reorder my variables so that dependent statements are far from each other (with other lines in between), the code could be more efficient. For example,
a=0;
b=1;
c=5;
d=1;
for(int i=0; i<10;i++)
{
a=b*CalculateAge()+d;
m=c+GetAvarage(a);
d++;
}
and
a=0;
b=1;
c=5;
d=1;
for(int i=0; i<10;i++)
{
a=b*CalculateAge()+d;
d++;
m=c+GetAvarage(a);
}
The latter is more efficient than the former: in the latter, the line right after the outside method call (d++) is independent of the call's result, whereas in the former the second line depends on the result of the first.
That is, the first example has to wait for the result of the first line before executing the second and third lines, while the second example can already execute d++ while waiting for the result of the first line.
Conclusion:
An optimized loop doesn't depend on what kind of loop you are using. As pakore and polygenelubricants explained, the main thing to mind is how the code inside the loop is written. It is the compiler's job to optimize your code, and it also helps if you arrange your code according to the dependencies between variables, as pakore explains below.
Well, it's difficult to explain all the logic behind a loop here.
The compiler will do amazing things for you in order to optimize the loops, so it does not matter if you use while or for because the compiler will translate to assembler anyway.
In order to have a deeper understanding you should learn some assembler and then how a basic processor works, how it reads the instructions and how it processes them.
In order to improve pipelining, it's better to place statements with the same variables far away from each other. This way, while one statement is calculated, the processor can take the next statement if it's independent from the first one and start calculating it.
For example:
a=0;
b=3;
c=5;
m=8;
i=0;
while(i<10){
a=a+b*c;
b=b*10+a;
m=m*5;
i++;
}
We have a dependency here between a and b and the statements are right next to each other. But we see that m and i are independent from the rest, so we can do:
a=0;
b=3;
c=5;
m=8;
i=0;
while(i<10){
a=a+b*c;
m=m*5;
i++;
b=b*10+a;
}
So while a is being calculated, we can start calculating m and i. Most of the time the compiler detects this and does it automatically (it's called code reordering). Sometimes, for small loops, the compiler copies the loop body as many times as needed (loop unrolling), because it's faster not to have control variables.
My suggestion is that let the compiler take care about these things and focus on the costs of the algorithms you are using, it's better to reduce from O(n!) to O(logn) than to do micro-optimizations inside the loops.
Update according to the question modified
Well, the dependencies have to be write/write or read/write dependencies. If it's a read/read dependency there's no problem (because the value does not change). Have a look at the Data Dependency article (http://en.wikipedia.org/wiki/Data_dependency).
In your example, there's no difference between the two versions: m depends on c and b, but those two are never written to, so the compiler knows their values before entering the loop. This is called a read/read dependency, and it's not really a dependency at all.
If you had written:
...
m=c+GetAvarage(a);
...
Then we would have a write/read dependency (we have to write in a and then read from a, so we have to wait until a is calculated) and the optimization you did would be good.
But once again, the compiler does this for you, along with many other things. It's difficult to say whether a micro-optimization in the high-level code is going to have a real impact on the assembler code, because maybe the compiler is already doing it for you, or reordering the code for you, or doing a thousand other things better than we can think of at first glance.
But anyway, it's good just to know how things work under the carpet :)
Update to add some links
Have a look at these links to have a further understand about what the compiler can do to improve your code performance:
Loop unwinding
Dependency Analysis
Automatic parallelization
Vectorization
I will explain each loop case one by one.
1. For loop: when you know the number of iterations in advance, go for a for loop.
2. While loop: when you are not sure of the number of iterations, or you want to keep looping until a condition becomes false, go for a while loop.
3. Do-while: the same as a while loop, except that the body is executed at least once.
Having said that, it is also fine to express one loop in terms of another.
4. Recursion: if you understand recursion correctly, it leads to elegant solutions, though recursion is a bit slower than straightforward iteration.
There are no performance differences between for, while, and do-while; if any, they are negligible.
You should write the most natural, idiomatic and readable code that clearly conveys your intention. In most scenarios, no one loop is so superiorly performing over the other that you'd sacrifice any of the above for little gain in speed.
Modern compilers for most mainstream languages are really smart at optimizing your code, and can especially target precisely the kinds of good readable codes that people should write. The more complicated your code is, the harder it is for humans to understand, and the harder it can be for compilers to optimize.
Most compilers can optimize tail recursion away, allowing you to express your algorithm recursively (the most natural form in some scenarios) while essentially executing it iteratively. Otherwise, recursion may be slower than an iterative solution, but you should consider all factors before doing this optimization.
If a working, correct, but perhaps a tad slower recursive solution can be written quickly, then it's often preferable to a complicated iterative solution that may be faster but is not obviously correct and/or is harder to maintain.
Do not optimize prematurely.
Any decent compiler will generate the same code.
To test this I created a file named loops-f.c:
void f(int array[])
{
for(int i=0; i<10; i++) {
array[i] = i+1;
}
}
and a file named loops-g.c:
void g(int array[])
{
int i = 0;
while(i<10) {
array[i] = i+1;
i++;
}
}
I compiled the files to assembly (gcc -std=c99 -S loops-f.c loops-g.c) and then I compared the generated assembly code (diff -u loops-f.s loops-g.s):
--- loops-f.s 2010-08-06 10:57:11.377196516 +0300
+++ loops-g.s 2010-08-06 10:57:11.389197986 +0300
@@ -1,8 +1,8 @@
- .file "loops-f.c"
+ .file "loops-g.c"
.text
-.globl f
- .type f, @function
-f:
+.globl g
+ .type g, @function
+g:
.LFB0:
.cfi_startproc
pushq %rbp
@@ -30,6 +30,6 @@
ret
.cfi_endproc
.LFE0:
- .size f, .-f
+ .size g, .-g
.ident "GCC: (GNU) 4.4.4 20100630 (Red Hat 4.4.4-10)"
.section .note.GNU-stack,"",@progbits
As you can see the code is practically identical.
The names of the loops themselves give an idea of their usage.
When you just have to perform an operation a known number of times, no questions asked, a for loop does well.
In case you are iterating over a data structure and have a constraint, like a break condition or something similar, a while or do-while loop should be chosen.
It's just a matter of personal preference and coding style; the preferred style also depends a lot on the language you are coding in.
For example, in Python the preferred way to write the above loop looks a little like:
for i in range(10):
    array[i] = i + 1
(In fact, in Python you can do the above in a single line:
array = range(1, 11)
But it's just an example...)
Of the above two my preference would be with the first one.
As for performance, you are unlikely to see a difference.
It would surely depend on the language you are using, but remember that the slowest part of any code is the person writing it, so I would suggest picking a standard for each situation and sticking with it; then, when you come to update the code, you won't need to think about this each time.
If you are trying to make savings in ways such as this then you are either already running at near 100% efficiency or are perhaps looking in the wrong place to quicken up your code?
In the book "Computer Organization and Design: The Hardware/Software Interface" by Patterson and Hennessy, the authors translate the loops above to assembly, and both loops compile to the same MIPS assembly code.
Differences would only emerge if your compiler translated the two loops into different assembly; otherwise they have the same performance.