What's the difference between these two FOR loops? - c

I'm learning C and saw the first loop listed below in the book I'm reading. I'm curious what the difference between the two is, as I am used to using the second one and can't figure out the difference even though they return different results.
for(i = 0; i < 10; ++i){}
for(i = 0; i <= 10; i++){}

The first one iterates to 9, the second iterates to 10. That's all.
The pre-/post- increment operation makes no difference.
Un-optimized code generated for both versions:
for(int i = 0; i < 10; ++i)
00E517AE mov dword ptr [i],0
00E517B5 jmp wmain+30h (0E517C0h)
00E517B7 mov eax,dword ptr [i]
00E517BA add eax,1
00E517BD mov dword ptr [i],eax
00E517C0 cmp dword ptr [i],0Ah
00E517C4 jge wmain+53h (0E517E3h)
{
}
for(int i = 0; i <= 10; i++)
00E517E3 mov dword ptr [i],0
00E517EA jmp wmain+65h (0E517F5h)
00E517EC mov eax,dword ptr [i]
00E517EF add eax,1
00E517F2 mov dword ptr [i],eax
00E517F5 cmp dword ptr [i],0Ah
00E517F9 jg wmain+88h (0E51818h)
{
}
So, even here, there is no performance penalty. The claim that i++ is slower than ++i is just not true (at least in this context, where the result of the expression is discarded). It could matter for, say, int y = i++, but there the two forms do different things, which is not the case here. The performance issue might have been valid on compilers from 20 years ago, but not anymore.

The pre-/post-increment distinction only matters when you use the value of the expression, for example when assigning it.
Say
i=10;
j = i++;
Here the value of i will be 11 but the value of j will be 10, because j receives the value i had before the increment, i.e. post-increment.
i=10;
j = ++i;
Here the value of i will be 11 and the value of j will also be 11, because i is incremented before its value is assigned to j, i.e. pre-increment.
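The two cases above can be checked with a small sketch. The struct and function names here are made up for illustration; only the j = i++ and j = ++i lines come from the answer.

```c
/* Holds the final values of i and j after one increment-and-assign step. */
struct inc_result {
    int i;
    int j;
};

/* j = i++ : j receives the old value of i, and i ends up one larger. */
struct inc_result post_increment(int start)
{
    struct inc_result r;
    r.i = start;
    r.j = r.i++;
    return r;
}

/* j = ++i : i is incremented first, so j receives the new value. */
struct inc_result pre_increment(int start)
{
    struct inc_result r;
    r.i = start;
    r.j = ++r.i;
    return r;
}
```

Calling post_increment(10) should give i == 11 and j == 10, while pre_increment(10) should give 11 for both.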

The first has a pre-increment and the second has a post-increment.
The only real difference is the condition, i.e. the first checks up to 9 and the second up to 10.
In both loops the choice of increment operator makes no difference.

The first one will run 10 times. The second one will run 11 times.
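A minimal way to confirm those counts is to count the passes through each loop body. The helper names below are hypothetical; the loop headers mirror the two loops in the question with 10 replaced by a parameter.

```c
/* Counts the passes of: for (i = 0; i < limit; ++i) */
int count_exclusive(int limit)
{
    int i, runs = 0;
    for (i = 0; i < limit; ++i) {
        ++runs;
    }
    return runs;
}

/* Counts the passes of: for (i = 0; i <= limit; i++) */
int count_inclusive(int limit)
{
    int i, runs = 0;
    for (i = 0; i <= limit; i++) {
        ++runs;
    }
    return runs;
}
```

With limit set to 10, the first helper should report 10 passes and the second 11.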

Most people have already stated that the number of iterations differs by one, and that the pre- and post increment do not make any difference here.
For C, I would say the first loop is what you more commonly come across. I think the reason for that is that C uses zero-based arrays, and thus the length of the array (or string, as that's an array of chars) is not a valid index into it (that would be out of bounds). Thus, when looping through an array of length 10 in this example, the first loop would be the more logical one, since you can safely use i as an index for the array. The second loop would access one element past the end, which is undefined behavior (possibly a segmentation fault).
You say you're used to the second one. I don't know why you're used to that, but I assume some other programming language or the fact that in math, loops (summations and such) run up to the limit inclusive (but often then start at one). Zero-based indices can be slightly frustrating in such cases.
In short, in my experience, you'll find the first loop more often, but there are plenty of use-cases for the second.
As for the ++i versus i++: I'm inclined to the latter, since this part of the for statement happens at the end of the loop. A postfix notation feels therefore more logical. But once again, that doesn't really matter.
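To illustrate why the i < n form pairs naturally with zero-based arrays, here is a small sketch. The function name sum_ints is an assumption made for this example.

```c
#include <stddef.h>

/* Sums the first n elements of values.
   Valid indexes are 0 .. n-1, so the loop test is i < n, never i <= n. */
long sum_ints(const int *values, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

For an array declared as int v[10], passing sizeof v / sizeof v[0] as n keeps every index within bounds.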

Related

Why does this piece of code not give error?

void main() {
int i, j=6;
for(; i=j ; j-=2 )
printf("%d",j);
}
Following the regular pattern, there should be a condition after the first semicolon, but here it is an initialization, so this should have given an error.
How is this even a valid format?
But the output is 642
First, let me correct the terminology, the i=j is an assignment, not an initialization.
That said, let's analyze the for loop syntax first.
for ( clause-1 ; expression-2 ; expression-3 ) statement
So, the expression-2 should be an "expression".
Now, coming to the syntax for statement having assignment operator
assignment-expression:
    conditional-expression
    unary-expression assignment-operator assignment-expression
So, as the spec C11 mentions in chapter §6.5.16, the assignment operation is also an expression which fits perfectly for the expression-2 part in for loop syntax.
Regarding the result,
An assignment expression has the value of the left operand after the assignment,
so, i=j will basically assign the value of j to i and then, the value of i will be used for condition checking, (i.e., non-zero or zero as TRUE or FALSE).
TL;DR Syntactically, there's no issue with the code, so no error is generated by your compiler.
Also, for a hosted environment, void main() should be int main(void) to be conforming to the standard.
i=j is also an expression, the value of which is the value of i after the assignment. So it can serve as a condition.
You'd normally see this type of cleverness used like this:
if ((ptr = some_complex_function()) != NULL)
{
/* Use ptr */
}
Where some programmers like to fold the assignment and check into one line of code. How good or bad this is for readability is a matter of opinion.
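As one concrete instance of that idiom, here is a sketch using strchr from the standard library. The function name value_after_colon is hypothetical and stands in for some_complex_function in the answer.

```c
#include <stddef.h>
#include <string.h>

/* Returns the text after the first ':' in s, or NULL when there is none.
   The assignment and the NULL check are folded into one condition. */
const char *value_after_colon(const char *s)
{
    const char *colon;
    if ((colon = strchr(s, ':')) != NULL) {
        return colon + 1;
    }
    return NULL;
}
```

The extra parentheses around the assignment are what tell both the reader and the compiler that the assignment is intentional.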
Your code does not contain a syntax error, hence the compiler accepts it and generates code to produce 642.
The condition i=j is interpreted as (i = j) != 0.
To prevent this and many similar error patterns, enable more compiler warnings and make them fatal with:
gcc -Wall -W -Werror
If you use clang, use clang -Weverything -Werror
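The loop from the question can be rewritten with the comparison spelled out. The sketch below collects the digits in a buffer instead of printing them so the result can be inspected; the function name and the buffer parameters are assumptions added for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the numbers the question's loop would print into out and returns out.
   out_size must be at least 1. */
char *countdown_digits(int start, char *out, size_t out_size)
{
    int i, j;
    size_t used = 0;

    out[0] = '\0';
    /* (i = j) != 0 is what the bare condition i = j means. */
    for (j = start; (i = j) != 0 && used + 1 < out_size; j -= 2) {
        used += (size_t)snprintf(out + used, out_size - used, "%d", i);
    }
    return out;
}
```

Starting from 6 with a 16-byte buffer, the buffer should end up holding the same 642 the question reports. The extra size check only guards the buffer and is not part of the original loop.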
This is a very good question.
To really understand this, it helps to know how C code is executed on a computer:
First, the compiler translates the C code into assembly code, then the assembly code is translated into machine code, which the processor can execute directly once it is loaded into main memory.
As for your code:
void main() {
int i, j=6;
for(; i=j ; j-=2 )
printf("%d",j);
}
To figure out why the result is 642, We want to see its assembly code.
Using the Visual Studio debugger's disassembly view, look especially at this:
010217D0 mov eax,dword ptr [j]
010217D3 mov dword ptr [i],eax
010217D6 cmp dword ptr [i],0
010217DA je main+4Fh (010217EFh)
These four lines of assembly code correspond to the C code "i=j". They first move the value of j into register eax, then move the value of eax into i (since the processor cannot move a value directly from one memory location to another, it uses eax as a bridge), then compare the value of i with 0. If they are equal, execution jumps to 010217EFh and the loop ends; if not, the loop continues.
So it's actually an assignment first, then a comparison that decides whether the loop is over. As j declines from 6 to 0, the loop finally stops. I hope this helps you understand why the result is 642 :D

Why does this for loop exit on some platforms and not on others?

I have recently started to learn C and I am taking a class with C as the subject. I'm currently playing around with loops and I'm running into some odd behaviour which I don't know how to explain.
#include <stdio.h>
int main()
{
int array[10],i;
for (i = 0; i <=10 ; i++)
{
array[i]=0; /*code should never terminate*/
printf("test \n");
}
printf("%d \n", sizeof(array)/sizeof(int));
return 0;
}
On my laptop running Ubuntu 14.04, this code does not break. It runs to completion. On my school's computer running CentOS 6.6, it also runs fine. On Windows 8.1, the loop never terminates.
What's even more strange is that when I edit the condition of the for loop to: i <= 11, the code only terminates on my laptop running Ubuntu. It never terminates in CentOS and Windows.
Can anyone explain what's happening in the memory and why the different OSes running the same code give different outcomes?
EDIT: I know the for loop goes out of bounds. I'm doing it intentionally. I just can't figure out how the behaviour can be different across different OSes and computers.
You've just discovered memory stomping. You can read more about it here: What is a “memory stomp”?
When you allocate int array[10],i;, those variables go into memory (specifically, they're allocated on the stack, which is a block of memory associated with the function). array[] and i are probably adjacent to each other in memory. It seems that on Windows 8.1, i is located at array[10]. On CentOS, i is located at array[11]. And on Ubuntu, it's in neither spot (maybe it's at array[-1]?).
Try adding these debugging statements to your code. You should notice that on iteration 10 or 11, array[i] points at i.
#include <stdio.h>
int main()
{
int array[10],i;
printf ("array: %p, &i: %p\n", (void *)array, (void *)&i);
printf ("i is offset %td from array\n", &i - array);
for (i = 0; i <=11 ; i++)
{
printf ("%d: Writing 0 to address %p\n", i, &array[i]);
array[i]=0; /*code should never terminate*/
}
return 0;
}
The bug lies between these pieces of code:
int array[10],i;
for (i = 0; i <=10 ; i++)
array[i]=0;
Since array only has 10 elements, in the last iteration array[10] = 0; is a buffer overflow. Buffer overflows are UNDEFINED BEHAVIOR, which means they might format your hard drive or cause demons to fly out of your nose.
It is fairly common for all stack variables to be laid out adjacent to each other. If i is located where array[10] writes to, then the UB will reset i to 0, thus leading to the unterminated loop.
To fix, change the loop condition to i < 10.
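One way to keep the bound and the declaration in sync is to derive the limit from the array itself. This is a sketch only: the ARRAY_LEN macro and the function name are assumptions, and the macro works on true arrays, not on pointers.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Zeroes a 10-element array without touching array[10].
   Returns how many elements were written. */
size_t zero_ten(void)
{
    int array[10];
    size_t i, writes = 0;

    for (i = 0; i < ARRAY_LEN(array); i++) {
        array[i] = 0;
        ++writes;
    }
    return writes;
}
```

If the declaration later changes to int array[20], the loop bound follows automatically.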
In what should be the last run of the loop, you write to array[10], but there are only 10 elements in the array, numbered 0 through 9. The C language specification says that this is “undefined behavior”. What this means in practice is that your program will attempt to write to the int-sized piece of memory that lies immediately after array in memory. What happens then depends on what does, in fact, lie there, and this depends not only on the operating system but more so on the compiler, on the compiler options (such as optimization settings), on the processor architecture, on the surrounding code, etc. It could even vary from execution to execution, e.g. due to address space randomization (probably not on this toy example, but it does happen in real life). Some possibilities include:
The location wasn't used. The loop terminates normally.
The location was used for something which happened to have the value 0. The loop terminates normally.
The location contained the function's return address. The loop terminates normally, but then the program crashes because it tries to jump to the address 0.
The location contains the variable i. The loop never terminates because i restarts at 0.
The location contains some other variable. The loop terminates normally, but then “interesting” things happen.
The location is an invalid memory address, e.g. because array is right at the end of a virtual memory page and the next page isn't mapped.
Demons fly out of your nose. Fortunately most computers lack the requisite hardware.
What you observed on Windows was that the compiler decided to place the variable i immediately after the array in memory, so array[10] = 0 ended up assigning to i. On Ubuntu and CentOS, the compiler didn't place i there. Almost all C implementations do group local variables in memory, on a memory stack, with one major exception: some local variables can be placed entirely in registers. Even if the variable is on the stack, the order of variables is determined by the compiler, and it may depend not only on the order in the source file but also on their types (to avoid wasting memory to alignment constraints that would leave holes), on their names, on some hash value used in a compiler's internal data structure, etc.
If you want to find out what your compiler decided to do, you can tell it to show you the assembler code. Oh, and learn to decipher assembler (it's easier than writing it). With GCC (and some other compilers, especially in the Unix world), pass the option -S to produce assembler code instead of a binary. For example, here's the assembler snippet for the loop from compiling with GCC on amd64 with the optimization option -O0 (no optimization), with comments added manually:
.L3:
movl -52(%rbp), %eax ; load i to register eax
cltq
movl $0, -48(%rbp,%rax,4) ; set array[i] to 0
movl $.LC0, %edi
call puts ; printf of a constant string was optimized to puts
addl $1, -52(%rbp) ; add 1 to i
.L2:
cmpl $10, -52(%rbp) ; compare i to 10
jle .L3
Here the variable i is 52 bytes below the top of the stack, while the array starts 48 bytes below the top of the stack. So this compiler happens to have placed i just before the array; you'd overwrite i if you happened to write to array[-1]. If you change array[i]=0 to array[9-i]=0, you'll get an infinite loop on this particular platform with these particular compiler options.
Now let's compile your program with gcc -O1.
movl $11, %ebx
.L3:
movl $.LC0, %edi
call puts
subl $1, %ebx
jne .L3
That's shorter! The compiler has not only declined to allocate a stack location for i — it's only ever stored in the register ebx — but it hasn't bothered to allocate any memory for array, or to generate code to set its elements, because it noticed that none of the elements are ever used.
To make this example more telling, let's ensure that the array assignments are performed by providing the compiler with something it isn't able to optimize away. An easy way to do that is to use the array from another file — because of separate compilation, the compiler doesn't know what happens in another file (unless it optimizes at link time, which gcc -O0 or gcc -O1 doesn't). Create a source file use_array.c containing
void use_array(int *array) {}
and change your source code to
#include <stdio.h>
void use_array(int *array);
int main()
{
int array[10],i;
for (i = 0; i <=10 ; i++)
{
array[i]=0; /*code should never terminate*/
printf("test \n");
}
printf("%zd \n", sizeof(array)/sizeof(int));
use_array(array);
return 0;
}
Compile with
gcc -c use_array.c
gcc -O1 -S -o with_use_array1.s with_use_array.c
This time the assembler code looks like this:
movq %rsp, %rbx
leaq 44(%rsp), %rbp
.L3:
movl $0, (%rbx)
movl $.LC0, %edi
call puts
addq $4, %rbx
cmpq %rbp, %rbx
jne .L3
Now the array is on the stack, 44 bytes from the top. What about i? It doesn't appear anywhere! But the loop counter is kept in the register rbx. It's not exactly i, but the address of the array[i]. The compiler has decided that since the value of i was never used directly, there was no point in performing arithmetic to calculate where to store 0 during each run of the loop. Instead that address is the loop variable, and the arithmetic to determine the boundaries was performed partly at compile time (multiply 11 iterations by 4 bytes per array element to get 44) and partly at run time but once and for all before the loop starts (perform a subtraction to get the initial value).
Even on this very simple example, we've seen how changing compiler options (turn on optimization) or changing something minor (array[i] to array[9-i]) or even changing something apparently unrelated (adding the call to use_array) can make a significant difference to what the executable program generated by the compiler does. Compiler optimizations can do a lot of things that may appear unintuitive on programs that invoke undefined behavior. That's why undefined behavior is left completely undefined. When you deviate ever so slightly from the tracks, in real-world programs, it can be very hard to understand the relationship between what the code does and what it should have done, even for experienced programmers.
Unlike Java, C doesn't do array bounds checking, i.e. there's no ArrayIndexOutOfBoundsException; the job of making sure the array index is valid is left to the programmer. Indexing out of bounds, even on purpose, leads to undefined behavior, and anything could happen.
For an array:
int array[10]
indexes are only valid in the range 0 to 9. However, this loop:
for (i = 0; i <=10 ; i++)
accesses array[10]. Change the condition to i < 10.
You have a bounds violation, and on the non-terminating platforms, I believe you are inadvertently setting i to zero at the end of the loop, so that it starts over again.
array[10] is invalid; it contains 10 elements, array[0] through array[9], and array[10] is the 11th. Your loop should be written to stop before 10, as follows:
for (i = 0; i < 10; i++)
Where array[10] lands is up to the implementation (the language itself leaves the access undefined), and amusingly, on two of your platforms, it lands on i, which those platforms apparently lay out directly after array. i is set to zero and the loop continues forever. For your other platforms, i may be located before array, or array may have some padding after it.
Declaring int array[10] means array has indexes 0 to 9 (it can hold 10 integer elements in total). But the following loop,
for (i = 0; i <=10 ; i++)
runs i from 0 to 10, which is 11 times. Hence when i = 10 it will overflow the buffer and cause undefined behavior.
So try this:
for (i = 0; i < 10 ; i++)
or,
for (i = 0; i <= 9 ; i++)
It is undefined at array[10], and gives undefined behavior as described before. Think about it like this:
I have 10 items in my grocery cart. They are:
0: A box of cereal
1: Bread
2: Milk
3: Pie
4: Eggs
5: Cake
6: A 2 liter of soda
7: Salad
8: Burgers
9: Ice cream
cart[10] is undefined, and may give an out of bounds exception in some compilers. But, a lot apparently don't. The apparent 11th item is an item not actually in the cart. The 11th item is pointing to, what I'm going to call, a "poltergeist item." It never existed, but it was there.
Why some compilers place i at array[10], array[11], or even array[-1] comes down to how they handle your declaration statement. Different compilers might interpret it as one of the following:
"Allocate 10 blocks of ints for array[10] and another int block. To make it easier, put them right next to each other."
The same as before, but move i a space or two away, so that array[10] doesn't point to i.
The same as before, but allocate i at array[-1] (because an index of an array can't, or shouldn't, be negative), or allocate it at a completely different spot because the OS can handle it, and it's safer.
Some compilers want things to go quicker, and some compilers prefer safety. It's all about the context. If I was developing an app for the ancient BREW OS (the OS of a basic phone), for example, it wouldn't care about safety. If I was developing for an iPhone 6, then it could run fast no matter what, so I would need an emphasis on safety. (Seriously, have you read Apple's App Store Guidelines, or read up on the development of Swift and Swift 2.0?)
Since you created an array of size 10, for loop condition should be as follows:
int array[10],i;
for (i = 0; i <10 ; i++)
{
    array[i]=0;
}
Currently you are trying to access a location outside the array using array[10], and that causes the undefined behavior. Undefined behavior means your program may behave in an unpredictable fashion, so it can give different outputs in each execution.
Well, C compilers traditionally do not check bounds. You can get a segmentation fault in case you refer to a location that does not "belong" to your process. However, the local variables are allocated on the stack and, depending on the way the memory is allocated, the area just beyond the array (array[10]) may belong to the process' memory segment. Thus, no segmentation fault trap is thrown, and that is what you seem to experience. As others have pointed out, this is undefined behavior in C and your code may be considered erroneous. Since you are learning C, you are better off getting into the habit of checking for bounds in your code.
Beyond the possibility that memory might be laid out so that an attempt to write to a[10] actually overwrites i, it would also be possible that an optimizing compiler might determine that the loop test cannot be reached with a value of i greater than ten without code having first accessed the non-existent array element a[10].
Since an attempt to access that element would be undefined behavior, the compiler would have no obligations with regard to what the program might do after that point. More specifically, since the compiler would have no obligation to generate code to check the loop index in any case where it might be greater than ten, it would have no obligation to generate code to check it at all; it could instead assume that the <=10 test will always yield true. Note that this would be true even if the code would read a[10] rather than writing it.
When you iterate past i==9 you assign zero to 'array items' which are actually located past the end of the array, so you're overwriting some other data. Most probably you overwrite the i variable, which is located after array[]. That way you simply reset the i variable to zero and thus restart the loop.
You could discover that yourself if you printed i in the loop:
printf("test i=%d\n", i);
instead of just
printf("test \n");
Of course that result strongly depends on the memory allocation for your variables, which in turn depends on a compiler and its settings, so it is generally Undefined Behavior — that's why results on different machines or different operating systems or on different compilers may differ.
The error is at array[10], which is also the address of i (int array[10],i;).
When array[10] is set to 0, i becomes 0, which resets the entire loop and
causes the infinite loop.
There will be an infinite loop whenever the value written to array[10] is between 0 and 10. The correct loop should be for (i = 0; i <10 ; i++) {...}
int array[10],i;
for (i = 0; i <=10 ; i++)
array[i]=0;
I will suggest something that I didn't find above:
Try assigning array[i] = 20;
I guess this should terminate the code everywhere (given you keep i <= 10 or 11).
If it does, you can be fairly sure that the answers already given here are correct (the one about memory stomping, for example).
There are two things to note here. On the platforms where the loop never ends, the int i effectively sits at array[10] on the stack. Because the indexing is allowed to perform array[10] = 0, the loop index i is reset and will never exceed 10. Make it for(i=0; i<10; i+=1).
As for i++ versus i+=1: for an int they are equivalent, and both add exactly 1. Only when applied to a pointer does ++ advance by the size of the pointed-to type. Some programmers prefer to reserve i++ for pointer math and write i+=1 for plain arithmetic, but that is a style convention and has no effect on portability.

Is it good or bad to reuse the variables?

I wonder if it is good or bad (or does not matter) if I reuse the variable names as much as possible? for example
int main(void){
//...
int x=0;
//..
x = atoi(char_var);
//..
for (x=0; x<12; x++){
//...
}
//..
x = socket(...);
if(x<0){
//...
}
for(x=0;x<100;x++){
//...
}
return 0;
}
Other variables could be used instead of x above (which might be better for readability), but I wonder if reusing x would give me any benefit in binary size, performance, or anything else?
In general it's very poor practice to reuse variable names for different purposes - if someone else needs to maintain your code later on the person will have to find these "context switches" in your code where x now suddenly means something other than what it meant before that line of code.
You may get some memory savings but that's so small compared to the problems it introduces that it's advised against. (Read the edit below, too.)
Typically it's also recommended not to use 1-character variable names for other than loop counters. One could argue that x could also be an X coordinate but I'd use some prefix or a longer name in that case. Single-letter variable names are too short to give meaningful hints about the purpose of a variable.
Edit: as several comments (and some of the other answers) pointed out, the potential memory savings (if any) depend on how good the compiler is. Well-written optimizing compilers may realize that two variables have no overlapping lifetimes so they only allocate one variable slot anyway. The end result would be no run-time gain and still less maintainable source code. This just reinforces the argument: don't reuse variables.
As with almost everything in programming, it depends on the situation.
If you're reusing the same variable for different purposes, then it makes your code less readable and you shouldn't be doing it. If the purpose is the same (e.g. loop counters), then you can reuse with no problem since this isn't making your code less readable.
Reusing a variable may avoid reserving extra space on the stack, which could make the program marginally faster (no time spent reserving stack space and storing the value) and less memory-consuming. But these benefits are negligible in the context of the whole program, and they depend on the architecture, language and compiler. So I would worry more about readability than about these tiny benefits.
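A minimal sketch of the readable alternative, assuming a C99 compiler: give each value its own descriptive name and declare loop counters inside the for statement so they cannot leak into later code. The function sum_below and its parameter are hypothetical.

```c
#include <stdlib.h>

/* Parses a count from text and sums 0 .. count-1.
   Each variable has one job and the narrowest scope that works. */
long sum_below(const char *count_text)
{
    int count = atoi(count_text);      /* parsed input, never reused as a counter */
    long total = 0;

    for (int i = 0; i < count; i++) {  /* i exists only inside this loop (C99) */
        total += i;
    }
    return total;
}
```

Here the parsed number and the loop counter never share a name, so a reader does not have to track where the meaning of a variable changes.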
Bad. For simple types like ints, passed by value, the compiler will be able to figure out when they are unneeded and reuse the space.
For example, I compiled the following C++ code in Visual Studio 2010 using 32-bit Release mode:
for (int i = 0; i < 4; ++i)
{
printf("%d\n", i);
}
for (int j = 0; j < 4; ++j)
{
printf("%d\n", j);
}
and got the following assembler output:
; 5 : for (int i = 0; i < 4; ++i)
mov edi, DWORD PTR __imp__printf
xor esi, esi
npad 6
$LL6@main:
; 6 : {
; 7 : printf("%d\n", i);
push esi
push OFFSET ??_C@_03PMGGPEJJ@?$CFd?6?$AA@
call edi
inc esi
add esp, 8
cmp esi, 4
jl SHORT $LL6@main
; 8 : }
; 9 :
; 10 : for (int j = 0; j < 4; ++j)
xor esi, esi
$LL3@main:
; 11 : {
; 12 : printf("%d\n", j);
push esi
push OFFSET ??_C@_03PMGGPEJJ@?$CFd?6?$AA@
call edi
inc esi
add esp, 8
cmp esi, 4
jl SHORT $LL3@main
; 13 : }
You can see that the compiler is using the esi register for both i and j.
int x=0;
//..
x = atoi(char_var);
//..
int x = 0;
You cannot redeclare x in the same scope. If you are not redeclaring it but using it for different purposes, you are free to do this. But it's a bad practice and should be avoided as it decreases code readability. Also you should find meaningful names for your variables for the same reasons.
You can reuse it, but I don't think it will bring any significant benefit to your program, and it will make your code less readable.
Put it this way - how would you like it if I wrote a big pile of undocumented, complex code in such a way and, then, you get the job of maintaining/enhancing it.
Please do not do such a thing, ever :)
In general for any language, if you reuse variable names, and then you decide to refactor part of your code into another method, you end up having to add or edit declarations.
int i;
for(i = 0; i < 10; ++i) {
printf("%d\t%d\n", i , i * i);
}
for(i = 0; i < 10; ++i) {
printf("%d\t%d\n", i , i * i * i);
}
Suppose you take the second loop and move it to a print_cubes method. You will not be able to just cut and paste the for loop, as i will have no declaration there. A good IDE might be able to insert the declaration, but it might worry about the side-effects on i in the code you've typed.
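A sketch of what the code might look like after such a refactoring, with each helper owning its counter. The name print_cubes comes from the paragraph above; print_squares, the parameter n, and the returned line count are assumptions added so the result can be checked.

```c
#include <stdio.h>

/* Prints i and i squared for i in 0 .. n-1; returns the number of lines printed. */
int print_squares(int n)
{
    int printed = 0;
    for (int i = 0; i < n; ++i) {   /* counter declared where it is used */
        printf("%d\t%d\n", i, i * i);
        ++printed;
    }
    return printed;
}

/* Prints i and i cubed for i in 0 .. n-1; returns the number of lines printed. */
int print_cubes(int n)
{
    int printed = 0;
    for (int i = 0; i < n; ++i) {   /* independent of the i in print_squares */
        printf("%d\t%d\n", i, i * i * i);
        ++printed;
    }
    return printed;
}
```

Because neither loop depends on a shared declaration, either function can be moved or deleted without editing the other.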
In general, compilers can consolidate used variables by what are called graph-coloring algorithms. Consider this variant:
for(int i = 0; i < 10; ++i) { // BLOCK 1
printf("%d\t%d\n", i , i * i);
} // END BLOCK 1
for(int j = 0; j < 10; ++j) { // BLOCK 2
printf("%d\t%d\n", j , j * j * j);
} // END BLOCK 2
The compiler lists the used variables: i, j. It lists the blocks being used: BLOCK 1, BLOCK 2. The parent function is also a block, but i and j are visible only in BLOCK 1 and BLOCK 2. So, it makes a graph of the variables, and connects them only if they are visible in the same block. It then tries to compute the minimum number of colors needed to color each vertex without giving two adjacent vertices the same color, similar to the Haken-Appel Four Color Theorem. Here, only one color is needed.
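The coloring step can be sketched with a simple greedy pass. This is an illustration of the idea only, not the algorithm any particular compiler uses, and greedy coloring does not always find the minimum. The names assign_slots, MAX_VARS, and the interference matrix are hypothetical.

```c
#include <stddef.h>

#define MAX_VARS 8

/* interferes[a][b] != 0 means variables a and b are visible at the same time.
   Gives each of the n variables (n <= MAX_VARS) the lowest slot number not
   taken by an earlier interfering variable, and returns the slot count. */
int assign_slots(size_t n, unsigned char interferes[MAX_VARS][MAX_VARS],
                 int slot[MAX_VARS])
{
    int slots_used = 0;

    for (size_t v = 0; v < n; ++v) {
        int candidate = 0;
        for (;;) {
            int clash = 0;
            for (size_t u = 0; u < v; ++u) {
                if (interferes[v][u] && slot[u] == candidate) {
                    clash = 1;
                    break;
                }
            }
            if (!clash) {
                break;
            }
            ++candidate;
        }
        slot[v] = candidate;
        if (candidate + 1 > slots_used) {
            slots_used = candidate + 1;
        }
    }
    return slots_used;
}
```

For i and j from the two blocks above, the matrix has no edge between them, so both should land in slot 0 and one slot is enough.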
Reusing variables can be better in terms of memory.
But be careful that you no longer need the value in a variable before reusing it.
Other than that, you shouldn't always use the same variable. It is important to keep the code clean and readable, so I advise you to choose different variables with names that fit the context so that your code doesn't become confusing.
You should also take a look at dynamic memory allocation in C, which is very useful for managing memory and variables.
https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
Only drawback is readability of your code.
Reusing variables you are saving memory.
Speed is not affected (unless you have to use more instructions in order to reuse variable).

What's the best way to set all elements of array to a value?

I have an array of ints and I'd like to set all values in the array to 'x' every time a function is called.
I've looked at memset but that would only work for an array of bytes I think.
I could do the obvious for loop, but I'm guessing there is a standard lib function out there that will accomplish this better. Anyone know?
Just loop it, pretty much. Or memset to 0, if you know the value is zero (similar for other values for which you have knowledge of the bit representation). There won't be a standard lib solution, since the standard lib can't know of particular user types.
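A minimal sketch of that advice, assuming a helper named fill_ints (not a standard function): use memset only for zero, where the byte pattern is known, and a plain loop for everything else.

```c
#include <stddef.h>
#include <string.h>

/* Sets all n elements of array to value. */
void fill_ints(int *array, size_t n, int value)
{
    if (value == 0) {
        /* All-zero bytes represent the int value 0, so memset is safe here. */
        memset(array, 0, n * sizeof *array);
        return;
    }
    for (size_t i = 0; i < n; ++i) {
        array[i] = value;
    }
}
```

An optimizing compiler will often turn the plain loop into efficient code on its own, so the loop is usually a reasonable default.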
If you're on an x86 system, you can use some assembly for this. For instance, in gcc:
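/* array must be a pointer variable and count a size_t variable */
__asm__ volatile(
    "rep stosb"                  /* store AL into count consecutive bytes */
    : "+D"(array), "+c"(count)   /* both registers are updated by the instruction */
    : "a"('x')                   /* fill byte goes in AL */
    : "memory"
);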
Should do the trick.
rep stosb takes the value in AL and assigns it to consecutive memory locations pointed to by ES:EDI. The number of the locations is specified in ECX.
As an aside, in recent processors Intel has made many efforts to improve the performance of MOVSB and STOSB, so this is a good way to go about it.
In addition to memset and looping (which are both O(n) time), it can be actually done in O(1) - but at the cost of triple the amount of memory, and more expensive look ups later on.
This article describes how it can be done.
The idea is to maintain additional stack (logically, implemented as array+ pointer to top) and array, the additional array will indicate when it was first initialized (a number from 0 to n) and the stack will indicate which elements were already modified.
When you access array[i], if additionalArray[i] < top && stack[additionalArray[i]] == i, the value is array[i]. Otherwise, it is the "initialized" value.
When doing array[i] = x, if the element was not initialized yet (as checked above), you should set additionalArray[i] = top and stack[top] = i, then increase top.
This results in O(1) initialization, but as said it requires additional memory and each access is more expensive.
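The scheme can be sketched as follows. All names are hypothetical, and this version allocates the helper arrays with calloc once (an O(n) step at creation) so that no uninitialized memory is ever read; only the later resets are O(1). It assumes n > 0 and in-range indexes.

```c
#include <stddef.h>
#include <stdlib.h>

struct lazy_array {
    size_t n;       /* number of elements */
    size_t top;     /* entries of stack[] in use since the last reset */
    int fill;       /* value reported for elements not written since the reset */
    int *data;      /* data[i] is meaningful only when i is marked as written */
    size_t *when;   /* when[i]: position in stack[] claimed by element i */
    size_t *stack;  /* stack[k]: index of the k-th element written */
};

int lazy_init(struct lazy_array *a, size_t n, int fill)
{
    a->data = calloc(n, sizeof *a->data);
    a->when = calloc(n, sizeof *a->when);
    a->stack = calloc(n, sizeof *a->stack);
    if (!a->data || !a->when || !a->stack) {
        free(a->data);
        free(a->when);
        free(a->stack);
        return -1;
    }
    a->n = n;
    a->top = 0;
    a->fill = fill;
    return 0;
}

/* An element counts as written only if its stack entry points back at it. */
static int lazy_is_written(const struct lazy_array *a, size_t i)
{
    return a->when[i] < a->top && a->stack[a->when[i]] == i;
}

int lazy_get(const struct lazy_array *a, size_t i)
{
    return lazy_is_written(a, i) ? a->data[i] : a->fill;
}

void lazy_set(struct lazy_array *a, size_t i, int value)
{
    if (!lazy_is_written(a, i)) {
        a->when[i] = a->top;
        a->stack[a->top++] = i;
    }
    a->data[i] = value;
}

/* O(1): forget every write by emptying the stack and changing the fill value. */
void lazy_reset(struct lazy_array *a, int fill)
{
    a->top = 0;
    a->fill = fill;
}

void lazy_free(struct lazy_array *a)
{
    free(a->data);
    free(a->when);
    free(a->stack);
}
```

Every read and write pays for the extra check, which is the trade-off the answer describes; for small arrays a plain loop is likely simpler and faster.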
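Logic like the following will help.
...
int a[100] = {0};
int b = 5;
memset_ex(a, sizeof a, &b, sizeof b); /* buffer size is given in bytes */
...
#include <string.h> /* memcpy */
/* Fills buf (buf_size bytes) with repeated copies of the size_of_type-byte object at value. */
void memset_ex(void *buf, size_t buf_size, const void *value, size_t size_of_type)
{
    size_t i;
    for (i = 0; i + size_of_type <= buf_size; i += size_of_type)
    {
        memcpy((char *)buf + i, value, size_of_type);
    }
}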

Is it faster to count down than it is to count up?

Our computer science teacher once said that for some reason it is faster to count down than to count up.
For example if you need to use a FOR loop and the loop index is not used somewhere (like printing a line of N * to the screen)
I mean that code like this:
for (i = N; i >= 0; i--)
putchar('*');
is faster than:
for (i = 0; i < N; i++)
putchar('*');
Is it really true? And if so, does anyone know why?
In ancient days, when computers were still chipped out of fused silica by hand, when 8-bit microcontrollers roamed the Earth, and when your teacher was young (or your teacher's teacher was young), there was a common machine instruction called decrement and skip if zero (DSZ). Hotshot assembly programmers used this instruction to implement loops. Later machines got fancier instructions, but there were still quite a few processors on which it was cheaper to compare something with zero than to compare with anything else. (It's true even on some modern RISC machines, like PPC or SPARC, which reserve a whole register to be always zero.)
So, if you rig your loops to compare with zero instead of N, what might happen?
You might save a register
You might get a compare instruction with a smaller binary encoding
If a previous instruction happens to set a flag (likely only on x86 family machines), you might not even need an explicit compare instruction
Are these differences likely to result in any measurable improvement on real programs on a modern out-of-order processor? Highly unlikely. In fact, I'd be impressed if you could show a measurable improvement even on a microbenchmark.
Summary: I smack your teacher upside the head! You shouldn't be learning obsolete pseudo-facts about how to organize loops. You should be learning that the most important thing about loops is to be sure that they terminate, produce correct answers, and are easy to read. I wish your teacher would focus on the important stuff and not mythology.
Here's what might happen on some hardware depending on what the compiler can deduce about the range of the numbers you're using: with the incrementing loop you have to test i<N each time round the loop. For the decrementing version, the carry flag (set as a side effect of the subtraction) may automatically tell you if i>=0. That saves a test per time round the loop.
In reality, on modern pipelined processor hardware, this stuff is almost certainly irrelevant as there isn't a simple 1-1 mapping from instructions to clock cycles. (Though I could imagine it coming up if you were doing things like generating precisely timed video signals from a microcontroller. But then you'd write in assembly language anyway.)
In the Intel x86 instruction set, building a loop to count down to zero can usually be done with fewer instructions than a loop that counts up to a non-zero exit condition. Specifically, the ECX register is traditionally used as a loop counter in x86 asm, and the Intel instruction set has a special jcxz jump instruction that tests the ECX register for zero and jumps based on the result of the test.
However, the performance difference will be negligible unless your loop is already very sensitive to clock cycle counts. Counting down to zero might shave 4 or 5 clock cycles off each iteration of the loop compared to counting up, so it's really more of a novelty than a useful technique.
Also, a good optimizing compiler these days should be able to convert your count up loop source code into count down to zero machine code (depending on how you use the loop index variable) so there really isn't any reason to write your loops in strange ways just to squeeze a cycle or two here and there.
Yes!
Counting from N down to 0 is slightly faster than counting from 0 to N, because of how the hardware handles the comparison.
Note the comparison in each loop
i>=0
i<N
Most processors have a compare-with-zero instruction, so the first one will be translated to machine code as:
Load i
Compare and jump if Less than or Equal zero
But the second one needs to load N from memory each time:
load i
load N
Sub i and N
Compare and jump if Less than or Equal zero
So it is not because of counting down or up, but because of how your code will be translated into machine code.
So counting from 10 to 100 is the same as counting from 100 to 10.
But counting from i=100 to 0 is faster than from i=0 to 100 in most cases.
And counting from i=N to 0 is faster than from i=0 to N.
Note that nowadays the compiler may do this optimization for you (if it is smart enough).
Note also that pipelining can cause a Belady's-anomaly-like effect (you cannot be sure which will be better).
Finally: please note that the two for loops you have presented are not equivalent; the first prints one more *.
Related:
Why does n++ execute faster than n=n+1?
In C to pseudo-assembly:
for (i = 0; i < 10; i++) {
foo(i);
}
turns into
clear i
top_of_loop:
call foo
increment i
compare 10, i
jump_less top_of_loop
while:
for (i = 10; i >= 0; i--) {
foo(i);
}
turns into
load i, 10
top_of_loop:
call foo
decrement i
jump_not_neg top_of_loop
Note the lack of the compare in the second pseudo-assembly. On many architectures there are flags that are set by arithmetic operations (add, subtract, multiply, divide, increment, decrement) which you can use for jumps. These often give you what is essentially a comparison of the result of the operation with 0 for free. In fact on many architectures
x = x - 0
is semantically the same as
compare x, 0
Also, the compare against a 10 in my example could result in worse code. 10 may have to live in a register, so if they are in short supply that costs and may result in extra code to move things around or reload the 10 every time through the loop.
Compilers can sometimes rearrange the code to take advantage of this, but it is often difficult because they are often unable to be sure that reversing the direction through the loop is semantically equivalent.
Counting down is faster in a case like this:
for (i = someObject.getAllObjects.size() - 1; i >= 0; i--) {…}
because someObject.getAllObjects.size() executes once at the beginning.
Sure, similar behaviour can be achieved by calling size() out of the loop, as Peter mentioned:
size = someObject.getAllObjects.size();
for (i = 0; i < size; i++) {…}
What matters much more than whether you're increasing or decreasing your counter is whether you're going up memory or down memory. Most caches are optimized for going up memory, not down memory. Since memory access time is the bottleneck that most programs today face, this means that changing your program so that you go up memory might result in a performance boost even if this requires comparing your counter to a non-zero value. In some of my programs, I saw a significant improvement in performance by changing my code to go up memory instead of down it.
Skeptical? Just write a program to time loops going up/down memory. Here's the output that I got:
Average Up Memory = 4839 mus
Average Down Memory = 5552 mus
Average Up Memory = 18638 mus
Average Down Memory = 19053 mus
(where "mus" stands for microseconds) from running this program:
#include <chrono>
#include <iostream>
#include <limits>
#include <random>
#include <vector>
using namespace std;
//Sum all numbers going up memory.
template<class Iterator, class T>
inline void sum_abs_up(Iterator first, Iterator one_past_last, T &total) {
T sum = 0;
auto it = first;
do {
sum += *it;
it++;
} while (it != one_past_last);
total += sum;
}
//Sum all numbers going down memory.
template<class Iterator, class T>
inline void sum_abs_down(Iterator first, Iterator one_past_last, T &total) {
T sum = 0;
auto it = one_past_last;
do {
it--;
sum += *it;
} while (it != first);
total += sum;
}
//Time how long it takes to make num_repititions identical calls to sum_abs_down().
//We will divide this time by num_repititions to get the average time.
template<class T>
chrono::nanoseconds TimeDown(vector<T> &vec, const vector<T> &vec_original,
size_t num_repititions, T &running_sum) {
chrono::nanoseconds total{0};
for (size_t i = 0; i < num_repititions; i++) {
auto start_time = chrono::high_resolution_clock::now();
sum_abs_down(vec.begin(), vec.end(), running_sum);
total += chrono::high_resolution_clock::now() - start_time;
vec = vec_original;
}
return total;
}
template<class T>
chrono::nanoseconds TimeUp(vector<T> &vec, const vector<T> &vec_original,
size_t num_repititions, T &running_sum) {
chrono::nanoseconds total{0};
for (size_t i = 0; i < num_repititions; i++) {
auto start_time = chrono::high_resolution_clock::now();
sum_abs_up(vec.begin(), vec.end(), running_sum);
total += chrono::high_resolution_clock::now() - start_time;
vec = vec_original;
}
return total;
}
template<class Iterator, typename T>
void FillWithRandomNumbers(Iterator start, Iterator one_past_end, T a, T b) {
random_device rnd_device;
mt19937 generator(rnd_device());
uniform_int_distribution<T> dist(a, b);
for (auto it = start; it != one_past_end; it++)
*it = dist(generator);
return ;
}
template<class Iterator>
void FillWithRandomNumbers(Iterator start, Iterator one_past_end, double a, double b) {
random_device rnd_device;
mt19937_64 generator(rnd_device());
uniform_real_distribution<double> dist(a, b);
for (auto it = start; it != one_past_end; it++)
*it = dist(generator);
return ;
}
template<class ValueType>
void TimeFunctions(size_t num_repititions, size_t vec_size = (1u << 24)) {
auto lower = numeric_limits<ValueType>::min();
auto upper = numeric_limits<ValueType>::max();
vector<ValueType> vec(vec_size);
FillWithRandomNumbers(vec.begin(), vec.end(), lower, upper);
const auto vec_original = vec;
ValueType sum_up = 0, sum_down = 0;
auto time_up = TimeUp(vec, vec_original, num_repititions, sum_up).count();
auto time_down = TimeDown(vec, vec_original, num_repititions, sum_down).count();
cout << "Average Up Memory = " << time_up/(num_repititions * 1000) << " mus\n";
cout << "Average Down Memory = " << time_down/(num_repititions * 1000) << " mus"
<< endl;
return ;
}
int main() {
size_t num_repititions = 1 << 10;
TimeFunctions<int>(num_repititions);
cout << '\n';
TimeFunctions<double>(num_repititions);
return 0;
}
Both sum_abs_up and sum_abs_down do the same thing (sum the vector of numbers) and are timed the same way with the only difference being that sum_abs_up goes up memory while sum_abs_down goes down memory. I even pass vec by reference so that both functions access the same memory locations. Nevertheless, sum_abs_up is consistently faster than sum_abs_down. Give it a run yourself (I compiled it with g++ -O3).
It's important to note how tight the loop that I'm timing is. If a loop's body is large (has a lot of code), then it likely won't matter whether its iterator goes up or down memory, since the time it takes to execute the loop's body will likely dominate. Also, with some rare loops, going down memory is sometimes faster than going up it. But even for those loops, going down memory was never faster on every run. Small-bodied loops are different: for them, going up memory is frequently faster on every run; in fact, for a small handful of loops I've timed, going up memory improved performance by 40+%.
The point is, as a rule of thumb, if you have the option, if the loop's body is small, and if there's little difference between having your loop go up memory instead of down it, then you should go up memory.
FYI vec_original is there for experimentation, to make it easy to change sum_abs_up and sum_abs_down in a way that makes them alter vec while not allowing these changes to affect future timings. I highly recommend playing around with sum_abs_up and sum_abs_down and timing the results.
On some older CPUs there are/were instructions like DJNZ == "decrement and jump if not zero". This allowed for efficient loops where you loaded an initial count value into a register and then you could effectively manage a decrementing loop with one instruction. We're talking 1980s ISAs here though - your teacher is seriously out of touch if he thinks this "rule of thumb" still applies with modern CPUs.
Is it faster to count down than up?
Maybe. But far more than 99% of the time it won't matter, so you should use the most 'sensible' test for terminating the loop, and by sensible, I mean that it takes the least amount of thought by a reader to figure out what the loop is doing (including what makes it stop). Make your code match the mental (or documented) model of what the code is doing.
If the loop is working its way up through an array (or list, or whatever), an incrementing counter will often match up better with how the reader might be thinking of what the loop is doing - code your loop this way.
But if you're working through a container that has N items, and are removing the items as you go, it might make more cognitive sense to work the counter down.
A bit more detail on the 'maybe' in the answer:
It's true that on most architectures, testing for a calculation resulting in zero (or going from zero to negative) requires no explicit test instruction - the result can be checked directly. If you want to test whether a calculation results in some other number, the instruction stream will generally have to have an explicit instruction to test for that value. However, especially with modern CPUs, this test will usually add less than noise-level additional time to a looping construct. Particularly if that loop is performing I/O.
On the other hand, if you count down to zero, and use the counter as an array index, for example, you might find the code working against the memory architecture of the system - memory reads will often cause a cache to 'look ahead' several memory locations past the current one in anticipation of a sequential read. If you're working backwards through memory, the caching system might not anticipate reads of a memory location at a lower memory address. In this case, it's possible that looping 'backwards' might hurt performance. However, I'd still probably code the loop this way (as long as performance didn't become an issue) because correctness is paramount, and making the code match a model is a great way to help ensure correctness. Incorrect code is as unoptimized as you can get.
So I would tend to forget the professor's advice (of course, not on his test though - you should still be pragmatic as far as the classroom goes), unless and until the performance of the code really mattered.
Bob,
Not until you are doing microoptimizations, at which point you will have the manual for your CPU to hand. Further, if you were doing that sort of thing, you probably wouldn't be needing to ask this question anyway. :-) But, your teacher evidently doesn't subscribe to that idea....
There are 4 things to consider in your loop example:
for (i=N;
i>=0; //thing 1
i--) //thing 2
{
putchar('*'); //thing 3
}
Comparison
Comparison is (as others have indicated) relevant to particular processor architectures. There are more types of processors than those that run Windows. In particular, there might be an instruction that simplifies and speeds up comparisons with 0.
Adjustment
In some cases, it is faster to adjust up or down. Typically a good compiler will figure it out and redo the loop if it can. Not all compilers are good though.
Loop Body
With putchar you are ultimately making a system call (whenever its output buffer is flushed). That is massively slow. Plus, you are rendering onto the screen (indirectly). That is even slower. Think 1000:1 ratio or more. In this situation, the loop body totally and utterly outweighs the cost of the loop adjustment/comparison.
Caches
A cache and memory layout can have a large effect on performance. In this situation, it doesn't matter. However, if you were accessing an array and needed optimal performance, it would behoove you to investigate how your compiler and your processor laid out memory accesses and to tune your software to make the most of that. The stock example is the one given in relation to matrix multiplication.
It can be faster.
On the NIOS II processor I'm currently working with, the traditional for loop
for(i=0;i<100;i++)
produces the assembly:
ldw r2,-3340(fp) %load i to r2
addi r2,r2,1 %increase i by 1
stw r2,-3340(fp) %save value of i
ldw r2,-3340(fp) %load value again (???)
cmplti r2,r2,100 %compare if less than 100
bne r2,zero,0xa018 %jump
If we count down
for(i=100;i--;)
we get assembly that needs two fewer instructions.
ldw r2,-3340(fp)
addi r3,r2,-1
stw r3,-3340(fp)
bne r2,zero,0xa01c
If we have nested loops, where the inner loop is executed a lot, we can have a measurable difference:
int i,j,a=0;
for(i=100;i--;){
for(j=10000;j--;){
a = j+1;
}
}
If the inner loop is written like above, the execution time is 0.122 seconds.
If the inner loop is written the traditional way, the execution time is 0.172 seconds. So the loop counting down is about 30% faster.
But: this test was made with all GCC optimizations turned off. If we turn them on, the compiler is actually smarter than this hand optimization; it even keeps the value in a register during the whole loop, and we would get assembly like
addi r2,r2,-1
bne r2,zero,0xa01c
In this particular example the compiler even notices that variable a will always be 1 after the loop execution and skips the loops altogether.
However, I have found that sometimes, if the loop body is complex enough, the compiler is not able to do this optimization, so the safest way to always get fast loop execution is to write:
register int i;
for(i=10000;i--;)
{ ... }
Of course this only works if it does not matter that the loop is executed in reverse and, like Betamoo said, only if you are counting down to zero.
Regardless of the direction, always use the prefix form (++i instead of i++)!
for (i=N; i>=0; --i)
or
for (i=0; i<N; ++i)
Explanation: http://www.eskimo.com/~scs/cclass/notes/sx7b.html
Furthermore you can write
for (i=N; i; --i)
But I would expect modern compilers to be able to do exactly these optimizations.
It is an interesting question, but as a practical matter I don't think it's important and does not make one loop any better than the other.
According to this wikipedia page: Leap second, "...the solar day becomes 1.7 ms longer every century due mainly to tidal friction." But if you are counting days until your birthday, do you really care about this tiny difference in time?
It's more important that the source code is easy to read and understand. Those two loops are a good example of why readability is important -- they don't loop the same number of times.
I would bet that most programmers read (i = 0; i < N; i++) and understand immediately that this loops N times. A loop of (i = 1; i <= N; i++), for me anyway, is a little less clear, and with (i = N; i > 0; i--) I have to think about it for a moment. It's best if the intent of the code goes directly into the brain without any thinking required.
Strangely, it appears that there IS a difference. At least, in PHP. Consider following benchmark:
<?php
print "<br>".PHP_VERSION;
$iter = 100000000;
$i=$t1=$t2=0;
$t1 = microtime(true);
for($i=0;$i<$iter;$i++){}
$t2 = microtime(true);
print '<br>$i++ : '.($t2-$t1);
$t1 = microtime(true);
for($i=$iter;$i>0;$i--){}
$t2 = microtime(true);
print '<br>$i-- : '.($t2-$t1);
$t1 = microtime(true);
for($i=0;$i<$iter;++$i){}
$t2 = microtime(true);
print '<br>++$i : '.($t2-$t1);
$t1 = microtime(true);
for($i=$iter;$i>0;--$i){}
$t2 = microtime(true);
print '<br>--$i : '.($t2-$t1);
Results are interesting:
PHP 5.2.13
$i++ : 8.8842368125916
$i-- : 8.1797409057617
++$i : 8.0271911621094
--$i : 7.1027431488037
PHP 5.3.1
$i++ : 8.9625310897827
$i-- : 8.5790238380432
++$i : 5.9647901058197
--$i : 5.4021768569946
If someone knows why, it would be nice to know :)
EDIT: The results are the same even if you start counting not from 0 but from another arbitrary value. So it is probably not only the comparison with zero that makes the difference?
What your teacher said was a rather oblique statement without much clarification.
It is NOT that decrementing is faster than incrementing, but you can create a much faster loop with a decrement than with an increment.
Without going on at length about it: assume the loop counter is not needed inside the loop, and that all that matters is speed and the loop count (non-zero).
Here is how most people implement a loop with 10 iterations:
int i;
for (i = 0; i < 10; i++)
{
//something here
}
For 99% of cases that is all one may need, but alongside PHP, Python and JavaScript there is a whole world of time-critical software (usually embedded, OS, games etc.) where CPU ticks really matter, so look briefly at the assembly code of:
int i;
for (i = 0; i < 10; i++)
{
//something here
}
After compilation (without optimisation), the compiled version may look like this (VS2015):
-------- C7 45 B0 00 00 00 00 mov dword ptr [i],0
-------- EB 09 jmp labelB
labelA 8B 45 B0 mov eax,dword ptr [i]
-------- 83 C0 01 add eax,1
-------- 89 45 B0 mov dword ptr [i],eax
labelB 83 7D B0 0A cmp dword ptr [i],0Ah
-------- 7D 02 jge out1
-------- EB EF jmp labelA
out1:
The whole loop is 8 instructions (26 bytes). The loop itself is actually 6 instructions (17 bytes) with 2 branches. Yes, yes, I know it can be done better (it's just an example).
Now consider this frequent construct, which you will often find written by embedded developers:
i = 10;
do
{
//something here
} while (--i);
It also iterates 10 times (yes, I know the value of i differs from the for loop shown above, but we care about the iteration count here).
This may be compiled into this:
00074EBC C7 45 B0 0A 00 00 00 mov dword ptr [i],0Ah
00074EC3 8B 45 B0 mov eax,dword ptr [i]
00074EC6 83 E8 01 sub eax,1
00074EC9 89 45 B0 mov dword ptr [i],eax
00074ECC 75 F5 jne main+0C3h (074EC3h)
5 instructions (18 bytes) and just one branch. Actually there are 4 instructions in the loop (11 bytes).
The best thing is that some CPUs (x86/x64 compatible included) have an instruction that decrements a register, compares the result with zero and branches if the result is not zero. Virtually ALL PC CPUs implement this instruction. Using it, the loop is actually just one (yes, one) 2-byte instruction:
00144ECE B9 0A 00 00 00 mov ecx,0Ah
label:
// something here
00144ED3 E2 FE loop label (0144ED3h) // decrement ecx and jump to label if not zero
Do I have to explain which is faster?
Now, even if a particular CPU does not implement the above instruction, all that is required to emulate it is a decrement followed by a conditional jump taken if the result of the previous instruction is not zero.
So regardless of the cases you may point out in a comment to show why I am wrong etc., I EMPHASISE: YES, IT IS BENEFICIAL TO LOOP DOWNWARDS if you know how, why and when.
PS. Yes, I know that a wise compiler (with an appropriate optimisation level) will rewrite a for loop (with an ascending loop counter) into a do..while equivalent for constant loop iterations ... (or unroll it) ...
No, that's not really true. One situation where it could be faster is when you would otherwise be calling a function to check the bounds during every iteration of a loop.
for(int i=myCollection.size() - 1; i >= 0; i--)
{
...
}
But if it's less clear to do it that way, it's not worthwhile. In modern languages, you should use a foreach loop when possible, anyway. You specifically mention the case where you should use a foreach loop -- when you don't need the index.
The point is that when counting down you don't need to check i >= 0 separately from decrementing i. Observe:
for (i = 5; i--;) {
alert(i); // alert boxes showing 4, 3, 2, 1, 0
}
Both the comparison and decrementing i can be done in the one expression.
See other answers for why this boils down to fewer x86 instructions.
As to whether it makes a meaningful difference in your application, well I guess that depends on how many loops you have and how deeply nested they are. But to me, it's just as readable to do it this way, so I do it anyway.
Now, I think you have had enough assembly lectures :) I would like to present another reason for the top->down approach.
The reason to go from the top is very simple. In the body of the loop, you might accidentally change the boundary, which might result in incorrect behaviour or even a non-terminating loop.
Look at this small portion of Java code (I guess the language does not matter for this point):
System.out.println("top->down");
int n = 999;
for (int i = n; i >= 0; i--) {
n++;
System.out.println("i = " + i + "\t n = " + n);
}
System.out.println("bottom->up");
n = 1;
for (int i = 0; i < n; i++) {
n++;
System.out.println("i = " + i + "\t n = " + n);
}
So my point is that you should consider preferring to go from the top down, or having a constant as the boundary.
At an assembler level a loop that counts down to zero is generally slightly faster than one that counts up to a given value. If the result of a calculation is equal to zero most processors will set a zero flag. If subtracting one makes a calculation wrap around past zero this will normally change the carry flag (on some processors it will set it on others it will clear it), so the comparison with zero comes essentially for free.
This is even more true when the number of iterations is not a constant but a variable.
In trivial cases the compiler may be able to optimise the count direction of a loop automatically but in more complex cases it may be that the programmer knows that the direction of the loop is irrelevant to the overall behaviour but the compiler cannot prove that.

Resources