GCC: vectorization difference between two similar loops - c

When compiling with gcc -O3, why does the following loop not vectorize (automatically):
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
int foo () {
int i, j;
for (i=0; i<SIZE; i++){
for (j=i; j<SIZE; j++) {
a[i] = b[i] > c[j] ? b[i] : c[j];
}
}
return a[0];
}
when the following one does?
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
int foov () {
int i, j;
for (i=0; i<SIZE; i++){
for (j=i; j<SIZE; j++) {
a[i] += b[i] > c[j] ? b[i] : c[j];
}
}
return a[0];
}
The only difference is whether the result of the expression in the inner loop is assigned to a[i], or added to a[i].
For reference -ftree-vectorizer-verbose=6 gives the following output for the first (non-vectorizing) loop.
v.c:8: note: not vectorized: inner-loop count not invariant.
v.c:9: note: Unknown alignment for access: c
v.c:9: note: Alignment of access forced using peeling.
v.c:9: note: not vectorized: live stmt not supported: D.2700_5 = c[j_20];
v.c:5: note: vectorized 0 loops in function.
And the same output for the loop that vectorizes is:
v.c:8: note: not vectorized: inner-loop count not invariant.
v.c:9: note: Unknown alignment for access: c
v.c:9: note: Alignment of access forced using peeling.
v.c:9: note: vect_model_load_cost: aligned.
v.c:9: note: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
v.c:9: note: vect_model_simple_cost: inside_cost = 1, outside_cost = 1 .
v.c:9: note: vect_model_reduction_cost: inside_cost = 1, outside_cost = 6 .
v.c:9: note: cost model: prologue peel iters set to vf/2.
v.c:9: note: cost model: epilogue peel iters set to vf/2 because peeling for alignment is unknown .
v.c:9: note: Cost model analysis:
Vector inside of loop cost: 3
Vector outside of loop cost: 27
Scalar iteration cost: 3
Scalar outside cost: 7
prologue iterations: 2
epilogue iterations: 2
Calculated minimum iters for profitability: 8
v.c:9: note: Profitability threshold = 7
v.c:9: note: Profitability threshold is 7 loop iterations.
v.c:9: note: LOOP VECTORIZED.
v.c:5: note: vectorized 1 loops in function.

In the first case: the code overwrites the same memory location a[i] in each iteration. This inherently sequentializes the loop as the loop iterations are not independent.
(In reality, only the final iteration is actually needed. So the entire inner loop could be taken out.)
In the second case: GCC recognizes the loop as a reduction operation - for which it has special case handling to vectorize.
Compiler vectorization is often implemented as some sort of "pattern matching". Meaning the compiler analyzes code to see if it fits a certain pattern that it's able to vectorize. If it does, it gets vectorized. If it doesn't, then it doesn't.
This seems to be a corner case where the first loop doesn't fit any of the pre-coded patterns that GCC can handle. But the second case fits the "vectorizable reduction" pattern.
Here's the relevant part of GCC's source code that spits out that "not vectorized: live stmt not supported: " message:
http://svn.open64.net/svnroot/open64/trunk/osprey-gcc-4.2.0/gcc/tree-vect-analyze.c
if (STMT_VINFO_LIVE_P (stmt_info))
{
ok = vectorizable_reduction (stmt, NULL, NULL);
if (ok)
need_to_vectorize = true;
else
ok = vectorizable_live_operation (stmt, NULL, NULL);
if (!ok)
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
fprintf (vect_dump,
"not vectorized: live stmt not supported: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
return false;
}
}
From just the line:
vectorizable_reduction (stmt, NULL, NULL);
It's clear that GCC is checking to see if it matches a "vectorizable reduction" pattern.

The GCC vectorizer is probably not smart enough to vectorize the first loop. The addition case is easier to vectorize because a + 0 == a. Consider SIZE==4:
0 1 2 3 i
0 X
1 X X
2 X X X
3 X X X X
j
X denotes the combinations of i and j for which a[i] is assigned or increased. For addition, we can compute b[i] > c[j] ? b[i] : c[j] for, say, j==1 and i==0..3 and put the results into a vector D. Then we only need to zero D[2..3] and add the resulting vector to a[0..3]. For assignment it is a little trickier: we must not only zero D[2..3] but also zero a[0..1], and only then combine the results. I guess this is where the vectorizer fails.

The first loop is equivalent to
#define SIZE (65536)
int a[SIZE], b[SIZE], c[SIZE];
int foo () {
int i, j;
for (i=0; i<SIZE; i++){
a[i] = b[i] > c[SIZE - 1] ? b[i] : c[SIZE - 1];
}
return a[0];
}
The problem with the original expression is that it really doesn't make that much sense, so it's not too surprising gcc isn't able to vectorize it.
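A quick way to convince yourself of that equivalence is to run both forms on a small input and compare the results. The sketch below is only an illustration: the function names fill_nested and fill_flat and the explicit length parameter n are assumptions made for the example, not part of the original code.

```c
#include <stddef.h>

/* Original form: the inner loop overwrites a[i] for every j in [i, n). */
void fill_nested(int *a, const int *b, const int *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            a[i] = b[i] > c[j] ? b[i] : c[j];
}

/* Simplified form: only the last inner iteration (j == n - 1) survives. */
void fill_flat(int *a, const int *b, const int *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] > c[n - 1] ? b[i] : c[n - 1];
}
```

Both functions should leave identical contents in a for any input, since every store except the last one to a[i] is overwritten.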

The first one just overwrites a[i] many times (the intermediate values are temporary).
The second one uses the previous value of a[i] each time (not temporary).
Until a version of patch, you could use "volatile" variable to vectorize.
Use
int * c=malloc(sizeof(int));
to make it aligned;
v.c:9: note: Unknown alignment for access: c
Shows "c" having different storage area than b and a.
I assume "movaps"-like instructions for being "vectorized" (from SSE-AVX instruction list)
Here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html#using
the 6th and 7th examples show similar difficulties.

Related

Break down the logic for a C for-loop into something I can implement in MIPS? (count number occurrences in an array)

Hi all, I am converting my C code to MIPS, but I couldn't work out the correct logic for this loop:
for (int i=0;i<count;i++)
{
h[a[i]]++;
}
So far I have written my own logic, but it is wrong.
Assume That
a[]=$t1
b[]=$t2
li $s1,0
for:
bgt $s1,[size of array],end
lw $t3,($t1)
la $t1,4($t3)
lw $t4,($t2)
la $t2,4($t2)
sw $t3,0($t4)
add $s1,$s1,1
j for
I know this is wrong; in MIPS it gives me a bad address error and exception 7 and 4 errors.
Also, how to get the array length into a MIPS register, like how C can calculate the number of elements for you without having to hard-code it?
int arr[]={1,3,5,7,9};
int n=sizeof(arr)/sizeof(arr[0]);
You need to understand what is being incremented.  One approach to gaining this understanding is decomposition: expressions can be decomposed using an approach like Three-Address Code.
Suggest then, to decompose this:
h[a[i]]++;
as follows:
int av = a[i];
h[av]++;
In this approach, we separate various operations & elements of the expression into their own lines of code, and reconnect them using an introduced variable, aka a temporary or short-lived variable.
and we can continue decomposition:
int av = a[i];
// decompose h[av]++;
int *hp = h + av;
*hp = *hp + 1;
and yet still:
int av = a[i];
int *hp = h + av;
// decompose *hp = *hp + 1;
int hv = *hp;
hv++;
*hp = hv;
That's about as decomposed as can be.  Hopefully, you can (1) translate that into MIPS assembly and (2) learn of the technique of decomposition.
Note that in C, when an array a has int elements, an expression such as a + i or h + j involves a hidden multiplication of the index (i or j) by sizeof(int), i.e. 4. This is called scaling, and it happens before the addition. Scaling is automatic in C but must be done explicitly in assembly.
Control structures, i.e. structured statements can also be decomposed:
for (i = 0; i < count; i++) {
???
}
first decomposition:
i = 0;
while (i < count) {
???
i++;
}
and further into if-goto-label style:
i = 0;
loop1:
if (i >= count) goto endLoop1;
???
i++;
goto loop1;
endLoop1:
Notice how the loop exit condition is inverted (negated). This is because in if-goto-label form we write when to exit the loop rather than when to continue it, as a C for/while does, which is the exact opposite.
(This is not always the case: a do-while condition says when to continue the loop as well, and in assembly, since that is the natural place to put the backward branch for the next iteration of the loop, the condition test for do-while and if-goto-label versions will have the same condition sense — they are not opposites — whereas a repeat-until in Pascal uses the opposite condition sense.)
Also NB: the inversion/negation/opposite of   <   is   >= .
Decomposition is simplification by transformations of logical equivalence.  As long as we maintain logical equivalence, the decomposition will run the same as the original.
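Putting the two decompositions together (the expression and the control structure) gives something like the following sketch. The function name count_occurrences and its parameters are assumptions for the example; each line of the body should map to roughly one MIPS instruction, plus the explicit scaling by 4.

```c
/* Equivalent to:  for (i = 0; i < count; i++) h[a[i]]++;  */
void count_occurrences(const int *a, int count, int *h) {
    int i = 0;
loop1:
    if (i >= count) goto endLoop1;  /* inverted test: when to leave the loop */
    {
        int av = a[i];     /* load a[i]; in assembly, scale i by 4 first */
        int *hp = h + av;  /* address of h[av]; again scaled by sizeof(int) */
        int hv = *hp;      /* load the current count */
        hv++;              /* increment it */
        *hp = hv;          /* store it back */
    }
    i++;
    goto loop1;
endLoop1:
    return;
}
```

Calling it with a = {1, 3, 1, 0, 3, 1}, count = 6 and a zeroed h of four elements should leave h holding the number of times each value occurs.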

Make this for-loop more efficient?

Edit: this code will be run with optimizations off.
Full transparency: this is a homework assignment.
I'm having some trouble figuring out how to optimize this code.
My instructor went over unrolling and splitting, but neither seems to greatly reduce the time needed to execute the code. Any help would be appreciated!
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
int j;
for (j = 0; j < ARRAY_SIZE; j++) {
sum += array[j];
}
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
Assuming you mean same number of additions to sum at runtime (rather than same number of additions in the source code), unrolling could give you something like:
for (j = 0; j + 5 < ARRAY_SIZE; j += 5) {
sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4];
}
for (; j < ARRAY_SIZE; j++) {
sum += array[j];
}
Alternatively, since you're adding the same values each time through the outer loop, you don't need to process it N_TIMES times, just do this:
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
int j;
for (j = 0; j < ARRAY_SIZE; j++) {
sum += array[j];
}
sum *= N_TIMES;
break;
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
This requires that the initial value of sum is zero, which is likely but there's actually nothing in your question that mandates this, so I include it as a pre-condition for this method.
Except by cheating*, this inner loop is essentially non-optimizable, because you must fetch all the array elements and perform all the additions anyway.
The body of the loop performs:
1. a conditional branch on j;
2. a fetch of array[j];
3. the accumulation to a scalar variable;
4. the incrementation of j.
As said, 2. to 4. are inescapable. Then all you can do is reduce the number of conditional branches by loop unrolling (this turns the conditional branch into an unconditional one, at the expense of the number of iterations becoming fixed).
It is no surprise that you don't see a big difference. Modern processors are "loop aware", meaning that branch prediction is well tuned to such loops so that the cost of the branches is pretty low.
Cheating:
As others said, you can completely bypass the outer loop. This is just exploiting a flaw in the exercise statement.
As optimizations must be turned off, using inline assembly, pragmas, vector instructions or intrinsics should be banned as well (not mentioning automatic parallelization).
There is a possibility to pack two ints in a long long. If the sum doesn't overflow, you will perform two additions at a time. But is this legal ?
One might think of an access pattern that favors cache utilization. But here there is no hope as the array is fully traversed on every loop and there is no possibility of reuse of the values fetched.
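To make the "two ints in a long long" idea concrete, here is a rough sketch under explicit assumptions: the elements are non-negative, they are stored as uint32_t, and the sum of the even-indexed elements and the sum of the odd-indexed elements each fit in 32 bits (otherwise a carry leaks from one half into the other). The function name sum_packed is made up for the example, and whether this counts as the "same number of additions" is exactly the question raised above.

```c
#include <stdint.h>
#include <string.h>

/* Adds two 32-bit elements per 64-bit addition.
   Assumes each of the two partial sums fits in 32 bits. */
uint32_t sum_packed(const uint32_t *array, size_t n) {
    uint64_t acc = 0;
    size_t j = 0;
    for (; j + 1 < n; j += 2) {
        uint64_t pair;
        memcpy(&pair, &array[j], sizeof pair);  /* load two elements at once */
        acc += pair;                            /* one add updates both halves */
    }
    /* Combine the two 32-bit halves; byte order does not matter for the total. */
    uint32_t sum = (uint32_t)acc + (uint32_t)(acc >> 32);
    if (j < n)
        sum += array[j];                        /* odd leftover element */
    return sum;
}
```

With optimizations off, the memcpy call may well cost more than it saves, so this is a sketch of the idea rather than a guaranteed speed-up.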
First of all, unless you are explicitly compiling with -O0, your compiler has already likely optimized this loop much further than you could possibly expect.
Including unrolling, and on top of unrolling also vectorization and more. Trying to optimize this by hand is something you should never, absolutely never do. At most you will successfully make the code harder to read and understand, while most likely not even being able to match the compiler in terms of performance.
As to why there is no measurable gain? Possibly because you already hit a bottleneck, even with the "non optimized" version. For an ARRAY_SIZE greater than your processor's cache, even the compiler-optimized version is already limited by memory bandwidth.
But for completeness, let's just assume you have not hit that bottleneck, and that you actually had turned optimizations almost off (so no more than -O1), and optimize for that.
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
int j;
int tmpSum[4] = {0,0,0,0};
// stop before the last partial group so we never read past the array
for (j = 0; j + 3 < ARRAY_SIZE; j+=4) {
tmpSum[0] += array[j+0];
tmpSum[1] += array[j+1];
tmpSum[2] += array[j+2];
tmpSum[3] += array[j+3];
}
sum += tmpSum[0] + tmpSum[1] + tmpSum[2] + tmpSum[3];
// add the leftover elements when ARRAY_SIZE is not a multiple of 4
for (; j < ARRAY_SIZE; j++) {
sum += array[j];
}
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
There is pretty much only one factor left which still could have reduced the performance, for a smaller array.
It is not the overhead of the loop, so plain unrolling would have been pointless with a modern processor. Don't even bother, you won't beat the branch prediction.
But the latency between two instructions, until a value written by one instruction may be read again by the next instruction, still applies. In this case, sum is constantly written and read all over again, and even if sum is cached in a register, this delay still applies and the processor's pipeline has to wait.
The way around that, is to have multiple independent additions going on simultaneously, and finally just combine the results. This is by the way also an optimization which most modern compilers do know how to perform.
On top of that, you could now also express the first loop with vector instructions - once again also something the compiler would have done. At this point you are running into instruction latency again, so you will likely have to introduce one more set of temporaries, so that you now have two independent addition streams each using vector instructions.
Why the requirement of at least -O1? Because otherwise the compiler won't even place tmpSum in a register, or will try to express e.g. array[j+0] as a sequence of instructions for performing the addition first, rather than just using a single instruction for that. Hardly possible to optimize in that case, without using inline assembly directly.
Or if you just feel like (legit) cheating:
const int N_TIMES = 1000;
const int ARRAY_SIZE = 1024;
const int array[1024] = {1};
int sum = 0;
__attribute__((optimize("O3")))
__attribute__((optimize("unroll-loops")))
int fastSum(const int array[]) {
int j;
int tmpSum = 0;
for (j = 0; j < ARRAY_SIZE; j++) {
tmpSum += array[j];
}
return tmpSum;
}
int main() {
int i;
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
sum += fastSum(array);
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
return sum;
}
The compiler will then apply pretty much all the optimizations described above.

this is a program to find prime numbers from 2 to 100 in C [closed]

I am unable to understand how this works. Can somebody explain this code to me?
#include <stdio.h>
int main () {
/* local variable definition */
int i, j;
for(i = 2; i<100; i++) {
for(j = 2; j <= (i/j); j++) {
if(!(i%j)) break; // if factor found, not prime
}
if(j > (i/j)) printf("%d is prime", i);
}
return 0;
}
1. #include <stdio.h> pulls in a header that defines three types, several macros, and various functions for performing input and output. In other words, it references part of the C standard library to add externally defined declarations beyond the code below, such as the size_t type, which is the type of the result of the sizeof operator. That's just one example of what the stdio.h header provides; you can see more info here: https://www.tutorialspoint.com/c_standard_library/stdio_h.htm
2. int main() declares a function named main that returns an integer (int). The empty parentheses are an older style that does not state the parameters explicitly; int main(void) is the explicit way to say that it takes no arguments. main is the function where program execution starts.
Next, the curly braces are what contain all the logic inside of the int main() function. Then inside of it, on the line int i, j; , two local variables are declared (i and j) to be later used as placeholders for some integers that will be plugged into the function.
Below that, for(i = 2; i<100; i++) indicates there is a loop that sets the i variable to 2, then after the semi-colon i<100 means that the loop will continue to execute again and again as long as the variable i is less than 100. After yet another semi-colon, i++ means that each time that the loop runs, the variable i will increment by 1. So it starts at 2, then 3, then 4, etc, until i reaches 100 and the loop stops executing.
Next, for(j = 2; j <= (i/j); j++) is another loop inside of the first loop, but this time the loop is using the variable j as a placeholder/counter instead of the variable i (the variable used by the surrounding loop). This loop also sets j to 2 (the same way the surrounding loop set i to 2); as long as j is less than or equal to (i divided by j) the loop will continue to execute; and j will increment (increase) by one each time that the loop is run, the same way that i does in the loop that surrounds this one.
if(!(i%j)) break; // if factor found, not prime this line means that the inner loop will also stop executing (break) if the remainder of i divided by j equals zero, i.e. if j is a factor of i.
if(j > (i/j)) printf("%d is prime", i); This line means that if j is greater than i divided by j that the loop will write/output the text to stdout (std out is the standard output device, a pointer to a FILE stream that represents the default output device for the application).
Lastly, the return 0; line indicates a return from the function and the final curly brace closes the function's logic/code. The main function should return 0 (also EXIT_SUCCESS) to indicate that the program has executed successfully and a nonzero value such as EXIT_FAILURE otherwise.
Additional Note - Loops in every programming language I've seen personally tend to have a few things in common:
i. An init counter, a value where the loop will initialize (start counting), inside the loop's parentheses and before the first semi-colon.
ii. A test counter, which will be evaluated each time that the loop continues, and if it evaluates to TRUE the loop will continue usually but if it evaluates to false then the loop will end. This is the part of the loop after the first semi-colon but before the second semi-colon.
iii. An increment/decrement counter, which increases or decreases the loop by some value each time that the loop is run. This is the part of the loop inside the parentheses, after the second semi-colon. If there is no increment counter or test counter that causes the loop to exit/break at some point, then this is known as an infinite loop. This is a very bad thing in programming because it will cause just about any computer program to crash since it will execute and consume computing resources indefinitely. Not good :)
Disclaimer: I don't actually code in C but the language has so many similarities with programming languages I do use, that I'm guessing this answer is very close if not 100% correct. Curious to hear some input from an expert C programmer though!
Your code is looping over all integers from 2 to 99, holding the actual number in i.
for(i = 2; i<100; i++)
Then, for every number i, the code is looping again over all integers from 2 to (i/j).
for(j = 2; j <= (i/j); j++)
Your loop's condition j <= (i/j) is mathematically equivalent to j being at most the square root of i. Larger values of j need not be tested, since any factor of i larger than its square root is paired with a factor smaller than the square root.
(To check this, get a paper and rewrite the inequality so that i is the sole part of the right hand side of your condition.)
Then it checks whether or not j divides i.
(i%j)
If j is a factor of i, then i modulo j is zero and hence
if (!(i%j))
evaluates to true (since 0 is evaluated as false and ! negates this) and you can break out of the loop since i has a divisor other than 1 or i, hence i is not a prime.
On the other hand, if the inner loop is finished, you have found a prime since it has only 1 and i as divisor.
Needless to say, this brute-force approach is very slow for large integers (you won't crack RSA with that), but does this clarify your questions?
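The same test can be isolated into a small function, which may make the two exit paths (factor found versus loop ran to completion) easier to see. This is only a sketch; the name is_prime and the n < 2 guard are additions for the example and do not appear in the original program.

```c
/* Returns 1 if n is prime and 0 otherwise, using the same trial division:
   try every j with j <= n / j, i.e. every j up to the square root of n. */
int is_prime(int n) {
    int j;
    if (n < 2)
        return 0;           /* 0, 1 and negative numbers are not prime */
    for (j = 2; j <= n / j; j++) {
        if (n % j == 0)
            return 0;       /* j divides n: factor found, not prime */
    }
    return 1;               /* loop ended without finding a factor */
}
```

Here the early return plays the role of break, and reaching the end of the loop corresponds to the j > (i/j) check after the inner loop in the original.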
#include <stdio.h>
int main () {
/* local variable definition */
int i, j;
// Loop from 2 to 99; i will hold the number we are checking to
// see if it is prime
for(i = 2; i<100; i++) {
// now loop through the numbers checking each one to see if
// it is a factor of i (if it is, then i isn't prime). This
// loop stops once j^2 is greater than the number we are
// checking
for(j = 2; j <= (i/j); j++) {
// i % j (i modulus j) is 0 iff j is a factor of i. This
// if test relies on the fact that 0 is false in C (and
// all nonzero values are true)
if(!(i%j)) break; // if factor found, not prime
}
// this tests to see if we exited the above loop by failing
// the test in the for() statement, or whether we exited the
// loop via the break statement. If we made it through all
// iterations of the loop, then we found no factors, and the
// number is prime.
//
// note that a \n is missing at the end of the printf format
// string. The output will be "2 is prime3 is prime5..."
if(j > (i/j)) printf("%d is prime", i);
}
// returns from main() with a value of 0, which will result in
// the program exiting with an exit code of 0. Returning 0 from
// main() has the same effect as calling exit(0).
return 0;
}

Dereferencing a post-incremented pointer vs incrementing pointer and then dereferencing? [duplicate]

In C, what is the difference between using ++i and i++, and which should be used in the incrementation block of a for loop?
++i will increment the value of i, and then return the incremented value.
i = 1;
j = ++i;
(i is 2, j is 2)
i++ will increment the value of i, but return the original value that i held before being incremented.
i = 1;
j = i++;
(i is 2, j is 1)
For a for loop, either works. ++i seems more common, perhaps because that is what is used in K&R.
In any case, follow the guideline "prefer ++i over i++" and you won't go wrong.
There's a couple of comments regarding the efficiency of ++i and i++. In any non-student-project compiler, there will be no performance difference. You can verify this by looking at the generated code, which will be identical.
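One way to try that check yourself is to put two otherwise identical loops in a file and compare the assembly the compiler emits (for GCC, the -S option writes the assembly to a .s file). The function names sum_post and sum_pre below are made up for the example.

```c
/* Same loop twice; only the increment expression differs. */
long sum_post(const int *v, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)   /* post-increment, result unused */
        s += v[i];
    return s;
}

long sum_pre(const int *v, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i)   /* pre-increment, result unused */
        s += v[i];
    return s;
}
```

Because the value of the increment expression is discarded in both loops, the two functions necessarily compute the same result; comparing the generated code shows whether your compiler also treats them the same.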
The efficiency question is interesting... here's my attempt at an answer:
Is there a performance difference between i++ and ++i in C?
As @OnFreund notes, it's different for a C++ object, since operator++() is a function and the compiler can't know to optimize away the creation of a temporary object to hold the intermediate value.
i++ is known as post increment whereas ++i is called pre increment.
i++
i++ is post increment because it increments i's value by 1 after the operation is over.
Let’s see the following example:
int i = 1, j;
j = i++;
Here value of j = 1, but i = 2. Here the value of i will be assigned to j first, and then i will be incremented.
++i
++i is pre increment because it increments i's value by 1 before the operation.
It means j = i; will execute after i++.
Let’s see the following example:
int i = 1, j;
j = ++i;
Here the value of j = 2 and i = 2. Here i is incremented first, and then the value of i is assigned to j.
Similarly, ++i will be executed before j=i;.
For your question which should be used in the incrementation block of a for loop? the answer is, you can use any one... It doesn't matter. It will execute your for loop same number of times.
for(i=0; i<5; i++)
printf("%d ", i);
And
for(i=0; i<5; ++i)
printf("%d ", i);
Both the loops will produce the same output. I.e., 0 1 2 3 4.
It only matters where you are using it.
for(i = 0; i<5;)
printf("%d ", ++i);
In this case output will be 1 2 3 4 5.
i++: In this scenario first the value is assigned and then increment happens.
++i: In this scenario first the increment is done and then value is assigned
++i increments the value, then returns it.
i++ returns the value, and then increments it.
It's a subtle difference.
For a for loop, use ++i, as it's slightly faster. i++ will create an extra copy that just gets thrown away.
Please don't worry about the "efficiency" (speed, really) of which one is faster. We have compilers these days that take care of these things. Use whichever one makes sense to use, based on which more clearly shows your intent.
The only difference is the order of operations between the increment of the variable and the value the operator returns.
This code and its output explain the difference:
#include <stdio.h>
int main(int argc, char* argv[])
{
unsigned int i=0, a;
// %u is the matching conversion for unsigned int
printf("i initial value: %u; ", i);
a = i++;
printf("value returned by i++: %u, i after: %u\n", a, i);
i=0;
printf("i initial value: %u; ", i);
a = ++i;
printf("value returned by ++i: %u, i after: %u\n",a, i);
return 0;
}
The output is:
i initial value: 0; value returned by i++: 0, i after: 1
i initial value: 0; value returned by ++i: 1, i after: 1
So basically ++i returns the value after it is incremented, while i++ return the value before it is incremented. At the end, in both cases the i will have its value incremented.
Another example:
#include <stdio.h>
int main ()
{
int i=0;
int a = i++*2;
printf("i=0, i++*2=%d\n", a);
i=0;
a = ++i * 2;
printf("i=0, ++i*2=%d\n", a);
i=0;
a = (++i) * 2;
printf("i=0, (++i)*2=%d\n", a);
i=0;
a = (i++) * 2;
printf("i=0, (i++)*2=%d\n", a);
return 0;
}
Output:
i=0, i++*2=0
i=0, ++i*2=2
i=0, (++i)*2=2
i=0, (i++)*2=0
Many times there is no difference
Differences are clear when the returned value is assigned to another variable, or when the increment is combined with other operations where operator precedence applies (i++*2 is different from ++i*2, as are (i++)*2 and (++i)*2). In many cases they are interchangeable. A classical example is the for loop syntax:
for(int i=0; i<10; i++)
has the same effect of
for(int i=0; i<10; ++i)
Efficiency
Pre-increment is always at least as efficient as post-increment: in fact post-increment usually involves keeping a copy of the previous value around and might add a little extra code.
As others have suggested, thanks to compiler optimisations they are often equally efficient; a for loop is probably one of these cases.
Rule to remember
To not make any confusion between the two operators I adopted this rule:
Associate the position of the operator ++ with respect to the variable i to the order of the ++ operation with respect to the assignment
Said in other words:
++ before i means incrementation must be carried out before assignment;
++ after i means incrementation must be carried out after assignment:
The reason ++i can be slightly faster than i++ is that i++ can require a local copy of the value of i before it gets incremented, while ++i never does. In some cases, some compilers will optimize it away if possible... but it's not always possible, and not all compilers do this.
I try not to rely too much on compilers optimizations, so I'd follow Ryan Fox's advice: when I can use both, I use ++i.
The effective result of using either in a loop is identical. In other words, the loop will do the same exact thing in both instances.
In terms of efficiency, there could be a penalty involved with choosing i++ over ++i. In terms of the language spec, using the post-increment operator should create an extra copy of the value on which the operator is acting. This could be a source of extra operations.
However, you should consider two main problems with the preceding logic.
Modern compilers are great. All good compilers are smart enough to realize that it is seeing an integer increment in a for-loop, and it will optimize both methods to the same efficient code. If using post-increment over pre-increment actually causes your program to have a slower running time, then you are using a terrible compiler.
In terms of operational time-complexity, the two methods (even if a copy is actually being performed) are equivalent. The number of instructions being performed inside of the loop should dominate the number of operations in the increment operation significantly. Therefore, in any loop of significant size, the penalty of the increment method will be massively overshadowed by the execution of the loop body. In other words, you are much better off worrying about optimizing the code in the loop rather than the increment.
In my opinion, the whole issue simply boils down to a style preference. If you think pre-increment is more readable, then use it. Personally, I prefer the post-increment, but that is probably because it was what I was taught before I knew anything about optimization.
This is a quintessential example of premature optimization, and issues like this have the potential to distract us from serious issues in design. It is still a good question to ask, however, because there is no uniformity in usage or consensus in "best practice."
++i is pre-increment; i++ is post-increment.
i++: gets the element and then increments it.
++i: increments i and then returns the element.
Example:
int i = 0;
printf("i: %d\n", i);
printf("i++: %d\n", i++);
printf("++i: %d\n", ++i);
Output:
i: 0
i++: 0
++i: 2
++i (Prefix operation): Increments and then assigns the value
(eg): int i = 5; int b = ++i;
In this case, i is first incremented to 6 and then 6 is assigned to b.
i++ (Postfix operation): Assigns and then increments the value
(eg): int i = 5; int b = i++;
In this case, 5 is assigned to b first and then i is incremented to 6.
In case of a for loop: i++ is mostly used because normally we use the starting value of i before incrementing it in the loop. But depending on your program logic it may vary.
i++ and ++i
This little code may help to visualize the difference from a different angle than the already posted answers:
int i = 10, j = 10;
printf ("i is %i \n", i);
printf ("i++ is %i \n", i++);
printf ("i is %i \n\n", i);
printf ("j is %i \n", j);
printf ("++j is %i \n", ++j);
printf ("j is %i \n", j);
The outcome is:
//Remember that the values are i = 10, and j = 10
i is 10
i++ is 10 //Assigns (print out), then increments
i is 11
j is 10
++j is 11 //Increments, then assigns (print out)
j is 11
Pay attention to the before and after situations.
for loop
As for which one of them should be used in an incrementation block of a for loop, I think that the best we can do to make a decision is use a good example:
int i, j;
for (i = 0; i <= 3; i++)
printf (" > iteration #%i", i);
printf ("\n");
for (j = 0; j <= 3; ++j)
printf (" > iteration #%i", j);
The outcome is:
> iteration #0 > iteration #1 > iteration #2 > iteration #3
> iteration #0 > iteration #1 > iteration #2 > iteration #3
I don't know about you, but I don't see any difference in its usage, at least in a for loop.
The following C code fragment illustrates the difference between the pre and post increment and decrement operators:
int i;
int j;
Increment operators:
i = 1;
j = ++i; // i is now 2, j is also 2
j = i++; // i is now 3, j is 2
Shortly:
++i and i++ behave the same when used as standalone statements. If you pass them to a function, as in function(i++) or function(++i), you can see the difference.
function(++i) says first increment i by 1, after that put this i into the function with new value.
function(i++) says put first i into the function after that increment i by 1.
int i=4;
// pow() (from <math.h>) returns a double, so cast it before printing with %d
printf("%d\n",(int)pow(++i,2));//it prints 25 and i is 5 now
i=4;
printf("%d",(int)pow(i++,2));//it prints 16 and i is 5 now
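Pre-increment means the increment happens before the value is used on that line. Post-increment means the value is used first and the increment happens after the line executes. The examples below are in Java.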
int j = 0;
System.out.println(j); // 0
System.out.println(j++); // 0. post-increment. It means after this line executes j increments.
int k = 0;
System.out.println(k); // 0
System.out.println(++k); // 1. pre increment. It means it increments first and then the line executes
When combined with the OR and AND operators, it becomes more interesting.
int m = 0;
if((m == 0 || m++ == 0) && (m++ == 1)) { // False
// In the OR condition the first operand is already true, so the
// second operand (m++ == 0) is never evaluated. This is short-circuit
// evaluation, which the language guarantees; it is not a compiler
// optimization. The AND then compares the old value 0 with 1.
System.out.println("post-increment " + m);
}
int n = 0;
if((n == 0 || n++ == 0) && (++n == 1)) { // True
System.out.println("pre-increment " + n); // 1
}
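C evaluates || and && with the same short-circuit rule, so the two conditions can be reproduced there. A rough sketch, with hypothetical function names, that returns whether the branch would be taken and leaves the final counter value in the caller's variable:

```c
#include <stdbool.h>

/* Mirrors the first condition. When *m starts at 0, the result is
 * false and *m is 1 afterwards. */
bool post_increment_case(int *m)
{
    /* *m == 0 is true, so (*m)++ == 0 is skipped (short circuit).
     * (*m)++ == 1 then compares the old value 0 with 1: false. */
    return (*m == 0 || (*m)++ == 0) && ((*m)++ == 1);
}

/* Mirrors the second condition. When *n starts at 0, the result is
 * true and *n is 1 afterwards. */
bool pre_increment_case(int *n)
{
    /* ++(*n) == 1 compares the new value 1 with 1: true. */
    return (*n == 0 || (*n)++ == 0) && (++(*n) == 1);
}
```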
In Array
System.out.println("In Array");
int[] a = { 55, 11, 15, 20, 25 };
int ii, jj, kk = 1, mm;
ii = ++a[1]; // ii = 12. a[1] = a[1] + 1
System.out.println(a[1]); // 12
jj = a[1]++; // 12
System.out.println(a[1]); // a[1] = 13
mm = a[1]; // 13
System.out.printf("\n%d %d %d\n", ii, jj, mm); // 12, 12, 13
for (int val: a) {
System.out.print(" " + val); // 55, 13, 15, 20, 25
}
In C++ post/pre-increment of pointer variable
#include <iostream>
using namespace std;
int main() {
int x = 10;
int* p = &x;
std::cout << "address = " << p <<"\n"; // Prints the address of x
std::cout << "address = " << ++p <<"\n"; // Prints (the address of x) + sizeof(int)
std::cout << "address = " << &x <<"\n"; // Prints the address of x
std::cout << "address = " << ++&x << "\n"; // Error. &x is not a modifiable lvalue, so it cannot be incremented.
}
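In C, the place where pointer increments show up most often is the *p++ idiom. A small sketch under the assumption that the caller provides a large enough destination buffer (both function names are invented for this example):

```c
#include <stddef.h>

/* Copies the NUL-terminated string src into dst and returns the number
 * of characters copied, not counting the terminator. The caller must
 * make dst large enough. */
size_t copy_string(char *dst, const char *src)
{
    size_t n = 0;
    /* *dst++ = *src++ copies one character, then advances both pointers;
     * the loop stops once the terminating '\0' has been copied. */
    while ((*dst++ = *src++) != '\0') {
        ++n;
    }
    return n;
}

/* ++*p increments the pointed-to value, not the pointer itself. */
int bump_target(int *p)
{
    return ++*p;
}
```

Here *p++ parses as *(p++): the pointer moves, and the dereference uses its old position. ++*p leaves the pointer alone and changes the int it points to.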
I assume you understand the difference in semantics now (though honestly I wonder why
people ask 'what does operator X mean' questions on Stack Overflow rather than reading,
you know, a book or web tutorial or something).
But anyway, as far as which one to use, ignore questions of performance, which are
unlikely to be important even in C++. This is the principle you should use when deciding
which to use:
Say what you mean in code.
If you don't need the value-before-increment in your statement, don't use that form of the operator. It's a minor issue, but unless you are working with a style guide that bans one
version in favor of the other altogether (aka a bone-headed style guide), you should use
the form that most exactly expresses what you are trying to do.
QED, use the pre-increment version:
for (int i = 0; i != X; ++i) ...
The difference can be understood by this simple C++ code below:
int i, j, k, l;
i = 1; //initialize int i with 1
j = i+1; //add 1 to i and set that as the value of j. i is still 1
k = i++; //k gets the current value of i, after that i is incremented. So here i is 2, but k is 1
l = ++i; // i is incremented first and then returned. So the value of i is 3 and so is l.
cout << i << ' ' << j << ' ' << k << ' '<< l << endl;
return 0;
The main difference is:
i++ is post-increment (the increment happens after the value is used) and
++i is pre-increment (the increment happens before the value is used).
Post: starting from i = 1, successive uses of i++ yield 1, 2, 3, 4, ..., n
Pre: starting from i = 1, successive uses of ++i yield 2, 3, 4, 5, ..., n
In simple words, the difference between the two is in the order of the steps.
Example:
int i = 1;
int j = i++;
The j result is 1
int i = 1;
int j = ++i;
The j result is 2
Note: in both cases the value of i is 2.
You can think of the internal conversion of that as multiple statements:
// case 1
i++;
/* you can think of it as,
* i;
* i = i+1;
*/
// case 2
++i;
/* you can think of it as,
* i = i+1;
* i;
*/
a=i++ means a contains the current i value.
a=++i means a contains the incremented i value.
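The same idea can be spelled out as two ordinary functions. These helpers are hypothetical and exist only to make the order of the steps visible; real code would of course just use the operators:

```c
/* Hypothetical helper: behaves like (*i)++ */
int post_increment(int *i)
{
    int old = *i;   /* remember the current value */
    *i = *i + 1;    /* increment the variable */
    return old;     /* yield the value from before the increment */
}

/* Hypothetical helper: behaves like ++(*i) */
int pre_increment(int *i)
{
    *i = *i + 1;    /* increment the variable first */
    return *i;      /* yield the new value */
}
```

With int i = 1, a = post_increment(&i) leaves a at 1 and i at 2, while a = pre_increment(&i) would instead leave both at 2.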

What is the difference between ++i and i++?

In C, what is the difference between using ++i and i++, and which should be used in the incrementation block of a for loop?
++i will increment the value of i, and then return the incremented value.
i = 1;
j = ++i;
(i is 2, j is 2)
i++ will increment the value of i, but return the original value that i held before being incremented.
i = 1;
j = i++;
(i is 2, j is 1)
For a for loop, either works. ++i seems more common, perhaps because that is what is used in K&R.
In any case, follow the guideline "prefer ++i over i++" and you won't go wrong.
There's a couple of comments regarding the efficiency of ++i and i++. In any non-student-project compiler, there will be no performance difference. You can verify this by looking at the generated code, which will be identical.
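One way to check this yourself is to write two functions that differ only in the increment and compare what the compiler emits. A sketch, with made-up function names:

```c
/* Two hypothetical functions that differ only in the loop increment. */
int sum_post(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)    /* post-increment, value discarded */
        total += v[i];
    return total;
}

int sum_pre(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)    /* pre-increment, value discarded */
        total += v[i];
    return total;
}
```

With GCC, something like gcc -S -O2 file.c writes the assembly to file.s, where the two function bodies can be compared side by side. No listing is reproduced here, since the exact instructions depend on the compiler version and target.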
The efficiency question is interesting... here's my attempt at an answer:
Is there a performance difference between i++ and ++i in C?
As @OnFreund notes, it's different for a C++ object, since operator++() is a function and the compiler may not be able to optimize away the creation of a temporary object to hold the intermediate value.
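C has no operator overloading, but the cost being described can be imitated with a struct and two functions. The type and function names below are assumptions made up for this sketch; they stand in for a C++ class with both forms of operator++:

```c
/* Hypothetical multi-word counter standing in for a C++ class. */
struct big_counter {
    unsigned long digits[4];   /* least significant word first */
};

/* Pre-increment: modify in place and hand back the same object. */
struct big_counter *counter_pre_inc(struct big_counter *c)
{
    for (int k = 0; k < 4; ++k) {
        if (++c->digits[k] != 0)   /* no carry out of this word: done */
            break;
    }
    return c;
}

/* Post-increment: must copy the old state before modifying it. */
struct big_counter counter_post_inc(struct big_counter *c)
{
    struct big_counter old = *c;   /* the extra temporary */
    counter_pre_inc(c);
    return old;                    /* caller receives the old value */
}
```

If the caller ignores the result of counter_post_inc and the compiler cannot see through the call, the copy into old is wasted work. For a plain int there is nothing comparable to copy.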
i++ is known as post increment whereas ++i is called pre increment.
i++
i++ is post increment because it increments i's value by 1 after the operation is over.
Let’s see the following example:
int i = 1, j;
j = i++;
Here value of j = 1, but i = 2. Here the value of i will be assigned to j first, and then i will be incremented.
++i
++i is pre increment because it increments i's value by 1 before the operation.
It means j = i; will execute after i has been incremented.
Let’s see the following example:
int i = 1, j;
j = ++i;
Here the value of j = 2 and i = 2. The value of i is assigned to j after i has been incremented.
In other words, the increment is executed before j = i;.
As for your question of which should be used in the incrementation block of a for loop: you can use either one. It doesn't matter; the for loop will execute the same number of times.
for(i=0; i<5; i++)
printf("%d ", i);
And
for(i=0; i<5; ++i)
printf("%d ", i);
Both the loops will produce the same output. I.e., 0 1 2 3 4.
It only matters where you are using it.
for(i = 0; i<5;)
printf("%d ", ++i);
In this case output will be 1 2 3 4 5.
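Array indexing is another place where the choice matters, because the value of the expression is used as the index. A short sketch with invented helper names, assuming the caller guarantees the buffer has room:

```c
/* Appends x to buf, which currently holds *len elements, and returns
 * the index that was written. Assumes the buffer has room. */
int push_post(int *buf, int *len, int x)
{
    int idx = *len;
    buf[(*len)++] = x;   /* store at the old length, then grow */
    return idx;
}

/* Stack whose *top holds the index of the last element (-1 when empty).
 * Assumes the buffer has room. */
void push_top(int *buf, int *top, int x)
{
    buf[++(*top)] = x;   /* advance first, then store at the new index */
}
```

Swapping the two forms here would write one slot too high or too low, so the choice depends on whether the counter means "number of elements" or "index of the last element".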
i++: In this scenario first the value is assigned and then increment happens.
++i: In this scenario first the increment is done and then value is assigned
++i increments the value, then returns it.
i++ returns the value, and then increments it.
It's a subtle difference.
For a for loop, use ++i, as it can be slightly faster: without optimization, i++ may create an extra copy that just gets thrown away.
Please don't worry about the "efficiency" (speed, really) of which one is faster. We have compilers these days that take care of these things. Use whichever one makes sense to use, based on which more clearly shows your intent.
The only difference is the order of operations between the increment of the variable and the value the operator returns.
This code and its output explain the difference:
#include<stdio.h>
int main(int argc, char* argv[])
{
unsigned int i=0, a;
printf("i initial value: %u; ", i);
a = i++;
printf("value returned by i++: %u, i after: %u\n", a, i);
i=0;
printf("i initial value: %u; ", i);
a = ++i;
printf("value returned by ++i: %u, i after: %u\n",a, i);
}
The output is:
i initial value: 0; value returned by i++: 0, i after: 1
i initial value: 0; value returned by ++i: 1, i after: 1
So basically ++i returns the value after it is incremented, while i++ return the value before it is incremented. At the end, in both cases the i will have its value incremented.
Another example:
#include<stdio.h>
int main ()
{
int i=0;
int a = i++*2;
printf("i=0, i++*2=%d\n", a);
i=0;
a = ++i * 2;
printf("i=0, ++i*2=%d\n", a);
i=0;
a = (++i) * 2;
printf("i=0, (++i)*2=%d\n", a);
i=0;
a = (i++) * 2;
printf("i=0, (i++)*2=%d\n", a);
return 0;
}
Output:
i=0, i++*2=0
i=0, ++i*2=2
i=0, (++i)*2=2
i=0, (i++)*2=0
Many times there is no difference
The difference is clear when the returned value is assigned to another variable, or when the increment is combined with other operations in the same expression (i++*2 is different from ++i*2, and likewise (i++)*2 from (++i)*2). In many other cases the two forms are interchangeable. A classical example is the for loop syntax:
for(int i=0; i<10; i++)
has the same effect as
for(int i=0; i<10; ++i)
Efficiency
Pre-increment is always at least as efficient as post-increment: in fact post-increment usually involves keeping a copy of the previous value around and might add a little extra code.
As others have suggested, thanks to compiler optimisations they are often equally efficient; a for loop is probably one of these cases.
Rule to remember
To avoid confusing the two operators, I adopted this rule:
Associate the position of the operator ++ with respect to the variable i to the order of the ++ operation with respect to the assignment
Said in other words:
++ before i means incrementation must be carried out before assignment;
++ after i means incrementation must be carried out after assignment.
The reason ++i can be slightly faster than i++ is that i++ can require a local copy of the value of i before it gets incremented, while ++i never does. In some cases, some compilers will optimize it away if possible... but it's not always possible, and not all compilers do this.
I try not to rely too much on compiler optimizations, so I'd follow Ryan Fox's advice: when I can use both, I use ++i.
The effective result of using either in a loop is identical. In other words, the loop will do the same exact thing in both instances.
In terms of efficiency, there could be a penalty involved with choosing i++ over ++i. In terms of the language spec, using the post-increment operator should create an extra copy of the value on which the operator is acting. This could be a source of extra operations.
However, you should consider two main problems with the preceding logic.
1. Modern compilers are great. All good compilers are smart enough to realize that they are seeing an integer increment in a for-loop, and they will optimize both methods to the same efficient code. If using post-increment over pre-increment actually causes your program to have a slower running time, then you are using a terrible compiler.
2. In terms of operational time-complexity, the two methods (even if a copy is actually being performed) are equivalent. The number of instructions being performed inside of the loop should dominate the number of operations in the increment operation significantly. Therefore, in any loop of significant size, the penalty of the increment method will be massively overshadowed by the execution of the loop body. In other words, you are much better off worrying about optimizing the code in the loop rather than the increment.
In my opinion, the whole issue simply boils down to a style preference. If you think pre-increment is more readable, then use it. Personally, I prefer the post-increment, but that is probably because it was what I was taught before I knew anything about optimization.
This is a quintessential example of premature optimization, and issues like this have the potential to distract us from serious issues in design. It is still a good question to ask, however, because there is no uniformity in usage or consensus in "best practice."
++i is pre-increment; i++ is post-increment.
i++: returns the value of i and then increments it.
++i: increments i and then returns the value.
Example:
int i = 0;
printf("i: %d\n", i);
printf("i++: %d\n", i++);
printf("++i: %d\n", ++i);
Output:
i: 0
i++: 0
++i: 2
++i (Prefix operation): Increments and then assigns the value
(eg): int i = 5; int b = ++i;
In this case, i is first incremented to 6, and then 6 is assigned to b.
i++ (Postfix operation): Assigns and then increments the value
(eg): int i = 5; int b = i++;
In this case, 5 is assigned to b first, and then i is incremented to 6.
In the case of a for loop, i++ is most often used because we normally use the starting value of i before incrementing in the loop. Depending on your program logic, this may vary.

Resources