Array index pre-increment not working as expected in C? [duplicate] - c

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Order of operations for pre-increment and post-increment in a function argument? [duplicate]
(4 answers)
function parameter evaluation order
(5 answers)
Closed 4 months ago.
I'm new to C and I'm writting some simple stuff to get familiar. I just tried pre-increment on an array index and what I got was really not what I was expecting.
The code in question:
#include <stdio.h>
int main()
{
int array[5] = {1,2,3,4,5};
int i = 0;
printf("%d %d",array[i], array[++i]);
return 0;
}
which should display elements of array[0] and array[1], right?
But it doesn't, as the output is 2 2
What am I missing?

Here's the thing: You can't guarantee what order function arguments are evaluated in.
What this means is, in your printf call, there's no way to tell whether array[i] or array[++i] will be evaluated first. In your case it appears that array[++i] won, so what happens is that you end up printing array[1] twice.

That's not a good use of pre-increment.
If you want to look at array[i] and array[i+1], just do that:
printf("%d %d\n", array[i], array[i+1]);
Remember, i++ does not just take i's old value and add 1 to it. It takes i's old value, and adds 1 to it, and stores the new value back into i. When you were printing array[i] and array[i+1], did you really want to change the value of i?
If you did want to change the value of i — or, in any case, since that mention of array[i++] did change the value of i — you introduced undefined behavior. Once you've got an expression that's modifying the value of i, how do you know what array[i] will print? How do you know that array[i] won't end up using the new value of i? You probably assumed that things are evaluated from left to right, but it turns out that's not necessarily true. So weird, weird things can happen.
Here's an example that helps show how ++ works, and that's well-defined:
#include <stdio.h>
int main()
{
int array[5] = {1,2,3,4,5};
int i = 1;
array[i++] = 22;
array[++i] = 44;
for(i = 0; i < 5; i++)
printf("%d: %d\n", i, array[i]);
return 0;
}
Or, as you suggested in a comment, if you did want to modify i's value, perhaps to advance the variable i along the array, you sometimes have to take care to move the i++ into a separate statement. The simple rule is, if you're applying ++ or -- to a variable, you can only use that variable once in the same statement. So you can't have i++ + i++, since that has two i's in it and they're both modified. But you also can't have i + i++, or f(a[i], a[i++]), since those have a spot where you modify i, and a spot where you use i, and there's no way to know whether you use the old or the new value of i. (And, again, in C there's not a general left-to-right rule that helps you here.)

Related

How does the C expression s[i ++] = t[j ++] exactly work in order?

I'm new in C, stuck in understanding the expression s[i ++] = t[j ++], I don't know how it's possible that an array element gets accessed with a variable and then the variable increase itself and then the array element just accessed is again accessed with the original variable and then gets assigned to another array's element, I'm confused, I think to understand the exact process might involve some low-level knowledges, but I don't want to digress too far away, is there any way to understand it easily and clearly?
In C language, the expression i++ causes i to be incremented, but the expression i++ itself evaluates to the value i had before being incremented. So the expression s[i++] = t[j++] has the same behaviour as:
s[i] = t[j];
i = i + 1;
j = j + 1;
except that the precise order is not specified. For that last reason, the rule is that a variable should only be modified once: s[i++] = t[i++] would invoke Undefined Behaviour.
Like any other complicated-looking expression, it's easier to understand this if you break it down into parts.
The key is that innermost part (or "subexpression") i++. I assume you know what i++ does by itself, although in this example, we're hopefully going to get a deeper appreciation of what i++ is actually good for. Why would you want to "increment i, but return the old value"? What's the use of this? Well, the main use is that it's super useful for moving along an array.
Lets look at a simpler example. Suppose we have an array a that we want to store some numbers in. The most basic way is
int a[10];
a[0] = 12;
a[1] = 34;
a[2] = 5678;
Another very good way is to use a second variable like i to keep track of where we're storing:
i = 0;
a[i] = 12;
i = i + 1;
a[i] = 34;
i = i + 1;
a[i] = 5678;
i = i + 1;
I've written this out in "longhand", but of course in C, you would almost never write it this way, because the "C way" is the much more concise
i = 0;
a[i++] = 12;
a[i++] = 34;
a[i++] = 5678;
So first, make sure you understand that the "shorthand" and "longhand" forms work exactly the same way. Make sure you understand that when we say something like
a[i++] = 34;
what this means is "store 34 into the slot in array a indicated by i, and then update i to be one more than is used to be, so that it indicates the next slot."
In other words, we use an expression like a[i++] whenever we want to move along an array and do something with its elements, one by one, in order.
So far we were storing values into the array, but the idiom works just as well for fetching values out of an array. For example, this code prints those three elements, again one at a time, in order:
i = 0;
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);
My point is, again, that any time you see an expression like a[i++], you should think "we're moving along the array".
So now, finally, we can look at the expression you initially asked about:
s[i++] = t[j++];
Here we have two instances of the idiom. We're using i to move along the array s, and we're using j to move along the array t. We're fetching from t as we move along, and we're storing the values into s.
I don't know whether s and t are arrays of characters, or integers, or what. Also I don't know that s and t are truly arrays -- they might actually be pointers, pointing into some arrays. But I don't really have to know those things to know that the essential meaning of s[i++] = t[j++] is "copy elements from array t to array s, using j to keep where we are in t, and i to keep track of where we are in s".
[The above is an answer to your original question. The rest of this answer isn't directly related, but is essential to avoid inadvertently writing incorrect programs using ++ and --.]
As I said, the subexpression i++ and the idiom a[i++] are super useful for moving through arrays. But there are a couple things to beware of. (Actually it's just one thing, but it crops up in lots of different ways.)
Earlier I wrote the code
i = 0;
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);
printf("%d\n", a[i++]);
to print the first three elements of the array a. But it prints them as bare, isolated numbers. What if I want to always see which array index each number comes from? That is, what if I'm tempted to write something like this:
i = 0;
printf("%d: %d\n", i, a[i++]); /* WRONG */
printf("%d: %d\n", i, a[i++]); /* WRONG */
printf("%d: %d\n", i, a[i++]); /* WRONG */
If I wrote this, my intent would be that I would see the obvious display
0: 12
1: 34
2: 56788
But when I actually tried it just now, I got this instead:
1: 12
2: 34
3: 5678
The numbers 12 and 34 and 5678 are right, but the indices 1, 2, and 3 are all wrong -- they're off by one! How did that happen?
And the answer is that although i++ is, as I said, "super useful", it turns out that there's a fine line between "super useful" and what's called undefined behavior.
That printf call
printf("%d: %d\n", i, a[i++]); /* WRONG */
looks fine, but it's not actually well-defined, because the compiler does not necessarily evaluate everything left-to-right, so it's not actually guaranteed that it will use the old value of i for the %d: part. The compiler might evaluate things from right to left, meaning that a[i++] will happen first, meaning that %d: will print the new value, instead -- which appears to be what happened when I tried it.
Here's another potential issue. Your original question was about
s[i++] = t[j++];
which, as we've seen, copies elements from t to s based on two possibly-different indices i and j. But what if we know we always want to copy t[1] to s[1], t[2] to s[2], t[3] to s[3], etc.? That is, what if we know that i and j will always be the same, so we don't even need separate i and j variables? How would we write that? Our first try might be
s[i++] = t[i++]; /* WRONG */
but that can't be right, because now we're incrementing i twice, and we'll probably do something totally broken like copying t[1] to s[2] and t[3] to s[4]. But if we want to only increment i once, should it be
s[i++] = t[i]; /* WRONG */
or
s[i] = t[i++]; /* WRONG */
But the answer is that neither of these will work. In expressions like these, which have i in one place and i++ in the other place, there's no way to tell whether i gets the old value or the new value. (In particular, there's no left-to-right or right-to-left rule that would tell us.)
So although expressions like i++ and a[i++] are indeed super useful, you have to be careful when you use them, to make sure you don't go over the edge and have too much happening at once, such that the evaluation order becomes undefined. Sometimes this means you have to back off, and not use the "super useful" idiom, after all. For example, a safe way to print those values would be
printf("%d: %d\n", i, a[i]); i++;
printf("%d: %d\n", i, a[i]); i++;
printf("%d: %d\n", i, a[i]); i++;
and a safe way to copy from t[1] to s[i] would be
s[i] = t[i]; i++;
You can read more in this answer about how to recognize well-defined expressions involving ++ and --, and how to avoid the undefined ones.
The evaluations of s[i++] and t[j++] are unsequenced relative to each other. Semantically, it's equivalent to:
t1 = i;
t2 = j;
s[t1] = t[t2];
i = i + 1;
j = j + 1;
with the caveat that the last three assignments can happen in any order, even simultaneously (either in parallel or interleaved)1. The compiler doesn't have to create temporaries, either - the whole thing can be evaluated as
s[i] = t[j];
i = i + 1;
j = j + 1;
Alternately, the side effects of i++ and j++ can be applied before the update to s:
t1 = j;
j = j + 1;
t2 = i;
i = i + 1;
s[t2] = t[t1];
The current values of i and j must be known before you can index into the arrays, and the value of t[j] must be known before it can be assigned to s[i], but beyond that there's no fixed order of evaluation or of the application of side effects.
This is why expressions like x = x++ or a = b++ * b++ or a[i] = i++ all invoke undefined behavior - there's no fixed order for evaluating or applying side effects, so the results can vary by compiler, compiler settings, even by the surrounding code, and the results don't have to be consistent from run to run.

Why is this program not behaving as it should?

I am a CS student preparing for a test in Introduction to CS in C language. I am solving some tests from the past and in one of them there was this question:
Look at the code below and state if there any errors. If so, correct them. Also if the program have any input write it down.
The code that came with the question is this:
#include <stdio.h>
#ifdef SIZE
#define MAX_SIZE (2*(SIZE))
#endif
int main() {
int i;
int arr[MAX_SIZE + 1];
for (i = 0; i < sizeof(arr)/sizeof(int);) {
arr[i] = i++;
printf("%d ", arr[i-1]);
}
return 0;
}
So of course I noticed that it wouldn't compile because the preprocessor would just skip the define part and MAX_SIZE would be undeclared variable.
Beside that there shouldn't be any other compiling errors, I did notice the arr[i] = i++ but that should work even if a bit unpredictably (in the test I will call it an error though), so to test my answer I copied it to my IDE and fixed it to my best of knowledge.
(I picked 5 randomly)
#include <stdio.h>
#define SIZE 5
#ifdef SIZE
#define MAX_SIZE (2*(SIZE))
#endif
int main()
{
int i;
int arr[MAX_SIZE + 1];
for (i = 0; i < sizeof(arr)/sizeof(int);)
{
arr[i] = i++;
printf("%d ", arr[i-1]);
}
return 0;
}
to my surprise the function went into an endless loop printing 8 all the time.
After some digging I figured out that for some reason when i is equal to 10 it stops increasing. Some more digging made me realize that it's only happening when SIZE is an odd number and it won't happen if the ++ is prefix (as in ++i).
If the changes mentioned above are applied, the output is as I expected, except that arr[0] doesn't get assigned.
All of that made me conclude that it is a compiler optimization error of some kind, bit I really would like it if someone could explain to me what is really happening here.
Using the value of a variable while also applying a side effect to it in the same statement is undefined behavior.
Therefore this:
arr[i] = i++;
Is an error, and invokes undefined behavior, meaning that the compiler is allowed to do anything. In your case, this happens to cause an endless loop. Using ++i instead of i++ is also an error invoking undefined behavior. The fact that it "seems to work" is just a coincidence.
In general, in order to be able to use the same variable more than once in the same statement, no side effect can be performed in that statement that alters the value of the variable.
From the C11 standard paragraph 6.5 point 2 (page 76):
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
[...]
This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
The problem is at this line:
arr[i] = i++;
This invokes undefined behavior. Change it to
arr[i] = i;
i++;
and it will work the way you want.
Let's take this code:
i=0;
arr[i] = i++;
According to how ++ works, it's reasonable that some array element gets the value 0. However, is it a[0] or a[1] that gets the value 0? i++ will return 0, but after it's done i will have the value 1. This is one of the many cases where C has undefined behavior, which basically means that anything might happen. The standard allows the compiler to produce whatever it wants.

understanding the behaviour of this C program [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Closed 9 years ago.
i begin Why the output produce 002?
Thanks for the edit and the answers but I'm still confused.
Here is the code:
#include <stdio.h>
int t[3],i;
int main()
{
for(i=0;i<3;i++)
t[i]=i++;
for(i=0;i<3;i++)
printf("%d",t[i]);
}
002
Can anyone help me understand why it is so?
It's because you incremented your i counter twice in your for-loop (once at t[i] = i++, once at the end statement of your for-loop, i++). That way, the for-loop is executed twice (not three times), once for i = 0 and once for i = 2. Hence your output.
To convince yourself, try adding prints as in
for(i=0;i<3;i++) {
printf("%i ", i);
t[i]=i++;
}
and see how many times your for loop gets executed and for which values of i it does.
Download a copy of the C standard (for example, google for N1570, that will give you the latest freely available version). In that document, look for "sequence point" and read what it says.
In short: Your program is rubbish and could do whatever it likes, for the reasons explained in this document. (Basically, the statement t [i] = i++; invokes undefined behaviour. This is a very special case of a general rule set in the C Standard. C++ and Objective-C have the same rule).
You are incrementing i twice in each iteration. In first iteration i = 0, so t[0] = 0 then you are doing i++ twice. In second iteration i=2 so t[2] = 2 and now loop is ending. t[1] is uninitialized.You are getting 002 because:the first 0 from t[0]the second 0 is garbage value or null value from where t[1] is stored in the memory (in your case it is 0 )and the last 2 from t[2]

Is this how postfix increment operator works [duplicate]

This question already has answers here:
Undefined behavior and sequence points
(5 answers)
Closed 9 years ago.
I am solving a problem that reads a set of text lines and prints the longest.
The problem is from K&R "The C programming language" section 1.9. The book provides a solution but I tried to solve it in my own way. The code below works but before I get it to work, I was getting a problem due to the line longestStr[i++] = tmpStr[j++] as I have previously used longestStr[i++] = tmpStr[i++] thinking that i would be incremented once the assignment was done. But that was not the case. Is this how postfix operator normally works?
#include <stdio.h>
#define MAXLINE 100
main()
{
int lineLength = 0, longestLine = 0;
int c, i, j;
char longestStr[MAXLINE];
char tmpStr[MAXLINE];
while((c = getchar()) != EOF)
{
tmpStr[lineLength++] = c;
if(c == '\n')
{
if( lineLength > longestLine)
{
longestLine = lineLength;
i = 0, j = 0;
while(i < lineLength)
{
longestStr[i++] = tmpStr[j++]; // I tried longestStr[i++] = tmpStr[i++] but it gives wrong result
}
}
lineLength = 0;
}
}
printf("Longest line is - %d long\n", longestLine-1);
for(i = 0; i < longestLine-1; i++)
printf("%c", longestStr[i]);
putchar('\n');
}
The postfix increments or decrements are operators that perform the increment or decrement after the use of the variable, such that if you wish to use a integer value to print out and add it at the same time you would use postfix.
int i = 0;
printf("%d",i++);
//prints out 0
Prefix increment/decrement however works in the opposite manner such that it performs the increment/decrement prior to the use of the variable, such that if you wish to increment/decrement a variable before printing it out, you would use prefix
int i = 0;
printf("%d",++i);
//prints out 1
The problem you have is not related to how the postfix operator works, but rather to what you intend to do in the code line which gives you a problem.
As asked in the comments, what is the meaning of the line as you wrote it initially?
longestStr[i++] = tmpStr[i++];
Because the C standard does not specify it, this line can be interpreted in several ways:
longestStr[i] = tmpStr[i+1];
i += 2;
or
longestStr[i+1] = tmpStr[i];
i += 2;
or
longestStr[i] = tmpStr[i];
i += 2;
In every case, you end up with a counter incremented twice, which will mess up your algorithm.
The correct way is to increment the counter in a separate line, or use the working solution (you are providing), which should be compiled down to the same code with any decent compiler.
Note that you should probably check for the counter not going beyond the maximum allowed line size MAXLINE (unless the problem states that this cannot happen with the input, but even in that case, it would be helpful for situations like yours, where the code wrongly increment the counter twice).
try to understand how post fix increment works.
f(i++) is equivalent to the operation 'call f on the current value of i, then increment i'
For example, when you use i++ twice, and initial value of i = 1, a(i++) = b(i++) means, a(1)=b(2) and the value of i after the operation is i=3.
If you want to eliminate one variable in what you are trying to do, you have to make sure you use increment only once. Do it like, a(i)=(b(i); i++
Whereas the postfix increment operator evaluates the original value, it is understandable to think the actual incrementating doesn't happen until sometime down the road. However this is not the case. You should conceptualize postfix as if the incrementing is happening while the original value is being returned.
The bigger issue though is that your statement increments i twice. This means i will increase by two with each iteration of your loop. Futhermore your index values will be off by one from each other, That's because if a variable changes during one part of statement evaluation, those changes become instantly visible to parts of the statement which have not yet evaluated.

Sequence points in C / function calls

I'm just learning some C, or rather, getting a sense of some of the arcane details. And I was using VTC advanced C programming in which I found that the sequence points are :
Semicolon
Comma
Logical OR / AND
Ternary operator
Function calls (Any expression used as an argument to a function call is finalized the call is made)
are all these correct ?. Regarding the last one I tried:
void foo (int bar) { printf("I received %d\n", bar); }
int main(void)
{
int i = 0;
foo(i++);
return 0;
}
And it didnt print 1, which according to what the VTC's guy said and if I undertood correctly, it should, right ?. Also, are these parens in the function call the same as the grouping parens ? (I mean, their precedence). Maybe it is because parens have higher precedence than ++ but I've also tried foo((i++)); and got the same result. Only doing foo(i = i + 1); yielded 1.
Thank you in advance. Please consider that I'm from South America so if I wasnt clear or nothing makes sense please, oh please, tell me.
Warmest regards,
Sebastian.
Your code is working like it should and it has nothing to do with sequence points. The point is that the postfix ++ operator returns the non-incremented value (but increments the variable by 1 nonetheless).
Basically:
i++ – Increment i by one and return the previous value
++i – Increment i by one and return the value after the increment
The position of the operator gives a slight hint for its semantics.
Sequence means i++ is evaluted before foo is invoked.
Consider this case (I am not printing bar!):
int i = 0;
void foo (int bar) { printf("i = %d\n", i); }
int main(void){
foo(i++);
return 0;
}
i = 1 must be printed.
C implements pass-by-value semantics. First i ++ is evaluated, and the value is kept, then i is modified (this may happen any time between the evaluation and the next sequence point), then the function is entered with the backup value as the argument.
The value passed into a function is always the same as the one you would see if using the argument expression in any other way. Other behavior would be fairly surprising, and make it difficult to refactor common subexpressions into functions.
When you do something like:
int i = 0, j;
j = i++;
the value of i is used first and then incremented. hence in your case the values of i which is 0 is used (hence passed to your function foo) and then incremented. the incremented values of i (now 1) will be available only for main as it is its local variable.
If you want to print 1 the do call foo this way:
foo(++i);
this will print 1. Rest you know, why!

Resources