Can an equality comparison of unrelated pointers evaluate to true?

Section 6.5.9 of the C standard regarding the == and != operators states the following:
2 One of the following shall hold:
both operands have arithmetic type;
both operands are pointers to qualified or unqualified versions of compatible types;
one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or
one operand is a pointer and the other is a null pointer constant.
...
6 Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a
pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address space.109)
7 For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Footnote 109:
109) Two objects may be adjacent in memory because they are adjacent
elements of a larger array or adjacent members of a structure with no
padding between them, or because the implementation chose to place
them so, even though they are unrelated. If prior invalid pointer
operations (such as accesses outside array bounds) produced undefined
behavior, subsequent comparisons also produce undefined behavior.
This would seem to indicate you could do the following:
int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);
This should be legal since we are using an address one element past the end of an array (which in this case is a single object treated as an array of size 1) without dereferencing it. More importantly, one of these two statements would be required to output 1 if one variable immediately followed the other in memory.
However, testing didn't seem to pan this out. Given the following test program:
#include <stdio.h>
struct s {
int a;
int b;
};
int main()
{
int a;
int b;
int *x = &a;
int *y = &b;
printf("sizeof(int)=%zu\n", sizeof(int));
printf("&a=%p\n", (void *)&a);
printf("&b=%p\n", (void *)&b);
printf("x=%p\n", (void *)x);
printf("y=%p\n", (void *)y);
printf("addr: a precedes b: %d\n", ((&a)+1) == &b);
printf("addr: b precedes a: %d\n", &a == ((&b)+1));
printf("pntr: a precedes b: %d\n", (x+1) == y);
printf("pntr: b precedes a: %d\n", x == (y+1));
printf(" x=%p, &a=%p\n", (void *)(x), (void *)(&a));
printf("y+1=%p, &b+1=%p\n", (void *)(y+1), (void *)(&b+1));
struct s s1;
x=&s1.a;
y=&s1.b;
printf("addr: s.a precedes s.b: %d\n", ((&s1.a)+1) == &s1.b);
printf("pntr: s.a precedes s.b: %d\n", (x+1) == y);
return 0;
}
Compiler is gcc 4.8.5, system is CentOS 7.2 x64.
With -O0, I get the following output:
sizeof(int)=4
&a=0x7ffe9498183c
&b=0x7ffe94981838
x=0x7ffe9498183c
y=0x7ffe94981838
addr: a precedes b: 0
addr: b precedes a: 0
pntr: a precedes b: 0
pntr: b precedes a: 1
x=0x7ffe9498183c, &a=0x7ffe9498183c
y+1=0x7ffe9498183c, &b+1=0x7ffe9498183c
addr: s.a precedes s.b: 1
We can see here that an int is 4 bytes and that the address of a is 4 bytes past the address of b, and that x holds the address of a while y holds the address of b. However, the comparison &a == ((&b)+1) evaluates to false while the comparison x == (y+1) evaluates to true. I would expect both to be true, as the addresses being compared appear identical.
With -O1, I get this:
sizeof(int)=4
&a=0x7ffca96e30ec
&b=0x7ffca96e30e8
x=0x7ffca96e30ec
y=0x7ffca96e30e8
addr: a precedes b: 0
addr: b precedes a: 0
pntr: a precedes b: 0
pntr: b precedes a: 0
x=0x7ffca96e30ec, &a=0x7ffca96e30ec
y+1=0x7ffca96e30ec, &b+1=0x7ffca96e30ec
addr: s.a precedes s.b: 1
pntr: s.a precedes s.b: 1
Now both comparisons evaluate to false even though (as before) the addresses being compared appear to be the same.
This seems to point to undefined behavior, but based on how I read the above passage it seems this should be allowed.
Note also that the comparison of the addresses of adjacent objects of the same type in a struct prints the expected result in all cases.
Am I misreading something here regarding what is allowed (meaning this is UB), or is this version of gcc non-conforming in this case?

Can an equality comparison of unrelated pointers evaluate to true?
Yes, but ...
int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);
There are, by my interpretation of the C standard, three possibilities:
a immediately precedes b
b immediately precedes a
neither a nor b immediately precedes the other (there could be a gap, or another object, between them)
I played around with this some time ago and concluded that GCC was performing an invalid optimization on the == operator for pointers, making it yield false even when the addresses are the same, so I submitted a bug report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63611
That bug was closed as a duplicate of another report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502
The GCC maintainers who responded to these bug reports seem to be of the opinion that adjacency of two objects need not be consistent and that the comparison of their addresses might show them to be adjacent or not, within the same run of the program. As you can see from my comments on the second Bugzilla ticket, I strongly disagree. In my opinion, without consistent behavior of the == operator, the standard's requirements for adjacent objects are meaningless, and I think we have to assume that those words are not merely decorative.
Here's a simple test program:
#include <stdio.h>
int main(void) {
int x;
int y;
printf("&x = %p\n&y = %p\n", (void*)&x, (void*)&y);
if (&y == &x + 1) {
puts("y immediately follows x");
}
else if (&x == &y + 1) {
puts("x immediately follows y");
}
else {
puts("x and y are not adjacent");
}
}
When I compile it with GCC 6.2.0, the printed addresses of x and y differ by exactly 4 bytes at all optimization levels, but I get y immediately follows x only at -O0; at -O1, -O2, and -O3 I get x and y are not adjacent. I believe this is incorrect behavior, but apparently, it's not going to be fixed.
clang 3.8.1, in my opinion, behaves correctly, showing x immediately follows y at all optimization levels. Clang previously had a problem with this; I reported it:
https://bugs.llvm.org/show_bug.cgi?id=21327
and it was corrected.
I suggest not relying on comparisons of addresses of possibly adjacent objects behaving consistently.
(Note that relational operators (<, <=, >, >=) on pointers to unrelated objects have undefined behavior, but equality operators (==, !=) are generally required to behave consistently.)

int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);
is perfectly well-defined code, but probably more by luck than by judgement.
You are allowed to take the address of a scalar and set a pointer one past that address. So &a + 1 is valid, but &a + 2 is not. You are also allowed to compare the value of a pointer of the same type with the value of any other valid pointer using == and !=, although pointer arithmetic is only valid within arrays.
Your assertion that the addresses of a and b tell you anything about how these are placed in memory is bunk. To be clear, you cannot "reach" b by pointer arithmetic on the address of a.
As for
struct s {
int a;
int b;
};
The standard guarantees that the address of the struct is the same as the address of a, but an arbitrary amount of padding is allowed to be inserted between a and b. Again, you can't reach the address of b by any pointer arithmetic on the address of a.

Can an equality comparison of unrelated pointers evaluate to true?
Yes. C specifies when this is true.
Two pointers compare equal if and only if ... or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. C11dr §6.5.9 6
To be clear: adjacent variables in code do not need to be adjacent in memory, yet can be.
The code below demonstrates that it is possible. It dumps the bytes of an int* in addition to printing it with the conventional "%p" and (void*) cast.
Yet OP's code and output do not reflect this. Given the "compare equal if and only if" part of the above spec, IMO, OP's compilation is non-compliant. For variables p and q of the same type that are adjacent in memory, either &p+1 == &q or &p == &q+1 must be true.
No opinion on objects that differ in type; OP does not ask about that in any case.
#include <stdio.h>
#include <stdlib.h>
void print_int_ptr(const char *prefix, int *p) {
printf("%s %p", prefix, (void *) p);
union {
int *ip;
unsigned char uc[sizeof (int*)];
} u = {p};
for (size_t i=0; i< sizeof u; i++) {
printf(" %02X", u.uc[i]);
}
printf("\n");
}
int main(void) {
int b = rand();
int a = rand();
printf("sizeof(int) = %zu\n", sizeof a);
print_int_ptr("&a =", &a);
print_int_ptr("&a + 1 =", &a + 1);
print_int_ptr("&b =", &b);
print_int_ptr("&b + 1 =", &b + 1);
printf("&a + 1 == &b: %d\n", &a + 1 == &b);
printf("&a == &b + 1: %d\n", &a == &b + 1);
return a + b;
}
Output
sizeof(int) = 4
&a = 0x28cc28 28 CC 28 00
&a + 1 = 0x28cc2c 2C CC 28 00 <-- same bit pattern
&b = 0x28cc2c 2C CC 28 00 <-- same bit pattern
&b + 1 = 0x28cc30 30 CC 28 00
&a + 1 == &b: 1 <-- compare equal
&a == &b + 1: 0

The authors of the Standard weren't trying to make it "language-lawyer-proof", and as a consequence, it is somewhat ambiguous. Such ambiguity will not generally be a problem when compiler writers make a bona fide effort to uphold the Principle of Least Astonishment, since there is a clear non-astonishing behavior, and any other behavior would have astonishing consequences. On the other hand, it does mean those compiler writers who are more interested in whether optimizations can be justified under any reading of the Standard than in whether they will be compatible with existing code can find interesting opportunities to justify incompatibility.
The Standard doesn't require that pointers' representations bear any relationship to the underlying physical architecture. It would be perfectly legitimate for a system to represent each pointer as a combination of a handle and an offset. A system which represented pointers in such fashion would be free to move the objects represented thereby around in physical storage as it saw fit. On such a system, the first byte of object #57 might follow immediately after the last byte of object #23 at one moment in time, but might be at some completely unrelated location at some other moment. I see nothing in the Standard that would prohibit such an implementation from reporting a "just past" pointer for object #23 as equal to a pointer to object #57 when the two objects happened to be adjacent, and as unequal when they happened not to be.
Further, under the as-if rule, an implementation that would be justified in moving objects around in such fashion and having a quirky equality operator, as a result, would be allowed to have a quirky equality operator whether or not it physically moved objects around in storage.
If, however, an implementation specifies how pointers are stored in RAM, and that specification would be inconsistent with the behavior described above, that would compel the implementation to implement the equality operator in a fashion consistent with that specification. Any compiler that wants to have a quirky equality operator must refrain from specifying a pointer-storage format that would be inconsistent with such behavior.
Further, the Standard would seem to imply that if code observes that two pointers with defined values have identical representations, they must compare equal. Reading an object using a character type and then writing that same sequence of character-type values into another object should yield an object equivalent to the original; such equivalence is a fundamental feature of the language. If p is a pointer "just past" one object, and q is a pointer to another object, and their representations are copied to p2 and q2, respectively, then p2 must compare equal to p and q2 to q. If the decomposed character-type representations of p and q are equal, that would imply that q2 was written with the same sequence of character-type values as p2, which would, in turn, imply that all four pointers must be equal.
Consequently, while it would be allowable for a compiler to have quirky equality semantics for pointers which are never exposed to code that might observe their byte-level representation, such behavioral license would not extend to pointers which are thus exposed. If an implementation defines a directive or setting that invites compilers to have individual comparisons arbitrarily report equal or unequal when given pointers to the end of one object and the start of another whose placement would only be observable via such comparison, the implementation wouldn't have to worry about conformance in cases where pointer representations are observed. Otherwise, though, even if there are cases where conforming implementations would be allowed to have quirky comparison semantics, that doesn't mean any quality implementation should do so unless invited, or unless a pointer just past the end of one object would naturally have a different representation from a pointer to the start of the next.

Related

Achieve the output in one statement

I was given this question by my school teacher. I was supposed to add one statement to the C code to achieve the desired output.
I have tried but I am stuck. I think the main idea of this question is to establish the relationship between the int x[] and the y[] string as i increases from 0 to 5.
The code is below:
#include <stdio.h>
int main(){
int i, x[] = {-5,10,-10,-2,23,-20};
char y[20] = "goodbye";
char * p = y;
for (i=0;i<6;i++){
*(p + i) = //Fill in the one line statement here
}
y[6] = '\0';
printf("%s\n",p); //should print out "byebye"
}
As you can see, the ASCII value of the character b is 5 less than that of g, and similarly y is 10 greater than o, and so on. So the statement that meets the criterion of using the values of x is:
*(p+i) = (char)(*(p+i)+x[i]);
As rici mentioned, one thing is very important: *(p+i) is nothing other than p[i]. In fact, p[i] is much cleaner to use, and underneath it is still calculated as *(p+i).
From standard 6.5.2.1p2 C11 N1570
A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
The standard mentions this as well. That being said, it would be as simple as
p[i]+=x[i];
Thoughts that came to my mind while solving.
My first thought when I saw it (this establishes no relation between x and y) was:
*(p + i) = "byebye"[i];
String literals are basically arrays; the literal decays into a pointer to its first element, and then we evaluate *(decayed pointer + i). This eventually assigns the characters of "byebye" to the char array y.
Or something like this:- (too many hardcoded values - this does relate x and y)
*(p+i) = *(y+4+i%3);
Using the modulus operation, you can manipulate your loop to assign byebye to the 6 char values that p points to.
This works because you are starting from y[4] which is 'b'.
The 6 in the for loop is your next hint. You need to iterate through bye twice. bye has 3 characters.
This gives you:
*(p + i) = y[4+(i%3)];

What does this code in C qsort mean?

void qsort (void *a, size_t n, size_t es, int (*compare)(const void *, const void *));
where a is the address of the start of the array, n is the number of elements in the array, and es is the size of an array element.
I read the source code of qsort in C, but there is a part I can't understand. The code is as follows.
#define SWAPINT(a,es) swaptype = ((char*)a- (char*)0 % sizeof(long) || \
es % sizeof(long) ? 2: es == sizeof(long)? 0 : 1
I interpret this macro by,
if(((char*)a- (char*)0)% sizeof(long))==1 || es % sizeof(long)==1)
swaptype = 2;
else if(es== sizeof(long))
swaptype = 0;
else
swaptype = 1;
But I don't understand why type conversion is implemented, (char*)a.
And what means of this line?
(char*)a- (char*)0)% sizeof(long)==1
Wherever you found that code, you probably copied it incorrectly. I found some very similar code in libutil from Canu:
c.swaptype = ((char *)a - (char *)0) % sizeof(long) || \
es % sizeof(long) ? 2 : es == sizeof(long)? 0 : 1;
This code was likely illegitimately (because the terms of the copyright license are violated) copied from FreeBSD's libc:
//__FBSDID("$FreeBSD: src/lib/libc/stdlib/qsort.c,v 1.12 2002/09/10 02:04:49 wollman Exp $");
So I'm guessing you got it from a *BSD libc implementation. Indeed, FreeBSD's quicksort implementation contains the SWAPINIT macro (not SWAPINT):
#define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE = \
((char *)a - (char *)0) % sizeof(TYPE) || \
es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1;
After parsing, you should find that the above code is roughly the same as
condition_one = ((char *)a - (char *)0) % sizeof(long);
condition_two = es % sizeof(long);
condition_three = es == sizeof(long);
c.swaptype = (condition_one || condition_two) ? 2 : condition_three ? 0 : 1;
Note that condition_two, as a condition, is not the same as es % sizeof(long) == 1, but rather es % sizeof(long) != 0. Aside from that, your translation was correct.
The intent of these conditions seems to be as follows:
condition_one is true when a is not long-aligned.
condition_two is true when es is not a multiple of sizeof(long).
condition_three is true when es is exactly sizeof(long).
As a result,
swaptype == 2 is when you don't have enough guarantees about the elements to be clever about swapping,
swaptype == 1 is intended for arrays with elements that are aligned along long boundaries (note: but not necessarily aligned as longs!), and
swaptype == 0 is intended for arrays that match the previous description, that also have elements that are also long-sized.
There is explicit type conversion in this case because a has type void*, for which pointer arithmetic is not defined. However, also note that ((char *)a - (char *)0) is undefined too:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
(C11 draft N1570, section 6.5.6, clause 9 on pages 93 and 94.)
It's not exactly spelled out in C11, but the null pointer is not part of the same array as the object pointed to by a, so the basic rules for pointer arithmetic are violated, so the behaviour is undefined.
The macro is trying to check for alignment portably in a language, C, which doesn't really allow for such a test. So we subtract the null pointer from our pointer to obtain an integer, then take that value modulo the size of a long. If the result is zero, the data is long-aligned and we can access it as longs. If it is not, we can try some other scheme.
As remarked in the comments, the macro definition you present does not expand to valid C code because it involves computing (char*)0 % sizeof(long), where the left-hand operand of the % has type char *. That is not an integer type, but both operands of % are required to have integer type.
Additionally, the macro's expansion has unbalanced parentheses. That's not inherently wrong, but it makes that macro tricky to use. Furthermore, even where operator precedence yields a sensible result, usage of parentheses and extra whitespace can aid human interpretation of the code, at no penalty to execution speed, and negligible extra compilation cost.
So, I think the desired macro would be more like this:
#define SWAPINT(a,es) swaptype = ( \
((((char*)a - (char*)0) % sizeof(long)) || (es % sizeof(long))) \
? 2 \
: ((es == sizeof(long)) ? 0 : 1) \
)
I'd consider instead writing the penultimate line as
: (es != sizeof(long))
to reduce the complexity of the expression at a slight cost to its comprehensibility. In any event, the intent appears to be to set swaptype to:
2 if a is not aligned on an n-byte boundary, where n is the number of bytes in a long, or if es is not an integer multiple of the size of a long; otherwise
1 if es is unequal to the size of a long; otherwise
0
That's similar, but not identical, to your interpretation. Note, however, that even this code has undefined behavior because of (char*)a - (char*)0. Evaluating that difference has defined behavior only if both pointers point into, or just past the end of, the same object, and (char *)0 does not point (in)to or just past the end of any object.
You asked specifically:
But I don't understand why type conversion is implemented, (char*)a.
That is performed because pointer arithmetic is defined in terms of the pointed-to type, so (1), a conforming program cannot perform arithmetic with a void *, and (2) the code wants the result of the subtraction to be in the same units as the result of the sizeof operator (bytes).
And what means of this line?
(char*)a- (char*)0)% sizeof(long)==1
That line does not appear in the macro you presented, and it is not a complete expression because of unbalanced parentheses. It appears to be trying to determine whether a points one past an n-byte boundary, where n is as defined above, but again, evaluating the pointer difference has undefined behavior. Note also that for an integer x, x % sizeof(long) == 1 evaluated in boolean context has different meaning than x % sizeof(long) evaluated in the same context. The latter makes more sense in the context you described.

Calculating the address: pointer + non-negative number

Pointers can only move in discrete steps.
int *p;
p = malloc(sizeof(int)*8);
Therefore, formally *(p+2) is calculated as *(p+2*sizeof(int)).
However If I actually code the above two, I get different results, which seems understandable.
*p = 123;
*(p+2) = 456;
printf("%d\n",*(p+2*(sizeof(int)))); // 0
printf("%d\n",*(p+2)); // 456
The question is, is this calculation implicit, done by the compiler at compile time?
The question is, is this calculation implicit, done by the compiler at
compile time?
Yes, this is implicit. When you write ptr+n, the pointer advances by n times the size of the pointee type in bytes (e.g. for an int*, each step is 4 bytes, provided an int takes four bytes on your computer).
e.g.
int *x = malloc(4 * sizeof(int)); // say x points at 0x1000
x++; // x now points at 0x1004 if size of int is 4
You can read more on pointer arithmetic.
Therefore, formally *(p+2) is calculated as *(p+2*sizeof(int)).
No, *(p+2) is calculated as *(int*)((char*)p+2*sizeof(int)).
Even a brief look reveals that the only way for your statement to hold is if sizeof(int) == 1.

Why does a=(b++) have the same behavior as a=b++?

I am writing a small test app in C with GCC 4.8.4 pre-installed on my Ubuntu 14.04, and I got confused by the fact that the expression a=(b++); behaves in the same way as a=b++; does. The following simple code is used:
#include <stdint.h>
#include <stdio.h>
int main(int argc, char* argv[]){
uint8_t a1, a2, b1=10, b2=10;
a1=(b1++);
a2=b2++;
printf("a1=%u, a2=%u, b1=%u, b2=%u.\n", a1, a2, b1, b2);
}
The result after gcc compilation is a1=a2=10, while b1=b2=11. However, I expected the parentheses to have b1 incremented before its value is assigned to a1.
Namely, a1 should be 11 while a2 equals 10.
Does anyone have an idea about this issue?
However, I expected the parentheses to have b1 incremented before its value is assigned to a1
You should not have expected that: placing parentheses around an increment expression does not alter the application of its side effects.
Side effects (in this case, it means writing 11 into b1) get applied some time after retrieving the current value of b1. This could happen before or after the full assignment expression is evaluated completely. That is why a post-increment will remain a post-increment, with or without parentheses around it. If you wanted a pre-increment, place ++ before the variable:
a1 = ++b1;
Quoting from the C99:6.5.2.4:
The result of the postfix ++ operator is the value of the operand.
After the result is obtained, the value of the operand is incremented.
(That is, the value 1 of the appropriate type is added to it.) See the
discussions of additive operators and compound assignment for
information on constraints, types, and conversions and the effects of
operations on pointers. The side effect of updating the stored value
of the operand shall occur between the previous and the next sequence
point.
You can look up C99 Annex C to understand what the valid sequence points are.
In your question, just adding parentheses doesn't change the sequence points; only the end of the full expression (marked by the ; here) does that.
Or in other words, you can view it as if there's a temporary copy of b, and the side effect is that the original b is incremented. Until a sequence point is reached, all evaluation is done on the temporary copy of b. The temporary copy of b is then discarded, and the side effect, i.e. the increment operation, is committed to storage by the time a sequence point is reached.
Parentheses can be tricky to think about. But they do not mean, "make sure that everything inside happens first".
Suppose we have
a = b + c * d;
The higher precedence of multiplication over addition tells us that the compiler will arrange to multiply c by d, and then add the result to b. If we want the other interpretation, we can use parentheses:
a = (b + c) * d;
But suppose that we have some function calls thrown into the mix. That is, suppose we write
a = x() + y() * z();
Now, while it's clear that the return value of y() will be multiplied by the return value of z(), can we say anything about the order that x(), y(), and z() will be called in? The answer is, no, we absolutely cannot! If you're at all unsure, I invite you to try it, using x, y, and z functions like this:
int x() { printf("this is x()\n"); return 2; }
int y() { printf("this is y()\n"); return 3; }
int z() { printf("this is z()\n"); return 4; }
The first time I tried this, using the compiler in front of me, I discovered that function x() was called first, even though its result is needed last. When I changed the calling code to
a = (x() + y()) * z();
the order of the calls to x, y, and z stayed exactly the same, the compiler just arranged to combine their results differently.
Finally, it's important to realize that expressions like i++ do two things: they take i's value and add 1 to it, and then they store the new value back into i. But the store back into i doesn't necessarily happen right away, it can happen later. And the question of "when exactly does the store back into i happen?" is sort of like the question of "when does function x get called?". You can't really tell, it's up to the compiler, it usually doesn't matter, it will differ from compiler to compiler, if you really care, you're going to have to do something else to force the order.
And in any case, remember that the definition of i++ is that it gives the old value of i out to the surrounding expression. That's a pretty absolute rule, and it can not be changed just by adding some parentheses! That's not what parentheses do.
Let's go back to the previous example involving functions x, y, and z. I noticed that function x was called first. Suppose I didn't want that; suppose I wanted functions y and z to be called first. Could I achieve that by writing the following?
a = x() + (y() * z());
I could write that, but it doesn't change anything. Remember, the parentheses don't mean "do everything inside first". They do cause the multiplication to happen before the addition, but the compiler was already going to do it that way anyway, based on the higher precedence of multiplication over addition.
Up above I said, "if you really care, you're going to have to do something else to force the order". What you generally have to do is use some temporary variables and some extra statements. (The technical term is "insert some sequence points.") For example, to cause y and z to get called first, I could write
c = y();
d = z();
b = x();
a = b + c * d;
In your case, if you wanted to make sure that the new value of b got assigned to a, you could write
c = b++;
a = b;
But of course that's silly -- if all you want to do is increment b and have its new value assigned to a, that's what prefix ++ is for:
a = ++b;
Your expectations are completely unfounded.
Parentheses have no direct effect on the order of execution. They don't introduce sequence points into the expression and thus they don't force any side-effects to materialize earlier than they would've materialized without parentheses.
Moreover, by definition, the post-increment expression b++ evaluates to the original value of b. This requirement will remain in place regardless of how many pairs of parentheses you add around b++. Even if parentheses somehow "forced" an instant increment, the language would still require (((b++))) to evaluate to the old value of b, meaning that a would still be guaranteed to receive the non-incremented value of b.
Parentheses only affect the syntactic grouping between operators and their operands. For example, in your original expression a = b++ one might immediately ask whether the ++ applies to b alone or to the result of a = b. In your case, by adding the parentheses you simply explicitly forced the ++ operator to apply to (to group with) the b operand. However, according to the language syntax (and the operator precedence and associativity derived from it), ++ already applies to b, i.e. unary ++ has higher precedence than binary =. Your parentheses did not change anything; they only reiterated the grouping that was already there implicitly. Hence no change in the behavior.
Parentheses are entirely syntactic. They just group expressions and they are useful if you want to override the precedence or associativity of operators. For example, if you use parentheses here:
a = 2*(b+1);
you mean that the result of b+1 should be doubled, whereas if you omit the parentheses:
a = 2*b+1;
you mean that just b should be doubled and then the result should be incremented. The two syntax trees for these assignments are:
    =                     =
   / \                   / \
  a   *                 a   +
     / \                   / \
    2   +                 *   1
       / \               / \
      b   1             2   b
a = 2*(b+1);        a = 2*b+1;
By using parentheses, you can therefore change the syntax tree that corresponds to your program and (of course) different syntax may correspond to different semantics.
On the other hand, in your program:
a1 = (b1++);
a2 = b2++;
parentheses are redundant because the assignment operator has lower precedence than the postfix increment (++). The two assignments are equivalent; in both cases, the corresponding syntax tree is the following:
    =
   / \
  a   ++ (postfix)
      |
      b
Now that we're done with the syntax, let's go to semantics. This statement means: evaluate b++ and assign the result to a. Evaluating b++ returns the current value of b (which is 10 in your program) and, as a side effect, increments b (which now becomes 11). The returned value (that is, 10) is assigned to a. This is what you observe, and this is the correct behaviour.
However, I expected the parentheses to have b1 incremented before its value is assigned to a1.
You aren't assigning b1 to a1: you're assigning the result of the postincrement expression.
Consider the following program, which prints the value of b when executing assignment:
#include <iostream>
using namespace std;

int b;

struct verbose
{
    int x;
    void operator=(int y) {
        cout << "b is " << b << " when operator= is executed" << endl;
        x = y;
    }
};

int main() {
    // your code goes here
    verbose a;
    b = 10;
    a = b++;
    cout << "a is " << a.x << endl;
    return 0;
}
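I suspected this might be undefined behavior, but it is not: the overloaded operator= is a function call, so its argument b++ (side effect included) is fully evaluated before the function body runs. Using ideone.com I get the output shown below: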
b is 11 when operator= is executed
a is 10
OK, in a nutshell: b++ is a unary expression, and parentheses around it will never influence the precedence of arithmetic operations, because the postfix ++ operator sits at the highest precedence level in C. In a * (b + c), by contrast, (b + c) is a binary expression (not to be confused with the binary number system!) because it has two operands, b and c. So it can easily be remembered like this: parentheses put around binary, ternary, and longer expressions will almost always influence precedence(*); parentheses around a postfix expression such as b++ never will, because it binds "strongly enough" to "withstand" grouping by parentheses.
(*) As usual, there are some exceptions to the rule, if only a handful: e.g. -> (to access members through pointers to structures) binds very strongly despite taking two operands. However, C beginners are likely to take quite a while before they write a -> in their code, as they will need an understanding of both pointers and structures.
The parentheses will not change the post-increment behaviour itself.
a1=(b1++); //b1=10
This is equivalent to:
uint8_t mid_value = b1++; //10
a1 = (mid_value); //10
Placing ++ after a variable (known as post-increment) means that the expression yields the value the variable had before the increment; the increment itself happens as a side effect.
Even enclosing the expression in parentheses doesn't change the fact that the old value is the one that gets used.
From learn.geekinterview.com:
In the postfix form, the increment or decrement takes place after the value is used in expression evaluation.
In prefix increment or decrement operation the increment or decrement takes place before the value is used in expression evaluation.
That's why a = (b++) and a = b++ are the same in terms of behavior.
In your case, if you want to increment b first, you should use pre-increment, ++b instead of b++ or (b++).
Change
a1 = (b1++);
to
a1 = ++b1; // b1 will be incremented before its value is assigned to a1.
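To make the difference between the two forms concrete, here is a minimal sketch. The helper names are hypothetical; each takes a pointer so that the caller can observe the variable afterwards:

```c
/* Post-increment: the expression yields the old value. */
int post_increment_value(int *b)
{
    return (*b)++;
}

/* Pre-increment: the expression yields the new value. */
int pre_increment_value(int *b)
{
    return ++(*b);
}
```

Starting from 10, post_increment_value returns 10 and pre_increment_value returns 11; in both cases the variable itself ends up as 11.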
To make it short:
b++ evaluates to the value b had before the increment.
The increment happens as a side effect, but the result of b++ (the old value) is what is put into a.
Because of that, parentheses do not change the value here.

How to compare C pointers?

Recently, I wrote some code to compare pointers like this:
if(p1+len < p2)
However, a colleague said that I should write it like this:
if(p2-p1 > len)
to be safe.
Here, p1 and p2 are char * pointers and len is an integer.
I don't understand why. Is that right?
EDIT1: Of course, p1 and p2 point into the same memory object to begin with.
EDIT2: Just a minute ago, I found the bug behind this question in my code (about 3K lines): len was so big that p1+len could not be stored in a 4-byte pointer, so p1+len < p2 was true. It shouldn't have been, so I think in some situations we should compare pointers like this:
if(p2 < p1 || (uint32_t)p2-p1 > (uint32_t)len)
In general, you can only safely compare pointers if they're both pointing to parts of the same memory object (or one position past the end of the object). When p1, p1 + len, and p2 all conform to this rule, both of your if-tests are equivalent, so you needn't worry. On the other hand, if only p1 and p2 are known to conform to this rule, and p1 + len might be too far past the end, only if(p2-p1 > len) is safe. (But I can't imagine that's the case for you. I assume that p1 points to the beginning of some memory-block, and p1 + len points to the position after the end of it, right?)
What they may have been thinking of is integer arithmetic: if it's possible that i1 + i2 will overflow, but you know that i3 - i1 will not, then i1 + i2 < i3 could either wrap around (if they're unsigned integers) or trigger undefined behavior (if they're signed integers) or both (if your system happens to perform wraparound for signed-integer overflow), whereas i3 - i1 > i2 will not have that problem.
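The integer analogy can be sketched with 32-bit unsigned values. The function names are illustrative, and the second function assumes the caller already knows that i1 <= i3:

```c
#include <stdbool.h>
#include <stdint.h>

/* Addition form: i1 + i2 is reduced modulo 2^32, so it can wrap around. */
bool sum_below(uint32_t i1, uint32_t i2, uint32_t i3)
{
    return (uint32_t)(i1 + i2) < i3;
}

/* Subtraction form: cannot wrap as long as i1 <= i3 (assumed here). */
bool gap_above(uint32_t i1, uint32_t i2, uint32_t i3)
{
    return i3 - i1 > i2;
}
```

For i1 = 4000000000, i2 = 500000000 and i3 = 4100000000, the true sum exceeds i3, yet sum_below reports true because the sum wraps to 205032704. gap_above reports false, which is the mathematically correct answer.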
Edited to add: In a comment, you write "len is a value from buff, so it may be anything". In that case, they are quite right, and p2 - p1 > len is safer, since p1 + len may not be valid.
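A minimal sketch of the subtraction form for an untrusted len. The function name is made up, and it assumes p1 and p2 both point into (or one past the end of) the same buffer:

```c
#include <stdbool.h>

/* len may be any int value read from input.
   p1 and p2 are assumed to point into the same buffer. */
bool gap_exceeds(const char *p1, const char *p2, int len)
{
    /* p2 - p1 is always valid for two pointers into one buffer,
       whereas p1 + len could point far outside it. */
    return p2 - p1 > len;
}
```

With a 16-byte buffer, gap_exceeds(buf, buf + 16, 8) is true, while a huge len such as 2147483647 simply yields false without any out-of-range pointer ever being formed.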
"Undefined behavior" applies here. You cannot compare two pointers with <, <=, > or >= unless they both point into the same object or one past the end of that object. Here is an example:
void func(int len)
{
    char array[10];
    char *p = &array[0], *q = &array[10];
    if (p + len <= q)
        puts("OK");
}
You might think about the function like this:
// if (p + len <= q)
// if (array + 0 + len <= array + 10)
// if (0 + len <= 10)
// if (len <= 10)
void func(int len)
{
    if (len <= 10)
        puts("OK");
}
However, the compiler knows that ptr <= q is true for all valid values of ptr, so it might optimize the function to this:
void func(int len)
{
    puts("OK");
}
Much faster! But not what you intended.
Yes, there are compilers that exist in the wild that do this.
Conclusion
This is the only safe version: subtract the pointers and compare the result with len; don't compare pointers that may be out of range.
if (len <= q - p)
Technically, p1 and p2 must be pointers into the same array. If they are not in the same array, the behaviour is undefined.
For the addition version, the type of len can be any integer type.
For the difference version, the result of the subtraction is ptrdiff_t, but any integer type will be converted appropriately.
Within those constraints, you can write the code either way; neither is more correct. In part, it depends on what problem you're solving. If the question is 'are these two elements of the array more than len elements apart', then subtraction is appropriate. If the question is 'is p2 the same element as p1[len] (aka p1 + len)', then the addition is appropriate.
In practice, on many machines with a uniform address space, you can get away with subtracting pointers to disparate arrays, but you might get some funny effects. For example, if the pointers are pointers to some structure type, but not parts of the same array, then the difference between the pointers treated as byte addresses may not be a multiple of the structure size. This may lead to peculiar problems. If they're pointers into the same array, there won't be a problem like that — that's why the restriction is in place.
The existing answers show why if (p2-p1 > len) is better than if (p1+len < p2), but there's still a gotcha with it -- if p2 happens to point BEFORE p1 in the buffer and len is an unsigned type (such as size_t), then p2-p1 will be negative, but will be converted to a large unsigned value for comparison with the unsigned len, so the result will probably be true, which may not be what you want.
So you might actually need something like if (p1 <= p2 && p2 - p1 > len) for full safety.
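The gotcha and the guarded form can be sketched as follows. The function names are hypothetical, and the explicit cast in the naive version spells out the conversion that the comparison would otherwise perform implicitly on typical platforms where ptrdiff_t and size_t have the same width:

```c
#include <stdbool.h>
#include <stddef.h>

/* Naive form: a negative p2 - p1 becomes a huge unsigned value. */
bool naive_gap_exceeds(const char *p1, const char *p2, size_t len)
{
    return (size_t)(p2 - p1) > len;
}

/* Guarded form: reject p2 < p1 before looking at the distance. */
bool guarded_gap_exceeds(const char *p1, const char *p2, size_t len)
{
    return p1 <= p2 && (size_t)(p2 - p1) > len;
}
```

With p1 = buf + 8, p2 = buf + 2 and len = 4, the naive form reports true even though p2 lies before p1; the guarded form reports false.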
As Dietrich already said, comparing unrelated pointers is dangerous, and could be considered as undefined behavior.
Given that two pointers are within the range 0 to 2GB (on a 32-bit Windows system), subtracting the 2 pointers will give you a value between -2^31 and +2^31. This is exactly the domain of a signed 32-bit integer. So in this case it does seem to make sense to subtract two pointers because the result will always be within the domain you would expect.
However, if the LargeAddressAware flag is enabled in your executable (this is Windows-specific, don't know about Unix), then your application will have an address space of 3GB (when run in 32-bit Windows with the /3G flag) or even 4GB (when run on a 64-bit Windows system).
If you then start to subtract two pointers, the result could be outside the domain of a 32-bit integer, and your comparison will fail.
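The effect can be modelled without real pointers by treating addresses as plain 32-bit numbers. This is only an illustration of the arithmetic: the function name and the example addresses are made up, and it assumes a 32-bit signed difference type:

```c
#include <stdint.h>

/* Models what a 32-bit signed pointer difference would hold for hi - lo. */
int32_t diff32(uint32_t hi, uint32_t lo)
{
    uint32_t raw = hi - lo;            /* exact distance modulo 2^32 */
    if (raw <= 0x7FFFFFFFu)
        return (int32_t)raw;           /* fits: the result is correct */
    /* Distance of 2GB or more: the signed view of the same bits is negative. */
    return -(int32_t)(~raw) - 1;
}
```

With both addresses below 2GB, for example 0x7FF00000 and 0x00100000, the result is positive. With a higher address such as 0xBFF00000, which is reachable in a 3GB address space, the distance exceeds 2GB and the modelled difference comes out negative.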
I think this is one of the reasons why the address space was originally divided into two equal parts of 2GB, and why the LargeAddressAware flag is still optional. However, my impression is that current software (your own software and the DLLs you're using) seems to be quite safe (nobody subtracts pointers like that anymore, do they?), and my own application has the LargeAddressAware flag turned on by default.
Neither variant is safe if an attacker controls your inputs
The expression p1 + len < p2 compiles down to something like p1 + sizeof(*p1)*len < p2, and the scaling with the size of the pointed-to type can overflow your pointer:
int *p1 = (int*)0xc0ffeec0ffee0000;
int *p2 = (int*)0xc0ffeec0ffee0400;
long long len = 0x4000000000000000; // needs a 64-bit type; an int cannot hold this value
if(p1 + len < p2) {
    printf("pwnd!\n");
}
When len is multiplied by the size of int, it overflows to 0 so the condition is evaluated as if(p1 + 0 < p2). This is obviously true, and the following code is executed with a much too high length value.
OK, so what about p2-p1 < len? Same thing, overflow kills you:
char *p1 = (char*)0x0123456789012345;
char *p2 = (char*)0xa123456789012345;
int len = 1;
if(p2-p1 < len) {
    printf("pwnd!\n");
}
In this case, the difference between the pointers is evaluated as p2-p1 = 0xa000000000000000, which is interpreted as a negative signed value. As such, it compares smaller than len, and the following code is executed with a much too low len value (or much too large pointer difference).
The only approach that I know to be safe in the presence of attacker-controlled values is to use unsigned arithmetic:
if(p1 < p2 &&
   ((uintptr_t)p2 - (uintptr_t)p1)/sizeof(*p1) < (uintptr_t)len
) {
    printf("safe\n");
}
The p1 < p2 guarantees that p2 - p1 cannot yield a genuinely negative value. The second clause performs the actions of p2 - p1 < len while forcing use of unsigned arithmetic in a non-UB way. I.e. (uintptr_t)p2 - (uintptr_t)p1 gives exactly the count of bytes between the bigger p2 and the smaller p1, no matter the values involved.
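Wrapped into a function, the check might look like the sketch below. The name and signature are illustrative, len is taken as size_t so that no signed value is involved, and the code assumes a flat address space in which the uintptr_t difference equals the byte distance:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p2 lies after p1 and fewer than len elements separate them. */
bool elements_between_less_than(const int *p1, const int *p2, size_t len)
{
    if (!(p1 < p2))
        return false;                  /* rules out a "negative" distance */

    /* Unsigned subtraction: well defined for any pair of values. */
    uintptr_t bytes = (uintptr_t)p2 - (uintptr_t)p1;

    return bytes / sizeof *p1 < len;
}
```

For an int array arr, elements_between_less_than(arr, arr + 4, 5) is true, while the same call with len = 4 or with the pointers swapped is false.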
Of course, you don't want to see such comparisons in your code unless you know that you need to defend against determined attackers. Unfortunately, it's the only way to be safe, and if you rely on either form given in the question, you open yourself up to attacks.
