Assume p is an integer pointer and i is an integer:
*p++ gives an integer value corresponding to p.
i++ gives an integer value incremented by 1.
Since both of the above yield an integer, shouldn't ++*p++ and ++i++ report the same error? Why does ++*p++ work while ++i++ gives a compiler error?
#include <stdio.h>
int main()
{
int a[10] = {0};
int *p = (int*)&a;
int i = 0;
// printf("%d", ++i++); -- FAILS error: lvalue required as increment operand
printf("%d\n", ++*p++ ); // Prints 1
return 0;
}
EDIT
++i++ is decomposed as follows:
i++
++(result)
This is exactly where I am confused:
In the same way we can decompose ++*p++ as
*p++
++(result).
*p++ returns a value (rvalue), not a pointer. So why the difference?
The result of post-increment is an rvalue. You're not allowed to modify it. ++i++ attempts to modify that rvalue, which the compiler rejects.
p++ produces an rvalue, but it's of pointer type. You're not allowed to modify it, but you are allowed to dereference it. *p++ dereferences that rvalue. This gives you the value it points at, as an lvalue. The pre-increment then modifies the lvalue it points at, not the rvalue that was produced by the post-increment.
Edit: I should probably also add one more point: even if ++i++ were allowed by the compiler, the result would be undefined behavior, because it attempts to modify i twice without an intervening sequence point. In the case of ++*p++, that doesn't happen either: the post-increment modifies the pointer itself, while the pre-increment modifies what the pointer pointed at (before it was incremented). Since we're modifying two entirely different locations, the result is not undefined behavior.
If you wanted to badly enough, you could still get undefined behavior by initializing the pointer to point at itself, in which case both increments would attempt to modify the pointer. The committee didn't work very hard at preventing this, probably because only the truly pedantic would be at all likely to even think of such an insane thing.
Bottom line: in this case, the compiler is mostly trying to protect you from yourself, though you can still shoot yourself in the foot if you try hard enough.
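As a rough sketch of the explanation above, the two increments in ++*p++ can be written out as separate steps. The helper names below are made up for illustration; the first function is the expression as written, the second spells out what each operator touches.

```c
/* Hypothetical helper: the expression exactly as in the question,
   applied through a pointer-to-pointer so the caller can inspect p. */
int pre_inc_deref_post_inc(int **pp)
{
    return ++*(*pp)++;
}

/* Hypothetical helper: the same thing in two explicit steps. */
int pre_inc_deref_post_inc_steps(int **pp)
{
    int *old = (*pp)++;  /* p++ : yields the old pointer value (an rvalue) and advances p */
    return ++*old;       /* *old is an lvalue, so it may be pre-incremented */
}
```

Called with p pointing at a[0] == 0, either version should return 1, leave a[0] == 1, and leave p pointing at a[1]. The pointer and the int it designates are different objects, so each is modified only once.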
++i++ is decomposed as follows:
i++
++(result)
Problem: i++ returns an rvalue, i.e. a 'temporary' value, which cannot be incremented. This is because i++ returns the value i had before it was incremented.
Related
In C, suppose for a pointer p we do *p++ = 0. If p points to an int variable, is this defined behavior?
You can do arithmetic resulting in pointing one past the end of an "array object" per the standard, but I am unable to find a really precise definition of "array object" in the standard. I don't think in this context it means just an object explicitly defined as an array, because p=malloc(sizeof(int)); ++p; pretty clearly is intended to be defined behavior.
If a variable does not qualify as an "array object", then as far as I can tell *p++ = 0 is undefined behavior.
I am using the C23 draft, but an answer citing the C11 standard would probably answer the question too.
Yes, it is well-defined. Pointer arithmetic is defined by the additive operators, so that's where you need to look.
C17 6.5.6/7
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
That is, int x; is to be regarded as equivalent to int x[1]; for the purpose of determining valid pointer arithmetic.
Given int x; int* p = &x; *p++ = 0; then it is fine to point 1 item past it but not to de-reference that item:
C17 6.5.6/8
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
This behavior has not changed in the various revisions of the standard. It's the very same from C90 to C23.
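A minimal sketch of the case discussed above, assuming a plain int object treated as an array of length one. The function name is hypothetical; it only forms and compares the one-past pointer and never dereferences it.

```c
/* Hypothetical helper: returns 1 if the store landed in x and p now
   points one past x. The one-past pointer is compared, not dereferenced. */
int store_then_step_past(void)
{
    int x = 42;
    int *p = &x;

    *p++ = 0;                      /* stores through the old value of p, then advances p */

    return x == 0 && p == &x + 1;  /* &x + 1 is the one-past pointer for the "array" x */
}
```

Writing *p after the increment would be the undefined part; the increment itself is covered by 6.5.6/7 and 6.5.6/8 as quoted.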
There are two separate questions: 1. What constructs does the Standard specify that correct conforming implementations should process meaningfully, and 2. What constructs do clang and gcc actually process meaningfully. The clear intention of the Standard is to define the behavior of a pointer "one past" an array object and a pointer to the start of another array object that happens to immediately follow it. The actual behavior of clang and gcc tells another story, however.
Given the source code:
#include <stdint.h>
extern int x[],y[];
int test1(int *p)
{
y[0] = 1;
if (p == x+1)
*p = 2;
return y[0];
}
int test2(int *p)
{
y[0] = 1;
uintptr_t p1 = 3*(uintptr_t)(x+1);
uintptr_t p2 = 5*(uintptr_t)p;
if (5*p1 == 3*p2)
*p = 2;
return y[0];
}
both clang and gcc will recognize in both functions that the *p=2 assignment will only run if p happens to be equal to a one-past pointer to x, and will conclude as a consequence that it would be impossible for p to equal y. Construction of an executable example where clang and gcc would erroneously make this assumption is difficult without the ability to execute a program containing two compilation units, but examination of the generated machine code at https://godbolt.org/z/x78GMqbrv will reveal that every ret instruction is immediately preceded by mov eax,1, which loads the return value with 1.
Note that the code in test2 doesn't compare pointers, nor even compare integers that are directly formed from pointers, but the fact that clang and gcc are able to show that the numbers being compared can only be equal if the pointers happened to be equal is sufficient for test2() to, as perceived by clang or gcc, invoke UB if the function is passed a pointer to y, and y happens to equal x+1.
Is this behavior defined or not?
volatile long (*volatile ptr)[1] = (void*)NULL;
volatile long v = (long) *ptr;
printf("%ld\n", v);
It works because by dereferencing pointer to array we are receiving an array itself, then that array decaying to pointer to its first element.
Updated demo: https://ideone.com/DqFF6T
Also, GCC even considers next code as a constant expression:
volatile long (*ptr2)[1] = (void*)NULL;
enum { this_is_constant_in_gcc = ((void*)ptr2 == (void*)*ptr2) };
printf("%d\n", this_is_constant_in_gcc);
Basically, it is dereferencing ptr2 at compile time.
This:
long (*ptr)[1] = NULL;
Is declaring a pointer to an "array of 1 long" (more precisely, the type is long int (*)[1]), with the initial value of NULL. Everything is fine so far; any pointer can be NULL.
Then, this:
long v = (long) *ptr;
Is dereferencing the NULL pointer, which is undefined behavior. All bets are off: if your program does not crash, the following statement could print any value or do anything else, really.
Let me make this clear one more time: undefined behavior means that anything can happen. There is no explanation as to why anything strange happens after invoking undefined behavior, nor does there need to be. The compiler could very well emit 16-bit Real Mode x86 assembly, produce a binary that deletes your entire home folder, emit the Apollo 11 Guidance Computer assembly code, or whatever else. It is not a bug. It's perfectly conforming to the standard.
The only reason your code seems to work is because GCC decides, purely out of coincidence, to do the following (Godbolt link):
mov QWORD PTR [rbp-8], 0 ; put NULL on the stack
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rbp-16], rax ; move NULL to the variable v
Causing the NULL-dereference to never actually happen. This is most probably a consequence of the undefined behavior in dereferencing ptr ¯\_(ツ)_/¯
Funnily enough, I previously said in a comment:
dereferencing NULL is invalid and will basically always cause a segmentation fault.
But of course, since it is undefined behavior, that "basically always" is wrong. I think this is the first time I have ever seen a null-pointer dereference not cause a SIGSEGV.
Is this behavior defined or not?
Not.
long (*ptr)[1] = NULL;
long v = (long) *ptr;
printf("%ld\n", v);
It works because by dereferencing pointer to array we are receiving an
array itself, then that array decaying to pointer to its first
element.
No, you are confusing type with value. It is true that the expression *ptr on the second line has type long[1], but evaluating that expression produces undefined behavior regardless of the data type, and regardless of the automatic conversion that would be applied to the result if it were defined.
The relevant section of the spec is paragraph 6.5.3.2/4:
The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the operand
has type ''pointer to type'', the result has type ''type''. If an
invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined.
A footnote goes on to clarify that
[...] Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer [...]
It may "work" for you in an empirical sense, but from a language perspective, any output at all or none is a conforming result.
Update:
It may be interesting to note that the answer would be different for explicitly taking the address of *ptr than it is for supposing that array decay will overcome the undefinedness of the dereference. The standard provides that, as a special case, where the operand of the unary & operator is the result of a unary * operator, neither of those operators is evaluated. Provided that all relevant constraints are satisfied, the result is as if they were both omitted altogether, except that it is never an lvalue.
Thus, this is ok:
long (*ptr)[1] = NULL;
long v = (long) &*ptr;
printf("%ld\n", v);
On many implementations it will reliably print 0, but do note that C does not specify that it must be 0.
The key distinction here is that in this case, the * operation is not evaluated (per spec). The * operation in the original code is evaluated, notwithstanding the fact that if the pointer value were valid, the resulting array would be converted right back to a pointer (of a different type, but to the same location). That does suggest an obvious shortcut that implementations may take with the original code, and they may take it, if they wish, without regard to whether ptr's value is valid, because if it is invalid then they can do whatever they want.
To answer just the questions you asked:
Is dereferencing a NULL pointer to array valid in C?
No.
Is this behavior defined or not?
It is classified as "undefined behavior", so it is not defined.
Never mind that this trick with the array may happen to work on some implementations, or that there is no need to do it at all (I assume you are asking out of curiosity): per the C standard it is not valid to dereference a NULL pointer in any way, and doing so causes undefined behavior.
Anything can happen when you put such statements in your program.
Look at the answers on this question, which explain why:
What EXACTLY is meant by "de-referencing a NULL pointer"?
A quote from Adam Rosenfield's answer:
A null pointer is a pointer that does not point to any valid data (but it is not the only such pointer). The C standard says that it is undefined behavior to dereference a null pointer. This means that absolutely anything could happen: the program could crash, it could continue working silently, or it could erase your hard drive (although that's rather unlikely).
Is this behavior defined or not?
The behavior is undefined because you are applying the * operator to a pointer that compares equal to a null pointer constant.
The following stackoverflow thread tries to explain what undefined behavior is: Undefined, unspecified and implementation-defined behavior
main()
{
char buffer[6]="hello";
char *ptr3 = buffer +8;
char *str;
for(str=buffer;str <ptr3;str++)
printf("%d \n",str);
}
Here, ptr3 is pointing out of array bounds. However, if I run this program, I am getting consecutive memory locations (for ex.1000.....1007). So, according to the C standard, a pointer pointing more than one past the array bound is explicitly undefined behavior.
My question is how the above code results in undefined behavior?
There are multiple occurrences of undefined behavior in your program.
For starters you're calling printf without the required #include <stdio.h>, and main() should be int main(void). That's not what you're asking about, but you should fix it.
char buffer[6]="hello";
This is ok.
char *ptr3 = buffer +8;
Evaluating the expression buffer +8 has undefined behavior. N1570 6.5.6 specifies the behavior of the + addition operator, and paragraph 8 says:
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
Computing the pointer value by itself has undefined behavior, even if you never dereference it or access its value.
char *str;
for(str=buffer;str <ptr3;str++)
printf("%d \n",str);
You're passing a char* value to printf, but %d requires an argument of type int. Passing a value of the wrong type to printf also has undefined behavior.
If you want to print the pointer value, you need to write:
printf("%p\n", (void*)str);
which will likely print the pointer value in hexadecimal, depending on the implementation. (I've removed the unnecessary trailing space.)
When str points to buffer[5], str++ is valid; it causes str to point just past the end of buffer. (Dereferencing str after that would have undefined behavior, but you don't do that.) Incrementing str again after that has undefined behavior. The comparison str < ptr3 also has undefined behavior, since ptr3 has an invalid value, but you already triggered undefined behavior when you initialized ptr3, so this is just icing on the proverbial cake.
Keep in mind that "undefined behavior" means that the C standard does not define the behavior. It doesn't mean that the program will crash or print an error message. In fact the worst possible consequence of undefined behavior is that the code seems to "work"; it means that you have a bug, but it's going to be difficult to diagnose and fix it.
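Putting the fixes above together, one possible corrected loop might look like the sketch below. The function name and its return value are my own additions for illustration; the point is that the end pointer is one past the last element (valid to form and compare) and that the pointer is printed with %p after a cast to void *.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints the address of each element of buf[0..n-1]
   and returns how many addresses were printed. */
size_t print_addresses(const char *buf, size_t n)
{
    const char *end = buf + n;  /* one past the last element: fine to compute and compare */
    size_t count = 0;

    for (const char *str = buf; str < end; str++) {
        printf("%p\n", (void *)str);  /* %p expects a void * argument */
        count++;
    }
    return count;
}
```

For char buffer[6] = "hello"; a call such as print_addresses(buffer, sizeof buffer) visits six elements and never forms a pointer beyond buffer + 6.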
You are seeing the address stored in the pointer. If you want the value it points at, you need to use the dereference (*) operator in the printf.
The other thing is, if you want see characters and not ASCII codes, you should use %c in printf.
printf("%c\n",*str);
In C, you can always add two numbers. You can always add an integer to a pointer, or subtract two pointers. You will always get an "answer": the compiler will generate code and the code will execute. That's no assurance that answer is valid, useful, or even defined.
The C standard defines the language. Within the scope of what the syntax admits, it defines what's valid -- what definitely means something -- and what's not. When you color outside those lines, the compiler may produce weird code or no code. In C, it's not the job of the compiler to anticipate every weird circumstance and arrive at a reasonable answer. The compiler writer assumes the programmer knows the rules, and is not required to verify he followed them.
There are lots of examples of valid syntax that's meaningless or undefined. In math, you cannot take the log of a negative, and you cannot divide by zero. Dividing by zero doesn't yield zero or not zero; the operation is undefined.
In your case, ptr3 has a value, duly computed, 8 larger than buffer. That's the result of some pointer arithmetic. So far, so good.
But just because you have a pointer doesn't mean it points to anything. (void*) 0 is explicitly guaranteed not to point to anything. Likewise, your ptr3 doesn't point to anything. It needn't even be a value 8 larger than buffer. Section 6.5.6 of the C standard defines the result of adding an integer to a pointer, and puts it this way:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
When you say, I am getting consecutive memory locations (for ex.1000.....1007), what you're seeing is behavior. You had to see some behavior. And that behavior is undefined. According to the standard, you could see some other behavior, such as wrapping back to 1000 or 0.
What the compiler accepts and what the standard defines are two different things.
After hunting for a related or duplicate question concerning the following to no avail (I can only do marginal justice to describe the sheer number of pointer-arithmetic and post-decrement questions tagged with C, but suffice it to say "boatloads" does a grave injustice to that result set count) I toss this in the ring in hopes of clarification or a referral to a duplicate that eluded me.
If the post-decrement operator is applied to a pointer such as below, a simple reverse-iteration of an array sequence, does the following code invoke undefined behavior?
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "some string";
const char *t = s + strlen(s);
while(t-->s)
fputc(*t, stdout);
fputc('\n', stdout);
return 0;
}
It was recently proposed to me that 6.5.6p8, Additive operators, in conjunction with 6.5.2.4, Postfix increment and decrement operators, specifies that even performing a post-decrement upon t when it already contains the base address of s invokes undefined behavior, regardless of whether the resulting value of t (not the t-- expression result) is evaluated or not. I simply want to know if that is indeed the case.
The cited portions of the standard were:
6.5.6 Additive Operators
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
and its nearly tightly coupled relationship with...
6.5.2.4 Postfix increment and decrement operators
Constraints
The operand of the postfix increment or decrement operator shall have
atomic, qualified, or unqualified real or pointer type, and shall be a
modifiable lvalue.
Semantics
The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The value computation of the result is sequenced before the side effect of updating the stored value of the operand. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. Postfix ++ on an object with atomic type is a read-modify-write operation with memory_order_seq_cst memory order semantics.
The postfix -- operator is analogous to the postfix ++ operator, except that the value of the operand is decremented (that is, the value 1 of the appropriate type is subtracted from it).
Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).
The very reason for using the post-decrement operator in the posted sample is to avoid evaluating an eventually-invalid address value against the base address of the array. For example, the code above was a refactor of the following:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "some string";
size_t len = strlen(s);
char *t = s + len - 1;
while(t >= s)
{
fputc(*t, stdout);
t = t - 1;
}
fputc('\n', stdout);
}
Forgetting for a moment this has a non-zero-length string for s, this general algorithm clearly has issues (perhaps not as clearly to some). If s[] were instead "", then t would be assigned a value of s-1, which itself is not in the valid range of s through its one-past-address, and the evaluation for comparison against s that ensues is no good. If s has non-zero length, that addresses the initial s-1 problem, but only temporarily, as eventually this is still counting on that value (whatever it is) being valid for comparison against s to terminate the loop. It could be worse. It could have naively been:
size_t len = strlen(s) - 1;
char *t = s + len;
This has disaster written all over it if s were a zero-length string. The refactored code this question opened with was intended to address all of these issues. But...
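To make the zero-length hazard concrete, here is a small sketch of the naive index computation on its own. The function name is hypothetical. Because size_t is unsigned, strlen("") - 1 wraps around to SIZE_MAX (the wraparound itself is well-defined), and it is the subsequent s + len that goes far outside the array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the naive "index of the last character" computation. */
size_t last_index_naive(const char *s)
{
    return strlen(s) - 1;  /* wraps to SIZE_MAX when s is the empty string */
}
```

For a non-empty string such as "abc" this gives 2 as intended; for "" it gives SIZE_MAX, which is not a usable offset.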
My paranoia may be getting to me, but it isn't paranoia if they're really all out to get you. So, per the standard (these sections, or perhaps others), does the original code (scroll to the top of this novel if you forgot what it looks like by now) indeed invoke undefined behavior or not?
I am pretty certain that the result of the post-decrement in this case is indeed undefined behaviour. The post-decrement clearly subtracts one from a pointer to the beginning of an object, so the result does not point to an element of the same array, and by the definition of pointer arithmetic (§6.5.6/8, as cited in the OP) that's undefined behaviour. The fact that you never use the resulting pointer is irrelevant.
What's wrong with:
char *t = s + strlen(s);
while (t > s) fputc(*--t, stdout);
Interesting but irrelevant fact: The implementation of reverse iterators in the standard C++ library usually holds in the reverse iterator a pointer to one past the target element. This allows the reverse iterator to be used normally without ever involving a pointer to "one before the beginning" of the container, which would be UB, as above.
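For illustration, the pre-decrement loop suggested above can be wrapped in a small function. This is only a sketch: the name reverse_copy and the output-buffer parameter are assumptions of mine, used so the result can be checked rather than printed. The pointer t never moves below s, so no "one before the beginning" pointer is ever formed, even for an empty string.

```c
#include <string.h>

/* Hypothetical helper: writes the characters of s in reverse order into out,
   which must have room for strlen(s) + 1 bytes. */
void reverse_copy(const char *s, char *out)
{
    const char *t = s + strlen(s);  /* points at the terminator: within the array */

    while (t > s)
        *out++ = *--t;              /* decrement first, then read; t stops at s */

    *out = '\0';
}
```

With s being "some string", out should end up holding "gnirts emos"; with s being "", the loop body never runs.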
#include <stdio.h>

int func(int **a)
{
*a = NULL;
return 1234;
}
int main()
{
int x = 0, *ptr = &x;
*ptr = func(&ptr); // <-???
printf("%d\n", x); // print '1234'
printf("%p\n", (void *)ptr); // print 'nil'
return 0;
}
Is this an example of undefined behavior, or does it have to do with sequence points?
Why doesn't the line:
*ptr = func(&ptr);
behave like:
*NULL = 1234;
EDIT: I forgot to mention that I get the output '1234' and 'nil' with gcc 4.7.
Since there is no sequence point between evaluations of the left and right hand sides of the assignment operator, it is not specified whether *ptr or func(&ptr) is evaluated first. Thus it is not guaranteed that the evaluation of *ptr is allowed, and the program has undefined behaviour.
The language does not guarantee you that the right-hand side subexpression func(&ptr) in
*ptr = func(&ptr);
is evaluated first, and the left-hand side subexpression *ptr is evaluated later (which is apparently what you expected to happen). The left-hand side can legally be evaluated first, before call to func. And this is exactly what happened in your case: *ptr got evaluated before the call, when ptr was still pointing to x. After that the assignment destination became finalized (i.e. it became known that the code will assign to x). Once it happens, changing ptr no longer changes the assignment destination.
So, the immediate behavior of your code is unspecified due to the unspecified order of evaluation. However, one possible evaluation schedule leads to undefined behavior by causing a null pointer dereference. This means that in the general case the behavior is undefined.
If I had to model the behavior of this code in terms of C++ language, I'd say that the process of evaluation in this case can be split into these essential steps
1a. int &lhs = *ptr; // evaluate the left-hand side
1b. int rhs = func(&ptr); // evaluate the right-hand side
2. lhs = rhs; // perform the actual assignment
(Even though the C language does not have references, internally it uses the same concept of a "run-time bound lvalue" to store the result of evaluating the left-hand side of an assignment.) The language specification allows enough freedom for steps 1a and 1b to occur in any order. You expected 1b to occur first, while your compiler decided to start with 1a.
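The same steps can be written in plain C by using a pointer where the model above uses a reference. This is only a sketch of the ordering the asker's compiler happened to choose, made explicit so that it no longer depends on unspecified evaluation order; store_lhs_first is a hypothetical name, and func is copied from the question.

```c
#include <stddef.h>

static int func(int **a)
{
    *a = NULL;
    return 1234;
}

/* Hypothetical helper: performs "*ptr = func(&ptr)" with the left-hand side
   pinned down first. Returns 1 if ptr ended up NULL. */
int store_lhs_first(int *x)
{
    int *ptr = x;

    int *lhs = ptr;        /* 1a: fix the assignment destination while ptr is still valid */
    int rhs = func(&ptr);  /* 1b: evaluate the right-hand side; this nulls ptr */
    *lhs = rhs;            /* 2:  store through the destination captured in 1a */

    return ptr == NULL;
}
```

Written this way the three statements are separated by sequence points, so the object x receives 1234 and ptr is NULL afterwards on any conforming implementation.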
This is undefined behaviour, I believe. The standard does not stipulate when the LHS of the assignment is evaluated compared to the RHS. If *ptr is evaluated after the function is called, you will be dereferencing a null pointer; if it is evaluated before the function is called, then you get sane behaviour.
The code is thoroughly disreputable. Do not try using it, or anything similar, in real code.
Note that there is a sequence point immediately before a function is called, after its arguments have been evaluated; there is also a sequence point immediately before a function returns. Thus, there are sequence points related to the evaluation of the function arguments and its return value, but...and this is crucial in this context...it still does not tell you whether *ptr is evaluated before or after the function is called. Either is possible; both are correct; the code depends on which happens, which makes it rely on undefined behaviour.
The assignment operator is not a sequence point, so there is no guarantee as to which side will be evaluated first; the behaviour is unspecified.
In one of the cases (dereferencing a null pointer) it could exhibit undefined behavior.
Between consecutive "sequence points" an object's value can be
modified only once by an expression.
You can see a list of the sequence points defined in C here.
While the call of a function is a sequence point, this is bound to the evaluation of the arguments (before the call), not to the function's side effects (the call itself).