Does decrementing a NULL pointer lead to undefined behavior?

Decrementing a NULL pointer on my machine still gives a NULL pointer. Is this well defined?
char *p = NULL;
--p;

Yes, the behavior is undefined.
--p is equivalent to p = p - 1 (except that p is only evaluated once, which doesn't matter in this case).
N1570 6.5.6 paragraph 8, discussing additive operators, says:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
[...]
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
Since your pointer value p doesn't point to an element of an array object or one past the last element of an array object, the behavior of p - 1 is undefined.
(Incidentally, I'd be surprised if your code caused p to be a null pointer -- though since the behavior is undefined the language certainly permits it. I can imagine an optimizing compiler ignoring the --p; because it knows its behavior is undefined, but I haven't seen that myself. How do you know p is null?)
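For contrast, here is a minimal sketch of a decrement that stays within what 6.5.6p8 allows. The helper name step_back is hypothetical (it is not from the question), and it assumes that p, when non-null, points into the array starting at base or one past its end.

```c
#include <stddef.h>

/* Hypothetical helper: step p back by one element only when the result
   still points into the array that starts at base. Returns NULL otherwise.
   Assumes p, when non-null, points into that array or one past its end. */
const char *step_back(const char *base, const char *p)
{
    if (base == NULL || p == NULL || p == base) {
        return NULL;   /* p - 1 would be undefined here, so never compute it */
    }
    return p - 1;      /* defined: the result is an element of the same array */
}
```

With char buf[4] = "abc", calling step_back(buf, buf + 4) yields a pointer to the terminating '\0' at buf[3], while step_back(buf, buf) and step_back(buf, NULL) both return NULL instead of forming an out-of-range pointer.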

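As far as I can see, GCC does not generate a null pointer here. In practice the decrement just subtracts one from the address, and the value wraps around on underflow. You can see that here.
#include <stdio.h>
#include <inttypes.h>
int main(void)
{
char *p = NULL;
/* uintptr_t is printed with PRIxPTR; %zx is meant for size_t */
printf("%" PRIxPTR "\n", (uintptr_t)p);
--p; /* undefined behavior: the output only shows what this build happened to do */
printf("%" PRIxPTR "\n", (uintptr_t)p);
}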
Output is
0
ffffffffffffffff
https://wandbox.org/permlink/gNzc38RWGSBi9tS3
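If the goal is only to see the wrapped-around number, the subtraction can be done on an unsigned integer instead of on the pointer, which is well defined. This is a sketch under assumptions: the function name address_minus_one is made up for illustration, and the result for a null pointer relies on the implementation converting a null pointer to the integer 0 (common, but implementation-defined).

```c
#include <stdint.h>

/* Hypothetical helper: convert the pointer to an integer first, then
   subtract. Unsigned arithmetic wraps modulo 2^N, so this is defined
   even when the integer value is 0. */
uintptr_t address_minus_one(const void *p)
{
    uintptr_t n = (uintptr_t)p;  /* pointer-to-integer conversion is implementation-defined */
    return n - 1u;               /* wraps to UINTPTR_MAX when n == 0 */
}
```

The result is only a number; converting it back to a pointer and using it would bring the original problem back.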

Related

Is computing a pointer to uninitialized memory undefined behavior in C?

If I understand correctly, this programme has undefined behavior in C++ because the intermediate value p + 1 is a pointer to uninitialized memory:
int main () {
int x = 0;
int *p = &x;
p = p + 1 - 1;
*p = 5;
}
If void were put in main's argument list (as required by the C grammar), would it also be undefined behavior in C?
There is no undefined behavior in either language. You can consider a single object as an array with one element. Pointer arithmetic may produce a pointer one past the last element of the array, so this statement
p = p + 1 - 1;
is correct.
From the C Standard (6.5.6 Additive operators)
7 For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and
...Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of
the array object, and if the expression Q points one past the last
element of an array object, the expression (Q)-1 points to the last
element of the array object.
Note also that
...If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise,
the behavior is undefined.
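The quoted paragraphs can be sketched as a small function. The name round_trip_store is hypothetical; it assumes obj points to a valid int object.

```c
/* Hypothetical helper: treat *obj as an array of length one (6.5.6p7),
   go one past its end and come back (6.5.6p8), then store through the
   resulting pointer. Returns 1 when the round trip lands on obj again. */
int round_trip_store(int *obj, int value)
{
    int *one_past = obj + 1;   /* valid to form, must not be dereferenced */
    int *back = one_past - 1;  /* points at the original object again */
    *back = value;
    return back == obj;
}
```

For an int x, round_trip_store(&x, 5) returns 1 and leaves 5 in x, matching the p = p + 1 - 1 case from the question.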
I think it's a bit unfortunate that the OP chose p + 1 - 1 as an example because p + 1 is not undefined behavior as shown in Vlad from Moscow's answer.
The question is more interesting if we consider p + 2 - 2. Here p + 2 is indeed undefined behavior. But does that matter if, in the full expression, we "undo this computation"?
There is an analog for integers. E.g. given i a signed integer and if i + 2 overflows, thus being undefined behavior, is the expression i + 2 - 2 ok or undefined behavior?
The answer to both is that it is undefined behavior. If an expression is undefined behavior and the program would reach that expression in its evaluation then the whole program exhibits undefined behavior.
There is a better-known case of this: computing the midpoint of two signed integers. (a + b) / 2 is UB if a + b overflows, even if the final value would fit in the data type.
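One way to avoid the overflowing intermediate is to never compute it. The sketch below uses hypothetical names (add_overflows, midpoint), and midpoint assumes 0 <= lo <= hi, as with array indices in a binary search.

```c
#include <limits.h>

/* Returns 1 when a + b would overflow int, without evaluating a + b. */
int add_overflows(int a, int b)
{
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return 0;
}

/* Midpoint of lo and hi, assuming 0 <= lo <= hi.
   hi - lo cannot overflow under that assumption, and neither can the sum. */
int midpoint(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}
```

For example, midpoint(0, INT_MAX) is INT_MAX / 2, whereas (0 + INT_MAX) / 2 happens to be fine but (INT_MAX - 1 + INT_MAX) / 2 would overflow in the addition.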

Incrementing NULL pointer in C

What happens if I increment a NULL pointer in C?
#include <stdio.h>
typedef struct
{
int x;
int y;
int z;
}st;
int main(void)
{
st *ptr = NULL;
ptr++; //Incrementing null pointer
printf("%d\n", (int)ptr);
return 0;
}
Output:
12
Is it undefined behavior? If not, why not?
The behaviour is always undefined. You can never own the memory at NULL.
Pointer arithmetic is only valid within arrays, and you can set a pointer to an index of the array or one location beyond the final element. Note I'm talking about setting a pointer here, not dereferencing it.
You can also set a pointer to a scalar and one past that scalar.
You can't use pointer arithmetic to traverse other memory that you own.
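The usual way these rules show up in practice is the "end pointer" loop. A minimal sketch, assuming a points to at least n int elements (the function name sum is illustrative):

```c
#include <stddef.h>

/* Sums n elements starting at a. The pointer end may equal one past the
   last element; it is only compared against, never dereferenced. */
int sum(const int *a, size_t n)
{
    const int *end = a + n;   /* at most one past the final element */
    int total = 0;
    for (const int *p = a; p != end; ++p) {
        total += *p;          /* p is always a real element here */
    }
    return total;
}
```

Given int v[3] = {1, 2, 3}, sum(v, 3) is 6 and sum(v, 0) is 0. The same function works on a scalar by passing its address with n equal to 1.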
Yes, it causes undefined behavior.
Every operator needs a "valid" operand, and NULL is not one for the postfix increment operator.
Quoting C11, chapter §6.5.2.4
The result of the postfix ++ operator is the value of the operand. As a side effect, the
value of the operand object is incremented (that is, the value 1 of the appropriate type is
added to it). [....]
and related to additive operators, §6.5.6
For addition, either both operands shall have arithmetic type, or one operand shall be a
pointer to a complete object type and the other shall have integer type. (Incrementing is
equivalent to adding 1.)
then, P7,
[...] a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
and, P8,
If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. [....] If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
I think ptr will point to the second element of an array of struct st (as if there were one). That's what ptr++ does. Initially the pointer was at 0, or NULL. Now it is at 12 (3 * sizeof(int) = 3*4 = 12).
In your example you didn't dereference the pointer; you just printed the address it points to. When you step a pointer, it is incremented by the size of the type it points to. Just try:
printf("Test: %zu", sizeof(st));
And you will get Test: 12 as output. If you dereferenced it, as in *ptr, that would cause undefined behavior.
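The step size can be observed without touching a null pointer by incrementing a pointer into a real array and measuring the distance in bytes. This is a sketch; byte_step is a hypothetical name, and the typedef repeats the one from the question.

```c
#include <stddef.h>

typedef struct
{
    int x;
    int y;
    int z;
} st;

/* Returns how many bytes a pointer to st moves when it is incremented once. */
size_t byte_step(void)
{
    st arr[2];                /* contents are never read, only addresses */
    st *first = arr;
    st *second = first + 1;   /* defined: still inside arr */
    return (size_t)((char *)second - (char *)first);
}
```

The returned value always equals sizeof(st), which is 12 on the asker's system.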

C printf. Is this valid code?

The following code outputs 14 using gcc. Why?
printf("%d", (int*)2+3); // This code is meant to be obfuscated!
The cast (int*) treats 2 as an address. Adding 3 will add 3*sizeof(int) to it. On your system it seems that sizeof(int) is equal to 4, and that's why it is giving 2 + 12 = 14.
But, you should note that the given code invokes undefined behavior for two reasons:
Performing arithmetic on a pointer that doesn't point to an array element causes undefined behavior.
7.21.6 Formatted input/output functions:
If a conversion specification is invalid, the behavior is undefined. If any argument is
not the correct type for the corresponding conversion specification, the behavior is
undefined.
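The arithmetic behind the 14 can be reproduced with plain integers, which avoids both problems. A sketch with a hypothetical helper name, scaled_address:

```c
#include <stddef.h>
#include <stdint.h>

/* Computes base + count * elem_size as an integer, which is the address
   arithmetic a typical compiler performs for (T *)base + count. */
uintptr_t scaled_address(uintptr_t base, size_t count, size_t elem_size)
{
    return base + (uintptr_t)(count * elem_size);
}
```

scaled_address(2, 3, sizeof(int)) gives 14 when sizeof(int) is 4. To print such a value, use the PRIuPTR macro from <inttypes.h>; to print a real pointer, use %p with a (void *) argument.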
There are two problems here.
First is that your expression (int*)2+3 results in undefined behaviour, because [almost certainly] there is no valid array at 0x2 that expands out to 0xE (14):
[C99: 6.5.6/8]: When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
The second is that you're using the %d formatting specifier but providing an object of pointer type:
[C99: 7.19.6.1/9]: If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is
undefined.
Either one of these factors is enough to say that your program has no meaningful output whatsoever. However, if you do see "14" it's because (int*)2 results in a pointer to memory at 0x2, and applying pointer arithmetic +3 despite the undefined behaviour may add a further sizeof(int)*3 to the pointer. 0x2 + 4*3 → 0x2 + 12 → 0xE (14). The problem is that printing this pointer value through %d could even be a security vulnerability on a system where sizeof(int) != sizeof(int*).
If you didn't give this answer in your interview, you should not have the job; if you didn't give this answer in your interview but you got the job, you shouldn't take the job.
I think it's invalid code, because we pass an integer pointer (int *) as the second argument of printf while the format string in the first argument specifies an integer (%d).
I don't have much knowledge of C, but that's what I think.

Accessing a 2D array using a single pointer

There is a lot of code like this:
#include <stdio.h>
int main(void)
{
int a[2][2] = {{0, 1}, {2, -1}};
int *p = &a[0][0];
while (*p != -1) {
printf("%d\n", *p);
p++;
}
return 0;
}
But based on this answer, the behavior is undefined.
N1570. 6.5.6 p8:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover,
if the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary
* operator that is evaluated.
Can someone explain this in detail?
The array whose base address (pointer to first element) is assigned to p is of type int[2]. This means the address in p can legally be dereferenced only at locations *p and *(p+1), or if you prefer subscript notation, p[0] and p[1]. Furthermore, p+2 is guaranteed to be legally evaluable as an address, and comparable to other addresses in that sequence, but cannot be dereferenced. This is the one-past address.
The code you posted violates the one-past rule by dereferencing p once it passes the last element in the array in which it is homed. That the array in which it is homed is buttressed up against another array of similar dimension is not relevant to the formal definition cited.
That said, in practice it works, but as is often said, observed behavior is not, and should never be considered, defined behavior. Just because it works doesn't make it right.
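A version that stays within each row's bounds uses two indices instead of one running pointer. This is a sketch; count_before is a hypothetical name, and the row width of 2 matches the array in the question.

```c
#include <stddef.h>

/* Counts the elements that appear before sentinel, scanning row by row.
   Every access is a[i][j] with j < 2, so no row is overrun. */
size_t count_before(int a[][2], size_t rows, int sentinel)
{
    size_t n = 0;
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < 2; ++j) {
            if (a[i][j] == sentinel) {
                return n;
            }
            ++n;
        }
    }
    return n;   /* sentinel not found: all elements counted */
}
```

For the array {{0, 1}, {2, -1}} and a sentinel of -1, the function returns 3, the same number of values the original loop prints.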
The object representation of pointers is opaque, in C. There is no prohibition against pointers having bounds information encoded. That's one possibility to keep in mind.
More practically, implementations are also able to achieve certain optimizations based on assumptions which are asserted by rules like these: Aliasing.
Then there's the protection of programmers from accidents.
Consider the following code, inside a function body:
struct {
char c;
int i;
} foo;
char * cp1 = (char *) &foo;
char * cp2 = &foo.c;
Given this, cp1 and cp2 will compare as equal, but their bounds are nonetheless different. cp1 can point to any byte of foo and even to "one past" foo, but cp2 can only point to "one past" foo.c, at most, if we wish to maintain defined behaviour.
In this example, there might be padding between the foo.c and foo.i members. While the first byte of that padding co-incides with "one past" the foo.c member, cp2 + 2 might point into the other padding. The implementation can notice this during translation and instead of producing a program, it can advise you that you might be doing something you didn't think you were doing.
By contrast, if you read the initializer for the cp1 pointer, it intuitively suggests that it can access any byte of the foo structure, including padding.
In summary, this can produce undefined behaviour during translation (a warning or error) or during program execution (by encoding bounds information); there's no difference, standard-wise: The behaviour is undefined.
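The cp1 case, where a pointer derived from the whole structure walks over every byte including padding, can be sketched as follows. The tag foo_t and both function names are hypothetical; the answer's struct is anonymous.

```c
#include <stddef.h>

struct foo_t {
    char c;
    int i;
};

/* Walks all bytes of *f through a pointer derived from the whole object,
   which is allowed to cover members and padding alike. */
void zero_bytes(struct foo_t *f)
{
    unsigned char *bytes = (unsigned char *)f;
    for (size_t k = 0; k < sizeof *f; ++k) {
        bytes[k] = 0;
    }
}

/* Number of padding bytes between c and i on this implementation. */
size_t gap_after_c(void)
{
    return offsetof(struct foo_t, i) - sizeof(char);
}
```

A pointer obtained from &f->c, by contrast, should only be moved to one past that single char, even though it compares equal to the pointer used in zero_bytes.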
You can cast your pointer into a pointer to an array to ensure the correct array semantics.
This code is indeed not defined but provided as a C extension in every compiler in common usage today.
However the correct way of doing it would be to cast the pointer into a pointer to array as so:
((int (*)[2])p)[0][0]
to get the zeroth element or say:
((int (*)[2])p)[1][1]
to get the last.
To be strict, the reason I think this is illegal is that you are breaking strict aliasing: pointers to different types may not point to the same address (variable).
In this case you are creating a pointer to an array of ints and a pointer to an int and pointing them to the same value, this is not allowed by the standard as the only type that may alias another pointer is a char * and even this is rarely used properly.

How does unary addition on C pointers work?

I know that the unary operator ++ adds one to a number. However, I find that if I do it on an int pointer, it increments by 4 (the sizeof an int on my system). Why does it do this? For example, the following code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *a = malloc(5 * sizeof(int));
a[0] = 42;
a[1] = 42;
a[2] = 42;
a[3] = 42;
a[4] = 42;
printf("%p\n", (void *)a); /* %p expects a void * argument */
printf("%p\n", (void *)++a);
printf("%p\n", (void *)++a);
return 0;
}
will print three addresses with a difference of 4 between each.
It's just the way C is - the full explanation is in the spec, Section 6.5.6 Additive operators, paragraph 8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
To relate that to your use of the prefix ++ operator, you need to also read Section 6.5.3.1 Prefix increment and decrement operators, paragraph 2:
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).
And also Section 6.5.16.2 Compound assignment, paragraph 3:
A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.
It's incrementing the pointer location by the size of int, the declared type of the pointer.
Remember, an int * is just a pointer to a location in memory, where you are saying an "int" is stored. When you ++ to the pointer, it shifts it one location (by the size of the type), in this case, it will make your value "4" higher, since sizeof(int)==4.
The reason for this is to make the following statement true:
*(ptr + n) == ptr[n]
These can be used interchangeably.
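Both the equivalence and the scaling by sizeof(int) can be checked directly. A sketch with hypothetical helper names, assuming ptr points to an array with more than n elements:

```c
#include <stddef.h>

/* Returns 1 when ptr + n and ptr[n] refer to the same element. */
int index_matches_offset(const int *ptr, size_t n)
{
    return (ptr + n == &ptr[n]) && (*(ptr + n) == ptr[n]);
}

/* Distance in bytes between two pointers into the same int array. */
size_t bytes_between(const int *from, const int *to)
{
    return (size_t)((const char *)to - (const char *)from);
}
```

For int a[5], bytes_between(a, a + 1) equals sizeof(int), which is the difference of 4 seen in the question's output.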
In pointer arithmetic, adding one to a pointer will add the sizeof the type which it points to.
so for a given:
TYPE * p;
Adding to p will actually increment by sizeof(TYPE). In this case the size of the int is 4.
See this related question
Because in "C" pointer arithmetic is always scaled by the size of the object being pointed to. If you think about it a bit, it turns out to be "the right thing to do".
It does this so that you don't start accessing an integer in the middle of it.
Because a pointer is not a reference ;). It's not a value, it's just an address in memory. When you check the pointer's value, it will be a number, possibly big, and unrelated to the actual value that's stored at that memory position. Say, printf("%p\n", a); prints "2000000" - this means your pointer points to the 2000000th byte in your machine's memory. It's pretty much unaware of what value is stored there.
Now, the pointer knows what type it points to. An integer, in your case. Since an integer is 4 bytes long, when you want to jump to the next "cell" the pointer points to, it needs to be 2000004. That's exactly 1 integer farther, so a++ makes perfect sense.
BTW, if you want to get 42 (from your example), print out the value pointed to: printf("%d\n", *a);
I hope this makes sense ;)
That's simple: for a pointer, in your case an integer pointer, a unary increment means INCREMENT THE MEMORY LOCATION BY ONE UNIT, where ONE UNIT = SIZE OF INTEGER.
The size of an integer varies from compiler to compiler: it is typically 2 bytes on 16-bit compilers and 4 bytes on most 32-bit and 64-bit compilers.
Try the same program with the character data type; it will give a difference of 1 byte, as a character takes 1 byte.
In short, the difference of 4 that you've come across is the SIZE OF ONE INTEGER in memory.
Hope this helped; if it didn't, I'll be glad to help, just let me know.
"Why does it do this?" Why would you expect it to do anything else? Incrementing a pointer makes it point to the next item of the type that it's a pointer to.

Resources