Incrementing NULL pointer in C - c

If I increment a NULL pointer in C, what happens?
#include <stdio.h>
typedef struct
{
int x;
int y;
int z;
}st;
int main(void)
{
st *ptr = NULL;
ptr++; //Incrementing null pointer
printf("%d\n", (int)ptr);
return 0;
}
Output:
12
Is it undefined behavior? If not, why?

The behaviour is always undefined. You can never own the memory at NULL.
Pointer arithmetic is only valid within arrays, and you can set a pointer to an index of the array or one location beyond the final element. Note I'm talking about setting a pointer here, not dereferencing it.
You can also set a pointer to a scalar and one past that scalar.
You can't use pointer arithmetic to traverse other memory that you own.

Yes, it causes undefined behavior.
Every operator needs a "valid" operand, and NULL is not one for the postfix increment operator.
Quoting C11, chapter §6.5.2.4
The result of the postfix ++ operator is the value of the operand. As a side effect, the
value of the operand object is incremented (that is, the value 1 of the appropriate type is
added to it). [....]
and related to additive operators, §6.5.6
For addition, either both operands shall have arithmetic type, or one operand shall be a
pointer to a complete object type and the other shall have integer type. (Incrementing is
equivalent to adding 1.)
then, P7,
[...] a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
and, P8,
If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. [....] If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.

I think ptr will point to the second array member (as if there were one) of struct st. That's what ptr++ does. Initially the pointer was at 0 or NULL. Now it is at 12 (3 * sizeof(int) = 3*4 = 12).

In your example you didn't dereference the pointer; you just printed the address it points to. When you step a pointer, it is incremented by the size of its referenced type. Just try:
printf("Test: %zu", sizeof(st));
And you will get Test: 12 as output (with a 4-byte int). If you dereferenced it, like *ptr, it would cause undefined behavior.

Related

Undefined behavior with pointer arithmetic on dynamically allocated memory

I'm probably misunderstanding this, but does the c99 spec prevent any form of pointer arithmetic on dynamically allocated memory?
From 6.5.6p7...
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
... a pointer to an object not in an array is treated as if it points into an array of 1 item (when using the operators + and -). Then in this snippet:
char *make_array (void) {
char *p = malloc(2*sizeof(*p));
p[0] = 1; // valid
p[1] = 2; // invalid ?
return p;
}
...the second subscript p[1] is invalid? Since p points to an object not in an array it is treated as pointing to an object in an array of one item and then from 6.5.6p8...
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
...we have undefined behaviour since we dereference past the array bound (the one implied to have length 1).
Edit:
OK, to try to clarify more what confuses me, let's do it step-by-step:
1.) p[1] is defined to mean *(p+1).
2.) p points to an object that isn't inside of an array, so it's treated as if it points to an object inside an array of length 1 for the purpose of evaluating p+1.
3.) p+1 produces a pointer 1 past the array that p is implied to point into.
4.) *(p+1) does the invalid dereference.
From C99, 7.20.3 - Memory management functions (emphasis mine) :
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
This implies that the allocated memory can be accessed as an array of char (as per your example), and so pointer arithmetic is well defined.

Is "int *ptr = *( ( &a ) + 1 );" where "a" is int[5] well-defined by the Standard?

Based on this question (strange output issue in c), there was an answer (provided by @Lundin) about this line:
int *ptr = (int*)(&a+1);
where he said:
the cast (int*) was hiding this bug.
So I came with the following:
#include <stdio.h>
int main( void ){
int a[5] = {1,2,3,4,5};
int *ptr = *( ( &a ) + 1 );
printf("%d", *(ptr-1) );
}
I would like to know if this:
int *ptr = *( ( &a ) + 1 );
Is well-defined by the Standard?
EDIT:
At some point @chux pointed to §6.3.2.3, paragraph 7, which is:
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined. Otherwise, when converted back again, the result shall compare equal to the
original pointer. When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
But I am not sure if I understand it right.
This expression invokes undefined behavior as a result of the dereference operator *:
int *ptr = *( ( &a ) + 1 );
First, let's start with ( &a ) + 1. This part is valid. &a has type int (*)[5], i.e. a pointer to an array of size 5. Performing pointer arithmetic by adding 1 is valid, even though a is not an element of an array.
In section 6.5.6 of the C standard detailing Additive Operators, paragraph 7 states:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
It's also allowed to create a pointer that points to one element past the end of an array. So &a + 1 is allowed.
The problem is when we dereference this expression. Paragraph 8 states:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
Since dereferencing a pointer to one past the end of an array is not allowed, the behavior is undefined.
Going back to the expression in the referenced post:
int *ptr = (int*)(&a+1);
printf("%d %d", *(a+1), *(ptr-1));
This is also undefined behavior but for a different reason. In this case, an int (*)[5] is converted to int * and the converted value is subsequently used. The only case where using such a converted value is legal is when an object pointer is converted to a pointer to a character type, e.g. char * or unsigned char *, and subsequently dereferenced to read the bytes of the object's representation.
EDIT:
It seems the two lines above are actually well defined. At the time the pointer dereference *(ptr-1) occurs, the object being accessed has effective type int, which matches the dereferenced type of ptr-1. Casting the pointer value &a+1 from int (*)[5] to int * is valid, and performing pointer arithmetic on the cast pointer value is also valid because it points either inside of a or one element past it.
*( ( &a ) + 1 ) is UB due to
... If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated. C11 §6.5.6 8
( &a ) + 1 points to "one past". Using * on that goes against "shall not".
int a[5] = {1,2,3,4,5};
int *ptr = *( ( &a ) + 1 );
Even if a was int a this applies due to
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type. §6.5.6 7
int *ptr = *( ( &a ) + 1 ); invokes undefined behaviour.
C11 - §6.5.6 "Additive operators" (P8) :
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object.[...]

Trying to understand: invalid operands to binary expression, C

Last line yields "invalid operands to binary expression". Trying to understand why. Does it mean that "p2-p1" is an invalid operand to the binary expression "-" that lies to the right of p3? Any rule I can follow here? Confusing to me because "3-2-1" integers are valid.
int array[3] = {1,2,3};
int* p1 = &array[0];
int* p2 = &array[1];
int* p3 = &array[2];
p3-p2-p1;
You are doing address arithmetic. Because subtraction associates left to right, p3-p2-p1 is evaluated as (p3-p2)-p1. p3-p2 yields not an address but an integer. Then you are attempting to subtract an address from an integer, which isn't valid. You could write p3-(p2-p1): that takes p2-p1, yielding an integer, and subtracts it as an integer offset from an address (p3), which will compile. However [thanks to @EOF for this reference in his comment], such a subtraction (of an integer from a pointer) is only valid if the result points within, or one past the end of, the array that p3 points into. This is described in section 6.5.6 of the C11 standard, excerpted below:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
In your code, p1, p2 and p3 are all pointers to integers, not integers.
To get what you want, you probably want:
*p3 - *p2 - *p1;
where the * operator is the dereference operator. It dereferences pointers, so in this case *p3 etc are of type int. You can think of it as the inverse of the & address-of operator.

Accessing variables in allocated memory

Let's say I want to allocate memory for 3 integers:
int *pn = malloc(3 * sizeof(*pn));
Now, to assign values to them, I do:
pn[0] = 5550;
pn[1] = 11;
pn[2] = 70000;
To access the 2nd value I do:
pn[1]
But the [n] operator is just a shortcut for *(a+n). That would mean I access the first byte after a, not the next int. But an int is 4 bytes long, so shouldn't I do
*(a+sizeof(*a)*n)
instead? How does it work?
No, the compiler takes care of that. There are special rules in pointer arithmetic, and that is one of them.
If you really only want to increment it by one byte, you have to cast the pointer to a pointer to a type which is one byte long (for example char).
Good question, but C will automatically multiply the offset by the size of the pointed-to type. In other words, when you access
p[n]
for a pointer declared as
T *p;
you will access the address p + (sizeof(T) * n) implicitly.
For instance, we can use the C99 standard to find out what is going on. According to the C99 standard:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall
have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
And from 6.5.6, paragraph 8, about the rules for the + operator:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
Thus, all of this applies to your case: it works exactly as you wrote, and you don't need any special constructions, dereferencing or anything else (pointer arithmetic does that for you):
pn[1] => *((pn)+(1))
Or, in terms of byte pointers (to simplify the description of what is going on), this operation is similar to:
pn[1] => *(((char*)pn) + (1*sizeof(*pn)))
Moreover, you can access this element with 1[pn] and the result will be the same.
You should not. The rules for what happens when you add an int to a pointer are not obvious, so it is better not to rely on intuition but to read the language standard about what happens in such cases. For example, read more about pointer arithmetic in a C or C++ reference.
In short, non-void pointers are "measured" in units of the pointed-to type's size.

How does unary addition on C pointers work?

I know that the unary operator ++ adds one to a number. However, I find that if I do it on an int pointer, it increments by 4 (the sizeof an int on my system). Why does it do this? For example, the following code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *a = malloc(5 * sizeof(int));
a[0] = 42;
a[1] = 42;
a[2] = 42;
a[3] = 42;
a[4] = 42;
printf("%p\n", (void *)a);
printf("%p\n", (void *)++a);
printf("%p\n", (void *)++a);
return 0;
}
will print three addresses with a difference of 4 between each.
It's just the way C is - the full explanation is in the spec, Section 6.5.6 Additive operators, paragraph 8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
To relate that to your use of the prefix ++ operator, you need to also read Section 6.5.3.1 Prefix increment and decrement operators, paragraph 2:
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).
And also Section 6.5.16.2 Compound assignment, paragraph 3:
A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.
It's incrementing the pointer location by the size of int, the declared type of the pointer.
Remember, an int * is just a pointer to a location in memory, where you are saying an "int" is stored. When you ++ to the pointer, it shifts it one location (by the size of the type), in this case, it will make your value "4" higher, since sizeof(int)==4.
The reason for this is to make the following statement true:
*(ptr + n) == ptr[n]
These can be used interchangeably.
In pointer arithmetic, adding one to a pointer will add the sizeof the type which it points to.
so for a given:
TYPE * p;
Adding to p will actually increment by sizeof(TYPE). In this case the size of the int is 4.
Because in "C" pointer arithmetic is always scaled by the size of the object being pointed to. If you think about it a bit, it turns out to be "the right thing to do".
It does this so that you don't start accessing an integer in the middle of it.
Because a pointer is not a reference ;). It's not a value, it's just an address in memory. When you check the pointer's value, it will be a number, possibly big, and unrelated to the actual value that's stored at that memory position. Say, printf("%p\n", a); prints "2000000" - this means your pointer points to the 2000000th byte in your machine's memory. It's pretty much unaware of what value is stored there.
Now, the pointer knows what type it points to. An integer, in your case. Since an integer is 4 bytes long, when you want to jump to the next "cell" the pointer points to, it needs to be 2000004. That's exactly 1 integer farther, so a++ makes perfect sense.
BTW, if you want to get 42 (from your example), print out the value pointed to: printf("%d\n", *a);
I hope this makes sense ;)
That's simple: for a pointer, in your case an int pointer, a unary increment means INCREMENT THE MEMORY LOCATION BY ONE UNIT, where ONE UNIT = SIZE OF INTEGER.
The size of an int depends on the compiler and platform: it is typically 2 bytes on 16-bit compilers and 4 bytes on most 32-bit and 64-bit compilers.
Try the same program with the char data type; it will give a difference of 1 byte, as a char takes 1 byte.
In short, the difference of 4 that you've come across is the SIZE OF ONE INTEGER in memory.
Hope this helped; if it didn't, I'll be glad to help, just let me know.
"Why does it do this?" Why would you expect it to do anything else? Incrementing a pointer makes it point to the next item of the type that it's a pointer to.

Resources