I wonder what determines whether the start of a memory object is at a lower or higher address than the end of the object. For example:
char buffer[10];
char* p = &buffer[0];
printf("%p\n", (void *)p); //0x7fff064a6276
p = &buffer[9];
printf("%p\n", (void *)p); //0x7fff064a627f
In this example the start of the object is at a lower address than the end, even though the stack grows towards lower addresses.
Why does the layout go in the reverse direction of the stack growth?
What defines this direction? Language? OS? Compiler? CPU architecture? ...
Is it always the case that the end of the object is at a higher address than the beginning?
One part of the standard that is relevant is in §6.3.2.3 Pointers (under §6.3 Conversions):
¶7 … When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
Another relevant portion is §6.7.2.1 Structure and union specifiers:
¶15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
The definition of addition (and subtraction) is partly relevant (§6.5.6 Additive operators):
¶8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
¶9 is a similar paragraph defining the behaviour of subtraction.
And then there's §6.5.2.1 Array subscripting:
¶2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
From these, you know that the address of an object converted to a char * must point to the lowest byte address holding the object. In practice, this means that the 'object pointer' address of the object also points to the lowest address. The rule in no way enforces that the data in an int type must be little-endian or big-endian; both are valid.
You also know that the first element in a structure is at a lower address within the structure than later elements.
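Both guarantees can be checked directly. The following is only a sketch: struct sample and the two function names are made up for illustration and do not come from the standard.

```c
#include <stddef.h>

/* Hypothetical structure used only for this illustration. */
struct sample {
    char tag;
    int value;
    double weight;
};

/* Returns 1 if the member offsets increase in declaration order
   and the first member sits at offset 0 (no leading padding). */
int members_are_ordered(void)
{
    return offsetof(struct sample, tag) == 0
        && offsetof(struct sample, tag) < offsetof(struct sample, value)
        && offsetof(struct sample, value) < offsetof(struct sample, weight);
}

/* Returns 1 if the object's address, converted to char *, is the
   address of its first member, i.e. the lowest addressed byte. */
int object_starts_at_first_member(void)
{
    struct sample s = { 'a', 1, 2.0 };
    return (char *)&s == (char *)&s.tag;
}
```

Both functions should return 1 on any conforming implementation; how much padding sits between the members is up to the compiler.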
Most compilers will allocate space on the stack for all of the local variables in one block, with the start of the arrays at the lowest address going upwards.
You need to go into a deeper subroutine to see the addresses "going down".
For example, call another subroutine which also has a local buffer. You will find its memory 'lower' in the address space (as the stack has grown) than the local array in the parent routine.
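A rough way to observe both effects is sketched below. The function names are invented for this example. Only the ordering within one array is guaranteed by the standard; whether deeper calls print lower addresses depends on the platform and on compiler optimisations (inlining or tail-call elimination can change the picture).

```c
#include <stdio.h>

/* Returns 1 when the last element of a local array has a higher address
   than the first. The standard guarantees this for every array. */
int array_end_is_higher(void)
{
    char buffer[10];
    return &buffer[9] > &buffer[0];
}

/* Prints where a local buffer lives at each call depth. On a typical
   downward-growing stack the deeper calls report lower addresses, but
   that is an implementation detail, not a language rule. */
void print_frame_addresses(int depth)
{
    char buffer[10];
    buffer[0] = 0;
    printf("depth %d: buffer starts at %p, ends at %p\n",
           depth, (void *)&buffer[0], (void *)&buffer[9]);
    if (depth > 0)
        print_frame_addresses(depth - 1);
}
```

Calling print_frame_addresses(2) from main prints three lines; within each line the end address is higher than the start address, whatever happens between lines.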
Decrementing a pointer to the first element of an array is undefined behaviour as of C17. This answer cites the C17 standard, saying
C17 6.5.6/8
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Is this the earliest standard where this was defined? How was such an operation defined in earlier standards? Was it legal before?
The very earliest C standard, C89, has the same rule in 3.3.6 Additive Operators:
When an expression that has integral type is added to or subtracted from a pointer, the integral value is first multiplied by the size of the object pointed to. The result has the type of the pointer operand. If the pointer operand points to a member of an array object, and the array object is large enough, the result points to a member of the same array object, appropriately offset from the original member. Thus if P points to a member of an array object, the expression P+1 points to the next member of the array object. Unless both the pointer operand and the result point to a member of the same array object, or one past the last member of the array object, the behavior is undefined. Unless both the pointer operand and the result point to a member of the same array object, or the pointer operand points one past the last member of an array object and the result points to a member of the same array object, the behavior is undefined if the result is used as the operand of a unary * operator.
I don't believe that forming pointers to the "-1" element of an array has ever been well-defined C. Of course there might have been specific implementations where it happened to work, or was documented to do so.
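In practice this mostly matters when walking an array backwards. A minimal sketch of a loop that never forms the "-1" pointer (sum_backwards is a name made up for this example) starts at the one-past-the-end pointer, which is valid to form, and decrements before each dereference:

```c
#include <stddef.h>

/* Sums the n elements of a, visiting them from last to first. */
long sum_backwards(const int *a, size_t n)
{
    long sum = 0;
    const int *p = a + n;   /* one past the end: valid to form, not to dereference */
    while (p != a) {
        --p;                /* p now points at an element of the array */
        sum += *p;
    }
    return sum;
}
```

The loop stops as soon as p equals a, so a - 1 is never computed, in contrast to the common pattern for (p = a + n - 1; p >= a; --p), whose final decrement has undefined behaviour.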
Does the C standard require pointers to be (integer) numbers?
One may argue that yes, because of pointer arithmetic...
But on the other hand operations like -- or ++ may be understood as previous memory location, next memory location, depending on how they are described in the standard, and actual implementation may use any representation to hold pointer data (as long as mentioned operations are implemented)...
Another question comes to mind - does C require arrays/buffers etc. to be contiguous, i.e. next element is stored in next memory location (++p where p is a pointer)? I ask because you can often see implementations online that seem to assume that it does.
No, pointers need not be plain numbers.
If you read the standard, there are provisions for that:
Two pointers to unrelated objects (meaning not part of a bigger object, remember structs and arrays) may not be compared, except for equality.
6.5.8 Relational operators
[...]
5 When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript
values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Two pointers to unrelated objects may not be subtracted.
6.5.6 Additive operators
[...]
9 When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.
There may not be a way to represent a pointer as a number, as no suitable type might exist. Thus, trying to convert might result in Undefined Behavior.
Any specific implementation defining a behavior does not mean it isn't UB according to the standard.
6.3.2.3 Pointers
[...]
6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
7.18.1.4 Integer types capable of holding object pointers
1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
These types are optional.
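A minimal sketch of the round trip that 7.18.1.4 describes, assuming the implementation actually provides the optional uintptr_t (the function name is invented for this example):

```c
#include <stdint.h>

/* Converts a pointer to uintptr_t and back, and reports whether the
   result compares equal to the original. Assumes uintptr_t exists;
   the standard makes the type optional. */
int round_trips_through_uintptr(void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* the integer value is implementation-defined */
    void *back = (void *)bits;
    return back == p;
}
```

The only thing guaranteed is that the round trip yields an equal pointer; the standard says nothing about what the intermediate integer looks like or whether arithmetic on it is meaningful.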
That's just off the top of my head, I'm sure there's more.
All quotes from n1256 (C99 draft).
Arrays have always been required to be contiguous.
To answer your second question: the elements of an array are stored in contiguous memory locations. That's why you can use pointer arithmetic to move between elements.
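Contiguity can be made visible by measuring the distance between neighbouring elements in bytes. This is only an illustrative sketch, and stride_in_bytes is a hypothetical helper:

```c
#include <stddef.h>

/* Distance in bytes between element i and element i + 1 of an int array.
   The caller must ensure that i is a valid index of a. */
size_t stride_in_bytes(const int *a, size_t i)
{
    const char *lo = (const char *)&a[i];
    const char *hi = (const char *)&a[i + 1];   /* may be one past the end */
    return (size_t)(hi - lo);
}
```

For every valid index the result equals sizeof(int), which means there is no gap between consecutive elements.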
I'm probably misunderstanding this, but does the c99 spec prevent any form of pointer arithmetic on dynamically allocated memory?
From 6.5.6p7...
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
... a pointer to an object not in an array is treated as if it points into an array of 1 item (when using the operators + and -). Then in this snippet:
char *make_array (void) {
char *p = malloc(2*sizeof(*p));
p[0] = 1; // valid
p[1] = 2; // invalid ?
return p;
}
...the second subscript p[1] is invalid? Since p points to an object not in an array it is treated as pointing to an object in an array of one item and then from 6.5.6p8...
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
...we have undefined behaviour since we dereference past the array bound (the one implied to have length 1).
Edit:
OK, to try to clarify more what confuses me, let's do it step-by-step:
1.) p[1] is defined to mean *(p+1).
2.) p points to an object that isn't inside of an array, so it's treated as if it points to an object inside an array of length 1 for the purpose of evaluating p+1.
3.) p+1 produces a pointer 1 past the array that p is implied to point into.
4.) *(p+1) does the invalid dereference.
From C99, 7.20.3 - Memory management functions (emphasis mine) :
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
This implies that the allocated memory can be accessed as an array of char (as per your example), and so pointer arithmetic is well defined.
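As a sketch of what that wording permits, the question's make_array can be generalised to n elements. The parameter n, the loop and the NULL check are additions for this example and not part of the original snippet:

```c
#include <stdlib.h>

/* Allocates n chars and fills them with 1, 2, ..., n.
   Returns NULL if the allocation fails. The caller frees the result. */
char *make_array(size_t n)
{
    char *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (char)(i + 1);   /* every index below n lies inside the allocated space */
    return p;
}
```

With n equal to 2 this performs exactly the two stores from the question, p[0] = 1 and p[1] = 2, and both are valid.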
Last line yields "invalid operands to binary expression". Trying to understand why. Does it mean that "p2-p1" is an invalid operand to the binary expression "-" that lies to the right of p3? Any rule I can follow here? Confusing to me because "3-2-1" integers are valid.
int array[3] = {1,2,3};
int* p1 = &array[0];
int* p2 = &array[1];
int* p3 = &array[2];
p3-p2-p1;
You are doing address arithmetic. Because the - operator groups from left to right, p3-p2-p1 is evaluated as (p3-p2)-p1. p3-p2 yields not an address but an integer (of type ptrdiff_t). You are then attempting to subtract an address from an integer, which isn't valid. You could write p3-(p2-p1) instead: p2-p1 yields an integer, which is then subtracted as an integer offset from an address (p3), and that will compile. However, [Thanks to #EOF for this reference in his comment] subtracting an integer from a pointer is only defined if the result points to an element of the same array that p3 points into, or one past its last element. This is governed by section 6.5.6 of the C11 standard, excerpted below:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
In your code, p1, p2 and p3 are all pointers to integers, not integers.
To get what you want, you probably want:
*p3 - *p2 - *p1;
where the * operator is the dereference operator. It dereferences pointers, so in this case *p3 etc are of type int. You can think of it as the inverse of the & address-of operator.
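The variants that do compile can be compared side by side. This is a sketch built around the array from the question; the function names are invented for illustration:

```c
#include <stddef.h>

/* pointer - pointer yields an integer of type ptrdiff_t. */
ptrdiff_t element_distance(const int *hi, const int *lo)
{
    return hi - lo;
}

/* pointer - integer yields a pointer again. */
int value_after_regrouping(void)
{
    int array[3] = {1, 2, 3};
    int *p1 = &array[0];
    int *p2 = &array[1];
    int *p3 = &array[2];
    int *q = p3 - (p2 - p1);   /* p2 - p1 is 1, so q is &array[1] */
    return *q;
}

/* Dereferencing first turns the expression into plain int arithmetic. */
int difference_of_values(void)
{
    int array[3] = {1, 2, 3};
    int *p1 = &array[0];
    int *p2 = &array[1];
    int *p3 = &array[2];
    return *p3 - *p2 - *p1;    /* 3 - 2 - 1 */
}
```

value_after_regrouping returns 2 and difference_of_values returns 0, which is the "3-2-1" result the question expected.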
Let's say I want to allocate memory for 3 integers:
int *pn = malloc(3 * sizeof(*pn));
Now, to assign values to them, I do:
pn[0] = 5550;
pn[1] = 11;
pn[2] = 70000;
To access 2nd value I do:
pn[1]
But a[n] is just shorthand for *(a+n), which would seem to mean that I access the first byte after a. An int is 4 bytes long, so shouldn't I write
*(a+sizeof(*a)*n)
instead? How does it work?
No, the compiler takes care of that. There are special rules in pointer arithmetic, and that is one of them.
If you really only want to increment it by one byte, you have to cast the pointer to a pointer to a type which is one byte long (for example char).
Good question, but C will automatically multiply the offset by the size of the pointed-to type. In other words, when you access
p[n]
for a pointer declared as
T *p;
you will implicitly access the address that lies sizeof(T) * n bytes past p, i.e. (char *)p + sizeof(T) * n.
For instance, we can use the C99 standard to find out what is going on. According to the C99 standard:
6.5.2.1 Array subscripting
Constraints
- 1 One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Semantics
- 2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
And from 6.5.6 Additive operators, paragraph 8, on how the binary + operator treats a pointer operand:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
Thus, both passages apply to your case: it works exactly as you wrote it, and you don't need any special construction, dereferencing or anything else (pointer arithmetic does that for you):
pn[1] => *((pn)+(1))
Or, in terms of byte pointers (to simplify the description of what is going on), this operation is similar to:
pn[1] => *(((char*)pn) + (1*sizeof(*pn)))
Moreover, you can access this element with 1[pn] and the result will be the same.
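The equivalence of these spellings can be checked with a small sketch. It uses a local array instead of malloc to stay short, and the function name is made up for this example:

```c
/* Reads the second element in four different ways and reports
   whether all of them yield the stored value 11. */
int all_spellings_agree(void)
{
    int pn[3] = {5550, 11, 70000};
    int a = pn[1];
    int b = *(pn + 1);                               /* what pn[1] is defined to mean */
    int c = 1[pn];                                   /* + is commutative, so this is the same */
    int d = *(int *)((char *)pn + 1 * sizeof *pn);   /* explicit byte arithmetic */
    return a == 11 && b == 11 && c == 11 && d == 11;
}
```

Only the last form mentions sizeof, because only there is the arithmetic done on a char pointer, whose step size is one byte.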
You should not. The rules for what happens when you add an int to a pointer are not obvious, so it is better not to rely on intuition but to read what the language standard says about such cases. For example, read more about pointer arithmetic here (C) or here (C++).
In short: arithmetic on a non-void pointer is "measured" in units of the size of the pointed-to type.