When (void *) p == (void *) *p - What does the Standard say about this? - arrays

Example:
#include <stdio.h>
int main(void) {
int a[99];
int (*p)[99] = &a;
// this prints 1
printf("%d\n", (void *) p == (void *) *p);
}
In general, if p is a pointer to an array, then both the object representations (i.e. the bit patterns) of p and *p are equal.
I'm just lost and completely unsure about the portability of this behaviour.
So, I'm curious whether this behaviour is guaranteed by the Standard. If so, could someone please quote all of the relevant paragraphs that guarantee it?

This comparison is guaranteed to be 1.
The relevant part of the C standard is section 6.5.9p6 regarding the equality operator and the comparison of pointers:
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately
follow the first array object in the address space.
Take particular note of the passage "including a pointer to an object and a subobject at its beginning". This means two things: 1) a pointer to a struct and a pointer to its first member (suitably converted) will compare equal, and 2) a pointer to an array and a pointer to its first element (again, suitably converted) will compare equal.
In your particular case, p points to an array and *p is the array itself, and using *p in an expression yields a pointer to its first element. Both are converted to void * to give them a common type. So this comparison will always evaluate to 1.

In general, if p is a pointer to an array, then both the object representations (i.e. the bit patterns) of p and *p are equal.
If p is a pointer to an array, then *p is the array. The bit representation of the array is the concatenation of the bit representations of the elements of the array (because C 2018 6.2.5 20 says an array is made of contiguously allocated objects). The bits in the array are not generally equal to the bits in the pointer.
However, when an array is used in an expression other than as the operand of unary & or the operand of sizeof or as a string literal used to initialize an array, the array is automatically converted to a pointer to its first element. The first element of the array *p is (*p)[0], so *p is automatically converted to &(*p)[0].
Then the question is whether (void *) p equals (void *) &(*p)[0].
C 2018 6.3.2.3 1 tells us any pointer to an object type may be converted to void *. However, it does not tell us what the results of comparisons are while the pointer is void *. It does tell us that converting the void * back to its original type yields a pointer that compares equal to the original.
C 2018 6.5.9 6 tells us “Two pointers compare equal if and only if …, both are pointers to the same object (including a pointer to an object and a subobject at its beginning)…” (I elided some other cases that are not of concern here.) What are we to make of this given two void *? It seems the intent is for a pointer to “point to an object” even if it is currently in the form of a void *. Then (void *) p points to the array and (void *) &(*p)[0] points to a subobject at its beginning, so they compare equal.
The semantics would be clearer with (char *) p == (char *) *p because C 2018 6.3.2.3 7 tells us that converting to char * produces a pointer to the first byte of an object, and the first byte of an array is the same as the first byte of its first element.

Related

What does this mean: a pointer to void will never be equal to another pointer?

A friend of mine pointed out the second bullet point below from "Understanding and Using C Pointers" by Richard Reese (O'Reilly), and I wasn't able to explain its first sentence. What am I missing?
Pointer to void
A pointer to void is a general-purpose pointer used to hold references to any data type. An example of a pointer to void is shown below:
void *pv;
It has two interesting properties:
A pointer to void will have the same representation and memory alignment as a pointer to char.
A pointer to void will never be equal to another pointer. However, two void pointers assigned a NULL value will be equal.
This is my code (not from the book); all the pointers have the same value and compare equal.
#include <stdio.h>
int main(void)
{
int a = 10;
int *p = &a;
void *p1 = (void*)&a;
void *p2 = (void*)&a;
printf("%p %p\n", p1, p2);
printf("%p\n", (void *)p);   /* %p expects a void * */
if(p == p1)
printf("Equal\n");
if(p1 == p2)
printf("Equal\n");
return 0;
}
Output:
0x7ffe1fbecfec 0x7ffe1fbecfec
0x7ffe1fbecfec
Equal
Equal
TL;DR: the book is wrong.
What am I missing?
Nothing, as far as I can see. Even the erratum version presented in comments ...
A pointer to void will never be equal to another pointer to void.
... simply is not supported by the C language specification. To the extent that the author is relying on the language specification, the relevant text would be paragraph 6.5.9/6:
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow
the first array object in the address space.
void is an object type, albeit an "incomplete" one. Pointers to void that are valid and non-null are pointers to objects, and they compare equal to each other under the conditions expressed by the specification. The usual way that such pointers are obtained is by converting an object pointer of a different (pointer) type to void *. The result of such a conversion still points to the same object that the original pointer did.
My best guess is that the book misinterprets the spec to indicate that pointers to void should not be interpreted as pointers to objects. Although there are special cases that apply only to pointers to void, that does not imply that general provisions applying to object pointers do not also apply to void pointers.
C 2018 6.5.9 6 says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
So, suppose we have:
int a;
void *p0 = &a;
void *p1 = &a;
Then, if p0 and p1 “point to the same object”, p0 == p1 must evaluate as true. However, one might interpret the standard to mean that a void * does not point to anything while it is a void *; it just holds the information necessary to convert it back to its original type. But we can test this interpretation.
Consider the specification that two pointers compare equal if they point to an object and a subobject at its beginning. That means that given int a[1];, &a == &a[0] should evaluate as true. However, we cannot properly use &a == &a[0], because the constraints for == on pointers require that the operands point to compatible types or that one or both is a void * (with qualifiers like const allowed). But a and a[0] neither have compatible types nor are void.
The only way for a fully defined situation to arise in which we are comparing pointers to this object and its subobject is for at least one of the pointers to have been converted either to void * or to a pointer to a character type (because these are given special treatment in conversions). We could interpret the standard to mean only the latter, but I judge the more reasonable interpretation to be that void * is included. The intent is that (void *) &a == (void *) &a[0] is to be interpreted as a comparison of a pointer to the object a to a pointer to the object a[0] even though those pointers are in the form void *. Thus, these two void * should compare as equal.
The following section from this Draft C11 Standard completely refutes the claim made (even with the clarification mentioned in the 'errata', in the comment by GSerg).
6.3.2.3 Pointers
1 A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
Or, this section from the same draft Standard:
7.20.1.4 Integer types capable of holding object pointers
1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t
A pointer is just an address in memory. Any two pointers are equal if they're NULL or if they point to the same address. You can go on and on about how that can happen with the language of structures, unions and so on. But in the end, it's simply just algebra with memory locations.
A pointer to void will never be equal to another pointer. However, two void pointers assigned a NULL value will be equal.
Since NULL is mentioned in that statement, I believe it is a mistype. The statement should be something like
A pointer to void will never be equal to NULL pointer. However, two void pointers assigned a NULL value will be equal.
That means any valid pointer to void is never equal to a null pointer.

How memcpy finds the first byte of the passed object?

Let g be an object designator.
void *p = &g;
char *pf = (char *)p;
1. Every pointer type can be converted to a pointer to void and back, and the result shall compare equal to the original pointer.
2. When a pointer to an object type is cast to a pointer to a character type, the resulting pointer points to the first byte of the object.
3. Pointer to void and pointer to char types are interchangeable.
But consider the code example above. A void pointer doesn't need to point to anything; all it needs to do is satisfy condition number 1. So we can't even say that it points to our object, and if we cast that void pointer to a character pointer, we can't say that the resulting pointer points to the lowest addressed byte of our object.
My question is: if my conclusion is true, how does the memcpy function find the lowest addressed byte of the passed object, since every pointer passed to memcpy is converted to a pointer to void?
The C standard fails to present rules for pointer conversions expressed in formal mathematics or logic. It expresses rules in natural language (English) in clause 6.3.2.3 (in C 2018). While these natural language rules do not explicitly state that a pointer to an object converted first to void * and then to char * yields the same result as converting directly to char *, this is understood. That is, experienced practitioners with C and compilers understand this is the intent.

division during pointer subtraction in C

Consider below code snippet :
int arr[2];
int *p = arr;
/* Let's say p points to address 100
and sizeof(int) is 4 bytes. */
int *q = p+1;
ptrdiff_t r = q-p;   /* ptrdiff_t comes from <stddef.h> */
/* r results in 1, so for r = q-p
something similar to r=(104-100)/4 is happening */
Is there a real division by sizeof(datatype) going on at runtime when two pointers of the same type are subtracted, or is there some other mechanism through which pointer subtraction works?
The C standard states the following regarding pointer subtraction (section 6.5.6p9):
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the
two array elements. The size of the result is
implementation-defined, and its type (a signed integer type) is
ptrdiff_t defined in the <stddef.h> header. If the result is not
representable in an object of that type, the behavior is
undefined. In other words, if the expressions P and Q point to,
respectively, the i-th and j-th elements of an array object, the
expression (P)-(Q) has the value i−j provided the value fits in an
object of type ptrdiff_t. Moreover, if the expression P points either
to an element of an array object or one past the last element of an
array object, and the expression Q points to the last element of the
same array object, the expression ((Q)+1)-(P) has the same value as
((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the
expression P points one past the last element of the array object,
even though the expression (Q)+1 does not point to an element of the
array object. 106)
Footnote 106 states:
Another way to approach pointer arithmetic is first to convert the
pointer(s) to character pointer(s): In this scheme the integer
expression added to or subtracted from the converted pointer is first
multiplied by the size of the object originally pointed to,
and the resulting pointer is converted back to the original
type. For pointer subtraction, the result of the difference
between the character pointers is similarly divided by the size of
the object originally pointed to. When viewed in this way, an
implementation need only provide one extra byte (which may
overlap another object in the program) just after the end of the
object in order to satisfy the "one past the last element"
requirements.
So the footnote states that pointer subtraction may be implemented by subtracting the raw pointer values and dividing by the size of the pointed-to object. It doesn't have to be implemented this way, however.
Note also that the standard requires that pointer subtraction is performed between pointers pointing to elements of the same array object (or one element past the end). If they don't then the behavior is undefined. In practice, if you're working on a system with a flat memory model you'll probably still get the "expected" values but you can't depend on that.
See @dbush's answer for the explanation of how pointer subtraction works.
If, instead, you are programming something low-level, say a kernel, driver, debugger or similar and you need to have actual subtraction of addresses, cast the pointers to char *:
(char *)q - (char *)p
The result will be of ptrdiff_t type, an implementation-defined signed integer type.
Of course, unless both pointers point into the same object, this is not defined/portable C, but it will work on most architectures/environments.

C programming address for 2d array

If I initialized a 2d array let’s say
Int a[2][3] = {
1, 2, 3,
4, 5, 6};
Is a[0] == &a[0]??
I know a[0] refers to the address for the first element of the array. So is &a[0] still the address?
First of all, the type of arrayNum[0] is Int[3] and the type of &arrayNum[0] is Int(*)[3] (I didn't change the OP's Int to the probable int).
Secondly, an array can decay to a pointer to its first element, so arrayNum[0] can decay to &arrayNum[0][0], which is of type Int*.
Both those pointers, &arrayNum[0] and &arrayNum[0][0], will point to the same location, but their types are very different.
I'm not sure what you meant to compare using the == in your question, but let me tell you this: they are not the same.
Data type:
Check the data type.
a[0] is the first element of the array of type int [3].
&a[0] is the pointer to the first element of the array of type int [3], so, it is essentially int (*) [3].
Usage: Depending on the usage, in certain cases (see the Note below) an expression of array type decays to a pointer to its first element. In that case, a[0] becomes &(a[0][0]), and &a[0] points to the same address, so the pointer values are the same even though the types differ.
For a better understanding of the difference, use both a[0] and &a[0] as the operand of the sizeof operator (where the decay does not happen) and print the values using the %zu conversion specifier.
Typically, they will print
12, which is (sizeof (int) * 3) and
8, which is sizeof (int (*) [3])
on a platform where size of an int is 4 and size of a pointer is 8.
[Note]:
Quoting C11, chapter §6.3.2.1
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary& operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue. [....]
There are two senses in which you might ask whether a[0] equals &a[0]:
Do a[0] and &a[0] point to the same place?
and:
Does a[0] == &a[0] evaluate to true?
It is not clear from your question which you mean. Your text asks “Is a[0] == &a[0]?” Since the “==” is not in code format, it is not clear whether you intended to exclude it.
The answer to the first question is yes (given that a[0] is automatically converted to an address), and the answer to the second question is not necessarily.
As other answers and comments have pointed out, a[0] and &a[0] are different things. a is an array of two arrays of three int. So a[0] is an array of three int, and, in most expressions, it is automatically converted to a pointer to its first element. So the result is a pointer to an int, effectively &a[0][0]. In contrast, &a[0] is the address of an array of three int.
So, these expressions point to two different objects, but the two objects start at the same location, so the pointers point to the same “place.” We can see this in:
(char *) a[0] == (char *) &a[0] // Evaluates to true.
When we convert a pointer to a pointer to char, the result points to the first (lowest addressed) byte of the object. Since the two pointers point to the same place, this expression will evaluate to true.
However, when you evaluate a[0] == &a[0], there is a problem. To conform to the C standard, a comparison of pointers must compare pointers to compatible types. But int and array of three int are not compatible types. So this comparison is not strictly conforming C, although some compilers may allow it, likely with a warning message. We can instead evaluate:
a[0] == (int *) &a[0] // Value is not specified.
By converting the pointer on the right to a pointer to int, we make the left and right sides have the same type, and we can compare them. However, the result of the comparison is not defined. This is because, although the C standard allows us to convert a pointer to one type to a pointer to another type, it does not generally guarantee what value results from the conversion, except that, if you convert it back to the original type, then it will compare equal to the original pointer. (Converting to a pointer to a character type is special; for those, the compiler does guarantee the result points to the first byte of the object.)
So, since we do not know what the value of (int *) &a[0] is, we do not know whether comparing it to a[0] will return true or false.
This might seem strange; if one address points to the same place as another address, why wouldn’t they compare equal? On some computers, there is more than one way of referring to the same place in memory. Addresses may actually be formed of combinations of parts, such as base addresses plus offsets. For example, the address (1000, 230), representing 1230, points to the same place as (1200, 30), also representing 1230. But clearly (1000, 230) is not the same as (1200, 30).
When you compare two pointers to the same type, the compiler automatically adjusts the representations of the addresses in whatever way is needed to perform the comparison. But, when you convert a pointer to one type to a pointer to another (non-character) type, the change of types may prevent the compiler from having the information it needs to do this adjustment properly. So the C standard does not tell us what happens in this case.
No they are not the same.
a[0] is an element of type int[3], while &a[0] is a pointer (of type int (*)[3]) to a[0].
Both of them point to the same address (the start of a[0], which is also where its first element is), but they are not the same.

Accesing a 2D array using a single pointer

There are tons of code like this one:
#include <stdio.h>
int main(void)
{
int a[2][2] = {{0, 1}, {2, -1}};
int *p = &a[0][0];
while (*p != -1) {
printf("%d\n", *p);
p++;
}
return 0;
}
But based on this answer, the behavior is undefined.
N1570. 6.5.6 p8:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover,
if the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary
* operator that is evaluated.
Can someone explain this in detail?
The array whose base address (pointer to first element) is assigned to p is of type int[2]. This means the address in p can legally be dereferenced only at locations *p and *(p+1), or if you prefer subscript notation, p[0] and p[1]. Furthermore, p+2 is guaranteed to be legally evaluable as an address, and comparable to other addresses in that sequence, but cannot be dereferenced. This is the one-past address.
The code you posted violates the one-past rule by dereferencing p once it passes the last element in the array in which it is homed. That the array in which it is homed is buttressed up against another array of similar dimension is not relevant to the formal definition cited.
That said, in practice it works, but, as is often said, observed behavior is not, and should never be considered, defined behavior. Just because it works doesn't make it right.
The object representation of pointers is opaque, in C. There is no prohibition against pointers having bounds information encoded. That's one possibility to keep in mind.
More practically, implementations are also able to achieve certain optimizations based on assumptions which are asserted by rules like these: Aliasing.
Then there's the protection of programmers from accidents.
Consider the following code, inside a function body:
struct {
char c;
int i;
} foo;
char * cp1 = (char *) &foo;
char * cp2 = &foo.c;
Given this, cp1 and cp2 will compare as equal, but their bounds are nonetheless different. cp1 can point to any byte of foo and even to "one past" foo, but cp2 can only point to "one past" foo.c, at most, if we wish to maintain defined behaviour.
In this example, there might be padding between the foo.c and foo.i members. While the first byte of that padding coincides with "one past" the foo.c member, cp2 + 2 might point into the other padding. The implementation can notice this during translation and, instead of producing a program, it can advise you that you might be doing something you didn't think you were doing.
By contrast, if you read the initializer for the cp1 pointer, it intuitively suggests that it can access any byte of the foo structure, including padding.
In summary, this can produce undefined behaviour during translation (a warning or error) or during program execution (by encoding bounds information); there's no difference, standard-wise: The behaviour is undefined.
You can cast your pointer into a pointer to an array to ensure the correct array semantics.
This code is indeed not defined but provided as a C extension in every compiler in common usage today.
However, the correct way of doing it would be to cast the pointer into a pointer to an array, like so:
((int (*)[2])p)[0][0]
to get the zeroth element or say:
((int (*)[2])p)[1][1]
to get the last.
To be strict, the reason I think this is illegal is that you are breaking strict aliasing: pointers to different types may not point to the same address (variable).
In this case you are creating a pointer to an array of ints and a pointer to an int and pointing them to the same value, this is not allowed by the standard as the only type that may alias another pointer is a char * and even this is rarely used properly.
