There is a lot of code like this:
#include <stdio.h>

int main(void)
{
    int a[2][2] = {{0, 1}, {2, -1}};
    int *p = &a[0][0];
    while (*p != -1) {
        printf("%d\n", *p);
        p++;
    }
    return 0;
}
But based on this answer, the behavior is undefined.
N1570. 6.5.6 p8:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover,
if the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary
* operator that is evaluated.
Can someone explain this in detail?
The array whose base address (a pointer to its first element) is assigned to p is of type int[2]. This means the address in p can legally be dereferenced only at *p and *(p+1), or, if you prefer subscript notation, p[0] and p[1]. Furthermore, p+2 is guaranteed to be legally evaluated as an address, and to be comparable to other addresses in that sequence, but it cannot be dereferenced. This is the one-past address.
The code you posted violates the one-past rule by dereferencing p once it passes the last element in the array in which it is homed. That the array in which it is homed is buttressed up against another array of similar dimension is not relevant to the formal definition cited.
That said, in practice it works, but as is often said, observed behavior is not, and should never be considered, defined behavior. Just because it works doesn't make it right.
The object representation of pointers is opaque, in C. There is no prohibition against pointers having bounds information encoded. That's one possibility to keep in mind.
More practically, implementations are also able to achieve certain optimizations based on assumptions which are asserted by rules like these: Aliasing.
Then there's the protection of programmers from accidents.
Consider the following code, inside a function body:
struct {
char c;
int i;
} foo;
char * cp1 = (char *) &foo;
char * cp2 = &foo.c;
Given this, cp1 and cp2 will compare as equal, but their bounds are nonetheless different. cp1 can point to any byte of foo and even to "one past" foo, but cp2 can only point to "one past" foo.c, at most, if we wish to maintain defined behaviour.
In this example, there might be padding between the foo.c and foo.i members. While the first byte of that padding coincides with "one past" the foo.c member, cp2 + 2 might point further into that padding. The implementation can notice this during translation and, instead of producing a program, it can advise you that you might be doing something you didn't think you were doing.
By contrast, if you read the initializer for the cp1 pointer, it intuitively suggests that it can access any byte of the foo structure, including padding.
In summary, this can produce undefined behaviour during translation (a warning or error) or during program execution (by encoding bounds information); there's no difference, standard-wise: The behaviour is undefined.
You can cast your pointer into a pointer to array to ensure the correct array semantics.
This code is indeed not defined but provided as a C extension in every compiler in common usage today.
However the correct way of doing it would be to cast the pointer into a pointer to array as so:
((int (*)[2])p)[0][0]
to get the zeroth element or say:
((int (*)[2])p)[1][1]
to get the last.
To be strict, the reason I think this is illegal is that you are breaking strict aliasing: pointers to different types may not point to the same address (variable).
In this case you are creating a pointer to an array of ints and a pointer to an int and pointing them to the same value. This is not allowed by the standard, as the only type that may alias another pointer is a char *, and even this is rarely used properly.
I was playing around with some arrays and pointers in c and started wondering whether doing this would be undefined behavior.
int (*arr)[5] = malloc(sizeof(int[5][5]));
// Is this undefined behavior?
int val0 = arr[0][5];
// Rephrased, is it guaranteed it'll always have the same effect as this line?
int val1 = arr[1][0];
Thank you for any insights.
In C, what you're doing is undefined behavior.
The expression arr[0] has type int [5]. So the expression arr[0][5] dereferences one element past the end of the array arr[0], and dereferencing past the end of an array is undefined behavior.
Section 6.5.2.1p2 of the C standard regarding Array Subscripting states:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
And section 6.5.6p8 of the C standard regarding Additive Operators states:
When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n)
point to, respectively, the i+n-th and i−n-th elements of the
array object, provided they exist. Moreover, if the
expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array
object, and if the expression Q points one past the last
element of an array object, the expression (Q)-1 points to the
last element of the array object. If both the pointer operand
and the result point to elements of the same array object,
or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined.
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is
evaluated.
The last two sentences specify that the addition implicit in an array subscript may not result in a pointer more than one element past the end of an array, and that a pointer to one element past the end of an array may not be dereferenced.
The fact that the array in question is itself a member of an array, meaning the elements of each subarray are contiguous in memory, doesn't change this. Aggressive optimization settings in the compiler may note that it is undefined behavior to access past the end of the array and make optimizations based on this fact.
The Standard is clearly intended to avoid requiring that a compiler given something like:
int foo[5][10];
int test(int i)
{
foo[1][0] = 1;
foo[0][i] = 2;
return foo[1][0];
}
must reload the value of foo[1][0] to accommodate the possibility that the write to foo[0][i] might affect foo[1][0]. On the other hand, before the Standard was written, it would have been idiomatic to write something like:
void dump_array(int *p, int rows, int cols)
{
int i,j;
for (i=0; i<rows; i++)
{
for (j=0; j<cols; j++)
printf("%6d", *p++);
printf("\n");
}
}
int foo[5][10];
...
dump_array(foo[0], 5, 10);
and nothing in the published Rationale suggests that the authors had any intention of forbidding such constructs nor breaking code that used them. Indeed, the primary benefit of requiring that rows of an array be placed consecutively, even when adding padding would improve efficiency, is to allow such code to function.
At the time the Standard was written, when generating code for a function that received a pointer, compilers would treat the pointer as though it might identify some arbitrary part of some arbitrary larger object, without making any effort to know or care about what that enclosing object might be. They would thus, as a very popular form of "conforming language extension", support constructs like dump_array without regard for whether the Standard required them to do so, and consequently the authors of the Standard saw no reason to worry about when the Standard mandated such support. Instead, they left such matters as a Quality of Implementation issue over which the Standard could waive jurisdiction.
Unfortunately, because the authors of the Standard expected that compilers would treat the act of passing a pointer to a function as implicitly "laundering" it, the authors of the Standard saw no need to define any explicit method for laundering information about a pointer's enclosing objects in cases where it would be necessary for a function to treat a pointer identifying "raw" storage. Such distinctions didn't matter given the state of compiler technology in the 1980s, but may be quite relevant if e.g. code does something like:
int matrix[10][10];
void test2(int c)
{
matrix[4][0] = 1;
dump_array(matrix[0], 1, c);
matrix[4][0] = 2;
}
or
void test3(int r)
{
matrix[4][0] = 1;
dump_array((int*)matrix, r, 10);
matrix[4][0] = 2;
}
Depending upon what the function is intended to do, having a compiler optimize out the first write to matrix[4][0] in one or both may improve efficiency, or it may cause the generated code to behave uselessly. Treating explicit pointer conversions as erasing type information, but treating array-to-pointer decay as retaining it, would allow programmers to achieve required semantics if they write code as in the second example, while allowing compilers to perform the relevant optimizations when source code is written as in the first example. Unfortunately, the Standard makes no distinctions, and maintainers of free compilers are loath to forgo any "optimizations" they view the Standard as giving them, leaving the language with nothing but "hope for the best" semantics except on implementations that either refrain from cross-procedural optimizations or document what needs to be done to block them.
Consider below code snippet :
int *p;
/* Lets say p points to address 100
and sizeof(int) is 4 bytes. */
int *q = p+1;
unsigned long r = q-p;
/* r results in 1, hence for r = q-p
something is happening similar to r=(104-100)/4 */
Is there a real division by sizeof(datatype) going on at runtime when two pointers of the same type are subtracted, or is there some other mechanism through which pointer subtraction works?
The C standard states the following regarding pointer subtraction (section 6.5.6p9):
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the
two array elements. The size of the result is
implementation-defined, and its type (a signed integer type) is
ptrdiff_t defined in the <stddef.h> header. If the result is not
representable in an object of that type, the behavior is
undefined. In other words, if the expressions P and Q point to,
respectively, the i-th and j-th elements of an array object, the
expression (P)-(Q) has the value i−j provided the value fits in an
object of type ptrdiff_t. Moreover, if the expression P points either
to an element of an array object or one past the last element of an
array object, and the expression Q points to the last element of the
same array object, the expression ((Q)+1)-(P) has the same value as
((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the
expression P points one past the last element of the array object,
even though the expression (Q)+1 does not point to an element of the
array object. 106)
Footnote 106 states:
Another way to approach pointer arithmetic is first to convert the
pointer(s) to character pointer(s): In this scheme the integer
expression added to or subtracted from the converted pointer is first
multiplied by the size of the object originally pointed to,
and the resulting pointer is converted back to the original
type. For pointer subtraction, the result of the difference
between the character pointers is similarly divided by the size of
the object originally pointed to. When viewed in this way, an
implementation need only provide one extra byte (which may
overlap another object in the program) just after the end of the
object in order to satisfy the "one past the last element"
requirements.
So the footnote states that pointer subtraction may be implemented by subtracting the raw pointer values and dividing by the size of the pointed-to object. It doesn't have to be implemented this way, however.
Note also that the standard requires that pointer subtraction is performed between pointers pointing to elements of the same array object (or one element past the end). If they don't then the behavior is undefined. In practice, if you're working on a system with a flat memory model you'll probably still get the "expected" values but you can't depend on that.
See @dbush's answer for the explanation of how pointer subtraction works.
If, instead, you are programming something low-level, say a kernel, driver, debugger or similar and you need to have actual subtraction of addresses, cast the pointers to char *:
(char *)q - (char *)p
The result will be of type ptrdiff_t, an implementation-defined signed integer type.
Of course, this is not defined/portable C, but will work on most architectures/environments.
What happens if I increment a NULL pointer in C?
#include <stdio.h>
typedef struct
{
int x;
int y;
int z;
}st;
int main(void)
{
st *ptr = NULL;
ptr++; //Incrementing null pointer
printf("%d\n", (int)ptr);
return 0;
}
Output:
12
Is it undefined behavior? If not, why?
The behaviour is always undefined. You can never own the memory at NULL.
Pointer arithmetic is only valid within arrays, and you can set a pointer to an index of the array or one location beyond the final element. Note I'm talking about setting a pointer here, not dereferencing it.
You can also set a pointer to a scalar and one past that scalar.
You can't use pointer arithmetic to traverse other memory that you own.
Yes, it causes undefined behavior.
Any operator needs a "valid" operand, a NULL is not one for the post increment operator.
Quoting C11, chapter §6.5.2.4
The result of the postfix ++ operator is the value of the operand. As a side effect, the
value of the operand object is incremented (that is, the value 1 of the appropriate type is
added to it). [....]
and related to additive operators, §6.5.6
For addition, either both operands shall have arithmetic type, or one operand shall be a
pointer to a complete object type and the other shall have integer type. (Incrementing is
equivalent to adding 1.)
then, P7,
[...] a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
and, P8,
If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. [....] If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
I think ptr will point to the second array member (as if there were one) of struct st. That's what ptr++ does. Initially the pointer was at 0, or NULL. Now it is at 12 (3 * sizeof(int) = 3 * 4 = 12).
In your example you didn't dereference the pointer; you just printed out the address it points to. When you step a pointer, it is incremented by the size of its referenced type. Just try:
printf("Test: %zu", sizeof(st));
And you will get Test: 12 as output. If you dereferenced it, as in *ptr, it would cause undefined behavior.
Looks like GCC with some optimization thinks two pointers from different translation units can never be the same, even if they actually are the same.
Code:
main.c
#include <stdint.h>
#include <stdio.h>
int a __attribute__((section("test")));
extern int b;
void check(int cond) { puts(cond ? "TRUE" : "FALSE"); }
int main() {
int * p = &a + 1;
check(
(p == &b)
==
((uintptr_t)p == (uintptr_t)&b)
);
check(p == &b);
check((uintptr_t)p == (uintptr_t)&b);
return 0;
}
b.c
int b __attribute__((section("test")));
If I compile it with -O0, it prints
TRUE
TRUE
TRUE
But with -O1
FALSE
FALSE
TRUE
So p and &b are actually the same value, but the compiler optimized out their comparison assuming they can never be equal.
I can't figure out which optimization caused this.
It doesn't look like strict aliasing, because the pointers are of one type, and the -fstrict-aliasing option has no effect on this.
Is this the documented behavior? Or is this a bug?
There are three aspects in your code which result in general problems:
Conversion of a pointer to an integer is implementation-defined. There is no guarantee that converting two equal pointers yields integers with all bits identical.
uintptr_t is guaranteed to convert from a pointer to the same type then back unchanged (i.e. compare equal to the original pointer). But nothing more. The integer values themselves are not guaranteed to compare equal. E.g. there could be unused bits with arbitrary value. See the standard, 7.20.1.4.
And (briefly) two pointers can only compare equal if they point into the same array or right behind it (last entry plus one) or at least one is a null pointer. For any other constellation, they compare unequal. For the exact details, see the standard, 6.5.9p6.
Finally, there is no guarantee how variables are placed in memory by the toolchain (typically the linker for static variables, the compiler for automatic variables). Only an array or a struct (i.e. composite types) guarantee the ordering of its elements.
For your example, 6.5.9p7 also applies. For the purposes of comparison, it basically treats a pointer to a non-array object like one to the first entry of an array of size 1. This does not cover an incremented pointer past the object like &a + 1. What is relevant is the object the pointer is based on: object a for pointer p, and b for pointer &b. The rest can be found in paragraph 6.
None of your variables is an array (last part of paragraph 6), so the pointers need not compare equal, even for &a + 1 == &b. The last "TRUE" might arise from gcc assuming the uintptr_t comparison returning true.
gcc is known to aggressively optimise while strictly following the standard. Other compilers are more conservative, but that results in less optimised code. Please don't try "solving" this by disabling optimisation or other hacks, but fix it using well-defined behaviour. It is a bug in the code.
p == &b is a pointer comparison and is subject to the following rules from the C Standard (6.5.9 Equality operators, point 6):
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
(uintptr_t)p == (uintptr_t)&b is an arithmetic comparison and is subject to the following rules (6.5.9 Equality operators, point 4):
If both of the operands have arithmetic type, the usual arithmetic conversions are performed. Values of complex types are equal if and only if both their real parts are equal and also their imaginary parts are equal. Any two values of arithmetic types from different type domains are equal if and only if the results of their conversions to the (complex) result type determined by the usual arithmetic conversions are equal.
These two excerpts require very different things from the implementation. And it is clear that the C specification places no requirement on an implementation to mimic the behavior of the former kind of comparison in cases where the latter kind is invoked and vice versa. The implementation is only required to follow this rule (7.18.1.4 Integer types capable of holding object pointers in C99 or 7.20.1.4 in C11):
The [uintptr_t] type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
(Addendum: The above quote isn't applicable in this case, because the conversion from int* to uintptr_t does not involve void* as an intermediate step. See Hadi's answer for an explanation and citation on this. Still, the conversion in question is implementation-defined and the two comparisons you are attempting are not required to exhibit the same behavior, which is the main takeaway here.)
As an example of the difference, consider two pointers that point at the same address of two different address spaces. Comparing them as pointers shouldn't return true, but comparing them as unsigned integers might.
&a + 1 is an integer added to a pointer, which is subject to the following rules (6.5.6 Additive operators, point 8):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
I believe that this excerpt shows that pointer addition (and subtraction) is defined only for pointers within the same array object or one past the last element. And because (1) a is not an array and (2) a and b aren't members of the same array object, it seems to me that your pointer math operation invokes undefined behavior and your compiler takes advantage of it to assume that the pointer comparison returns false. Again as pointed out in Hadi's answer (and in contrast to what my original answer assumed at this point), pointers to non-array objects can be considered pointers to array objects of length one, and thus adding one to your pointer to the scalar does qualify as pointing to one past the end of the array.
Therefore your case seems to fall under the last part of the first excerpt mentioned in this answer, making your comparison well-defined to evaluate to true if and only if the two variables are linked in sequence and in ascending order. Whether this is true for your program is left unspecified by the standard and it's up to the implementation.
While one of the answers has already been accepted, the accepted answer (and all other answers for that matter) are critically wrong as I'll explain and then answer the question. I'll be quoting from the same C standard, namely n1570.
Let's start with &a + 1. In contrast to what @Theodoros and @Peter have stated, this expression has defined behavior. To see this, consider section 6.5.6 "Additive operators", paragraph 7, which states:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 8 (in particular, the emphasized part):
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
The expression (uintptr_t)p == (uintptr_t)&b has two parts. The conversion from a pointer to a uintptr_t is NOT defined by section 7.20.1.4 (in contrast to what @Olaf and @Theodoros have said):
The following type designates an unsigned integer type with the
property that any valid pointer to void can be converted to this type,
then converted back to pointer to void, and the result will compare
equal to the original pointer:
uintptr_t
It's important to recognize that this rule applies only to valid pointers to void. However, in this case, we have a valid pointer to int. A relevant paragraph can be found in section 6.3.2.3 paragraph 1:
A pointer to void may be converted to or from a pointer to any object
type. A pointer to any object type may be converted to a pointer to
void and back again; the result shall compare equal to the original
pointer.
This means that (uintptr_t)(void*)p is allowed according to this paragraph and 7.20.1.4. But (uintptr_t)p and (uintptr_t)&b are ruled by section 6.3.2.3 paragraph 6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
Note that uintptr_t is an integer type as stated in section 7.20.1.4 mentioned above and therefore this rule applies.
The second part of (uintptr_t)p == (uintptr_t)&b is comparing for equality. As previously discussed, since the result of conversion is implementation-defined, the result of equality is also implementation defined. This applies irrespective of whether the pointers themselves are equal or not.
Now I'll discuss p == &b. The third point in @Olaf's answer is wrong and @Theodoros's answer is incomplete regarding this expression. Section 6.5.9 "Equality operators" paragraph 7:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
and paragraph 6:
Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to
one past the last element of the same array object, or one is a
pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address space.
In contrast to what @Olaf has said, comparing pointers using the == operator never results in undefined behavior (which may occur only when using relational operators such as <= according to section 6.5.8 paragraph 5, which I'll omit here for brevity). Now since p points to the next int relative to a, it will be equal to &b only when the linker has placed b in that location in the binary. Otherwise, they are unequal. So this is implementation-dependent (the relative order of a and b is unspecified by the standard). Since the declarations of a and b use a language extension, namely __attribute__((section("test"))), the relative locations are indeed implementation-dependent by J.5 and 3.4.2 (omitted for brevity).
We conclude that the results of check(p == &b) and check((uintptr_t)p == (uintptr_t)&b) are implementation-dependent. So the answer depends on which version of which compiler you are using. I'm using gcc 4.8 and by compiling with default options except for the level of optimization, the output I get in both -O0 and -O1 cases is all TRUE.
According to C11 6.5.9/6 and C11 6.5.9/7, the test p == &b must give 1 if a and b are adjacent in the address space.
Your example shows that GCC appears to not fulfill this requirement of the Standard.
Update 26/Apr/2016: My original answer contained suggestions about modifying the code to remove other potential sources of UB and isolate this one condition.
However, it's since come to light that the issues raised by this thread are under review - N2012.
One of their recommendations is that p == &b should be unspecified, and they acknowledge that GCC does in fact not implement the ISO C11 requirement.
So I have removed the remaining text from my answer, as it is no longer necessary to prove a "compiler bug", since the non-conformance (whether you want to call it a bug or not) has been established.
Re-reading your program I see that you are (understandably) baffled by the fact that in the optimized version
p == &b
is false, while
(uintptr_t)p == (uintptr_t)&b;
is true. The last line indicates that the numerical values are indeed identical; how can p == &b then be false??
I must admit that I have no idea. I am convinced that it is a gcc bug.
After a discussion with M.M I think I can make the following case if the conversion to uintptr_t goes through an intermediate void pointer (you should include that in your program and see whether it changes anything):
Because both steps in the conversion chain int* -> void* -> uintptr_t are guaranteed to be reversible, unequal int pointers can logically not result in equal uintptr_t values.1 (Those equal uintptr_t values would have to convert back to equal int pointers, altering at least one of them and thus violating the value-preserving conversion rule.) In code (I'm not aiming for equality here, just demonstrating the conversions and comparisons):
int a,b, *ap=&a, *bp = &b;
assert(ap != bp);
void *avp = ap, *bvp = bp;
uintptr_t ua = (uintptr_t)avp, ub = (uintptr_t)bvp;
// Now the following holds:
// if ap != bp then *necessarily* ua != ub.
// This is violated by the OP's case (sans the void* step).
assert((int *)(void *)ua == (int*)(void*)ub);
1 This assumes that the uintptr_t doesn't carry hidden information in the form of padding bits which are not evaluated in an arithmetic comparison but possibly in a type conversion. One can check that through CHAR_BIT, UINTPTR_MAX, sizeof(uintptr_t) and some bit fiddling.
For a similar reason it's conceivable that two uintptr_t values compare different but convert back to the same pointer (namely if there are bits in uintptr_t not used for storing a pointer value, and the conversion does not zero them). But that is the opposite of the OP's problem.
I know that the unary operator ++ adds one to a number. However, I find that if I do it on an int pointer, it increments by 4 (the sizeof an int on my system). Why does it do this? For example, the following code:
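#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *a = malloc(5 * sizeof(int));
    if (a == NULL)
        return 1;
    a[0] = 42;
    a[1] = 42;
    a[2] = 42;
    a[3] = 42;
    a[4] = 42;
    /* %p expects a void *, so cast the int * explicitly */
    printf("%p\n", (void *)a);
    printf("%p\n", (void *)++a);
    printf("%p\n", (void *)++a);
    return 0;
}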
will print three addresses, each 4 higher than the previous one.
It's just the way C is - the full explanation is in the spec, Section 6.5.6 Additive operators, paragraph 8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
To relate that to your use of the prefix ++ operator, you need to also read Section 6.5.3.1 Prefix increment and decrement operators, paragraph 2:
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).
And also Section 6.5.16.2 Compound assignment, paragraph 3:
A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.
It's incrementing the pointer location by the size of int, the type the pointer is declared to point to.
Remember, an int * is just a pointer to a location in memory, where you are saying an "int" is stored. When you apply ++ to the pointer, it shifts it one location (by the size of the type); in this case, it will make your value "4" higher, since sizeof(int)==4.
The reason for this is to make the following statement true:
*(ptr + n) == ptr[n]
These can be used interchangeably.
In pointer arithmetic, adding one to a pointer advances it by the size of the type it points to.
So for a given:
TYPE * p;
Adding 1 to p will actually advance the address by sizeof(TYPE) bytes. In this case the size of int is 4.
Because in "C" pointer arithmetic is always scaled by the size of the object being pointed to. If you think about it a bit, it turns out to be "the right thing to do".
It does this so that you don't start accessing an integer in the middle of it.
Because a pointer is not a reference ;). It's not a value, it's just an address in memory. When you check the pointer's value, it will be a number, possibly big, and unrelated to the actual value that's stored at that memory position. Say, printf("%p\n", a); prints "2000000" - this means your pointer points to the 2000000th byte in your machine's memory. It's pretty much unaware of what value is stored there.
Now, the pointer knows what type it points to. An integer, in your case. Since an integer is 4 bytes long, when you want to jump to the next "cell" the pointer points to, it needs to be 2000004. That's exactly 1 integer farther, so a++ makes perfect sense.
BTW, if you want to get 42 (from your example), print out the value pointed to: printf("%d\n", *a);
I hope this makes sense ;)
That's simple: for a pointer, in your case an integer pointer, a unary increment means INCREMENT THE MEMORY LOCATION BY ONE UNIT, where ONE UNIT = SIZE OF INTEGER.
The size of an integer varies from compiler to compiler: it is typically 2 bytes on 16-bit compilers and 4 bytes on most 32-bit and 64-bit compilers.
Try the same program with a char pointer; it will give a difference of 1 byte, as a char takes 1 byte.
In short, the difference of 4 that you've come across is the SIZE OF ONE INTEGER in memory.
Hope this helped; if it didn't, I'll be glad to help, just let me know.
"Why does it do this?" Why would you expect it to do anything else? Incrementing a pointer makes it point to the next item of the type that it's a pointer to.