I have the following code in C:
int arr[] = {1,7,4,2,5,8};
int x = (&(arr[arr[1] - arr[4]]) - arr);
When I run this code, then x = 2.
But if I run this:
int arr[] = {1,7,4,2,5,8};
int a = &(arr[arr[1] - arr[4]]);
int b = arr;
int x = a-b;
Then x = 8.
Why do I get different values?
In your case 8 is equal to 2 * sizeof( int ).
The first code snippet uses pointer arithmetic, while the second code snippet uses ordinary arithmetic on integers.
In this expression
&(arr[arr[1] - arr[4]]) - arr
you deal with pointers. The two addresses ( &(arr[arr[1] - arr[4]]) and arr ) are two array elements apart, and the first code snippet shows how many elements lie between them. However, the memory those two elements occupy is 8 bytes, and the second code snippet shows that.
Consider a simple example to make this clearer:
int a[2];
sizeof( a ) is equal to 8, that is, 2 * sizeof( int ),
while sizeof( a ) / sizeof( int ) is equal to 2. It is the same as the value of the expression
( a + 2 ) - a
int b = arr;
This is an error (more precisely, a constraint violation). Since arr is an array expression, it is, in most contexts including this one, implicitly converted to the address of its first element. Thus the expression arr, after the implicit conversion, is of type int*.
The initialization of a is invalid for the same reasons.
There is no implicit conversion from int* to int.
Your compiler may implement such an implicit conversion as an extension (it's actually a very old C feature that's no longer defined), but if it's conforming it must at least issue a warning message.
In your first chunk of code, you're subtracting two int* values that point to locations 2 ints apart. Pointer arithmetic is defined in terms of array indices.
In your second (invalid) chunk of code, if the compiler permits you to store an int* value in an int object, it will probably store the raw memory address. Depending on the addressing scheme used by your system, and depending on the size of int (probably 4 bytes on your system), it's likely that the subtraction effectively computes the difference in bytes between the two pointers.
If you see warnings when you compile this code, you should pay attention to them, and you should definitely mention them when asking about it. If you don't see warnings, you should find out how to encourage your compiler to warn about bad code.
If you really want to compute the distance in bytes between two addresses, you can convert both pointers to char*:
int arr[] = {1,7,4,2,5,8};
int *p1 = &arr[0];
int *p2 = &arr[2];
char *cp1 = (char*)p1;
char *cp2 = (char*)p2;
printf("p2 - p1 = %d\n", (int)(p2 - p1));
printf("cp2 - cp1 = %d\n", (int)(cp2 - cp1));
The first line should print 2; the second should print the value of 2 * sizeof (int).
In the first one, arr is used as a pointer (the array decays to a pointer to its first element). In the second one, b is an int.
In the first one, the subtraction results in 2, because the difference is two spots in the array.
In the second one, the subtraction results in 8, because each spot in the array takes up 4 bytes of memory. 4*2 = 8.
When you do arithmetic with pointers, the result is counted in units of the pointed-to type (here, array elements).
When you do arithmetic with integers, the result is the literal number of bytes difference.
So this expression comes out to 4:
int a[] = {1,2,3,4,5}, i= 3, b,c,d;
int *p = &i, *q = a;
char *format = "\n%d\n%d\n%d\n%d\n%d";
printf("%ld",(long unsigned)(q+1) - (long unsigned)q);
I have to explain it in my homework and I have no idea why it's coming out to that value. I see (long unsigned) casting q+1, and then we subtract the value of whatever q is pointing at as a long unsigned and I assumed we would be left with 1. Why is this not the case?
Because q is a pointer the expression q+1 employs pointer arithmetic. This means that q+1 points to one element after q, not one byte after q.
The type of q is int *, meaning it points to an int. The size of an int on your platform is most likely 4 bytes, so adding 1 to an int * actually adds 4 to the raw pointer value so that it points to the next int in the array.
Try printing the parts of the expression and it becomes a bit clearer what is going on.
printf("%p\n",(void *)(q+1));
printf("%p\n",(void *)q);
printf("%lu\n",(long unsigned)(q+1));
printf("%lu\n",(long unsigned)q);
It becomes clearer that q is a pointer to the zeroth element of a, and q+1 is a pointer to the next element of a. ints are 4 bytes on my machine (and presumably on yours), so the two addresses are four bytes apart. Casting the pointers to unsigned values has no effect on my machine, so printing out the difference between the two gives a value of 4.
0x7fff70c3d1a4
0x7fff70c3d1a0
140735085269412
140735085269408
It's because sizeof(int) is 4.
This is an esoteric corner of C that is usually best avoided.
(If it doesn't make sense yet, add some temporary variables).
BTW, the printf format string is incorrect. But that's not why it's outputting 4.
#include <stdio.h>
int main() {
int *p = 100;
int *q = 92;
printf("%d\n", p - q); //prints 2
}
Shouldn't the output of above program be 8?
Instead I get 2.
Undefined behavior aside, this is the behavior that you get with pointer arithmetic: when it is legal to subtract pointers, their difference represents the number of data items between the pointers. For int, which uses four bytes on your system, the difference between pointers that are eight bytes apart is (8 / 4), which works out to 2.
Here is a version that has no undefined behavior:
int data[10];
int *p = &data[2];
int *q = &data[0];
// The difference between two pointers computed as pointer difference
ptrdiff_t pdiff = p - q;
intptr_t ip = (intptr_t)((void*)p);
intptr_t iq = (intptr_t)((void*)q);
// The difference between two pointers computed as integer difference
int idiff = ip - iq;
printf("%td %d\n", pdiff, idiff);
This
int *p = 100;
int *q = 92;
is already invalid C. In C you cannot initialize pointers with arbitrary integer values. There's no implicit integer-to-pointer conversion in the language, aside from conversion from null-pointer constant 0. If you need to force a specific integer value into a pointer for some reason, you have to use an explicit cast (e.g. int *p = (int *) 100;).
Even if your code somehow compiles, its behavior is not defined by the C language, which means that there's no "should be" answer here.
Your code is undefined behavior.
You cannot simply subtract two "arbitrary" pointers. Quoting C11, chapter §6.5.6/P9
When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. The size of the result is implementation-defined,
and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. [....]
Also, as mentioned above, if you correctly subtract two pointers, the result would be of type ptrdiff_t and you should use %td to print the result.
That being said, the initialization
int *p = 100;
looks quite wrong itself! To clarify, it does not store a value of 100 in the memory location pointed to by p (question: where does p point?). It attempts to set the pointer variable itself to the integer value 100, which seems to be a constraint violation in itself.
According to the standard (N1570)
When two pointers are subtracted, both shall point to elements of
the same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the two
array elements.
These are integer pointers, sizeof(int) is 4. Pointer arithmetic is done in units of the size of the thing pointed to. Therefore the "raw" difference in bytes is divided by 4. Also, the result is a ptrdiff_t so %d is unlikely to cut it.
But please note, what you are doing is technically undefined behaviour as Sourav points out. It works in the most common environments almost by accident. However, if p and q point into the same array, the behaviour is defined.
int a[100];
int *p = a + 23;
int *q = a + 25;
printf("0x%" PRIXPTR "\n", (uintptr_t)a); // some number
printf("0x%" PRIXPTR "\n", (uintptr_t)p); // some number + 92
printf("0x%" PRIXPTR "\n", (uintptr_t)q); // some number + 100
printf("%td\n", q - p); // 2
Why if I subtract from a pointer another pointer (integer pointers) without typecasting the result will be 1 and not 4 bytes (like it is when I typecast to int both pointers). Example :
int a , b , *p , *q;
p = &b;
q = p + 1; // q = &a;
printf("%d",q - p); // The result will be one .
printf("%d",(int)q - (int)p); // The result will be 4(bytes). The memory address of b minus The memory address of a.
According to the C Standard (6.5.6 Additive operators)
9 When two pointers are subtracted, both shall point to elements of
the same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the two
array elements....
If the two pointers pointed to elements of the same array then as it is said in the quote from the Standard
the result is the difference of the subscripts of the two array
elements
That is you would get the number of elements of the array between these two pointers. It is the result of the so-called pointer arithmetic.
If you subtract addresses stored in the pointers as integer values then you will get the number that corresponds to the arithmetic subtract operation.
Why if I subtract from a pointer another pointer (integer pointers) without typecasting the result will be 1 and not 4 bytes
That's the whole point of the data type that a pointer points to. It's probably easier to look at it in an array context like the one below. The point is that regardless of the underlying data type (here long or double), you can use pointer arithmetic to navigate the array without caring about the exact size of its elements. In other words, (pointer + 1) means point to the next element, regardless of the type.
long l[] = { 10e4, 10e5, 10e6 };
long *pl = l + 1; // point to the 2nd element in the "long" array.
double d[] = { 10e7, 10e8, 10e9 };
double *pd = d + 2; // point to the 3rd element in the "double" array.
Also note in your code:
int a , b , *p , *q;
p = &b;
q = p + 1; // q = &a; <--- NO this is wrong.
The fact that a and b are declared next to each other does not mean that a and b are allocated next to each other in the memory. So q is pointing to the memory address next to that of b - but what is in that address is undefined.
Because the ptrdiff_t from pointer subtraction is calculated relative to the size of the elements pointed to. It's a lot more convenient that way; for one, it tells you how many times you can increment one pointer before you reach the other pointer.
where you have
int a , b , *p , *q;
The compiler can put a and b anywhere. They don't have to even be near each other. Also, when you subtract two int pointers, the result is sized in terms of int, not bytes.
C is not assembly language. So pointers are not just plain integers -- pointers are special guys that know how to point to other things.
It's fundamental to the way pointers and pointer arithmetic work in C that they can point to successive elements of an array. So if we write
int a[10];
int *p1 = &a[4];
int *p2 = &a[3];
then p1 - p2 will be 1. The result is 1 because the "distance" between a[3] and a[4] is one int. The result is 1 because 4 - 3 = 1. The result is not 4 (as you might have thought it would be if you know that ints are 32 bits on your machine) because we're not interested in doing assembly language programming or working with machine addresses; we're doing higher-level language programming with an array, and we're thinking in those terms.
(But, yes, at the machine address level, the way p2 - p1 is computed is typically as (<raw address value in p2> - <raw address value in p1>) / sizeof(int).)
I get most of pointer arithmetic, until I saw the following:
int x[5];
sizeof(x) // equals 20
sizeof(&x) // equals 4 -- sizeof(int))
So far I give this the semantic meaning of:
pointer to N-element array of T -- in the case of &x
However when doing x+1 we increment with sizeof(int) and when we do &x+1 we increment with sizeof(x).
Is there some underlying logic to this i.e. some equivalences, because this feels very unintuitive.
/edit, thanks to #WhozCraig I came to the conclusion that I made an error:
sizeof(&x) // equals 4 -- sizeof(int))
Should be
sizeof(&x) // equals 8 -- sizeof(int))
Lesson learned: Don't post code you haven't run directly
x is of type int[5], so &x is a pointer to an array of five ints. When you add 1 to &x, you advance to the next array of 5 elements.
It's called typed pointer math (or typed-pointer arithmetic), and it becomes intuitive once you get one thing ingrained in your DNA: pointer math adjusts addresses based on the type of the pointer that holds the address.
In your example, what is the type of x? It is an array of int. But what is the type of the expression x? Hmmm. According to the standard, the value of the expression x is the address of the first element of the array, and its type is pointer-to-element-type, in this case pointer-to-int.
The same standard dictates that for any data variable (functions are a little odd), applying the & operator results in an address with a type of pointer-to-type, the type being whatever the type of the variable is:
For example, given
int a;
the expression &a results in an address whose type is int *. Similarly,
struct foo { int x, y, z; } s;
the expression &s results in an address whose type is struct foo *.
And now, the point of probable confusion, given:
int x[5];
the expression &x results in an address whose type is int (*)[5], i.e. a pointer to an array of five int. This is markedly different from simply x, which, per the standard, evaluates to an address whose type is a pointer to the underlying array element type.
Why does it matter? Because all pointer arithmetic is based on that fundamental type of the expression address. Adjustments therein using typed pointer math are reliant on that fundamental concept.
int x[5];
x + 1
is effectively doing this:
int x[5];
int *p = x;
p + 1 // results in address of next int
Whereas:
&x + 1
is effectively doing this:
int x[5];
int (*p)[5] = &x;
p + 1 // results in address of next int[5]
// (which, not coincidentally, there isn't one)
Regarding the sizeof() difference, once again those pesky types come home to roost. In particular, it is important to note that sizeof is a compile-time operator, not a run-time one:
int x[5];
size_t n = sizeof(x);
In the above, sizeof(x) equates to sizeof(type-of x). Since x is int[5] and int is apparently 4 bytes on your system, the result is 20. Similarly,
int x[5];
size_t n = sizeof(*x);
results in sizeof(type-of *x) being assigned to n. Because *x is of type int, this is synonymous with sizeof(int). The compile-time aspects, incidentally, make the following equally valid, though admittedly it looks a little dangerous at first glance:
int *p = NULL;
size_t n = sizeof(*p);
Just as before, sizeof(type-of *p) equates to sizeof(int)
But what about:
int x[5];
size_t n = sizeof(&x);
Here again, sizeof(&x) equates to sizeof(type-of &x). But we just covered what type &x is: it's int (*)[5]. I.e. it's a data pointer type, and as such, its size will be the size of a pointer. On your rig, you apparently have 32-bit pointers, since the reported size is 4.
As an example of how &x is a pointer type, and that indeed all data pointer types result in a similar size, I close with the following:
#include <stdio.h>
int main()
{
int x[5];
double y[5];
struct foo { char data[1024]; } z[5];
printf("%zu, %zu, %zu\n", sizeof(x[0]), sizeof(x), sizeof(&x));
printf("%zu, %zu, %zu\n", sizeof(y[0]), sizeof(y), sizeof(&y));
printf("%zu, %zu, %zu\n", sizeof(z[0]), sizeof(z), sizeof(&z));
return 0;
}
Output (Mac OSX 64bit)
4, 20, 8
8, 40, 8
1024, 5120, 8
Note that the last size reported on each line is identical.
You said "I get most of pointer arithmetic, until I saw the following:"
int x[5];
sizeof(x) // equals 20
sizeof(&x) // equals 4 -- sizeof(int))
Investigating the first sizeof...
if (sizeof(int) == 4) is true {
then the size of five tightly packed ints is 5 * 4
so the result of sizeof(int[5]) is 20.
}
However...
if (sizeof(int) == 4) is true {
then when the size of the memory holding the value of another memory address is also 4,
i.e. when (sizeof(int (*)[5]) == 4) {
it is a coincidence that memory addresses conveniently fit
into the same amount of memory as an int.
}
}
Don't be fooled, memory addresses have traditionally been the same size as int on some very popular platforms, but if you ever believe that they are the same size, you will prevent your code from running on many platforms.
To further drive the point home
it is true that (sizeof(char[4]) == 4),
but that does not mean that a `char[4]` is a memory address.
Now, in C, the offset operator for memory addresses "knows" the offset based on the type of pointer, char, int, or the implied address size. When you add to a pointer, the addition is translated by the compiler to an operation that looks more like this
addressValue += sizeof(addressType)*offsetCount
where
&x + 1
becomes
(raw address of x) + sizeof(x)*1
Note that if you really want to have (some very unsafe programming) fun, you can cast your pointer type unsafely and specify offsets that really "don't work" the way they should.
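int x[5] = {1, 2, 3, 4, 5};
int *xPtr = x;
char *xAsCharPtr = (char *) xPtr;
int mixed;
memcpy(&mixed, xAsCharPtr + 2, sizeof mixed); /* needs <string.h>; avoids a misaligned read */
printf("%d", mixed);
will print out a number made up of half the bytes of x[0] and half the bytes of x[1] (assuming 4-byte ints).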
It seems implicit conversion is at play, thanks to the excellent answer in some other pointer arithmetic question, I think it boils down to:
When x is used as an expression it can be read as &x[0] due to implicit conversion, so adding 1 to it intuitively gives &x[1]. With sizeof(x) the implicit conversion does not occur, giving the total size of the object x. Arithmetic with &x+1 also makes sense when considering that &x is a pointer to a 5-element array.
The thing that is less intuitive is sizeof(&x): one might expect it to also be the size of x, yet it is the size of a pointer, because &x has type int (*)[5].
Consider a struct with two members of integer type. I want to get both members by address. I can successfully get the first, but I'm getting wrong value with the second. I believe that is garbage value. Here's my code:
#include <stdio.h>
typedef struct { int a; int b; } foo_t;
int main(int argc, char **argv)
{
foo_t f;
f.a = 2;
f.b = 4;
int a = ((int)(*(int*) &f));
int b = ((int)(*(((int*)(&f + sizeof(int))))));
printf("%d ..%d\n", a, b);
return 0;
}
I'm getting:
2 ..1
Can someone explain where I've gone wrong?
The offset of the first member must always be zero by the C standard; that's why your first cast works. The offset of the second member, however, may not necessarily be equal to the size of the first member of the structure because of padding.
Moreover, adding a number to a pointer does not add that number of bytes to the address: instead, the number is multiplied by the size of the thing being pointed to. So when you add sizeof(int) to &f, sizeof(int)*sizeof(foo_t) bytes get added.
You can use the offsetof macro (from <stddef.h>) if you want to do the math yourself, like this
int b = *((int*)(((char*)&f)+offsetof(foo_t, b)));
The reason this works is that sizeof(char) is always one, as required by the C standard.
Of course you can use &f.b to avoid doing the math manually.
Your problem is &f + sizeof(int). If A is a pointer, and B is an integer, then A + B does not, in C, mean A plus B bytes. Rather, it means A plus B elements, where the size of an element is defined by the pointer type of A. Therefore, if sizeof(int) is 4 on your architecture, then &f + sizeof(int) means "four foo_ts into &f, or 4 * 8 = 32 bytes into &f".
Try ((char *)&f) + sizeof(int) instead.
Or, of course, &f.a and &f.b instead, quite simply. The latter will not only give you handy int pointers anyway and relieve you of all those casts, but also be well-defined and understandable. :)
The expression &f + sizeof(int) adds a number to a pointer of type foo_t*. Pointer arithmetic always assumes the pointer is to an element of an array, and the number is treated as a count of elements.
So if sizeof(int) is 4, &f + sizeof(int) points four foo_t structs past f, or 4*sizeof(foo_t) bytes after &f.
If you must use byte counts, something like this might work:
int b = *(int*)((char*)(&f) + sizeof(int));
... assuming there's no padding between members a and b.
But is there any reason you don't just get the value f.b or the pointer &f.b?
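This works (uintptr_t comes from <stdint.h>; casting the pointer to plain int can truncate the address on 64-bit systems):
int b = *(int*)((uintptr_t)&f + sizeof(int));
You are treating &f as a pointer, and when you add 4 (the size of int) to it, the compiler treats that as advancing by 4 foo_t objects, which is 4 * sizeof(foo_t) bytes (32 bytes when foo_t is 8 bytes). If you convert the pointer to an integer first, the addition adds exactly 4 to the address.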
This is an explanation of what is going on. I'm not sure what you are doing, but you most certainly should not be doing it like that. This can get you in trouble with alignment etc. The safe way to find the offset of a struct member is the offsetof macro from <stddef.h>:
printf("%zu\n", offsetof(foo_t, b));
You can do &f.b, and skip the actual pointer math.
Here is how I would change your int b line:
int b = (int)(*(((int*)(&(f.b)))));
BTW, when I run your program as is, I get 2 ..0 as the output.