Consider a struct with two members of integer type. I want to read both members through the struct's address. I can get the first one, but the second comes out wrong; I believe it is a garbage value. Here's my code:
#include <stdio.h>
typedef struct { int a; int b; } foo_t;
int main(int argc, char **argv)
{
foo_t f;
f.a = 2;
f.b = 4;
int a = ((int)(*(int*) &f));
int b = ((int)(*(((int*)(&f + sizeof(int))))));
printf("%d ..%d\n", a, b);
return 0;
}
I'm getting:
2 ..1
Can someone explain where I've gone wrong?
The offset of the first member must always be zero by the C standard; that's why your first cast works. The offset of the second member, however, may not necessarily be equal to the size of the first member of the structure because of padding.
Moreover, adding a number to a pointer does not add that many bytes to the address; it adds that many multiples of the size of the pointed-to type. So when you add sizeof(int) to &f, the address moves by sizeof(int)*sizeof(foo_t) bytes.
You can use the offsetof macro (from <stddef.h>) if you want to do the math yourself, like this:
int b = *((int*)(((char*)&f)+offsetof(foo_t, b)));
The reason this works is that sizeof(char) is always one, as required by the C standard.
Of course you can use &f.b to avoid doing the math manually.
Your problem is &f + sizeof(int). If A is a pointer, and B is an integer, then A + B does not, in C, mean A plus B bytes. Rather, it means A plus B elements, where the size of an element is defined by the pointer type of A. Therefore, if sizeof(int) is 4 on your architecture, then &f + sizeof(int) means "four foo_ts into &f, or 4 * 8 = 32 bytes into &f".
Try ((char *)&f) + sizeof(int) instead.
Or, of course, &f.a and &f.b instead, quite simply. The latter will not only give you handy int pointers anyway and relieve you of all those casts, but also be well-defined and understandable. :)
The expression &f + sizeof(int) adds a number to a pointer of type foo_t*. Pointer arithmetic always assumes the pointer is to an element of an array, and the number is treated as a count of elements.
So if sizeof(int) is 4, &f + sizeof(int) points four foo_t structs past f, or 4*sizeof(foo_t) bytes after &f.
If you must use byte counts, something like this might work:
int b = *(int*)((char*)(&f) + sizeof(int));
... assuming there's no padding between members a and b.
But is there any reason you don't just get the value f.b or the pointer &f.b?
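This works on targets where a pointer fits in an int (typically 32-bit ones):
int b = ((int)(*(int*)((int)&f + sizeof(int))));
&f is a pointer to foo_t, so when you add 4 (the size of an int) to it, the compiler advances by four foo_t objects, that is 4 * sizeof(foo_t) bytes, not 4 bytes. If you cast the pointer to an integer first, the addition is ordinary integer arithmetic and adds exactly 4. Be aware that int may be too narrow to hold a pointer on 64-bit platforms; a cast to char* or uintptr_t is the portable choice.
This is an explanation of what is going on. I'm not sure what you are doing, but you most certainly should not be doing it like that. This can get you in trouble with alignment etc. The traditional trick for finding the offset of a struct member is &(((foo_t *)NULL)->b), but the standard does not guarantee that it is valid; the safe way is the offsetof macro from <stddef.h>:
printf("%zu\n", offsetof(foo_t, b));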
You can do &f.b, and skip the actual pointer math.
Here is how I would change your int b line:
int b = (int)(*(((int*)(&(f.b)))));
BTW, when I run your program as is, I get 2 ..0 as the output.
#include<stdio.h>
int main(void){
int *ptr,a,b;
a = ptr;
b = ptr + 1;
printf("the vale of a,b is %x and %x respectively",a,b);
int c,d;
c = 0xff;
d = c + 1;
printf("the value of c d are %x and %x respectively",c,d);
return 0;
}
The output is:
the vale of a,b is 57550c90 and 57550c94 respectively
the value of c d are ff and 100 respectively
It turns out that ptr + 1 actually adds 4 to the address. Why does it behave this way?
Because pointers are designed to be compatible with arrays:
*(pointer + offset)
is equivalent to
pointer[offset]
So pointer arithmetic doesn't work in terms of bytes, but in units of sizeof(pointed-to type) bytes.
Consider what a pointer is... it's a memory address. Every byte in memory has an address. So, if you have an int that's 4 bytes and its address is 1000, 1001 is actually the 2nd byte of that int and 1002 is the third byte and 1003 is the fourth. Since the size of an int might vary from compiler to compiler, it is imperative that when you increment your pointer you don't get the address of some middle point in the int. So, the job of figuring out how many bytes to skip, based on your data type, is handled for you and you can just use whatever value you get and not worry about it.
As Basile Starynkvitch points out, this amount will vary depending on the sizeof property of the data member pointed to. It's very easy to forget that even though addresses are sequential, the pointers of your objects need to take into account the actual memory space required to house those objects.
Pointer arithmetic is a tricky subject. Adding to a pointer means moving on to a later element of the pointed-to type, so the address is incremented by a multiple of the size of the pointed-to element.
Short answer
The address of the pointer will be incremented by sizeof(T) where T is the type pointed to. So for an int, the pointer will be incremented by sizeof(int).
Why?
Well first and foremost, the standard requires it. The reason this behaviour is useful (other than for compatibility with C) is because when you have a data structure which uses contiguous memory, like an array or an std::vector, you can move to the next item in the array by simply adding one to the pointer. If you want to move to the nth item in the container, you just add n.
Being able to write firstAddress + 2 is far simpler than firstAddress + (sizeof(T) * 2), and helps prevent bugs arising from developers assuming sizeof(int) is 4 (it might not be) and writing code like firstAddress + (4 * 2).
In fact, when you say myArray[4], you're saying *(myArray + 4). This is the reason that array indices start at 0; you just add 0 to get the first element (i.e. myArray points to the first element of the array) and n to get the element at index n.
What if I want to move one byte at a time?
sizeof(char) is guaranteed to be 1, so you can use a char* if you really want to move one byte at a time.
A pointer is used to point to a specific byte of memory marking where an object has been allocated (technically it can point anywhere, but that's how it's used). When you do pointer arithmetic, it operates based on the size of the objects pointed to. In your case, it's a pointer to int, and an int is 4 bytes on your platform.
Consider a pointer p. The expression p+n is like (unsigned char *)p + n * sizeof *p (because sizeof(unsigned char) == 1).
Try this :
#include <stdio.h>
#define N 3
int
main(void)
{
int i;
int *p = &i;
printf("%p\n", (void *)p);
printf("%p\n", (void *)(p + N));
printf("%p\n", (void *)((unsigned char *)p + N * sizeof *p));
return 0;
}
I get most of pointer arithmetic, until I saw the following:
int x[5];
sizeof(x) // equals 20
sizeof(&x) // equals 4 -- sizeof(int))
So far I give this the semantic meaning of:
pointer to N-element array of T -- in the case of &x
However when doing x+1 we increment with sizeof(int) and when we do &x+1 we increment with sizeof(x).
Is there some underlying logic to this i.e. some equivalences, because this feels very unintuitive.
/edit, thanks to @WhozCraig I came to the conclusion that I made an error:
sizeof(&x) // equals 4 -- sizeof(int))
Should be
sizeof(&x) // equals 8 -- sizeof(int))
Lesson learned: Don't post code you haven't run directly
x is of type int[5], so &x is a pointer to an array of five ints. When you add 1 to &x, you advance to the next array of 5 elements.
It's called typed pointer math (or typed pointer arithmetic), and it is intuitive once you get one thing ingrained in your DNA: pointer math adjusts addresses based on the type of the pointer that holds said address.
In your example, what is the type of x? It is an array of int. But what is the type of the expression x? Hmmm. According to the standard, in most contexts (the operands of sizeof and unary & being the notable exceptions) the value of the expression x is the address of the first element of the array, and its type is pointer-to-element-type, in this case pointer-to-int.
The same standard dictates that for any data variable (functions are a little odd), applying the & operator results in an address with a type of pointer-to-type, the type being whatever the type of the variable is.
For example, given
int a;
the expression &a results in an address whose type is int *. Similarly, given
struct foo { int x,y,z; } s;
the expression &s results in an address whose type is struct foo *.
And now, the point of probable confusion, given:
int x[5];
the expression &x results in an address whose type is int (*)[5], i.e. a pointer to an array of five int. This is markedly different from simply x, which, per the standard, evaluates to an address whose type is a pointer to the underlying array element type.
Why does it matter? Because all pointer arithmetic is based on that fundamental type of the expression address. Adjustments therein using typed pointer math are reliant on that fundamental concept.
int x[5];
x + 1
is effectively doing this:
int x[5];
int *p = x;
p + 1 // results in address of next int
Whereas:
&x + 1
is effectively doing this:
int x[5];
int (*p)[5] = &x;
p + 1 // results in address of next int[5]
// (which, not coincidentally, there isn't one)
Regarding the sizeof() difference, once again those pesky types come home to roost. In particular, it is important to note that sizeof is a compile-time operator, not a run-time one (variable-length arrays being the one exception):
int x[5];
size_t n = sizeof(x);
In the above, sizeof(x) equates to sizeof(type-of x). Since x is int[5] and int is apparently 4 bytes on your system, the result is 20. Similarly,
int x[5];
size_t n = sizeof(*x);
results in sizeof(type-of *x) being assigned to n. Because *x is of type int, this is synonymous with sizeof(int). The compile-time aspects, incidentally, make the following equally valid, though admittedly it looks a little dangerous at first glance:
int *p = NULL;
size_t n = sizeof(*p);
Just as before, sizeof(type-of *p) equates to sizeof(int)
But what about:
int x[5];
size_t n = sizeof(&x);
Here again, sizeof(&x) equates to sizeof(type-of &x). But we just covered what type &x is: it's int (*)[5]. That is a data pointer type, and as such its size will be the size of a pointer. On your rig you apparently have 32-bit pointers, since the reported size is 4.
To show that &x is a pointer type, and that on a typical platform all data pointer types have the same size, I close with the following example:
#include <stdio.h>
int main()
{
int x[5];
double y[5];
struct foo { char data[1024]; } z[5];
printf("%zu, %zu, %zu\n", sizeof(x[0]), sizeof(x), sizeof(&x));
printf("%zu, %zu, %zu\n", sizeof(y[0]), sizeof(y), sizeof(&y));
printf("%zu, %zu, %zu\n", sizeof(z[0]), sizeof(z), sizeof(&z));
return 0;
}
Output (Mac OSX 64bit)
4, 20, 8
8, 40, 8
1024, 5120, 8
Note that the last value reported on each line is identical.
You said "I get most of pointer arithmetic, until I saw the following:"
int x[5];
sizeof(x) // equals 20
sizeof(&x) // equals 4 -- sizeof(int))
Investigating the first sizeof...
If sizeof(int) == 4, then the size of five tightly packed ints is 5 * 4, so sizeof(int[5]) is 20.
However...
If sizeof(int) == 4, and the memory that holds the address of another object also happens to be 4 bytes, i.e. sizeof(int (*)[5]) == 4, then it is a coincidence that memory addresses conveniently fit into the same amount of memory as an int.
Don't be fooled, memory addresses have traditionally been the same size as int on some very popular platforms, but if you ever believe that they are the same size, you will prevent your code from running on many platforms.
To further drive the point home
it is true that (sizeof(char[4]) == 4),
but that does not mean that a `char[4]` is a memory address.
Now, in C, the addition operator for pointers "knows" the step size from the pointer's type: a char, an int, or, for &x, the whole array. When you add to a pointer, the compiler translates the addition into an operation that looks more like this
addressValue += sizeof(pointedToType)*offsetCount
where
&x + 1
becomes
(address of x) + sizeof(x)*1
Note that if you really want to have (some very unsafe programming) fun, you can cast your pointer type unsafely and specify offsets that really "don't work" the way they should.
int x[5] = {1, 2, 3, 4, 5};
int *xPtr = x;
char *xAsCharPtr = (char *) xPtr;
int straddled;
memcpy(&straddled, xAsCharPtr + 2, sizeof straddled); /* needs <string.h>; avoids a misaligned int read */
printf("%d", straddled);
will print out a number made up of roughly half the bytes of x[0] and half the bytes of x[1] (assuming a 4-byte int).
It seems implicit conversion is at play. Thanks to an excellent answer to another pointer arithmetic question, I think it boils down to this:
When x is used as an expression, it can be read as &x[0] due to implicit conversion, so adding 1 to it intuitively gives &x[1]. With sizeof(x) the implicit conversion does not occur, which gives the total size of the object x. Arithmetic with &x+1 also makes sense when you consider that &x is a pointer to a 5-element array.
The thing that is not intuitive at first is sizeof(&x): one might expect it to be the size of x, yet it is the size of a pointer (to a 5-element array), which only happens to equal sizeof(int) on some platforms.
Below is the program to find the size of a structure without using sizeof operator:
struct MyStruct
{
int i;
int j;
};
int main()
{
struct MyStruct *p=0;
int size = ((char*)(p+1))-((char*)p);
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
return 0;
}
Why is typecasting to char * required?
If I don't use the char* pointer, the output is 1 - why?
Because pointer arithmetic works in units of the type pointed to. For example:
int* p_num = malloc(10 * sizeof(int));
int* p_num2 = p_num + 5;
Here, p_num2 does not point five bytes beyond p_num, it points five integers beyond p_num. If on your machine an integer is four bytes wide, the address stored in p_num2 will be twenty bytes beyond that stored in p_num. The reason for this is mainly so that pointers can be indexed like arrays. p_num[5] is exactly equivalent to *(p_num + 5), so it wouldn't make sense for pointer arithmetic to always work in bytes, otherwise p_num[5] would give you some data that started in the middle of the second integer, rather than giving you the sixth integer as you would expect.
In order to move a specific number of bytes beyond a pointer, you need to cast the pointer to point to a type that is guaranteed to be exactly 1 byte wide (a char).
Also, you have an error here:
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
You have two format specifiers but only one argument after the format string.
If I don't use the char* pointer, the output is 1 - WHY?
Because the - operator obeys the same pointer arithmetic rules that the + operator does. Adding one to the pointer advanced the address by sizeof(MyStruct) bytes, but without the cast, subtracting the two pointers divides that byte difference by sizeof(MyStruct) again, which gives 1.
Why not use the built in sizeof() operator?
Because you want the size of your struct in bytes, and pointer arithmetic implicitly scales by the size of the pointed-to type.
int* p;
p + 5; // the address advances by 5 * sizeof(int) bytes
By casting to char* you circumvent this behavior.
Pointer arithmetic is defined in terms of the size of the type the pointer points to. This is what allows (for example) the equivalence between pointer arithmetic and array subscripting: *(ptr+n) is equivalent to ptr[n]. When you subtract two pointers, you get the difference as the number of elements between them. The cast to pointer to char means that it tells you the number of chars between those addresses. Since C makes char and byte essentially equivalent (i.e. a byte is the storage necessary for one char), that's also the number of bytes occupied by the first item.