Pointer subtraction confusion - c

When we subtract a pointer from another pointer the difference is not equal to how many bytes they are apart but equal to how many integers (if pointing to integers) they are apart. Why so?

The idea is that you're pointing to blocks of memory
+----+----+----+----+----+----+
| 06 | 07 | 08 | 09 | 10 | 11 | mem
+----+----+----+----+----+----+
| 18 | 24 | 17 | 53 | -7 | 14 | data
+----+----+----+----+----+----+
If you have int* p = &(array[5]) then *p will be 14. Going p=p-3 would make *p be 17.
So if you have int* p = &(array[5]) and int *q = &(array[3]), then p-q should be 2, because the pointers are point to memory that are 2 blocks apart.
When dealing with raw memory (arrays, lists, maps, etc) draw lots of boxes! It really helps!

Because everything in pointer-land is about offsets. When you say:
int array[10];
array[7] = 42;
What you're actually saying in the second line is:
*( &array[0] + 7 ) = 42;
Literally translated as:
* = "what's at"
(
& = "the address of"
array[0] = "the first slot in array"
plus 7
)
set that thing to 42
And if we can add 7 to make the offset point to the right place, we need to be able to have the opposite in place, otherwise we don't have symmetry in our math. If:
&array[0] + 7 == &array[7]
Then, for sanity and symmetry:
&array[7] - &array[0] == 7

So that the answer is the same even on platforms where integers are different lengths.

Say you have an array of 10 integers:
int intArray[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Then you take a pointer to intArray:
int *p = intArray;
Then you increment p:
p++;
What you would expect, because p starts at intArray[0], is for the incremented value of p to be intArray[1]. That's why pointer arithmetic works like that. See the code here.

"When you subtract two pointers, as long as they point into the same array, the result is the number of elements separating them"
Check for more here.

This way pointer subtraction behaves is consistent with the behaviour of pointer addition. It means that p1 + (p2 - p1) == p2 (where p1 and p2 are pointers into the same array).
Pointer addition (adding an integer to a pointer) behaves in a similar way: p1 + 1 gives you the address of the next item in the array, rather than the next byte in the array - which would be a fairly useless and unsafe thing to do.
The language could have been designed so that pointers are added and subtracted the same way as integers, but it would have meant writing pointer arithmetic differently, and having to take into account the size of the type pointed to:
p2 = p1 + n * sizeof(*p1) instead of p2 = p1 + n
n = (p2 - p1) / sizeof(*p1) instead of n = p2 - p1
So the result would be code that is longer, and harder to read, and easier to make mistakes in.

When applying arithmetic operations on pointers of a specific type, you always want the resulting pointer to point to a "valid" (meaning the right step size) memory-address relative to the original starting-point. That is a very comfortable way of accessing data in memory independently from the underlying architecture.
If you want to use a different "step-size" you can always cast the pointer to the desired type:
int a = 5;
int* pointer_int = &a;
double* pointer_double = (double*)pointer_int; /* totally useless in that case, but it works */

#fahad Pointer arithmetic goes by the size of the datatype it points.So when ur pointer is of type int you should expect pointer arithmetic in the size of int(4 bytes).Likewise for a char pointer all operations on the pointer will be in terms of 1 byte.

Related

I am new to C and I have a question about pointers

The following lines of code work as you'd expect
#include <stdio.h>
int main(void)
{
int n;
int a[5];
int *p;
a[2] = 1024;
p = &n;
/*
* write your line of code here...
* Remember:
* - you are not allowed to use a
* - you are not allowed to modify p
* - only one statement
* - you are not allowed to code anything else than this line of code
*/
/* ...so that this prints 98\n */
printf("a[2] = %d\n", a[2]);
return (0);
}
This prints out a[2] = 1024
Now I was asked to modify this code so that a[2] = 98 gets printed instead. There were a ton of constraints. I couldn't use the variable a anywhere else in the code again and a couple other things. I found a solution online but I don't understand it at all.
#include <stdio.h>
int main(void)
{
int n;
int a[5];
int *p;
a[2] = 1024;
p = &n;
/*
* write your line of code here...
* Remember:
* - you are not allowed to use a
* - you are not allowed to modify p
* - only one statement
* - you are not allowed to code anything else than this line of code
*/
p[5] = 98;
/* ...so that this prints 98\n */
printf("a[2] = %d\n", a[2]);
return (0);
}
So, setting p[5] = 98; results in a[2] = 98 being printed, which is the intended result. I'm fairly new to C programming and pointers in general but I have absolutely no idea why this works the way it does.
This is all about the layout of the stack, the memory where the local variables in your method are stored. Variables are pushed onto the stack, so first the variable n is pushed to the stack, let us assume that it ends up at address 1000 (this is just a fictional address). Since it is an integer it takes up the space of an integer (4 bytes if integers are 32 bits). Then you push the array a to the stack. It will be located next to n.
Since n was placed at address 1000 and took up 4 bytes of memory, then a will be placed at address 1004 (1000 + 4). a is an array of 5 integers, each taking up 4 bytes. So a takes up the space from 1004 to 1024.
Your variable p is an integer pointer, and you set it to point to n. That means that p points to the address 1000 which is the address of n. You then write p[5] in C that is equivalent to the expression *(p + 5). Which basically means take value of the address p + 5. And since p is an integer pointer and each integer takes up 4 bytes, you are essentially asking for the value of address: 1000 + (5 * 4) = 1020
In the array a you stored the value 1024 at index 2, that corresponds to address: 1004 + 2 * 4 = 1012, so when you print the value of a[2] you are printing the value of address 1012. This means that the value you are setting to 98, is not a[2] but a[4].
The reason why I am mentioning this, is that in my case it did not print 98 but 1024. As people have already mentioned you are working with undefined behavior, and although it might work on some setups, it might not work on all.
I can't tell you the answer to your problem. But I can write a program which might, maybe, find the answer to your problem.
Try running this program:
#include <stdio.h>
int main2(int off)
{
int n;
int a[5];
int *p;
a[2] = 1024;
p = &n;
p[off] = 98;
return a[2];
}
int main()
{
int i;
for(i = -5; i <= 5; i++)
if(main2(i) == 98)
printf("the magic offset is %d\n", i);
}
On my computer, with one of my compilers, today, this program prints
the magic offset is 4
That tells me that (on my computer, with that compiler, today) the "solution" to your ridiculous problem would be
#include <stdio.h>
int main()
{
int n;
int a[5];
int *p;
a[2] = 1024;
p = &n;
p[4] = 98;
printf("a[2] = %d\n", a[2]);
}
I'm not even going to try to explain why this works, because the reasons are so obscure, unrepeatable, and meaningless. (See this question's other good answers for more details.) Basically the first program automates the search for a magic offset from p, more or less as you discovered.
And, in fact, under the first compiler I tried it, it didn't even work. Despite using the magic number 4 that the first program discovered, the second program printed a[2] = 1024. That's not too surprising: the relative positions of variables like a, p, and n are not specified by any standard. They're totally up to the compiler. The compiler is perfectly within its rights to arrange them one way in function main2 in my first program, and a completely different way in function main in my second program.
I tried my first program under a different compiler, and it printed
the magic offset is -3
and then crashed with a segmentation fault. But then, under that compiler, when I changed the relevant line in the second program to
p[-3] = 98;
it "worked", printing a[2] = 98 as required.
(And then I tried turning up the optimization level, and it stopped working.)
To be perfectly clear, the fact that my approach did not work under that first compiler, because it failed to arrange things in a consistent or predictable way, does not mean there's anything wrong with that compiler! Quite the contrary: the fault is entirely in the broken programs I wrote, and the broken assignment of yours that motivated them.
Here is an alternative exercise which will teach you something useful about arrays and pointers, without requiring that you "learn" false, unrepeatable facts about how variables are or aren't guaranteed to be arranged in stack frames.
#include <stdio.h>
int main(void)
{
int a[5];
int *p;
a[2] = 1024;
p = &a[4];
/*
* write your line of code here...
* Remember:
* - you are not allowed to use a
* - you are not allowed to modify p
* - only one statement
* - you are not allowed to code anything else than this line of code
*/
/* ...so that this prints 98\n */
printf("a[2] = %d\n", a[2]);
}
This problem has a similar solution — you can easily work it out — but the solution is unique and guaranteed to work, because it depends on well-defined properties of arrays and pointer arithmetic in C, not on accidental details of the stack layout.
It "works" by accident. It relies on n, p, and a being laid out in memory in a specific order (each box represents 4 bytes):
Address Item
------- --------
+---+
0x8000 n: | | p[0]
+---+
0x8004 p: | | p[1]
+---+
0x8008 | | p[2]
+---+
0x800c a: | | a[0] p[3]
+---+
0x8010 | | a[1] p[4]
+---+
0x8014 | | a[2] p[5]
+---+
0x8018 | | a[3] p[6]
+---+
0x801c | | a[4] p[7]
+---+
Some background:
A pointer is any expression whose value is the location of an object or function in a running program's execution environment - essentially, an address. A variable of pointer type stores an address value. However, pointers have associated type semantics - a pointer to int is a different type than a pointer to double, which is a different type than a pointer to struct foo, which is a different type than a pointer to an array of char, etc.
When you add 1 to a pointer value, the result is a pointer to the next object of the pointed-to type immediately following:
char *cp = &some_char;
short *sp = &some_short;
long *lp = &some_long;
+---+ +---+ +---+
some_char: | | <-- cp some_short: | | <-- sp some_long: | | <-- lp
+---+ | | | |
| | <-- cp + 1 | | | |
+---+ +---+ | |
| | <-- cp + 2 | | <-- sp + 1 | |
+---+ | | | |
| | <-- cp + 3 | | | |
+---+ +---+ +---+
| | <-- cp + 4 | | <-- sp + 2 | | <-- lp + 1
+---+ | | | |
... ... ...
This is exactly how array subscripting works - the array subscript operation a[i] is defined as *(a + i) - given a starting address a, offset i elements (not bytes!) from that address and deference the result. Arrays are not pointers; rather, array expressions "decay" to pointers to their first element under most circumstances. But this means you can use the [] subscript operator on pointer variables as well, so if you set p to point to n with
int *p = &n;
then you can apply the [] operator to p and treat it as though it was an array. So, if p == 0x8000 (the address of n), then p + 5 == 0x8014, which is the address of a[2]. Thus, *(p + 5) == p[5] == a[2] == *(a + 2).
But...
This behavior is undefined - it may work, it may not. It may result in garbled output, it may branch into some random subroutine, it may invoke Rogue. Neither the compiler nor the runtime environment are required to handle it in any particular way - any result is equally correct as far as the language is concerned.
We're pretending n is the first element of an array of int when it really isn't, so we're indexing out of bounds with p[1], p[2], etc. We're assuming objects are laid out in a specific order, but the compiler is under no obligation to lay variables out that way. The compiler may optimize things such that p is stored in a register, rather than on the stack.
This is a horrible way to teach pointers. It's unsafe, it's unportable, it's bad practice, it's confusing, it's an atypical use case, it doesn't explain why we use pointers. Whoever gave you this code shouldn't be teaching anyone how to program in C. If they write C for a living they are a menace.

Related to pointers in C language [duplicate]

When we subtract a pointer from another pointer the difference is not equal to how many bytes they are apart but equal to how many integers (if pointing to integers) they are apart. Why so?
The idea is that you're pointing to blocks of memory
+----+----+----+----+----+----+
| 06 | 07 | 08 | 09 | 10 | 11 | mem
+----+----+----+----+----+----+
| 18 | 24 | 17 | 53 | -7 | 14 | data
+----+----+----+----+----+----+
If you have int* p = &(array[5]) then *p will be 14. Going p=p-3 would make *p be 17.
So if you have int* p = &(array[5]) and int *q = &(array[3]), then p-q should be 2, because the pointers are point to memory that are 2 blocks apart.
When dealing with raw memory (arrays, lists, maps, etc) draw lots of boxes! It really helps!
Because everything in pointer-land is about offsets. When you say:
int array[10];
array[7] = 42;
What you're actually saying in the second line is:
*( &array[0] + 7 ) = 42;
Literally translated as:
* = "what's at"
(
& = "the address of"
array[0] = "the first slot in array"
plus 7
)
set that thing to 42
And if we can add 7 to make the offset point to the right place, we need to be able to have the opposite in place, otherwise we don't have symmetry in our math. If:
&array[0] + 7 == &array[7]
Then, for sanity and symmetry:
&array[7] - &array[0] == 7
So that the answer is the same even on platforms where integers are different lengths.
Say you have an array of 10 integers:
int intArray[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Then you take a pointer to intArray:
int *p = intArray;
Then you increment p:
p++;
What you would expect, because p starts at intArray[0], is for the incremented value of p to be intArray[1]. That's why pointer arithmetic works like that. See the code here.
"When you subtract two pointers, as long as they point into the same array, the result is the number of elements separating them"
Check for more here.
This way pointer subtraction behaves is consistent with the behaviour of pointer addition. It means that p1 + (p2 - p1) == p2 (where p1 and p2 are pointers into the same array).
Pointer addition (adding an integer to a pointer) behaves in a similar way: p1 + 1 gives you the address of the next item in the array, rather than the next byte in the array - which would be a fairly useless and unsafe thing to do.
The language could have been designed so that pointers are added and subtracted the same way as integers, but it would have meant writing pointer arithmetic differently, and having to take into account the size of the type pointed to:
p2 = p1 + n * sizeof(*p1) instead of p2 = p1 + n
n = (p2 - p1) / sizeof(*p1) instead of n = p2 - p1
So the result would be code that is longer, and harder to read, and easier to make mistakes in.
When applying arithmetic operations on pointers of a specific type, you always want the resulting pointer to point to a "valid" (meaning the right step size) memory-address relative to the original starting-point. That is a very comfortable way of accessing data in memory independently from the underlying architecture.
If you want to use a different "step-size" you can always cast the pointer to the desired type:
int a = 5;
int* pointer_int = &a;
double* pointer_double = (double*)pointer_int; /* totally useless in that case, but it works */
#fahad Pointer arithmetic goes by the size of the datatype it points.So when ur pointer is of type int you should expect pointer arithmetic in the size of int(4 bytes).Likewise for a char pointer all operations on the pointer will be in terms of 1 byte.

Why does the last line print 2 instead of 8? [duplicate]

When we subtract a pointer from another pointer the difference is not equal to how many bytes they are apart but equal to how many integers (if pointing to integers) they are apart. Why so?
The idea is that you're pointing to blocks of memory
+----+----+----+----+----+----+
| 06 | 07 | 08 | 09 | 10 | 11 | mem
+----+----+----+----+----+----+
| 18 | 24 | 17 | 53 | -7 | 14 | data
+----+----+----+----+----+----+
If you have int* p = &(array[5]) then *p will be 14. Going p=p-3 would make *p be 17.
So if you have int* p = &(array[5]) and int *q = &(array[3]), then p-q should be 2, because the pointers are point to memory that are 2 blocks apart.
When dealing with raw memory (arrays, lists, maps, etc) draw lots of boxes! It really helps!
Because everything in pointer-land is about offsets. When you say:
int array[10];
array[7] = 42;
What you're actually saying in the second line is:
*( &array[0] + 7 ) = 42;
Literally translated as:
* = "what's at"
(
& = "the address of"
array[0] = "the first slot in array"
plus 7
)
set that thing to 42
And if we can add 7 to make the offset point to the right place, we need to be able to have the opposite in place, otherwise we don't have symmetry in our math. If:
&array[0] + 7 == &array[7]
Then, for sanity and symmetry:
&array[7] - &array[0] == 7
So that the answer is the same even on platforms where integers are different lengths.
Say you have an array of 10 integers:
int intArray[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Then you take a pointer to intArray:
int *p = intArray;
Then you increment p:
p++;
What you would expect, because p starts at intArray[0], is for the incremented value of p to be intArray[1]. That's why pointer arithmetic works like that. See the code here.
"When you subtract two pointers, as long as they point into the same array, the result is the number of elements separating them"
Check for more here.
This way pointer subtraction behaves is consistent with the behaviour of pointer addition. It means that p1 + (p2 - p1) == p2 (where p1 and p2 are pointers into the same array).
Pointer addition (adding an integer to a pointer) behaves in a similar way: p1 + 1 gives you the address of the next item in the array, rather than the next byte in the array - which would be a fairly useless and unsafe thing to do.
The language could have been designed so that pointers are added and subtracted the same way as integers, but it would have meant writing pointer arithmetic differently, and having to take into account the size of the type pointed to:
p2 = p1 + n * sizeof(*p1) instead of p2 = p1 + n
n = (p2 - p1) / sizeof(*p1) instead of n = p2 - p1
So the result would be code that is longer, and harder to read, and easier to make mistakes in.
When applying arithmetic operations on pointers of a specific type, you always want the resulting pointer to point to a "valid" (meaning the right step size) memory-address relative to the original starting-point. That is a very comfortable way of accessing data in memory independently from the underlying architecture.
If you want to use a different "step-size" you can always cast the pointer to the desired type:
int a = 5;
int* pointer_int = &a;
double* pointer_double = (double*)pointer_int; /* totally useless in that case, but it works */
#fahad Pointer arithmetic goes by the size of the datatype it points.So when ur pointer is of type int you should expect pointer arithmetic in the size of int(4 bytes).Likewise for a char pointer all operations on the pointer will be in terms of 1 byte.

When subtracting two pointers in C

I was playing with pointers in order to fully get the concept and then wanted to subtract two pointers expecting the distance between these two addresses or something, but apparently I was wrong, so here is my code.
int x = 5, y = 7;
int *p = &y;
int *q = &x;
printf("p is %d\nq is %d\np - q is %d", p, q, (p - q));
Why does the program output p - q is 1? Thank you.
It is undefined behavior. According to the standard (N1570):
6.5.6 Additive operators
....
9 When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements.
Note that when allowed, the result is the subscripts difference. So if pointers point to two sequential elements of the same type, the subtraction gives 1, regardless of the size of the type. (This is perhaps the reason why you get 1 in your concrete case.)
Your particular case is cause for undefined behavior since p and q point to unrelated objects.
You can make sense of p-q only if p and q point to the same array/one past the last element of the same array.
int array[10];
int* p = &array[0];
int* q = &array[5];
ptrdiff_t diff1 = q - p; // Valid. diff1 is 5
ptrdiff_t diff2 = p - q; // Valid. diff2 is -5
q - p is 5 in this case since they point to elements of the array that are 5 elements apart.
Put another way, p+5 is equal to q. If you start from p and step over 5 elements of the array, you will point to the same element of the array that q points to.
As an aside, don't use the format specifier %d for printing pointers. Use %p. Use %td for ptrdiff_t.
printf(" p is %p\n q is %p\n p-q is :%td", p, q, p-q);`
// ^^ ^^
See http://en.cppreference.com/w/c/io/fprintf for the valid format specifiers for the different types.
Pointer arithmetic works like that. It doesn't give you differences between two addresses. Instead it will show difference between two variables as if they are stored in an array. so, no matter if your variables (of same type ) are 4 bytes, 8 bytes or 1 byte, if stored in adjacent memory location their pointer subtraction will always result in 1 or -1.
The subtraction of 2 pointers give the distance in between the 2 variables.
For eg.
//let the address of a is 1000 then of a+1 will be 1004
int a[]={1,2,3};
int *p1=a;
int *p2=a+1;
printf("%u",p2-p1);
The result of this will be 1 not 4.
Same here in your case the the location of x and y are consecutive that is why the ans. is 1.
The formula used by pointer substraction is:
( p2 - p1 ) == ( addr( p2 ) - addr( p1 ) ) / sizeof( T )
with T being the type of both p1 and p2.
int array[10];
int* p1 = array + 2;
int* p2 = array + 5;
int* a = p2 - p1; // == 3
int* b = p1 - p2; // == -3
int a=5,b=6;
int *ptr=&a,*diff;
diff=ptr;
printf("%p\n",diff);
ptr++;ptr++;ptr++;ptr++;ptr++;ptr++;ptr++;ptr++;ptr++;ptr++;
printf("%p\n",ptr);
printf("diff==%td\n",(ptr-diff)*sizeof(int));
(diff-ptr) will provide the distance between two variables, but will not provide the memory gap between two pointers.
Important: only the same type of pointer can be subtracted.
The subtraction of two pointers in array will give the distance between the two elements.
Let the address of first element i.e., is 1000 then address of second element a+1 will be 1004. Hence p1 = 1000 and p2 =1004.
p2-p1 = (1004- 1000) /size of int = (1004-1000)/4 =4/4 =1

Finding the size of the variable without using sizeof() [duplicate]

When we subtract a pointer from another pointer the difference is not equal to how many bytes they are apart but equal to how many integers (if pointing to integers) they are apart. Why so?
The idea is that you're pointing to blocks of memory
+----+----+----+----+----+----+
| 06 | 07 | 08 | 09 | 10 | 11 | mem
+----+----+----+----+----+----+
| 18 | 24 | 17 | 53 | -7 | 14 | data
+----+----+----+----+----+----+
If you have int* p = &(array[5]) then *p will be 14. Going p=p-3 would make *p be 17.
So if you have int* p = &(array[5]) and int *q = &(array[3]), then p-q should be 2, because the pointers are point to memory that are 2 blocks apart.
When dealing with raw memory (arrays, lists, maps, etc) draw lots of boxes! It really helps!
Because everything in pointer-land is about offsets. When you say:
int array[10];
array[7] = 42;
What you're actually saying in the second line is:
*( &array[0] + 7 ) = 42;
Literally translated as:
* = "what's at"
(
& = "the address of"
array[0] = "the first slot in array"
plus 7
)
set that thing to 42
And if we can add 7 to make the offset point to the right place, we need to be able to have the opposite in place, otherwise we don't have symmetry in our math. If:
&array[0] + 7 == &array[7]
Then, for sanity and symmetry:
&array[7] - &array[0] == 7
So that the answer is the same even on platforms where integers are different lengths.
Say you have an array of 10 integers:
int intArray[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Then you take a pointer to intArray:
int *p = intArray;
Then you increment p:
p++;
What you would expect, because p starts at intArray[0], is for the incremented value of p to be intArray[1]. That's why pointer arithmetic works like that. See the code here.
"When you subtract two pointers, as long as they point into the same array, the result is the number of elements separating them"
Check for more here.
This way pointer subtraction behaves is consistent with the behaviour of pointer addition. It means that p1 + (p2 - p1) == p2 (where p1 and p2 are pointers into the same array).
Pointer addition (adding an integer to a pointer) behaves in a similar way: p1 + 1 gives you the address of the next item in the array, rather than the next byte in the array - which would be a fairly useless and unsafe thing to do.
The language could have been designed so that pointers are added and subtracted the same way as integers, but it would have meant writing pointer arithmetic differently, and having to take into account the size of the type pointed to:
p2 = p1 + n * sizeof(*p1) instead of p2 = p1 + n
n = (p2 - p1) / sizeof(*p1) instead of n = p2 - p1
So the result would be code that is longer, and harder to read, and easier to make mistakes in.
When applying arithmetic operations on pointers of a specific type, you always want the resulting pointer to point to a "valid" (meaning the right step size) memory-address relative to the original starting-point. That is a very comfortable way of accessing data in memory independently from the underlying architecture.
If you want to use a different "step-size" you can always cast the pointer to the desired type:
int a = 5;
int* pointer_int = &a;
double* pointer_double = (double*)pointer_int; /* totally useless in that case, but it works */
#fahad Pointer arithmetic goes by the size of the datatype it points.So when ur pointer is of type int you should expect pointer arithmetic in the size of int(4 bytes).Likewise for a char pointer all operations on the pointer will be in terms of 1 byte.

Resources