Pointers can only move in discrete steps.
int *p;
p = malloc(sizeof(int)*8);
Therefore, formally *(p+2) is calculated as *(p+2*sizeof(int)).
However, if I actually code the above two, I get different results, which seems understandable.
*p = 123;
*(p+2) = 456;
printf("%d\n",*(p+2*(sizeof(int)))); // 0
printf("%d\n",*(p+2)); // 456
The question is, is this calculation implicit, done by the compiler at compile time?
Yes, this is implicit. When you write ptr+n, the pointer advances by n times the size of the pointee type in bytes (for an int*, that is 4 bytes per step if an int takes four bytes on your computer).
e.g.
int *x = malloc(4 * sizeof(int)); // say x points at 0x1000
x++; // x now points at 0x1004 if size of int is 4
You can read more on pointer arithmetic.
Therefore, formally *(p+2) is calculated as *(p+2*sizeof(int)).
No, *(p+2) is calculated as *(int*)((char*)p+2*sizeof(int)).
Even a brief look reveals that the only way for your statement to hold is if sizeof(int) == 1.
When we subtract a pointer from another pointer the difference is not equal to how many bytes they are apart but equal to how many integers (if pointing to integers) they are apart. Why so?
The idea is that you're pointing to blocks of memory
+----+----+----+----+----+----+
| 06 | 07 | 08 | 09 | 10 | 11 | mem
+----+----+----+----+----+----+
| 18 | 24 | 17 | 53 | -7 | 14 | data
+----+----+----+----+----+----+
If you have int* p = &(array[5]) then *p will be 14. Going p=p-3 would make *p be 17.
So if you have int* p = &(array[5]) and int *q = &(array[3]), then p-q should be 2, because the pointers point to memory locations that are 2 blocks apart.
When dealing with raw memory (arrays, lists, maps, etc) draw lots of boxes! It really helps!
Because everything in pointer-land is about offsets. When you say:
int array[10];
array[7] = 42;
What you're actually saying in the second line is:
*( &array[0] + 7 ) = 42;
Literally translated as:
* = "what's at"
(
& = "the address of"
array[0] = "the first slot in array"
plus 7
)
set that thing to 42
And if we can add 7 to make the offset point to the right place, we need to be able to have the opposite in place, otherwise we don't have symmetry in our math. If:
&array[0] + 7 == &array[7]
Then, for sanity and symmetry:
&array[7] - &array[0] == 7
So that the answer is the same even on platforms where integers are different lengths.
Say you have an array of 10 integers:
int intArray[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Then you take a pointer to intArray:
int *p = intArray;
Then you increment p:
p++;
What you would expect, because p starts at intArray[0], is for the incremented p to point to intArray[1]. That's why pointer arithmetic works like that.
"When you subtract two pointers, as long as they point into the same array, the result is the number of elements separating them"
This way pointer subtraction is consistent with the behaviour of pointer addition. It means that p1 + (p2 - p1) == p2 (where p1 and p2 are pointers into the same array).
Pointer addition (adding an integer to a pointer) behaves in a similar way: p1 + 1 gives you the address of the next item in the array, rather than the next byte in the array - which would be a fairly useless and unsafe thing to do.
The language could have been designed so that pointers are added and subtracted the same way as integers, but it would have meant writing pointer arithmetic differently, and having to take into account the size of the type pointed to:
p2 = p1 + n * sizeof(*p1) instead of p2 = p1 + n
n = (p2 - p1) / sizeof(*p1) instead of n = p2 - p1
So the result would be code that is longer, and harder to read, and easier to make mistakes in.
When applying arithmetic operations on pointers of a specific type, you always want the resulting pointer to point to a "valid" (meaning the right step size) memory-address relative to the original starting-point. That is a very comfortable way of accessing data in memory independently from the underlying architecture.
If you want to use a different "step-size" you can always cast the pointer to the desired type:
int a = 5;
int* pointer_int = &a;
double* pointer_double = (double*)pointer_int; /* compiles, but useless here: dereferencing it would be undefined behavior */
@fahad Pointer arithmetic works in units of the size of the type pointed to. So when your pointer is of type int*, you should expect pointer arithmetic in steps of sizeof(int) (typically 4 bytes). Likewise, for a char pointer, all operations on the pointer will be in terms of 1 byte.
#include <stdio.h>
int main(void){
int* p = NULL;
int y = 1;
p = &y;
printf("%p\n",p);
*(p+1) = 10;
printf("%p\n",p);
return 0;
}
outputs:
0x7ffe2368f2e4
0x7ffe0000000a
I do not know why p changed here, or why the second value ends in "0000000a", which is 10. Could you help me with that? Thank you. I compiled it with gcc on Linux.
The two variables y and p are allocated on the stack. Their storage is adjacent to each other, and looks like this:
y: <addr1> <val1> 4 bytes
p: <addr2> <val2> 8 bytes
Note that <addr2> is <addr1> + 4.
For your example, the actual addresses look like this:
y: 0x00007ffe2368f2e4 <value1> 4 bytes
p: 0x00007ffe2368f2e8 <value2> 8 bytes
After y = 1 and p = &y, the memory looks as follows:
y: 0x00007ffe2368f2e4 0x00000001 4 bytes
p: 0x00007ffe2368f2e8 0x00007ffe2368f2e4 8 bytes
p + 1 is value of p + sizeof(int), which is 0x00007ffe2368f2e4 + 4, which is 0x00007ffe2368f2e8, which is the address of p.
*(p + 1) = 10 sets 4 bytes at 0x00007ffe2368f2e8 to 10.
This is overwriting 4 bytes of a 8 byte value at 0x00007ffe2368f2e8, which is 4 bytes of the value of p.
0x7ffe 2368f2e4
0x7ffe 0000000a ------> this is the lower 4 bytes set to 10 i.e. 0xa
What cause the change of address the pointer points to?
TL;DR - undefined behavior.
To elaborate, in your code,
*(p+1) = 10;
invokes undefined behavior because you try to access memory out of bounds. Kindly note that a segmentation fault is only one of the many possible side effects of UB.
Once your code invokes UB, nothing, absolutely nothing is guaranteed.
Also, FWIW, to print a pointer, you should be casting the pointer to void* before using that as an argument to %p.
As others have said, this statement invokes undefined behavior:
*(p+1) = 10;
Prior to this statement, p contains the address of y, which is a single int. So the above statement writes to an area of memory which is not y.
As for an explanation of what actually happened, it appears that p sits right after y on the stack. So when you do *(p+1) = 10;, it writes to the sizeof(int) bytes after y on the stack, which happens to be where p lives.
Based on the fact that the initial value of p is 0x7ffe2368f2e4, that tells us that sizeof(int *) is at least 6 bytes, most likely 8 (i.e. 64 bit). With the value changing to 0x7ffe0000000a after the assignment, we see that the 4 low-order bytes of p were modified. This would make sense if sizeof(int) is 4 and you're working in a little-endian architecture. So the assignment ends up setting the first 4 bytes of p (the low order bytes) to the value 10 while leaving the rest (the high order bytes) unchanged.
That's what happened in your particular case. There is no guarantee that this behavior will be consistent across different machines, compilers, or operating systems. In other words, undefined behavior.
This statement
*(p+1) = 10;
changes the pointer because it overwrites the memory occupied by the pointer.
As you can see from the output values
0x7ffe2368f2e4 0x7ffe0000000a
the part 0x0000000a of the value 0x7ffe0000000a is exactly the 10 that was assigned to the memory at address p + 1.
So at the address p + 1 there is the pointer p itself.
If the sizeof( int ) is equal to 4 then 0x7ffe2368f2e4 is the address of y and 0x7ffe2368f2e8 is the address of the memory occupied by pointer p itself that is overwritten.
Another way to think about this is to treat y as an array of 1 integer (that is, only y[0] exists). The expression:
*(a+i) equals a[i]
by definition. So, since:
p = &y
then:
*(p+1) equals y[1]
and accessing that which does not exist is undefined behavior. That means anything can happen, but the two most likely things are (1) you access some other variable (what happened to you) or (2) the program gets a segmentation fault.
Recently, I wrote some code to compare pointers like this:
if(p1+len < p2)
however, some staff said that I should write like this:
if(p2-p1 > len)
to be safe.
Here, p1 and p2 are char * pointers and len is an integer.
I have no idea whether that is right. Is it?
EDIT1: Of course, p1 and p2 point into the same memory object to begin with.
EDIT2: Just a minute ago I found the bug behind this question in my code (about 3K lines): len was so big that p1+len could not be stored in a 4-byte pointer, so p1+len < p2 was true when it should not have been. So I think in some situations we should compare pointers like this:
if(p2 < p1 || (uint32_t)p2-p1 > (uint32_t)len)
In general, you can only safely compare pointers if they're both pointing to parts of the same memory object (or one position past the end of the object). When p1, p1 + len, and p2 all conform to this rule, both of your if-tests are equivalent, so you needn't worry. On the other hand, if only p1 and p2 are known to conform to this rule, and p1 + len might be too far past the end, only if(p2-p1 > len) is safe. (But I can't imagine that's the case for you. I assume that p1 points to the beginning of some memory-block, and p1 + len points to the position after the end of it, right?)
What they may have been thinking of is integer arithmetic: if it's possible that i1 + i2 will overflow, but you know that i3 - i1 will not, then i1 + i2 < i3 could either wrap around (if they're unsigned integers) or trigger undefined behavior (if they're signed integers) or both (if your system happens to perform wraparound for signed-integer overflow), whereas i3 - i1 > i2 will not have that problem.
Edited to add: In a comment, you write "len is a value from buff, so it may be anything". In that case, they are quite right, and p2 - p1 > len is safer, since p1 + len may not be valid.
"Undefined behavior" applies here. You cannot compare two pointers unless they both point to the same object or to the first element after the end of that object. Here is an example:
void func(int len)
{
char array[10];
char *p = &array[0], *q = &array[10];
if (p + len <= q)
puts("OK");
}
You might think about the function like this:
// if (p + len <= q)
// if (array + 0 + len <= array + 10)
// if (0 + len <= 10)
// if (len <= 10)
void func(int len)
{
if (len <= 10)
puts("OK");
}
However, the compiler knows that ptr <= q is true for all valid values of ptr, so it might optimize the function to this:
void func(int len)
{
puts("OK");
}
Much faster! But not what you intended.
Yes, there are compilers that exist in the wild that do this.
Conclusion
This is the only safe version: subtract the pointers and compare the result, don't compare the pointers.
if (len <= q - p)
Technically, p1 and p2 must be pointers into the same array. If they are not in the same array, the behaviour is undefined.
For the addition version, the type of len can be any integer type.
For the difference version, the result of the subtraction is ptrdiff_t, but any integer type will be converted appropriately.
Within those constraints, you can write the code either way; neither is more correct. In part, it depends on what problem you're solving. If the question is 'are these two elements of the array more than len elements apart', then subtraction is appropriate. If the question is 'is p2 the same element as p1[len] (aka p1 + len)', then the addition is appropriate.
In practice, on many machines with a uniform address space, you can get away with subtracting pointers to disparate arrays, but you might get some funny effects. For example, if the pointers are pointers to some structure type, but not parts of the same array, then the difference between the pointers treated as byte addresses may not be a multiple of the structure size. This may lead to peculiar problems. If they're pointers into the same array, there won't be a problem like that — that's why the restriction is in place.
The existing answers show why if (p2-p1 > len) is better than if (p1+len < p2), but there's still a gotcha with it -- if p2 happens to point BEFORE p1 in the buffer and len is an unsigned type (such as size_t), then p2-p1 will be negative, but will be converted to a large unsigned value for comparison with the unsigned len, so the result will probably be true, which may not be what you want.
So you might actually need something like if (p1 <= p2 && p2 - p1 > len) for full safety.
As Dietrich already said, comparing unrelated pointers is dangerous, and could be considered as undefined behavior.
Given that two pointers are within the range 0 to 2GB (on a 32-bit Windows system), subtracting the 2 pointers will give you a value between -2^31 and +2^31. This is exactly the domain of a signed 32-bit integer. So in this case it does seem to make sense to subtract two pointers because the result will always be within the domain you would expect.
However, if the LargeAddressAware flag is enabled in your executable (this is Windows-specific, don't know about Unix), then your application will have an address space of 3GB (when run in 32-bit Windows with the /3G flag) or even 4GB (when run on a 64-bit Windows system).
If you then start to subtract two pointers, the result could be outside the domain of a 32-bit integer, and your comparison will fail.
I think this is one of the reasons why the address space was originally divided into 2 equal parts of 2GB, and why the LargeAddressAware flag is still optional. However, my impression is that current software (your own software and the DLLs you're using) seems to be quite safe (nobody subtracts pointers anymore, do they?), and my own application has the LargeAddressAware flag turned on by default.
Neither variant is safe if an attacker controls your inputs
The expression p1 + len < p2 compiles down to something like p1 + sizeof(*p1)*len < p2, and the scaling with the size of the pointed-to type can overflow your pointer:
int *p1 = (int*)0xc0ffeec0ffee0000;
int *p2 = (int*)0xc0ffeec0ffee0400;
size_t len = 0x4000000000000000; /* needs a 64-bit type; an int could not hold this value */
if(p1 + len < p2) {
printf("pwnd!\n");
}
When len is multiplied by the size of int, it overflows to 0 so the condition is evaluated as if(p1 + 0 < p2). This is obviously true, and the following code is executed with a much too high length value.
Ok, so what about p2-p1 < len. Same thing, overflow kills you:
char *p1 = (char*)0x0123456789012345;
char *p2 = (char*)0xa123456789012345;
int len = 1;
if(p2-p1 < len) {
printf("pwnd!\n");
}
In this case, the difference between the pointers is evaluated as p2-p1 = 0xa000000000000000, which is interpreted as a negative signed value. As such, it compares smaller than len, and the following code is executed with a much too low len value (or much too large pointer difference).
The only approach that I know is safe in the presence of attacker-controlled values, is to use unsigned arithmetic:
if(p1 < p2 &&
((uintptr_t)p2 - (uintptr_t)p1)/sizeof(*p1) < (uintptr_t)len
) {
printf("safe\n");
}
The p1 < p2 guarantees that p2 - p1 cannot yield a genuinely negative value. The second clause performs the actions of p2 - p1 < len while forcing use of unsigned arithmetic in a non-UB way. I.e. (uintptr_t)p2 - (uintptr_t)p1 gives exactly the count of bytes between the bigger p2 and the smaller p1, no matter the values involved.
Of course, you don't want to see such comparisons in your code unless you know that you need to defend against determined attackers. Unfortunately, it's the only way to be safe, and if you rely on either form given in the question, you open yourself up to attacks.
Hi, I am new to C programming. Can anyone please tell me what this line of code does:
i = (sizeof (X) / sizeof (int))
The code actually works with a case statement when it takes a value of bdata and compares it to different cases.
Generally, such a statement is used to calculate the number of elements in an array.
Let's consider an integer array as below:
int a[4];
Now, sizeof(a) returns 4*4 = 16: 4 elements of 4 bytes each (assuming sizeof(int) is 4).
So, when you do sizeof(a) / sizeof(int), you will get 4 which is the length or size of the array.
It computes the number of elements of the array of int named X.
It returns the length of the array X.
It computes the size of X in memory divided by the size of an int on your machine (commonly 2 or 4 bytes). Both operands of the division have type size_t, so this is always an integer division; the result is then converted to the type of i.
The size of int can vary between implementations, and what X is depends on its declaration.
All this means it computes how many ints fit into X.
Besides common practice or personal experience there is no reason to think that this i = (sizeof (X) / sizeof (int)) computes the size of the array X. Most often probably this is the case but in theory X could be of any type, so the given expression would compute the ratio of the sizes of your var X and an int (how much more memory, in bytes, does your X var occupy with respect to an int)
Moreover, if X were a pointer rather than an array (for example float *X, or an array that has decayed to a pointer after being passed to a function), this expression would evaluate to 1 on a typical 32-bit architecture: the pointer would be 4 bytes and the int also 4 bytes, so i = sizeof(X) / sizeof(int) = 1.