We have an array: int p[100].
Why p[i] is equivalent to *(p+i) and not *(p+i*sizeof(int)) ?
Because *(p+i) is also the same as *((int *) ((char *) p + i * sizeof (int))). When you add an integer i to a pointer, the pointer is moved i times the size of the pointed-to object.
Why p[i] is equivalent to *(p+i) and not *(p+i*sizeof(int)) ?
Because some processor architectures cannot dereference a pointer unless it points to an address aligned to the size of its type. That basically means that a pointer to a 4-byte integer should always point to an address that is a multiple of 4.
When a program tries to dereference a misaligned pointer, it might cause a "Bus error". You can read more about it on Wikipedia.
What you are asking for is that p + 1 should increment the pointer by one byte instead of one element. If the language were designed that way, writing p++ would no longer be valid for pointers to types other than char. It would also cause big problems with pointer alignment whenever a programmer forgot to write * sizeof(*p) in the addition.
It might be confusing but there are very valid reasons for why the language was designed this way.
This is because, before adding i to p, the compiler internally calculates the size of the data type p points to and then adds that size i times to p.
Array elements are stored contiguously.
*(p+i)
Here p evaluates to the base address of the array p, and i ranges from 0 to 99.
So you can iterate over the elements of p by incrementing i.
Why p[i] is equivalent to *(p+i) and not *(p+i*sizeof(int))?
This is because of the way pointer arithmetic works: adding an integer n to a pointer yields a pointer to the nth element (not byte) counting from the one it points to (0-based).
Related
Can someone please help me understand how the following logic results in the product of a and b?
#include <stdint.h>
int getProd(int a, int b){
    return (uintptr_t)&((char (*) [a])0x0)[b];
}
Suppose we have a pointer p, which points to objects of size a.
If we then say p + b, we're asking for a pointer to the b'th object past where p points.
So the actual new pointer value (on a byte-addressed machine, anyway), is going to be scaled by a, that is, the size of the pointed-to objects. That is, "under the hood", the compiler is going to do something more like p + b * a.
So we can see the multiplication a * b is happening -- but then it's getting added to the original value of p.
So if we use an initial value of 0, we'll get just a * b. And that's what the hacky getProd function is doing.
Let's break it down:
0x0
The value 0, also known in pointer contexts as a null pointer. [Footnote: there's more complexity to this definition, but let's not worry about that for the moment.]
char (*) [a]
This is a type: "pointer to char array of size a".
(char (*) [a])0x0
This is a cast: take that null pointer, cast it to the type "pointer to array [a] of char".
((char (*) [a])0x0)[b]
Take that pointer, imagine it points into an array of such char[a] arrays, and refer to the b'th one. Since array indexing is the same as pointer arithmetic, this will end up computing 0 + a * b.
&((char (*) [a])0x0)[b];
We had a reference to the b'th element of the "array". Now compute a pointer to that element. That pointer should literally have the value 0 + a * b.
(uintptr_t)&((char (*) [a])0x0)[b];
Finally, take that pointer and cast it to an integer type.
Now, with all of this said, it must be pointed out that this is a hack. Writing code to perform arithmetic on null pointers in this way is highly problematic. It might be almost-but-not-quite-legal; it might be legal-but-just-barely-legal. You could argue for hours about which side of the line the answer falls on.
In this case, of course, it's an academic argument, because no one would ever seriously propose doing multiplication this way.
This code invokes undefined behavior by performing pointer arithmetic on an invalid pointer. That being said, here's what it's attempting to do.
(char (*) [a])0x0 is casting the value 0 to a pointer to an array of size a of char, giving you a pointer to an object that takes up a bytes.
Then &((char (*) [a])0x0)[b] uses array indexing to get the b'th element this pointer points to and takes its address.
Also, because an expression of the type E1[E2] is exactly the same as (*((E1) + (E2))), the prior expression is the same as &*(((char (*) [a])0x0) + b), and because & applied to the result of * cancels out, this is the same as ((char (*) [a])0x0) + b. So there's no dereferencing of an invalid pointer.
Because pointer arithmetic increments the value of a pointer by the offset times the element size, you now have a pointer whose numeric value is a*b. That value is then converted to an integer type and returned.
Where the undefined behavior comes into play is the implicit + operator in the array indexing. Pointer arithmetic is only valid if the original pointer and the result of the addition both point to a valid object (or one element past the end of an array of objects). Since a null pointer does not point to any object, this is UB.
Technically this is undefined behavior. But here is what the code is intended to do, assuming naive compiler logic.
((char (*) [a])0x0) - this takes an address 0x0 and is casting it to a pointer to array of a char elements, that is a pointer to an object of size a bytes.
Now, according to C pointer arithmetic any operation (addition/subtraction) with this pointer will be performed in the multiples of a.
Next, it is taking the b offset of this pointer. As we know, p[b] is equivalent to *(p + b) for any pointer p. In our case p is equal to 0x0 and is a pointer to an object of size a. Therefore p + b will have a numerical value of 0x0 + b * sizeof(*p) or 0x0 + a * b. Which is exactly a * b.
I'm having trouble understanding pointer's arithmetic.
Let int B=0, *p=&B, **V=&p and sizeof(int)=4, sizeof(int *)=8
What does the instruction (*V)[1] do?
To me, (*V)[1] is equivalent to *(*V+1), so what should happen is: we dereference V (which is a pointer to a pointer to an int) and add 1 to the content of that variable, which is an address. That variable is a pointer and we're assuming sizeof(int *)=8, so in theory we should add 1 * sizeof(int *) (which is 8) to whatever address is stored in the pointer p that V points to.
The solution, however, says to add 4 (1 * sizeof(int)). Is the solution wrong, or is my thinking wrong?
The solution you reference is correct.
The expression *V has type int *, so it points to an array of 1 or more int. Because it points to an int, when pointer arithmetic happens the size of the type it points to (sizeof(int), i.e. 4) is multiplied by the given value (1). So if you were to print the values of *V and *V + 1 you would see that they differ by 4.
There is however a problem with (*V)[1], equivalently *(*V + 1). Since *V points to B, *V + 1 points one element past B. This is legal since a pointer can point to one element past the end of an array (or equivalently a single object which is treated as an array of size 1). What is not legal however is to dereference that pointer. Doing so invokes undefined behavior.
(*V)[1] is indeed equivalent to *(*V+1).
Since V is &p (by initialization), *V is p. So we have *(p+1).
Note that both *V and p have type int *. They point to an int, so p+1 points to “the next” int.
Since p points to B (by initialization), and B is a single int, p+1 points just past the end of B (where the “next int” would be if we had an array of int there instead of a single int).
This “just past the end of B” location is allowed for a pointer, and it is the location your source refers to when it says that (*V)[1] effectively adds four bytes to the location that *V points to.
However, while it is allowed to refer to one past the end of B, the C standard does not define the behavior of attempting to access an object there. (*V+1) is a defined pointer, but *(*V+1) is not a defined expression for an object at that location. Its behavior is not defined by the C standard.
#include<stdio.h>
int main(void){
int *ptr,a,b;
a = ptr;
b = ptr + 1;
printf("the vale of a,b is %x and %x respectively",a,b);
int c,d;
c = 0xff;
d = c + 1;
printf("the value of c d are %x and %x respectively",c,d);
return 0;
}
The output is:
the vale of a,b is 57550c90 and 57550c94 respectively
the value of c d are ff and 100 respectively%
It turns out that ptr + 1 actually adds 4 to the address, not 1. Why does it behave this way?
Because pointers are designed to be compatible with arrays:
*(pointer + offset)
is equivalent to
pointer[offset]
So pointer arithmetic doesn't work in terms of bytes, but in terms of blocks of sizeof(pointer base type) bytes.
Consider what a pointer is... it's a memory address. Every byte in memory has an address. So, if you have an int that's 4 bytes and its address is 1000, 1001 is actually the 2nd byte of that int and 1002 is the third byte and 1003 is the fourth. Since the size of an int might vary from compiler to compiler, it is imperative that when you increment your pointer you don't get the address of some middle point in the int. So, the job of figuring out how many bytes to skip, based on your data type, is handled for you and you can just use whatever value you get and not worry about it.
As Basile Starynkvitch points out, this amount varies with the size of the type being pointed to. It's easy to forget that even though addresses are sequential, pointer arithmetic has to take into account the actual memory space required to hold the objects.
Pointer arithmetic is a tricky subject. Adding to a pointer means moving on to a following pointed-to element, so the address is incremented by the size of the pointed-to element.
Short answer
The address of the pointer will be incremented by sizeof(T) where T is the type pointed to. So for an int, the pointer will be incremented by sizeof(int).
Why?
Well first and foremost, the standard requires it. The reason this behaviour is useful (other than for compatibility with C) is because when you have a data structure which uses contiguous memory, like an array or an std::vector, you can move to the next item in the array by simply adding one to the pointer. If you want to move to the nth item in the container, you just add n.
Being able to write firstAddress + 2 is far simpler than firstAddress + (sizeof(T) * 2), and helps prevent bugs arising from developers assuming sizeof(int) is 4 (it might not be) and writing code like firstAddress + (4 * 2).
In fact, when you write myArray[4], you're saying *(myArray + 4). This is the reason that array indices start at 0: you add 0 to get the first element (i.e. myArray points to the first element of the array) and n to get the element at index n.
What if I want to move one byte at a time?
sizeof(char) is guaranteed to be 1, so you can use a char* if you really want to move one byte at a time.
A pointer is used to point to a specific byte of memory marking where an object has been allocated (technically it can point anywhere, but that's how it's used). When you do pointer arithmetic, it operates based on the size of the objects pointed to. In your case, it's a pointer to integers, which have a size of 4 bytes each.
Consider a pointer p. The expression p+n is like (unsigned char *)p + n * sizeof *p (because sizeof(unsigned char) == 1).
Try this:
#include <stdio.h>
#define N 3
int
main(void)
{
    int arr[N];    /* p + N is then one past the end of arr: still a valid pointer */
    int *p = arr;
    printf("%p\n", (void *)p);
    printf("%p\n", (void *)(p + N));
    printf("%p\n", (void *)((unsigned char *)p + N * sizeof *p));
    return 0;
}
int a[3];
int *j;
a[0]=90;
a[1]=91;
a[2]=92;
j=a;
printf("%d",*j);
printf("%d",&a[0]);
printf("%d",&a[1]);
printf("%d",*(j+2));
Here the pointer variable j points to a[0], which is 90, and the address of a[0] is -20 on my machine. So j is holding -20.
And the address of a[1] is -18. So I thought that to get the next element I should use *(j+2), because j+2 would be -18. But that is not what actually happens: to access a[1], I have to use *(j+1), even though I expected j+1 to be -19. Why does j+1 result in -18?
Addresses are unsigned. You're printing them as if they were an int, but they're not an int. Use "%p" as the format specifier, with the argument cast to void *; that's how you print a pointer.
Additionally, pointer arithmetic is different from the arithmetic you are used to. Internally, adding one to a pointer p increments the address by sizeof *p bytes, i.e., it advances p to the next object.
This is convenient as it saves the programmer from having to always use sizeof when performing arithmetic on a pointer (rarely do you actually want to increment by something other than sizeof *p. When you do, you cast to a char* first.)
Pointer addition is not the same as simple addition.
It depends on the type of variable the pointer points to.
In your case it's an int, whose size is machine dependent (you can check it with sizeof(int)).
So when you add a number to a pointer, as in (j+i), it is internally converted to (j+i*sizeof(datatype)) in bytes. So when you write (j+2), the address is increased by 4 (assuming int is 2 bytes), which is not the intended result.
(j+1) gives you the right result (it's like saying "point to the next element of type int").
The pointer arithmetic works on the basis of the pointer type: an int pointer moves by the size of an int, which is 2 bytes on your machine, so the address p+1 is sizeof(int) bytes past p (the size of the pointed-to type, not of the pointer itself).
*(x+y) is always exactly equivalent to x[y] (or y[x]). So to print a[1], you want *(j + 1) (or just j[1]). Note that, in a[1], a is converted to a pointer ... there's no difference between the way a is handled and the way j is handled here.
I was trying the following code
#include<stdio.h>
int main()
{
int A[3][4] = {{1,2,3,4},{5,6,7,8,},{9,10,11,12}};
int **t = &A[0]; //I do this or **t = A,I guess both are equivalent
printf("%d %p\n\n",*t,A[0]);
return 0;
}
What I expected to happen:
Now t is a 2d pointer (pointer to pointer) holding the address of A[0] which in turn holds the address of A[0][0]. So *t should give me the value of A[0] ,that is the address of A[0][0] and **t should give me the value of A[0][0] ,which in this case is 1.
What I got:
*t gave the value 1, and trying to evaluate **t was not possible, as it resulted in a segmentation fault.
Can anyone please tell why this is happening ?
I tried the following explanation,but not sure whether it is the "correct" explanation.
t holds the address of A[0] ,but since A is an array and A[0] is an Array Pointer (which is "not exactly" a pointer),C doesn't allocate memory for pointer A or A[0] specially UNLIKE other pointer variables. It allocates memory only for the array as a whole . So the address of A[0] and A[0] (which is the address of A[0][0]) are essentially the same ,both belong under one roof and are not like 'separate' entities . As a result t in-turn indirectly holds the address of A[0][0] and *t gives the value of A[0][0],which is 1.
Is the above explanation correct? It kind of looks weird.
Arrays are not pointers.
Well, even more...
Multiple-dimensional arrays are not double, triple, etc. pointers.
So everything you assumed is wrong: your program invokes undefined behavior several times, and there's nothing you can expect.
Given that arrays are contiguous in memory, you can rewrite your example like this:
int A[3][4] = {{1,2,3,4},{5,6,7,8,},{9,10,11,12}};
int *p = &A[0][0];
printf("%d %d %p\n", A[0][0], *p, (void *)p);
I tried the following explanation,but not sure whether it is the "correct" explanation.
Not quite, but it's somewhat close.
t holds the address of A[0] ,but since A is an array and A[0] is an Array Pointer
A[0] is an array, specifically, its type is int[4].
(which is "not exactly" a pointer),C doesn't allocate memory for pointer A or A[0] specially UNLIKE other pointer variables.
Arrays and pointers are fundamentally different types of entities. Don't confuse them.
The fact that in most circumstances an expression of type array of T is converted into a value of type pointer to T (pointing to the array's first element) sure contributes to the confusion, but one must not forget that it is a conversion. In particular, for higher-dimensional arrays, or arrays of arrays, the element type of the array is itself an array type, so the result of the conversion is a pointer to an array.
It allocates memory only for the array as a whole. So the address of A[0] and A[0] (which is the address of A[0][0]) are essentially the same,
No, they are essentially different, one - A[0] - is an array, int[4], the other - &A[0] - is a pointer to an array of four int, int(*)[4]. Neither is &A[0][0].
But when A[0] is converted to a pointer to its first element, &A[0][0], the resulting address usually is the same as the address of A[0] (usually, a pointer to an object holds the address of the byte with the lowest address belonging to the object, and since A[0] belongs to (is part of) the object A, the one with the lowest address, the first byte that is part of A[0] is the first byte that is part of A).
So &A[0] and &A[0][0] usually have the same representation, but one is an int(*)[4], the other an int*.
both belong under one roof and are not like 'separate' entities . As a result t in-turn indirectly holds the address of A[0][0] and *t gives the value of A[0][0], which is 1.
That part is, apart from the type mismatch that makes dereferencing t undefined behaviour, more or less correct. Formally, the undefined behaviour allows anything to happen.
In practice, if sizeof(int) == sizeof(int*), dereferencing t interprets the int 1 that is A[0][0] as an address, and if you print that as an int (yet another undefined behaviour), you get 1 printed. If sizeof(int*) == 2*sizeof(int), as is common on 64-bit systems, dereferencing t would usually interpret the two ints A[0][0] and A[0][1] together as an address - 0x200000001 or 0x100000002 depending on endianness probably.