Why does this expression come out to 4 in C? - c

So this expression comes out to 4:
int a[] = {1,2,3,4,5}, i= 3, b,c,d;
int *p = &i, *q = a;
char *format = "\n%d\n%d\n%d\n%d\n%d";
printf("%ld",(long unsigned)(q+1) - (long unsigned)q);
I have to explain it in my homework and I have no idea why it's coming out to that value. I see (long unsigned) casting q+1, and then we subtract the value of whatever q is pointing at as a long unsigned and I assumed we would be left with 1. Why is this not the case?

Because q is a pointer the expression q+1 employs pointer arithmetic. This means that q+1 points to one element after q, not one byte after q.
The type of q is int *, meaning it points to an int. The size of an int on your platform is most likely 4 bytes, so adding 1 to a int * actually adds 4 to the raw pointer value so that it points to the next int in the array.

Try printing the parts of the expression and it becomes a bit clearer what is going on.
printf("%p\n",(q+1));
printf("%p\n",q);
printf("%ld\n",(long unsigned)(q+1));
printf("%ld\n",(long unsigned)q);
It becomes more clear that q is a pointer pointing to the zeroth element of a, and q+1 is a pointer pointing to the next element of a. Int's are 4 bytes on my machine (and presumably on your machine), so they are four bytes apart. Casting the pointers to unsigned values has no effect on my machine, so printing out the difference between the two gives a value of 4.
0x7fff70c3d1a4
0x7fff70c3d1a0
140735085269412
140735085269408

It's because sizeof(int) is 4.
This is an esoteric corner of C that is usually best avoided.
(If it doesn't make sense yet, add some temporary variables).
BTW, the printf format string is incorrect. But that's not why it's outputting 4.

Related

Why does pointer subtraction in C yield an integer?

Why if I subtract from a pointer another pointer (integer pointers) without typecasting the result will be 1 and not 4 bytes (like it is when I typecast to int both pointers). Example :
int a , b , *p , *q;
p = &b;
q = p + 1; // q = &a;
printf("%d",q - p); // The result will be one .
printf("%d",(int)q - (int)p); // The result will be 4(bytes). The memory address of b minus The memory address of a.
According to the C Standard (6.5.6 Additive operators)
9 When two pointers are subtracted, both shall point to elements of
the same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the two
array elements....
If the two pointers pointed to elements of the same array then as it is said in the quote from the Standard
the result is the difference of the subscripts of the two array
elements
That is you would get the number of elements of the array between these two pointers. It is the result of the so-called pointer arithmetic.
If you subtract addresses stored in the pointers as integer values then you will get the number that corresponds to the arithmetic subtract operation.
Why If If I subtract from a pointer another pointer (integer pointers) without typecasting the result will be 1 and not 4 bytes
That's the whole point of the data type that a pointer pointing to. It's probably easier to look at an array context like below. The point is regardless of the underlying data type (here long or double), you can use pointer arithmetic to navigate the array without caring about how exactly the size of its element is. In other words, (pointer + 1) means point the next element regardless of the type.
long l[] = { 10e4, 10e5, 10e6 };
long *pl = l + 1; // point to the 2nd element in the "long" array.
double d[] = { 10e7, 10e8, 10e9 };
double *pd = d + 2; // point to the 3rd element in the "double" array.
Also note in your code:
int a , b , *p , *q;
p = &b;
q = p + 1; // q = &a; <--- NO this is wrong.
The fact that a and b are declared next to each other does not mean that a and b are allocated next to each other in the memory. So q is pointing to the memory address next to that of b - but what is in that address is undefined.
Because the ptrdiff_t from pointer subtraction is calculated relative to the size of the elements pointed to. It's a lot more convenient that way; for one, it tells you how many times you can increment one pointer before you reach the other pointer.
where you have
int a , b , *p , *q;
The compiler can put a and b anywhere. They don't have to even be near each other. Also, when you subtract two int pointers, the result is sized in terms of int, not bytes.
C is not assembly language. So pointers are not just plain integers -- pointers are special guys that know how to point to other things.
It's fundamental to the way pointers and pointer arithmetic work in C that they can point to successive elements of an array. So if we write
int a[10];
int *p1 = &a[4];
int *p2 = &a[3];
then p1 - p2 will be 1. The result is 1 because the "distance" between a[3] and a[4] is one int. The result is 1 because 4 - 3 = 1. The result is not 4 (as you might have thought it would be if you know that ints are 32 bits on your machine) because we're not interesting in doing assembly language programming or working with machine addresses; we're doing higher-level language programming with an array, and we're thinking in those terms.
(But, yes, at the machine address level, the way p2 - p1 is computed is typically as (<raw address value in p2> - <raw address value in p1>) / sizeof(int).)

typecasting a pointer to an int .

I can't understand the output of this program .
What I get of it is , that , first of all , the pointers p, q ,r ,s were pointing towards null .
Then , there has been a typecasting . But how the heck , did the output come as 1 4 4 8 . I might be very wrong in my thoughts . So , please correct me if I am wrong .
int main()
{
int a, b, c, d;
char* p = (char*)0;
int *q = (int *)0;
float* r = (float*)0;
double* s = (double*)0;
a = (int)(p + 1);
b = (int)(q + 1);
c = (int)(r + 1);
d = (int)(s + 1);
printf("%d %d %d %d\n", a, b, c, d);
_getch();
return 0;
}
Pointer arithmetic, in this case adding an integer value to a pointer value, advances the pointer value in units of the type it points to. If you have a pointer to an 8-byte type, adding 1 to that pointer will advance the pointer by 8 bytes.
Pointer arithmetic is valid only if both the original pointer and the result of the addition point to elements of the same array object, or just past the end of it.
The way the C standard describes this is (N1570 6.5.6 paragraph 8):
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
[...]
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined. If the result points one past the last element of the
array object, it shall not be used as the operand of a unary *
operator that is evaluated.
A pointer just past the end of an array is valid, but you can't dereference it. A single non-array object is treated as a 1-element array.
Your program has undefined behavior. You add 1 to a null pointer. Since the null pointer doesn't point to any object, pointer arithmetic on it is undefined.
But compilers aren't required to detect undefined behavior, and your program will probably treat a null pointer just like any valid pointer value, and perform arithmetic on it in the same way. So if the null pointer points to address 0 (this is not guaranteed, BTW, but it's very common), then adding 1 to it will probably give you a pointer to address N, where N is the size in bytes of the type it points to.
You then convert the resulting pointer to int (which is at best implementation-defined, will lose information if pointers are bigger than int, and may yield a trap representation) and you print the int value. The result, on most systems, will probably show you the sizes of char, int, float, and double, which are commonly 1, 4, 4, and 8 bytes, respectively.
Your program's behavior is undefined, but the way it actually behaves on your system is typical and unsurprising.
Here's a program that doesn't have undefined behavior that illustrates the same point:
#include <stdio.h>
int main(void) {
char c;
int i;
float f;
double d;
char *p = &c;
int *q = &i;
float *r = &f;
double *s = &d;
printf("char: %p --> %p\n", (void*)p, (void*)(p + 1));
printf("int: %p --> %p\n", (void*)q, (void*)(q + 1));
printf("float: %p --> %p\n", (void*)r, (void*)(r + 1));
printf("double: %p --> %p\n", (void*)s, (void*)(s + 1));
return 0;
}
and the output on my system:
char: 0x7fffa67dc84f --> 0x7fffa67dc850
int: 0x7fffa67dc850 --> 0x7fffa67dc854
float: 0x7fffa67dc854 --> 0x7fffa67dc858
double: 0x7fffa67dc858 --> 0x7fffa67dc860
The output is not as clear as your program's output, but if you examine the results closely you can see that adding 1 to a char* advances it by 1 byte, an int* or float* by 4 bytes, and a double* by 8 bytes. (Other than char, which by definition has a size of 1 bytes, these may vary on some systems.)
Note that the output of the "%p" format is implementation-defined, and may or may not reflect the kind of arithmetic relationship you might expect. I've worked on systems (Cray vector computers) where incrementing a char* pointer would actually update a byte offset stored in the high-order 3 bits of the 64-bit word. On such a system, the output of my program (and of yours) would be much more difficult to interpret unless you know the low-level details of how the machine and compiler work.
But for most purposes, you don't need to know those low-level details. What's important is that pointer arithmetic works as it's described in the C standard. Knowing how it's done on the bit level can be useful for debugging (that's pretty much what %p is for), but is not necessary to writing correct code.
Adding 1 to a pointer advances the pointer to the next address appropriate for the pointer's type.
When the (null)pointers+1 are recast to int, you are effectively printing the size of each of the types being pointed to by the pointers.
printf("%d %d %d %d\n", sizeof(char), sizeof(int), sizeof(float), sizeof(double) );
does pretty much the same thing. If you want to increment each pointer by only 1 BYTE, you'll need to cast them to (char *) before incrementing them to let the compiler know
Search for information about pointer arithmetic to learn more.
You're typecasting the pointers to primitive datatypes rather type casting them to pointers themselves and then using * (indirection) operator to indirect to that variable value. For instance, (int)(p + 1); means p; a pointer to constant, is first incremented to next address inside memory (0x1), in this case. and than this 0x1 is typecasted to an int. This totally makes sense.
The output you get is related to the size of each of the relevant types. When you do pointer arithmetic as such, it increases the value of the pointer by the added value times the base type size. This occurs to facilitate proper array access.
Because the size of char, int, float, and double are 1, 4, 4, and 8 respectively on your machine, those are reflected when you add 1 to each of the associated pointers.
Edit:
Removed the alternate code which I thought did not exhibit undefined behavior, which in fact did.

How is arithmetic with &array defined

I get most of pointer arithmetic, until I saw the following:
int x[5];
sizeof(x) // equals 20
sizeof(&x) // equals 4 -- sizeof(int))
So far I give this the semantic meaning of:
pointer to N-element array of T -- in the case of &x
However when doing x+1 we increment with sizeof(int) and when we do &x+1 we increment with sizeof(x).
Is there some underlying logic to this i.e. some equivalences, because this feels very unintuitive.
/edit, thanks to #WhozCraig I came to the conclusion that I made an error:
sizeof(&x) // equals 4 -- sizeof(int))
Should be
sizeof(&x) // equals 8 -- sizeof(int))
Lesson learned: Don't post code you haven't run directly
x is of type int[5], so &x is a pointer to an integer array of five elements, when adding 1 to &x you are incrementing to to next array of 5 elemnts.
Its called typed pointer math (or typed-pointer-arithmetic) and is intuitive when you get one thing engrained in your DNA: Pointer math adjusts addresses based on the type of a pointer that holds said-address.
In your example, what is the type of x? It is an array of int. but what is the type of the expression x ? Hmmm. According to the standard, the expression value of x is the address of the first element of the array, and the type is pointer-to-element-type, in this case, pointer-to-int.
The same standard dictates that for any data var (functions are a little odd) using the & operator results in an address with a type of pointer-to-type, the type being whatever the type of the variable is:
For example, given
int a;
the expression &a results in an address who's type is int *. Similarly,
struct foo { int x,y,z } s;
the expression &s results in an address who's type is struct foo *.
And now, the point of probable confusion, given:
int x[5];
the expression &x results in an address who's type is int (*)[5], i.e. a pointer to an array of five int. This is markedly different than simply x which is, per the standard, evals as an address who's type is a pointer to the underlying array element type
Why does it matter? Because all pointer arithmetic is based on that fundamental type of the expression address. Adjustments therein using typed pointer math are reliant on that fundamental concept.
int x[5];
x + 1
is effectively doing this:
int x[5];
int *p = x;
p + 1 // results is address of next int
Whereas:
&x + 1
is effectively doing this:
int x[5];
int (*p)[5] = &x;
p + 1 // results in address of next int[5]
// (which, not coincidentally, there isn't one)
Regarding the sizeof() differential, once again, those pesky types come home to roost, and in particular difference, it is important to note that sizeof is a compile-time operator; not run-time:
int x[5]
size_t n = sizeof(x);
In the above, sizeof(x) equates to sizeof(type-of x). Since x is int[5] and int is apparently 4 bytes on your system, the result is 20. Similarly,
int x[5];
size_t n = sizeof(*x);
results with sizeof(type-of *x) begin assigned to n. Because *x is of type int, this is synonymous with sizeof(int). The compile-time aspects, incidentally, make the following equally valid, though admittedly it looks a little dangerous at first glance:
int *p = NULL;
size_t n = sizeof(*p);
Just as before, sizeof(type-of *p) equates to sizeof(int)
But what about:
int x[5];
size_t n = sizeof(&x);
Here again, sizeof(&x) equates to sizeof(type-of &x). but we just covered what type &x is; its int (*)[5]. I.e. Its a data pointer type, and as such, its size will be the size of a pointer. On your rig, you apparently have 32bit pointers, since the reported size is 4.
An example of how &x is a pointer type, and that indeed all data pointer types result in a similar size, I close with the following example:
#include <stdio.h>
int main()
{
int x[5];
double y[5];
struct foo { char data[1024]; } z[5];
printf("%zu, %zu, %zu\n", sizeof(x[0]), sizeof(x), sizeof(&x));
printf("%zu, %zu, %zu\n", sizeof(y[0]), sizeof(y), sizeof(&y));
printf("%zu, %zu, %zu\n", sizeof(z[0]), sizeof(z), sizeof(&z));
return 0;
}
Output (Mac OSX 64bit)
4, 20, 8
8, 40, 8
1024, 5120, 8
Note the last value size reports are identical.
You said "I get most of pointer arithmetic, until I saw the following:"
int x[5];
sizeof(x) // equals 20
sizeof(&x) // equals 4 -- sizeof(int))
Investigating the first sizeof...
if ((sizeof(int) == 4) == true) {
then the size of five tightly packed ints is 5 * 4
so the result of (sizeof(int[5]) is 20.
}
However...
if (size(int)) == 4) is true
then when the size of the memory holding the value of another memory address is 4,
ie. when ((sizeof(&(int[5])) == 4) {
it is a cooincidence that memory addresses conveniently fit
into the same amount of memory as an int.
}
}
Don't be fooled, memory addresses have traditionally been the same size as int on some very popular platforms, but if you ever believe that they are the same size, you will prevent your code from running on many platforms.
To further drive the point home
it is true that (sizeof(char[4]) == 4),
but that does not mean that a `char[4]` is a memory address.
Now, in C, the offset operator for memory addresses "knows" the offset based on the type of pointer, char, int, or the implied address size. When you add to a pointer, the addition is translated by the compiler to an operation that looks more like this
addressValue += sizeof(addressType)*offsetCount
where
&x + 1
becomes
x += sizeof(x)*1;
Note that if you really want to have (some very unsafe programming) fun, you can cast your pointer type unsafely and specify offsets that really "don't work" the way they should.
int x[5];
int* xPtr = &x;
char* xAsCharPtr = (char*) xPtr;
printf("%d", xAsCharPtr + 2);
will print out a number comprised of about 1/2 the bits of numbers at x[0] and x[1].
It seems implicit conversion is at play, thanks to the excellent answer in some other pointer arithmetic question, I think it boils down to:
when x is an expression it can be read as &x[0] due to implicit conversion, adding 1 to this expression intuitively makes more sense that we want &x[1]. When doing sizeof(x) the implicit conversion does not occur giving the total size of object x. Arithmetic with &x+1 makes sense also when considering that &x is a pointer to a 5-element array.
The thing that does not become intuitive is sizeof(&x), one would expect it to also be of size x, yet it is the size of an element in the pointed-to array, x.

Add operation on pointer

int a[3];
int *j;
a[0]=90;
a[1]=91;
a[2]=92;
j=a;
printf("%d",*j);
printf("%d",&a[0])
printf("%d",&a[1]);
printf("%d",*(j+2));
here the pointer variable j is pointing to a[0],which is 90;and address of a[0] is -20 is on my machine. So j is holding -20.
And the address of a[1] is -18. So to get next variable I should use *(j+2). because j+2 will result in -18. but this is actually going on. To access a[1]. I have to use *(j+1). but j+1=-19. Why is j+1 resulting in -18 ?
Addresses are unsigned. You're printing them as if they were an int, but they're not an int. Use "%p" as the format specifier. That's how you print the address of a pointer.
Additionally, pointer arithmetic is different than the arithmetic you are used to. Internally, adding one to a pointer p increments the address by sizeof *p bytes, i.e., it increments the to the next object.
This is convenient as it saves the programmer from having to always use sizeof when performing arithmetic on a pointer (rarely do you actually want to increment by something other than sizeof *p. When you do, you cast to a char* first.)
pointer addition is not same as simple addition.
It depends on what type of variable the pointer is pointing.
In your case it's an int whose size is machine dependent (you can check of doing sizeof(int)).
So when adding a number to pointer like (j+i) it internally converts to (j+i*sizeof(datatype)) so when you type (j+2) the address is increased the 4 times (assuming int to be 2 bytes) which is not the intended result.
(j+1) will give you the right result (it's like saying point to the next element of int type of data)
Actually the pointer logic works on the basis of the pointer type since integer pointer type moves by size of integer on your machine it moves by the 2 bytes on your machine so p+1 = p+(sizeof(type of pointer))
*(x+y) is always exactly equivalent to x[y] (or y[x]). So to print a[1], you want *(j + 1) (or just j[1]). Note that, in a[1], a is converted to a pointer ... there's no difference between the way a is handled and the way j is handled here.

why different answers?

Below are 2 programs
First
#include<stdio.h>
void main()
{
int a[5]={1,2,3,4,5};
int *p;
p=&a;
printf("%u %u",p,p+1);
}
Second
#include<stdio.h>
void main()
{
int a[5]={1,2,3,4,5};
printf("%u %u",&a,&a+1);
}
Now, in the two programs..I have printed the values of &a using p in first code and directly in the second..
Why are the results different?
the answer i m getting are.
for first 3219048884 3219048888
for second 3219048884 3219048904
The type of &a is int (*) [5]. Therefore &a+1 is a pointer that's 5 ints further on than a. However the type of p is int *, therefore p+1 is a pointer that's 1 int further on than p.
When I run that, I get this:
1245036 1245040 1245036 1245040
1245036 1245040 1245036 1245056
with the only difference being in the last position, p+1 vs &a+1
p is a pointer to an integer, so p+1 is the address of the next integer. (i.e 4 byte further in memeory)
a is an array of 5 integers, so &a+1 is the address of the next array of 5 integers. (i.e., 20 bytes further in memeory)
In both programs, you are printing the memory addresses of the array.
This can vary each time you run the program. The memory that the OS chooses to give you can be different.
When your program declares an array of 5 integers, The OS promises to give you 5 consecutive integers, it does NOT promise what memory you will get, or that you will get the same memory every time.
You have two programs with different stack frames, no surprise the addresses of local variables are different. Il may change each time you run the program (that's what it does when I try it, code compiled with gcc on Linux).
But you'll get the same values with the program below one except for the last value of the serie (except for the last one, because of the way pointer arithmetic works).
#include<stdio.h>
void main()
{
int a[5]={1,2,3,4,5};
int *p;
p=&a;
printf("%u %u %u %u ",a,a+1,p,p+1);
printf("%u %u %u %u",a,a+1,&a,&a+1);
}
For a (quite) complete explanation of difference between pointers and arrays you can look at my answer here
Your first program is invalid. It is illegal to assign p = &a, since p has type int * and &a has type int (*)[5]. These types are not compatible. If your compiler is loose enough to allow this kind of assignment (did it at least warn you?), it probably interpreted it as p = (int *) &a, i.e. it forcefully reinterprets the value of &a as an int * pointer. Thus, all pointer arithmetic in your first program is int * arithmetic. This is why the first program produces the same output for a+1 and p+1 values.
In the second program the value of &a is not reinterpreted. It keeps its original type ``int ()[5]and it follows the rules of normal pointer arithmetic for typeint ()[5], meaning that when you do&a + 1, the pointer is movedsizeof(int[5])` bytes, as it should. This is why the result is different from the first program.
You should use p = a and not p = &a. An array like a is already assumed a constant pointer. You do not need to dereference it with the & operator.

Resources