C array variable & address computation - c

int a[2][2]={{2,3},{1,6}};
printf("%d",&a[1][0] - &a[0][1]);
Here, a[0][1] and a[1][0] are two consecutive integer items. As each integer takes 4 bytes, there should be a 4-byte difference between them. So the answer should be 4.
But I think address subtraction is illegal, and Dev-C++ also reports a compiler error. Yet the given output is 1. How is that possible?

You're doing subtraction on int pointers, so you get a result in "sizeof(int) units".
If you run your current code, it'll print 1, because those integers are indeed next to each other.
What you probably want to do is arithmetic on the addresses as numbers:
int a[2][2]={{2,3},{1,6}};
printf("%" PRIiPTR,(intptr_t)&a[1][0] - (intptr_t)&a[0][1]);
Casting the pointers to intptr_t (from the header stdint.h) is a way to do that.
PRIiPTR is a macro (from the header inttypes.h) used to print an intptr_t value with printf.

No, it should not be 4.
Your assumption is incorrect: Pointer arithmetic is done in units of the type being pointed at (i.e. sizeof (int) here), not in bytes.
Your array looks like this in memory:
[ 2 | 3 | 1 | 6 ]
You are printing the difference between the addresses of the 1 and the 3, which are adjacent, i.e. there's exactly 1 int's worth of bytes between them.
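Also, it's incorrect to print a pointer difference as if it were an int (with %d). The difference has type ptrdiff_t, which is printed with %td. If you instead cast the addresses to intptr_t to get the distance in bytes, print the result with "%" PRIdPTR.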

Related

Unexpected result when doing subtraction of addresses of array elements

I am on a 32-bit processor where char = 1 byte, short = 2 bytes and int = 4 bytes.
When I create an array of type char with 20 elements in it, I expect to see 20 memory spaces allocated to that array with the addresses differing by only 1 byte because of the type of the array.
If I take two consecutive elements from the array and subtract their addresses, should I then not get 1 in this case?
And in the case of arrays with types short and int, I am expecting to get 2 and 4. This is due to the fact that the short and int elements need to be aligned in memory: short elements will be on even addresses (diff 2) and int elements will be on addresses divisible by 4.
Though, how come when I run the following code I get 1,1,1 and not 1,2,4?
I suspect I am missing some crucial detail when it comes to pointer arithmetic.
char vecc[20];
printf("%i\n", &vecc[1]-&vecc[0]);
short vecs[20];
printf("%i\n", &vecs[1]-&vecs[0]);
int veci[20];
printf("%i\n", &veci[1]-&veci[0]);
Pointer subtraction yields the result as difference in the indexes, not the size of the gap between the addresses.
Quoting C11, chapter 6.5.6, (emphasis mine)
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. [...]
If you write the code in this way:
printf("%td\n", (char*)(&vecs[1]) - (char*)(&vecs[0]));
printf("%td\n", (char*)(&veci[1]) - (char*)(&veci[0]));
the output will be 2 and 4.

Offset between two array element addresses in C

I have a question asking me to find the offset in bytes between two array element addresses:
double myArray[5][7];
If C stored data in column-major order the offset (in bytes) of &myArray[3][2] from &myArray[0][0] would be:
In column major order, I think elements would be laid out as such:
[0][0] -- [1][0] -- [2][0] -- [3][0] -- ..... -- [3][2]
So in my mind, the way to get the offset in bytes is to count the number of jumps between [0][0] and [3][2] and multiply that by 8, since it's an array of doubles. However, what's confusing me is that it's asking for the offset using the & operator. Would this somehow change the answer since it's asking between two addresses, or is the process still the same? I think it'd be the same but I'm not 100% certain.
If my thinking is correct would this then be 8*15 bytes?
The memory layout for the 2D array is a contiguous chunk of memory. Take this array as an example:
int x[2][3] = {{0,1,2},{3,4,5}};
In the column-major order your question assumes, that would be laid out as
+--+--+--+--+--+--+
| 0| 3| 1| 4| 2| 5|
+--+--+--+--+--+--+
But in C it is stored as
+--+--+--+--+--+--+
| 0| 1| 2| 3| 4| 5|
+--+--+--+--+--+--+
You are absolutely right that you can count the jumps between [0][0] and [3][2], but there is a simpler way that avoids thinking about the layout: the offset is just the difference of the two addresses.
You can simply get their addresses and subtract them.
ptrdiff_t ans = &a[3][2]-&a[0][0]; (this is the number of elements between the two)
That yields the answer: printf("answer = %zu", ans*sizeof(a[0][0])); (one gap = sizeof(a[0][0]), in your case a double)
An even better way would be
ptrdiff_t ans = (char*)&a[3][2] - (char*)&a[0][0]; // number of bytes between them
I will explain a bit why char* is important here:
(char*)&a[0][0] and &a[0][0] hold the same address. (This is not true in full generality.)
But the type matters in pointer arithmetic, because the interpretation is different.
Without the cast, the difference is measured in array elements, here doubles. With the cast, the result is the difference in chars.
Why does this work? Because memory is byte addressable and a char is exactly one byte.
There is more to this than expected. First, what is an array in C? †
C does not really have multi-dimensional arrays. In C it is realized as an array of arrays. And yes, the elements of such a multidimensional array are stored in row-major order.
To clarify a bit more, we can look at an example from the standard, §6.5.2.1:
Consider the array object defined by the declaration
int x[3][5];
Here x is a 3 x 5 array of ints; more precisely, x is an array of
three element objects, each of which is an array of five ints. In the
expression x[i], which is equivalent to (*((x)+(i))), x is first
converted to a pointer to the initial array of five ints. Then i is
adjusted according to the type of x, which conceptually entails
multiplying i by the size of the object to which the pointer points,
namely an array of five int objects. The results are added and
indirection is applied to yield an array of five ints. When used in
the expression x[i][j], that array is in turn converted to a pointer
to the first of the ints, so x[i][j] yields an int.
So for double myArray[5][7]; we can say that myArray[3][2] and myArray[0][0] are not elements of the same array.
Now that we are done here, let's get into something else.
From the standard, §6.5.6 paragraph 9:
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object;
the result is the difference of the subscripts of the two array
elements.
But here myArray[3] and myArray[0] denote two different arrays. That means myArray[3][2] and myArray[0][0] belong to different arrays, and neither is one past the last element of the other's array. So the behavior of the subtraction &myArray[3][2] - &myArray[0][0] is not defined by the standard.
†Eric (Eric Postpischil) pointed out this idea.
In a row-major traversal, the declaration is array[height][width], and the usage is array[row][column]. In row-major, stepping to the next number gives you the next column, unless you exceed the width and "wrap" to the next row. Each row adds width to your index, and each column adds 1, making rows the "major" index.
In order to get the column-major equivalent, you assume the next value is the next row, and when the row exceeds the height, it "wraps" to the next column. This is described by index = column * height + row.
So, for an array array[5][7] of height 5, the index [3][2] yields 2*5 + 3 = 13.
Let's verify with some code. You can get column-major behavior simply by switching the order of the indices.
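#include <stdio.h>
int main(void) {
    double array[7][5];
    /* Use char * so the subtraction counts bytes;
       arithmetic on void * is a compiler extension, not standard C. */
    char *root = (char *)&array[0][0];
    char *addr = (char *)&array[2][3];
    size_t off = (size_t)(addr - root);
    /* size_t values are printed with %zu */
    printf("memory offset: %zu number offset: %zu\n", off, off / sizeof(double));
    return 0;
}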
Running this program yields an address offset of 104, or 13 doubles.
EDIT: sorry for wrong answer
The Simple Answer
C does not have multidimensional arrays, so we have to interpret double myArray[5][7] as a one-dimensional array of one-dimensional arrays. In double myArray[5][7], myArray is an array of 5 elements. Each of those elements is an array of 7 double.
Thus, we can see that myArray[0][0] and myArray[0][1] are both members of myArray[0], and they are adjacent members. Thus, the elements proceed [0][0], [0][1], [0][2], and so on.
When we consider myArray[1], we see it comes after myArray[0]. Since myArray[0] is an array of 7 double, myArray[1] starts 7 double after myArray[0].
Now we can see that myArray[3][2] is 3 arrays (of 7 double) and 2 elements (of double) after myArray[0][0]. If a double is 8 bytes, then this distance is 3•7•8 + 2•8 = 184 bytes.
The Correct Answer
To my surprise, I cannot find text in the C standard that specifies that the size of an array of n elements equals n times the size of one element. Intuitively, it is “obvious”—until we consider that an implementation in an architecture without a flat address space might have some issues that require it to access arrays in complicated ways. Therefore, we do not know what the size of an array of 7 double is, so we cannot calculate how far myArray[3][2] is from myArray[0][0] in general.
I do not know of any C implementations in which the size of an array of n elements is not n times the size of one element, so the calculation will work in all normal C implementations, but I do not see that it is necessarily so according to the C standard.
Calculating the Distance in a Program
It has been suggested the address can be calculated using (char *) &myArray[3][2] - (char *) &myArray[0][0]. Although this is not strictly conforming C, it will work in common C implementations. It works by converting the addresses to pointers to char. Subtracting these two pointers then gives the distance between them in units of char (which are bytes).
Using uintptr_t is another option, but I will omit discussion of it and its caveats as this answer is already too long.
A Wrong Way to Calculate Distance
One might think that &myArray[3][2] is a pointer to double and &myArray[0][0] is a pointer to double, so &myArray[3][2] - &myArray[0][0] is the distance between them, measured in units of double. However, the standard requires that pointers being subtracted must point to elements of the same array object or to one past the last element. (Also, for this purpose, an object can act as an array of one element.) However, myArray[3][2] and myArray[0][0] are not in the same array. myArray[3][2] is in myArray[3], and myArray[0][0] is in myArray[0]. Further, neither of them is an element of myArray, because its elements are arrays, but myArray[3][2] and myArray[0][0] are double, not arrays.
Given this, one might ask how (char *) &myArray[3][2] - (char *) &myArray[0][0] can be expected to work. Isn’t it also subtracting pointers to elements in different arrays? However, character types are special. The C standard says character pointers can be used to access the bytes that represent objects. (Technically, I do not see that the standard says these pointers can be subtracted—it only says that a pointer to an object can be converted to a pointer to a character type and then incremented successively to point to the remaining bytes of an object. However, I think the intent here is for character pointers to the bytes of an object to act as if the bytes of the object were an array.)

issue in double pointer address addition

I have run into an issue with pointers in some open source code, which I have tried to reproduce in the small snippet below.
int main()
{
int **a=0x0;
printf ("a = %d Add = %d\n", a, a+75);
return 1;
}
The expectation is to get 75 (0x4B), but this code gives 300 on 32-bit and 600 on 64-bit machines.
Output:
a = 0 Add = 600
The idea behind it is to access the added position, i.e. the 75th position in a hash table.
So it should be
printf ("a = %d Add = %d\n", a, sizeof (a)+75);
But I can't work out why the output is 300 or 600. Could anyone please explain?
I got as far as noticing that some left shift seems to happen internally, since:
75 - 1001011
600 - 1001011000.
Solutions are appreciated. Thanks in advance.
Pointer arithmetic is always done in units of the size of what is pointed to. In your case a is a pointer to a pointer to int, so the unit size is sizeof(int*), which is 4 on a typical 32-bit system (4 * 75 = 300) and 8 on a typical 64-bit system (8 * 75 = 600).
More precisely, a + 75 adds the byte offset sizeof(*a) * 75 (note the dereferencing of a) to the pointer. What happens is that you are effectively doing &a[75], i.e. you're getting a pointer to the element at index 75.
On a slightly related note, when you print pointers with printf you should be using the format "%p", and casting the pointers to void *. See e.g. this printf (and family) reference.
As for the different size on 32-bit and 64-bit systems, it's to be expected. A pointer on a 32-bit system is typically 32 bits, while on a 64-bit system it's 64 bits.
The program behaviour is undefined:
The format specifier %d is not valid for pointer types: use %p instead.
Pointer arithmetic is only valid within an array and one past its last element, or one past the address of the scalar for scalars. Here a is a null pointer, not a pointer into an array, so a + 75 is not a valid computation.
First of all, use %p for printing pointers and %zu for a sizeof result.
That said, check the type of a: it is int **, so a points to objects of type int *, and pointer arithmetic on a moves in steps of sizeof(int *). That size depends on the platform / compiler.
Pointer arithmetic honors the data type: the pointer is always advanced in units of the size of the type it points to.

How are pointers stored in memory?

I'm a little confused about this.
On my system, if I do this:
printf("%d", sizeof(int*));
this will just yield 4. Now, the same happens for sizeof(int). Conclusion: if both integers and pointers are 4 bytes, a pointer can be safely "converted" to an int (i.e. the memory it points to could be stored in an int). However, if I do this:
int* x;
printf("%p", x);
The returned hex address is far beyond the int scope, and thus any attempt to store the value in an int fails obviously.
How is this possible? If the pointer takes 4 bytes of memory, how can it store more than 2^32?
EDIT:
As suggested by a few users, I'm posting the code and the output:
#include <stdio.h>
int main()
{
printf ("%d\n", sizeof(int));
printf ("%d\n", sizeof(int*));
int *x;
printf ("%d\n", sizeof(x));
printf ("%p\n", x);
}
The output:
4
4
4
0xb7778000
C11, 6.3.2.3, paragraphs 5 and 6:
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
So the conversions are allowed, but the result is implementation defined (or undefined if the result cannot be stored in an integer type). (The "previously specified" is referring to NULL.)
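As for the claim that the printed pointer is larger than what 4 bytes of data can represent: this is not true. 0xb7778000 fits in 32 bits, so it is within the range of a 32-bit unsigned integral type. It does exceed the maximum of a 32-bit signed int (0x7fffffff), which is why it looks out of range for int.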
The returned hex address is far beyond the int scope, and thus any attempt to store the value in an int fails obviously.
4
4
4
0xb7778000
And 0xb7778000 is a 32-bit value, so an object of 4 bytes can hold it.
No, they cannot be "safely" converted. Certainly they use the same amount of storage space, but there is no guarantee that they interpret a number of set bits in the same manner.
As for the second question (and one question per question please), there is no guaranteed size for int, or for a pointer. An int is roughly the optimum size of data transfer on the bus (also known as a word). It can differ on different platforms, but must be at least as large as a short or char. This is why the standard defines the macro INT_MAX (in limits.h), but not a standard "value" for it.
A pointer is roughly as many bits wide as necessary to address a memory location. The original IBM PC's 8088 processor had an 8-bit data bus and 16-bit registers, but formed 20-bit addresses (by combining a shifted segment value with an offset) to extend its memory range past its register size.

C array address confusion

Say we have the following code:
int main(){
int a[3]={1,2,3};
printf(" E: 0x%x\n", a);
printf(" &E[2]: 0x%x\n", &a[2]);
printf("&E[2]-E: 0x%x\n", &a[2] - a);
return 1;
}
When compiled and run the results are follows:
E: 0xbf8231f8
&E[2]: 0xbf823200
&E[2]-E: 0x2
I understand the result of &E[2] which is 8 plus the array's address, since indexed by 2 and of type int (4 bytes on my 32-bit system), but I can't figure out why the last line is 2 instead of 8?
In addition, what type of the last line should be - an integer or an integer pointer?
I wonder if it is the C type system (kinda casting) that make this quirk?
You have to remember what the expression a[2] really means. It is exactly equivalent to *(a+2). So much so, that it is perfectly legal to write 2[a] instead, with identical effect.
For that to work and make sense, pointer arithmetic takes into account the type of the thing pointed at. But that is taken care of behind the scenes. You get to simply use natural offsets into your arrays, and all the details just work out.
The same logic applies to pointer differences, which explains your result of 2.
Under the hood, in your example the index is multiplied by sizeof(int) to get a byte offset which is added to the base address of the array. You expose that detail in your two prints of the addresses.
When subtracting pointers of the same type the result is number of elements and not number of bytes. This is by design so that you can easily index arrays of any type. If you want number of bytes - cast the addresses to char*.
When you increment a pointer by 1 (p+1), the pointer then points to the next element, i.e. sizeof(Type) bytes are added to the address in p (if Type is int, then sizeof(int) bytes).
Similar logic holds for p-1 (subtracting in that case).
If you just apply those principles here:
In simple terms:
&a[2] can be represented as (a+2)
&a[2]-a ==> (a+2) - (a) ==> 2
So, behind the scenes, in terms of byte addresses,
&a[2] - &a[0]
==> {(a + (2 * sizeof(int))) - (a + 0)} / sizeof(int)
==> 2 * sizeof(int) / sizeof(int) ==> 2
The line &E[2]-E is doing pointer subtraction, not integer subtraction. Pointer subtraction (when both pointers point to elements of the same array) returns the difference of the addresses divided by the size of the type they point to. The result is a signed integer of type ptrdiff_t.
To answer your "update" question, once again pointer arithmetic (this time pointer addition) is being performed. It's done this way in C to make it easier to "index" a chunk of contiguous data pointed to by the pointer.
You may be interested in Pointer Arithmetic In C question and answers.
Basically, the + and - operators take element size into account when used on pointers.
When adding and subtracting pointers in C, you use the size of the data type rather than absolute addresses.
If you have an int pointer and add the number 2 to it, it will advance 2 * sizeof(int). In the same manner, if you subtract two int pointers, you will get the result in units of sizeof(int) rather than the difference of the absolute addresses.
(Having pointers using the size of the data type is quite convenient, so that you for example can simply use p++ instead of having to specify the size of the type every time: p+=sizeof(int).)
Re: "In addition, what type of the last line should be - an integer or an integer pointer?"
An integer. By the same token, Today - April 1 is a number, not a date.
If you want to see the byte difference, you'll have to cast to a type that is 1 byte in size, like this:
printf("&E[2]-E:\t0x%tx\n", (char*)(&a[2])-(char*)(&a[0]));

Resources